Next: alignment.clear() delete Up: The alignment class: comparison Previous: alignment() create   Contents   Index

alignment.append() -- read sequences and/or their alignment

io = <io_data>   Options for reading atom files
file = <str:1> 'default' partial or complete filename
align_codes = <str:0> 'all' codes of proteins in the alignment
atom_files = <str:0> '' complete or partial atom filenames
alignment_format = <str:1> 'PIR' format of the alignment file: 'PIR' | 'PAP' | 'QUANTA' | 'INSIGHT' | 'FASTA'
remove_gaps = <bool:1> True whether to remove all-gap positions in input alignment
close_file = <bool:1> True whether or not to close the alignment file at the end of alignment.append()
rewind_file = <bool:1> False whether or not to rewind the alignment file at the start of alignment.append()

Output:
end_of_file

Description:
This command reads the sequence(s) and/or their alignment from a text file. Only sequences with the specified codes are read in; align_codes = 'all' can be used to read all sequences. The sequences are added to any currently in the alignment.

There are several alignment formats:

  1. The 'PIR' format resembles that of the PIR sequence database. It is described in Section 3.9.1 and is used for comparative modeling because it allows for additional data about the proteins that are useful for automated access to the atomic coordinates.

  2. The 'FASTA' format resembles the 'PIR' format but lacks the second 'comment' line and the star at the end of each sequence.

  3. The 'PAP' format is nicer to look at but contains less information and is not used by other programs. When used in conjunction with PDB files, the PDB files must contain exactly the residues in the sequences in the 'PAP' file; i.e., it is not possible to use only a segment of a PDB file. In addition, the 'PAP' protein codes must be expandable into proper PDB atom filenames, as described in Section 3.1.3. The protein sequence can now start in any column (this was limited to column 11 before release 5).

  4. The 'QUANTA' format can be used to communicate with the QUANTA program. Do not mix the 'QUANTA' format with any other format, because it contains residue numbers that do not occur in the other formats and are difficult to guess correctly. MODELLER can write out alignments in the 'QUANTA' format but cannot read them in.

  5. The 'INSIGHT' format is very similar to the 'PAP' format and can sometimes be used to communicate with the INSIGHTII program. When used in conjunction with PDB files, the same rules as for the 'PAP' format apply.
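
The difference between the 'PIR' and 'FASTA' formats described in items 1 and 2 can be illustrated with a small plain-Python sketch. It is not part of MODELLER: the function name pir_to_fasta and the sample records are hypothetical, and the sketch assumes each record is a '>' header line, one non-empty comment line, and sequence lines ending in a star. The header line is copied unchanged, which is also an assumption.

```python
def pir_to_fasta(pir_text):
    """Convert PIR-style records to FASTA-style records (illustrative only).

    Assumed record layout: a '>' header line, one non-empty comment line,
    then one or more sequence lines, the last of which ends in '*'.
    """
    # Ignore blank lines so records may be separated by empty lines.
    lines = [ln.rstrip() for ln in pir_text.splitlines() if ln.strip()]
    out = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith('>'):
            raise ValueError("expected a '>' header line, got: %r" % header)
        i += 2  # skip the header and the second ('comment') line
        seq_lines = []
        while i < len(lines) and not lines[i].startswith('>'):
            seq_lines.append(lines[i])
            i += 1
        # Join the sequence lines and drop the terminating star.
        out.append(header)
        out.append(''.join(seq_lines).rstrip('*'))
    return '\n'.join(out) + '\n'


# Made-up two-record input, for illustration only.
sample_pir = (
    ">P1;seqA\n"
    "comment line for seqA\n"
    "TEKC--HT\n"
    "AAGG*\n"
    ">P1;seqB\n"
    "comment line for seqB\n"
    "IRCF-T*\n"
)
print(pir_to_fasta(sample_pir))
```

In practice the conversion is done by MODELLER itself: read the file with alignment.append() and write it with alignment_format='FASTA', as in the example at the end of this section.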

If remove_gaps = True, positions with gaps in all selected sequences are removed from the alignment.
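
All-gap positions typically appear when only a subset of the sequences in a file is selected with align_codes. The following plain-Python sketch shows the operation on a list of equal-length strings. It is an illustration of the idea, not MODELLER's implementation; the function name remove_all_gap_columns and the use of '-' as the gap character are assumptions.

```python
def remove_all_gap_columns(sequences, gap='-'):
    """Drop alignment positions at which every sequence has a gap."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    # Keep a column if at least one sequence has a residue there.
    keep = [col for col in range(length)
            if any(s[col] != gap for s in sequences)]
    return [''.join(s[col] for col in keep) for s in sequences]


# Column 2 (0-based) is a gap in both sequences and is removed;
# column 3 is kept because the second sequence has a residue there.
print(remove_all_gap_columns(["AC--GT", "A--TG-"]))
# prints ['AC-GT', 'A-TG-']
```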

The io argument is required since PIR files can contain empty sequences; in this case, the sequence is read from the corresponding PDB file.

Ordinarily, the alignment file is closed at the end of this command. However, when reading 'PIR' or 'FASTA' format files, if close_file is False, then the file is left open. Subsequent calls to alignment.append() will then resume at this point in the file, provided they set rewind_file to False. The end_of_file variable is set to 1 if MODELLER reached the end of the 'PIR' or 'FASTA' file during the read, or 0 otherwise.
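
The interplay of an open file, rewinding, and the end_of_file flag can be pictured with an ordinary Python stream. The sketch below is only an analogy and does not call MODELLER: the helper read_records, its count argument, and the treatment of each line as one record are all hypothetical simplifications.

```python
import io


def read_records(stream, count, rewind_file=False):
    """Read up to `count` non-blank lines from an already open stream.

    Returns (records, end_of_file), where end_of_file is 1 if the end
    of the stream was reached during this call and 0 otherwise.
    """
    if rewind_file:
        stream.seek(0)  # start again from the beginning
    records = []
    end_of_file = 0
    while len(records) < count:
        line = stream.readline()
        if line == '':  # readline() returns '' only at end of stream
            end_of_file = 1
            break
        if line.strip():
            records.append(line.strip())
    return records, end_of_file


# The stream stays open between calls, so the second call resumes
# where the first one stopped.
stream = io.StringIO("seqA\nseqB\nseqC\n")
first = read_records(stream, 2)                     # (['seqA', 'seqB'], 0)
second = read_records(stream, 2)                    # (['seqC'], 1)
again = read_records(stream, 1, rewind_file=True)   # (['seqA'], 0)
print(first, second, again)
```

The second call reports end_of_file = 1 because it ran out of input before reading the requested number of records; the rewound call starts over and reports 0.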

Example: examples/commands/read_alignment.py


# Example for: alignment.append(), alignment.write(),
#              alignment.check()

# Read an alignment, write it out in the 'PAP' format, and 
# check the alignment of the N-1 structures as well as the 
# alignment of the N-th sequence with each of the N-1 structures.

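# Import the MODELLER names used below (log, environ, alignment);
# needed when the script is run with a plain Python interpreter.
from modeller import *

log.level(output=1, notes=1, warnings=1, errors=1, memory=0)
env = environ()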

aln = alignment(env)
aln.append(file='toxin.ali', align_codes='all')
aln.write(file='toxin.pap', alignment_format='PAP')
aln.write(file='toxin.fasta', alignment_format='FASTA')
aln.check()


Ben Webb 2005-04-21