
alignment.append() -- read sequences and/or their alignment

append(file, align_codes='all', atom_files=None, remove_gaps=True, alignment_format='PIR', io=None)
Output:
end_of_file

This command reads the sequence(s) and/or their alignment from a text file. Only sequences with the specified codes are read in; align_codes = 'all' can be used to read all sequences. The sequences are added to any currently in the alignment.

There are several alignment formats:

  1. The 'PIR' format resembles that of the PIR sequence database. It is described in Section B.1 and is used for comparative modeling because it allows for additional data about the proteins that are useful for automated access to the atomic coordinates.

  2. The 'FASTA' format resembles the 'PIR' format but lacks the second 'comment' line and the star at the end of each sequence.

  3. The 'PAP' format is nicer to look at but contains less information and is not used by other programs. When used in conjunction with PDB files, the PDB files must contain exactly the residues in the sequences in the 'PAP' file; i.e., it is not possible to use only a segment of a PDB file. In addition, the 'PAP' protein codes must be expandable into proper PDB atom filenames, as described in Section 5.1.3. The protein sequence can now start in any column (this was limited to column 11 before release 5).

  4. The 'QUANTA' format can be used to communicate with the QUANTA program. Do not mix the 'QUANTA' format with any other format, because the 'QUANTA' format contains residue numbers, which do not occur in the other formats and are difficult to guess correctly. MODELLER can write out alignments in the 'QUANTA' format but cannot read them in.

  5. The 'INSIGHT' format is very similar to the 'PAP' format and can sometimes be used to communicate with the INSIGHTII program. When used in conjunction with PDB files, the same rules as for the 'PAP' format apply.

  6. The 'PSS' format is the .horiz format used by PSI-PRED to report secondary structure predictions of sequences. The confidence of the prediction is also reported as an integer value from 0 (low) to 9 (high).
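
The difference between the 'PIR' and 'FASTA' layouts described in items 1 and 2 can be illustrated with a small standalone sketch. It is not part of MODELLER and does not use the MODELLER API. The function name pir_to_fasta is hypothetical, and the sketch assumes that each 'PIR' record starts with a '>P1;code' header line, is followed by exactly one description line, and ends its sequence with a star. It also assumes that the 'FASTA' header carries only the code.

```python
def pir_to_fasta(pir_text):
    """Convert PIR-style records to FASTA-style records (illustrative only).

    Assumes: header line '>P1;code', exactly one description line after it,
    and a sequence terminated by '*'. Not MODELLER's own reader or writer.
    """
    records = []
    lines = [ln.rstrip() for ln in pir_text.splitlines()]
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            i += 1          # skip blank lines or anything before a record
            continue
        # '>P1;1fas' -> '1fas' (keep only the code after the semicolon)
        code = lines[i][1:].split(";", 1)[-1]
        i += 2              # skip the header and the second (description) line
        seq_parts = []
        while i < len(lines):
            chunk = lines[i].strip()
            i += 1
            if chunk.endswith("*"):
                seq_parts.append(chunk[:-1])   # drop the terminating star
                break
            seq_parts.append(chunk)
        records.append(">" + code + "\n" + "".join(seq_parts))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    example = (
        ">P1;1fas\n"
        "sequence:1fas:::::::0.00: 0.00\n"
        "TICYSHTTTS--RAILKDCG\n"
        "ENSCYRKSRR*\n"
    )
    print(pir_to_fasta(example), end="")
```

The description line is simply discarded here, which is why the 'PIR' format is preferred for comparative modeling: the 'FASTA' layout has no place for the data needed to locate the atomic coordinates.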

If remove_gaps = True, positions with gaps in all selected sequences are removed from the alignment.
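
The effect of remove_gaps can be sketched in plain Python. This is only an illustration of the rule stated above, not MODELLER's implementation. The function name remove_all_gap_columns is hypothetical, and the sketch assumes that gaps are written as '-' and that the selected sequences are given as equal-length strings.

```python
def remove_all_gap_columns(sequences, gap="-"):
    """Drop alignment positions where every selected sequence has a gap.

    Illustrative only; assumes equal-length strings and '-' as the gap symbol.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    # Keep a column if at least one sequence has a residue there.
    keep = [col for col in range(length)
            if any(s[col] != gap for s in sequences)]
    return ["".join(s[col] for col in keep) for s in sequences]


if __name__ == "__main__":
    selected = ["AC--GT", "A---GT", "-C--G-"]
    for row in remove_all_gap_columns(selected):
        print(row)
```

Such all-gap positions typically arise when only some of the sequences in a file are selected with align_codes, since the gaps that accommodated the unselected sequences no longer align with any residue.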

The io argument is required since PIR files can contain empty sequences; in this case, the sequence is read from the corresponding PDB file.

For 'PIR' and 'FASTA' files, the end_of_file variable is set to 1 if MODELLER reached the end of the file during the read, or 0 otherwise.

This command can raise a FileFormatError if the alignment file format is invalid.

Example: examples/commands/read_alignment.py


# Example for: alignment.append(), alignment.write(),
#              alignment.check()

# Read an alignment, write it out in the 'PAP' format, and
# check the alignment of the N-1 structures as well as the
# alignment of the N-th sequence with each of the N-1 structures.

from modeller import *

log.level(output=1, notes=1, warnings=1, errors=1, memory=0)
env = environ()
env.io.atom_files_directory = ['../atom_files']

aln = alignment(env)
aln.append(file='toxin.ali', align_codes='all')
aln.write(file='toxin.pap', alignment_format='PAP')
aln.write(file='toxin.fasta', alignment_format='FASTA')
aln.check()


Ben Webb 2008-05-05