
READ_ALIGNMENT -- read sequences and/or their alignment

FILE = <string:1> 'default' partial or complete filename
DIRECTORY = <string:1> '' directory list (e.g., 'dir1:dir2:dir3:./:/')
ALIGN_CODES = <string:0> 'all' codes of proteins in the alignment
ALIGNMENT_FORMAT = <string:0> 'PIR' format of the alignment file: 'PIR' | 'PAP' | 'QUANTA' | 'INSIGHT' | 'FASTA'
REMOVE_GAPS = <logical:1> on whether to remove all-gap positions in the input alignment
ADD_SEQUENCE = <logical:1> off whether to add the new sequences to the existing alignment
STOP_ON_ERROR = <integer:1> 1 whether to stop on error

Output:
MODELLER_STATUS = <integer:1>, NUMB_OF_SEQUENCES, ALIGN_CODES

Description:
This command reads the sequence(s) and/or their alignment from a text file. Only sequences with the specified codes are read in; ALIGN_CODES = 'all' can be used to read all sequences.
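The selection rule can be illustrated with a short Python sketch. It is not MODELLER's implementation: the function select_codes and the representation of file entries as (code, sequence) pairs are hypothetical, chosen only to show that 'all' keeps every entry while a list of codes keeps just the matching ones, in the order they appear in the file.

```python
from typing import Dict, List, Sequence, Tuple, Union


def select_codes(entries: List[Tuple[str, str]],
                 align_codes: Union[str, Sequence[str]]) -> Dict[str, str]:
    """Keep only the entries whose code was requested (hypothetical helper).

    `entries` holds (code, aligned_sequence) pairs in file order; the
    special value 'all' keeps every entry, mirroring ALIGN_CODES = 'all'.
    """
    codes = [align_codes] if isinstance(align_codes, str) else list(align_codes)
    if codes == ["all"]:
        return dict(entries)
    wanted = set(codes)
    # Preserve the order in which sequences appear in the file,
    # not the order in which the codes were requested.
    return {code: seq for code, seq in entries if code in wanted}
```

For example, requesting ['query', '1fas'] from a file containing 1fas, 2ctx and query returns 1fas first, because the file order is kept.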

Five alignment formats are supported:

  1. The 'PIR' format resembles that of the PIR sequence database. It is described in Section 2.4.1 and is the format used for comparative modeling, because it can carry additional data about each protein that is useful for automated access to its atomic coordinates.

  2. The 'FASTA' format resembles the 'PIR' format but lacks the second 'comment' line and the star ('*') that terminates each sequence.

  3. The 'PAP' format is nicer to look at but contains less information and is not used by other programs. When used in conjunction with PDB files, the PDB files must contain exactly the residues in the sequences in the 'PAP' file; i.e., it is not possible to use only a segment of a PDB file. In addition, the 'PAP' protein codes must be expandable into proper PDB atom filenames, as described in Section 2.1.4. The protein sequence can now start in any column (before release 5, it had to start in column 11).

  4. The 'QUANTA' format can be used to communicate with the QUANTA program. Do not mix the 'QUANTA' format with any other format, because it contains residue numbers that do not occur in the other formats and are difficult to guess correctly.

  5. The 'INSIGHT' format is very similar to the 'PAP' format and can sometimes be used to communicate with the INSIGHTII program. When used in conjunction with PDB files, the same rules as for the 'PAP' format apply.

If REMOVE_GAPS = on, positions with gaps in all selected sequences are removed from the alignment.
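A minimal sketch of the REMOVE_GAPS = on behaviour, assuming the selected sequences are already held as equal-length strings keyed by code and that '-' is the only gap character (both are assumptions for illustration; remove_all_gap_columns is a hypothetical helper, not a MODELLER routine). Because only the selected sequences are passed in, a column is dropped when it is all-gap among those sequences, even if an unselected sequence in the file had a residue there.

```python
from typing import Dict


def remove_all_gap_columns(aln: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Drop every alignment column in which all sequences have a gap."""
    seqs = list(aln.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length) if any(s[i] != gap for s in seqs)]
    return {code: "".join(s[i] for i in keep) for code, s in aln.items()}
```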

If ADD_SEQUENCE = on, the new sequences are added to the current ones; otherwise, the old sequences are deleted first.
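The difference between the two ADD_SEQUENCE settings can be sketched as follows. load_into is a hypothetical helper; it treats a repeated code as an error, which is an assumption since this section does not say how duplicates are handled, and it does not model how alignment lengths are reconciled when sequences are added.

```python
from typing import Dict


def load_into(current: Dict[str, str], new: Dict[str, str],
              add_sequence: bool) -> Dict[str, str]:
    """Combine newly read sequences with the ones already in memory.

    add_sequence=True appends `new` after the existing sequences;
    add_sequence=False discards the existing sequences first.
    """
    if not add_sequence:
        return dict(new)
    merged = dict(current)
    for code, seq in new.items():
        if code in merged:
            # Assumption: duplicate codes are rejected in this sketch.
            raise ValueError(f"duplicate alignment code {code!r}")
        merged[code] = seq
    return merged
```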

Example:


# Example for: READ_ALIGNMENT, WRITE_ALIGNMENT, 
#              READ_ALIGNMENT2, WRITE_ALIGNMENT2,
#              CHECK_ALIGNMENT

# Read an alignment, write it out in the 'PAP' and 'FASTA' formats,
# and check the alignment of the N-1 structures as well as the
# alignment of the N-th sequence with each of the N-1 structures.

SET OUTPUT_CONTROL = 1 1 1 1 0

READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = 'all'
WRITE_ALIGNMENT FILE = 'toxin.pap', ALIGNMENT_FORMAT = 'PAP'
WRITE_ALIGNMENT FILE = 'toxin.fasta', ALIGNMENT_FORMAT = 'FASTA'
CHECK_ALIGNMENT


Ben Webb 2004-04-20