READ_ALIGNMENT -- read sequences and/or their alignment

FILE = <string:1> 'default' partial or complete filename
DIRECTORY = <string:1> '' directory list (e.g., 'dir1:dir2:dir3:./:/')
ALIGN_CODES = <string:0> 'all' codes of proteins in the alignment
ALIGNMENT_FORMAT = <string:1> 'PIR' format of the alignment file: 'PIR' | 'PAP' | 'QUANTA' | 'INSIGHT' | 'FASTA'
REMOVE_GAPS = <logical:1> on whether to remove all-gap positions in input alignment
ADD_SEQUENCE = <logical:1> off whether to add the new sequences to the existing alignment
STOP_ON_ERROR = <integer:1> 1 whether to stop on error
CLOSE_FILE = <logical:1> on whether or not to close the alignment file at the end of READ_ALIGNMENT
REWIND_FILE = <logical:1> off whether or not to rewind the alignment file at the start of READ_ALIGNMENT
END_OF_FILE = <integer:1> 0 0 | 1 whether or not reached end of file during READ_ALIGNMENT

Output:
MODELLER_STATUS = <integer:1>, NUMB_OF_SEQUENCES, ALIGN_CODES

Description:
This command reads the sequence(s) and/or their alignment from a text file. Only sequences with the specified codes are read in; ALIGN_CODES = 'all' can be used to read all sequences.

There are several alignment formats:

  1. The 'PIR' format resembles that of the PIR sequence database. It is described in Section 2.4.1 and is used for comparative modeling because it allows for additional data about the proteins that are useful for automated access to the atomic coordinates.

  2. The 'FASTA' format resembles the 'PIR' format but lacks the second 'comment' line and the star at the end of each sequence.

  3. The 'PAP' format is nicer to look at but contains less information and is not used by other programs. When used in conjunction with PDB files, the PDB files must contain exactly the residues in the sequences in the 'PAP' file; i.e., it is not possible to use only a segment of a PDB file. In addition, the 'PAP' protein codes must be expandable into proper PDB atom filenames, as described in Section 2.1.4. The protein sequence can now start in any column (this was limited to column 11 before release 5).

  4. The 'QUANTA' format can be used to communicate with the QUANTA program. Do not mix the 'QUANTA' format with any other format, because it contains residue numbers that do not occur in the other formats and are difficult to guess correctly. MODELLER can write out alignments in the 'QUANTA' format but cannot read them in.

  5. The 'INSIGHT' format is very similar to the 'PAP' format and can sometimes be used to communicate with the INSIGHTII program. When used in conjunction with PDB files, the same rules as for the 'PAP' format apply.
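
The difference between the 'PIR' and 'FASTA' formats described in the list above can be illustrated with a short Python sketch. It is not part of MODELLER: the function name pir_to_fasta_like is hypothetical, and the sketch assumes that each entry starts with a '>' header line, that the header is copied unchanged, and that the line immediately after the header is the 'comment' line. Real files may differ in details that this page does not specify.

```python
def pir_to_fasta_like(pir_text):
    """Drop the second ('comment') line and the terminating '*' of each entry.

    Illustration only: the header line is copied unchanged, which is an
    assumption rather than a statement about MODELLER's FASTA output.
    """
    out = []
    skip_next = False
    for line in pir_text.splitlines():
        if line.startswith(">"):
            out.append(line)
            skip_next = True          # the next line is the comment line
        elif skip_next:
            skip_next = False         # discard the comment line
        else:
            stripped = line.rstrip()
            if stripped.endswith("*"):
                stripped = stripped[:-1]   # remove the terminating star
            if stripped:
                out.append(stripped)
    return "\n".join(out) + "\n"


example = (
    ">P1;seq1\n"
    "sequence:seq1::::::::\n"
    "ACDE\n"
    "FGH*\n"
    ">P1;seq2\n"
    "structureX:seq2::::::::\n"
    "AC-E\n"
    "FGH*\n"
)
print(pir_to_fasta_like(example), end="")
```

Only the comment line and the star are removed; the sequence lines, including gap characters, are left untouched.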

If REMOVE_GAPS = on, positions with gaps in all selected sequences are removed from the alignment.
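
The effect of REMOVE_GAPS can be sketched in Python as follows. This is an illustration of the rule stated above, not MODELLER's implementation; the function name remove_all_gap_columns and the use of '-' as the gap character are assumptions made for the example.

```python
def remove_all_gap_columns(sequences, gap="-"):
    """Return the aligned sequences without columns that are gaps in every row."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one selected sequence has a residue there.
    keep = [i for i in range(length) if any(s[i] != gap for s in sequences)]
    return ["".join(s[i] for i in keep) for s in sequences]


all_three = ["AC--DE", "A---DE", "ACG-D-"]
print(remove_all_gap_columns(all_three))       # only the fourth column is dropped
print(remove_all_gap_columns(all_three[:2]))   # selecting fewer sequences drops more columns
```

The second call shows why the set of selected sequences matters: a column that holds a residue only in an unselected sequence becomes an all-gap column once that sequence is left out.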

If ADD_SEQUENCE is on, the new sequences are added to the current ones; otherwise, the old sequences are deleted.

Ordinarily, the alignment file is closed at the end of this command. However, when reading 'PIR' or 'FASTA' format files, if CLOSE_FILE is off, the file is left open. Subsequent calls to READ_ALIGNMENT then resume at this point in the file, provided they set REWIND_FILE to off. The END_OF_FILE variable is set to 1 if MODELLER reached the end of the 'PIR' or 'FASTA' file during the read, and to 0 otherwise.
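
The interplay of an open file, rewinding, and the end-of-file flag can be sketched in Python. The class EntryReader and its method read_entry are hypothetical and only mimic the behavior described above. In particular, reading one entry per call and detecting the end of an entry by looking ahead for the next '>' header are assumptions of this sketch, not documented MODELLER behavior.

```python
import io


class EntryReader:
    """Hypothetical stand-in for an alignment file that stays open between calls."""

    def __init__(self, handle):
        self.handle = handle
        self.pending = None   # header of the next entry, read ahead by the previous call

    def read_entry(self, rewind_file=False):
        """Read one entry; return (header, lines, end_of_file)."""
        if rewind_file:
            self.handle.seek(0)       # start again from the top of the file
            self.pending = None
        header, self.pending = self.pending, None
        lines = []
        end_of_file = 0
        while True:
            line = self.handle.readline()
            if line == "":
                end_of_file = 1       # the end of the file was reached during this read
                break
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is None:
                    header = line
                else:
                    self.pending = line   # belongs to the next entry
                    break
            elif header is not None:
                lines.append(line)
        return header, lines, end_of_file


text = ">P1;a\nsequence:a\nAC*\n>P1;b\nsequence:b\nDE*\n"
reader = EntryReader(io.StringIO(text))
first = reader.read_entry(rewind_file=True)
second = reader.read_entry()          # resumes where the first call stopped
print(first)
print(second)
```

A caller would typically loop until the end-of-file flag becomes 1, which corresponds to testing END_OF_FILE after each READ_ALIGNMENT call made with CLOSE_FILE off and REWIND_FILE off.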

Example:


# Example for: READ_ALIGNMENT, WRITE_ALIGNMENT, 
#              READ_ALIGNMENT2, WRITE_ALIGNMENT2,
#              CHECK_ALIGNMENT

# Read an alignment, write it out in the 'PAP' format, and 
# check the alignment of the N-1 structures as well as the 
# alignment of the N-th sequence with each of the N-1 structures.

SET OUTPUT_CONTROL = 1 1 1 1 0

READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = 'all'
WRITE_ALIGNMENT FILE = 'toxin.pap', ALIGNMENT_FORMAT = 'PAP'
WRITE_ALIGNMENT FILE = 'toxin.fasta', ALIGNMENT_FORMAT = 'FASTA'
CHECK_ALIGNMENT


Ben Webb 2004-10-04