Next: WRITE_PROFILE write Up: Comparison and searching of Previous: PROF_TO_ALN profile   Contents   Index

READ_PROFILE -- read a profile of a sequence

FILE = <string:1> 'default' partial or complete filename
PROFILE_FORMAT = <string:1> 'TEXT' 'TEXT' | 'BINARY' ; for READ/WRITE_PROFILE

Description:
This command reads a profile from the file named by FILE. Two formats are supported, selected by PROFILE_FORMAT: TEXT and BINARY.

A profile file in TEXT format looks like this:


# Number of sequences:      4
# Length of profile  :     20
# N_PROF_ITERATIONS  :      3
# GAP_PENALTIES_1D   :   -900.0   -50.0
# MATRIX_OFFSET      :    0.0
# RR_FILE            : ${MODINSTALLCVS}/modlib//as1.sim.mat
    1 2ctx                                     X     0    71     1    71     0     0     0    0.    0.0     IRCFITPDITS---KDCPN-
    2 2abx                                     X     0    74     1    74     0     0     0    0.    0.0     IVCHTTATIPS-SAVTCPPG
    3 1nbt                                     X     0    66     1    66     0     0     0    0.    0.0     RTCLISPSS---TPQTCPNG
    4 1fas                                     X     0    61     1    61     0     0     0    0.    0.0     TMCYSHTTTSRAILTNCG--

The first six lines begin with a '#' in the first column and give a few general details of the profile.

The first line gives the number of sequences in the profile. It must follow the Fortran format '(24x,i6)': the first 24 columns are skipped and the integer is read from the next six.

The second line gives the number of positions in the profile, also in '(24x,i6)' format.

The third line gives the value of the N_PROF_ITERATIONS variable. The fourth line gives the value of the GAP_PENALTIES_1D variable. The fifth line gives the value of the MATRIX_OFFSET variable. The sixth line gives the value of the RR_FILE variable.

The number of sequences in the profile and its length are used to allocate memory for the profile arrays, so they must describe the profile accurately.
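A profile file edited by hand can be checked against these two header values before it is read. The following is only a minimal Python sketch of such a check; the function name check_profile_dimensions is hypothetical and is not part of MODELLER.

```python
def check_profile_dimensions(n_sequences, profile_length, aligned_rows):
    """Compare the two header counts with the alignment rows.

    Hypothetical helper, not part of MODELLER. aligned_rows holds one
    aligned sequence string (gaps included) per profile record.
    Returns a list of problem descriptions; empty when all is consistent.
    """
    problems = []
    if len(aligned_rows) != n_sequences:
        problems.append(
            "header declares %d sequences, file contains %d"
            % (n_sequences, len(aligned_rows))
        )
    for index, row in enumerate(aligned_rows, start=1):
        if len(row) != profile_length:
            problems.append(
                "record %d has %d positions, header declares %d"
                % (index, len(row), profile_length)
            )
    return problems
```

An empty list means the header counts agree with the records that follow.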

The values of the variables described in lines 3 through 6 are not used internally by MODELLER, but the command still expects to find a total of six header lines. These records carry useful information when the profile was constructed with BUILD_PROFILE.
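The header layout described above can be read outside MODELLER with a few lines of Python. This is a sketch under stated assumptions, not MODELLER's own reader: the function name parse_profile_header is hypothetical, the two counts are taken from columns 25 to 30 as '(24x,i6)' implies, and the remaining four lines are kept as uninterpreted strings keyed by the label that precedes the colon.

```python
def parse_profile_header(lines):
    """Parse the six '#' header lines of a TEXT-format profile.

    Hypothetical helper, not part of MODELLER.
    Returns (n_sequences, profile_length, extra), where extra maps the
    label of each of header lines 3-6 to its raw text value.
    """
    header = [line.rstrip("\n") for line in lines[:6]]
    if len(header) < 6 or not all(line.startswith("#") for line in header):
        raise ValueError("expected six header lines starting with '#'")

    def read_count(line):
        # '(24x,i6)': skip 24 columns, read an integer from the next six.
        return int(line[24:30])

    n_sequences = read_count(header[0])
    profile_length = read_count(header[1])

    extra = {}
    for line in header[2:]:
        # Split at the first colon only, so values may contain colons.
        label, _, value = line[1:].partition(":")
        extra[label.strip()] = value.strip()
    return n_sequences, profile_length, extra
```

Only the two counts are converted to integers, which mirrors the statement above that the other four values are not used internally.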

The remaining lines contain the alignment of the sequences in the profile, one sequence per line. Each line follows the Fortran format '(i5,1x,a40,1x,a1,1x,7(i5,1x),f5.0,1x,g10.2,1x,32767a1)'

The various columns that precede the sequence are:

  1. The index number of the sequence in the profile.

  2. The code of the sequence (similar to ALIGN_CODES).

  3. The type of sequence ('S' for sequence, 'X' for structure). This depends on the original source of the sequences. (See ALN_TO_PROF and READ_SEQUENCE_DB).

  4. The iteration in which the sequence was selected as significant. (See BUILD_PROFILE).

  5. The length of the database sequence.

  6. The starting position of the target sequence in the alignment.

  7. The ending position of the target sequence in the alignment.

  8. The starting position of the database sequence in the alignment.

  9. The ending position of the database sequence in the alignment.

  10. The number of equivalent positions in the alignment.

  11. The sequence identity between the target sequence and the database sequence.

  12. The e-value of the alignment. (See BUILD_PROFILE).

  13. The sequence alignment.

Many of the fields described above are meaningful only when the profile that is written out is the result of BUILD_PROFILE.
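The thirteen columns listed above sit at fixed offsets, so a record can be split by slicing. The Python sketch below derives its offsets directly from the format string quoted earlier; it is an illustration, not MODELLER's reader. The names ProfileRecord and parse_profile_record are hypothetical, and Fortran's handling of blank or malformed numeric fields is not reproduced.

```python
from collections import namedtuple

# Field names are illustrative; they follow the column list above.
ProfileRecord = namedtuple("ProfileRecord", [
    "index", "code", "seq_type", "iteration", "db_length",
    "target_start", "target_end", "db_start", "db_end",
    "n_equivalent", "identity", "evalue", "sequence",
])


def parse_profile_record(line):
    """Split one alignment line of a TEXT-format profile into its fields.

    Offsets (0-based) implied by
    '(i5,1x,a40,1x,a1,1x,7(i5,1x),f5.0,1x,g10.2,1x,32767a1)'.
    """
    line = line.rstrip("\n")
    index = int(line[0:5])            # i5
    code = line[6:46].strip()         # a40
    seq_type = line[47:48]            # a1
    # 7(i5,1x): seven integers, each five wide plus one separator column.
    ints = [int(line[start:start + 5]) for start in range(49, 91, 6)]
    identity = float(line[91:96])     # f5.0
    evalue = float(line[97:107])      # g10.2
    sequence = line[108:]             # aligned residues, gaps included
    return ProfileRecord(index, code, seq_type, *ints,
                         identity, evalue, sequence)
```

Slicing by column rather than splitting on whitespace keeps the parse correct when a sequence code contains spaces or a numeric field fills its whole width.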

Example:


# Example file for: READ_PROFILE, PROF_TO_ALN

# Read in the profile file
READ_PROFILE FILE = 'toxin.prf', PROFILE_FORMAT = 'TEXT'

# Convert the profile to alignment
PROF_TO_ALN

# Select the sequences to write out
SET ALIGN_CODES = '2ctx' '1nbt'

# Write out the alignment
WRITE_ALIGNMENT FILE = 'readprofile.pir', ALIGNMENT_FORMAT = 'PIR'


Ben Webb 2004-10-04