profile.scan() -- Compare a target profile against a database of profiles

profile_list_file = <str:1> '' list of profiles for PROFILE_PROFILE_SCAN

profile_format = <str:1> 'TEXT' 'TEXT' | 'BINARY' ; for READ/WRITE_PROFILE

rr_file = <str:1> '$(LIB)/as1.sim.mat' input residue-residue scoring file

matrix_offset = <float:1> 0.00 substitution matrix offset for local alignment

ccmatrix_offset = <float:1> 100 Offset value for the scoring matrix in PPSCAN

gap_penalties_1d = <float:2> 900 50 gap creation and extension penalties for sequence/sequence alignment

max_aln_evalue = <float:1> 0.1 Max. E-value of alignments to include in BUILD_PROFILE

aln_base_filename = <str:1> 'alignment' basename for construction of alignment filenames used by PROFILE_PROFILE_SCAN

score_statistics = <bool:1> True PROFILE_PROFILE_SCAN: if turned off, the length-normalized z-scores are not computed

output_scores = <bool:1> False whether to output individual scores in a build_profile scan

output_score_file = <str:1> 'default' output file for writing out individual scores in seqfilter

pssm_weights_type = <str:1> 'HH1' type of weighting to calculate pssm; 'HH0' | 'HH1' | 'PSIC'

write_summary = <bool:1> True whether to write summary information for PPSCAN

summary_file = <str:1> 'ppscan.sum' output file for writing PPSCAN summary

output_alignments = <bool:1> True PROFILE_PROFILE_SCAN: if turned off, no alignments will be written out.

This command scans the given target profile against a database of template profiles and reports significant alignments; the target profile should have been read previously with the profile.read() command.

All the profiles listed in profile_list_file should be in a format that is understood by profile.read().

The profile_list_file should contain absolute or relative paths to the individual template profiles, one per line.
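A list file of this kind can be generated with plain Python. The following is only a sketch: write_profile_list is a hypothetical helper (it is not part of MODELLER), and the '*.prf' naming pattern is an assumption about how the template profiles are stored.

```python
import glob
import os


def write_profile_list(profile_dir, list_file, pattern="*.prf"):
    """Write one profile path per line, as profile_list_file expects.

    Hypothetical helper, not a MODELLER call. The '*.prf' pattern is an
    assumption about how the template profiles are named.
    """
    # Sort so that the order of the templates is reproducible.
    paths = sorted(glob.glob(os.path.join(profile_dir, pattern)))
    with open(list_file, "w") as fh:
        for path in paths:
            # Absolute paths keep the list valid from any working directory.
            fh.write(os.path.abspath(path) + "\n")
    return paths
```

The resulting file can then be passed as profile_list_file. Relative paths would work equally well, provided the scan is run from the directory they are relative to.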

The template profiles can also be assembled into a PSSM database, which can then be read in for scanning. The PSSM database can be created using the make_pssmdb() command.

For efficiency, it is recommended to read in the template profiles as a database (see the example below). The entire PSSM database, consisting of tens of thousands of PSSMs, can be read into the memory of an average PC.

See documentation under profile.read() for help on profile_format.

rr_file is the residue-residue substitution matrix to use when calculating the position-specific scoring matrix (PSSM). The current implementation is optimized only for the BLOSUM62 matrix.

gap_penalties_1d are the gap creation and extension penalties to use for the dynamic programming. matrix_offset is the value used to offset the substitution matrix (used in the PSSM calculation). ccmatrix_offset is used to offset the scoring matrix during dynamic programming. The optimal values for these parameters are:

matrix_offset = -450 (for the BLOSUM62 matrix)
ccmatrix_offset = -100
gap_penalties_1d = (-700, -70)

max_aln_evalue sets the E-value threshold. Alignments with E-values better (lower) than the threshold will be written out.

aln_base_filename sets the base filename for the alignments. The output alignment filenames will be of the form ALN_BASE_FILENAME_XXXX.ali, where XXXX is a 4-digit integer (padded with leading zeros) that is incremented for each alignment. For example: alignment_0001.ali.
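The naming scheme can be mirrored in a script that needs to locate the alignment files afterwards. This is a minimal sketch, not MODELLER code: alignment_filename is a hypothetical helper, and the assumption that numbering starts at 1 is inferred only from the alignment_0001.ali example.

```python
def alignment_filename(base, index):
    """Build a name of the form BASE_XXXX.ali, zero-padded to 4 digits.

    Hypothetical helper; numbering from 1 is an assumption based on the
    'alignment_0001.ali' example.
    """
    return "%s_%04d.ali" % (base, index)


# Name expected for the first alignment written with the default base name.
first = alignment_filename("alignment", 1)
```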

score_statistics is a flag that triggers the calculation of E-values. If set to False, the significance estimates for the alignments will not be calculated. The calculation of alignment significance is similar to that used for profile.build(). Turning the flag off can be useful when profile_list_file contains only a very small number of template profiles, too few to calculate reliable statistics. Also see profile.build().

output_scores is a flag to write out the raw alignment scores, z-scores and E-values for all the comparisons. output_score_file sets the name of the file to which this output is written. The columns in the output file are the following:

Index of the database profile
File name of the database profile
Length of the database profile
Logarithm of the length of the database profile
Alignment score
Length normalized z-score of the alignment
E-Value of the alignment
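The seven columns above can be read back into named fields for further analysis. The sketch below assumes that the columns are separated by whitespace and that file names contain no spaces. Neither is stated here, so check it against an actual score file. ScoreRow and parse_score_line are illustrative names, not part of MODELLER.

```python
from collections import namedtuple

# Field names follow the column order listed above.
ScoreRow = namedtuple(
    "ScoreRow",
    ["index", "filename", "length", "log_length", "score", "zscore", "evalue"],
)


def parse_score_line(line):
    """Split one line of the score file into the seven documented columns.

    Assumes whitespace-separated fields and file names without spaces;
    the real file layout may differ.
    """
    f = line.split()
    if len(f) != 7:
        raise ValueError("expected 7 columns, got %d" % len(f))
    return ScoreRow(int(f[0]), f[1], int(f[2]), float(f[3]),
                    float(f[4]), float(f[5]), float(f[6]))
```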

write_summary is a flag to output a summary of all the significant alignments into the file specified by summary_file. The format of the summary file is the following:

File name of target profile
Length of target profile
Number of the first aligned residue of the target profile
Number of the last aligned residue of the target profile
File name of the database profile
Length of the database profile
Number of the first aligned residue of the database profile
Number of the last aligned residue of the database profile
Number of equivalent positions in the alignment
Alignment score
Sequence identity of the alignment
Length normalized z-score of the alignment
E-Value of the alignment
Alignment file name
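A summary file with these fields can be post-processed to pick out the best hits. The following sketch assumes one alignment per line with the fourteen fields separated by whitespace, in the order listed above; this layout is an assumption and should be checked against a real summary file. The names SUMMARY_FIELDS, parse_summary_line and significant_hits are illustrative and do not exist in MODELLER.

```python
# (name, converter) pairs in the documented field order.
SUMMARY_FIELDS = [
    ("target_file", str), ("target_length", int),
    ("target_first", int), ("target_last", int),
    ("db_file", str), ("db_length", int),
    ("db_first", int), ("db_last", int),
    ("n_equivalent", int), ("score", float),
    ("seq_identity", float), ("zscore", float),
    ("evalue", float), ("alignment_file", str),
]


def parse_summary_line(line):
    """Convert one summary line into a dict keyed by field name.

    Assumes whitespace-separated fields and file names without spaces.
    """
    values = line.split()
    if len(values) != len(SUMMARY_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(SUMMARY_FIELDS), len(values)))
    return {name: conv(v) for (name, conv), v in zip(SUMMARY_FIELDS, values)}


def significant_hits(lines, max_evalue=0.1):
    """Return hits with E-value <= max_evalue, best (lowest) first."""
    hits = [parse_summary_line(l) for l in lines if l.strip()]
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["evalue"])
```

Typical use would be significant_hits(open('ppscan.sum')), which passes the lines of the default summary file. Note that the E-value column is only meaningful when score_statistics is turned on.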

If output_alignments is set to False, alignments will not be written out.

In addition, the following details about every significant alignment are written to the log file (look for lines beginning with '>'):

File name of target profile
File name of the database profile
Length of the database profile
Alignment score
Sequence identity of the alignment
Length normalized z-score of the alignment
E-Value of the alignment

Example: examples/commands/ppscan.py

# Example for: profile.scan()

from modeller import *

env = environ()

# First create a database of PSSMs
env.make_pssmdb(profile_list_file = 'profiles.list',
                matrix_offset     = -450,
                rr_file           = '${LIB}/blosum62.sim.mat',
                pssmdb_name       = 'profiles.pssm',
                profile_format    = 'TEXT',
                pssm_weights_type = 'HH1')

# Read in the target profile
prf = profile(env, file='T3lzt-uniprot90.prf', profile_format='TEXT')

# Read the PSSM database
psm = pssmdb(env, pssmdb_name = 'profiles.pssm', pssmdb_format = 'text')

# Scan against all profiles in the 'profiles.list' file
# The score_statistics flag is set to false since there are not
# enough database profiles to calculate statistics.
prf.scan(profile_list_file = 'profiles.list',
         psm               = psm,
         matrix_offset     = -450,
         ccmatrix_offset   = -100,
         rr_file           = '${LIB}/blosum62.sim.mat',
         gap_penalties_1d  = (-700, -70),
         score_statistics  = False,
         output_alignments = True,
         output_scores     = False,
         output_score_file = 'T3lzt-ppscan.scores',
         profile_format    = 'TEXT',
         max_aln_evalue    = 1,
         aln_base_filename = 'T3lzt-ppscan',
         pssm_weights_type = 'HH1',
         write_summary     = True,
         summary_file      = 'T3lzt-ppscan.sum')


Ben Webb 2007-01-19
