profile.scan() -- Compare a target profile against a database of profiles


profile_list_file = <str:1> '' list of profiles for PROFILE_PROFILE_SCAN

profile_format = <str:1> 'TEXT' 'TEXT' | 'BINARY' ; for READ/WRITE_PROFILE

rr_file = <str:1> '$(LIB)/as1.sim.mat' input residue-residue scoring file

matrix_offset = <float:1> 0.00 substitution matrix offset for local alignment

gap_penalties_1d = <float:2> 900 50 gap creation and extension penalties for sequence/sequence alignment

max_aln_evalue = <float:1> 0.1 Max. E-value of alignments to include in BUILD_PROFILE

aln_base_filename = <str:1> 'alignment' basename for construction of alignment filenames used by PROFILE_PROFILE_SCAN

score_statistics = <bool:1> True PROFILE_PROFILE_SCAN: if turned off, the length-normalized z-scores are not computed

output_scores = <bool:1> False whether to output individual scores in a build_profile scan

output_score_file = <str:1> 'default' output file for writing out individual scores in seqfilter

pssm_weights_type = <str:1> 'HH1' type of weighting to calculate pssm; 'HH0' | 'HH1'

write_summary = <bool:1> True whether to write summary information for PPSCAN

summary_file = <str:1> 'ppscan.sum' output file for writing PPSCAN summary

output_alignments = <bool:1> True PROFILE_PROFILE_SCAN: if turned off, no alignments will be written out.

Description:

This command scans the given target profile against a database of template profiles and reports significant alignments; the target profile should have been read previously with the profile.read() command.

The target_profile_file and all the profiles listed in profile_list_file should be in a format that is understood by profile.read().

The profile_list_file should contain absolute or relative paths to the individual template profiles, one per line.
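A list file of this form can be generated with a few lines of standard-library Python. The following is only a minimal sketch and is not part of MODELLER: the helper name write_profile_list, the directory layout, and the '*.prf' file pattern are assumptions made for illustration.

```python
import glob
import os


def write_profile_list(profile_dir, list_file, pattern="*.prf"):
    """Write the paths of all matching profile files to list_file, one per line.

    Hypothetical helper (not a MODELLER function); the '*.prf' pattern is an
    assumption about how the template profiles are named.
    Returns the list of paths that were written.
    """
    # Sort so that the order of the templates is reproducible between runs
    paths = sorted(glob.glob(os.path.join(profile_dir, pattern)))
    with open(list_file, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

The resulting file name (for instance 'profiles.list') would then be passed as profile_list_file.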

See documentation under profile.read() for help on profile_format.

rr_file is the residue-residue substitution matrix to use when calculating the position-specific scoring matrix (PSSM). The current implementation is optimized only for the BLOSUM62 matrix.

gap_penalties_1d are the gap creation and extension penalties used for the dynamic programming. matrix_offset is the value by which the substitution matrix is offset. The optimal values for these parameters are matrix_offset = -200 and gap_penalties_1d = (-1900, -95).

max_aln_evalue sets the E-value threshold. Alignments with E-values better (lower) than the threshold are written out.

aln_base_filename sets the base filename for the alignments. The output alignment filenames are of the form ALN_BASE_FILENAME_XXXX.ali, where XXXX is a 4-digit integer (padded with leading zeroes) that is incremented for each alignment. For example, alignment_0001.ali.
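The naming scheme can be reproduced in Python, for example to locate the output files after a scan. This is a minimal sketch: the function name alignment_filename is hypothetical (it is not a MODELLER call), and it simply mirrors the documented pattern.

```python
def alignment_filename(aln_base_filename, index):
    """Return the expected alignment filename for a given alignment number.

    Hypothetical helper mirroring the documented pattern
    ALN_BASE_FILENAME_XXXX.ali; not part of MODELLER.
    """
    # %04d pads the integer with leading zeroes to four digits
    return "%s_%04d.ali" % (aln_base_filename, index)


print(alignment_filename("alignment", 1))  # prints alignment_0001.ali
```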

score_statistics is a flag that triggers the calculation of E-values. If set to False, the significance estimates for the alignments are not calculated. Turning it off can be useful when profile_list_file contains only a very small number of template profiles, insufficient to calculate reliable statistics. The calculation of alignment significance is similar to that used for profile.build(); also see profile.build().

output_scores is a flag to write out the raw alignment scores, z-scores and E-values for all the comparisons. output_score_file sets the name of the file to which this output is written.

write_summary is a flag to output a summary of all the significant alignments into the file specified by summary_file.

If output_alignments is set to False, alignments are not written out.

Example: examples/commands/ppscan.py

# Example for: profile.scan()

from modeller import *

env = environ()

# Read in the target profile
prf = profile(env, file='T3lzt-uniprot90.prf', profile_format='TEXT')

# Scan against all profiles in the 'profiles.list' file
prf.scan(profile_list_file = 'profiles.list',
         matrix_offset     = -200,
         rr_file           = '${LIB}/blosum62.sim.mat',
         gap_penalties_1d  = (-1900, -95),
         score_statistics  = False,
         output_alignments = True,
         output_scores     = False,
         output_score_file = 'T3lzt-ppscan.scores',
         profile_format    = 'TEXT',
         max_aln_evalue    = 1,
         aln_base_filename = 'T3lzt-ppscan',
         pssm_weights_type = 'HH1',
         write_summary     = True,
         summary_file      = 'T3lzt-ppscan.sum')


Ben Webb 2005-04-21
