
make_pssmdb() -- Create a database of PSSMs given a list of profiles

profile_list_file = <str:1>   ''                     list of profiles for PROFILE_PROFILE_SCAN
profile_format    = <str:1>   'TEXT'                 'TEXT' | 'BINARY'; for READ/WRITE_PROFILE
rr_file           = <str:1>   '$(LIB)/as1.sim.mat'   input residue-residue scoring file
matrix_offset     = <float:1> 0.00                   substitution matrix offset for local alignment
pssmdb_name       = <str:1>   'profiles.pssm'        filename for PSSM database
pssm_weights_type = <str:1>   'HH1'                  type of weighting to calculate PSSM; 'HH0' | 'HH1' | 'PSIC'

This command reads the profiles listed in profile_list_file, calculates a Position Specific Scoring Matrix (PSSM) for each of them, and writes these PSSMs to a database for use in profile.scan().

The profiles listed in profile_list_file should be in a format that is understood by profile.read(), for instance those created by profile.build() or alignment.to_profile(). See the documentation under profile.read() for help on profile_format.
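
As an illustration of preparing the input for this command, the following sketch collects profile files from a directory and writes their paths to a list file. It assumes that the list file holds one profile filename per line, which is not stated on this page; the helper name write_profile_list and the '*.prf' pattern are hypothetical and are not part of MODELLER.

```python
from pathlib import Path


def write_profile_list(profile_dir, list_file, pattern="*.prf"):
    """Write the paths of matching profile files to list_file.

    Hypothetical helper, not a MODELLER function. Assumes the list
    file format is one profile filename per line.
    """
    # Sort so that the order of the database entries is reproducible
    paths = sorted(str(p) for p in Path(profile_dir).glob(pattern))
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

The resulting file could then be passed as profile_list_file, for example after calling write_profile_list('.', 'profiles.list').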

rr_file is the residue-residue substitution matrix to use when calculating the position-specific scoring matrix (PSSM). The current implementation is optimized only for the BLOSUM62 matrix.
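
To make the role of the substitution matrix concrete, the sketch below builds a PSSM with a generic average-score scheme: the score of residue a at a position is the frequency-weighted mean of the substitution scores between a and the residues observed there. This is an assumption chosen for illustration only; it does not reproduce the 'HH0', 'HH1' or 'PSIC' weighting schemes, and the function name, toy alphabet and toy scores are hypothetical.

```python
def average_score_pssm(frequencies, scores):
    """Return one {residue: score} dict per profile position.

    frequencies: list of {residue: frequency} dicts, one per position.
    scores: {(a, b): substitution score} for every residue pair.
    Generic average-score scheme, not MODELLER's implementation.
    """
    alphabet = sorted({a for a, _ in scores})
    pssm = []
    for column in frequencies:
        row = {}
        for a in alphabet:
            # Expected substitution score of residue a against this column
            row[a] = sum(freq * scores[(a, b)] for b, freq in column.items())
        pssm.append(row)
    return pssm


# Toy two-letter alphabet; the values are illustrative, not BLOSUM62.
TOY_SCORES = {("A", "A"): 4, ("A", "G"): 0, ("G", "A"): 0, ("G", "G"): 6}
TOY_PROFILE = [{"A": 1.0}, {"A": 0.5, "G": 0.5}]

pssm = average_score_pssm(TOY_PROFILE, TOY_SCORES)
```

In this toy case the first position, which contains only A, simply copies the A row of the substitution matrix, while the second position mixes the A and G rows equally.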

matrix_offset is the value by which the scoring matrix is offset during dynamic programming. For the BLOSUM62 matrix, use a value of -450.

pssmdb_name is the name for the output PSSM database.

Example: examples/commands/ppscan.py


# Example for: profile.scan()

from modeller import *

env = environ()

# First create a database of PSSMs
env.make_pssmdb(profile_list_file = 'profiles.list',
                matrix_offset     = -450,
                rr_file           = '${LIB}/blosum62.sim.mat',
                pssmdb_name       = 'profiles.pssm',
                profile_format    = 'TEXT',
                pssm_weights_type = 'HH1')

# Read in the target profile
prf = profile(env, file='T3lzt-uniprot90.prf', profile_format='TEXT')

# Read the PSSM database
psm = pssmdb(env, pssmdb_name = 'profiles.pssm', pssmdb_format = 'text')

# Scan against all profiles in the 'profiles.list' file
# The score_statistics flag is set to false since there are not
# enough database profiles to calculate statistics.
prf.scan(profile_list_file = 'profiles.list',
         psm               = psm,
         matrix_offset     = -450,
         ccmatrix_offset   = -100,
         rr_file           = '${LIB}/blosum62.sim.mat',
         gap_penalties_1d  = (-700, -70),
         score_statistics  = False,
         output_alignments = True,
         output_scores     = False,
         output_score_file = 'T3lzt-ppscan.scores',
         profile_format    = 'TEXT',
         max_aln_evalue    = 1,
         aln_base_filename = 'T3lzt-ppscan',
         pssm_weights_type = 'HH1',
         write_summary     = True,
         summary_file      = 'T3lzt-ppscan.sum')


Ben Webb 2007-01-19