
SEQUENCE_COMPARISON -- compare sequences in alignment

RR_FILE = <string:1> '$(LIB)/as1.sim.mat' input residue-residue scoring file
DIRECTORY = <string:1> '' directory list (e.g., 'dir1:dir2:dir3:./:/')
MATRIX_FILE = <string:1> 'family.mat' output filename for the pairwise distance matrix
VARIABILITY_FILE = <string:1> 'undefined' output filename for the positional variability
OUTPUT_DIRECTORY = <string:1> '' output directory
ALIGN_CODES = <string:0> 'all' codes of proteins in the alignment
MAX_GAPS_MATCH = <integer:1> 1 maximum number of gaps allowed in a compared pair of residues (0, 1, or 2)

Description:
The pairwise similarity of the sequences in the current alignment is evaluated using a user-specified residue-residue scoring file.

The residue-residue scores, including the gap-residue and gap-gap scores, are read from file RR_FILE. The sequence pair score is the average pairwise residue-residue score over all alignment positions that have at most MAX_GAPS_MATCH gaps (1 by default). If the gap-residue and gap-gap scores are not defined in RR_FILE, they are set to the worst and best residue-residue score, respectively. If RR_FILE contains a similarity matrix, it is converted into a distance matrix ($x' = x_{\mathrm{max}} - x$).
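
The scoring rule above can be illustrated with a minimal Python sketch. It is not MODELLER's code: the function names (to_distance, lookup, pair_score) and the dictionary layout of the scores are hypothetical. The sketch also assumes that the default gap scores are assigned after the conversion to distances, so that "worst" means the largest distance and "best" the smallest.

```python
GAP = "-"

def to_distance(sim):
    """Convert similarity scores to distances using x' = x_max - x."""
    x_max = max(sim.values())
    return {pair: x_max - x for pair, x in sim.items()}

def lookup(dist, a, b):
    """Distance for the pair (a, b), with fallbacks for undefined gap scores."""
    if (a, b) in dist:
        return dist[(a, b)]
    if (b, a) in dist:
        return dist[(b, a)]
    # Assumption: defaults are taken from the residue-residue distances only.
    residue_scores = [v for (p, q), v in dist.items() if GAP not in (p, q)]
    if a == GAP and b == GAP:
        return min(residue_scores)  # gap-gap: best (smallest) distance
    if a == GAP or b == GAP:
        return max(residue_scores)  # gap-residue: worst (largest) distance
    raise KeyError((a, b))

def pair_score(seq1, seq2, dist, max_gaps_match=1):
    """Average distance over positions with at most max_gaps_match gaps."""
    total, n = 0.0, 0
    for a, b in zip(seq1, seq2):
        gaps = (a == GAP) + (b == GAP)
        if gaps > max_gaps_match:
            continue  # position has too many gaps to be scored
        total += lookup(dist, a, b)
        n += 1
    return total / n if n else float("nan")
```

With the default of one gap, positions where both sequences have a gap are skipped, while gap-residue positions contribute the worst distance.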

The comparison matrix is written in the PHYLIP format to file MATRIX_FILE.
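
For orientation, a square PHYLIP distance matrix starts with a line giving the number of sequences, followed by one line per sequence holding a name padded to 10 characters and then the distances. The sketch below is a hypothetical helper (format_phylip is not part of MODELLER), and the numeric column width it uses is an assumption; the exact layout written by this command may differ.

```python
def format_phylip(names, matrix):
    """Format a square distance matrix as PHYLIP-style text."""
    lines = [f"{len(names):5d}"]  # first line: number of sequences
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)  # names occupy exactly 10 characters
        lines.append(label + " ".join(f"{d:8.4f}" for d in row))
    return "\n".join(lines) + "\n"
```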

The family variability as a function of alignment position is calculated as the RMS deviation of all residue-residue scores at a given position, counting only those pairs of residues that have at most MAX_GAPS_MATCH gaps (0, 1, or 2). The variability, together with the number of pairwise comparisons contributing to each position, is written to file VARIABILITY_FILE.
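
The positional variability can be sketched in Python as follows. The function name and the score callable are hypothetical, and the sketch assumes that "RMS deviation" means the root-mean-square deviation of the pair scores from their mean at that position; the text does not spell out the reference value, so this is one plausible reading rather than the definitive formula.

```python
import math
from itertools import combinations

def positional_variability(seqs, score, max_gaps_match=1, gap="-"):
    """Return (rms_deviation, n_pairs) for every alignment column."""
    results = []
    for column in zip(*seqs):
        # Keep only residue pairs with at most max_gaps_match gaps.
        values = [score(a, b) for a, b in combinations(column, 2)
                  if (a == gap) + (b == gap) <= max_gaps_match]
        if not values:
            results.append((float("nan"), 0))  # no pair qualifies
            continue
        mean = sum(values) / len(values)
        rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        results.append((rms, len(values)))
    return results
```

The second element of each tuple corresponds to the number of pairwise comparisons reported alongside each variability value.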

Example: See ID_TABLE command.


Ben Webb 2004-10-04