alignment.align() -- align two (blocks of) sequences

Next: alignment.align2d() align Up: The alignment class: comparison Previous: alignment.compare_sequences() compare Contents Index

alignment.align() -- align two (blocks of) sequences

rr_file = <str:1> '$(LIB)/as1.sim.mat' input residue-residue scoring file

gap_penalties_1d = <float:2> 900 50 gap creation and extension penalties for sequence/sequence alignment

align_block = <int:1> 0 the last sequence in the first block of sequences

off_diagonal = <int:1> 100 to speed up the alignment

matrix_offset = <float:1> 0.00 substitution matrix offset for local alignment

overhang = <int:1> 0 un-penalized overhangs in protein comparisons

max_gap_length = <int:1> 999999 maximal length of gap in protein comparisons

local_alignment = <bool:1> False whether to do local as opposed to global alignment

align_what = <str:1> 'BLOCK' what to align in ALIGN; 'BLOCK' | 'ALIGNMENT' | 'LAST' | 'PROFILE'

read_weights = <bool:1> False whether to read the whole NxM weight matrix for ALIGN*

write_weights = <bool:1> False whether to write the whole NxM weight matrix for ALIGN*

input_weights_file = <str:1> '' Exteral weight matrix input to MODELLER (SALIGN/ALIGN)

output_weights_file = <str:1> '' File into which the weight file is wriiten (iff WRITE_WEIGHTS = 'on')

weigh_sequences = <bool:1> False whether or not to weigh sequences in a profile

smooth_prof_weight = <float:1> 10 for smoothing the profile aa frequency with a prior

This command aligns two blocks of sequences.

The two blocks of sequences to be aligned are sequences 1 to align_block and align_block+1 to the last sequence. The sequences within the two blocks should already be aligned; their alignment does not change.

The command can do either the global (similar to [Needleman & Wunsch, 1970]; local_alignment = False) or local dynamic programming alignment (similar to [Smith & Waterman, 1981]; local_alignment = True).

For the global alignment, set overhang length overhang to more than 0 so that the corresponding number of residues at either of the four termini won't be penalized by any gap penalties (this makes it a pseudo local alignment).

To speed up the calculation, set off_diagonal to a number smaller than the shortest sequence length. The alignments matching residues and with $\vert i-j\vert > {\sf off\_diagonal}\index{off\_diagonal@{\sf off\_diagonal}}$ are not considered at all in the search for the best alignment.

The gap initiation and extension penalties are specified by gap_penalties_1d. The default values of -900 -50 for the 'as1.sim.mat' similarity matrix were found to be optimal for pairwise alignments of sequences that share from 30% to 45% sequence identity (RS and AŠ , in preparation).

The residue type - residue type scores are read from file rr_file. The routine automatically determines whether it has to maximize similarity or minimize distance.

matrix_offset applies to local alignment only and influences its length. matrix_offset should be somewhere between the lowest and highest residue-residue scores. A smaller value of this parameter will make the local alignments shorter when distance is minimized, and longer when similarity is maximized. This works as follows: The recursively constructed dynamic programming comparison matrix is reset to 0 at position when the current alignment score becomes larger (distance) or smaller (similarity) than matrix_offset. Note that this is equivalent to the usual shifting of the residue-residue scoring matrix in the sense that there are two combinations of gap_penalties_1d and matrix_offset values that will give exactly the same alignments irrespective of whether the matrix is actually offset (with 0 used to restart local alignments in dynamic programming) or the matrix is not offset but matrix_offset is used as the cutoff for restarting local alignments in dynamic programming. For the same reason, the matrix offset does not have any effect on the global alignments if the gap extension penalty is also shifted for half of the matrix offset.

The position-position score is an average residue-residue score for all possible pairwise comparisons between the two blocks ( $n \times m$ comparisons are done, where and are the number of sequences in the two blocks, respectively). The first exception to this is when align_what is set to 'ALIGNMENT', in which case the two alignments defined by align_block are aligned; i.e., the score is obtained by comparing only equivalent positions between the two alignment blocks (only comparisons are done, where is the number of sequences in each of the two blocks). This option is useful in combination with alignment.compare_with() and alignment.write() for evaluation of various alignment parameters and methods. The second exception is when align_what is set to 'LAST', in which case only the last sequences in the two blocks are used to get the scores. In 'BLOCK', 'ALIGNMENT', and 'LAST' comparisons, penalty for a comparison of a gap with a residue during the calculation of the scoring matrix is obtained from the score file (gap-gap match should have a score of 0.0).

Only the 20 standard residue types, plus Asx (changes to Asn) and Glx (changes to Gln) are recognized. Every other unrecognized residue, except for a gap and a chain break, changes to Gly for comparison purposes.

For the time being, this and the other alignment commands (alignment.malign(), alignment.align2d(), alignment.align3d(), and alignment.malign3d()) remove chain break information from the CALN array, which means that chain breaks are not retained when the alignment is written to a file after executing these commands.

Example: examples/commands/align.py

# Example for: alignment.align()

# This will read two sequences, align them, and write the alignment
# to a file:

from modeller import *
log.verbose()
env = environ()

aln = alignment(env)
aln.append(file='toxin.ali', align_codes=('1fas', '2ctx'))

# The as1.sim.mat similarity matrix is used by default:
aln.align(gap_penalties_1d=(-600, -400))
aln.write(file='toxin-seq.ali')

Next: alignment.align2d() align Up: The alignment class: comparison Previous: alignment.compare_sequences() compare Contents Index

Ben Webb 2007-01-19

rr_file = `<str:1>`	`'$(LIB)/as1.sim.mat'`	input residue-residue scoring file
gap_penalties_1d = `<float:2>`	`900 50`	gap creation and extension penalties for sequence/sequence alignment
align_block = `<int:1>`	`0`	the last sequence in the first block of sequences
off_diagonal = `<int:1>`	`100`	to speed up the alignment
matrix_offset = `<float:1>`	`0.00`	substitution matrix offset for local alignment
overhang = `<int:1>`	`0`	un-penalized overhangs in protein comparisons
max_gap_length = `<int:1>`	`999999`	maximal length of gap in protein comparisons
local_alignment = `<bool:1>`	`False`	whether to do local as opposed to global alignment
align_what = `<str:1>`	`'BLOCK'`	what to align in ALIGN; `'BLOCK'` `\|` `'ALIGNMENT'` `\|` `'LAST'` `\|` `'PROFILE'`
read_weights = `<bool:1>`	`False`	whether to read the whole NxM weight matrix for ALIGN*
write_weights = `<bool:1>`	`False`	whether to write the whole NxM weight matrix for ALIGN*
input_weights_file = `<str:1>`	`''`	Exteral weight matrix input to MODELLER (SALIGN/ALIGN)
output_weights_file = `<str:1>`	`''`	File into which the weight file is wriiten (iff WRITE_WEIGHTS = 'on')
weigh_sequences = `<bool:1>`	`False`	whether or not to weigh sequences in a profile
smooth_prof_weight = `<float:1>`	`10`	for smoothing the profile aa frequency with a prior