io = <io_data> | Options for reading atom files | |
gap_penalties_3d = <float:2> | 0.0 1.75 | gap creation and extension penalties for structure/structure superposition |
fit_atoms = <str:1> | 'CA' | one atom type used for superposition |
fit = <bool:1> | True | whether to align |
output = <str:1> | 'LONG' | 'SHORT' | 'LONG' | 'VERY_LONG' |
align3d_trf = <bool:1> | False | whether to transform the distances before dynamic programming |
align3d_repeat = <bool:1> | False | do several starts to maximize number of equivalent positions |
off_diagonal = <int:1> | 100 | to speed up the alignment |
matrix_offset = <float:1> | 0.00 | substitution matrix offset for local alignment |
overhang = <int:1> | 0 | un-penalized overhangs in protein comparisons |
local_alignment = <bool:1> | False | whether to do local as opposed to global alignment |
The alignment algorithm is as follows. First, structure 2 is least-squares fit on structure 1 using all the equivalent residue positions in the initial alignment that have the specified atom type. Next, the residue-residue distance matrix is obtained by calculating Euclidean distances between all pairs of selected atoms from the two structures. The alignment of the two structures is then obtained by the standard dynamic programming optimization based on the residue-residue distance matrix.
gap_penalties_3d[1] is a gap creation penalty (usually 0), and gap_penalties_3d[2] is a gap extension penalty, say 1.75. This procedure identifies pairs of residues as equivalent when they have their selected atoms at most 2 times gap_penalties_3d[2] angstroms apart in the current orientation (this is so when the gap initiation penalty is 0). The reason is that an equivalence costs the distance between the two residues while an alternative, the gap-residue and residue-gap matches, costs twice the gap extension penalty.
From the dynamic programming run, a new alignment is obtained. Thus, structure 2 can be fitted onto structure 1 again, using this new alignment, and the whole cycle is repeated until there is no change in the number of equivalent positions and until the difference in the rotation matrices for the last two superpositions is very small. At the end, the framework, that is the alignment positions without gaps, is written to the log file.
If fit is False, no alignment is done.
If output contains 'SHORT', only the best alignment and its summary are displayed. If output contains 'LONG', summaries are displayed for all initial alignments in each framework cycle. If output contains 'VERY_LONG', all alignments are displayed.
If align3d_trf is True, the weights in the weight matrix are modified distances [Subbiah et al., 1993].
If align3d_repeat is True, three additional initial alignments are tried and the one resulting in the largest number of equivalent positions is selected.
# Example for: alignment.align3d(), model.superpose() # This will align 3D structures of two proteins: log.verbose() env = environ() # First example: read sequences from a sequence file: aln = alignment(env) aln.append(file='toxin.ali', align_codes=['1fas', '2ctx']) aln.align(gap_penalties_1d=[-600, -400]) aln.align3d(gap_penalties_3d=[0, 4.0]) aln.write(file='toxin-str.ali') # Second example: read sequences from PDB files to eliminate the # need for the toxin.ali sequence file: mdl = model(env) aln = alignment(env) for code in ['1fas', '2ctx']: mdl.read(file=code) aln.append_model(mdl, align_codes=code, atom_files=code) aln.align(gap_penalties_1d=(-600, -400)) aln.align3d(gap_penalties_3d=(0, 2.0)) aln.write(file='toxin-str.ali') # And now superpose the two structures using current alignment to get # various RMS's: mdl = model(env, file='1fas') mdl.pick_atoms(aln, atom_types='CA') mdl2 = model(env, file='2ctx') mdl.superpose(mdl2, aln)