alignment.malign3d() -- align two or more structures

io = <io_data> Options for reading atom files

gap_penalties_3d = <float:2> 0.0 1.75 gap creation and extension penalties for structure/structure superposition

off_diagonal = <int:1> 100 to speed up the alignment

matrix_offset = <float:1> 0.00 substitution matrix offset for local alignment

overhang = <int:1> 0 un-penalized overhangs in protein comparisons

local_alignment = <bool:1> False whether to do local as opposed to global alignment

fit_atoms = <str:1> 'CA' one atom type for superposition

fit = <bool:1> True whether to align

output = <str:1> 'LONG' one of 'SHORT' | 'LONG' | 'VERY_LONG' | 'NO_ALIGNMENT'

write_fit = <bool:1> False whether to write out fitted coordinates to .fit files

edit_file_ext = <str:2> '.pdb' '_fit.pdb' old and new file extensions for filename construction in MALIGN3D

current_directory = <bool:1> True whether to write output .fit files to current directory

write_whole_pdb = <bool:1> True whether to write out all lines in the input PDB file

This command uses the current alignment as the starting point for an iterative least-squares superposition of two or more 3D structures, which yields a new multiple structural alignment. A good initial alignment may be obtained by sequence alignment (alignment.malign()). For superpositions, only one atom per residue is used, as specified by fit_atoms. The resulting alignment can be written to a file with the alignment.write() command. The multiply superposed coordinates remain in memory and can be used with commands such as model.transfer_xyz(), provided that alnsequence.atom_file is not changed in the meantime. It is best to place the structure that overlaps most with all the other structures first in the alignment; this can prevent an error exit caused by too few equivalent positions during framework construction.
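
The advice above about choosing the first structure can be made concrete with a small heuristic. The sketch below is not part of MODELLER: it assumes the aligned sequences are available as equal-length strings with '-' for gaps (as in a PIR alignment file), and it reads "overlaps most" as "shares the most aligned residue positions with the other sequences". The function names overlap_scores and best_first_structure are hypothetical.

```python
from typing import List, Sequence


def overlap_scores(aligned_seqs: Sequence[str], gap: str = "-") -> List[int]:
    """For each sequence, count residue positions it shares with every other sequence."""
    scores = []
    for k, s in enumerate(aligned_seqs):
        total = 0
        for m, t in enumerate(aligned_seqs):
            if m != k:
                # A position overlaps when both sequences have a residue there.
                total += sum(1 for x, y in zip(s, t) if x != gap and y != gap)
        scores.append(total)
    return scores


def best_first_structure(aligned_seqs: Sequence[str]) -> int:
    """Index of the sequence with the largest total overlap (hypothetical heuristic)."""
    scores = overlap_scores(aligned_seqs)
    return max(range(len(aligned_seqs)), key=lambda k: scores[k])


seqs = ["AC-DE", "ACGDE", "--GD-"]
print(overlap_scores(seqs))        # [5, 6, 3]
print(best_first_structure(seqs))  # 1
```

The chosen entry would then be moved to the top of the alignment file before the structures are superposed.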

The alignment algorithm is as follows. There are several cycles, each of which consists of an update of a framework followed by the calculation of a new alignment; the new alignment is based on the superposition of the structures onto the latest framework.

The framework in each cycle is obtained as follows. The initial framework consists of the atoms in structure 1 that correspond to fit_atoms. If the specified atom type is missing from the residues at a given position, the coordinates for this framework position are approximated by the neighboring coordinates. Next, all other structures are fit to this framework. The final framework for the current cycle is then obtained as an average of all the structures, in their fitted orientations, but only for residue positions that are common to all of them, given the current alignment. As a side result, all the structures are now superposed on this framework. Note that the alignment has not been changed yet.

Next, the multiple alignment itself is re-derived in N-1 dynamic programming runs, where N is the number of structures. This is done as follows. First, structure 2 is aligned with structure 1, using the inter-molecular atom-atom distance matrix, for all atoms of the selected type, as the weight matrix for the dynamic programming run. Next, structure 3 is aligned with an average of structures 1 and 2 using the same dynamic programming technique. Structure 4 is then aligned with an average of structures 1-3, and so on. When structure i is aligned, the average of structures 1 to i-1 is calculated for all alignment positions where at least one of structures 1 to i-1 has a residue (unlike the framework, which requires residues from all structures to be present). Note that in this step, residues outside the current framework may become aligned and current framework residues may become unaligned. Thus, after the series of dynamic programming runs, a new multiple alignment is obtained.

This new alignment is then used in the next cycle to obtain the next framework and the next alignment. The cycles are repeated until the number of equivalent positions no longer changes. This procedure is best viewed as a way to determine the framework regions, not the whole alignment. The results from this command are expected to be similar to the output of program MNYFIT [Sutcliffe et al., 1987].
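
The two averaging rules in the algorithm description above (the framework, which needs a residue from every structure, and the running average used when aligning the next structure, which needs a residue from at least one of the earlier structures) can be sketched in plain Python. This is an illustrative sketch, not MODELLER's code: it assumes the structures are already in their fitted orientations, represents each structure as a list of fit-atom coordinates indexed by alignment position with None for a gap, and does not model the neighbor-based approximation for missing atoms. iterate_until_stable is a hypothetical skeleton of the stopping rule, with a made-up max_cycles safety cap.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Coord = Tuple[float, float, float]
# One row per structure, one column per alignment position; None marks a gap.
Aligned = Sequence[Sequence[Optional[Coord]]]
A = TypeVar("A")


def _mean(points: Sequence[Coord]) -> Coord:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def framework(aligned: Aligned) -> List[Optional[Coord]]:
    """Average of all fitted structures at positions where every structure has a residue."""
    return [_mean(col) if all(p is not None for p in col) else None
            for col in zip(*aligned)]


def running_average(aligned: Aligned, i: int) -> List[Optional[Coord]]:
    """Average of rows 0..i-1 (0-based), i.e. what row i is aligned against.

    A position is kept if at least one of those rows has a residue there.
    """
    result: List[Optional[Coord]] = []
    for col in zip(*aligned[:i]):
        present = [p for p in col if p is not None]
        result.append(_mean(present) if present else None)
    return result


def iterate_until_stable(cycle: Callable[[A], Tuple[A, int]], alignment: A,
                         max_cycles: int = 100) -> A:
    """Repeat cycle() until the number of equivalent positions stops changing."""
    previous = None
    for _ in range(max_cycles):
        alignment, n_equiv = cycle(alignment)
        if n_equiv == previous:
            break
        previous = n_equiv
    return alignment


aligned = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), None],
    [(0.0, 2.0, 0.0), None, (2.0, 0.0, 0.0)],
]
print(framework(aligned))           # [(0.0, 1.0, 0.0), None, None]
print(running_average(aligned, 2))  # [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
```

The contrast between the two printed lists mirrors the text: only the first position is a framework position, while the running average covers every position that any earlier structure occupies.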

gap_penalties_3d[0] is the gap creation penalty (usually 0), and gap_penalties_3d[1] is the gap extension penalty (for example, 1.75). As described for the alignment.align3d() command, when the gap creation penalty is 0 this procedure identifies two positions as equivalent if their selected atoms are at most twice gap_penalties_3d[1] angstroms apart in the current superposition.
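
The equivalence rule above can be illustrated with a small dynamic programming sketch. It is a plain Needleman-Wunsch style global alignment written for this page, not MODELLER's implementation: it assumes the two structures are already superposed, uses one fit atom per residue given as an (x, y, z) tuple, charges the atom-atom distance for each aligned pair, and charges the gap extension penalty for each unaligned residue (that is, it assumes a gap creation penalty of 0). Under these assumptions the factor of 2 has a simple reading: aligning a pair at distance d costs d, while leaving both residues unaligned costs two gap penalties, so a minimum-cost alignment never contains a pair more than 2 * gap apart. The function name align_by_distance is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[Optional[int], Optional[int]]


def align_by_distance(a: Sequence[Coord], b: Sequence[Coord],
                      gap: float = 1.75) -> List[Pair]:
    """Minimum-cost global alignment of two superposed structures.

    Aligning a[i] with b[j] costs their distance; each unaligned residue costs `gap`.
    Returns (i, j) pairs, with None on the side that has a gap.
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Back-pointers: "M" aligns a[i-1] with b[j-1], "A" leaves a[i-1] unaligned,
    # "B" leaves b[j-1] unaligned.
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = cost[i - 1][0] + gap, "A"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = cost[0][j - 1] + gap, "B"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j], move[i][j] = min(
                (cost[i - 1][j - 1] + math.dist(a[i - 1], b[j - 1]), "M"),
                (cost[i - 1][j] + gap, "A"),
                (cost[i][j - 1] + gap, "B"),
            )
    # Trace the optimal path back from the bottom-right corner.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            i, j = i - 1, j - 1
            pairs.append((i, j))
        elif step == "A":
            i -= 1
            pairs.append((i, None))
        else:
            j -= 1
            pairs.append((None, j))
    pairs.reverse()
    return pairs


# b has an extra residue (index 1) far from anything in a.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 0.5, 0.0), (20.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
print(align_by_distance(a, b))  # [(0, 0), (None, 1), (1, 2), (2, 3)]

# A single pair 5 A apart is aligned only when 5 <= 2 * gap.
print(align_by_distance([(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], gap=1.75))  # [(None, 0), (0, None)]
print(align_by_distance([(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], gap=3.0))   # [(0, 0)]
```

With a nonzero gap creation penalty the simple 2 * gap threshold no longer holds exactly, which is why the text restricts the rule to a creation penalty of 0.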

The output argument can take the following values:

'SHORT', only the final framework is written to the log file.
'LONG', the framework after the alignment stage in each cycle is written to the log file.
'VERY_LONG', the framework from the framework stage in each cycle is also written to the log.

If write_fit is True, the fitted atom files are written out in their final fitted orientations. To construct the filenames, first the file extension in edit_file_ext[0] is removed (if present), and then the extension in edit_file_ext[1] is added. With the default values, a file named 1abc.pdb is therefore written as 1abc_fit.pdb.

If current_directory is True, the fitted atom files will go to the current directory. Otherwise, the output will be in the directory with the original files.
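
The filename rules for write_fit and current_directory above can be sketched as follows. fit_output_path is a hypothetical helper written for illustration, not a MODELLER function; it assumes the atom file path is already known (MODELLER's own search of io.atom_files_directory is not modeled) and treats edit_file_ext[0] as a suffix that is stripped only when the file name ends with it.

```python
import os
from typing import Tuple


def fit_output_path(atom_file: str,
                    edit_file_ext: Tuple[str, str] = (".pdb", "_fit.pdb"),
                    current_directory: bool = True) -> str:
    """Build the output name for a fitted atom file (hypothetical helper)."""
    directory, name = os.path.split(atom_file)
    old_ext, new_ext = edit_file_ext
    # Remove the old extension only if the name actually ends with it.
    if old_ext and name.endswith(old_ext):
        name = name[: -len(old_ext)]
    name += new_ext
    # A bare name is relative to the current working directory.
    return name if current_directory else os.path.join(directory, name)


print(fit_output_path("../atom_files/1abc.pdb"))                           # 1abc_fit.pdb
print(fit_output_path("../atom_files/1abc.pdb", current_directory=False))  # ../atom_files/1abc_fit.pdb on POSIX
print(fit_output_path("pdb1abc.ent"))                                       # pdb1abc.ent_fit.pdb
```

The last call shows the "if present" clause: a name that does not end in edit_file_ext[0] keeps its full name and only gains the new extension.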

If write_whole_pdb is True, the whole PDB files are written out; otherwise only the parts corresponding to the aligned sequences are output.

If fit is False, the initial alignment is not changed. This is useful when the structures only need to be superimposed according to the initial alignment (use fit = False together with write_fit = True).

Example: examples/commands/malign3d.py

```python
# Example for: alignment.malign3d(), alignment.compare_structures()

# This will read all sequences from a sequence file, multiply align
# their 3D structures, and then also compare them using this alignment.

from modeller import *

env = environ()
env.io.atom_files_directory = '../atom_files'

aln = alignment(env, file='toxin.ali', align_codes='all')
aln.malign(gap_penalties_1d=(-600, -400))
aln.malign3d(gap_penalties_3d=(0, 2.0), write_fit=True, write_whole_pdb=False)
aln.write(file='toxin-str.pap', alignment_format='PAP')

# Make two comparisons: no cutoffs, and 3.5A/60 degree cutoffs for RMS, DRMS,
# and dihedral angle comparisons:
aln.compare_structures(rms_cutoffs=[999]*11)
aln.compare_structures(rms_cutoffs=(3.5, 3.5, 60, 60, 60, 60, 60, 60, 60,
                                    60, 60))
```

Ben Webb 2007-01-19