MALIGN3D -- align two or more structures


ALIGN_CODES = $\langle{\tt string:0}\rangle$ 'all' codes of proteins in the alignment

ATOM_FILES = $\langle{\tt string:0}\rangle$ '' complete or partial atom filenames

ATOM_FILES_DIRECTORY = $\langle{\tt string:1}\rangle$ './' input atom files directory list (e.g., 'dir1:dir2:dir3:./:/')

GAP_PENALTIES_3D = $\langle{\tt real:2}\rangle$ 0.0 1.75 gap creation and extension penalties for structure/structure superposition

OFF_DIAGONAL = $\langle{\tt integer:1}\rangle$ 100 to speed up the alignment

MATRIX_OFFSET = $\langle{\tt real:1}\rangle$ 0.00 substitution matrix offset for local alignment

OVERHANG = $\langle{\tt integer:1}\rangle$ 0 un-penalized overhangs in protein comparisons

LOCAL_ALIGNMENT = $\langle{\tt logical:1}\rangle$ off whether to do local as opposed to global alignment

FIT_ATOMS = $\langle{\tt string:1}\rangle$ 'CA' one atom type for superposition

FIT = $\langle{\tt logical:1}\rangle$ on whether to align

OUTPUT = $\langle{\tt string:1}\rangle$ 'LONG' 'SHORT' | 'LONG' | 'VERY_LONG' | 'NO_ALIGNMENT'

WRITE_FIT = $\langle{\tt logical:1}\rangle$ off whether to write out fitted coordinates to .fit files

CURRENT_DIRECTORY = $\langle{\tt logical:1}\rangle$ on whether to write output .fit files to current directory

WRITE_WHOLE_PDB = $\langle{\tt logical:1}\rangle$ on whether to write out all lines in the input PDB file

STOP_ON_ERROR = $\langle{\tt integer:1}\rangle$ 1 whether to stop on error

Output: MODELLER_STATUS = $\langle{\tt integer:1}\rangle$

Description:

This command uses the current alignment as the starting point for an iterative least-squares superposition of two or more 3D structures. This results in a new multiple structural alignment. If no alignment is in memory, the initial alignment is the 1:1 alignment. A good initial alignment may be obtained by sequence alignment (MALIGN). For superpositions, only one atom per residue is used, as specified by FIT_ATOMS. The resulting alignment can be written to a file with the WRITE_ALIGNMENT command. The multiply superposed coordinates remain in memory and can be used with such commands as TRANSFER_XYZ if ATOM_FILES is not changed in the meantime. It is best to use the structure that overlaps most with all the other structures as the first protein in the alignment. This may prevent an error exit due to too few equivalent positions during framework construction.

The alignment algorithm is as follows. There are several cycles, each of which consists of an update of a framework and a calculation of a new alignment; the new alignment is based on the superposition of the structures onto the latest framework.

The framework in each cycle is obtained as follows. The initial framework consists of the atoms in structure 1 that correspond to FIT_ATOMS. If none of the residues at a given position has an atom of the specified type, the coordinates for this framework position are approximated by the neighboring coordinates. Next, all other structures are fit to this framework. The final framework for the current cycle is then obtained as an average of all the structures, in their fitted orientations, but only for residue positions that are common to all of them, given the current alignment. As a by-product, all the structures are now superposed on this framework. Note that the alignment has not been changed yet.

Next, the multiple alignment itself is re-derived in $N-1$ dynamic programming runs, where $N$ is the number of structures. First, structure 2 is aligned with structure 1, using the inter-molecular atom-atom distance matrix, for all atoms of the selected type, as the weight matrix for the dynamic programming run. Next, structure 3 is aligned with an average of structures 1 and 2 using the same dynamic programming technique. Structure 4 is then aligned with an average of structures 1-3, and so on. Averages of structures $1$-$i$ are calculated for all alignment positions where there is at least one residue in any of the structures $1$-$i$ (this differs from a framework, which requires that residues from all structures be present). Note that in this step, residues outside the current framework may become aligned and current framework residues may become unaligned. Thus, after the series of dynamic programming runs, a new multiple alignment is obtained.

This new alignment is then used in the next cycle to obtain the next framework and the next alignment. The cycles are repeated until there is no change in the number of equivalent positions. This procedure is best viewed as a way to determine the framework regions, not the whole alignment. The results from this command are expected to be similar to the output of the program MNYFIT [Sutcliffe et al., 1987].
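The difference between the framework and the averages used in the dynamic programming runs can be illustrated with a small Python sketch. This is not MODELLER code: the function names (`framework`, `running_average`) and the data layout (one list per alignment position, with `None` marking a gap) are assumptions made for illustration only, and the structures are taken to be already superposed.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]
# One alignment position: one entry per structure, None where there is a gap.
# (Hypothetical layout chosen for this sketch.)
Column = List[Optional[Coord]]


def mean_coord(points: List[Coord]) -> Coord:
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def framework(columns: List[Column]) -> List[Optional[Coord]]:
    """Average only the positions where ALL structures have a residue."""
    result: List[Optional[Coord]] = []
    for col in columns:
        present = [c for c in col if c is not None]
        result.append(mean_coord(present) if len(present) == len(col) else None)
    return result


def running_average(columns: List[Column], i: int) -> List[Optional[Coord]]:
    """Average structures 1..i at positions where AT LEAST ONE has a residue."""
    result: List[Optional[Coord]] = []
    for col in columns:
        present = [c for c in col[:i] if c is not None]
        result.append(mean_coord(present) if present else None)
    return result


if __name__ == "__main__":
    columns = [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],  # all three present
        [(1.0, 1.0, 1.0), None, (3.0, 1.0, 1.0)],             # gap in structure 2
        [None, None, (5.0, 5.0, 5.0)],                        # only structure 3
    ]
    fw = framework(columns)
    print(fw)
    print(running_average(columns, 2))
    # Number of equivalent (framework) positions, the quantity whose
    # stability ends the cycles.
    print(sum(f is not None for f in fw))
```

In this sketch only the first position belongs to the framework, while the average of structures 1 and 2 is also defined at the second position, which is why residues outside the framework can still take part in the next dynamic programming run.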

GAP_PENALTIES_3D[1] is a gap creation penalty (usually 0), and GAP_PENALTIES_3D[2] is a gap extension penalty, say 1.75. This procedure identifies pairs of positions as equivalent when they have their selected atoms at most 2 times GAP_PENALTIES_3D[2] angstroms apart in the current superposition (this is so when the gap initiation penalty is 0), as described for the ALIGN3D command.
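The relationship between the gap extension penalty and the equivalence cutoff can be seen in a minimal dynamic programming sketch in Python. It is a simplified illustration, not MODELLER's implementation: it assumes a gap creation penalty of 0, a purely linear gap extension penalty, no OFF_DIAGONAL band and no OVERHANG handling, and the function name `align_by_distance` is hypothetical. Because leaving two residues unaligned costs two gap residues, aligning them is only cheaper when their distance is at most twice the extension penalty.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def align_by_distance(xs: List[Coord], ys: List[Coord],
                      gap_extension: float = 1.75) -> List[Tuple[int, int]]:
    """Global alignment that minimizes summed inter-atomic distances plus
    a linear gap penalty (gap creation penalty assumed to be 0).
    Returns the aligned (index in xs, index in ys) pairs."""
    n, m = len(xs), len(ys)
    # cost[i][j] is the minimal cost of aligning xs[:i] with ys[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_extension
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_extension
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(xs[i - 1], ys[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + d,          # align the pair
                             cost[i - 1][j] + gap_extension,  # gap in ys
                             cost[i][j - 1] + gap_extension)  # gap in xs
    # Trace back from the corner, collecting aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        d = math.dist(xs[i - 1], ys[j - 1])
        if cost[i][j] == cost[i - 1][j - 1] + d:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap_extension:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Made-up coordinates: the middle atoms are 5.0 angstroms apart.
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    b = [(0.5, 0.0, 0.0), (3.8, 5.0, 0.0), (7.6, 1.0, 0.0)]
    print(align_by_distance(a, b, gap_extension=1.75))  # [(0, 0), (2, 2)]
    print(align_by_distance(a, b, gap_extension=3.0))   # [(0, 0), (1, 1), (2, 2)]
```

With an extension penalty of 1.75 the implied cutoff is 3.5 angstroms, so the middle pair (5.0 angstroms apart) is left unaligned; raising the penalty to 3.0 moves the cutoff to 6.0 angstroms and the pair becomes equivalent.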

Argument OUTPUT can contain the following values:

'SHORT', only the final framework is written to the log file.
'LONG', the framework after the alignment stage in each cycle is written to the log file.
'VERY_LONG', the framework from the framework stage in each cycle is also written to the log.

If WRITE_FIT is on, the fitted atom files are written out in their final fitted orientations. Their filenames are the original filenames with the extension .fit.

If CURRENT_DIRECTORY is on, the output .fit files will go to the current directory. Otherwise, the output will be in the directory with the original files.

If WRITE_WHOLE_PDB is on, the whole PDB files are written out; otherwise only the parts corresponding to the aligned sequences are output.

If FIT is off, the initial alignment is not changed. This is useful when all the structures have to be superimposed with the initial alignment (FIT = off and WRITE_FIT = on).

Example:

# Example for: MALIGN3D, COMPARE

# This will read all sequences from a sequence file, multiply align
# their 3D structures, and then also compare them using this alignment.

READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = 'all'
MALIGN GAP_PENALTIES_1D= -600 -400
MALIGN3D GAP_PENALTIES_3D= 0 2.0, WRITE_FIT = on,  WRITE_WHOLE_PDB = off
WRITE_ALIGNMENT FILE = 'toxin-str.pap', ALIGNMENT_FORMAT = 'PAP'

# Make two comparisons: no cutoffs, and 3.5A/60 degree cutoffs for RMS, DRMS,
# and dihedral angle comparisons:
COMPARE RMS_CUTOFFS = 999 999 999 999 999 999 999 999 999 999 999
COMPARE RMS_CUTOFFS = 3.5 3.5 60 60 60 60 60 60 60 60 60


Ben Webb 2004-04-20
