next up previous contents index
Next: MALIGN align Up: Comparison and searching of Previous: ALIGN align   Contents   Index

ALIGN2D -- align sequences with structures

RR_FILE = <string:1> '$(LIB)/as1.sim.mat' input residue-residue scoring file
DIRECTORY = <string:1> '' directory of RR_FILE
GAP_PENALTIES_1D = <real:2> 900 50 gap creation and extension penalties for sequence/sequence alignment
GAP_PENALTIES_2D = <real:9> 0.35 1.2 0.9 1.2 0.6 8.6 1.2 0 0 gap penalties for sequence/structure alignment: helix, beta, accessibility, straightness, and CA-CA distance factor, dst min, dst power, t, structure_profile ; best U,V=-450,0
ALIGN_BLOCK = <integer:1> 0 the last sequence in the first block of sequences
STOP_ON_ERROR = <integer:1> 1 whether to stop on error
OFF_DIAGONAL = <integer:1> 100 to speed up the alignment
MATRIX_OFFSET = <real:1> 0.00 substitution matrix offset for local alignment
OVERHANG = <integer:1> 0 un-penalized overhangs in protein comparisons
LOCAL_ALIGNMENT = <logical:1> off whether to do local as opposed to global alignment
ALIGN_WHAT = <string:1> 'BLOCK' what to align in ALIGN; 'BLOCK' | 'ALIGNMENT' | 'LAST' | 'PROFILE'
FIT = <logical:1> on whether to align
READ_WEIGHTS = <logical:1> off whether to read the whole NxM weight matrix for ALIGN*
WRITE_WEIGHTS = <logical:1> off whether to write the whole NxM weight matrix for ALIGN*
INPUT_WEIGHTS_FILE = <string:1> ''
OUTPUT_WEIGHTS_FILE = <string:1> ''
WEIGH_SEQUENCES = <logical:1> off whether or not to weigh sequences in a profile
SMOOTH_PROF_WEIGHT = <real:1> 10 for smoothing the profile aa frequency with a prior
READ_PROFILE = <logical:1> off whether to read str profile for ALIGN2D
INPUT_PROFILE_FILE = <string:1> ''
WRITE_PROFILE = <logical:1> off whether to write str profile for ALIGN2D
OUTPUT_PROFILE_FILE = <string:1> ''

Output:
MODELLER_STATUS = <integer:1>

Description:
This command aligns a block of sequences (second block) with a block of structures (first block). It is the same as the ALIGN command except that a variable gap opening penalty is used. This gap penalty depends on the 3D structure of all sequences in block 1. The variable gap penalty can favor gaps in exposed regions, avoid gaps within secondary structure elements, favor gaps in curved parts of the mainchain, and minimize the distance between the two ${C}_\alpha$ positions spanning a gap. The ALIGN2D command is preferred for aligning a sequence with structure(s) in comparative modeling because it tends to place gaps in a better structural context. See Section 5.1.2 for the dynamic programming algorithm that implements the variable gap penalty. GAP_PENALTIES_2D specifies parameters $\omega_H$, $\omega_S$, $\omega_B$, $\omega_C$, $\omega_D$, $d_o$, $\gamma$, $t$ and $\omega_SC$. (Section 5.1.2). The default gap penalties GAP_PENALTIES_1D ($-450, 0$) and GAP_PENALTIES_2D (0.35, 1.2, 0.9, 1.2, 0.6, 8.6, 1.2, 0.0, 0.0) as well as the RR_FILE substitution matrix ('as1.sim.mat') were found to be optimal in pairwise alignments of structures and sequences sharing from 30% to 45% sequence identity (MSM, MAM-R, RS and AŠ, in preparation).

-- move to back

The linear gap penalty function for inserting a gap in block 1 of structures is: $g = f_1(H, S, B, C, SC) u + l v$ where $u$ and $v$ are the usual gap opening and extension penalties, $l$ is gap length, and $f_1$ is a function that is at least 1, but can be larger to make gap opening more difficult in the following circumstances: between two consecutive (i.e., $i, i+1$) helical positions, two consecutive $\beta$-strand positions, two consecutive buried positions, or two consecutive positions where the mainchain is locally straight. This function is $f_1 = 1 + [\omega_H
H_i H_{i+1} + \omega_S S_i S_{i+1} + \omega_B B_i B_{i+1} +
\omega_C C_i C_{i+1} + \omega_SC SC_i SC_{i+1}]$, $H_i$ is the fraction of helical residues at position $i$ in block 1, $S_i$ is the fraction of $\beta$-strand residues at position $i$ in block 1, $B_i$ is the average relative sidechain buriedness of residues at position $i$ in block 1, $C_i$ is the average straightness of residues at position $i$ in block 1, and $SC_i$ is the strucutural conserveredness at position $i$ in block 1. See Section 2.3.18 for the definition of these features. The original straightness is modified here by assigning maximal straightness of 1 to all residues in a helix or a $\beta$-strand. The structural conservedness of the residues in block 1 are imported from an external source "input_profile_file". The structural conservedness at a particular position gives the liklehood of the occurance of a gap when structurally similar regions from all know protein structures are aligned structurally.

The linear gap penalty function for opening a gap in block 2 of sequences is: $g = f_2(H, S, B, C, D, SC) u + l v$ where $f_2$ is a function that is at least 1, but can be larger to make the gap opening in block 2 more difficult in the following circumstances: when the first gap position is aligned with a helical residue, a $\beta$-strand residue, a buried residue, extended mainchain, or when the whole gap in block 2 is spanned by two residues in block 1 that are far apart in space. This function is $f_2 = 1 + [\omega_H H_i + \omega_S S_i + \omega_B B_i +
\omega_C C_i + \omega_D \sqrt{d-d_o} + \omega_SC SC_i]$. $d$ is the distance between the two ${C}_\alpha$ atoms spanning the gap, averaged over all structures in block 1 and $d_o$ is the distance that is small enough to correspond to no increase in the opening gap penalty (e.g., 8.6).

When FIT is off, no alignment is done and the routine returns only the average structural information, which can be written out by the WRITE_ALIGNMENT command.

Example:


# Demonstrating ALIGN2D, aligning with variable gap penalty

SET OUTPUT_CONTROL = 1 1 1 1 1

READ_TOPOLOGY FILE = '$(LIB)/top_heav.lib'

# Read aligned structure(s):
READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = '2ctx'
# READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = '2ctx' '2abx'
SET ADD_SEQUENCE = on, ALIGN_BLOCK = NUMB_OF_SEQUENCES
# Read aligned sequence(s):
READ_ALIGNMENT FILE = 'toxin.ali', ALIGN_CODES = ALIGN_CODES '1nbt'
 
# Structure sensitive variable gap penalty sequence-sequence alignment:
SET OVERHANG = 0
# SET RR_FILE = '$(LIB)/id.sim.mat'
SET GAP_PENALTIES_1D = -450 0
SET GAP_PENALTIES_2D = 0.35 1.2 0.9 1.2 0.6 8.6 1.2 0. 0.
ALIGN2D 
WRITE_ALIGNMENT FILE  = 'align2d.ali', ALIGNMENT_FORMAT = 'PIR',
WRITE_ALIGNMENT FILE  = 'align2d.pap', ALIGNMENT_FORMAT = 'PAP', ;
 ALIGNMENT_FEATURES='INDICES HELIX BETA STRAIGHTNESS ACCESSIBILITY CONSERVATION'
CHECK_ALIGNMENT

# Color the first template structure according to gaps in alignment:
READ_ALIGNMENT FILE = 'align2d.ali', ALIGN_CODES = '2ctx' '1nbt', ;
     ALIGNMENT_FORMAT = 'PIR', ADD_SEQUENCE = off, REMOVE_GAPS = on
READ_MODEL MODEL_SEGMENT = '2ctx', FILE = '2ctx'
COLOR_ALN_MODEL
WRITE_MODEL FILE = '2ctx.aln.pdb'

# Color the first template structure according to secondary structure:
WRITE_DATA OUTPUT = 'SSM BISO_SSM', FILE = '2ctx'
WRITE_MODEL FILE = '2ctx.ssm.pdb'

# Superpose the target structure onto the first template:
READ_MODEL2 FILE = '1nbt.pdb', MODEL2_SEGMENT = '1nbt' '1nbt'
PICK_ATOMS ATOM_TYPES = 'CA'
SUPERPOSE
WRITE_MODEL2 FILE = '1nbt.fit.pdb'


next up previous contents index
Next: MALIGN align Up: Comparison and searching of Previous: ALIGN align   Contents   Index
Ben Webb 2004-10-04