Hi
I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)
I have previosly used align2d (see script align2d_old.py below) to prepare homology models. but the current sequences are very long (a pentamer with 573 aa per chain) and align2d was taking a very long time (hours). so I wanted to use salign.
based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use
Essentially, I have three questions:
- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d).
- what is the purpose of first using salign and then align2d ?
- in the previous scripts of align2d the .pdb file was used as an input. in the new scripts that I used the .pdb file does not seem to be used. so how is the known structure of the template taken in to consideration for the alignment?
the scripts I have previously used and the ones I now use are below.
name of template: Glinker (known sequence and structure (pdb))
name of target: GSAlinker (known sequence )
thanks
Evelyne
*************************************************************
# profile-profile alignment using salign
# based on salign_profile_profile.py
# on https://salilab.org/modeller/9v7/manual/node288.html
from modeller import *
log.level(1, 0, 1, 1, 1)
env = environ()
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')
aln.salign(rr_file='${LIB}/blosum62.sim.mat',
gap_penalties_1d=(-500, 0), output='',
align_block=2, # no. of seqs. in first MSA
align_what='PROFILE',
alignment_type='PAIRWISE',
comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked
# for 'MAT')
similarity_flag=True, # The score matrix is not rescaled
substitution=True, # The BLOSUM62 substitution values are
# multiplied to the corr. coef.
#write_weights=True,
#output_weights_file='test.mtx', # optional, to write weight matrix
smooth_prof_weight=10.0) # For mixing data with priors
#write out aligned profiles (MSA)
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')
# Make a pairwise alignment of two sequences
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR',
align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker'))
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR')
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP')
**************************************************************
**************************************************************
# based on example on salign_align2d.py # on https://salilab.org/modeller/9v7/manual/node288.html # align2d/align using salign # parameters to be input by the user # 1. gap_penalties_1d # 2. gap_penalties_2d # 3. input alignment file
from modeller import * log.verbose() env = environ() env.io.atom_files_directory = ['../atom_files']
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all') aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used output='', max_gap_length=20, gap_function=True, # If False then align2d not done feature_weights=(1., 0., 0., 0., 0., 0.), gap_penalties_1d=(-450, -50), gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0), # d.p. score matrix #write_weights=True, output_weights_file='salign.mtx' similarity_flag=True) # Ensuring that the dynamic programming # matrix is not scaled to a difference matrix aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA') aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')
**************************************************************
**************************************************************
from modeller import *
log.verbose()
# create evironment and new alignment
env = environ()
aln = alignment(env)
# create a model for the template HA1onCtermVP1dc63_Glinker
mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E'))
aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')
# create a model for the target HA1onCtermVP1dc63_GSAlinker
aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')
aln.align2d()
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR')
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP')
**************************************************************
On 10/21/16 8:15 PM, Evelyne Deplazes wrote: > I have previosly used align2d (see script align2d_old.py below) to > prepare homology models. but the current sequences are very long (a > pentamer with 573 aa per chain) and align2d was taking a very long time > (hours). so I wanted to use salign.
You probably want to adjust the value of max_gap_length; see https://salilab.org/modeller/9.17/manual/node309.html 573 residues is not that long and shouldn't take more than a few seconds to align.
You can certainly use SALIGN if you want, but the underlying algorithm is the same.
> - what is the sequence of scripts / commands to use? do I first use > salign to prepare an alignment and then align2d (or salign that emulates > align2d).
You can perform your alignments in any order you like. Often we find that a structural alignment (align3d()) does better if it's given a reasonable alignment (e.g. that from align()) as an initial starting guess.
> - in the previous scripts of align2d the .pdb file was used as an input. > in the new scripts that I used the .pdb file does not seem to be used. > so how is the known structure of the template taken in to consideration > for the alignment?
In every case you give the alignment function an alignment object ('aln' in your scripts), not structures. This contains the sequences you want to align. Functions that also need structural information will automatically read in the PDB file when needed using the header information in the PIR file.
Ben Webb, Modeller Caretaker