Hi
I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)
I have previosly used align2d (see script align2d_old.py below) to prepare homology models. but the current sequences are very long (a pentamer with 573 aa per chain) and align2d was taking a very long time
(hours). so I wanted to use salign.
based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use
Essentially, I have three questions:
- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d).
- what is the purpose of first using salign and then align2d ?
- in the previous scripts of align2d the .pdb file was used as an input. in the new scripts that I used the .pdb file does not seem to be used. so how is the known structure of the template taken in to consideration
for the alignment? Also, if no .pdb file is used then the generated alignment file does not contain information /reference to the structure file. consequently the scritpt model_build.py that I use will complain about not finding the structure information
the scripts I have previously used (align2d_old.py) and the ones I tried to use now are below.
name of template: Glinker (known sequence and structure (pdb))
name of target: GSAlinker (known sequence )
thanks
Evelyne
*************************************************************
# profile-profile alignment using salign
# based on salign_profile_profile.py
# on https://salilab.org/modeller/9v7/manual/node288.html
from modeller import *
log.level(1, 0, 1, 1, 1)
env = environ()
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')
aln.salign(rr_file='${LIB}/blosum62.sim.mat',
gap_penalties_1d=(-500, 0), output='',
align_block=2, # no. of seqs. in first MSA
align_what='PROFILE',
alignment_type='PAIRWISE',
comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked
# for 'MAT')
similarity_flag=True, # The score matrix is not rescaled
substitution=True, # The BLOSUM62 substitution values are
# multiplied to the corr. coef.
#write_weights=True,
#output_weights_file='test.mtx', # optional, to write weight matrix
smooth_prof_weight=10.0) # For mixing data with priors
#write out aligned profiles (MSA)
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')
# Make a pairwise alignment of two sequences
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR',
align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker'))
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR')
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP')
**************************************************************
**************************************************************
# based on example on salign_align2d.py
# on https://salilab.org/modeller/9v7/manual/node288.html
# align2d/align using salign
# parameters to be input by the user
# 1. gap_penalties_1d
# 2. gap_penalties_2d
# 3. input alignment file
from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = ['../atom_files']
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all')
aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used
output='',
max_gap_length=20,
gap_function=True, # If False then align2d not done
feature_weights=(1., 0., 0., 0., 0., 0.),
gap_penalties_1d=(-450, -50),
gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0),
# d.p. score matrix
#write_weights=True, output_weights_file='salign.mtx'
similarity_flag=True) # Ensuring that the dynamic programming
# matrix is not scaled to a difference matrix
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')
aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')
**************************************************************
**************************************************************
from modeller import *
log.verbose()
# create evironment and new alignment
env = environ()
aln = alignment(env)
# create a model for the template HA1onCtermVP1dc63_Glinker
mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E'))
aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')
# create a model for the target HA1onCtermVP1dc63_GSAlinker
aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')
aln.align2d()
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR')
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP')
**************************************************************