usage of and sequence of commands for salign instead of align2d
Hi
I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)
I have previosly used align2d (see script align2d_old.py below) to prepare homology models. but the current sequences are very long (a pentamer with 573 aa per chain) and align2d was taking a very long time (hours). so I wanted to use salign.
based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use
Essentially, I have three questions:
- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d). - what is the purpose of first using salign and then align2d ? - in the previous scripts of align2d the .pdb file was used as an input. in the new scripts that I used the .pdb file does not seem to be used. so how is the known structure of the template taken in to consideration for the alignment? Also, if no .pdb file is used then the generated alignment file does not contain information /reference to the structure file. consequently the scritpt model_build.py that I use will complain about not finding the structure information
the scripts I have previously used (align2d_old.py) and the ones I tried to use now are below. name of template: Glinker (known sequence and structure (pdb)) name of target: GSAlinker (known sequence )
thanks Evelyne
************************************************************* # profile-profile alignment using salign # based on salign_profile_profile.py # on https://salilab.org/modeller/9v7/manual/node288.html from modeller import *
log.level(1, 0, 1, 1, 1) env = environ()
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')
aln.salign(rr_file='${LIB}/blosum62.sim.mat', gap_penalties_1d=(-500, 0), output='', align_block=2, # no. of seqs. in first MSA align_what='PROFILE', alignment_type='PAIRWISE', comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked # for 'MAT') similarity_flag=True, # The score matrix is not rescaled substitution=True, # The BLOSUM62 substitution values are # multiplied to the corr. coef. #write_weights=True, #output_weights_file='test.mtx', # optional, to write weight matrix smooth_prof_weight=10.0) # For mixing data with priors
#write out aligned profiles (MSA) aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')
# Make a pairwise alignment of two sequences aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR', align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker')) aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR') aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP') **************************************************************
************************************************************** # based on example on salign_align2d.py # on https://salilab.org/modeller/9v7/manual/node288.html # align2d/align using salign # parameters to be input by the user # 1. gap_penalties_1d # 2. gap_penalties_2d # 3. input alignment file
from modeller import * log.verbose() env = environ() env.io.atom_files_directory = ['../atom_files']
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all') aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used output='', max_gap_length=20, gap_function=True, # If False then align2d not done feature_weights=(1., 0., 0., 0., 0., 0.), gap_penalties_1d=(-450, -50), gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0), # d.p. score matrix #write_weights=True, output_weights_file='salign.mtx' similarity_flag=True) # Ensuring that the dynamic programming # matrix is not scaled to a difference matrix aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA') aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA') **************************************************************
**************************************************************
from modeller import *
log.verbose()
# create evironment and new alignment env = environ() aln = alignment(env)
# create a model for the template HA1onCtermVP1dc63_Glinker mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E')) aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')
# create a model for the target HA1onCtermVP1dc63_GSAlinker aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')
aln.align2d()
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR') aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP') ************************************************************** ED Evelyne Deplazes
Reply all| Today, 11:15 AM modeller_usage@salilab.org Sent Items Hi
I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)
I have previosly used align2d (see script align2d_old.py below) to prepare homology models. but the current sequences are very long (a pentamer with 573 aa per chain) and align2d was taking a very long time (hours). so I wanted to use salign.
based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use
Essentially, I have three questions:
- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d). - what is the purpose of first using salign and then align2d ? - in the previous scripts of align2d the .pdb file was used as an input. in the new scripts that I used the .pdb file does not seem to be used. so how is the known structure of the template taken in to consideration for the alignment?
the scripts I have previously used and the ones I now use are below. name of template: Glinker (known sequence and structure (pdb)) name of target: GSAlinker (known sequence )
thanks Evelyne
************************************************************* # profile-profile alignment using salign # based on salign_profile_profile.py # on https://salilab.org/modeller/9v7/manual/node288.html from modeller import *
log.level(1, 0, 1, 1, 1) env = environ()
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')
aln.salign(rr_file='${LIB}/blosum62.sim.mat', gap_penalties_1d=(-500, 0), output='', align_block=2, # no. of seqs. in first MSA align_what='PROFILE', alignment_type='PAIRWISE', comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked # for 'MAT') similarity_flag=True, # The score matrix is not rescaled substitution=True, # The BLOSUM62 substitution values are # multiplied to the corr. coef. #write_weights=True, #output_weights_file='test.mtx', # optional, to write weight matrix smooth_prof_weight=10.0) # For mixing data with priors
#write out aligned profiles (MSA) aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')
# Make a pairwise alignment of two sequences aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR', align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker')) aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR') aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP') **************************************************************
************************************************************** # based on example on salign_align2d.py # on https://salilab.org/modeller/9v7/manual/node288.html # align2d/align using salign # parameters to be input by the user # 1. gap_penalties_1d # 2. gap_penalties_2d # 3. input alignment file
from modeller import * log.verbose() env = environ() env.io.atom_files_directory = ['../atom_files']
aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all') aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used output='', max_gap_length=20, gap_function=True, # If False then align2d not done feature_weights=(1., 0., 0., 0., 0., 0.), gap_penalties_1d=(-450, -50), gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0), # d.p. score matrix #write_weights=True, output_weights_file='salign.mtx' similarity_flag=True) # Ensuring that the dynamic programming # matrix is not scaled to a difference matrix aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA') aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA') **************************************************************
************************************************************** #model_build.py from modeller import * from modeller.automodel import *
env = environ() a = automodel(env, alnfile='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', knowns='HA1onCtermVP1dc63_Glinker', sequence='HA1onCtermVP1dc63_GSAlinker', assess_methods=(assess.DOPE, assess.GA341)) a.starting_model = 1 a.ending_model = 5 a.make() **************************************************************
**************************************************************
#align2d_old.py from modeller import *
log.verbose()
# create evironment and new alignment env = environ() aln = alignment(env)
# create a model for the template HA1onCtermVP1dc63_Glinker mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E')) aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')
# create a model for the target HA1onCtermVP1dc63_GSAlinker aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')
aln.align2d()
aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR') aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP') **************************************************************
participants (1)
-
Evelyne Deplazes