[modeller_usage] usage of and sequence of commands for salign instead of align2d

I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)

I have previosly used align2d (see script align2d_old.py below) to prepare homology models. but the current sequences are very long (a pentamer with 573 aa per chain) and align2d was taking a very long time (hours). so I wanted to use salign.

based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use

Essentially, I have three questions:

- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d).

- what is the purpose of first using salign and then align2d ?

- in the previous scripts of align2d the .pdb file was used as an input. in the new scripts that I used the .pdb file does not seem to be used. so how is the known structure of the template taken in to consideration for the alignment? Also, if no .pdb file is used then the generated alignment file does not contain information /reference to the structure file. consequently the scritpt model_build.py that I use will complain about not finding the structure information

the scripts I have previously used (align2d_old.py) and the ones I tried to use now are below.

name of template: Glinker (known sequence and structure (pdb))

name of target: GSAlinker (known sequence )

thanks

Evelyne

*************************************************************

# profile-profile alignment using salign

# based on salign_profile_profile.py

# on https://salilab.org/modeller/9v7/manual/node288.html

from modeller import *

log.level(1, 0, 1, 1, 1)

env = environ()

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')

aln.salign(rr_file='${LIB}/blosum62.sim.mat',

gap_penalties_1d=(-500, 0), output='',

align_block=2, # no. of seqs. in first MSA

align_what='PROFILE',

alignment_type='PAIRWISE',

comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked

# for 'MAT')

similarity_flag=True, # The score matrix is not rescaled

substitution=True, # The BLOSUM62 substitution values are

# multiplied to the corr. coef.

#write_weights=True,

#output_weights_file='test.mtx', # optional, to write weight matrix

smooth_prof_weight=10.0) # For mixing data with priors

#write out aligned profiles (MSA)

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')

# Make a pairwise alignment of two sequences

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR',

align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker'))

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR')

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP')

**************************************************************

# based on example on salign_align2d.py

# on https://salilab.org/modeller/9v7/manual/node288.html

# align2d/align using salign

# parameters to be input by the user

# 1. gap_penalties_1d

# 2. gap_penalties_2d

# 3. input alignment file

from modeller import *

log.verbose()

env = environ()

env.io.atom_files_directory = ['../atom_files']

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all')

aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used

output='',

max_gap_length=20,

gap_function=True, # If False then align2d not done

feature_weights=(1., 0., 0., 0., 0., 0.),

gap_penalties_1d=(-450, -50),

gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0),

# d.p. score matrix

#write_weights=True, output_weights_file='salign.mtx'

similarity_flag=True) # Ensuring that the dynamic programming

# matrix is not scaled to a difference matrix

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')

**************************************************************

from modeller import *

log.verbose()

# create evironment and new alignment

env = environ()

aln = alignment(env)

# create a model for the template HA1onCtermVP1dc63_Glinker

mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E'))

aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')

# create a model for the target HA1onCtermVP1dc63_GSAlinker

aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')

aln.align2d()

aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR')

aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP')

**************************************************************

Evelyne Deplazes

Today, 11:15 AM

modeller_usage@salilab.org

Sent Items

I want to make a homology model using the .pdb structure of a template and a target sequence (that is very similar to the template)

based on what I read in tutorials and forums I am however rather confused about the sequence of scritpts/commands to use

Essentially, I have three questions:

- what is the sequence of scripts / commands to use? do I first use salign to prepare an alignment and then align2d (or salign that emulates align2d).

- what is the purpose of first using salign and then align2d ?

the scripts I have previously used and the ones I now use are below.

name of template: Glinker (known sequence and structure (pdb))

name of target: GSAlinker (known sequence )

thanks

Evelyne

*************************************************************

# profile-profile alignment using salign

# based on salign_profile_profile.py

# on https://salilab.org/modeller/9v7/manual/node288.html

from modeller import *

log.level(1, 0, 1, 1, 1)

env = environ()

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.fasta', alignment_format='FASTA')

aln.salign(rr_file='${LIB}/blosum62.sim.mat',

gap_penalties_1d=(-500, 0), output='',

align_block=2, # no. of seqs. in first MSA

align_what='PROFILE',

alignment_type='PAIRWISE',

comparison_type='PSSM', # or 'MAT' (Caution: Method NOT benchmarked

# for 'MAT')

similarity_flag=True, # The score matrix is not rescaled

substitution=True, # The BLOSUM62 substitution values are

# multiplied to the corr. coef.

#write_weights=True,

#output_weights_file='test.mtx', # optional, to write weight matrix

smooth_prof_weight=10.0) # For mixing data with priors

#write out aligned profiles (MSA)

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR')

# Make a pairwise alignment of two sequences

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker.ali', alignment_format='PIR',

align_codes=('HA1onCtermVP1dc63_Glinker', 'HA1onCtermVP1dc63_GSAlinker'))

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', alignment_format='PIR')

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.pap', alignment_format='PAP')

**************************************************************

# based on example on salign_align2d.py

# on https://salilab.org/modeller/9v7/manual/node288.html

# align2d/align using salign

# parameters to be input by the user

# 1. gap_penalties_1d

# 2. gap_penalties_2d

# 3. input alignment file

from modeller import *

log.verbose()

env = environ()

env.io.atom_files_directory = ['../atom_files']

aln = alignment(env, file='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali', align_codes='all')

aln.salign(rr_file='$(LIB)/as1.sim.mat', # Substitution matrix used

output='',

max_gap_length=20,

gap_function=True, # If False then align2d not done

feature_weights=(1., 0., 0., 0., 0., 0.),

gap_penalties_1d=(-450, -50),

gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0.0, 0.0),

# d.p. score matrix

#write_weights=True, output_weights_file='salign.mtx'

similarity_flag=True) # Ensuring that the dynamic programming

# matrix is not scaled to a difference matrix

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.ali', alignment_format='PIR' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')

aln.write(file='HA1onCtermVP1dc63_Glinker_GSAlinker_align2d.pap', alignment_format='PAP' , alignment_features=' INDICES CONSERVATION ACCURACY HELIX BETA')

**************************************************************

#model_build.py

from modeller import *

from modeller.automodel import *

env = environ()

a = automodel(env, alnfile='HA1onCtermVP1dc63_Glinker_GSAlinker_pair.ali',

knowns='HA1onCtermVP1dc63_Glinker', sequence='HA1onCtermVP1dc63_GSAlinker',

assess_methods=(assess.DOPE, assess.GA341))

a.starting_model = 1

a.ending_model = 5

a.make()

**************************************************************

#align2d_old.py

from modeller import *

log.verbose()

# create evironment and new alignment

env = environ()

aln = alignment(env)

# create a model for the template HA1onCtermVP1dc63_Glinker

mdl = model(env, file='HA1onCtermVP1dc63_Glinker', model_segment=('FIRST:A','LAST:E'))

aln.append_model(mdl, align_codes='HA1onCtermVP1dc63_Glinker', atom_files='HA1onCtermVP1dc63_Glinker.pdb')

# create a model for the target HA1onCtermVP1dc63_GSAlinker

aln.append(file='HA1onCtermVP1dc63_GSAlinker.ali', align_codes='HA1onCtermVP1dc63_GSAlinker')

aln.align2d()

aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.ali', alignment_format='PIR')

aln.write(file='HA1onCtermVP1dc63_Glinker_HA1onCtermVP1dc63_GSAlinker.pap', alignment_format='PAP')

**************************************************************