Modeling with cryo-EM


Step 3: Align sequence with structure(s)

For this step of the tutorial, all input and output files can be found in the step3_align directory of the file archive available on the index page.

In the previous step, 1y7t:A was selected as a reasonable template for modeling. The next step is to align the target (TvLDH) sequence with this structure, because MODELLER derives the restraints for comparative modeling from this alignment. A good way to do this is with the align2d() command in MODELLER.

Although align2d() is based on a dynamic programming algorithm, it differs from standard sequence-sequence alignment methods in that it takes structural information from the template into account when constructing the alignment. It does so through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. As a result, alignment errors are reduced by approximately one third relative to standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases. In the current example, the target-template similarity is so high that almost any alignment method with reasonable parameters will produce the same alignment.
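
The idea of a variable gap penalty can be illustrated with a small, self-contained sketch. This is not the align2d() algorithm itself: it is a plain Needleman-Wunsch global alignment in which the gap penalty depends only on a hypothetical secondary structure string for the template ('H' helix, 'E' strand, '-' loop). The function name align_variable_gap, the score values, and the toy sequences are all assumptions made for illustration. The other terms that align2d() considers (solvent exposure, curvature, spatial distance) are left out.

```python
def align_variable_gap(target, template, template_ss,
                       match=2, mismatch=-1, gap_loop=-2, gap_sse=-6):
    """Toy global alignment with a structure-dependent gap penalty.

    template_ss holds one character per template residue:
    'H' (helix), 'E' (strand) or '-' (loop). All scores are illustrative.
    Returns (aligned_target, aligned_template, score).
    """
    if len(template) != len(template_ss):
        raise ValueError("template and template_ss must have equal length")
    n, m = len(target), len(template)

    def in_sse(j):
        # True if template residue j exists and lies in a helix or strand.
        return 0 <= j < m and template_ss[j] != '-'

    def deletion_cost(j):
        # Template residue j is aligned to a gap in the target.
        return gap_sse if in_sse(j) else gap_loop

    def insertion_cost(j):
        # A target residue is inserted between template residues j-1 and j.
        return gap_sse if in_sse(j - 1) and in_sse(j) else gap_loop

    def substitution(a, b):
        return match if a == b else mismatch

    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + deletion_cost(j - 1)
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + insertion_cost(0)
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(target[i - 1], template[j - 1]),
                score[i][j - 1] + deletion_cost(j - 1),
                score[i - 1][j] + insertion_cost(j),
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_target, out_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + substitution(target[i - 1], template[j - 1])):
            out_target.append(target[i - 1])
            out_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and score[i][j] == score[i][j - 1] + deletion_cost(j - 1):
            out_target.append('-')
            out_template.append(template[j - 1])
            j -= 1
        else:
            out_target.append(target[i - 1])
            out_template.append('-')
            i -= 1
    return ''.join(reversed(out_target)), ''.join(reversed(out_template)), score[n][m]


# The toy target lacks one of the three template C residues. Sequence alone
# cannot say which one; the helix position in the template decides.
print(align_variable_gap("ACCD", "ACCCD", "-HH--"))  # ('ACC-D', 'ACCCD', 6)
print(align_variable_gap("ACCD", "ACCCD", "--HH-"))  # ('A-CCD', 'ACCCD', 6)
```

In both toy cases the substitution scores are identical for every possible gap position, so only the structure-dependent penalty determines where the gap goes: it is pushed out of the helix and into the loop. None of this has to be written by hand in practice, since the script that follows calls align2d() to do the real work.
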

from modeller import *

env = Environ()
env.io.atom_files_directory = ['.', '../atom_files']

aln = Alignment(env)

# Read in the 1y7t template structure and add to alignment
mdl = Model(env, file='1y7t', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1y7tA', atom_files='1y7t')

# Add in the TvLDH sequence
aln.append(file='../step1_search/TvLDH.ali', align_codes='TvLDH')

# Sequence/structure alignment
aln.align2d(max_gap_length=40)

# Write out resulting alignment in both PIR and PAP formats
aln.write(file='TvLDH-1y7tA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1y7tA.pap', alignment_format='PAP')

File: align2d.py

Running this script in the usual fashion produces the file TvLDH-1y7tA.ali, which can be used directly for modeling in the next step of the tutorial. It also writes the alignment in PAP format, shown below, which is easier to inspect visually. Because the target-template similarity is high, the alignment contains only a few gaps. In PAP format, each identical position is marked with a "*" on the _consrvd line.

 _aln.pos         10        20        30        40        50        60
1y7tA     MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGL 
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF 
 _consrvd *     *  ********* *   ** **  * *  * * ** ** **  *    ********* ***

 _aln.p   70        80        90       100       110       120       130
1y7tA     EATDDPKVAFKDADYALLVGAAPRKAGMERRDLLQVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTN 
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTN 
 _consrvd  ** *** **** * * **   * * *  * **   *  **   *  *   **  ***** *** ***

 _aln.pos  140       150       160       170       180       190       200
1y7tA     ALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSTMFPDLFHAEV--DG 
TvLDH     CEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEG 
 _consrvd   **   *  * * **     ** ***    * * *  *       *****   *  **  *     *

 _aln.pos    210       220       230       240       250       260       270
1y7tA     R--PALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPS 
TvLDH     KTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPV 
 _consrvd          *       *      *   *   **  ****   *** *   *  **  *   **  *

 _aln.pos      280       290       300       310       320       330
1y7tA     -QG-EYGIPEGIVYSFPVTA-KDGAYRVVEGLEINEFARKRMEITAQELLDEME-QVKALG-LI 
TvLDH     PEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG 
 _consrvd   *  ***  * * ***    * *   ****   *   *     *   *  * *     *

File: TvLDH-1y7tA.pap
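
To put a number on the similarity seen in the PAP output, the PIR file written by the script can be read back and the fraction of identical aligned positions computed. The following is a minimal sketch in plain Python, not part of MODELLER. The helper names read_pir and percent_identity are hypothetical, the parser assumes the simple layout of a header line (">P1;code"), one description line, and sequence lines ending in "*", and identity is defined here over columns where neither sequence has a gap (other conventions exist, for example dividing by the length of the shorter sequence).

```python
def read_pir(text):
    """Parse PIR-format alignment text into a dict {code: aligned sequence}."""
    entries = {}
    code = None
    lines_in_entry = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('>P1;'):
            code = line[4:]
            entries[code] = ''
            lines_in_entry = 0
        elif code is not None:
            lines_in_entry += 1
            if lines_in_entry == 1:
                continue  # description line (structureX:... or sequence:...)
            entries[code] += line.rstrip('*')
            if line.endswith('*'):
                code = None  # '*' terminates the sequence
    return entries


def percent_identity(seq_a, seq_b):
    """Percentage of identical residues over columns without a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != '-' and b != '-']
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a == b)
    return 100.0 * identical / len(aligned)


# Example use with the file written by align2d.py (run from step3_align):
#   with open('TvLDH-1y7tA.ali') as fh:
#       seqs = read_pir(fh.read())
#   print(percent_identity(seqs['1y7tA'], seqs['TvLDH']))
```

The alignment codes '1y7tA' and 'TvLDH' in the commented usage are the ones set by align_codes in align2d.py.
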
