Modeling with cryo-EM
Step 3: Align sequence with structure(s)
For this step of the tutorial, all input and output files can be found in the step3_align directory of the file archive available on the index page.
In the previous step, 1y7t:A was selected as a reasonable template for modeling. The next step is to align the target (TvLDH) sequence with this structure, because MODELLER relies on this alignment to extract the restraints needed for comparative modeling. A good way to do this is the align2d() command in MODELLER. Although align2d() is based on a dynamic programming algorithm, it differs from standard sequence-sequence alignment methods in that it takes structural information from the template into account when constructing the alignment. It does so through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. As a result, alignment errors are reduced by approximately one third relative to standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases. In the current example, the template-target similarity is so high that almost any alignment method with reasonable parameters will produce the same alignment.
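The idea of a variable gap penalty can be illustrated with a toy dynamic programming alignment. The sketch below is not the align2d() algorithm; it is a minimal global alignment in plain Python in which the cost of a gap depends on the template position, standing in for structural features such as loops versus helices. The function name align_variable_gap, the match/mismatch scores, and the gap_cost list are all assumptions made for illustration.

```python
def align_variable_gap(target, template, gap_cost, match=2.0, mismatch=-1.0):
    """Toy global alignment with a position-dependent gap penalty.

    gap_cost[k] is the (hypothetical) cost of a gap at template residue k.
    Low values mimic loops, high values mimic secondary structure.
    """
    n, m = len(target), len(template)
    if m == 0 or len(gap_cost) != m:
        raise ValueError("gap_cost needs one entry per template residue")

    def pen(j):
        # Penalty is looked up at the last template residue consumed so far.
        return gap_cost[max(j - 1, 0)]

    # score[i][j]: best score for target[:i] aligned with template[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - pen(0)
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - pen(j)
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + sub, "diag"),   # residue vs residue
                (score[i - 1][j] - pen(j), "up"),      # gap in the template
                (score[i][j - 1] - pen(j), "left"),    # gap in the target
            ]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Trace back from the bottom-right corner to recover the alignment.
    aligned_target, aligned_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            aligned_target.append(target[i - 1])
            aligned_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            aligned_target.append(target[i - 1])
            aligned_template.append("-")
            i -= 1
        else:
            aligned_target.append("-")
            aligned_template.append(template[j - 1])
            j -= 1
    return ("".join(reversed(aligned_target)),
            "".join(reversed(aligned_template)),
            score[n][m])


# Made-up example: gaps are cheap only at the fourth template residue.
toy_target, toy_template, toy_score = align_variable_gap(
    "ACCCA", "ACCCCA", gap_cost=[5, 5, 5, 1, 5, 5])
print(toy_target)
print(toy_template)
```

In this toy case the only cheap gap position is the fourth template residue, so the single gap in the target is placed opposite it. With a uniform penalty, a gap opposite any of the four C residues would score the same, which is the kind of ambiguity that structural information helps resolve.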
from modeller import *

env = Environ()
env.io.atom_files_directory = ['.', '../atom_files']
aln = Alignment(env)

# Read in the 1y7t template structure and add to alignment
mdl = Model(env, file='1y7t', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1y7tA', atom_files='1y7t')

# Add in the TvLDH sequence
aln.append(file='../step1_search/TvLDH.ali', align_codes='TvLDH')

# Sequence/structure alignment
aln.align2d(max_gap_length=40)

# Write out resulting alignment in both PIR and PAP formats
aln.write(file='TvLDH-1y7tA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1y7tA.pap', alignment_format='PAP')
File: align2d.py
This script, when run in the usual fashion, produces the file TvLDH-1y7tA.ali, which is directly usable for modeling in the next step of the tutorial. It also writes the alignment in PAP format, shown below, which is easier to inspect visually. Because of the high target-template similarity, the alignment contains only a few gaps. In the PAP format, all identical positions are marked with a "*".
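Before moving on, it can be useful to check that the PIR file contains both entries and that their aligned lengths agree. The following is a minimal sketch in plain Python that does not use MODELLER. The helper name read_pir and the two-entry toy alignment are hypothetical, and the parser assumes the usual PIR layout: a ">P1;code" header, one description line, and a sequence terminated by "*".

```python
def read_pir(text):
    """Parse PIR-formatted text into {align_code: aligned_sequence}."""
    entries = {}
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">P1;"):
            continue  # skip blank lines and comments between entries
        code = line[4:].strip()
        next(lines, "")  # description line, ignored in this sketch
        chunks = []
        for seq_line in lines:
            chunks.append(seq_line.strip())
            if seq_line.rstrip().endswith("*"):
                break  # '*' terminates the sequence
        entries[code] = "".join(chunks).rstrip("*")
    return entries


# Toy alignment with made-up codes and placeholder description lines;
# this is NOT the content of TvLDH-1y7tA.ali.
demo_pir = """\
>P1;tmplA
structureX:tmplA:FIRST:A:LAST:A::::
ACD-EF*
>P1;targ
sequence:targ::::::::
ACDG
EF*
"""

entries = read_pir(demo_pir)
lengths = {code: len(seq) for code, seq in entries.items()}
print(lengths)
if len(set(lengths.values())) != 1:
    raise SystemExit("aligned sequences differ in length")
```

To apply the same check to the real output, read TvLDH-1y7tA.ali from disk and pass its contents to read_pir(); both entries should come back with the same aligned length.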
_aln.pos          10        20        30        40        50        60
1y7tA     MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGL
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF
_consrvd  *     *  ********* *   ** **  * *  * * ** ** **  *    ********* ***

_aln.p    70        80        90       100       110       120       130
1y7tA     EATDDPKVAFKDADYALLVGAAPRKAGMERRDLLQVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTN
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTN
_consrvd   ** *** **** * * **   * * *  * **   *  **   *  *   **  ***** *** ***

_aln.pos   140       150       160       170       180       190       200
1y7tA     ALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSTMFPDLFHAEV--DG
TvLDH     CEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEG
_consrvd    **   *  * * **     ** ***    * * *  *       *****   *  **  *     *

_aln.pos     210       220       230       240       250       260       270
1y7tA     R--PALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPS
TvLDH     KTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPV
_consrvd           *       *      *   *   **  ****   *** *   *  **  *   **  *

_aln.pos       280       290       300       310       320       330
1y7tA     -QG-EYGIPEGIVYSFPVTA-KDGAYRVVEGLEINEFARKRMEITAQELLDEME-QVKALG-LI
TvLDH     PEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG
_consrvd    *  ***  * * ***    * *   ****   *   *     *   *  * *     *
File: TvLDH-1y7tA.pap