Modeling with cryo-EM
Step 2: Select a template
For this step of the tutorial, all input and output files can be found in the step2_select directory of the file archive available on the index page.
In the previous step, seven potential templates were identified. To select the most appropriate template for the query sequence, the Alignment.compare_structures() command can be used to assess the structural and sequence similarity between the possible templates (file "compare.py").
from modeller import *

env = Environ()
env.io.atom_files_directory = ['.', '../atom_files']

# Make a simple 1:1 alignment of 7 template structures
aln = Alignment(env)
for (pdb, chain) in (('1b8p', 'A'), ('1y7t', 'A'), ('1civ', 'A'), ('5mdh', 'A'),
                     ('7mdh', 'A'), ('3d5t', 'A'), ('1smk', 'A')):
    m = Model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)

# Sequence alignment
aln.malign()
# Structure alignment
aln.malign3d()

# Report details of the sequence/structure alignment
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
File: compare.py
In this script, an empty alignment object, 'aln', is created, and a Python 'for' loop then instructs MODELLER to read each of the PDB files in turn. (These PDB files must first be downloaded onto the machine, either from the PDB website itself or from the archive linked on the index page. If the script is run from within the step2_select directory, MODELLER will automatically look for PDB files in the archive's atom_files directory, because '../atom_files' is listed in atom_files_directory.) The model_segment argument restricts reading to a single chain of each PDB file (by default, all chains are read). As each structure is read in, the append_model method adds it to the alignment.
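To make the effect of model_segment more concrete, the sketch below keeps only the coordinate records of one chain from PDB-format text. It is not how MODELLER implements model_segment (which selects a residue range, here from the first to the last residue of the chain); it only assumes the standard fixed-column PDB format, in which the chain identifier sits in column 22. The function name select_chain is hypothetical.

```python
def select_chain(pdb_lines, chain_id):
    """Hypothetical helper: return the ATOM/HETATM records of one chain.

    In the fixed-column PDB format the chain identifier is column 22,
    i.e. index 21 when counting from zero. Other record types (TER, END,
    HEADER, ...) are skipped. This is a simplified illustration, not
    MODELLER's own chain selection.
    """
    selected = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM") and len(line) > 21 and line[21] == chain_id:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Two made-up atom records, one in chain A and one in chain B
    demo = [
        "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
        "ATOM      2  N   MET B   1      12.104   7.134  -5.504  1.00  0.00           N",
    ]
    for rec in select_chain(demo, "A"):
        print(rec)
```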
At the end of the loop, all of the structures are in the alignment, but they are not ideally aligned to each other (append_model creates a simple 1:1 alignment with no gaps). Therefore, the alignment is improved by using malign to calculate a multiple sequence alignment. The malign3d command then performs an iterative least-squares superposition of the seven 3D structures, using the multiple sequence alignment as its starting point. The compare_structures command compares the structures according to the alignment constructed by malign3d. It does not make an alignment, but it calculates the RMS and DRMS deviations between atomic positions and distances, differences between the mainchain and sidechain dihedral angles, percentage sequence identities, and several other measures. Finally, the id_table command writes a file with pairwise sequence distances that can be used directly as the input to the dendrogram command (or the clustering programs in the PHYLIP package). dendrogram calculates a clustering tree from the input matrix of pairwise distances, which helps in visualizing the differences among the template candidates. This script can be run in the usual fashion; excerpts from the log file are shown below (file "compare.log").
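The two structural measures named above can be illustrated with their generic textbook definitions. The sketch below, with hypothetical function names, computes a plain RMS deviation for coordinates that are assumed to be already superposed (the job malign3d does) and a distance RMS deviation (DRMS) over all atom pairs. MODELLER's compare_structures may select particular atoms, apply cut-offs or report the statistics differently, so this sketch is not expected to reproduce its exact numbers.

```python
import itertools
import math


def rms_deviation(coords_a, coords_b):
    """RMS deviation between corresponding atoms of two structures.

    Assumes the coordinates are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def distance_rms_deviation(coords_a, coords_b):
    """Distance RMS (DRMS): RMS difference between matching internal distances.

    Compares the distance d_ij within structure A with d_ij within structure B
    for every atom pair i < j, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    diffs = [
        math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])
        for i, j in itertools.combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y, z) for x, y, z in a]
    print(rms_deviation(a, shifted))           # 5.0, because nothing was superposed
    print(distance_rms_deviation(a, shifted))  # 0.0, internal distances are unchanged
```

Because DRMS compares internal distances, rigid-body motion does not affect it: translating a copy of a structure changes its positional RMS deviation but leaves its DRMS at zero, which is why the positional RMS is only meaningful after a superposition such as the one malign3d performs.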
Sequence identity comparison (ID_TABLE):

   Diagonal       ... number of residues;
   Upper triangle ... number of identical residues;
   Lower triangle ... % sequence identity, id/min(length).

        1b8pA @11y7tA @11civA @25mdhA @27mdhA @23d5tA @21smkA @2
1b8pA @1     327     201     146     152     152     249      50
1y7tA @1      61     327     158     170     160     210      58
1civA @2      45      48     374     140     304     148      55
5mdhA @2      46      52      42     333     140     164      58
7mdhA @2      46      49      87      42     351     148      50
3d5tA @2      78      65      46      51      46     321      50
1smkA @2      16      19      18      19      16      16     313

Weighted pair-group average clustering based on a distance matrix:

                                                  .---------- 1b8pA @1.9    22.0000
                                                  |
                                      .---------------------- 3d5tA @2.5    37.0000
                                      |
                            .-------------------------------- 1y7tA @1.6    49.7500
                            |
                       .------------------------------------- 5mdhA @2.4    55.4375
                       |
                       |                                 .--- 1civA @2.8    13.0000
                       |                                 |
  .---------------------------------------------------------- 7mdhA @2.4    82.3750
  |
.------------------------------------------------------------ 1smkA @2.5

+----+----+----+----+----+----+----+----+----+----+----+----+
85.1500   72.6625   60.1750   47.6875   35.2000   22.7125   10.2250
     78.9062   66.4187   53.9313   41.4437   28.9562   16.4688
Excerpts of the file compare.log
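The dendrogram in this log can be reproduced from the lower triangle of the identity table. The sketch below assumes that the distances written to family.mat are 100 minus the integer percent identities; that assumption is inferred from the log (for example, 1civA and 7mdhA, at 87% identity, join at 13.0) rather than taken from MODELLER's documentation. It then applies weighted pair-group average clustering (WPGMA), in which the distance from a newly merged cluster to any other cluster is the unweighted mean of the two previous distances. Under these assumptions the merge heights come out as 13.0, 22.0, 37.0, 49.75, 55.4375 and 82.375, the same values printed beside the tree. The function names are hypothetical.

```python
# Percent sequence identities copied from the lower triangle of the
# ID_TABLE excerpt in compare.log (row i lists identities with codes 0..i-1).
CODES = ["1b8pA", "1y7tA", "1civA", "5mdhA", "7mdhA", "3d5tA", "1smkA"]
LOWER = [
    [],
    [61],
    [45, 48],
    [46, 52, 42],
    [46, 49, 87, 42],
    [78, 65, 46, 51, 46],
    [16, 19, 18, 19, 16, 16],
]


def identity_distances(codes, lower):
    """Symmetric distances keyed by unordered pair; distance = 100 - % identity (assumed)."""
    dist = {}
    for i, row in enumerate(lower):
        for j, pct in enumerate(row):
            dist[frozenset((codes[i], codes[j]))] = 100.0 - pct
    return dist


def wpgma(codes, dist):
    """Weighted pair-group average (WPGMA) clustering.

    Repeatedly merges the closest pair of clusters. The distance from the
    merged cluster to every other cluster is the plain average of the two
    old distances (no weighting by cluster size, unlike UPGMA).
    Returns a list of (cluster_a, cluster_b, height) merge events.
    """
    clusters = list(codes)
    dist = dict(dist)
    merges = []
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: dist[frozenset(pair)],
        )
        height = dist[frozenset((a, b))]
        merged = f"({a},{b})"
        for other in clusters:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    dist[frozenset((a, other))] + dist[frozenset((b, other))]
                ) / 2.0
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, height))
    return merges


if __name__ == "__main__":
    for a, b, h in wpgma(CODES, identity_distances(CODES, LOWER)):
        print(f"{h:8.4f}  {a} + {b}")
```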
The comparison above shows that 1civ:A and 7mdh:A are almost identical in both sequence (87% identity) and structure. Because 7mdh:A has the better crystallographic resolution (2.4 Å versus 2.8 Å), 1civ:A is eliminated. 1smk:A is the most divergent structure in the whole set of candidate templates, but it also has the lowest sequence identity (34%) to the query sequence, so it is not chosen either. The remaining group of structures (5mdh:A, 1y7t:A, 3d5t:A and 1b8p:A) are similar to one another. From this group, 1y7t:A is selected because it has the best crystallographic resolution (1.6 Å) and a higher overall sequence identity to the query sequence (45%).
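The first elimination step above, dropping the lower-resolution member of a near-identical pair, can be expressed as a small filter over the values in compare.log. Percent identity follows the id/min(length) definition given in the ID_TABLE header. The 80% cut-off and the function names are hypothetical choices for this sketch, not values from the tutorial; with a cut-off of 77% or lower, the 1b8pA/3d5tA pair (77.6%) would also be flagged, so the threshold is a judgment call. The other criteria used above (the divergence of 1smk:A and the identity to the query) are not modelled.

```python
# Values transcribed from the compare.log excerpt: chain lengths (diagonal),
# identical-residue counts (upper triangle) and resolutions in angstroms
# (the "@" suffixes of the dendrogram labels).
CODES = ["1b8pA", "1y7tA", "1civA", "5mdhA", "7mdhA", "3d5tA", "1smkA"]
LENGTHS = {"1b8pA": 327, "1y7tA": 327, "1civA": 374, "5mdhA": 333,
           "7mdhA": 351, "3d5tA": 321, "1smkA": 313}
RESOLUTION = {"1b8pA": 1.9, "1y7tA": 1.6, "1civA": 2.8, "5mdhA": 2.4,
              "7mdhA": 2.4, "3d5tA": 2.5, "1smkA": 2.5}
# UPPER[i][k] = number of identical residues between CODES[i] and CODES[i + 1 + k]
UPPER = [
    [201, 146, 152, 152, 249, 50],
    [158, 170, 160, 210, 58],
    [140, 304, 148, 55],
    [140, 164, 58],
    [148, 50],
    [50],
    [],
]


def percent_identity(n_identical, len_a, len_b):
    """Percent sequence identity as defined in the ID_TABLE header: id/min(length)."""
    return 100.0 * n_identical / min(len_a, len_b)


def redundant_templates(codes, lengths, upper, resolution, threshold=80.0):
    """Hypothetical filter: for every pair at or above `threshold` percent
    identity, flag the member with the worse (numerically larger) resolution."""
    dropped = set()
    for i, row in enumerate(upper):
        for k, n_identical in enumerate(row):
            a, b = codes[i], codes[i + 1 + k]
            if percent_identity(n_identical, lengths[a], lengths[b]) >= threshold:
                dropped.add(a if resolution[a] > resolution[b] else b)
    return dropped


if __name__ == "__main__":
    print(sorted(redundant_templates(CODES, LENGTHS, UPPER, RESOLUTION)))
```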