Modeling with cryo-EM


Step 2: Select a template

For this step of the tutorial, all input and output files can be found in the step2_select directory of the file archive available on the index page.

In the previous step, seven potential templates were identified. To select the most appropriate template for the query sequence, the Alignment.compare_structures() command can be used to assess the structural and sequence similarity between the possible templates (file "compare.py").

from modeller import *

env = Environ()
env.io.atom_files_directory = ['.', '../atom_files']

# Make a simple 1:1 alignment of 7 template structures
aln = Alignment(env)
for (pdb, chain) in (('1b8p', 'A'), ('1y7t', 'A'), ('1civ', 'A'),
                     ('5mdh', 'A'), ('7mdh', 'A'), ('3d5t', 'A'),
                     ('1smk', 'A')):
    m = Model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)

# Sequence alignment
aln.malign()

# Structure alignment
aln.malign3d()

# Report details of the sequence/structure alignment
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)

File: compare.py

In this case, an empty alignment object 'aln' is created, and a Python 'for' loop then instructs MODELLER to read each of the PDB files in turn. (These PDB files must first be downloaded onto the machine, either from the PDB website itself or from the archive linked on the index page. If the script is run from within the step2_select directory, it will automatically look for PDB files in the atom_files directory of this archive, because of the atom_files_directory setting.) The model_segment argument asks for only a single chain to be read from each PDB file (by default, all chains are read). As each structure is read in, the append_model method adds it to the alignment.
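Before running compare.py, it can be useful to confirm that a coordinate file for every template is actually present in one of the search directories. The following is a minimal sketch in plain Python, not part of MODELLER: the helper name find_missing is hypothetical, and the rule "any file whose name contains the PDB code" is a deliberately loose assumption that is simpler than MODELLER's own file lookup.

```python
from pathlib import Path

# PDB codes of the seven candidate templates used in compare.py.
CODES = ['1b8p', '1y7t', '1civ', '5mdh', '7mdh', '3d5t', '1smk']


def find_missing(codes, directories):
    """Return the codes for which no file name in any directory contains the code.

    Assumption: a loose, case-insensitive substring match on file names.
    MODELLER's real lookup rules are more specific than this.
    """
    missing = []
    for code in codes:
        found = any(
            code.lower() in entry.name.lower()
            for d in directories
            if Path(d).is_dir()          # silently skip directories that do not exist
            for entry in Path(d).iterdir()
            if entry.is_file()
        )
        if not found:
            missing.append(code)
    return missing


# Same search path as env.io.atom_files_directory in compare.py.
print("Missing templates:", find_missing(CODES, ['.', '../atom_files']))
```

An empty list means that every code matched at least one file name; it does not guarantee that MODELLER will accept the file.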

At the end of the loop, all of the structures are in the alignment, but they are not ideally aligned to each other (append_model creates a simple 1:1 alignment with no gaps). Therefore, the alignment is improved by using malign to calculate a multiple sequence alignment. The malign3d command then performs an iterative least-squares superposition of the seven 3D structures, using the multiple sequence alignment as its starting point. The compare_structures command compares the structures according to the alignment constructed by malign3d. It does not itself create an alignment; instead, it calculates the RMS deviation between atomic positions, the DRMS deviation between interatomic distances, the differences between mainchain and sidechain dihedral angles, percentage sequence identities, and several other measures. Finally, the id_table command writes a file (here 'family.mat') with pairwise sequence distances that can be used directly as input to the dendrogram command (or to the clustering programs in the PHYLIP package). The dendrogram command calculates a clustering tree from this matrix of pairwise distances, which helps in visualizing the differences among the template candidates. This script can be run in the usual fashion; excerpts from the log file are shown below (file "compare.log").

Sequence identity comparison (ID_TABLE):

   Diagonal       ... number of residues;
   Upper triangle ... number of identical residues;
   Lower triangle ... % sequence identity, id/min(length).

         1b8pA @11y7tA @11civA @25mdhA @27mdhA @23d5tA @21smkA @2
1b8pA @1      327     201     146     152     152     249      50
1y7tA @1       61     327     158     170     160     210      58
1civA @2       45      48     374     140     304     148      55
5mdhA @2       46      52      42     333     140     164      58
7mdhA @2       46      49      87      42     351     148      50
3d5tA @2       78      65      46      51      46     321      50
1smkA @2       16      19      18      19      16      16     313


Weighted pair-group average clustering based on a distance matrix:


                                                        .---------- 1b8pA @1.9    22.0000
                                                        |
                                            .---------------------- 3d5tA @2.5    37.0000
                                            |
                                  .-------------------------------- 1y7tA @1.6    49.7500
                                  |
                             .------------------------------------- 5mdhA @2.4    55.4375
                             |
                             |                                 .--- 1civA @2.8    13.0000
                             |                                 |
        .---------------------------------------------------------- 7mdhA @2.4    82.3750
        |
      .------------------------------------------------------------ 1smkA @2.5

      +----+----+----+----+----+----+----+----+----+----+----+----+
    85.1500   72.6625   60.1750   47.6875   35.2000   22.7125   10.2250
         78.9062   66.4187   53.9313   41.4437   28.9562   16.4688

Excerpts of the file compare.log
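The node heights in the tree can be checked by hand. The sketch below applies weighted pair-group average clustering in plain Python to the lower triangle of the ID_TABLE excerpt. It assumes that the distance between two templates is 100 minus their percentage sequence identity; this is consistent with the values in the log but is not stated there. The names wpgma, names and lower are illustrative and are not part of MODELLER.

```python
from itertools import combinations

names = ["1b8pA", "1y7tA", "1civA", "5mdhA", "7mdhA", "3d5tA", "1smkA"]

# Lower triangle of the ID_TABLE excerpt (% sequence identity);
# row i lists the identities of names[i] to names[0..i-1].
lower = [
    [],
    [61],
    [45, 48],
    [46, 52, 42],
    [46, 49, 87, 42],
    [78, 65, 46, 51, 46],
    [16, 19, 18, 19, 16, 16],
]


def wpgma(names, lower):
    """Return a list of (merged_label, height) pairs, in merge order."""
    # Assumption: distance = 100 - % sequence identity.
    dist = {}
    for i, row in enumerate(lower):
        for j, pct in enumerate(row):
            dist[frozenset((names[i], names[j]))] = 100.0 - pct

    clusters = list(names)
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))]
        merged = "(" + a + "," + b + ")"
        # Weighted average: both children count equally, whatever their size.
        for c in clusters:
            if c not in (a, b):
                dist[frozenset((merged, c))] = 0.5 * (
                    dist[frozenset((a, c))] + dist[frozenset((b, c))]
                )
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((merged, height))
    return merges


merges = wpgma(names, lower)
for label, height in merges:
    print(f"{height:8.4f}  {label}")
```

Under these assumptions, the merge heights are 13, 22, 37, 49.75, 55.4375 and 82.375, which match the values printed beside the tree in the log excerpt.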

The comparison above shows that 1civ:A and 7mdh:A are almost identical, both in sequence and in structure. Because 7mdh:A has the better crystallographic resolution (2.4Å versus 2.8Å), 1civ:A is eliminated. 1smk:A is the most divergent structure in the whole set of possible templates, and it also has the lowest sequence identity (34%) to the query sequence. The remaining structures (5mdh:A, 1y7t:A, 3d5t:A and 1b8p:A) cluster together and share some similarities. From this group, 1y7t:A is selected because of its better crystallographic resolution (1.6Å) and its higher overall sequence identity to the query sequence (45%).
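Part of this selection reasoning can be expressed as a short script. The sketch below is illustrative only: it uses the resolutions and pairwise identities from the log excerpt, a hypothetical near-duplicate threshold of 85% identity, and resolution as the sole ranking criterion. It does not use the sequence identity to the query, since this section gives that value for only two of the templates. The function select_template is not part of MODELLER.

```python
names = ["1b8pA", "1y7tA", "1civA", "5mdhA", "7mdhA", "3d5tA", "1smkA"]

# Crystallographic resolution in Å, as printed in the dendrogram labels.
resolution = {
    "1b8pA": 1.9, "1y7tA": 1.6, "1civA": 2.8, "5mdhA": 2.4,
    "7mdhA": 2.4, "3d5tA": 2.5, "1smkA": 2.5,
}

# Lower triangle of the ID_TABLE excerpt (% sequence identity);
# row i lists the identities of names[i] to names[0..i-1].
lower = [
    [],
    [61],
    [45, 48],
    [46, 52, 42],
    [46, 49, 87, 42],
    [78, 65, 46, 51, 46],
    [16, 19, 18, 19, 16, 16],
]


def select_template(names, lower, resolution, near_duplicate_pct=85):
    """Drop the worse-resolved member of each near-duplicate pair,
    then return the best-resolved remaining template.

    near_duplicate_pct is a hypothetical threshold chosen for illustration.
    """
    dropped = set()
    for i, row in enumerate(lower):
        for j, pct in enumerate(row):
            if pct >= near_duplicate_pct:
                a, b = names[i], names[j]
                # A larger value in Å means a worse resolution.
                dropped.add(a if resolution[a] > resolution[b] else b)
    candidates = [n for n in names if n not in dropped]
    best = min(candidates, key=lambda n: resolution[n])
    return best, sorted(dropped)


best, dropped = select_template(names, lower, resolution)
print("dropped as near-duplicates:", dropped)
print("best-resolved candidate:", best)
```

With the values from the log, this drops 1civA in favour of 7mdhA and ranks 1y7tA first, in agreement with the choice made in the text.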
