Modeling with cryo-EM


Step 7: Refine models with loop modeling

For this step of the tutorial, all input and output files can be found in the step7_loop directory of the file archive available on the index page.

In comparative modeling using templates of reasonable sequence identity (>30%) a major source of errors is the conformation of loops. These often correspond to insertions (regions of sequence in the target model that are aligned with gaps in the template). MODELLER contains a loop refinement algorithm that, rather than using the homology information from the template, uses a atomistic statistical potential derived from known protein structures (see the MODELLER manual for more information). This can be used to generate alternative conformations for a chosen loop region.

In the previous step, it could be seen that part of the structure (roughly residues 93 through 100) did not fit into the density map. Thus, these residues are candidates for loop refinement. The script loop.py below, given the fitted PDB file from the previous step, generates 5 alternative conformations for these residues.

from modeller import *
from modeller.automodel import *

# Override the regular LoopModel class to select our own loops
class MyLoop(LoopModel):
    # This routine picks the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # One loop from residues 93 to 100 inclusive
        return Selection(self.residue_range('93:', '100:'))

env = Environ()

# Build 5 loop models from TvLD_1_1.pdb, and assess each one with DOPE
a = MyLoop(env, inimodel='../step6_fit/TvLD_1_1.pdb',
           sequence='TvLDH', loop_assess_methods=assess.DOPE)
a.loop.starting_model = 1
a.loop.ending_model = 5
a.make()

File: loop.py

The script outputs the five conformations, named TvLDH.BL00010001.pdb through TvLDH.BL00050001.pdb. Only the selected residues are moved in the optimization - the rest of the protein is kept fixed.

Since the input file was already fitted into the map, the generated loop conformations need not be fit again - only the CCF need be recalculated. The fitall.py script used for this purpose is very similar to that used in the previous step, and calculates the CCF both for the initial conformation and each of the generated loops.

from __future__ import print_function
from modeller import *
from modeller.scripts import complete_pdb

log.verbose()
env = Environ()
env.libs.topology.read('${LIB}/top_heav.lib')
env.libs.parameters.read('${LIB}/par.lib')

map='../step6_fit/TvLDH.10A.mrc'  
resolution=10.0  
box_size=48 
apix=1.88
x=-26.742; y=-9.5205; z=-10.375 #origin

m1 = Model(env, file='../step6_fit/TvLD_1_1.pdb')
sel = Selection(m1).only_atom_types('CA')
scal = physical.Values(default=0., em_density=1.0)
results = []
filenames = ['../step6_fit/TvLD_1_1.pdb']
for i in range(1, 6):
    filenames.append('TvLDH.BL%04d0001.pdb' % i)
for fname in filenames:
    m2 = complete_pdb(env, fname)
    aln = Alignment(env)
    aln.append_model(m1, align_codes='fitted')
    aln.append_model(m2, align_codes='model')
    sel.superpose(m2, aln)
    den = Density(env, file=map, em_density_format='MRC',
                  voxel_size=apix, resolution=resolution, em_map_size=box_size,
                  density_type='GAUSS', px=x,py=y,pz=z)
    m2.env.edat.density = den
    molpdf, terms = Selection(m2).energy(schedule_scale=scal)
    results.append((fname, -molpdf))

print("\n%-40s %12s" % ('Filename', 'CCF'))
print("\n".join(['%-40s %12.4f' % x for x in results]))

File: fitall.py

On running the script, an output file fitall.log is produced, containing CCF information similar to the following:

Filename                                          CCF
../step6_fit/TvLD_1_1.pdb                      0.9381
TvLDH.BL00010001.pdb                           0.9373
TvLDH.BL00020001.pdb                           0.9373
TvLDH.BL00030001.pdb                           0.9348
TvLDH.BL00040001.pdb                           0.9365
TvLDH.BL00050001.pdb                           0.9354

Excerpts of the file fitall.log

All of the generated loops give reasonable fits into the map. However, none gives a better fit than the original conformation. This may be due to the small number of loops produced - the conformational space that needs to be searched during loop modeling is particularly large, and for an 8 residue loop it is recommended to build on the order of 300 models to achieve a reasonable degree of sampling. On the other hand, it may be that the loop region chosen was too short, such that it was not flexible enough to move into the map, the alignment for comparative modeling is slightly incorrect, so that secondary structure elements are shifted, or the map itself contains no suitable density.

On to the next step, or back to the index.