Sample studies with MDT¶
Introduction¶
This section describes the use of MDT for updating many of the MODELLER restraint libraries, including stereochemical, non-bonded, and homology-derived restraints:
Stereochemical restraints
chemical bonds: p(Bond | BondType)
chemical angles: p(Angle | AngleType)
improper dihedral angles as defined in the CHARMM residue topology file: p(Dihedral | DihedralType)
chemical angles: p(Angle | AngleType)
the ω dihedral angle of the peptide bond: p(ω | ResidueType+1) where ResidueType+1 refers to the residue type following the residue with the ω dihedral angle
the Φ and Ψ dihedral angles: p(Φ | ResidueType), p(Ψ | ResidueType)
the sidechain dihedral angles: p(χ1 | ResidueType), p(χ2 | ResidueType), p(χ3 | ResidueType), p(χ4 | ResidueType)
the mainchain conformation: p(Φ, Ψ | ResidueType)
Non-bonded restraints
the mainchain hydrogen bonding restraints: p(h | d, a)
the non-bonded pair of atom triplets: p(d, α1, α2, θ1, θ2, θ3 | t1, t2)
Homology-derived restraints
distance: p(d | d’)
The following sections will outline the process of starting with the Protein
Data Bank (PDB) and ending up with the MODELLER restraint
library files. We will describe the rationale for the process, input data sets,
programs and scripts used, and even the analysis of the results. All of the
input files should be found in the MDT distribution, in the
constr2005
directory.
The overall approach is to construct appropriately accurate, smooth, and transformed surfaces based on the statistics in PDB for use as spatial restraints for model building. The restraints from the first iteration will be used to construct many models, which in turn will be used to re-derive the equivalent restraints from the models. These model-derived restraints will then be compared against the original PDB-derived restraints to find problems and get indications as to how to change the restraints so that models are statistically as similar to PDB structures as possible.
Stereochemical restraints¶
The sample¶
The starting point for deriving the restraints in this section consists of 9,365 chains that are representative of the 65,629 chains in the October 2005 version of PDB. The representative set was obtained by clustering all PDB chains with MODELLER, such that the representative chains are from 30 to 3000 residues in length and are sharing less than 60% sequence identity to each other (or are more than 30 residues different in length). This is the corresponding MODELLER script:
from modeller import *
import re
log.verbose()
env = Environ()
sdb = SequenceDB(env, seq_database_file='pdball.pir',
seq_database_format='PIR',
chains_list='ALL', minmax_db_seq_len=(30, 3000),
clean_sequences=True)
sdb.filter(rr_file='${LIB}/blosum62.sim.mat', gap_penalties_1d=(-500, -50),
matrix_offset = -450, max_diff_res=30, seqid_cut=60,
output_grp_file='pdb_60.grp', output_cod_file='pdb_60.cod')
# Make pdb_60.pir file by copying every sequence listed in pdb_60.cod
# from pdball.pir:
out = open("pdb_60.pir", "w")
codes = [line.rstrip('\r\n') for line in file("pdb_60.cod")]
codes = dict.fromkeys(codes)
pirhead = re.compile(">P1;(.*)$")
printline = False
for line in file("pdball.pir"):
m = pirhead.match(line)
if m:
printline = m.group(1) in codes
if printline:
out.write(line)
The actual chains for restraint derivation are in fact a subset of the 9,365 representative chains, corresponding to the 4,532 crystallographic structures determined at least at 2 Å resolution (the representative structure for each group is the highest resolution x-ray structure in the group). This decision was made by looking at the distribution of the χ1 dihedral angles as a function of resolution (see Sidechain dihedral angle χ1) and the distribution of resolutions themselves for all 9,365 representative chains, using this MDT script:
from modeller import *
import mdt
import mdt.features
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=mdt.uniform_bins(20, 0, 0.2))
m = mdt.Table(mlib, features=xray)
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt2.mdt')
This script creates a Library
object and then adds an X-ray
resolution feature. Values of this feature are placed into 20 bins of width 0.2,
starting at 0. It then creates a
Table
object, which is the MDT table itself.
This starts off as an empty 1D table of the X-ray
resolution feature. It then uses a MODELLER alignment object to read the
sequences from pdb_60.pir
one by one, and for each one it
updates the X-ray resolution feature in the MDT table by calling
Table.add_alignment()
. Finally,
the table is written out to the file mdt2.mdt
using
Table.write()
.
The resulting MDT table mdt2.mdt
was then
plotted with the script:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=mdt.uniform_bins(20, 0, 0.2))
m = mdt.Table(mlib, file='mdt2.mdt')
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999, WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=2, text=text, x_decimal=1)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
where the Table.write_asgl()
method writes out an ASGL script and the MDT data in a form suitable for
plotting (which we then execute with ASGL using Python’s
os.system()
method). This gives an
impact of resolution plot
.
Chemical bonds¶
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
# read the bond definitions in terms of the constituting atom type pairs:
mlib.bond_classes.read('${LIB}/bndgrp.lib')
# define the features:
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, features=(xray, bond_type, bond_length))
# make the MDT table using the pdb_60 sample chains:
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
# write out the MDT table:
m.write('mdt.mdt')
In this case, we read the file bndgrp.lib
which defines all
chemical bonds, using the BondClasses.read()
method. The
MDT we then construct is a 3D table of X-ray resolution, bond type, and bond
length. The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.bond_classes.read('${LIB}/bndgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, bond_type, bond_length),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 1. 0.0025
SET BAR_XSHIFT = 0.00125
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of bond plots
.
Notice that here we use the Table.reshape()
method,
which can reshape a table by reordering the features, and/or reducing the bin
ranges (offset or shape) of these features. In this case we don’t change the
feature order, or the offset (leaving it at the default 0,0,0) but we do
change the shape. The first feature is restricted to only one bin - because
our X-ray resolution feature contains only two bins (for “less than
2 Å” and the undefined bin, which catches everything 2 Å or
greater) this keeps only the good structures
for our plot. The other two features have their bin ranges reduced by 1
(a negative value for shape means “reduce the size by this amount”),
which effectively removes the final (“undefined”) bin.
Inspection of the plots shows that all distributions are mono-modal, but most are distinctly non-Gaussian. However, at this point, Gaussian restraints are still favored because the ranges are very narrow and because the non-Gaussian shape of the histograms may result from the application of all the other restraints (this supposition will be tested by deriving the corresponding distributions from the models, not PDB structures). Also, although many distributions are quite symmetrical, not all of them are. Therefore, there is the question of how best to fit a restraint to the data. There are at least three possibilities, in principle: (i) calculating the average and standard deviation from all (subset) of the data, (ii) least-squares fitting of the Gaussian model to the data, and (iii) using cubic splines of the data. The first option was adopted here: the mean and standard deviation will be the parameters of the analytically defined bond restraint for MODELLER.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.bond_classes.read('${LIB}/bndgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, bond_type, bond_length),
offset=(0,0,0), shape=(0,-1,-1))
m = m.integrate(features=(bond_type, bond_length))
mdt.write_bondlib(file('bonds.py', 'w'), m, density_cutoff=0.1)
Here we use the Table.integrate()
method, which removes a feature from the table by integrating the remaining
features over all of that feature’s bins, and the
write_bondlib()
function
to write out a MODELLER script which builds restraints using the MDT-derived
distributions.
Chemical angles¶
As for the bonds above, the MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, features=(xray, angle_type, angle))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, angle_type, angle),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 0. 0.25
SET BAR_XSHIFT = 0.125
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of angle plots
.
The situation is similar to that for the chemical bonds, except that there are also four cases of bi-modal (as opposed to mono-modal) distributions: Asp:CB:CG:OD2, Asp:OD2:CG,OD1, Pro:CB:CG:CD, and Pro:CD:N:CA angles. The Asp bi-modal distribution may result from crystallographers mislabeling carboxyl oxygens for the protonated state of the sidechain (which is interesting because the corresponding angles might be used as a means to assign the protonation state). The mean values for these angles were edited by hand. Otherwise exactly the same considerations as for bonds apply here.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, angle_type, angle),
offset=(0,0,0), shape=(0,-1,-1))
m = m.integrate(features=(angle_type, angle))
mdt.write_anglelib(file('angles.py', 'w'), m, density_cutoff=0.1)
Improper dihedral angles¶
Exactly the same situation applies as for the chemical bonds. The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, features=(xray, impr_type, improper))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, impr_type, improper),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving
a set of improper plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, impr_type, improper),
offset=(0,0,0), shape=(1,-1,-1))
m = m.integrate(features=(impr_type, improper))
mdt.write_improperlib(file('impropers.py', 'w'), m, density_cutoff=0.1)
Sidechain dihedral angle χ1¶
The first question asked was “What is the impact of resolution on the distributions of residue χ1?”. The answer was obtained by constructing and inspecting p(χ1 | R, resolution) with:
from modeller import *
import mdt
import mdt.features
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 1.4, "under 1.4"),
(1.4, 1.6, "1.4-1.6"),
(1.6, 1.8, "1.6-1.8"),
(1.8, 2.001, "1.8-2.0")])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi1))
a = Alignment(env)
f = modfile.File('../../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
and
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 1.4, "under 1.4"),
(1.4, 1.6, "1.4-1.6"),
(1.6, 1.8, "1.6-1.8"),
(1.8, 2.001, "1.8-2.0")])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# Remove undefined bins (and gap residue type)
m = m.reshape(features=(xray, restyp, chi1), offset=m.offset, shape=(0,-2,-1))
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999, WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=1)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
giving
this output
which clearly shows that X-ray structures at resolution of at least 2.0 Å
are just fine. X-ray structures above that resolution and NMR structures
(whose resolution is set artificially to 0.45 Å for the purposes of MDT
tabulation) do not appear to be suitable for deriving restraints for modeling,
as the peaks are significantly wider and there is a significant population
at ~120°. Also, the peaks appear Gaussian. Thus, a weighted sum of three
Gaussians (except for Pro, which has two) was judged to be an appropriate
model for these data. Again, the following script was used to construct the
MDT table:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi1))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
and the contents then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi1), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ1 plots
.
The weights, means, and standard deviations of the Gaussians were obtained by least-squares fitting with ASGL (with the script below) and are manually added to the MODELLER MDT library.
SET TICK_FONT = 13
SET BAR_GRAYNESS = 1.00
SET CAPTION_FONT = 12
# The parameters are initial guesses
# (number of points, (weight, mean, standard deviation)_i; last weight missing),
# to help ASGL a little, but not important; just check the fitted curves
# against the data in fit.ps:
SET FIT_PARAM_INITIAL = 30000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'c.dat', POSITION = 1 0, CAPTION_TEXT = 'C'
SET FIT_PARAM_INITIAL = 118000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'd.dat', POSITION = 2 0, CAPTION_TEXT = 'D'
CALL ROUTINE = 'gauss3', FILE = 'e.dat', POSITION = 3 0, CAPTION_TEXT = 'E'
CALL ROUTINE = 'gauss3', FILE = 'f.dat', POSITION = 4 0, CAPTION_TEXT = 'F'
CALL ROUTINE = 'gauss3', FILE = 'h.dat', POSITION = 5 0, CAPTION_TEXT = 'H'
CALL ROUTINE = 'gauss3', FILE = 'i.dat', POSITION = 6 0, CAPTION_TEXT = 'I'
CALL ROUTINE = 'gauss3', FILE = 'k.dat', POSITION = 7 0, CAPTION_TEXT = 'K'
CALL ROUTINE = 'gauss3', FILE = 'l.dat', POSITION = 8 0, CAPTION_TEXT = 'L'
NEW_PAGE
SET FIT_PARAM_INITIAL = 45000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'm.dat', POSITION = 1 0, CAPTION_TEXT = 'M'
SET FIT_PARAM_INITIAL = 88000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'n.dat', POSITION = 2 0, CAPTION_TEXT = 'N'
# Pro has two peaks only, "gauss3' will still work as is:
SET FIT_PARAM_INITIAL = 95000 0.4 -30 7 0.4 40 7 0 5
CALL ROUTINE = 'gauss3', FILE = 'p.dat', POSITION = 3 0, CAPTION_TEXT = 'P'
SET FIT_PARAM_INITIAL = 76000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 'q.dat', POSITION = 4 0, CAPTION_TEXT = 'Q'
SET FIT_PARAM_INITIAL = 104000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 'r.dat', POSITION = 5 0, CAPTION_TEXT = 'R'
SET FIT_PARAM_INITIAL = 124000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 's.dat', POSITION = 6 0, CAPTION_TEXT = 'S'
SET FIT_PARAM_INITIAL = 112000 0.1 -175 10 0.5 -65 10 65 10
CALL ROUTINE = 'gauss3', FILE = 't.dat', POSITION = 7 0, CAPTION_TEXT = 'T'
SET FIT_PARAM_INITIAL = 147000 0.7 180 10 0.1 -65 10 65 10
CALL ROUTINE = 'gauss3', FILE = 'v.dat', POSITION = 8 0, CAPTION_TEXT = 'V'
NEW_PAGE
SET FIT_PARAM_INITIAL = 28000 0.2 175 10 0.5 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'w.dat', POSITION = 1 0, CAPTION_TEXT = 'W'
SET FIT_PARAM_INITIAL = 72000 0.2 175 10 0.7 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'y.dat', POSITION = 2 0, CAPTION_TEXT = 'Y'
SUBROUTINE ROUTINE = 'gauss3'
READ_TABLE
SET X_TICK = -180 10 180, X_TICK_LABEL = 1 6
SET Y_TICK = -999 -999 -999, Y_TICK_LABEL = -999 -999
SET XY_COLUMNS = 0 1
# only to get 1, 2, 3, 4, 5, ... in column 2
WORLD
# get the right X-axis from -180 to +180:
TRANSFORM NO_XY_SCOLUMNS = 1 0, XY_SCOLUMNS = 2, ;
TRF_TYPE = 'LINEAR', TRF_PARAMETERS = -181.25 2.5
WORLD WORLD_WINDOW = -190 0 190 -999
AXES2D
RESET_CAPTIONS
CAPTION CAPTION_POSITION 1
CAPTION CAPTION_POSITION 2, CAPTION_TEXT '@c@_1_'
CAPTION CAPTION_POSITION 3, CAPTION_TEXT 'FREQUENCY'
HIST2D
SET ERROR_COLUMN = 0
SET FIT_MODEL = POLYGAUSS360
# SET FIT_PARAM_INITIAL = 1639 0.3 175 10 0.3 -65 10 60 10
SET FIT_PARAM_INDICES = 1 2 3 4 5 6 7 8 9
FIT
SMOOTH_TABLE SMOOTH_TYPE = 'SPLINE'
PLOT2D PLOT2D_LINE_TYPE = 1, PLOT2D_SYMBOL_TYPE = 0
RETURN
END_SUBROUTINE
Sidechain dihedral angle χ2¶
The situation is very similar to that for χ1, except that the shapes of histograms are not Gaussian in most cases. Therefore, 1D cubic splines are used to represent the restraints.
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi2))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi2), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ2 plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi2), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi2))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi2.py", "w"), m, "chi2", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
This script also uses Table.smooth()
to smooth the raw
frequencies and Table.normalize()
to convert the
distribution into a PDF. It is then converted into a statistical potential
by taking the negative log of the values (using the
Table.log_transform()
,
Table.linear_transform()
, and
Table.offset_min()
methods).
The smoothing parameter weight of 10 was selected by trial and error,
inspecting the resulting restraints in modlib-a.ps
, also produced
by the script above.
Sidechain dihedral angle χ3¶
Exactly the same considerations apply as to χ2. The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi3))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi3), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ3 plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi3), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi3))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi3.py", "w"), m, "chi3", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps
,
also produced by the script above.
Sidechain dihedral angle χ4¶
Exactly the same considerations apply as to χ2 and χ3. The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi4))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi4), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ4 plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi4), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi4))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi4.py", "w"), m, "chi4", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps
,
also produced by the script above.
Mainchain dihedral angle Φ¶
Exactly the same considerations apply as to χ2, χ3, and χ4. The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, phi))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, phi), offset=(0,0,0), shape=(-1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Φ plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, phi), offset=(0,0,0), shape=(-1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, phi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("phi.py", "w"), m, "phi", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps
,
also produced by the script above.
Mainchain dihedral angle Ψ¶
Exactly the same considerations apply as to χ2, χ3, χ4, and Φ. The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, psi))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, psi), offset=(0,0,0), shape=(-1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Ψ plots
.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, psi), offset=(0,0,0), shape=(-1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, psi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("psi.py", "w"), m, "psi", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps
,
also produced by the script above.
Mainchain dihedral angle ω¶
This dihedral angle is a little different from all others explored thus far
because it depends more strongly on the type of the subsequent residue than
the type of the residue whose dihedral angle is studied; that is, the ω
of the residue preceding Pro, not the Pro ω, is impacted by the Pro
residue. These dependencies are explored with MDT tables in directory
constr2005/omega/run1/
. The bottom line is that we need
to set delta to 1 when creating our
residue type
feature (rather
than the default value 0), which will make it refer to the type of the
residue after the residue with the dihedral angle ω.
The next step is to obtain the p(ω | R+1) distributions with finer sampling of 0.5°:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
# This table uses the subsequent residue type, relative to the omega
m = mdt.Table(mlib, features=(xray, restyp_1, omega))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The distribution in raw form
is then plotted with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
and in logarithmic form
with:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
TRANSFORM TRF_TYPE = LOGARITHMIC4, ;
TRF_PARAMETERS = 1 1, NO_XY_SCOLUMNS = 0 1, XY_SCOLUMNS = 1
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
Clearly, the peaks are sharp and will best be modeled by Gaussian distributions.
Similarly to χ1, two Gaussian distributions are fit to the histograms with the following ASGL script:
SET TICK_FONT = 13
SET BAR_GRAYNESS = 1.00
SET CAPTION_FONT = 12
SET FIT_PARAM_INITIAL = 87000 0.95 179 5 0 5
CALL ROUTINE = 'gauss2', FILE = 'a.dat', POSITION = 1 0, CAPTION_TEXT = 'A'
CALL ROUTINE = 'gauss2', FILE = 'c.dat', POSITION = 2 0, CAPTION_TEXT = 'C'
CALL ROUTINE = 'gauss2', FILE = 'd.dat', POSITION = 3 0, CAPTION_TEXT = 'D'
CALL ROUTINE = 'gauss2', FILE = 'e.dat', POSITION = 4 0, CAPTION_TEXT = 'E'
CALL ROUTINE = 'gauss2', FILE = 'f.dat', POSITION = 5 0, CAPTION_TEXT = 'F'
CALL ROUTINE = 'gauss2', FILE = 'g.dat', POSITION = 6 0, CAPTION_TEXT = 'G'
CALL ROUTINE = 'gauss2', FILE = 'h.dat', POSITION = 7 0, CAPTION_TEXT = 'H'
CALL ROUTINE = 'gauss2', FILE = 'i.dat', POSITION = 8 0, CAPTION_TEXT = 'I'
NEW_PAGE
CALL ROUTINE = 'gauss2', FILE = 'k.dat', POSITION = 1 0, CAPTION_TEXT = 'K'
CALL ROUTINE = 'gauss2', FILE = 'l.dat', POSITION = 2 0, CAPTION_TEXT = 'L'
CALL ROUTINE = 'gauss2', FILE = 'm.dat', POSITION = 3 0, CAPTION_TEXT = 'M'
CALL ROUTINE = 'gauss2', FILE = 'n.dat', POSITION = 4 0, CAPTION_TEXT = 'N'
CALL ROUTINE = 'gauss2', FILE = 'p.dat', POSITION = 5 0, CAPTION_TEXT = 'P'
CALL ROUTINE = 'gauss2', FILE = 'q.dat', POSITION = 6 0, CAPTION_TEXT = 'Q'
CALL ROUTINE = 'gauss2', FILE = 'r.dat', POSITION = 7 0, CAPTION_TEXT = 'R'
CALL ROUTINE = 'gauss2', FILE = 's.dat', POSITION = 8 0, CAPTION_TEXT = 'S'
NEW_PAGE
CALL ROUTINE = 'gauss2', FILE = 't.dat', POSITION = 1 0, CAPTION_TEXT = 'T'
CALL ROUTINE = 'gauss2', FILE = 'v.dat', POSITION = 2 0, CAPTION_TEXT = 'V'
CALL ROUTINE = 'gauss2', FILE = 'w.dat', POSITION = 3 0, CAPTION_TEXT = 'W'
CALL ROUTINE = 'gauss2', FILE = 'y.dat', POSITION = 4 0, CAPTION_TEXT = 'Y'
SUBROUTINE ROUTINE = 'gauss2'
READ_TABLE
SET X_TICK = -180 10 180, X_TICK_LABEL = 1 6
SET Y_TICK = -999 -999 -999, Y_TICK_LABEL = -999 -999
SET XY_COLUMNS = 0 1
# only to get 1, 2, 3, 4, 5, ... in column 2
WORLD
# get the right X-axis from -180 to +180:
TRANSFORM NO_XY_SCOLUMNS = 1 0, XY_SCOLUMNS = 2, ;
TRF_TYPE = 'LINEAR', TRF_PARAMETERS = -180.25 0.5
WORLD WORLD_WINDOW = -190 0 190 -999
AXES2D
RESET_CAPTIONS
CAPTION CAPTION_POSITION 1
CAPTION CAPTION_POSITION 2, CAPTION_TEXT '@w@'
CAPTION CAPTION_POSITION 3, CAPTION_TEXT 'FREQUENCY'
HIST2D
SET ERROR_COLUMN = 0
SET FIT_MODEL = POLYGAUSS360
# SET FIT_PARAM_INITIAL = 1639 0.3 175 10 0.3 -65 10 60 10
SET FIT_PARAM_INDICES = 1 2 3 4 5 6
FIT
SMOOTH_TABLE SMOOTH_TYPE = 'SPLINE'
PLOT2D PLOT2D_LINE_TYPE = 1, PLOT2D_SYMBOL_TYPE = 0
RETURN
END_SUBROUTINE
The means and standard deviations for each residue type are extracted from
fit.log
by the ASGL get_prms.F
program, but they are only used to guess the means of 179.8° and 0°
and standard deviations of 1.5° and 1.5° for the two peaks,
respectively. The distribution of ω dihedral angles in the models
calculated with these ω restraints will be checked carefully and the
restraint parameters will be adapted as needed.
The weights of the peaks are not determined reliably by least-squares fitting in this case because the second weight is very close to 0 (in principle, they can even be less than zero). Therefore, they are determined separately by establishing p(cω | R+1) where cω is the class of the ω dihedral angle (1 or 2, trans or cis).
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega_class = mdt.features.OmegaClass(mlib)
# Table of the subsequent residue type relative to the omega class
m = mdt.Table(mlib, features=(xray, restyp_1, omega_class))
a = Alignment(env)
f = modfile.File('../../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega_class = mdt.features.OmegaClass(mlib)
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega_class),
offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 1 1
SET BAR_XSHIFT = 0.5
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving
an omega weights plot
.
The library omega.py
is edited manually to replace the
means and standard deviations with 179.8 0.0 2.3 2.3.
Mainchain dihedral angles Φ and Ψ¶
The initial runs in run1
explored Ramachandran maps
extracted from different representative sets of structures (e.g., clustered by
40% sequence identity) and stratification by the crystallographic residue
Biso as well as resolution and residue type. We ended up with
the sample and stratification described above.
The 2D histograms p(Φ, Ψ | R) are derived with:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = Environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, psi, phi))
a = Alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
They are plotted with
from modeller import *
import os
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, psi, phi),
offset=(0,0,0,0), shape=(1,-2,-1,-1))
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 0 0, DPLOT_BOUNDS 0.0 -999
TRANSFORM TRF_TYPE=LOGARITHMIC4, TRF_PARAMETERS=10 1
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=3, dimensions=2,
plot_position=9, every_x_numbered=12, every_y_numbered=12,
text=text, x_decimal=0, y_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Φ/Ψ plots
.
The distributions are clearly not 2D Gaussian functions and need to be approximated by 2D cubic splines. Exploring and visualizing various smoothing options results in the following file to produce the final MODELLER MDT library:
from modeller import *
import mdt
import mdt.features
env = Environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# Eliminate the bins corresponding to undefined values:
m = m.reshape(features=(xray, restyp, psi, phi), offset=(0,0,0,0),
shape=(1,-2,-1,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, psi, phi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=2, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=2, dx_dy=(5., 5.), to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=2)
mdt.write_2dsplinelib(file("phipsi.py", "w"), m, density_cutoff=0.1)
The raw, smooth, and transformed surfaces are visualized and compared best with Mathematica.
Non-bonded restraints¶
A general pairwise distance- and atom-type dependent statistical potential for all atom type pairs has been derived by Min-yi Shen (DOPE). MDT could, however, be used to derive specialized pairwise non-bonded restraints.