Introduction
Note
Should have plots of raw data histograms superposed on the
final restraints in all cases.
This section describes the use of MDT for updating many of the
MODELLER restraint libraries, including stereochemical, non-bonded, and
homology-derived restraints:
- Stereochemical restraints
- chemical bonds: p(Bond | BondType)
- chemical angles: p(Angle | AngleType)
- improper dihedral angles as defined in the CHARMM residue
topology file: p(Dihedral | DihedralType)
- chemical angles: p(Angle | AngleType)
- the ω dihedral angle of the peptide bond:
p(ω | ResidueType+1) where ResidueType+1 refers to the residue type
following the residue with the ω dihedral angle
- the Φ and Ψ dihedral angles:
p(Φ | ResidueType), p(Ψ | ResidueType)
- the sidechain dihedral angles:
p(χ1 | ResidueType), p(χ2 | ResidueType), p(χ3 | ResidueType),
p(χ4 | ResidueType)
- the mainchain conformation:
p(Φ, Ψ | ResidueType)
- Non-bonded restraints
- the mainchain hydrogen bonding restraints:
p(h | d, a)
- the non-bonded pair of atom triplets:
p(d, α1, α2, θ1, θ2, θ3 | t1, t2)
- Homology-derived restraints
The following sections will outline the process of starting with the Protein
Data Bank (PDB) and ending up with the MODELLER restraint
library files. We will describe the rationale for the process, input data sets,
programs and scripts used, and even the analysis of the results. All of the
input files should be found in the MDT distribution, in the
constr2005 directory.
The overall approach is to construct appropriately accurate, smooth, and
transformed surfaces based on the statistics in PDB for use as spatial
restraints for model building. The restraints from the first iteration will be
used to construct many models, which in turn will be used to re-derive the
equivalent restraints from the models. These model-derived restraints will
then be compared against the original PDB-derived restraints to find
problems and get indications as to how to change the restraints so that
models are statistically as similar to PDB structures as possible.
Stereochemical restraints
The sample
The starting point for deriving the restraints in this section consists of
9,365 chains that are representative of the 65,629 chains in the October 2005
version of PDB. The representative set was obtained by clustering all PDB
chains with MODELLER, such that the representative chains are from 30 to
3000 residues in length and are sharing less than 60% sequence identity to
each other (or are more than 30 residues different in length). This
is the corresponding MODELLER script:
from modeller import *
import re
log.verbose()
env = environ()
sdb = sequence_db(env, seq_database_file='pdball.pir',
seq_database_format='PIR',
chains_list='ALL', minmax_db_seq_len=(30, 3000),
clean_sequences=True)
sdb.filter(rr_file='${LIB}/blosum62.sim.mat', gap_penalties_1d=(-500, -50),
matrix_offset = -450, max_diff_res=30, seqid_cut=60,
output_grp_file='pdb_60.grp', output_cod_file='pdb_60.cod')
# Make pdb_60.pir file by copying every sequence listed in pdb_60.cod
# from pdball.pir:
out = file("pdb_60.pir", "w")
codes = [line.rstrip('\r\n') for line in file("pdb_60.cod")]
codes = dict.fromkeys(codes)
pirhead = re.compile(">P1;(.*)$")
printline = False
for line in file("pdball.pir"):
m = pirhead.match(line)
if m:
printline = m.group(1) in codes
if printline:
out.write(line)
The actual chains for restraint derivation are in fact a subset of the 9,365
representative chains, corresponding to the 4,532 crystallographic structures
determined at least at 2 Å resolution (the representative structure for
each group is the highest resolution x-ray structure in the group). This
decision was made by looking at the distribution of the
χ1 dihedral angles as a function of resolution
(see Sidechain dihedral angle χ1) and the distribution of resolutions themselves
for all 9,365 representative chains, using this MDT script:
from modeller import *
import mdt
import mdt.features
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=mdt.uniform_bins(20, 0, 0.2))
m = mdt.Table(mlib, features=xray)
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt2.mdt')
This script creates a Library object and then adds an X-ray
resolution feature. Values of this feature are placed into 20 bins of width 0.2,
starting at 0. It then creates a
Table object, which is the MDT table itself.
This starts off as an empty 1D table of the X-ray
resolution feature. It then uses a MODELLER alignment object to read the
sequences from pdb_60.pir one by one, and for each one it
updates the X-ray resolution feature in the MDT table by calling
Table.add_alignment(). Finally,
the table is written out to the file mdt2.mdt using
Table.write().
The resulting MDT table mdt2.mdt was then
plotted with the script:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=mdt.uniform_bins(20, 0, 0.2))
m = mdt.Table(mlib, file='mdt2.mdt')
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999, WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=2, text=text, x_decimal=1)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
where the Table.write_asgl()
method writes out an ASGL script and the MDT data in a form suitable for
plotting (which we then execute with ASGL using Python’s
os.system() method). This gives an
impact of resolution plot.
Chemical bonds
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
# read the bond definitions in terms of the constituting atom type pairs:
mlib.bond_classes.read('${LIB}/bndgrp.lib')
# define the features:
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, features=(xray, bond_type, bond_length))
# make the MDT table using the pdb_60 sample chains:
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
# write out the MDT table:
m.write('mdt.mdt')
In this case, we read the file bndgrp.lib which defines all
chemical bonds, using the BondClasses.read() method. The
MDT we then construct is a 3D table of X-ray resolution, bond type, and bond
length. The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.bond_classes.read('${LIB}/bndgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, bond_type, bond_length),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 1. 0.0025
SET BAR_XSHIFT = 0.00125
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of bond plots. Notice that
here we use the Table.reshape() method,
which can reshape a table by reordering the features, and/or reducing the bin
ranges (offset or shape) of these features. In this case we don’t change the
feature order, or the offset (leaving it at the default 0,0,0) but we do
change the shape. The first feature is restricted to only one bin - because
our X-ray resolution feature contains only two bins (for “less than
2 Å” and the undefined bin, which catches everything 2 Å or
greater) this keeps only the good structures
for our plot. The other two features have their bin ranges reduced by 1
(a negative value for shape means “reduce the size by this amount”),
which effectively removes the final (“undefined”) bin.
Inspection of the plots shows that all distributions are mono-modal, but most
are distinctly non-Gaussian. However, at this point, Gaussian restraints are
still favored because the ranges are very narrow and because the non-Gaussian
shape of the histograms may result from the application of all the other
restraints (this supposition will be tested by deriving the corresponding
distributions from the models, not PDB structures). Also, although many
distributions are quite symmetrical, not all of them are. Therefore, there is
the question of how best to fit a restraint to the data. There are at least
three possibilities, in principle: (i) calculating the average and standard
deviation from all (subset) of the data, (ii) least-squares fitting of the
Gaussian model to the data, and (iii) using cubic splines of the data. The
first option was adopted here: the mean and standard deviation will be the
parameters of the analytically defined bond restraint for MODELLER.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.bond_classes.read('${LIB}/bndgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
bond_type = mdt.features.BondType(mlib)
bond_length = mdt.features.BondLength(mlib,
bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, bond_type, bond_length),
offset=(0,0,0), shape=(0,-1,-1))
m = m.integrate(features=(bond_type, bond_length))
mdt.write_bondlib(file('bonds.py', 'w'), m, density_cutoff=0.1)
Here we use the Table.integrate()
method, which removes a feature from the table by integrating the remaining
features over all of that feature’s bins, and the
write_bondlib() function
to write out a MODELLER script which builds restraints using the MDT-derived
distributions.
Chemical angles
As for the bonds above, the MDT table is constructed with the following
MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, features=(xray, angle_type, angle))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, angle_type, angle),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 0. 0.25
SET BAR_XSHIFT = 0.125
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of angle plots.
The situation is similar to that for the chemical bonds, except that there are
also four cases of bi-modal (as opposed to mono-modal) distributions:
Asp:CB:CG:OD2, Asp:OD2:CG,OD1, Pro:CB:CG:CD, and Pro:CD:N:CA angles. The Asp
bi-modal distribution may result from crystallographers mislabeling carboxyl
oxygens for the protonated state of the sidechain (which is interesting
because the corresponding angles might be used as a means to assign the
protonation state). The mean values for these angles were edited by hand.
Otherwise exactly the same considerations as for bonds apply here.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.angle_classes.read('${LIB}/anggrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
angle_type = mdt.features.AngleType(mlib)
angle = mdt.features.Angle(mlib, bins=mdt.uniform_bins(720, 0, 0.25))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, angle_type, angle),
offset=(0,0,0), shape=(0,-1,-1))
m = m.integrate(features=(angle_type, angle))
mdt.write_anglelib(file('angles.py', 'w'), m, density_cutoff=0.1)
Improper dihedral angles
Exactly the same situation applies as for the chemical bonds.
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, features=(xray, impr_type, improper))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, impr_type, improper),
offset=(0,0,0), shape=(1,-1,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
ZOOM SCALE_WORLDX = 0.08
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=999, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of improper plots.
The final MODELLER MDT library is produced with:
from modeller import *
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
mlib.dihedral_classes.read('${LIB}/impgrp.lib')
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
impr_type = mdt.features.DihedralType(mlib)
improper = mdt.features.Dihedral(mlib, bins=mdt.uniform_bins(400, 1.0, 0.0025))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, impr_type, improper),
offset=(0,0,0), shape=(1,-1,-1))
m = m.integrate(features=(impr_type, improper))
mdt.write_improperlib(file('impropers.py', 'w'), m, density_cutoff=0.1)
Sidechain dihedral angle χ1
The first question asked was “What is the impact of resolution on the
distributions of residue χ1?”. The answer was
obtained by constructing and inspecting
p(χ1 | R, resolution) with:
from modeller import *
import mdt
import mdt.features
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 1.4, "under 1.4"),
(1.4, 1.6, "1.4-1.6"),
(1.6, 1.8, "1.6-1.8"),
(1.8, 2.001, "1.8-2.0")])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi1))
a = alignment(env)
f = modfile.File('../../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
and
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 1.4, "under 1.4"),
(1.4, 1.6, "1.4-1.6"),
(1.6, 1.8, "1.6-1.8"),
(1.8, 2.001, "1.8-2.0")])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# Remove undefined bins (and gap residue type)
m = m.reshape(features=(xray, restyp, chi1), offset=m.offset, shape=(0,-2,-1))
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999, WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=1)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
giving this output
which clearly shows that X-ray structures at resolution of at least 2.0 Å
are just fine. X-ray structures above that resolution and NMR structures
(whose resolution is set artificially to 0.45 Å for the purposes of MDT
tabulation) do not appear to be suitable for deriving restraints for modeling,
as the peaks are significantly wider and there is a significant population
at ~120°. Also, the peaks appear Gaussian. Thus, a weighted sum of three
Gaussians (except for Pro, which has two) was judged to be an appropriate
model for these data. Again, the following script was used to construct the
MDT table:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi1))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
and the contents then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi1 = mdt.features.Chi1Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi1), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ1 plots.
The weights, means, and standard deviations of the Gaussians were obtained
by least-squares fitting with ASGL (with the script below) and are manually
added to the MODELLER MDT library.
SET TICK_FONT = 13
SET BAR_GRAYNESS = 1.00
SET CAPTION_FONT = 12
# The parameters are initial guesses
# (number of points, (weight, mean, standard deviation)_i; last weight missing),
# to help ASGL a little, but not important; just check the fitted curves
# against the data in fit.ps:
SET FIT_PARAM_INITIAL = 30000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'c.dat', POSITION = 1 0, CAPTION_TEXT = 'C'
SET FIT_PARAM_INITIAL = 118000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'd.dat', POSITION = 2 0, CAPTION_TEXT = 'D'
CALL ROUTINE = 'gauss3', FILE = 'e.dat', POSITION = 3 0, CAPTION_TEXT = 'E'
CALL ROUTINE = 'gauss3', FILE = 'f.dat', POSITION = 4 0, CAPTION_TEXT = 'F'
CALL ROUTINE = 'gauss3', FILE = 'h.dat', POSITION = 5 0, CAPTION_TEXT = 'H'
CALL ROUTINE = 'gauss3', FILE = 'i.dat', POSITION = 6 0, CAPTION_TEXT = 'I'
CALL ROUTINE = 'gauss3', FILE = 'k.dat', POSITION = 7 0, CAPTION_TEXT = 'K'
CALL ROUTINE = 'gauss3', FILE = 'l.dat', POSITION = 8 0, CAPTION_TEXT = 'L'
NEW_PAGE
SET FIT_PARAM_INITIAL = 45000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'm.dat', POSITION = 1 0, CAPTION_TEXT = 'M'
SET FIT_PARAM_INITIAL = 88000 0.3 175 10 0.3 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'n.dat', POSITION = 2 0, CAPTION_TEXT = 'N'
# Pro has two peaks only, "gauss3' will still work as is:
SET FIT_PARAM_INITIAL = 95000 0.4 -30 7 0.4 40 7 0 5
CALL ROUTINE = 'gauss3', FILE = 'p.dat', POSITION = 3 0, CAPTION_TEXT = 'P'
SET FIT_PARAM_INITIAL = 76000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 'q.dat', POSITION = 4 0, CAPTION_TEXT = 'Q'
SET FIT_PARAM_INITIAL = 104000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 'r.dat', POSITION = 5 0, CAPTION_TEXT = 'R'
SET FIT_PARAM_INITIAL = 124000 0.3 175 10 0.6 -65 20 62 10
CALL ROUTINE = 'gauss3', FILE = 's.dat', POSITION = 6 0, CAPTION_TEXT = 'S'
SET FIT_PARAM_INITIAL = 112000 0.1 -175 10 0.5 -65 10 65 10
CALL ROUTINE = 'gauss3', FILE = 't.dat', POSITION = 7 0, CAPTION_TEXT = 'T'
SET FIT_PARAM_INITIAL = 147000 0.7 180 10 0.1 -65 10 65 10
CALL ROUTINE = 'gauss3', FILE = 'v.dat', POSITION = 8 0, CAPTION_TEXT = 'V'
NEW_PAGE
SET FIT_PARAM_INITIAL = 28000 0.2 175 10 0.5 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'w.dat', POSITION = 1 0, CAPTION_TEXT = 'W'
SET FIT_PARAM_INITIAL = 72000 0.2 175 10 0.7 -65 10 60 10
CALL ROUTINE = 'gauss3', FILE = 'y.dat', POSITION = 2 0, CAPTION_TEXT = 'Y'
SUBROUTINE ROUTINE = 'gauss3'
READ_TABLE
SET X_TICK = -180 10 180, X_TICK_LABEL = 1 6
SET Y_TICK = -999 -999 -999, Y_TICK_LABEL = -999 -999
SET XY_COLUMNS = 0 1
# only to get 1, 2, 3, 4, 5, ... in column 2
WORLD
# get the right X-axis from -180 to +180:
TRANSFORM NO_XY_SCOLUMNS = 1 0, XY_SCOLUMNS = 2, ;
TRF_TYPE = 'LINEAR', TRF_PARAMETERS = -181.25 2.5
WORLD WORLD_WINDOW = -190 0 190 -999
AXES2D
RESET_CAPTIONS
CAPTION CAPTION_POSITION 1
CAPTION CAPTION_POSITION 2, CAPTION_TEXT '@c@_1_'
CAPTION CAPTION_POSITION 3, CAPTION_TEXT 'FREQUENCY'
HIST2D
SET ERROR_COLUMN = 0
SET FIT_MODEL = POLYGAUSS360
# SET FIT_PARAM_INITIAL = 1639 0.3 175 10 0.3 -65 10 60 10
SET FIT_PARAM_INDICES = 1 2 3 4 5 6 7 8 9
FIT
SMOOTH_TABLE SMOOTH_TYPE = 'SPLINE'
PLOT2D PLOT2D_LINE_TYPE = 1, PLOT2D_SYMBOL_TYPE = 0
RETURN
END_SUBROUTINE
Sidechain dihedral angle χ2
The situation is very similar to that for χ1,
except that the shapes of histograms are not Gaussian in most cases.
Therefore, 1D cubic splines are used to represent the restraints.
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi2))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi2), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ2 plots.
The final MODELLER MDT library is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi2 = mdt.features.Chi2Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi2), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi2))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi2.py", "w"), m, "chi2", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
This script also uses Table.smooth() to smooth the raw
frequencies and Table.normalize() to convert the
distribution into a PDF. It is then converted into a statistical potential
by taking the negative log of the values (using the
Table.log_transform(),
Table.linear_transform(), and
Table.offset_min() methods).
The smoothing parameter weight of 10 was selected by trial and error,
inspecting the resulting restraints in modlib-a.ps, also produced
by the script above.
Sidechain dihedral angle χ3
Exactly the same considerations apply as to χ2.
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi3))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi3), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ3 plots. The final MODELLER MDT library is
produced with:
from modeller import *
import os
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi3 = mdt.features.Chi3Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi3), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi3))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi3.py", "w"), m, "chi3", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps,
also produced by the script above.
Sidechain dihedral angle χ4
Exactly the same considerations apply as to χ2 and χ3. The MDT table is
constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, features=(xray, restyp, chi4))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, chi4), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of χ4 plots. The final MODELLER MDT library is
produced with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
chi4 = mdt.features.Chi4Dihedral(mlib, bins=mdt.uniform_bins(144, -180, 2.5))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, chi4), offset=(0,0,0), shape=(1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, chi4))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("chi4.py", "w"), m, "chi4", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps,
also produced by the script above.
Mainchain dihedral angle Φ
Exactly the same considerations apply as to χ2, χ3, and χ4. The MDT
table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, phi))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, phi), offset=(0,0,0), shape=(-1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Φ plots. The final MODELLER MDT library
is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, phi), offset=(0,0,0), shape=(-1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, phi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("phi.py", "w"), m, "phi", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps,
also produced by the script above.
Mainchain dihedral angle Ψ
Exactly the same considerations apply as to χ2, χ3, χ4, and Φ.
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, psi))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, psi), offset=(0,0,0), shape=(-1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 2.5
SET BAR_XSHIFT = 1.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Ψ plots. The final MODELLER MDT library
is produced with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# remove the bins corresponding to undefined values for each of the 3 variables:
m = m.reshape(features=(xray, restyp, psi), offset=(0,0,0), shape=(-1,-2,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, psi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=1, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=1, dx_dy=2.5, to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=1)
mdt.write_splinelib(file("psi.py", "w"), m, "psi", density_cutoff=0.1)
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
"""
m.write_asgl(asglroot='modlib-a', plot_type='PLOT2D', every_x_numbered=20,
text=text, dimensions=1, plot_position=1, plots_per_page=8)
os.system('asgl modlib-a')
The resulting restraints are plotted in modlib-a.ps,
also produced by the script above.
Mainchain dihedral angle ω
This dihedral angle is a little different from all others explored thus far
because it depends more strongly on the type of the subsequent residue than
the type of the residue whose dihedral angle is studied; that is, the ω
of the residue preceding Pro, not the Pro ω, is impacted by the Pro
residue. These dependencies are explored with MDT tables in directory
constr2005/omega/run1/. The bottom line is that we need
to set delta to 1 when creating our
residue type feature (rather
than the default value 0), which will make it refer to the type of the
residue after the residue with the dihedral angle ω.
The next step is to obtain the p(ω | R+1)
distributions with finer sampling of 0.5°:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
# This table uses the subsequent residue type, relative to the omega
m = mdt.Table(mlib, features=(xray, restyp_1, omega))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The distribution in raw form is then plotted with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
and in logarithmic form with:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega = mdt.features.OmegaDihedral(mlib, bins=mdt.uniform_bins(720, -180, 0.5))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega), offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = -180. 0.5
SET BAR_XSHIFT = 0.25
TRANSFORM TRF_TYPE = LOGARITHMIC4, ;
TRF_PARAMETERS = 1 1, NO_XY_SCOLUMNS = 0 1, XY_SCOLUMNS = 1
"""
m.write_asgl(asglroot='asgl2-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl2-a")
os.system("ps2pdf asgl2-a.ps")
Clearly, the peaks are sharp and will best be modeled by Gaussian distributions.
Similarly to χ1, two Gaussian distributions are
fit to the histograms with the following ASGL script:
SET TICK_FONT = 13
SET BAR_GRAYNESS = 1.00
SET CAPTION_FONT = 12
SET FIT_PARAM_INITIAL = 87000 0.95 179 5 0 5
CALL ROUTINE = 'gauss2', FILE = 'a.dat', POSITION = 1 0, CAPTION_TEXT = 'A'
CALL ROUTINE = 'gauss2', FILE = 'c.dat', POSITION = 2 0, CAPTION_TEXT = 'C'
CALL ROUTINE = 'gauss2', FILE = 'd.dat', POSITION = 3 0, CAPTION_TEXT = 'D'
CALL ROUTINE = 'gauss2', FILE = 'e.dat', POSITION = 4 0, CAPTION_TEXT = 'E'
CALL ROUTINE = 'gauss2', FILE = 'f.dat', POSITION = 5 0, CAPTION_TEXT = 'F'
CALL ROUTINE = 'gauss2', FILE = 'g.dat', POSITION = 6 0, CAPTION_TEXT = 'G'
CALL ROUTINE = 'gauss2', FILE = 'h.dat', POSITION = 7 0, CAPTION_TEXT = 'H'
CALL ROUTINE = 'gauss2', FILE = 'i.dat', POSITION = 8 0, CAPTION_TEXT = 'I'
NEW_PAGE
CALL ROUTINE = 'gauss2', FILE = 'k.dat', POSITION = 1 0, CAPTION_TEXT = 'K'
CALL ROUTINE = 'gauss2', FILE = 'l.dat', POSITION = 2 0, CAPTION_TEXT = 'L'
CALL ROUTINE = 'gauss2', FILE = 'm.dat', POSITION = 3 0, CAPTION_TEXT = 'M'
CALL ROUTINE = 'gauss2', FILE = 'n.dat', POSITION = 4 0, CAPTION_TEXT = 'N'
CALL ROUTINE = 'gauss2', FILE = 'p.dat', POSITION = 5 0, CAPTION_TEXT = 'P'
CALL ROUTINE = 'gauss2', FILE = 'q.dat', POSITION = 6 0, CAPTION_TEXT = 'Q'
CALL ROUTINE = 'gauss2', FILE = 'r.dat', POSITION = 7 0, CAPTION_TEXT = 'R'
CALL ROUTINE = 'gauss2', FILE = 's.dat', POSITION = 8 0, CAPTION_TEXT = 'S'
NEW_PAGE
CALL ROUTINE = 'gauss2', FILE = 't.dat', POSITION = 1 0, CAPTION_TEXT = 'T'
CALL ROUTINE = 'gauss2', FILE = 'v.dat', POSITION = 2 0, CAPTION_TEXT = 'V'
CALL ROUTINE = 'gauss2', FILE = 'w.dat', POSITION = 3 0, CAPTION_TEXT = 'W'
CALL ROUTINE = 'gauss2', FILE = 'y.dat', POSITION = 4 0, CAPTION_TEXT = 'Y'
SUBROUTINE ROUTINE = 'gauss2'
READ_TABLE
SET X_TICK = -180 10 180, X_TICK_LABEL = 1 6
SET Y_TICK = -999 -999 -999, Y_TICK_LABEL = -999 -999
SET XY_COLUMNS = 0 1
# only to get 1, 2, 3, 4, 5, ... in column 2
WORLD
# get the right X-axis from -180 to +180:
TRANSFORM NO_XY_SCOLUMNS = 1 0, XY_SCOLUMNS = 2, ;
TRF_TYPE = 'LINEAR', TRF_PARAMETERS = -180.25 0.5
WORLD WORLD_WINDOW = -190 0 190 -999
AXES2D
RESET_CAPTIONS
CAPTION CAPTION_POSITION 1
CAPTION CAPTION_POSITION 2, CAPTION_TEXT '@w@'
CAPTION CAPTION_POSITION 3, CAPTION_TEXT 'FREQUENCY'
HIST2D
SET ERROR_COLUMN = 0
SET FIT_MODEL = POLYGAUSS360
# SET FIT_PARAM_INITIAL = 1639 0.3 175 10 0.3 -65 10 60 10
SET FIT_PARAM_INDICES = 1 2 3 4 5 6
FIT
SMOOTH_TABLE SMOOTH_TYPE = 'SPLINE'
PLOT2D PLOT2D_LINE_TYPE = 1, PLOT2D_SYMBOL_TYPE = 0
RETURN
END_SUBROUTINE
The means and standard deviations for each residue type are extracted from
fit.log by the ASGL get_prms.F
program, but they are only used to guess the means of 179.8° and 0°
and standard deviations of 1.5° and 1.5° for the two peaks,
respectively. The distribution of ω dihedral angles in the models
calculated with these ω restraints will be checked carefully and the
restraint parameters will be adapted as needed.
The weights of the peaks are not determined reliably by least-squares fitting
in this case because the second weight is very close to 0 (in principle, they
can even be less than zero). Therefore, they are determined separately by
establishing p(cω | R+1) where cω is the class of the
ω dihedral angle (1 or 2, trans or cis).
The MDT table is constructed with the following MDT Python script:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega_class = mdt.features.OmegaClass(mlib)
# Table of the subsequent residue type relative to the omega class
m = mdt.Table(mlib, features=(xray, restyp_1, omega_class))
a = alignment(env)
f = modfile.File('../../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
The contents of the MDT table are then plotted with ASGL as follows:
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp_1 = mdt.features.ResidueType(mlib, delta=1)
omega_class = mdt.features.OmegaClass(mlib)
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp_1, omega_class),
offset=(0,0,0), shape=(1,-2,-1))
text = """
SET X_LABEL_STYLE = 2, X_TICK_LABEL = -999 -999
SET X_TICK = -999 -999 -999
SET TICK_FONT = 5, CAPTION_FONT = 5
SET Y_TICK = -999 -999 -999
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 1 1, XY_SCOLUMNS = 2 1
FILL_COLUMN COLUMN = 2, COLUMN_PARAMETERS = 1 1
SET BAR_XSHIFT = 0.5
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=8, dimensions=1,
plot_position=1, every_x_numbered=20, text=text, x_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving an omega weights plot.
The library omega.py is edited manually to replace the
means and standard deviations with 179.8 0.0 2.3 2.3.
Mainchain dihedral angles Φ and Ψ
The initial runs in run1 explored Ramachandran maps
extracted from different representative sets of structures (e.g., clustered by
40% sequence identity) and stratification by the crystallographic residue
Biso as well as resolution and residue type. We ended up with the sample
and stratification described above.
The 2D histograms p(Φ, Ψ | R) are derived with:
from modeller import *
import mdt
import mdt.features
# See ../bonds/make_mdt.py for additional comments
env = environ()
log.minimal()
env.io.atom_files_directory = ['/salilab/park2/database/pdb/divided/']
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, features=(xray, restyp, psi, phi))
a = alignment(env)
f = modfile.File('../cluster-PDB/pdb_60.pir', 'r')
while a.read_one(f):
m.add_alignment(a)
m.write('mdt.mdt')
They are plotted with
from modeller import *
import os
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
m = m.reshape(features=(xray, restyp, psi, phi),
offset=(0,0,0,0), shape=(1,-2,-1,-1))
text = """
SET TICK_FONT = 5, CAPTION_FONT = 5
SET WORLD_WINDOW = -999 -999 -999 -999
SET NO_XY_SCOLUMNS = 0 0, DPLOT_BOUNDS 0.0 -999
TRANSFORM TRF_TYPE=LOGARITHMIC4, TRF_PARAMETERS=10 1
"""
m.write_asgl(asglroot='asgl1-a', plots_per_page=3, dimensions=2,
plot_position=9, every_x_numbered=12, every_y_numbered=12,
text=text, x_decimal=0, y_decimal=0)
os.system("asgl asgl1-a")
os.system("ps2pdf asgl1-a.ps")
giving a set of Φ/Ψ plots.
The distributions are clearly not 2D Gaussian functions and need to be
approximated by 2D cubic splines. Exploring and visualizing various smoothing
options results in the following file to produce the final MODELLER MDT
library:
from modeller import *
import mdt
import mdt.features
env = environ()
mlib = mdt.Library(env)
xray = mdt.features.XRayResolution(mlib, bins=[(0.51, 2.001, 'High res(2.0A)')])
restyp = mdt.features.ResidueType(mlib)
psi = mdt.features.PsiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
phi = mdt.features.PhiDihedral(mlib, bins=mdt.uniform_bins(72, -180, 5.0))
m = mdt.Table(mlib, file='mdt.mdt')
# Eliminate the bins corresponding to undefined values:
m = m.reshape(features=(xray, restyp, psi, phi), offset=(0,0,0,0),
shape=(1,-2,-1,-1))
# Let's get rid of the resolution variable from the output MDT table:
m = m.integrate(features=(restyp, psi, phi))
# Process the raw histograms to get appropriate pdf 1D splines for restraints:
# Start by smoothing with a uniform prior (equal weight when 10 points per bin),
# producing a normalized distribution that sums to 1 (not a pdf when dx != 1):
m = m.smooth(dimensions=2, weight=10)
# Normalize it to get the true pdf (Integral p(x) dx = 1):
# (the scaling actually does not matter, because I am eventually taking the
# log and subtracting the smallest element of the final pdf, so this command
# could be omitted without impact):
m = m.normalize(to_pdf=True, dimensions=2, dx_dy=(5., 5.), to_zero=True)
# Take the logarithm of the smoothed frequencies
# (this is safe: none of bins is 0 because of mdt.smooth()):
m = m.log_transform(offset=0., multiplier=1.)
# Reverse the sign:
m = m.linear_transform(offset=0., multiplier=-1.)
# Offset the final distribution so that the lowest value is at 0:
m = m.offset_min(dimensions=2)
mdt.write_2dsplinelib(file("phipsi.py", "w"), m, density_cutoff=0.1)
The raw, smooth, and transformed surfaces are visualized and compared best
with Mathematica.