The `mdt` Python module¶

MDT, a module for protein structure analysis.

MDT is free software: you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with MDT. If not, see <http://www.gnu.org/licenses/>.

Setup of the MDT system¶

class mdt.Library(env, distance_atoms=('CA', 'CA'), special_atoms=False, hbond_cutoff=3.5)¶

Library data used in the construction and use of MDTs.

Parameters :	env: the Modeller environment to use distance_atoms: the atom types to use for the `features.ResidueDistance` feature special_atoms: whether to treat disulfide and termini atoms specially for atom class features (see `features.AtomType`) hbond_cutoff: maximum separation between two H-bonded atoms (see `features.HydrogenBondDonor`)

angle_classes¶: Angle classes; see BondClasses

atom_classes¶: Atom classes; see BondClasses

bond_classes¶: Bond classes; see BondClasses

dihedral_classes¶: Dihedral classes; see BondClasses

hbond_classes¶: Hydrogen bond atom classes; see HydrogenBondClasses

tuple_classes¶: Atom tuple classes; see TupleClasses and Tuple features

class mdt.TupleClasses(mlib)¶

Classifications of tuples of atoms into classes. Usually accessed as Library.tuple_classes. These classes are used by tuple or tuple pair features.

read(filename)¶: Read atom tuple information from a file

class mdt.BondClasses(mlib, n_atom)¶

Classifications of atoms/bonds/angles/dihedrals into classes. These classes are used by atom and chemical bond features. Usually accessed as Library.atom_classes, Library.bond_classes, Library.angle_classes, or Library.dihedral_classes. (There is no need to create your own BondClasses objects.)

read(filename)¶: Read class information from filename

class mdt.HydrogenBondClasses(mlib)¶

Classifications of atoms into hydrogen bond classes. Usually accessed as Library.hbond_classes. These classes are used by the features.HydrogenBondAcceptor, features.HydrogenBondDonor and features.HydrogenBondSatisfaction features.

read(filename)¶: Read hydrogen bond atom class information from a file

Creation and manipulation of data tables¶

class mdt.Table(mlib, file=None, features=None, bin_type=<mdt._BinType object at 0x2b4f890>)¶

A multi-dimensional table.

Parameters :	mlib: the MDT Library object to use file: if specified, the filename to read the initial table from (if the name ends with ‘.hdf5’, `Table.read_hdf5()` is used, otherwise `Table.read()`) features: if specified (and file is not), a list of feature types to initialize the table with (using `Table.make()`) bin_type: type of storage for bin data (see Storage for bin data).

Individual elements from the table can be accessed in standard Python fashion, e.g.

>>> import mdt
>>> import modeller
>>> env = modeller.environ()
>>> mlib = mdt.Library(env)
>>> restyp1 = mdt.features.ResidueType(mlib, protein=0)
>>> restyp2 = mdt.features.ResidueType(mlib, protein=1)
>>> gap = mdt.features.GapDistance(mlib, mdt.uniform_bins(10, 0, 1))
>>> m = mdt.Table(mlib, features=(restyp1,restyp2,gap))
>>> print m[0,0,0]

You can also access an element as m[0][0][0], a 1D section as m[0][0], or a 2D section as m[0]. See TableSection.

add_alignment(aln, distngh=6.0, surftyp=1, accessibility_type=8, residue_span_range=(-99999, -2, 2, 99999), chain_span_range=(-99999, 0, 0, 99999), exclude_bonds=False, exclude_angles=False, exclude_dihedrals=False, sympairs=False, symtriples=False, io=None, edat=None)¶

Add data from a Modeller alignment to this MDT.

Parameters :

aln: Modeller alignment.
distngh: distance below which residues are considered neighbors. Used by features.NeighborhoodDifference.
surftyp: 1 for PSA contact area, 2 for surface area. Used by features.AtomAccessibility.
accessibility_type: PSA accessibility type (1-10). Used by features.AtomAccessibility.
residue_span_range: sequence separation (inclusive) for residue pair, atom pair and tuple pair features. For the two residue indices r1 and r2 in the tuple-tuple and atom- atom cases, or two alignment position indices in the residue-residue case, the following must be true:

residue_span_range[0] <= (r2 - r1) <= residue_span_range[1]

residue_span_range[2] <= (r2 - r1) <= residue_span_range[3]

For symmetric residue-residue features, only one condition must be met:

residue_span_range[2] <= abs(r2 - r1) <= residue_span_range[3]

For example, the default value of (-99999, -2, 2, 99999) excludes all pairs within the same residue (for which the sequence separation is 0) or within adjacent residues (for which the separation is 1 or -1).
chain_span_range: works like residue_span_range, but for the chain indices. It is used only by the atom pair and tuple pair features. The default value of (-99999, 0, 0, 99999) allows all interactions. For example, using (-99999, -1, 1, 99999) instead would exclude all interactions within the same chain.
exclude_bonds: if True, then all pairs of atoms involved in a chemical bond (see Library.bond_classes) are excluded from atom pair and tuple pair features.
exclude_angles: if True, then the 1-3 pair of atoms from each angle are excluded (see exclude_bonds).
exclude_dihedrals: if True, then the 1-4 pair of atoms from each dihedral are excluded (see exclude_bonds).
sympairs: if True, all features involving pairs of proteins are symmetric.
symtriples: if True, all features involving triples of proteins are symmetric.

add_alignment_witherr(aln, distngh=6.0, surftyp=1, accessibility_type=8, residue_span_range=(-99999, -2, 2, 99999), chain_span_range=(-99999, 0, 0, 99999), exclude_bonds=False, exclude_angles=False, exclude_dihedrals=False, sympairs=False, symtriples=False, io=None, edat=None, errorscale=1)¶

Add data from a Modeller alignment to this MDT. Same as add_alignment except the errors in data are taken into account. The parameter errorscale controls how the error is used:

0: the errors are ignored; this function is the same as

add_alignment.

>0 : the errors are taken into account by propagating the errors

in each axis of each atom into the calculated distances or angles. The errors in the position of individual atoms are first calculated using B-iso, X-ray resolution, and R-factor, and then divided by this errorscale value.

close(dimensions)¶

Attempt to ‘close’ the MDT, so that it is useful for creating splines of periodic features.

If dimensions = 1, it makes the two terminal points equal to their average. If dimensions = 2, it applies the averages to both pairs of edges and then again to all four corner points.

Returns:	the closed MDT.
Return type:	`Table`

copy(bin_type=None)¶

If bin_type is specified, it is the storage type to convert the bin data to (see Storage for bin data).

Returns:	a copy of this MDT table.
Return type:	`Table`

entropy_full()¶: Print full entropy information.

entropy_hx()¶

The MDT is integrated to get a 1D histogram, then normalized by the sum of the bin values. Finally, entropy is calculated as Σ_i -p_i ln p_i

Returns:	the entropy of the last dependent variable.
Return type:	float

exp_transform(offset, expoffset, multiplier, power)¶

Apply an exponential transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + exp(expoffset + multiplier * a ^ power).

Return type:	`Table`

integrate(features)¶

Integrate the MDT, and reorder the features. This is useful for squeezing large MDT arrays into smaller ones, and also for eliminating unwanted features (such as X-ray resolution) in preparation for Table.write().

Parameters :	features: the new features (all must be present in the original MDT).
Returns:	the integrated MDT.
Return type:	`Table`

inverse_transform(offset, multiplier, undefined=0.0)¶

Apply an inverse transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + multiplier / a. Where a is zero, b is assigned to be undefined.

Returns:	the transformed MDT.
Return type:	`Table`

linear_transform(offset, multiplier)¶

Apply a linear transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + a * multiplier.

Returns:	the transformed MDT.
Return type:	`Table`

log_transform(offset, multiplier, undefined=0.0)¶

Apply a log transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = ln(offset + multiplier * a). Where this would involve the logarithm of a negative number, b is assigned to be undefined.

Returns:	the transformed MDT.
Return type:	`Table`

make(features)¶: Clear the table, and set the features. features must be a list of previously created objects from the mdt.features module.

n_protein_pairs¶: Number of protein pairs

n_proteins¶: Number of proteins

normalize(dimensions, dx_dy, to_zero, to_pdf)¶

Normalize or scale the MDT. It does not really matter what the contents of the input MDT are; sensible contents include the raw or normalized frequencies.

Parameters :	dimensions: specifies whether a 1D or a 2D table is normalized. More precisely, the input distributions are p(x \| a, b, c, ...) if dimensions = 1, or p(x, y \| a, b, c, ...) if dimensions = 2, where y and x are the second to last and last features in the list of features. dx_dy: widths of the bins (either one or two numbers, depending on dimensions). If the value of either dx or dy is -999, the corresponding bin width is extracted from the MDT data structure (not available for all features). to_zero: if the histogram is empty, setting this True will set the bin values to zero, and False will yield a uniform distribution. It has no effect when the histogram is not empty. to_pdf: if False, the output is obtained by scaling the input such that for 1D histograms Σ _i p(x _i) = 1, and for 2D histograms Σ _i,j p(x _i,j) = 1. Note that dx_dy is not taken into account during this scaling. If it is True, the normalization takes into account dx_dy so that the normalized distribution is actually a PDF. That is, Σ _i p(x _i) dx = 1 for 1D and Σ _i,j p(x _i,j) dx dy = 1 for 2D, where dx and dy are the widths of the bins.
Returns:	the normalized MDT.
Return type:	`Table`

offset_min(dimensions)¶

Offset the MDT by the minimum value, either in each 1D section (dimensions = 1) or in each 2D section (dimensions = 2).

Returns:	the transformed MDT.
Return type:	`Table`

open_alignment(aln, distngh=6.0, surftyp=1, accessibility_type=8, sympairs=False, symtriples=False, io=None, edat=None)¶

Open a Modeller alignment to allow MDT indices to be queried (see Source). Arguments are as for Table.add_alignment().

Return type:	`Source`

pdf¶: Whether this MDT is a PDF

read(file)¶: Read an MDT from file.

read_hdf5(file)¶: Read an MDT in HDF5 format from file.

reshape(features, offset, shape)¶

Reorder the MDT features and optionally decrease their ranges. When an MDT is created, each feature has exactly the bins defined in the Library‘s bin file. However, for each feature, you can change the offset (initial number of bins from the bin file to omit) from the default 0, and the shape (total number of bins).

All parameters should be lists with the same number of elements as the MDT has features.

Parameters :	features: the new ordering of the MDT features. offset: the new offset (see offset). shape: the new shape (see shape). If any element in this list is 0 or negative, it is added to the MDT’s existing shape to get the new value. Thus, a value of 0 would leave the shape unchanged, -1 would remove the last (undefined) bin, etc.
Returns:	the reshaped MDT.
Return type:	`Table`

sample_size¶: Number of sample points

smooth(dimensions, weight)¶

Smooth the MDT with a uniform prior. The MDT is treated either as a histogram (if dimensions = 1) or a 2D density (dimensions = 2) of dependent features (the last 1 or 2 features in the table) and a uniform distribution is added followed by scaling:

p_i = w₁ / n + w₂ v_i / S

S = Σ_iⁿ v_i

w₁ = 1 / ( 1 + S / (weight * n))

w₂ = 1 - w₁

where v is the input MDT array, n is the number of bins in the histogram, and p is the output MDT array, smoothed and normalized. weight is the number of points per bin in the histogram at which the relative weights of the input histogram and the uniform prior are equal.

The sum of the bins in the output MDT array is 1, for each histogram.

Note that the resulting output MDT array is not necessarily a PDF, because the bin widths are not taken into account during scaling. That is, the sum of all bin values multiplied by the bin widths is not 1 if the bin widths are not 1.

Returns:	the smoothed MDT.
Return type:	`Table`

super_smooth(dimensions, prior_weight, entropy_weighing)¶

Multi-level smoothing. This super-smoothes the raw frequencies in the MDT using the hierarchical smoothing procedure for 1D histograms described in Sali and Blundell, JMB 1993. It was also employed in Sali and Overington, Prot Sci. 1994.

Briefly, the idea is to recursively construct the best possible prior distribution for smoothing 1D data p(x | a, b, c, ...). The best prior is a weighted sum (weights optionally based on entropy) of the best possible estimate of p(x | a, b, ...) integrated over c for each c. Each one of these can itself be obtained from a prior and the data, and so on recursively.

The example above is for a single dependent feature (x), which is the case when dimensions = 1. x should be the last feature in the table. dimensions can be set to other values if you have more dependent features - for example, dimensions = 2 will work with p(x, y | a, b, c, ...) where x and y are the last two features in the table.

Parameters :	dimensions: Number of dependent features. prior_weight: Weight for the prior distribution. entropy_weighing: Whether to weight distributions by their entropies.
Returns:	the smoothed MDT.
Return type:	`Table`

symmetric¶: True if a symmetric scan can be performed

write(file, write_preamble=True)¶: Write the table to file. If write_preamble is False, it will only write out the contents of the MDT table, without the preamble including the feature list, bins, etc. This is useful for example for creating a file to be read by another program, such as Mathematica.

write_asgl(asglroot, text, dimensions, plot_position, plots_per_page, plot_density_cutoff=-1.0, plot_type='HIST2D', every_x_numbered=1, every_y_numbered=1, x_decimal=1, y_decimal=1)¶

Make input files for ASGL.

Parameters :

asglroot: filename prefix for ASGL TOP script and data files.
text: ASGL command lines that are written for each plot.
dimensions: whether to make 1D or 2D plots.
plot_position: position of the plot on the page, in ASGL convention.
plots_per_page: number of plots per page.
plot_density_cutoff: the minimal sum of the bin values that each plot has to have before it is actually written out; otherwise it is ignored. This helps to avoid wasting paper on empty plots when the MDT array data are sparse.
plot_type: select ‘HIST2D’ or ‘PLOT2D’ when dimensions = 2.
every_x_numbered: spacing for labels on the X axis.
every_y_numbered: spacing for labels on the Y axis.
x_decimal: the number of decimal places used to write X feature values.
y_decimal: the number of decimal places used to write Y feature values.

write_hdf5(file)¶: Write an MDT in HDF5 format to file.

class mdt.TableSection(mdt, indices)¶

A section of a multi-dimensional table. You should not create TableSection objects directly, but rather by indexing a Table object, as a TableSection is just a ‘view’ into an existing table. For example,

>>> m = mdt.Table(mlib, features=(residue_type, xray_resolution))
>>> print m[0].entropy()

would create a section (using m[0]) which is a 1D table over the 2nd feature (X-ray resolution) for the first bin (0) of the first feature (residue type), and then get the entropy using the TableSection.entropy() method.

entropy()¶: Entropy of all points in the table

features¶: Features in this MDT; a list of Feature objects

mean_stdev()¶: Mean and standard deviation of the table

offset¶: Array offsets; see Feature.offset

shape¶: Array shape; the number of bins for each feature

sum()¶: Sum of all points in the table

class mdt.Feature(mdt, indx)¶

A single feature in an MDT. Generally accessed as TableSection.features.

bins¶: Feature bins; a list of Bin objects

ifeat¶: Integer type

offset¶: Offset of first bin compared to the MDT library feature (usually 0, but can be changed with Table.reshape())

periodic¶: Whether feature is periodic

class mdt.Bin(feature, indx)¶

A single bin in a feature. Generally accessed as Feature.bins.

range¶: Bin range; usually the minimum and maximum floating-point values for the feature to be placed in this bin.

symbol¶: Bin symbol

class mdt.Source(mdt, mlib, aln, distngh, surftyp, accessibility_type, sympairs, symtriples, io, edat)¶

A source of data for an MDT (generally a Modeller alignment, opened with Table.open_alignment()).

index(feat, is1, ip1, is2, ir1, ir2, ir1p, ir2p, ia1, ia1p, ip2, ibnd1, ibnd1p, is3, ir3, ir3p)¶

Return the bin index (starting at 1) of a single MDT feature. (Arguments ending in 2 and 3 are used for features involving pairs or triples of proteins.)

Warning

This is a low-level interface, and no bounds checking is performed on these parameters. Avoid this function if possible.

Parameters :

feat: MDT feature object from mdt.features module.
is1: index of the sequence within the alignment.
ip1: position within the sequence (i.e. including gaps).
ir1: residue index (i.e. not including alignment gaps).
ir1p: second residue index for residue-residue features.
ia1: atom index.
ia1p: second atom index for atom-atom features.
ibnd1: bond or tuple index.
ibnd1p: second bond/tuple index for bond-bond or tuple-tuple features.

sum(residue_span_range=(-99999, -2, 2, 99999), chain_span_range=(-99999, 0, 0, 99999), exclude_bonds=False, exclude_angles=False, exclude_dihedrals=False)¶: Scan all data points in the source, and return the sum. See Table.add_alignment() for a description of the residue_span_range, chain_span_range and exclude_* arguments.

Library information¶

mdt.version¶: The full MDT version number, as a string, e.g. ‘5.0’ or ‘SVN’.

mdt.version_info¶: For release builds, the major and minor version numbers as a tuple of integers - e.g. (5, 0). For SVN builds, this is the same as ‘version’.

Utility functions¶

mdt.uniform_bins(num, start, width)¶: Make a list of num equally-sized bins, each of which has the given width, and starting at start. This is suitable for input to any of the classes in mdt.features which need a list of bins.

mdt.write_bondlib(fh, mdt, density_cutoff=None, entropy_cutoff=None)¶

Write out a Modeller bond library file from an MDT. The input MDT should be a 2D table (usually of bond type and bond distance). For each bond type, the 1D MDT section (see TableSection) of bond distance is examined, and its mean and standard deviation used to generate a Modeller harmonic restraint.

Parameters :	fh: Python file to write to mdt: input MDT `Table` object density_cutoff: if specified, MDT bond distance sections with sums below this value are not used entropy_cutoff: if specified, MDT bond distance sections with entropies above this value are not used

mdt.write_anglelib(fh, mdt, density_cutoff=None, entropy_cutoff=None)¶: Write out a Modeller angle library file from an MDT. See write_bondlib() for more details. The MDT should be a 2D table, usually of angle type and bond angle.

mdt.write_improperlib(fh, mdt, density_cutoff=None, entropy_cutoff=None)¶: Write out a Modeller dihedral angle library file from an MDT. See write_bondlib() for more details. The MDT should be a 2D table, usually of dihedral type and bond dihedral angle.

mdt.write_splinelib(fh, mdt, dihtype, density_cutoff=None, entropy_cutoff=None)¶: Write out a Modeller 1D spline library file from an MDT. The MDT should be a 2D table, usually of residue type and a chi dihedral angle. dihtype should identify the dihedral type (i.e. chi1/chi2/chi3/chi4). The operation is similar to write_bondlib(), but each MDT section is treated as the spline values. No special processing is done, so it is expected that the user has first done any necessary transformations (e.g. normalization with Table.normalize() to convert raw counts into a PDF, negative log transform with Table.log_transform() and Table.linear_transform() to convert a PDF into a statistical potential).

mdt.write_2dsplinelib(fh, mdt, density_cutoff=None, entropy_cutoff=None)¶: Write out a Modeller 2D spline library file from an MDT. See write_splinelib() for more details. The input MDT should be a 3D table, e.g. of residue type, phi angle, and psi angle.

Bin storage types¶

mdt.Float¶
mdt.Double¶
mdt.Int32¶
mdt.UnsignedInt32¶
mdt.Int16¶
mdt.UnsignedInt16¶
mdt.Int8¶
mdt.UnsignedInt8¶: See Storage for bin data.

Exceptions¶

exception mdt.MDTError¶: A generic MDT error.

exception mdt.FileFormatError¶: A file is of the wrong format.

The `mdt` Python module¶

Setup of the MDT system¶

Creation and manipulation of data tables¶

Library information¶

Utility functions¶

Bin storage types¶

Exceptions¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

The mdt Python module¶

Setup of the MDT system¶

Creation and manipulation of data tables¶

Library information¶

Utility functions¶

Bin storage types¶

Exceptions¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation

The `mdt` Python module¶