MDT, a module for protein structure analysis.
Copyright 1989-2015 Andrej Sali.
MDT is free software: you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with MDT. If not, see <http://www.gnu.org/licenses/>.
Library data used in the construction and use of MDTs.
Parameters: |
|
---|
Angle classes; see BondClasses
Atom classes; see BondClasses
Bond classes; see BondClasses
Dihedral classes; see BondClasses
Hydrogen bond atom classes; see HydrogenBondClasses
Atom tuple classes; see TupleClasses and Tuple features
Classifications of tuples of atoms into classes. Usually accessed as Library.tuple_classes. These classes are used by tuple or tuple pair features.
Read atom tuple information from filename. This is a text file with a format similar to that accepted by BondClasses.read(). The file can consist either of sets of atom triplets (named with TRPGRP lines and containing triples of atoms named on TRIPLET lines) or sets of atom doublets using DBLGRP and DOUBLET lines. Each atom but the first in each doublet or triplet can also be restricted to match only in certain residue types by naming the residue in parentheses before the rest of the atom name (and CHARMM-style + or - qualifier). For example, a suitable atom triplet file looks like:
TRPGRP 't1'
TRIPLET 'ALA' 'CA' '+C' '-C'
TRPGRP 't2'
TRIPLET 'ALA' 'CA' '(CYS)+C' '-C'
The first triplet is named ‘t1’ and will match any set of three atoms where the first is called CA in an ALA residue, and the other two atoms are C atoms in the previous and next residue. The second triplet is similar but will only include triplets where the next residue is a CYS.
Classifications of atoms/bonds/angles/dihedrals into classes. These classes are used by atom and chemical bond features. Usually accessed as Library.atom_classes, Library.bond_classes, Library.angle_classes, or Library.dihedral_classes. (There is no need to create your own BondClasses objects.)
Read class information from filename. This is a text file with a simple format. Each line either denotes the start of a new named class, or names a member of the last-named class, as a residue name followed by one or more atom names. For example, an atom class file might start with:
ATMGRP 'AC'
ATOM 'ALA' 'CA'
ATOM 'ALA' 'C'
ATOM '*' 'CB'
Thus, the first atom class is called ‘AC’ and any CA or C atom in an ALA residue, or the CB atom in any residue, will be placed in this class.
Bond class files are similar but use BNDGRP and BOND lines, each of which names two atoms:
BNDGRP 'ALA:C:+N'
BOND 'ALA' 'C' '+N'
Note that CHARMM-style + or - prefixes can be added to atom names for all but the first atom on a BOND line, to indicate the atom must be found in the next or previous residue.
Angle class files use ANGGRP and ANGLE lines; each ANGLE line names three atoms. Dihedral class files use DIHGRP and DIHEDRAL lines; each DIHEDRAL line names four atoms.
Classifications of atoms into hydrogen bond classes. Usually accessed as Library.hbond_classes. These classes are used by the features.HydrogenBondAcceptor, features.HydrogenBondDonor and features.HydrogenBondSatisfaction features.
Read hydrogen bond atom class information from a file
A multi-dimensional table.
Parameters: |
|
---|
Individual elements from the table can be accessed in standard Python fashion, e.g.
>>> import mdt
>>> import modeller
>>> env = modeller.environ()
>>> mlib = mdt.Library(env)
>>> restyp1 = mdt.features.ResidueType(mlib, protein=0)
>>> restyp2 = mdt.features.ResidueType(mlib, protein=1)
>>> gap = mdt.features.GapDistance(mlib, mdt.uniform_bins(10, 0, 1))
>>> m = mdt.Table(mlib, features=(restyp1,restyp2,gap))
>>> print m[0,0,0]
You can also access an element as m[0][0][0], a 1D section as m[0][0], or a 2D section as m[0]. See TableSection.
Add data from a Modeller alignment to this MDT. This method will first scan through all proteins, pairs of proteins, or triples of proteins in the alignment (it will scan all triples if the mdt.Library contains features defined on all of proteins 0, 1 and 2, pairs if the features are defined on two different proteins, and individual proteins otherwise). Within each protein, it may then scan through all residues, atoms, etc. if the features request it (see the scan types table).
Parameters: |
|
---|
Add data from a Modeller alignment to this MDT. Same as add_alignment except the errors in data are taken into account. The parameter errorscale controls how the error is used:
- 0: the errors are ignored; this function is the same as
add_alignment.
- >0 : the errors are taken into account by propagating the errors
in each axis of each atom into the calculated distances or angles. The errors in the position of individual atoms are first calculated using B-iso, X-ray resolution, and R-factor, and then divided by this errorscale value.
Clear the table (set all bins to zero)
Attempt to ‘close’ the MDT, so that it is useful for creating splines of periodic features.
If dimensions = 1, it makes the two terminal points equal to their average. If dimensions = 2, it applies the averages to both pairs of edges and then again to all four corner points.
Returns: | the closed MDT. |
---|---|
Return type: | Table |
If bin_type is specified, it is the storage type to convert the bin data to (see Storage for bin data).
Returns: | a copy of this MDT table. |
---|---|
Return type: | Table |
Print full entropy information.
The MDT is integrated to get a 1D histogram, then normalized by the sum of the bin values. Finally, entropy is calculated as Σi -pi ln pi
Returns: | the entropy of the last dependent variable. |
---|---|
Return type: | float |
Apply an exponential transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + exp(expoffset + multiplier * a ^ power).
Return type: | Table |
---|
Get a NumPy array ‘view’ of this Table. The array contains all of the raw data in the MDT table, allowing it to be manipulated with NumPy functions. The data are not copied; modifications made to the data by NumPy affect the data in the Table (and vice versa).
Functions that destroy the data in the Table (Table.make(), Table.read() and Table.read_hdf5()) cannot be called if any NumPy array views exist, since they would invalidate the views. The views must first be deleted.
If MDT was not built with NumPy support, a NotImplementedError exception is raised. If NumPy cannot be loaded, an ImportError is raised.
Returns: | a view of this table. |
---|---|
Return type: | NumPy array |
Integrate the MDT, and reorder the features. This is useful for squeezing large MDT arrays into smaller ones, and also for eliminating unwanted features (such as X-ray resolution) in preparation for Table.write().
Parameters: |
|
---|---|
Returns: | the integrated MDT. |
Return type: |
Apply an inverse transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + multiplier / a. Where a is zero, b is assigned to be undefined.
Returns: | the transformed MDT. |
---|---|
Return type: | Table |
Apply a linear transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = offset + a * multiplier.
Returns: | the transformed MDT. |
---|---|
Return type: | Table |
Apply a log transform to the MDT. Each element in the new MDT, b, is obtained from the original MDT element a, using the following relation: b = ln(offset + multiplier * a). Where this would involve the logarithm of a negative number, b is assigned to be undefined.
Returns: | the transformed MDT. |
---|---|
Return type: | Table |
Clear the table, and set the features. features must be a list of previously created objects from the mdt.features module. If given, shape has the same meaning as in Table.reshape() and causes the table to use only a subset of the feature bins.
ValueError is raised if any views of the table exist (see Table.get_array_view()).
Number of protein pairs
Number of proteins
Normalize or scale the MDT. It does not really matter what the contents of the input MDT are; sensible contents include the raw or normalized frequencies.
Parameters: |
|
---|---|
Returns: | the normalized MDT. |
Return type: |
Offset the MDT by the minimum value, either in each 1D section (dimensions = 1) or in each 2D section (dimensions = 2).
Returns: | the transformed MDT. |
---|---|
Return type: | Table |
Open a Modeller alignment to allow MDT indices to be queried (see Source). Arguments are as for Table.add_alignment().
Return type: | Source |
---|
Whether this MDT is a PDF
Read an MDT from file. ValueError is raised if any views of the table exist (see Table.get_array_view()).
Read an MDT in HDF5 format from file. ValueError is raised if any views of the table exist (see Table.get_array_view()).
Reorder the MDT features and optionally decrease their ranges. When an MDT is created, each feature has exactly the bins defined in the Library‘s bin file. However, for each feature, you can change the offset (initial number of bins from the bin file to omit) from the default 0, and the shape (total number of bins).
All parameters should be lists with the same number of elements as the MDT has features.
Parameters: |
|
---|---|
Returns: | the reshaped MDT. |
Return type: |
Number of sample points
Smooth the MDT with a uniform prior. The MDT is treated either as a histogram (if dimensions = 1) or a 2D density (dimensions = 2) of dependent features (the last 1 or 2 features in the table) and a uniform distribution is added followed by scaling:
pi = w1 / n + w2 vi / S
S = Σin vi
w1 = 1 / ( 1 + S / (weight * n))
w2 = 1 - w1
where v is the input MDT array, n is the number of bins in the histogram, and p is the output MDT array, smoothed and normalized. weight is the number of points per bin in the histogram at which the relative weights of the input histogram and the uniform prior are equal.
The sum of the bins in the output MDT array is 1, for each histogram.
Note that the resulting output MDT array is not necessarily a PDF, because the bin widths are not taken into account during scaling. That is, the sum of all bin values multiplied by the bin widths is not 1 if the bin widths are not 1.
Returns: | the smoothed MDT. |
---|---|
Return type: | Table |
Multi-level smoothing. This super-smoothes the raw frequencies in the MDT using the hierarchical smoothing procedure for 1D histograms described in Sali and Blundell, JMB 1993. It was also employed in Sali and Overington, Prot Sci. 1994.
Briefly, the idea is to recursively construct the best possible prior distribution for smoothing 1D data p(x | a, b, c, ...). The best prior is a weighted sum (weights optionally based on entropy) of the best possible estimate of p(x | a, b, ...) integrated over c for each c. Each one of these can itself be obtained from a prior and the data, and so on recursively.
The example above is for a single dependent feature (x), which is the case when dimensions = 1. x should be the last feature in the table. dimensions can be set to other values if you have more dependent features - for example, dimensions = 2 will work with p(x, y | a, b, c, ...) where x and y are the last two features in the table.
Parameters: |
|
---|---|
Returns: | the smoothed MDT. |
Return type: |
True if a symmetric scan can be performed
Write the table to file. If write_preamble is False, it will only write out the contents of the MDT table, without the preamble including the feature list, bins, etc. This is useful for example for creating a file to be read by another program, such as Mathematica.
Make input files for ASGL.
Parameters: |
|
---|
Write an MDT in HDF5 format to file. Certain library information (such as the mapping from feature values to bin indices, and atom or tuple class information) and information about the last scan is also written to the file. (This information will be missing or incomplete if add_alignment() hasn’t first been called.) Note that this information is not read back in by read_hdf5(); it is intended primarily for other programs that want to reproduce the environment in which the MDT was generated as closely as possible.
Parameters: |
|
---|
A section of a multi-dimensional table. You should not create TableSection objects directly, but rather by indexing a Table object, as a TableSection is just a ‘view’ into an existing table. For example,
>>> m = mdt.Table(mlib, features=(residue_type, xray_resolution))
>>> print m[0].entropy()
would create a section (using m[0]) which is a 1D table over the 2nd feature (X-ray resolution) for the first bin (0) of the first feature (residue type), and then get the entropy using the TableSection.entropy() method.
Entropy of all points in the table
Mean and standard deviation of the table
Array offsets; see Feature.offset
Array shape; the number of bins for each feature
Sum of all points in the table
A single feature in an MDT. Generally accessed as TableSection.features.
Integer type
Offset of first bin compared to the MDT library feature (usually 0, but can be changed with Table.reshape())
Whether feature is periodic
A single bin in a feature. Generally accessed as Feature.bins.
Bin range; usually the minimum and maximum floating-point values for the feature to be placed in this bin.
Bin symbol
A source of data for an MDT (generally a Modeller alignment, opened with Table.open_alignment()).
Return the bin index (starting at 1) of a single MDT feature. (Arguments ending in 2 and 3 are used for features involving pairs or triples of proteins.)
Warning
This is a low-level interface, and no bounds checking is performed on these parameters. Avoid this function if possible.
Parameters: |
|
---|
Scan all data points in the source, and return the sum. See Table.add_alignment() for a description of the residue_span_range, chain_span_range and exclude_* arguments.
The full MDT version number, as a string, e.g. ‘5.0’ or ‘SVN’.
For release builds, the major and minor version numbers as a tuple of integers - e.g. (5, 0). For SVN builds, this is the same as ‘version’.
Make a list of num equally-sized bins, each of which has the given width, and starting at start. This is suitable for input to any of the classes in mdt.features which need a list of bins.
Write out a Modeller bond library file from an MDT. The input MDT should be a 2D table (usually of bond type and bond distance). For each bond type, the 1D MDT section (see TableSection) of bond distance is examined, and its mean and standard deviation used to generate a Modeller harmonic restraint.
Parameters: |
|
---|
Write out a Modeller angle library file from an MDT. See write_bondlib() for more details. The MDT should be a 2D table, usually of angle type and bond angle.
Write out a Modeller dihedral angle library file from an MDT. See write_bondlib() for more details. The MDT should be a 2D table, usually of dihedral type and bond dihedral angle.
Write out a Modeller 1D spline library file from an MDT. The MDT should be a 2D table, usually of residue type and a chi dihedral angle. dihtype should identify the dihedral type (i.e. chi1/chi2/chi3/chi4). The operation is similar to write_bondlib(), but each MDT section is treated as the spline values. No special processing is done, so it is expected that the user has first done any necessary transformations (e.g. normalization with Table.normalize() to convert raw counts into a PDF, negative log transform with Table.log_transform() and Table.linear_transform() to convert a PDF into a statistical potential).
Write out a Modeller 2D spline library file from an MDT. See write_splinelib() for more details. The input MDT should be a 3D table, e.g. of residue type, phi angle, and psi angle.
Write out a Modeller statistical potential file (as accepted by group_restraints.append()). The MDT is assumed to be a 3D table of distance against the types of the two atoms. No special processing is done, so it is expected that the user has first done any necessary transformations (e.g. normalization with Table.normalize() to convert raw counts into a PDF, negative log transform with Table.log_transform() and Table.linear_transform() to convert a PDF into a statistical potential).