IMP  2.3.1
The Integrative Modeling Platform
introduction.md
Introduction {#introduction}
============

[TOC]

# Integrative modeling # {#introduction_intmod}

Detailed structural characterization of macromolecular assemblies is usually more difficult than that of single proteins because assemblies often don’t crystallize or are too large for NMR spectroscopy. This challenge can be addressed by an “integrative” or “hybrid” approach that simultaneously considers all available information about a given assembly. The integrative approach has several advantages. First, synergy among the input data minimizes the drawbacks of sparse, noisy, ambiguous and incoherent datasets. Each individual piece of data contains little structural information, but by simultaneously fitting a model to all data derived from independent experiments, the degeneracy of the structures that fit the data can be markedly reduced. Second, this approach has the potential to produce all structures that are consistent with the data, not only one structure. Third, an analysis of the structures allows us to estimate the precision of both the data and the structures. Last, this approach can make the process of structure determination more efficient, by indicating which measurements would be most informative.

## Example modeling efforts ## {#introduction_efforts}
Hybrid structures based on our integrative approach:
 - The E. coli ribosome, the first eukaryotic ribosome from S. cerevisiae
 - The first mammalian ribosome from C. lupus and a fungal ribosome
 - The E. coli Hsp90
 - The eukaryotic chaperonin TRiC/CCT
 - The actin/scruin complex
 - The RyR1 voltage-gated channel
 - The baker’s yeast [nuclear pore complex](http://salilab.org/npc) (NPC)
 - The [Nup84 complex](http://salilab.org/nup84/)
 - Transport through the NPC
 - Microtubule nucleation
 - The 26S proteasome
 - [PCSK9-Fab complex](../tutorial/idock_pcsk9.html)
 - The yeast spindle pole body
 - Chromatin globin domain
 - The lymphoblastoid cell genome

## The four stage process ## {#introduction_four_stages}

Integrative structure determination is an iterative process consisting of four stages:
1. gathering of data;
2. design of the model representation and encoding of the data as a scoring function. The scoring function consists of terms, called restraints, one for each data point;
3. sampling to find good-scoring conformations of the model;
4. analysis of the data and the resulting model conformations, including the uncertainty in the solutions.

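The four stages can be sketched on a deliberately tiny problem. The following plain-Python toy does not use IMP; the measurements, the harmonic form of each restraint, the grid search and the score threshold are all assumptions chosen only to make the stages concrete.

```python
# Toy walk-through of the four stages on a one-dimensional "model".
# Nothing here is IMP code; every name and number is hypothetical.

# Stage 1: gather data (three made-up measurements of one position).
measurements = [4.0, 5.0, 6.0]

# Stage 2: represent the model as a single coordinate x and encode each
# data point as one restraint; the scoring function is their sum.
def make_restraint(target):
    return lambda x: 0.5 * (x - target) ** 2

restraints = [make_restraint(t) for t in measurements]

def scoring_function(x):
    return sum(r(x) for r in restraints)

# Stage 3: sample candidate configurations (a plain grid search here).
candidates = [i * 0.5 for i in range(21)]   # 0.0, 0.5, ..., 10.0
scored = sorted((scoring_function(x), x) for x in candidates)

# Stage 4: analyze the good-scoring solutions and their spread.
best_score, best_x = scored[0]
good = sorted(x for s, x in scored if s <= best_score + 0.5)
precision = max(good) - min(good)
```

Here `precision` is the width of the range of candidates that score nearly as well as the best one, which is one simple way to express the uncertainty of the solution. If that range is too wide, the process returns to stage 1 to gather more data.
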
# IMP # {#introduction_imp}

IMP provides tools to implement the computational parts of the integrative modeling iterative process, steps 2-4. This computation can be driven from Python scripts or C++ programs. The examples below will use Python scripts.

## Representation: IMP::Model ## {#introduction_representation}

In IMP, the model is represented as a collection of data, called particles, each of which has associated attributes (e.g. an atom with associated coordinates, mass, radius etc). In IMP, the attributes can be numbers, strings, or lists of other particles, among other things. Each particle is identified by an index (IMP::kernel::ParticleIndex) and has an associated name, which makes it easier to recognize. Finally, attributes are identified by keys (e.g. IMP::kernel::StringKey for string attributes). The key identifies one type of data that may be contained in many particles.

At the most basic level, to create particles and manipulate attributes you can do

    import IMP.kernel
    model = IMP.kernel.Model()
    particle_0 = model.add_particle("my first particle")
    # the same key can be used to attach string data to many particles
    string_key = IMP.kernel.StringKey("my first data")
    model.add_attribute(string_key, particle_0, "Hi, particle 0")

    particle_1 = model.add_particle("my second particle")
    model.add_attribute(string_key, particle_1, "Hi, particle 1")

    print(model.get_attribute(string_key, particle_0))

Some of the attributes can be marked as parameters of the model. These are the attributes that you want to sample or optimize. To do so you can do

    # float_key is an IMP.kernel.FloatKey for an attribute that particle_0 already has
    model.set_is_optimized(float_key, particle_0, True)

\note A lot of IMP uses IMP::Particle objects instead of IMP::kernel::ParticleIndex objects to identify particles. They should be treated as roughly the same. To map from an index to a particle you use IMP::kernel::Model::get_particle() and to go the other way IMP::kernel::Particle::get_index(). Using the indexes is preferred. When doing it on lists, you can use IMP::kernel::get_indexes() and IMP::kernel::get_particles().

## Decorators ## {#introduction_decorators}

Accessing all your data at such a low level can get tiresome, so we provide decorators to make it easier. Each type of decorator provides an interface to manipulate a particular type of data. For example, an IMP.atom.Residue decorator provides access to residue-associated information (e.g. the index of the residue, or its type) in particles that have it.

    residue = IMP.atom.Residue(model, my_residue)
    print(residue.get_residue_type())

Decorators provide a standard interface to add their data to a particle, to decorate a particle that already has the needed data, or to check whether a particle is appropriate to have the decorator used with it.

    # add coordinates to a particle
    decorated_particle = IMP.core.XYZ.setup_particle(model, my_particle, IMP.algebra.Vector3D(1,2,3))
    print(decorated_particle.get_coordinates())

    # my_other_particle has already been set up, so we can just decorate it directly
    another_decorated_particle = IMP.core.XYZ(model, my_other_particle)
    print(another_decorated_particle.get_coordinates())

    # we can change the coordinates too
    another_decorated_particle.set_coordinates(IMP.algebra.Vector3D(5,4,3))

Decorators can also be used to create relationships between particles. For example, rigid bodies are implemented using the IMP::core::RigidBody decorator. Each rigid body has a collection of other particles that move along with it, the IMP::core::RigidMember particles.

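To make the decorator idea concrete, here is a minimal plain-Python sketch of the pattern. It does not use IMP and is not how IMP implements decorators; `ToyModel`, `ToyXYZ` and the `"coordinates"` key are hypothetical names, and the method names merely mirror the set up, decorate and check roles described above.

```python
# Plain-Python sketch of the decorator pattern; not IMP code.
class ToyModel(object):
    """Stores attributes per key: {key: {particle_index: value}}."""
    def __init__(self):
        self.attributes = {}
        self.names = []

    def add_particle(self, name):
        self.names.append(name)
        return len(self.names) - 1   # the particle index


class ToyXYZ(object):
    """Typed view of the coordinate attribute of one particle."""
    KEY = "coordinates"

    def __init__(self, model, index):
        # Decorating only works if the data is already there.
        if not ToyXYZ.get_is_setup(model, index):
            raise ValueError("particle %d has no coordinates" % index)
        self.model, self.index = model, index

    @staticmethod
    def get_is_setup(model, index):
        return index in model.attributes.get(ToyXYZ.KEY, {})

    @staticmethod
    def setup_particle(model, index, xyz):
        model.attributes.setdefault(ToyXYZ.KEY, {})[index] = tuple(xyz)
        return ToyXYZ(model, index)

    def get_coordinates(self):
        return self.model.attributes[ToyXYZ.KEY][self.index]

    def set_coordinates(self, xyz):
        self.model.attributes[ToyXYZ.KEY][self.index] = tuple(xyz)


model = ToyModel()
p = model.add_particle("p0")
d = ToyXYZ.setup_particle(model, p, (1, 2, 3))
ToyXYZ(model, p).set_coordinates((5, 4, 3))   # a second view of the same data
```

The decorator object holds no data of its own. It only knows the model and the particle index, so any number of decorators can view the same particle and all of them see the same values.
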
## Representing biological molecules ## {#introduction_biomolecules}

Biological molecules are represented hierarchically in IMP using the IMP::atom::Hierarchy. These hierarchies follow the natural hierarchical nature of most biomolecules. A protein from a PDB would be a hierarchy with a root for the whole PDB file with a child per chain. Each chain particle has a child for each residue in the chain, and each residue has a child for each atom. Each particle has various types of associated data. For example, an atom has data using the IMP::atom::Atom, IMP::core::XYZR, IMP::atom::Mass and IMP::atom::Hierarchy decorators.

The structures represented do not have to be atomic and can be multi-resolution - that is, they can have coordinates at any level of the hierarchy. The invariants are that the leaves must have coordinates, radii and mass. Pieces of the hierarchy can be picked out using the IMP::atom::Selection using the standard sorts of biological criteria:

    # Select residues 10 through 49.
    my_residues = IMP.atom.Selection(my_pdb, residue_indexes=range(10,50)).get_particles()


## Containers ## {#introduction_containers}

You can manipulate and maintain collections of particles using IMP::Container. A collection can be anything from a list of particles gathered manually, to all pairs of particles from some list that are closer than a certain distance to one another. For example, to maintain a list of all close pairs of particles you can do

    # all particle pairs closer than 3A
    # it is always good to give things names; that is what the last argument does
    close_pairs = IMP.container.ClosePairContainer(all_my_particles, 3, "My close pairs")

These containers can then be used to create scoring functions or analyze the data.

## Constraints and Invariants ## {#introduction_constraints}

Many things such as rigid bodies and lists of all close pairs depend on maintaining some property as the model changes. These properties are maintained by IMP::kernel::Constraint objects. Since the invariants may depend on things that are reasonably expensive to compute, these invariants are updated only when requested. This means that if you change the coordinates of some particles, the contents of the close pairs list might be incorrect until it is updated. The required update can be triggered implicitly, for example when some scoring function needs it, or explicitly, when IMP::kernel::Model::update() is called.

Behind the scenes, IMP maintains an IMP::kernel::DependencyGraph that tracks how information flows between the particles and the containers, based on the constraints. It is used to detect, for example, that a particular particle is part of a rigid body, and so if its coordinates are needed for scoring, the rigid body must be brought up to date and the appropriate constraint must be asked to update the member particle's coordinates. In order to be able to track this information, relevant objects (IMP::kernel::ModelObject) have methods IMP::kernel::ModelObject::get_inputs() and IMP::kernel::ModelObject::get_outputs() that return the things that are read and written, respectively.

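The consequence of updating invariants only on request can be seen in a small plain-Python toy. This is a sketch of the behavior described above, not of IMP's implementation; `LazyClosePairs` and its members are hypothetical names.

```python
import itertools
import math


class LazyClosePairs(object):
    """Keeps a list of close pairs that is recomputed only on request."""
    def __init__(self, coordinates, cutoff):
        self.coordinates = coordinates   # {particle_name: (x, y, z)}
        self.cutoff = cutoff
        self.pairs = []

    def _distance(self, a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in
                             zip(self.coordinates[a], self.coordinates[b])))

    def update(self):
        # The (potentially expensive) invariant is restored only here.
        self.pairs = [
            (a, b)
            for a, b in itertools.combinations(sorted(self.coordinates), 2)
            if self._distance(a, b) < self.cutoff
        ]


coordinates = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (10, 0, 0)}
close_pairs = LazyClosePairs(coordinates, cutoff=3.0)
close_pairs.update()

coordinates["c"] = (2, 0, 0)            # move a particle...
stale_pairs = list(close_pairs.pairs)   # ...the list does not know about it yet
close_pairs.update()                    # explicit update
fresh_pairs = list(close_pairs.pairs)
```

After the move, `stale_pairs` still holds only the pair that was close before, while `fresh_pairs` contains all three pairs. In IMP the equivalent refresh happens when a scoring function needs the data or when the model is explicitly updated.
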
## Scoring ## {#introduction_scoring}

One then needs to be able to evaluate how well the current configuration of the model fits the data used for modeling. In addition to scores, derivatives of the total score with respect to each parameter can be computed when requested.

### Restraints ### {#introduction_restraints}

An IMP::kernel::Restraint computes a score on some set of particles. For example, a restraint can be used to penalize configurations of the model that have collisions

    # penalize collisions with a spring constant of 10 kcal/mol/A^2
    soft_sphere_pair_score = IMP.core.SoftSpherePairScore(10)
    my_excluded_volume_restraint = IMP.container.PairsRestraint(soft_sphere_pair_score,
                                                                close_pairs,
                                                                "excluded volume")

To get the score of an individual restraint, you can use its IMP::kernel::Restraint::get_score() method.

### Scoring functions ### {#introduction_scoring_functions}

Scoring in IMP is done by creating an IMP::kernel::ScoringFunction. A scoring function consists of the sum of terms, each called a Restraint. You can create many different scoring functions for different purposes and each restraint can be part of multiple scoring functions.

    my_scoring_function = IMP.core.RestraintsScoringFunction([my_excluded_volume_restraint],
                                                             "score excluded volume")

\note You will see old example code that, instead of creating an IMP::kernel::ScoringFunction, adds the restraints to the model. This creates an implicit scoring function consisting of all the restraints so added, but it should not be done in new code.

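The relationship between restraints and scoring functions can be illustrated with a self-contained plain-Python toy. It does not use IMP; the class names are hypothetical, and the harmonic form 0.5 * k * x^2 used for both terms is an assumption made for the illustration.

```python
import math


def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


class ToyExcludedVolume(object):
    """Toy restraint: harmonic penalty on the overlap of two equal spheres."""
    def __init__(self, i, j, radius, k):
        self.i, self.j, self.radius, self.k = i, j, radius, k

    def get_score(self, xyz):
        overlap = 2 * self.radius - distance(xyz[self.i], xyz[self.j])
        return 0.5 * self.k * overlap ** 2 if overlap > 0 else 0.0


class ToyDistanceRestraint(object):
    """Toy restraint: harmonic penalty on deviation from a target distance."""
    def __init__(self, i, j, target, k):
        self.i, self.j, self.target, self.k = i, j, target, k

    def get_score(self, xyz):
        deviation = distance(xyz[self.i], xyz[self.j]) - self.target
        return 0.5 * self.k * deviation ** 2


class ToyScoringFunction(object):
    """The score of a configuration is the sum over the restraints."""
    def __init__(self, restraints):
        self.restraints = list(restraints)

    def evaluate(self, xyz):
        return sum(r.get_score(xyz) for r in self.restraints)


xyz = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0)}
excluded_volume = ToyExcludedVolume(0, 1, radius=2.0, k=10.0)
distance_restraint = ToyDistanceRestraint(0, 1, target=5.0, k=1.0)

# The same restraint object takes part in two scoring functions.
clash_only = ToyScoringFunction([excluded_volume])
full = ToyScoringFunction([excluded_volume, distance_restraint])
clash_score = clash_only.evaluate(xyz)
full_score = full.evaluate(xyz)
```

With the spheres overlapping by 1 and the distance 2 short of its target, `clash_score` is 5.0 and `full_score` is 7.0, the sum of the two terms.
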
## Sampling ## {#introduction_sampling}

It is now time to find configurations of the model that score well with regard to the scoring function you developed. IMP provides a number of tools for that.

### Optimizers ### {#introduction_optimizers}

An IMP::kernel::Optimizer takes the current configuration of the model and perturbs it, typically trying to make it better (but perhaps just moving it into a different configuration following some rule, such as molecular dynamics). It uses a scoring function you provide to guide the process.

    my_optimizer = IMP.core.ConjugateGradients(model)
    my_optimizer.set_scoring_function(my_scoring_function)
    my_optimizer.optimize(1000)

\note In old code, the scoring function may not be explicitly set on the optimizer. The optimizer then uses the implicit scoring function in the IMP::Model. This shouldn't be done in new code as it is a bit error prone and may become an error at some point.

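As a rough picture of what a derivative-based optimizer does with a scoring function, the following plain-Python sketch runs fixed-step steepest descent on a one-parameter harmonic score. It is a much simpler technique than conjugate gradients and is not IMP code; the function names, target, spring constant and step size are all assumptions.

```python
def score_and_derivative(x, target=3.0, k=2.0):
    """Harmonic score 0.5*k*(x-target)**2 and its derivative k*(x-target)."""
    return 0.5 * k * (x - target) ** 2, k * (x - target)


def steepest_descent(x, step_size, max_steps):
    """Repeatedly move the parameter against the derivative of the score."""
    for _ in range(max_steps):
        _, derivative = score_and_derivative(x)
        x = x - step_size * derivative
    return x


x_final = steepest_descent(x=0.0, step_size=0.1, max_steps=1000)
final_score, _ = score_and_derivative(x_final)
```

Each step shrinks the distance to the minimum by a constant factor, so `x_final` ends up at the target value to within floating-point precision.
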
### Samplers ### {#introduction_samplers}

An IMP::kernel::Sampler produces a set of configurations of the model using some sampling scheme.

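One common sampling scheme is Metropolis Monte Carlo. The plain-Python sketch below shows the idea on a one-parameter harmonic score; it is not IMP code, and the function names, step size, temperature and seed are assumptions chosen for the illustration.

```python
import math
import random


def score(x):
    return 0.5 * (x - 3.0) ** 2


def metropolis_sample(score, x0, n_steps, step_size, temperature, seed):
    """Return the configuration visited at every step of a Metropolis walk."""
    rng = random.Random(seed)
    x, current = x0, score(x0)
    configurations = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)
        trial_score = score(trial)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(score increase) / temperature).
        if (trial_score <= current or
                rng.random() < math.exp((current - trial_score) / temperature)):
            x, current = trial, trial_score
        configurations.append(x)
    return configurations


configurations = metropolis_sample(score, x0=0.0, n_steps=5000,
                                   step_size=1.0, temperature=1.0, seed=0)
mean_x = sum(configurations) / len(configurations)
```

Unlike an optimizer, which returns one improved configuration, the sampler returns the whole set of visited configurations, which can then be analyzed for their spread around the best-scoring region.
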
## Storing and analysis ## {#introduction_analsysis}

Configurations of the model can be saved and visualized in a variety of ways. Atomic structures can be written as PDB files using IMP::atom::write_pdb(). More flexibly, coarse-grained models, geometry and information about the scoring function can be written to RMF files.

    import RMF
    import IMP.rmf

    my_rmf = RMF.create_rmf_file("my.rmf")
    IMP.rmf.add_hierarchy(my_rmf, my_hierarchy)
    IMP.rmf.add_restraint(my_rmf, my_excluded_volume_restraint)
    IMP.rmf.save_frame(my_rmf, 0)


## Modular structure of IMP ## {#introduction_modular}

Functionality in \imp is grouped into modules, each with its own
namespace (in C++) or package (in Python). For %example, the functionality
in IMP::core is accessed as

    IMP::core::XYZ(p)

in C++ and

    IMP.core.XYZ(p)

in Python.

A module contains classes,
methods and data which are related and controlled by a set of authors. The names
of the authors, the license for the module, its version and an overview of the
module can be found on the module main page (e.g. IMP::example).
See the "Namespaces" tab above for a complete list of modules in this version of \imp.

## Understanding what is going on ## {#introduction_understanding}

IMP provides two sorts of tools to help you understand what is going on when you write a script: logging and runtime checks. Both are disabled if you use a fast build, so make sure you have access to a non-fast build.

### Logging ### {#introduction_logging}

Many operations in IMP can print out log messages as they work, allowing one to see what is being done. The amount of logging can be controlled globally by using IMP::base::set_log_level() or for individual objects by calling, for example, `model.set_log_level(IMP.base.VERBOSE)`.

### Runtime checks ### {#introduction_checks}

IMP implements lots of runtime checks to make sure both that it is used properly and that it is working correctly. These can be turned on and off globally using IMP::base::set_check_level() or for individual objects.

## Conventions ## {#introduction_conventions}

IMP tries to make things simpler to use by adhering to various naming and interface conventions.

### Units ### {#introduction_units}
Unless documented otherwise, the following units are used:
- angstrom for all distances
- \f$ \frac{kcal}{mol \unicode[serif]{xC5}}\f$ for forces/derivatives
- \f$\frac{kcal}{mol}\f$ for energies
- radians for angles. All angles are counterclockwise.
- all charges are in units of the elementary charge
- all times are in femtoseconds

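Input data are often reported in other units, so a small conversion step before passing values to IMP can help avoid mistakes. The helper below is a plain-Python sketch with hypothetical names; it assumes inputs in nanometers, picoseconds, kJ/mol and degrees, and uses the thermochemical calorie (1 kcal = 4.184 kJ).

```python
import math

ANGSTROM_PER_NM = 10.0
FS_PER_PS = 1000.0
KCAL_PER_KJ = 1.0 / 4.184   # thermochemical calorie


def to_imp_units(distance_nm, time_ps, energy_kj_per_mol, angle_degrees):
    """Convert common input units to the units listed above."""
    return {
        "distance_angstrom": distance_nm * ANGSTROM_PER_NM,
        "time_fs": time_ps * FS_PER_PS,
        "energy_kcal_per_mol": energy_kj_per_mol * KCAL_PER_KJ,
        "angle_radians": math.radians(angle_degrees),
    }


converted = to_imp_units(distance_nm=1.5, time_ps=2.0,
                         energy_kj_per_mol=4.184, angle_degrees=180.0)
```
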
### Names ### {#introduction_names}

- Names in `CamelCase` are class names, for %example IMP::RestraintSet
- Lower case names with words separated by underscores (`_`) are functions or methods, for example IMP::Model::update() or IMP::Model::add_particle().
- Collections of data of a certain class, e.g. `ClassName`, are passed using type `ClassNames`. This type is a `list` in Python and an IMP::base::Vector<ClassName> (which is roughly equivalent to std::vector<ClassName*>) in C++.
- Function and method names start with a verb, which indicates what the method does. Methods starting with
   - `set_` change some stored value
   - `get_` create or return a \c value object or
     return an existing IMP::base::Object class object
   - `create_` create a new IMP::base::Object class object
   - `add_`, `remove_` or `clear_` manipulate the contents of a collection of data
   - `show_` print things in a human-readable format
   - `load_` and `save_` or `read_` and `write_` move data between files and memory
   - `link_` create a connection between something and an IMP::base::Object
   - `update_` change the internal state of an IMP::base::Object
   - `do_` is a virtual method as part of a \external{http://en.wikipedia.org/wiki/Non-virtual_interface_pattern,non-virtual interface pattern}
   - `handle_` take action when an event occurs
   - `validate_` check the state of data and print messages and throw exceptions if something is corrupted
   - `setup_` and `teardown_` create or destroy some type of invariant (e.g. the constraints for a rigid body)
   - `apply_` either apply a passed object to each piece of data in some collection or apply the object itself to a particular piece of passed data (this is a bit ambiguous)
- names starting with `IMP_` are preprocessor symbols (C++ only)
- names don't use abbreviations

### Graphs ### {#introduction_graphs}

Graphs in IMP are represented in C++ using the \external{http://www.boost.org/doc/libs/release/libs/graph, Boost Graph Library}. All graphs used in IMP are \external{http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/VertexAndEdgeListGraph.html, VertexAndEdgeListGraphs}, have vertex_name properties,
and are \external{http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/BidirectionalGraph.html, BidirectionalGraphs} if they are directed.

The Boost.Graph interface cannot be easily exported to Python so we instead provide a simple wrapper IMP::PythonDirectedGraph. There are methods to translate the graphs into various common Python and other formats (e.g. graphviz).


### Values and Objects (C++ only) ### {#introduction_values}

As is conventional in C++, IMP classes are divided into broad, exclusive types:
- *Object classes*: They inherit from IMP::base::Object and are always passed by pointer. They are reference counted and so should only be stored using IMP::base::Pointer in C++ (in Python everything is reference counted). Never allocate these on the stack as very bad things can happen. Objects cannot be duplicated. Equality on objects is defined as identity (i.e. two different objects are different even if the data they contain is identical).

- *Value classes* which are normal data types. They are passed by value (or `const&`), never by pointer. Equality is defined based on the data stored in the value. Most value types in IMP are always valid, but a few, mostly geometric types (IMP::algebra::Vector3D) are designed for fast, low-level use and are left in an uninitialized state by their default constructor.

- *RAII classes* control some particular resource using the [RAII idiom](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization).
They grab control of a resource when created and then free it when they are destroyed. As a result, they cannot be copied. Non-IMP examples include things like files in Python, which are automatically closed when the file object is deleted.

All types in IMP, with a few documented exceptions, can be
- compared to other objects of the same type
- output to a C++ stream or printed in Python
- meaningfully put into Python dictionaries or C++ hash maps

### Backwards compatibility and deprecation ### {#introduction_backwards}

IMP tries to maintain backwards compatibility; however, this is not always feasible. Our general
policy is that functionality that is deprecated in one release (e.g. 2.1) is removed in the next one (2.2).
Deprecated functionality should produce warnings when used (e.g. compile time messages for deprecated
macros and runtime messages for deprecated functions called from Python). In addition, bugs discovered
in deprecated functionality are not fixed.

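For scripts that build on IMP, the same policy can be followed with Python's standard `warnings` module. The sketch below shows one generic way to emit a runtime message from a deprecated function; it is not IMP's own mechanism, and `deprecated`, `get_mass_sum` and `get_total_mass` are hypothetical names.

```python
import functools
import warnings


def deprecated(version, replacement):
    """Mark a function as deprecated since `version`."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s is deprecated since %s; use %s instead"
                % (function.__name__, version, replacement),
                DeprecationWarning, stacklevel=2)
            return function(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("2.1", "get_total_mass")
def get_mass_sum(masses):
    return sum(masses)


# Record the warning instead of printing it, to show that it is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = get_mass_sum([1.0, 2.0])
```

The deprecated function still returns its result, so existing scripts keep working for one release while the warning tells their authors what to change.
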
# Where to go next # {#introduction_next}

Probably the best thing to do next is to read the [kernel/nup84.py](kernel_2nup84_8py-example.html) example.