IMP  2.3.1
The Integrative Modeling Platform
design_example.md
Design example {#designexample}
==============

# Overview #

[TOC]

This page walks through an iterative design process to illustrate
which issues matter and what to think about when choosing how to
implement some functionality.

# Original Description # {#design_original}

Hao wants to add ligand/protein scoring to IMP so that he can
take advantage of the existing infrastructure. The details of the scoring
function are currently experimental. The code does the following:

1. Read in the protein and the small-molecule ligand. The protein is in
 a pdb file and so can be read with IMP::atom::read_pdb. The ligand is in
 a mol2 file, which defines its own set of pdb-compatible atom types.
2. He proposed storing the coordinates and atom types in vectors outside
 of the decorators to speed up scoring.
3. Read in the potential of mean force (PMF) table from a file with
 a custom format. The table has a fixed number of dimensions: the two
 atom types of a pair of atoms and the distance between that
 pair. The values stored in the table will not change during the
 program and need to be looked up quickly given the dimension data.
 The PMF table uses different atom names than the mol2 file.
4. Score a conformation by looping over all ligand-protein atom
 pairs. For each pair, look up the PMF value in the table by the
 two atom types and the distance, then sum all the PMF values.

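Steps 3 and 4 can be sketched without any IMP types. The following is a minimal illustration, assuming integer atom type indices, uniform distance bins, and a score of zero beyond the last bin. `PmfTable`, `SimpleAtom`, and `score_all_pairs` are hypothetical names and are not part of IMP.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense table indexed by (ligand type, protein type, distance bin).
class PmfTable {
 public:
  PmfTable(std::size_t n_types, std::size_t n_bins, double bin_width)
      : n_types_(n_types), n_bins_(n_bins), bin_width_(bin_width),
        values_(n_types * n_types * n_bins, 0.0) {}

  void set(std::size_t a, std::size_t b, std::size_t bin, double value) {
    values_[index(a, b, bin)] = value;
  }

  // Assumption: distances outside the tabulated range contribute nothing.
  double lookup(std::size_t a, std::size_t b, double distance) const {
    if (distance < 0.0 || distance >= n_bins_ * bin_width_) return 0.0;
    const std::size_t bin = static_cast<std::size_t>(distance / bin_width_);
    if (bin >= n_bins_) return 0.0;  // guards against rounding at the edge
    return values_[index(a, b, bin)];
  }

 private:
  std::size_t index(std::size_t a, std::size_t b, std::size_t bin) const {
    return (a * n_types_ + b) * n_bins_ + bin;
  }
  std::size_t n_types_, n_bins_;
  double bin_width_;
  std::vector<double> values_;
};

// Hypothetical flat atom record: a type index plus Cartesian coordinates.
struct SimpleAtom {
  std::size_t type;
  double x, y, z;
};

// Sums the table value over every ligand-protein atom pair (step 4).
double score_all_pairs(const PmfTable& table,
                       const std::vector<SimpleAtom>& ligand,
                       const std::vector<SimpleAtom>& protein) {
  double total = 0.0;
  for (const SimpleAtom& l : ligand) {
    for (const SimpleAtom& p : protein) {
      const double dx = l.x - p.x, dy = l.y - p.y, dz = l.z - p.z;
      total += table.lookup(l.type, p.type, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
  }
  return total;
}
```

A flat array keeps the lookup to one multiplication chain and one memory access, which is the kind of speed the original description asks for. The table here is ordered (ligand type first); whether the real PMF data is symmetric is not stated in the description.
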
## Comments on the original description ## {#design_original_comments}

1. mol2 is a standard file format so it makes sense to have a reader
 for it in IMP. We can adopt the mol2 atom names as the standard names
 for ligand atoms in IMP.
2. How the coordinates are stored and accessed is an implementation
 detail; worrying about it should probably be delayed until the other
 considerations are figured out.
3. Loading the PMF table is a natural operation for an initialization
 function. However, since the PMF table is not a standard file format,
 it doesn't make sense for it to go into IMP, at least not until a file
 format for the protein-ligand scoring has been worked out. Also, there is
 little reason to keep the PMF table atom types around, and they probably
 should be converted to more standard atom types on load. Finally, since
 the data in the PMF file is directly the scoring data, there isn't a
 real need to have a special representation for it in memory.
4. There are two different considerations here: which pairs of atoms to
 use and how to score each pair.


# Design Proposal for Reading # {#design_reading}

Since the mol2 reader is quite separate from the scoring, we will consider
it on its own first. In analogy to the pdb reader, it makes sense to
provide a function `read_mol2(std::istream &in, Model *m)` which returns
an IMP::atom::Hierarchy.

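To make the reader's job concrete, here is a minimal sketch of parsing the atom records of a mol2 stream. It assumes the standard Tripos layout in which each line of the `@<TRIPOS>ATOM` section starts with atom id, atom name, x, y, z, and atom type. `Mol2Atom` and `read_mol2_atoms` are hypothetical names; the sketch returns plain records rather than building an IMP hierarchy in a model.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain record for one line of the @<TRIPOS>ATOM section.
struct Mol2Atom {
  int id;
  std::string name;
  double x, y, z;
  std::string type;  // mol2 atom type string, e.g. "C.3"
};

// Collects the atom records of a mol2 stream; every other section is skipped.
std::vector<Mol2Atom> read_mol2_atoms(std::istream& in) {
  std::vector<Mol2Atom> atoms;
  std::string line;
  bool in_atom_section = false;
  while (std::getline(in, line)) {
    if (line.compare(0, 9, "@<TRIPOS>") == 0) {
      // A new record type starts; only the ATOM section is of interest here.
      in_atom_section = (line.compare(0, 13, "@<TRIPOS>ATOM") == 0);
      continue;
    }
    if (!in_atom_section) continue;
    std::istringstream fields(line);
    Mol2Atom atom;
    // Trailing optional fields (substructure, charge) are ignored.
    if (fields >> atom.id >> atom.name >> atom.x >> atom.y >> atom.z >> atom.type) {
      atoms.push_back(atom);
    }
  }
  return atoms;
}
```

A full reader would also handle the bond section and map each `type` string to an atom type known to IMP, which is the subject of the next paragraph.
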
The mol2 atom types can either be added at runtime using
IMP::atom::add_atom_type(), or a list of predefined constants can be added,
similar to IMP::atom::AT_N. The latter requires editing both
IMP/atom/Atom.h and modules/atom/src/Atom.cpp and so is a bit harder
to get right.

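The runtime option amounts to a name-to-index registry that grows as new names are seen. The sketch below illustrates that idea only; `AtomTypeRegistry` is a hypothetical stand-in and does not reproduce how IMP stores its atom types.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry that hands out consecutive indices for type names.
class AtomTypeRegistry {
 public:
  // Registers a new name and returns its index; duplicates are an error.
  int add(const std::string& name) {
    if (index_.count(name) != 0) {
      throw std::invalid_argument("duplicate atom type: " + name);
    }
    const int id = static_cast<int>(names_.size());
    names_.push_back(name);
    index_[name] = id;
    return id;
  }

  bool has(const std::string& name) const { return index_.count(name) != 0; }

  // Both accessors throw std::out_of_range for unknown input.
  int get(const std::string& name) const { return index_.at(name); }
  const std::string& name(int id) const {
    return names_.at(static_cast<std::size_t>(id));
  }

 private:
  std::vector<std::string> names_;
  std::map<std::string, int> index_;
};
```

A mol2 reader built this way can register each unfamiliar type string the first time it appears, so no header needs to be edited when a new ligand type shows up.
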
# Implementing Scoring as an IMP::Restraint # {#design_restraint}

First, this functionality should probably go in a new module since it
is experimental. One can use the scratch module in a separate `git` branch,
for example.

One could then have a `PMFRestraint` which loads a PMF file from the
module data directory (or from a user-specified path). It would
also take two IMP::atom::Hierarchy decorators, one for the ligand and
one for the protein, and score all pairs between the two. For each pair of atoms,
it would look at the IMP::atom::Atom::get_type() value and use that
to find the function to use in a stored table.

Such a design requires a reasonable amount of implementation, especially
once one is interested in accelerating the scoring by only scoring nearby
pairs. The `PMFRestraint` could use an IMP::core::ClosePairsScoreState
internally if needed.

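The shape of such a monolithic restraint can be sketched as a single class that owns both atom sets, the per-pair function, and the distance cutoff. This is an illustration under stated assumptions, not IMP code: `TypedPoint` and `CutoffPairRestraint` are hypothetical, and the cutoff is applied by brute force, whereas a close-pairs structure would avoid visiting distant pairs at all.

```cpp
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical atom record: integer type plus coordinates.
struct TypedPoint {
  int type;
  double x, y, z;
};

// Hypothetical all-in-one restraint: pair selection and scoring live together.
class CutoffPairRestraint {
 public:
  // Per-pair function of (ligand type, protein type, distance).
  using PairFunction = std::function<double(int, int, double)>;

  CutoffPairRestraint(std::vector<TypedPoint> ligand,
                      std::vector<TypedPoint> protein,
                      PairFunction pair_function, double cutoff)
      : ligand_(std::move(ligand)), protein_(std::move(protein)),
        pair_function_(std::move(pair_function)), cutoff_(cutoff) {}

  // Sums the pair function over ligand-protein pairs within the cutoff.
  double evaluate() const {
    const double cutoff_squared = cutoff_ * cutoff_;
    double total = 0.0;
    for (const TypedPoint& l : ligand_) {
      for (const TypedPoint& p : protein_) {
        const double dx = l.x - p.x, dy = l.y - p.y, dz = l.z - p.z;
        const double d2 = dx * dx + dy * dy + dz * dz;
        if (d2 > cutoff_squared) continue;  // skip before the sqrt and lookup
        total += pair_function_(l.type, p.type, std::sqrt(d2));
      }
    }
    return total;
  }

 private:
  std::vector<TypedPoint> ligand_, protein_;
  PairFunction pair_function_;
  double cutoff_;
};
```

Note how changing the way pairs are selected means editing `evaluate()` itself, since selection and scoring are fused in one class.
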
# Implementing Scoring as an IMP::PairScore # {#design_score}

One could instead separate the scoring from the pair generation by implementing
the scoring as an IMP::PairScore. Then the user could specify an
IMP::core::ClosePairsScoreState when experimenting to see what is the fastest
way to implement things.

As with the restraint solution, the IMP::PairScore would use the
IMP::atom::Atom::get_type() value to look up the correct function to use.

If you look around in \imp for similar pair scores (see IMP::PairScore and the
inheritance diagram) you see there is an IMP::core::TypedPairScore which
already does what you need. That is, it takes a pair of particles, looks up
their types, and then applies a particular IMP::PairScore based on their types.
IMP::core::TypedPairScore expects an IMP::IntKey to describe the type. The
appropriate key can be obtained from IMP::atom::Atom::get_type_key().

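The dispatch-by-type idea can be illustrated in a few lines. The sketch below is not the IMP class: `TypedScoreTable` is a hypothetical stand-in that maps a pair of integer types to a function of distance, and it assumes that the order of the two types does not matter and that unregistered pairs score zero.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical type-dispatching score: one distance function per type pair.
class TypedScoreTable {
 public:
  using DistanceScore = std::function<double(double)>;

  // Registers the function used for the unordered pair (type_a, type_b).
  void set_pair_score(int type_a, int type_b, DistanceScore score) {
    scores_[key(type_a, type_b)] = std::move(score);
  }

  // Assumption: pairs without a registered function contribute zero.
  double evaluate(int type_a, int type_b, double distance) const {
    const auto it = scores_.find(key(type_a, type_b));
    return it == scores_.end() ? 0.0 : it->second(distance);
  }

 private:
  // Sorts the two types so that (a, b) and (b, a) share one entry.
  static std::pair<int, int> key(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }
  std::map<std::pair<int, int>, DistanceScore> scores_;
};
```

Because the table only answers "how is this pair scored", the choice of which pairs to evaluate stays with the caller and can be swapped freely.
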
Then all that needs to be implemented is a function, say
IMP::hao::create_pair_score_from_pmf(), which creates an IMP::core::TypedPairScore,
loads a PMF file, and then calls IMP::core::TypedPairScore::set_pair_score() for
each pair stored in the PMF file after translating PMF types to the
appropriate IMP::atom::AtomType.

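The load-and-translate half of such a function might look like the sketch below. The PMF file format is not described on this page, so the layout used here is invented for illustration: one line per type pair, holding two PMF type names followed by one value per distance bin. `load_pmf_curves`, `TypePair`, and `PmfCurves` are likewise hypothetical names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using TypePair = std::pair<std::string, std::string>;
using PmfCurves = std::map<TypePair, std::vector<double>>;

// Reads the invented format "<pmf_type_a> <pmf_type_b> <v0> <v1> ..." and
// keys each curve by the translated (standard) type names.
PmfCurves load_pmf_curves(
    std::istream& in,
    const std::map<std::string, std::string>& pmf_to_standard) {
  PmfCurves curves;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string a, b;
    if (!(fields >> a >> b)) continue;  // blank or incomplete line
    const auto ta = pmf_to_standard.find(a);
    const auto tb = pmf_to_standard.find(b);
    if (ta == pmf_to_standard.end() || tb == pmf_to_standard.end()) {
      throw std::runtime_error("unknown PMF atom type in line: " + line);
    }
    std::vector<double> values;
    double value;
    while (fields >> value) values.push_back(value);
    curves[TypePair(ta->second, tb->second)] = values;
  }
  return curves;
}
```

Each entry of the returned map corresponds to one call that registers a pair score for that type pair, so the PMF-specific names never leave the loader.
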
This design has the advantage of requiring very little code. As a result it
is easy to experiment (move to 3D tables or change the set of close pairs). Also,
different, non-overlapping PMFs can be combined by just adding more terms to
the IMP::core::TypedPairScore.

The disadvantage is that the scoring passes through more layers of function
calls, making it hard to use optimizations such as storing all the coordinates
in a central place.


# Some final thoughts # {#design_final}

1. Figure out orthogonal degrees of freedom and try to split
 functionality into pieces that control each. Here they are the set
 of pairs and how to score each of them. Doing this makes it
 easier to reuse code.
2. Don't create two classes when you only have one set of work. Here,
 all you have is a mapping from a pair of types and a
 distance to a score. Having both a PMFTable and PMFPairScore
 locks you into that aspect of the interface without giving you
 any real flexibility.
3. Implementing things in terms of many small classes makes the
 design much more flexible. You can easily replace a piece
 without touching anything else and, since each part is simple,
 replacing a particular piece doesn't take much work. The added
 complexity can easily be hidden away using helper functions in
 your code (or, if the action is very common, in IMP).