Design example {#designexample}
==============

This page walks through an iterative design process to give an
example of what sort of issues are important and what to think about
when choosing how to implement some functionality.

# Original Description # {#design_original}

Hao wants to add ligand/protein scoring to IMP so that he can
take advantage of the existing infrastructure. The details of the scoring
function are currently experimental. The code does the following:

1. Read in the protein pdb and the small ligand mol2. The protein is in
   a pdb file and the ligand is in a mol2
   file which defines its own set of pdb-compatible atom types.
2. He proposed storing the coordinates and atom types in vectors outside
   of the decorators to speed up scoring.
3. Read in the potential of mean force (PMF) table from a file with
   a custom format. The number of dimensions can be treated as constant:
   the two atom types for a pair of atoms and the distance between that
   pair. The values stored in the table will not change during the
   program, and need to be looked up quickly given the dimension data.
   The PMF table uses different atom names than the mol2 file.
4. Score a conformation by looping over all ligand-protein atom
   pairs. For each pair, look up the PMF value in the table by the
   two atom types and the distance, then sum up all PMF values.
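
The lookup and summation in steps 3 and 4 can be sketched roughly as follows. This is a minimal standalone illustration, not IMP code: `AtomRecord`, `PMFTable`, `score_conformation`, the uniform distance binning, and the choice to return 0 beyond the last bin are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical atom record: an integer type index plus Cartesian coordinates.
struct AtomRecord {
  int type;
  double x, y, z;
};

// Hypothetical dense table indexed by (protein type, ligand type, distance bin).
class PMFTable {
 public:
  PMFTable(int n_protein_types, int n_ligand_types, int n_bins, double bin_width)
      : n_ligand_types_(n_ligand_types),
        n_bins_(n_bins),
        bin_width_(bin_width),
        values_(static_cast<std::size_t>(n_protein_types) * n_ligand_types * n_bins, 0.0) {
    assert(n_protein_types > 0 && n_ligand_types > 0 && n_bins > 0 && bin_width > 0.0);
  }

  void set(int protein_type, int ligand_type, int bin, double value) {
    values_[index(protein_type, ligand_type, bin)] = value;
  }

  // Look up the value for a pair; distances past the last bin contribute 0.
  double get(int protein_type, int ligand_type, double distance) const {
    const int bin = static_cast<int>(distance / bin_width_);
    if (bin >= n_bins_) return 0.0;
    return values_[index(protein_type, ligand_type, bin)];
  }

 private:
  // Flatten the three indices into one offset in the value array.
  std::size_t index(int protein_type, int ligand_type, int bin) const {
    return (static_cast<std::size_t>(protein_type) * n_ligand_types_ + ligand_type) * n_bins_ + bin;
  }

  int n_ligand_types_;
  int n_bins_;
  double bin_width_;
  std::vector<double> values_;
};

// Sum the table value over every protein-ligand atom pair.
double score_conformation(const std::vector<AtomRecord>& protein,
                          const std::vector<AtomRecord>& ligand,
                          const PMFTable& table) {
  double total = 0.0;
  for (const AtomRecord& p : protein) {
    for (const AtomRecord& l : ligand) {
      const double dx = p.x - l.x, dy = p.y - l.y, dz = p.z - l.z;
      total += table.get(p.type, l.type, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
  }
  return total;
}
```

Note that the loop visits every pair, so the cost grows with the product of the two atom counts; that is the part the later design discussion tries to separate from the per-pair lookup.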

## Comments on the original description ## {#design_original_comments}

1. mol2 is a standard file format so it makes sense to have a reader
   for it in IMP. We can adopt the mol2 atom names as the standard names
   for ligand atoms in IMP.
2. How the coordinates are stored and accessed is an
   implementation detail, and worrying about it should probably
   be delayed until other considerations have been figured out.
3. Loading the PMF table is a natural operation for an initialization
   function. However, since the PMF table is not a standard file format,
   it doesn't make sense for it to go into IMP, at least not until a file
   format for the protein-ligand scoring has been worked out. Also, there is
   little reason to keep the PMF table atom types around, and they probably
   should be converted to more standard atom types on load. Finally, since
   the data in the PMF file is directly the scoring data, there isn't a
   real need to have a special representation for it in memory.
4. There are two different considerations here: which pairs of atoms to
   use and how to score each pair.
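
As an illustration of comment 3, the sketch below converts PMF atom names to standard names while loading, so the PMF-specific names never leave the loader. The real PMF file format is custom and not described here, so the line layout used (two type names followed by tabulated values, with `#` comment lines) is purely an assumption, as are `load_pmf`, `PMFCurves`, and the name mapping passed in.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using TypePair = std::pair<std::string, std::string>;
using PMFCurves = std::map<TypePair, std::vector<double>>;

// Read a hypothetical text format: "<typeA> <typeB> v0 v1 v2 ..." per line.
// PMF-specific names are translated to standard names as each line is read.
PMFCurves load_pmf(std::istream& in,
                   const std::map<std::string, std::string>& pmf_to_standard) {
  assert(in.good());
  PMFCurves curves;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
    std::istringstream fields(line);
    std::string a, b;
    if (!(fields >> a >> b)) {
      throw std::runtime_error("malformed PMF line: " + line);
    }
    const auto ia = pmf_to_standard.find(a);
    const auto ib = pmf_to_standard.find(b);
    if (ia == pmf_to_standard.end() || ib == pmf_to_standard.end()) {
      throw std::runtime_error("unknown PMF atom type in line: " + line);
    }
    std::vector<double> values;
    double v;
    while (fields >> v) values.push_back(v);
    // Store the curve under the translated (standard) names only.
    curves[TypePair(ia->second, ib->second)] = values;
  }
  return curves;
}
```

The values are kept as plain vectors, in line with the remark that the file contents are directly the scoring data and need no special in-memory representation.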

# Design Proposal for Reading # {#design_reading}

Since the mol2 reader is quite separate from the scoring, we will consider
it on its own first. In analogy to the pdb reader, it makes sense to
provide a function `read_mol2(std::istream &in, Model *m)` which returns
an IMP::atom::Hierarchy.

The mol2 atom types can either be added at runtime using
IMP::atom::add_atom_type() or be defined as predefined constants
similar to IMP::atom::AT_N. The latter requires editing both
IMP/atom/Atom.h and modules/atom/src/Atom.cpp and so is a bit harder.
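
To make the runtime option concrete, here is a minimal sketch of a registry that hands out integer indices for type names as they are first seen. It is not how IMP implements atom types; `TypeRegistry` and its methods are hypothetical, and returning the existing index on a repeated name is a choice made for this example only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry mapping type names to small integer indices.
class TypeRegistry {
 public:
  // Register a name and return its index; a repeated name returns the
  // index assigned the first time.
  int add(const std::string& name) {
    assert(!name.empty());
    const auto it = index_.find(name);
    if (it != index_.end()) return it->second;
    const int id = static_cast<int>(names_.size());
    names_.push_back(name);
    index_[name] = id;
    return id;
  }

  bool has(const std::string& name) const { return index_.count(name) != 0; }

  // Throws std::out_of_range for an index that was never handed out.
  const std::string& name_of(int id) const { return names_.at(id); }

 private:
  std::vector<std::string> names_;
  std::map<std::string, int> index_;
};
```

A reader built this way could register each mol2 type name the first time it appears in a file, with no header changes needed when a new type shows up.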

# Implementing Scoring as an IMP::Restraint # {#design_restraint}

First, this functionality should probably go in a new module since it
is experimental. One can use the scratch module in a separate `git` branch,
or create a new module.

One could then have a `PMFRestraint` which loads a PMF file from the
module data directory (or from a user-specified path). It would
also take two hierarchies, one for the ligand and
one for the protein, and score all pairs over the two. For each pair of atoms,
it would look at the IMP::atom::Atom::get_type() value and use that
to find the function to use in a stored table.

Such a design requires a reasonable amount of implementation, especially
once one is interested in accelerating the scoring by only scoring nearby
pairs. The `PMFRestraint` could use an IMP::core::ClosePairsScoreState
internally to find those pairs.
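
For a sense of what "only scoring nearby pairs" involves, the sketch below builds the list of protein-ligand index pairs within a cutoff. It is a brute-force stand-in written for illustration, not the algorithm used by IMP::core::ClosePairsScoreState; `Point`, `IndexPair`, and `close_pairs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Point {
  double x, y, z;
};

// (protein index, ligand index)
using IndexPair = std::pair<std::size_t, std::size_t>;

// Return all protein-ligand index pairs whose distance is at most `cutoff`.
// Brute force over all pairs; squared distances avoid the square root.
std::vector<IndexPair> close_pairs(const std::vector<Point>& protein,
                                   const std::vector<Point>& ligand,
                                   double cutoff) {
  assert(cutoff >= 0.0);
  const double cutoff2 = cutoff * cutoff;
  std::vector<IndexPair> pairs;
  for (std::size_t i = 0; i < protein.size(); ++i) {
    for (std::size_t j = 0; j < ligand.size(); ++j) {
      const double dx = protein[i].x - ligand[j].x;
      const double dy = protein[i].y - ligand[j].y;
      const double dz = protein[i].z - ligand[j].z;
      if (dx * dx + dy * dy + dz * dz <= cutoff2) pairs.emplace_back(i, j);
    }
  }
  return pairs;
}
```

This version still examines every pair, so it only saves the per-pair scoring work; a real speedup would need a spatial data structure such as a grid, which is the kind of machinery a restraint written from scratch would have to carry itself.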

# Implementing Scoring as an IMP::PairScore # {#design_score}

One could instead separate the scoring from the pair generation by implementing
the scoring as an IMP::PairScore. The pairs could then be supplied by an
IMP::core::ClosePairsScoreState when experimenting to see what is the fastest
way to implement things.

The pair score would use the
IMP::atom::Atom::get_type() value to look up the correct function to use.
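
The separation described here can be illustrated with a small standalone sketch in which the per-pair score sits behind an interface and the list of pairs is supplied from outside. The names `Particle`, `PairScoreSketch`, `DistanceScore`, and `evaluate_pairs` are hypothetical and only mimic the shape of the idea; they are not the IMP classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle {
  int type;
  double x, y, z;
};

// Hypothetical stand-in for a pair score: it knows how to score one pair
// and nothing about where the pairs come from.
class PairScoreSketch {
 public:
  virtual ~PairScoreSketch() = default;
  virtual double evaluate(const Particle& a, const Particle& b) const = 0;
};

// Trivial example score: the Euclidean distance between the two particles.
class DistanceScore : public PairScoreSketch {
 public:
  double evaluate(const Particle& a, const Particle& b) const override {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
  }
};

// Apply any pair score to any list of index pairs, however it was generated.
double evaluate_pairs(const PairScoreSketch& score,
                      const std::vector<Particle>& particles,
                      const std::vector<std::pair<std::size_t, std::size_t>>& pairs) {
  double total = 0.0;
  for (const auto& pr : pairs) {
    assert(pr.first < particles.size() && pr.second < particles.size());
    total += score.evaluate(particles[pr.first], particles[pr.second]);
  }
  return total;
}
```

Because `evaluate_pairs` takes the pair list as an argument, the all-pairs loop and a close-pairs list can be swapped without touching the scoring class.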

If you look around in \imp for similar pair scores (see IMP::PairScore and the
inheritance diagram) you see there is an IMP::core::TypedPairScore which
already does what you need. That is, it takes a pair of particles, looks up
their types, and then applies a particular IMP::PairScore based on their types.
IMP::core::TypedPairScore expects an IMP::IntKey to describe the type. The
appropriate key can be obtained from IMP::atom::Atom::get_type_key().

Then all that needs to be implemented is a function, say
IMP::hao::create_pair_score_from_pmf(), which creates an IMP::core::TypedPairScore,
loads a PMF file, and then calls IMP::core::TypedPairScore::set_pair_score() for
each pair stored in the PMF file after translating PMF types to the
appropriate IMP::atom::AtomType.
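
The type-based dispatch that such a function would set up can be sketched without IMP as a table from a pair of type indices to a function of distance. Everything below is hypothetical: `TypedScoreSketch` only imitates the idea of registering one score per type pair, and treating the pair as unordered, returning 0 for unregistered pairs, and the binned lookup in `make_binned_function` are assumptions of this example rather than documented IMP behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using DistanceFunction = std::function<double(double)>;

// Hypothetical dispatch table: one distance function per pair of type indices.
class TypedScoreSketch {
 public:
  void set_pair_score(int type_a, int type_b, DistanceFunction f) {
    assert(static_cast<bool>(f));
    table_[key(type_a, type_b)] = std::move(f);
  }

  // Pairs with no registered function contribute 0 in this sketch.
  double evaluate(int type_a, int type_b, double distance) const {
    const auto it = table_.find(key(type_a, type_b));
    return it == table_.end() ? 0.0 : it->second(distance);
  }

 private:
  // Order the two types so that (a, b) and (b, a) share one entry.
  static std::pair<int, int> key(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }

  std::map<std::pair<int, int>, DistanceFunction> table_;
};

// Wrap one tabulated curve as a function of distance using uniform bins.
DistanceFunction make_binned_function(std::vector<double> values, double bin_width) {
  assert(bin_width > 0.0);
  return [values = std::move(values), bin_width](double distance) {
    const std::size_t bin = static_cast<std::size_t>(distance / bin_width);
    return bin < values.size() ? values[bin] : 0.0;
  };
}
```

A loader in this style would call `set_pair_score` once per type pair found in the file, which is the only PMF-specific code that has to be written.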

This design has the advantage of very little code to write. As a result it
is easy to experiment (move to 3D tables or change the set of close pairs). Also,
different, non-overlapping PMFs can be combined by just adding more terms to
the IMP::core::TypedPairScore.

The disadvantage is that the scoring passes through more layers of function
calls, making it hard to use optimizations such as storing all the coordinates
in vectors outside of the decorators.

# Some final thoughts # {#design_final}

1. Figure out orthogonal degrees of freedom and try to split
   functionality into pieces that control each. Here they are the set
   of pairs and how to score each of them. Doing this makes it
   easier to reuse code.
2. Don't create two classes when you only have one set of work. Here,
   all you have is a mapping from a pair of types and a
   distance to a score. Having both a PMFTable and PMFPairScore
   locks you into that aspect of the interface without giving you
   any real flexibility.
3. Implementing things in terms of many small classes makes the
   design much more flexible. You can easily replace a piece
   without touching anything else and, since each part is simple,
   replacing a particular piece doesn't take much work. The added
   complexity can easily be hidden away using helper functions in
   your code (or, if the action is very common, in IMP).