IMP  2.3.1
The Integrative Modeling Platform
developer_guide.md
1 Developer Guide {#devguide}
2 ===============
3 
4 # Developing with IMP #
5 [TOC]
6 
7 This page presents instructions on how to develop code using
8 IMP. Developers should also read [Getting started as a developer](https://github.com/salilab/imp/wiki/Getting-started-as-a-developer).
9 
10 # Getting around IMP # {#devguide_getting_around}
11 
12 The input files in the IMP directory are structured as follows:
13 - `tools` contains various command line utilities for use by developers. They
14  are [documented below](#devguide_scripts).
15 - `doc` contains inputs for general IMP overview documentation (such as this
16  page), as well as configuration scripts for `doxygen`.
17 - `applications` contains various applications implementing easier-to-use
18  command line functionality, using a variety of IMP modules.
19 - each subdirectory of `modules/` defines a module; they all have the same
20  structure. The directory for module `name` has the following structure:
21  - `README.md` contains a module overview
22  - `include` contains the C++ header files
23  - `src` contains the C++ source files
24  - `bin` contains C++ source files each of which is built into an executable
25  - `pyext` contains files defining the Python interface to the module as well
26  as Python source files (in `pyext/src`)
27  - `test` contains test files, that can be run with `ctest`
28  - `doc` contains additional documentation that is provided via `.dox`
29  or `.md` files
30  - `examples` contains examples in Python and C++, as well as any data needed
31  for examples
32  - `data` contains any data files needed by the module
33 
34 When IMP is built, a number of directories are created in the build directory. They are
35  - `include` which includes all the headers. The headers for module `name` are
36  placed in `include/IMP/name`
37  - `lib` where the C++ and Python libraries are placed. Module `name` is built
38  into a C++ library `lib/libimp_name.so` (or `.dylib` on a Mac) and a Python
39  library with Python files located in `lib/IMP/name` and the binary part in
40  `lib/_IMP_name.so`
41  - `doc` where the html documentation is placed in `doc/html` and the examples
42  in `doc/examples` with a subdirectory for each module
43  - `data` where each module gets a subdirectory for its data.
44 
45 When IMP is installed, the structure from the build directory is
46 moved over more or less intact except that the C++ and Python
47 libraries are put in the (different) appropriate locations.
48 
49 
50 # Writing new code # {#devguide_new_code}
51 
52 The easiest way to start writing new functions and classes is to
53 create a new module using [make-module.py](\ref dev_tools_make_module).
54 This creates a new module in the `modules` directory. Alternatively, you can
55 simply use the `scratch` module.
56 
57 We highly recommend using a revision control system such as
58 [git](http://git-scm.com/) or [svn](http://subversion.tigris.org/) to
59 keep track of changes to your module.
60 
61 If, instead, you choose to add code to an existing module, you need to
62 consult with the person or people who control that module. Their names
63 can be found on the module main page.
64 
65 When designing the interface for your new code, you should
66 
67 - search IMP for similar functionality and, if there is any, adapt
68  the existing interface for your purposes. For example, the existing
69  IMP::atom::read_pdb() and IMP::atom::write_pdb() functions provide
70  templates that should be used for the design of any functions that
71  create particles from a file or write particles to a file. Since
72  IMP::atom::Bond, IMP::algebra::Segment3D and
73  IMP::display::Geometry all use methods like
74  IMP::algebra::Segment3D::get_point() to access the
75  endpoints of a segment, any new object which defines similar
76  point-based geometry should do likewise.
77 
78 - think about how other people are likely to use the code. For
79  example, not all molecular hierarchies have atoms as their leaves,
80  so make sure your code searches for arbitrary
81  IMP::core::XYZ particles rather than atoms if you only care
82  about the geometry.
83 
84 - look for easy ways of splitting the functionality into pieces. It
85  generally makes sense, for %example, to split selection of the
86  particles from the action taken on them, either by accepting an
87  IMP::kernel::Refiner, an IMP::kernel::SingletonContainer or just an arbitrary
88  IMP::kernel::ParticleIndexes object.
89 
90 
91 You may want to read [the design example](\ref designexample) for
92 some suggestions on how to go about implementing your functionality
93 in IMP.
94 
95 ## Coding conventions ## {#devguide_conventions}
96 
97 Make sure you read the [API Conventions](\ref introduction_conventions) page
98 first.
99 
100 To ensure code consistency and readability, certain conventions
101 must be adhered to when writing code for IMP. Some of these
102 conventions are automatically checked for by source control before
103 allowing a new commit, and can also be checked yourself in new
104 code by running [check_standards.py](#devguide_check_standards).
105 
106 ### Indentation ### {#devguide_indentation}
107 
108 All C++ headers and code should be indented with 2-space indents. Do not use
109 tabs. [clang-format](\ref dev_tools_clang_format) can help you do this formatting
110 automatically.
111 
112 All Python code should conform to the [Python style
113 guide](http://www.python.org/dev/peps/pep-0008/). In essence this
114 translates to 4-space indents, no tabs, and similar class, method and
115 variable naming to the C++ code. You can ensure that your Python code
116 is correctly indented by using the
117 [cleanup_code.py script](\ref dev_tools_clang_format).
118 
119 ### Names ### {#devguide_names}
120 
121 See the [introduction](\ref introduction_names) first. In addition, developers
122 should be aware that
123 - all preprocessor symbols must begin with `IMP`.
124 - names of files that implement a single class should be named for that
125  class; for example the `SpecialVector` class could be implemented in
126  `SpecialVector.h` and `SpecialVector.cpp`
127 - files that provide free functions or macros should be given names
128  `separated_by_underscores`, for example `container_macros.h`
129 - Functions which take a parameter which has units should have the
130  unit as part of the function name, for %example
131  IMP::atom::SimulationParameters::set_maximum_time_step_in_femtoseconds().
132  Remember the Mars orbiter. The exception to this is distance and
133  force numbers which should always be in angstroms and kcal/mol
134  angstrom respectively unless otherwise stated.
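
As a minimal sketch of the unit-in-the-name convention, the class below carries the unit in both its setter and its getter. `SimulationParametersSketch` is a hypothetical stand-in written for illustration, not the real IMP class; only the method name mirrors the one cited above.

```python
class SimulationParametersSketch:
    """Hypothetical stand-in for illustration; not part of IMP."""

    def __init__(self):
        self._maximum_time_step_fs = 0.0

    def set_maximum_time_step_in_femtoseconds(self, time_step):
        # The unit is part of the name, so a caller cannot silently
        # pass a value in picoseconds or seconds.
        if time_step <= 0:
            raise ValueError("time step must be positive")
        self._maximum_time_step_fs = float(time_step)

    def get_maximum_time_step_in_femtoseconds(self):
        return self._maximum_time_step_fs


params = SimulationParametersSketch()
params.set_maximum_time_step_in_femtoseconds(2.0)
```

Carrying the unit in the getter as well keeps the round trip unambiguous for the reader of the calling code.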
135 
136 ### Passing and storing data ### {#devguide_passing}
137 
138 - When a class or function takes a set of particles which are expected to
139  be those of a particular type of decorator, it should take a list of
140  decorators instead, e.g. IMP::core::transform() takes an IMP::core::XYZ.
141  This makes it clearer what attributes the particle is required to have
142  as well as allows functions to be overloaded (so there can be an
143  IMP::core::transform() which takes IMP::core::RigidBody particles instead).
144 
145 
146 - IMP::Restraint and IMP::ScoreState classes should generally use an
147  IMP::SingletonContainer (or other type of Container) to store the set of
148  IMP::Particle objects that they act on.
149 
150 - Store collections of IMP::Object-derived
151  objects of type `Name` using a `Names`. Declare functions that
152  accept them to take a `NamesTemp` (`Names` is a `NamesTemp`).
153  `Names` are reference counted (see IMP::RefCounted for details),
154  `NamesTemp` are not. Store collections of particles using a
155  `Particles` object, rather than decorators.
156 
157 ### Display ### {#devguide_display}
158 
159 All values must have a `show` method which takes an optional
160 `std::ostream` and prints information about the object (see
161 IMP::base::Array::show() for an example). Add a `write` method if you
162 want to provide output that can be read back in.
163 
164 ### Errors ### {#devguide_errors}
165 
166 Classes and methods should use IMP exceptions to report errors. See
167 IMP::base::Exception for a list of existing exceptions. See
168 [checks](exception_8h.html) for more information.
169 
170 ### Namespaces ### {#devguide_namespace}
171 
172 Use the provided `IMPMODULE_BEGIN_NAMESPACE`,
173 `IMPMODULE_END_NAMESPACE`, `IMPMODULE_BEGIN_INTERNAL_NAMESPACE` and
174 `IMPMODULE_END_INTERNAL_NAMESPACE` macros to put declarations in a
175 namespace appropriate for module `MODULE`.
176 
177 Each module has an internal namespace, e.g. `IMP::base::internal`, and an internal
178 include directory `IMP/base/internal`. Any function which is
179  - not intended to be part of the API,
180  - not documented,
181  - liable to change without notice,
182  - or not tested
183 
184 should be declared in an internal header and placed in the internal namespace.
185 
186 The functionality in such internal headers is
187  - not exported to Python
188  - and not part of the documented API
189 
190 As a result, such functions do not need to obey all the coding conventions
191 (but we recommend that they do).
192 
193 
194 ## Documenting your code ## {#devguide_documenting}
195 
196 IMP is documented using `doxygen`. See
197 [Documenting your code in doxygen](http://www.doxygen.nl/docblocks.html)
198 to get started. We use `//!` and `/** ... */` blocks for documentation.
199 You are encouraged to use Doxygen's
200 [markdown support](http://www.stack.nl/~dimitri/doxygen/manual/markdown.html) as much as possible.
201 
202 Python code should provide Python doc strings.
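
As a small illustration of a Python doc string, the function below is a hypothetical helper (it is not part of IMP). It puts a one-line summary first and then states what the arguments and return value look like.

```python
def get_centroid(points):
    """Return the centroid of a non-empty sequence of 3D points.

    Each point is an (x, y, z) tuple; the result is a tuple of floats.
    """
    count = len(points)
    if count == 0:
        raise ValueError("points must not be empty")
    # Average each of the three coordinates separately.
    return tuple(sum(p[i] for p in points) / count for i in range(3))
```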
203 
204 All headers not in internal directories are parsed through
205 `doxygen`. To hide a function that you do not want documented (for example,
206 because it is not well tested), surround it with
207 
208  \#ifndef IMP_DOXYGEN
209  void messy_poorly_thought_out_function();
210  \#endif
211 
212 We provide a number of extra Doxygen commands to aid in producing nice
213 IMP documentation.
214 
215 - To mark that some part of the API has not yet been well planned and may change,
216  use `\\unstable{Classname}`. The documentation will include a disclaimer
217  and the class or function will be added to a list of unstable classes. It is
218  better to simply hide such things from `doxygen`.
219 
220 - To mark a method as not having been well tested yet, use `\\untested{Classname}`.
221 
222 - To mark a method as not having been implemented, use `\\untested{Classname}`.
223 
224 ## Debugging and testing your code ## {#devguide_testing}
225 
226 Ensuring that your code is correct can be very difficult, so IMP
227 provides a number of tools to help you out.
228 
229 The first set are assert-style macros:
230 
231 - IMP_USAGE_CHECK() which should be used to check that arguments to
232  functions and methods satisfy the preconditions.
233 
234 - IMP_INTERNAL_CHECK() which should be used to verify internal state
235  and return values to make sure they satisfy pre and post-conditions.
236 
237 See the [checks](exception_8h.html) page for more details. As a
238 general guideline, any improper usage should produce at least a warning,
239 and all return values should be checked by such code.
240 
241 The second set are logging macros such as:
242 
243 - IMP_LOG() which allows controlled display of messages about what the
244  code is doing. See [logging](log_8h.html) for more information.
245 
246 Finally, each module has a set of unit tests. The
247 tests are located in the `modules/modulename/test` directory.
248 These tests should try, as much as possible, to provide independent
249 verification of the correctness of the code. Any
250 file in that directory or a subdirectory whose name matches `test_*.{py,cpp}`,
251 `medium_test_*.{py,cpp}` or `expensive_test_*.{py,cpp}` is considered a test.
252 Normal tests should run in at most a few seconds on a typical machine, medium
253 tests in 10 seconds or so and expensive tests in a couple of minutes.
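
The naming rules above can be sketched as a small classifier. This is only an illustration of the patterns quoted in the text, not the logic the IMP build system actually uses; the function name and the category labels are our own.

```python
import fnmatch
import os

# Patterns taken from the text above; the category labels are hypothetical.
TEST_PATTERNS = [
    ("expensive", "expensive_test_*"),
    ("medium", "medium_test_*"),
    ("normal", "test_*"),
]


def classify_test_file(path):
    """Return 'normal', 'medium' or 'expensive', or None if not a test."""
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext not in (".py", ".cpp"):
        return None
    for category, pattern in TEST_PATTERNS:
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(stem, pattern):
            return category
    return None
```

For instance, `classify_test_file("test/medium_test_docking.py")` falls in the medium category, while a helper file such as `utils.py` is not treated as a test.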
254 
255 Some tests will require input files or temporary files. Input files
256 should be placed in a directory called `input` in the `test`
257 directory. The test script should then call
258 \command{self.get_input_file_name(file_name)} to get the true path to
259 the file. Likewise, appropriate names for temporary files should be
260 found by calling
261 \command{self.get_tmp_file_name(file_name)}. Temporary files will be
262 located in `build/tmp`. The test should remove temporary files after
263 using them.
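
The pattern can be sketched with plain `unittest`. The base class below is a hypothetical stand-in for the test base class that IMP provides: its two helper methods only imitate the names used above, and the directories they return (`input` and the system temporary directory) are assumptions made so that the sketch runs without IMP.

```python
import os
import tempfile
import unittest


class SketchTestCase(unittest.TestCase):
    """Hypothetical stand-in for the IMP test base class."""

    input_dir = "input"  # assumption: inputs live in ./input

    def get_input_file_name(self, file_name):
        return os.path.join(self.input_dir, file_name)

    def get_tmp_file_name(self, file_name):
        # Assumption: use the system temporary directory, not build/tmp.
        return os.path.join(tempfile.gettempdir(), file_name)


class WriteReadTest(SketchTestCase):
    def test_round_trip(self):
        tmp_name = self.get_tmp_file_name("sketch_round_trip.txt")
        try:
            with open(tmp_name, "w") as handle:
                handle.write("42\n")
            with open(tmp_name) as handle:
                self.assertEqual(handle.read().strip(), "42")
        finally:
            # Remove the temporary file once the test is done with it.
            if os.path.exists(tmp_name):
                os.remove(tmp_name)


def run_sketch_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WriteReadTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Putting the cleanup in a `finally` clause means the temporary file is removed even when an assertion fails.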
264 
265 ## Writing Examples ## {#devguide_examples}
266 
267 Writing examples is a very important part of being an IMP developer and
268 one of the best ways to help people use your code. To write a (Python)
269 example, create a file `myexample.py` in the example directory of an
270 appropriate module, along with a file `myexample.readme`. The readme
271 should provide a brief overview of what the code in the module is
272 trying to accomplish as well as key pieces of IMP functionality that
273 it uses.
274 
275 When writing examples, one should try (as appropriate) to do the following:
276 - begin the example with `import` lines for the IMP modules used
277 - have parameters describing the process taking place. These include names of
278  PDB files, the resolution to perform computations at etc.
279 - define a function `create_representation` which creates and returns the model
280  with the needed particles along with a data structure so that key
281  particles can be located. It should define nested functions as
282  needed to encapsulate commonly used code
283 - define a function `create_restraints` which creates the restraints to score
284  conformations of the representation
285 - define a function `get_conformations` to perform the sampling
286 - define a function `analyze_conformations` to perform some sort of clustering
287  and analysis of the resulting conformations
288 - finally do the actual work of calling the `create_representation` and
289  `create_restraints` functions and performing sampling and analysis and
290  displaying the solutions.
291 
292 Obviously, not all examples need all of the above parts.
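
The layout described above can be sketched as follows. To keep the sketch runnable on its own, the function bodies are placeholders built from plain Python data (1D coordinates and a simple distance score) rather than IMP calls, and all parameter values are hypothetical; a real example would start with `import` lines for the IMP modules it uses.

```python
import random

# Parameters describing the process (hypothetical values).
number_of_particles = 4
number_of_samples = 20
target_distance = 1.0


def create_representation():
    """Create the 'model' (here just a list of 1D coordinates)."""
    return [0.0] * number_of_particles


def create_restraints(representation):
    """Return scoring functions over consecutive particle pairs."""
    def make_pair_score(i, j):
        # Nested helper encapsulating the commonly used scoring code.
        def score(coords):
            return (abs(coords[j] - coords[i]) - target_distance) ** 2
        return score
    return [make_pair_score(i, i + 1)
            for i in range(len(representation) - 1)]


def get_conformations(representation, restraints, rng):
    """Sample random conformations and score each one."""
    conformations = []
    for _ in range(number_of_samples):
        coords = [rng.uniform(0.0, 5.0) for _ in representation]
        total = sum(restraint(coords) for restraint in restraints)
        conformations.append((total, coords))
    return conformations


def analyze_conformations(conformations):
    """Return the best-scoring conformation (a stand-in for clustering)."""
    return min(conformations, key=lambda item: item[0])


# Finally, do the actual work.
representation = create_representation()
restraints = create_restraints(representation)
conformations = get_conformations(representation, restraints,
                                  random.Random(0))
best_score, best_coords = analyze_conformations(conformations)
print("best score: %.3f" % best_score)
```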
293 
294 The example should have enough comments that the reasoning behind each line of code is clear to someone who roughly understands how IMP in general works.
295 
296 Examples must use methods like IMP::base::get_example_data() to access
297 data in the example directory. This allows them to be run from
298 anywhere.
299 
300 
301 ## Exporting code to Python ## {#devguide_swig}
302 
303 IMP uses SWIG to wrap C++ code and export it to Python. Since SWIG is
304 relatively complicated, we provide a number of helper macros and an example
305 file (see modules/example/pyext/swig.i-in). The key bits are
306 - the information goes into a file called swig.i-in in the module pyext directory
307 - the first part should be one `IMP_SWIG_VALUE()`, `IMP_SWIG_OBJECT()` or
308  `IMP_SWIG_DECORATOR()` line per value type, object type or decorator object
309  the module exports to Python. Each of these lines looks like
310 
311  IMP_SWIG_VALUE(IMP::module_namespace, ClassName, ClassNames);
312 
313 - then there should be a number of `%include` lines, one per header file
314  in the module which exports a class or function to Python. The header files
315  must be in order such that no class is used before a declaration for it
316  is encountered (SWIG does not do recursive inclusion)
317 - finally, any templates that are to be exported to SWIG must have a
318  `%template` call. It should look something like
319 
320  namespace IMP {
321  namespace module_namespace {
322  %template(PythonName) CPPName<Restraint, 3>;
323  }
324  }
325 
326 
327 
328 # Managing your own module # {#devguide_module}
329 
330 When there is a significant group of new functionality, a new set of
331 authors, or code that is dependent on a new external dependency, it is
332 probably a good idea to put that code in its own module. To create a
333 new module, run the [make-module.py](\ref dev_tools_make_module) script
334 from the main IMP source directory, passing the name of your new
335 module. The module name should consist of lower case characters and
336 numbers and the name should not start with a number. In addition the
337 name "local" is special and is reserved for modules that are internal
338 to code for handling a particular biological system or application. For example:
339 
340  ./tools/make-module.py mymodule
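
The naming rule can be checked before running the script. The helper below is a hypothetical sketch, not part of `make-module.py`; it assumes that "lower case characters" means the letters a to z, so underscores and other symbols are rejected.

```python
import re

# Lower-case letters and digits only, not starting with a digit.
# Assumption: "lower case characters" is read as a-z.
_MODULE_NAME = re.compile(r"[a-z][a-z0-9]*\Z")


def is_valid_module_name(name):
    """Return True if name follows the module naming rule above."""
    return _MODULE_NAME.match(name) is not None
```

Under this reading `mymodule` and `em2d` are accepted, while `2dem` and `MyModule` are not.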
341 
342 The next step is to update the information about the module stored in
343 `modules/mymodule/README.md`. This includes the names of the authors and
344 descriptions of what the module is supposed to do.
345 
346 If the module makes use of external libraries, see the files `modules/base/dependencies.py` and `modules/base/dependency/Log4CXX.description`
347 for examples.
348 
349 Each module has an auto-generated header called `modulename_config.h`.
350 This header contains basic definitions needed for the module and
351 should be included (first) in each header file in the module. In
352 addition, there is a header `module_version.h` which contains the
353 version info as preprocessor symbols. This should not be included in
354 module headers or cpp files as doing so will force frequent
355 recompilations.
356 
357 
358 
359 
360 # Contributing code back to the repository # {#devguide_contributing}
361 
362 In order to be shared with others as part of the IMP distribution,
363 code needs to be of higher quality and more thoroughly vetted than
364 typical research code. As a result, it may make sense to keep the
365 code as part of a private module until you better understand what
366 capabilities can be cleanly offered to others.
367 
368 The first set of questions to answer are
369 
370 - What exactly is the functionality I would like to contribute? Is
371  it a single function, a single Restraint, a set of related classes
372  and functions?
373 
374 - Is there similar functionality already in IMP? If so, it might make
375  more sense to modify the existing code in cooperation with its
376  author. At the very least, the new code needs to respect the
377  conventions established by the prior code in order to maintain
378  consistency.
379 
380 - Where should the new functionality go? It can either be added to an
381  existing module or as part of a new module. If adding to an existing
382  module, you must communicate with the authors of that module to get
383  permission and coordinate changes.
384 
385 - Should the functionality be written in C++ or Python? In general, we
386  suggest C++ if you are comfortable programming in that language as
387  that makes the functionality available to more people.
388 
389 You are encouraged to post to the
390 `imp-dev` list to find help
391 answering these questions as it can be hard to grasp all the various
392 pieces of functionality already in the repository.
393 
394 All code contributed to IMP
395 - must follow the [IMP coding conventions](#devguide_conventions)
396 - should follow general good [C++ programming practices](#devguide_cpp)
397 - must have unit tests
398 - must pass all unit tests
399 - must have documentation
400 - must build on all supported compilers (roughly, recent versions of `gcc`,
401  `clang++` and `Visual C++`) without warnings
402 - should have examples
403 - must not have warnings when its doc is built
404 
405 See [getting started as a developer](https://github.com/salilab/imp/wiki/Getting-started-as-a-developer) for more information on submitting code.
406 
407 ## Once you have submitted code ## {#devguide_supporting}
408 
409 Once you have submitted code, you should monitor the [Nightly build
410 status](http://integrativemodeling.org/nightly/results/) to make sure that
411 your code builds on all platforms and passes the unit tests. Please
412 fix all build problems as fast as possible.
413 
414 In addition to monitoring the `imp-dev` list, developers who have a module or
415 are committing patches to svn may want to subscribe to the `imp-commits` email
416 list which receives notices of all changes made to the IMP repository.
417 
418 
419 ## Cross platform compatibility ## {#devguide_cross_platform}
420 
421 IMP is designed to run on a wide variety of platforms. To detect problems on
422 other platforms
423 we provide nightly test runs on the supported
424 platforms for code that is part of the IMP repository.
425 
426 In order to make it more likely that your code works on all the supported platforms:
427 - use the headers and classes in IMP::compatibility when appropriate
428 - avoid the use of `and` and `or` in C++ code, use `&&` and `||` instead.
429 - avoid `friend` declarations involving templates, use the preprocessor,
430  conditionally on the symbols `SWIG` and `IMP_DOXYGEN` to hide code as
431  needed instead.
432 
433 ### C++ 11 ### {#devguide_cxx11}
434 IMP now turns on C++ 11 support when it can. However, since compilers
435 are still quite variable in which C++ 11 features they support, it is
436 not advisable to use them directly in IMP code at this point. To aid
437 in their use when practical we provide several helper macros:
438 - IMP_OVERRIDE inserts the `override` keyword when available
439 - IMP_FINAL inserts the `final` keyword when available
440 
441 More will come.
442 
443 # Good programming practices # {#devguide_cpp}
444 
445 Two excellent sources for general C++ coding guidelines are
446 
447 - [C++ Coding Standards](http://www.amazon.com/Coding-Standards-Rules-Guidelines-Practices/dp/0321113586) by Sutter and Alexandrescu
448 
449 - [Effective C++](http://www.amazon.com/Effective-Specific-Addison-Wesley-Professional-Computing/dp/0201924889) by Meyers
450 
451 IMP endeavors to follow all of the guidelines published in those
452 books. The Sali lab owns copies of both of these books that you
453 are free to borrow.
454 
455 
456 # IMP gotchas # {#devguide_gotchas}
457 
458 Below are suggestions prompted by bugs found in code submitted to IMP.
459 
460 - Never use '`using namespace`' outside of a function; instead
461  explicitly provide the namespace. (This avoids namespace pollution, and
462  removes any ambiguity.)
463 
464 - Never use the preprocessor to define constants. Use `const`
465  variables instead. Preprocessor symbols don't have scope or type
466  and so can have unexpected effects.
467 
468 - Don't expect IMP::base::Object::get_name() names to be unique; they
469  are there for human viewing. If you need a unique identifier
470  associated with an object or non-geometric value, just use the
471  object or value itself.
472 
473 - Pass other objects by value or by `const &` (if the object is
474  large) and store copies of them.
475 
476 - Never expose member variables in an object which has
477  methods. All such member variables should be private.
478 
479 - Don't derive a class from another class simply to reuse some
480  code that the base class provides - only do so if your derived
481  class could make sense when cast to the base class. Otherwise,
482  reuse existing code by pulling it into a function.
483 
484 - Clearly mark any file that is created by a script so that other
485  people know to edit the original file.
486 
487 - Always return a `const` value or `const` reference if you are not
488  providing write access. Returning a `const` copy means the
489  compiler will report an error if the caller tries to modify the
490  return value without creating a copy of it.
491 
492 - Include files from the local module first, then files from the
493  other IMP modules and kernel and finally outside includes. This
494  makes any dependencies in your code obvious, and by including
495  standard headers \e after IMP headers, any missing includes in the
496  headers themselves show up early (rather than being masked by
497  other headers you include).
498 
499  #include <IMP/mymodule/mymodule_exports.h>
500  #include <IMP/mymodule/MyRestraint.h>
501  #include <IMP/Restraint.h>
502  #include <vector>
503 
504 - Use `double` variables for all computational intermediates.
505 
506 - Avoid using nested classes in the API as SWIG can't wrap them
507  properly. If you must use nested classes, you will have to
508  do more work to provide a Python interface to your code.
509 
510 
511 - Delay initialization of keys until they are actually needed
512  (since all initialized keys take up memory within each particle,
513  more or less). The best way to do this is to have them be static
514  variables in a static function:
515 
516  FloatKey get_my_float_key() {
517  static FloatKey k("hello");
518  return k;
519  }
520 
521 - One is almost always the right number:
522  - Information should be stored in exactly one
523  place. Duplicated information easily gets out of sync.
524  - A given piece of code should only appear once. Do not copy,
525  paste and modify to create new functionality. Instead,
526  figure out a way to reuse the existing code by pulling it
527  into an internal function and adding extra parameters. If
528  you don't, when you find bugs, you won't remember to fix
529  them in all the copies of the code.
530  - There should be exactly one way to represent any particular
531  state. If there is more than one way, anyone who writes
532  library code which uses that type of state has to handle all
533  ways. For %example, there is only one scheme for
534  representing proteins, namely the IMP::atom::Hierarchy.
535  - Each class/method should do exactly one thing. The presence
536  of arguments which dramatically change the behavior of the
537  class/method is a sign that it should be split. Splitting
538  it can make the code simpler, expose the common code for
539  others to use and make it harder to make mistakes by
540  getting the mode flag wrong.
541  - Methods should take at most one argument of each type (and
542  ideally only one argument). If there are several arguments
543  of the same types (eg two different `double` parameters) it is
544  easy for a user to mix up the order of arguments and the compiler will
545  not complain. `int` and `double` count as
546  equivalent types for this rule since the compiler will
547  transparently convert an `int` into a `double`.
548 
549 
550 # Further reading # {#devguide_further_reading}
551 
552 - [Developer tools](\ref dev_tools)
553 - [Developer FAQ](http://github.com/salilab/imp/wiki/FAQ-for-developers)
554 - [Internals](http://github.com/salilab/imp/wiki/Internals).