IMP  2.1.0
The Integrative Modeling Platform
developer_guide.md
1 # Developer Guide #
2 
3 # Developing with IMP # {#devguide}
4 [TOC]
5 
6 This page presents instructions on how to develop code using
7 IMP. Developers should also read [Getting started as a developer](https://github.com/salilab/imp/wiki/Getting-started-as-a-developer).
8 
9 # Getting around IMP # {#devguide_getting_around}
10 
11 The input files in the IMP directory are structured as follows:
12 - `tools` contains various command line utilities for use by developers. They
13  are [documented below](#devguide_scripts).
14 - `doc` contains inputs for general IMP overview documentation (such as this
15  page), as well as configuration scripts for `doxygen`.
16 - `applications` contains various applications implemented using a variety of
17  IMP modules.
18 - each subdirectory of `modules/` defines a module; they all have the same
19  structure. The directory for module `name` has
20  the following structure
21  - `README.md` contains a module overview
22  - `include` contains the C++ header files
23  - `src` contains the C++ source files
24  - `bin` contains C++ source files each of which is built into an executable
25  - `pyext` contains files defining the Python interface to the module as well
26  as Python source files (in `pyext/src`)
27  - `test` contains test files, that can be run with `ctest`.
28  - `doc` contains additional documentation that is provided via `.dox` files
29  - `examples` contains examples in Python and C++, as well as any data needed
30  for examples
31  - `data` contains any data files needed by the module
32 
33 When IMP is built, a number of directories are created in the build directory. They are
34  - `include` which includes all the headers. The headers for module `name` are
35  placed in `include/IMP/name`
36  - `lib` where the C++ and Python libraries are placed. Module `name` is built
37  into a C++ library `lib/libimp_name.so` (or `.dylib` on a Mac) and a Python
38  library with Python files located in `lib/IMP/name` and the binary part in
39  `lib/_IMP_name.so`.
40  - `doc` where the html documentation is placed in `doc/html` and the examples
41  in `doc/examples` with a subdirectory for each module
42  - `data` where each module gets a subdirectory for its data.
43 
44 When IMP is installed, the structure from the `build` directory is
45 moved over more or less intact except that the C++ and Python
46 libraries are put in the (different) appropriate locations.
47 
48 
49 # Writing new code # {#devguide_new_code}
50 
51 The easiest way to start writing new functions and classes is to
52 create a new module using [make-module.py](\ref dev_tools_make_module).
53 This creates a new module in the `modules` directory. Alternatively, simply use the
54 `scratch` module.
55 
56 We highly recommend using a revision control system such as
57 [git](http://git-scm.com/) or [svn](http://subversion.tigris.org/) to
58 keep track of changes to your module.
59 
60 If, instead, you choose to add code to an existing module you need to
61 consult with the person or people who control that module. Their names
62 can be found on the module main page.
63 
64 When designing the interface for your new code, you should
65 
66 - search IMP for similar functionality and, if there is any, adapt
67  the existing interface for your purposes. For example, the existing
68  IMP::atom::read_pdb() and IMP::atom::write_pdb() functions provide
69  templates that should be used for the design of any functions that
70  create particles from a file or write particles to a file. Since
71  IMP::atom::Bond, IMP::algebra::Segment3D and
72  IMP::display::Geometry all use methods like
73  IMP::algebra::Segment3D::get_point() to access the
74  endpoints of a segment, any new object which defines similar
75  point-based geometry should do likewise.
76 
77 - think about how other people are likely to use the code. For
78  example, not all molecular hierarchies have atoms as their leaves,
79  so make sure your code searches for arbitrary
80  IMP::core::XYZ particles rather than atoms if you only care
81  about the geometry.
82 
83 - look for easy ways of splitting the functionality into pieces. It
84  generally makes sense, for %example, to split selection of the
85  particles from the action taken on them, by accepting either an
86  IMP::kernel::Refiner, an IMP::kernel::SingletonContainer or just an arbitrary
87  IMP::kernel::ParticleIndexes object.
88 
89 
90 You may want to read [the design example](\ref designexample) for
91 some suggestions on how to go about implementing your functionality
92 in IMP.
93 
94 ## Coding conventions ## {#devguide_conventions}
95 
96 Make sure you read the [API Conventions](\ref introduction_conventions) page
97 first.
98 
99 To ensure code consistency and readability, certain conventions
100 must be adhered to when writing code for IMP. Some of these
101 conventions are automatically checked for by source control before
102 allowing a new commit, and you can also check new code yourself
103 by running [check_standards.py](#devguide_check_standards) `files_to_check`.
104 
105 ### Indentation ### {#devguide_indentation}
106 
107 All C++ headers and code should be indented with 2-space indents. Do not use
108 tabs. [clang-format](\ref dev_tools_clang_format) can help you do this formatting
109 automatically.
110 
111 All Python code should conform to the [Python style
112 guide](http://www.python.org/dev/peps/pep-0008/). In essence this
113 translates to 4-space indents, no tabs, and similar class, method and
114 variable naming to the C++ code. You can ensure that your Python code
115 is correctly indented by using the `tools/reindent.py` script,
116 available as part of the IMP distribution.
117 
118 ### Names ### {#devguide_names}
119 
120 See the [introduction](\ref introduction_names) first. In addition, developers
121 should be aware that
122 - all preprocessor symbols must begin with `IMP`.
123 - files that implement a single class should be named for that
124  class; for example the `SpecialVector` class could be implemented in
125  `SpecialVector.h` and `SpecialVector.cpp`
126 - files that provide free functions or macros should be given names
127  `separated_by_underscores`, for example `container_macros.h`
128 - Functions which take a parameter which has units should have the
129  unit as part of the function name, for %example
130  IMP::atom::SimulationParameters::set_maximum_time_step_in_femtoseconds().
131  Remember the Mars orbiter. The exception to this is distance and
132  force numbers which should always be in angstroms and kcal/mol
133  angstrom respectively unless otherwise stated.
134 
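As a minimal sketch of the unit-in-name convention, consider the class below. It is a hypothetical stand-in written for this page, not the real IMP::atom::SimulationParameters; the class name `SketchSimulationParameters` and its validation rule are assumptions made for illustration.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a parameters class; not part of IMP.
class SketchSimulationParameters {
  double maximum_time_step_fs_;  // always stored in femtoseconds

 public:
  SketchSimulationParameters() : maximum_time_step_fs_(1.0) {}

  // The unit is part of the name, so the expected unit is visible at
  // every call site.
  void set_maximum_time_step_in_femtoseconds(double ts) {
    if (ts <= 0.0) throw std::invalid_argument("time step must be positive");
    maximum_time_step_fs_ = ts;
  }

  double get_maximum_time_step_in_femtoseconds() const {
    return maximum_time_step_fs_;
  }
};
```

A caller then writes `p.set_maximum_time_step_in_femtoseconds(2.0)`, which leaves little room for guessing whether the value is in femtoseconds or picoseconds.
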
135 ### Passing and storing data ### {#devguide_passing}
136 
137 - When a class or function takes a set of particles which are expected to
138  be those of a particular type of decorator, it should take a list of
139  decorators instead; e.g. IMP::core::transform() takes an IMP::core::XYZ.
140  This makes it clearer what attributes the particle is required to have
141  and allows functions to be overloaded (so there can be an
142  IMP::core::transform() which takes IMP::core::RigidBody particles instead).
143 
144 
145 - IMP::Restraint and IMP::ScoreState classes should generally use an
146  IMP::SingletonContainer (or other type of Container) to store the set of
147  IMP::Particle objects that they act on.
148 
149 - Store collections of IMP::Object-derived
150  objects of type `Name` using a `Names`. Declare functions that
151  accept them to take a `NamesTemp` (`Names` is a `NamesTemp`).
152  `Names` are reference counted (see IMP::RefCounted for details),
153  `NamesTemp` are not. Store collections of particles using a
154  `Particles` object, rather than decorators.
155 
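The first point above, taking decorators rather than bare particles so that functions can be overloaded, can be sketched as follows. All types here (`SketchParticle`, `SketchXYZ`, `SketchRigidBody`) and the function `shift_x` are simplified, hypothetical stand-ins and do not reproduce the real IMP decorators.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-ins; none of these are IMP classes.
struct SketchParticle {
  double x, y, z;
};

// Decorator-like wrapper: states that the particle has coordinates.
class SketchXYZ {
  SketchParticle *particle_;

 public:
  explicit SketchXYZ(SketchParticle *particle) : particle_(particle) {}
  void translate_x(double dx) const { particle_->x += dx; }
};

// Decorator-like wrapper: states that the particles move together.
class SketchRigidBody {
  std::vector<SketchParticle *> members_;

 public:
  explicit SketchRigidBody(const std::vector<SketchParticle *> &members)
      : members_(members) {}
  const std::vector<SketchParticle *> &get_members() const { return members_; }
};

// Taking a decorator documents the requirement in the signature.
inline void shift_x(SketchXYZ d, double dx) { d.translate_x(dx); }

// Overload selected by the decorator type: moves every member together.
inline void shift_x(const SketchRigidBody &body, double dx) {
  const std::vector<SketchParticle *> &members = body.get_members();
  for (std::size_t i = 0; i < members.size(); ++i) {
    SketchXYZ(members[i]).translate_x(dx);
  }
}
```

Had both functions taken a bare particle pointer, the two overloads could not coexist and the caller would have no hint about which attributes are required.
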
156 ### Display ### {#devguide_display}
157 
158 All values must have a `show` method which takes an optional
159 `std::ostream` and prints information about the object (see
160 IMP::base::Array::show() for an example). Add a `write` method if you
161 want to provide output that can be read back in.
162 
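A minimal sketch of a value type with `show` and `write` methods might look like the following. `SketchInterval` and the helper `get_shown` are hypothetical names invented for this illustration, and the output formats are arbitrary choices.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical value type used only for illustration; not part of IMP.
class SketchInterval {
  double lower_, upper_;

 public:
  SketchInterval(double lower, double upper) : lower_(lower), upper_(upper) {}

  // Human-readable description; the stream is optional and defaults to
  // standard output.
  void show(std::ostream &out = std::cout) const {
    out << "[" << lower_ << ", " << upper_ << "]";
  }

  // Plain form that a matching read function could parse back in.
  void write(std::ostream &out) const { out << lower_ << " " << upper_; }
};

// Convenience helper: capture the show() output in a string.
inline std::string get_shown(const SketchInterval &v) {
  std::ostringstream oss;
  v.show(oss);
  return oss.str();
}
```

Keeping `show` for people and `write` for round-tripping means the human-readable format can change without breaking files that were written earlier.
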
163 ### Errors ### {#devguide_errors}
164 
165 Classes and methods should use IMP exceptions to report errors. See
166 IMP::base::Exception for a list of existing exceptions. See
167 [checks](base_2exception_8h.html) for more information.
168 
169 ### Namespaces ### {#devguide_namespace}
170 
171 Use the provided `IMPMODULE_BEGIN_NAMESPACE`,
172 `IMPMODULE_END_NAMESPACE`, `IMPMODULE_BEGIN_INTERNAL_NAMESPACE` and
173 `IMPMODULE_END_INTERNAL_NAMESPACE` macros to put declarations in a
174 namespace appropriate for module `MODULE`.
175 
176 Each module has an internal namespace, e.g. `IMP::base::internal`, and an internal
177 include directory `IMP/base/internal`. Any function which is
178  - not intended to be part of the API,
179  - not documented,
180  - liable to change without notice,
181  - or not tested
182 
183 should be declared in an internal header and placed in the internal namespace.
184 
185 The functionality in such internal headers is
186  - not exported to Python
187  - and not part of the documented API
188 
189 As a result, such functions do not need to obey all the coding conventions
190 (but we recommend that they do).
191 
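To make the role of the namespace macros concrete, here is a sketch of how such macros could expand. The `SKETCH_*` macros and the `sketchlib::sketch` namespace are hypothetical; the real macros are generated per module, begin with `IMP`, and may expand differently.

```cpp
// Hypothetical macros for a module called "sketch"; the real macros are
// provided per module by IMP and may expand differently.
#define SKETCH_BEGIN_NAMESPACE \
  namespace sketchlib {        \
  namespace sketch {
#define SKETCH_END_NAMESPACE \
  }                          \
  }
#define SKETCH_BEGIN_INTERNAL_NAMESPACE \
  SKETCH_BEGIN_NAMESPACE                \
  namespace internal {
#define SKETCH_END_INTERNAL_NAMESPACE \
  }                                   \
  SKETCH_END_NAMESPACE

SKETCH_BEGIN_INTERNAL_NAMESPACE
// Helper that is not part of the public API: undocumented, may change.
inline double get_squared(double x) { return x * x; }
SKETCH_END_INTERNAL_NAMESPACE

SKETCH_BEGIN_NAMESPACE
// Public function; it may freely call into the internal namespace.
inline double get_squared_distance(double a, double b) {
  return internal::get_squared(a - b);
}
SKETCH_END_NAMESPACE
```

Because the namespace names appear only inside the macros, every header in a module opens and closes exactly the same namespaces.
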
192 
193 ## Documenting your code ## {#devguide_documenting}
194 
195 IMP is documented using `doxygen`. See
196 [Documenting your code in doxygen](http://www.doxygen.nl/docblocks.html)
197 to get started. We use `//!` and `/** ... */` blocks for documentation.
198 You are encouraged to use Doxygen's
199 [markdown support](http://www.stack.nl/~dimitri/doxygen/manual/markdown.html) as much as possible.
200 
201 Python code should provide Python doc strings.
202 
203 All headers not in internal directories are parsed through
204 `doxygen`. To hide any function that you do not want documented (for example,
205 because it is not well tested), surround it with
206 
207  \#ifndef IMP_DOXYGEN
208  void messy_poorly_thought_out_function();
209  \#endif
210 
211 We provide a number of extra Doxygen commands to aid in producing nice
212 IMP documentation.
213 
214 - To mark that some part of the API has not yet been well planned and may change,
215  use `\\unstable{Classname}`. The documentation will include a disclaimer
216  and the class or function will be added to a list of unstable classes. It is
217  better to simply hide such things from `doxygen`.
218 
219 - To mark a method as not having been well tested yet, use `\\untested{Classname}`.
220 
221 - To mark a method as not having been implemented, use `\\unimplemented{Classname}`.
222 
223 ## Debugging and testing your code ## {#devguide_testing}
224 
225 Ensuring that your code is correct can be very difficult, so IMP
226 provides a number of tools to help you out.
227 
228 The first set are assert-style macros:
229 
230 - IMP_USAGE_CHECK() which should be used to check that arguments to
231  functions and methods satisfy the preconditions.
232 
233 - IMP_INTERNAL_CHECK() which should be used to verify internal state
234  and return values to make sure they satisfy pre and post-conditions.
235 
236 See the [checks](base_2exception_8h.html) page for more details. As a
237 general guideline, any improper usage should produce at least a warning, and
238 all return values should be checked by such code.
239 
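The division of labor between the two kinds of checks can be sketched as below. The `SKETCH_USAGE_CHECK` and `SKETCH_INTERNAL_CHECK` macros are hypothetical and only mimic the roles described above; they are not the IMP macros, whose arguments and behavior should be taken from the checks page.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical macros written for this sketch; they only mimic the roles of
// the usage and internal checks and are not IMP's macros.
#define SKETCH_USAGE_CHECK(expr, message)              \
  do {                                                 \
    if (!(expr)) throw std::invalid_argument(message); \
  } while (false)
#define SKETCH_INTERNAL_CHECK(expr, message)       \
  do {                                             \
    if (!(expr)) throw std::logic_error(message);  \
  } while (false)

// Return the largest element of a non-empty vector.
inline double get_maximum(const std::vector<double> &values) {
  // Precondition on the caller's arguments: a usage check.
  SKETCH_USAGE_CHECK(!values.empty(), "values must not be empty");
  double best = values[0];
  for (std::size_t i = 1; i < values.size(); ++i) {
    if (values[i] > best) best = values[i];
  }
  // Postcondition on our own result: an internal check.
  for (std::size_t i = 0; i < values.size(); ++i) {
    SKETCH_INTERNAL_CHECK(best >= values[i], "maximum is not maximal");
  }
  return best;
}
```

A failed usage check points at the caller, while a failed internal check points at a bug in the function itself, which is why the two are kept separate.
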
240 The second set is logging macros such as:
241 
242 - IMP_LOG() which allows controlled display of messages about what the
243  code is doing. See [logging](base_2log_8h.html) for more information.
244 
245 Finally, each module has a set of unit tests. The
246 tests are located in the `modules/modulename/test` directory.
247 These tests should try, as much as possible, to provide independent
248 verification of the correctness of the code. Any
249 file in that directory or a subdirectory whose name matches `test_*.{py,cpp}`,
250 `medium_test_*.{py,cpp}` or `expensive_test_*.{py,cpp}` is considered a test.
251 Normal tests should run in at most a few seconds on a typical machine, medium
252 tests in 10 seconds or so and expensive tests in a couple of minutes.
253 
254 Some tests will require input files or temporary files. Input files
255 should be placed in a directory called `input` in the `test`
256 directory. The test script should then call
257 `self.get_input_file_name(file_name)` to get the true path to
258 the file. Likewise, appropriate names for temporary files should be
259 found by calling
260 `self.get_tmp_file_name(file_name)`. Temporary files will be
261 located in `build/tmp`. The test should remove temporary files after
262 using them.
263 
264 ## Writing Examples ## {#devguide_examples}
265 
266 Writing examples is a very important part of being an IMP developer and
267 one of the best ways to help people use your code. To write a (Python)
268 example, create a file `myexample.py` in the `examples` directory of an
269 appropriate module, along with a file `myexample.readme`. The readme
270 should provide a brief overview of what the code in the module is
271 trying to accomplish as well as key pieces of IMP functionality that
272 it uses.
273 
274 When writing examples, one should try (as appropriate) to do the following:
275 - begin the example with `import` lines for the IMP modules used
276 - have parameters describing the process taking place. These include names of
277  PDB files, the resolution at which to perform computations, etc.
278 - define a function `create_representation` which creates and returns the model
279  with the needed particles along with a data structure so that key
280  particles can be located. It should define nested functions as
281  needed to encapsulate commonly used code
282 - define a function `create_restraints` which creates the restraints to score
283  conformations of the representation
284 - define a function `get_conformations` to perform the sampling
285 - define a function `analyze_conformations` to perform some sort of clustering
286  and analysis of the resulting conformations
287 - finally do the actual work of calling the `create_representation` and
288  `create_restraints` functions and performing sampling and analysis and
289  displaying the solutions.
290 
291 Obviously, not all examples need all of the above parts.
292 
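The overall shape suggested above can be sketched with a toy problem. Everything below is hypothetical and uses no IMP functionality: a "conformation" is just a list of 1D coordinates, `get_score` stands in for the restraints, and the parameter values are arbitrary. A real example would normally be written in Python against the IMP modules it demonstrates.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Toy stand-in: a conformation is a list of 1D coordinates.
typedef std::vector<double> Conformation;

// Parameters describing the process, gathered in one place.
const double target_distance = 3.0;
const unsigned int number_of_steps = 200;
const unsigned int random_seed = 42;

// Create the representation (here: two points on a line).
inline Conformation create_representation() {
  Conformation c;
  c.push_back(0.0);
  c.push_back(10.0);
  return c;
}

// Score a conformation; stands in for the restraints.
inline double get_score(const Conformation &c) {
  const double d = c[1] - c[0] - target_distance;
  return d * d;
}

// Sample: propose random moves and keep those that do not worsen the score.
inline std::vector<Conformation> get_conformations(const Conformation &start) {
  std::mt19937 rng(random_seed);
  std::uniform_real_distribution<double> step(-0.5, 0.5);
  std::vector<Conformation> accepted(1, start);
  for (unsigned int i = 0; i < number_of_steps; ++i) {
    Conformation trial = accepted.back();
    trial[1] += step(rng);
    if (get_score(trial) <= get_score(accepted.back())) {
      accepted.push_back(trial);
    }
  }
  return accepted;
}

// Analyze: return the best-scoring conformation that was found.
inline Conformation analyze_conformations(
    const std::vector<Conformation> &conformations) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < conformations.size(); ++i) {
    if (get_score(conformations[i]) < get_score(conformations[best])) best = i;
  }
  return conformations[best];
}

// Do the actual work by calling the pieces in order.
inline Conformation run_example() {
  return analyze_conformations(get_conformations(create_representation()));
}
```

Splitting the work this way lets a reader replace one stage, for instance the sampling, without having to understand the others.
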
293 The example should have enough comments that the reasoning behind each line of code is clear to someone who roughly understands how IMP in general works.
294 
295 Examples must use methods like IMP::base::get_example_data() to access
296 data in the example directory. This allows them to be run from
297 anywhere.
298 
299 
300 ## Exporting code to Python ## {#devguide_swig}
301 
302 IMP uses SWIG to wrap C++ code and export it to Python. Since SWIG is
303 relatively complicated, we provide a number of helper macros and an example
304 file (see `modules/example/pyext/swig.i-in`). The key bits are
305 - the information goes into a file called `swig.i-in` in the module `pyext` directory
306 - the first part should be one `IMP_SWIG_VALUE()`, `IMP_SWIG_OBJECT()` or
307  `IMP_SWIG_DECORATOR()` line per value type, object type or decorator object
308  the module exports to Python. Each of these lines looks like
309 
310  IMP_SWIG_VALUE(IMP::module_namespace, ClassName, ClassNames);
311 
312 - then there should be a number of `%include` lines, one per header file
313  in the module which exports a class or function to Python. The header files
314  must be in order such that no class is used before a declaration for it
315  is encountered (SWIG does not do recursive inclusion)
316 - finally, any templates that are to be exported to SWIG must have a
317  `%template` call. It should look something like
318 
319  namespace IMP {
320  namespace module_namespace {
321  %template(PythonName) CPPName<Restraint, 3>;
322  }
323  }
324 
325 
326 
327 # Managing your own module # {#devguide_module}
328 
329 When there is a significant group of new functionality, a new set of
330 authors, or code that is dependent on a new external dependency, it is
331 probably a good idea to put that code in its own module. To create a
332 new module, run the [make-module.py](\ref dev_tools_make_module) script
333 from the main IMP source directory, passing the name of your new
334 module. The module name should consist of lower case characters and
335 numbers and the name should not start with a number. In addition the
336 name "local" is special and is reserved for modules that are internal
337 to code for handling a particular biological system or application. For example:
338 
339  ./tools/make-module.py mymodule
340 
341 The next step is to update the information about the module stored in
342 `modules/mymodule/README.md`. This includes the names of the authors and
343 descriptions of what the module is supposed to do.
344 
345 If the module makes use of external libraries, see the files `modules/base/dependencies.py` and `modules/base/dependency/Log4CXX.description`
346 for examples.
347 
348 Each module has an auto-generated header called `modulename_config.h`.
349 This header contains basic definitions needed for the module and
350 should be included (first) in each header file in the module. In
351 addition, there is a header `module_version.h` which contains the
352 version info as preprocessor symbols. This should not be included in
353 module headers or cpp files as doing so will force frequent
354 recompilations.
355 
356 
357 
358 
359 # Contributing code back to the repository # {#devguide_contributing}
360 
361 In order to be shared with others as part of the IMP distribution,
362 code needs to be of higher quality and more thoroughly vetted than
363 typical research code. As a result, it may make sense to keep the
364 code as part of a private module until you better understand what
365 capabilities can be cleanly offered to others.
366 
367 The first set of questions to answer are
368 
369 - What exactly is the functionality I would like to contribute? Is
370  it a single function, a single Restraint, a set of related classes
371  and functions?
372 
373 - Is there similar functionality already in IMP? If so, it might make
374  more sense to modify the existing code in cooperation with its
375  author. At the very least, the new code needs to respect the
376  conventions established by the prior code in order to maintain
377  consistency.
378 
379 - Where should the new functionality go? It can either be added to an
380  existing module or as part of a new module. If adding to an existing
381  module, you must communicate with the authors of that module to get
382  permission and coordinate changes.
383 
384 - Should the functionality be written in C++ or Python? In general, we
385  suggest C++ if you are comfortable programming in that language as
386  that makes the functionality available to more people.
387 
388 You are encouraged to post to the
389 `imp-dev` list to find help
390 answering these questions as it can be hard to grasp all the various
391 pieces of functionality already in the repository.
392 
393 All code contributed to IMP
394 - must follow the [IMP coding conventions](#devguide_conventions)
395 - should follow general good [C++ programming practices](#devguide_cpp)
396 - must have unit tests
397 - must pass all unit tests
398 - must have documentation
399 - must build on all supported compilers (roughly, recent versions of `gcc`,
400  `clang++` and `Visual C++`) without warnings
401 - should have examples
402 - must not have warnings when its doc is built
403 
404 See [getting started as a developer](https://github.com/salilab/imp/wiki/Getting-started-as-a-developer) for more information on submitting code.
405 
406 ## Once you have submitted code ## {#devguide_supporting}
407 
408 Once you have submitted code, you should monitor the [Nightly build
409 status](http://integrativemodeling.org/nightly/results/) to make sure that
410 your code builds on all platforms and passes the unit tests. Please
411 fix all build problems as fast as possible.
412 
413 In addition to monitoring the `imp-dev` list, developers who have a module or
414 are committing patches to svn may want to subscribe to the `imp-commits` email
415 list which receives notices of all changes made to the IMP repository.
416 
417 
418 ## Cross platform compatibility ## {#devguide_cross_platform}
419 
420 IMP is designed to run on a wide variety of platforms. To detect problems on
421 other platforms
422 we provide nightly test runs on the supported
423 platforms for code that is part of the IMP repository.
424 
425 In order to make it more likely that your code works on all the supported platforms:
426 - use the headers and classes in IMP::compatibility when appropriate
427 - avoid the use of `and` and `or` in C++ code, use `&&` and `||` instead.
428 - avoid `friend` declarations involving templates, use the preprocessor,
429  conditionally on the symbols `SWIG` and `IMP_DOXYGEN` to hide code as
430  needed instead.
431 
432 ### C++ 11 ### {#devguide_cxx11}
433 IMP now turns on C++ 11 support when it can. However, since compilers
434 are still quite variable in which C++ 11 features they support, it is
435 not advisable to use them directly in IMP code at this point. To aid
436 in their use when practical we provide several helper macros:
437 - IMP_OVERRIDE inserts the `override` keyword when available
438 - IMP_FINAL inserts the `final` keyword when available
439 
440 More will come.
441 
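One way such helper macros could be defined is sketched below. The names `SKETCH_OVERRIDE` and `SKETCH_FINAL` and the `__cplusplus` test are assumptions made for illustration; the real IMP_OVERRIDE and IMP_FINAL are defined by IMP's own headers and may detect compiler support differently.

```cpp
// Hypothetical macro names; the real IMP_OVERRIDE and IMP_FINAL are defined
// by IMP's own headers and may use a different detection scheme.
#if __cplusplus >= 201103L
#define SKETCH_OVERRIDE override
#define SKETCH_FINAL final
#else
#define SKETCH_OVERRIDE
#define SKETCH_FINAL
#endif

class SketchShape {
 public:
  virtual ~SketchShape() {}
  virtual int get_number_of_corners() const { return 0; }
};

class SketchTriangle SKETCH_FINAL : public SketchShape {
 public:
  // With C++ 11 enabled, a mismatch with the base class signature becomes a
  // compile error; without it, the macro expands to nothing.
  int get_number_of_corners() const SKETCH_OVERRIDE { return 3; }
};
```

The same source therefore compiles on older compilers while newer ones get the extra checking.
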
442 # Good programming practices # {#devguide_cpp}
443 
444 Two excellent sources for general C++ coding guidelines are
445 
446 - [C++ Coding Standards](http://www.amazon.com/Coding-Standards-Rules-Guidelines-Practices/dp/0321113586) by Sutter and Alexandrescu
447 
448 - [Effective C++](http://www.amazon.com/Effective-Specific-Addison-Wesley-Professional-Computing/dp/0201924889) by Meyers
449 
450 IMP endeavors to follow all of the guidelines published in those
451 books. The Sali lab owns copies of both of these books that you
452 are free to borrow.
453 
454 
455 # IMP gotchas # {#devguide_gotchas}
456 
457 Below are suggestions prompted by bugs found in code submitted to IMP.
458 
459 - Never use `using namespace` outside of a function; instead
460  explicitly provide the namespace. (This avoids namespace pollution, and
461  removes any ambiguity.)
462 
463 - Never use the preprocessor to define constants. Use `const`
464  variables instead. Preprocessor symbols don't have scope or type
465  and so can have unexpected effects.
466 
467 - Don't expect IMP::base::Object::get_name() names to be unique; they
468  are there for human viewing. If you need a unique identifier
469  associated with an object or non-geometric value, just use the
470  object or value itself.
471 
472 - Pass other objects by value or by `const` & (if the object is
473  large) and store copies of them.
474 
475 - Never expose member variables in an object which has
476  methods. All such member variables should be private.
477 
478 - Don't derive a class from another class simply to reuse some
479  code that the base class provides - only do so if your derived
480  class could make sense when cast to the base class. Otherwise,
481  reuse existing code by pulling it into a function.
482 
483 - Clearly mark any file that is created by a script so that other
484  people know to edit the original file.
485 
486 - Always return a `const` value or `const` ref if you are not
487  providing write access. Returning a `const` copy means the
488  compiler will report an error if the caller tries to modify the
489  return value without creating a copy of it.
490 
491 - Include files from the local module first, then files from the
492  other IMP modules and kernel and finally outside includes. This
493  makes any dependencies in your code obvious, and by including
494  standard headers \e after IMP headers, any missing includes in the
495  headers themselves show up early (rather than being masked by
496  other headers you include).
497 
498  #include <IMP/mymodule/mymodule_config.h>
499  #include <IMP/mymodule/MyRestraint.h>
500  #include <IMP/Restraint.h>
501  #include <vector>
502 
503 - Use `double` variables for all computational intermediates.
504 
505 - Avoid using nested classes in the API as SWIG can't wrap them
506  properly. If you must use nested classes, you will have to
507  do more work to provide a Python interface to your code.
508 
509 
510 - Delay initialization of keys until they are actually needed
511  (since all initialized keys take up memory within each particle,
512  more or less). The best way to do this is to have them be static
513  variables in a static function:
514 
515  FloatKey get_my_float_key() {
516  static FloatKey k("hello");
517  return k;
518  }
519 
520 - One is almost always the right number:
521  - Information should be stored in exactly one
522  place. Duplicated information easily gets out of sync.
523  - A given piece of code should only appear once. Do not copy,
524  paste and modify to create new functionality. Instead,
525  figure out a way to reuse the existing code by pulling it
526  into an internal function and adding extra parameters. If
527  you don't, when you find bugs, you won't remember to fix
528  them in all the copies of the code.
529  - There should be exactly one way to represent any particular
530  state. If there is more than one way, anyone who writes
531  library code which uses that type of state has to handle all
532  ways. For %example, there is only one scheme for
533  representing proteins, namely the IMP::atom::Hierarchy.
534  - Each class/method should do exactly one thing. The presence
535  of arguments which dramatically change the behavior of the
536  class/method is a sign that it should be split. Splitting
537  it can make the code simpler, expose the common code for
538  others to use and make it harder to make mistakes by
539  getting the mode flag wrong.
540  - Methods should take at most one argument of each type (and
541  ideally only one argument). If there are several arguments
542  of the same types (eg two different `double` parameters) it is
543  easy for a user to mix up the order of arguments and the compiler will
544  not complain. `int` and `double` count as
545  equivalent types for this rule since the compiler will
546  transparently convert an `int` into a `double`.
547 
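The last suggestion in the list above, at most one argument of each type, can be followed even when two values are naturally of the same type by wrapping each in its own small type. The sketch below uses hypothetical names (`SketchRadius`, `SketchHeight`, `get_cylinder_volume`); it illustrates one possible approach and is not an IMP idiom taken from the source.

```cpp
// Hypothetical wrapper types; the names are invented for this sketch.
struct SketchRadius {
  double value;
  explicit SketchRadius(double v) : value(v) {}
};
struct SketchHeight {
  double value;
  explicit SketchHeight(double v) : value(v) {}
};

// Error-prone: both arguments are double, so swapping them still compiles.
inline double get_cylinder_volume_raw(double radius, double height) {
  return 3.141592653589793 * radius * radius * height;
}

// Safer: each argument has its own type and the constructors are explicit,
// so swapped arguments are rejected by the compiler.
inline double get_cylinder_volume(SketchRadius r, SketchHeight h) {
  return get_cylinder_volume_raw(r.value, h.value);
}
```

A call then reads `get_cylinder_volume(SketchRadius(1.0), SketchHeight(2.0))`, which also documents the meaning of each number at the call site.
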
548 
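The suggestion above about delaying key initialization can be observed with a self-contained sketch. `SketchKey` and `get_key_table` are hypothetical: the global table only mimics the idea that every initialized key has a cost, and says nothing about how IMP keys are actually stored.

```cpp
#include <string>
#include <vector>

// Hypothetical global table in which every new key registers its name.
inline std::vector<std::string> &get_key_table() {
  static std::vector<std::string> table;
  return table;
}

// Hypothetical key type; not IMP's FloatKey.
class SketchKey {
  unsigned int index_;

 public:
  explicit SketchKey(const std::string &name)
      : index_(static_cast<unsigned int>(get_key_table().size())) {
    get_key_table().push_back(name);
  }
  unsigned int get_index() const { return index_; }
};

// The key is created on the first call only, not at program start-up.
inline SketchKey get_my_key() {
  static SketchKey k("hello");
  return k;
}
```

Until `get_my_key()` is called for the first time the table stays empty, and repeated calls return the same key without registering it again.
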
549 # Further reading # {#devguide_further_reading}
550 
551 - [Developer tools](\ref dev_tools)
552 - [Developer FAQ](http://github.com/salilab/imp/wiki/FAQ-for-developers)
553 - [Internals](http://github.com/salilab/imp/wiki/Internals).