---------- Forwarded message ----------
From: Riccardo Pellarin <pellarin.riccardo@gmail.com>
Date: Mon, Aug 6, 2012 at 12:05 AM
Subject: Re: on documentation
To: Daniel Russel <drussel@gmail.com>

Hi Guys,

I would like to share my thoughts on the IMP documentation, maybe repeating
what we've already said. I think it is important, though, to share our experience.

Let's suppose I want to fit two structures and calculate the Calpha-rmsd,
a very simple task.

I typed RMSD into the IMP manual search field and got 36 entries.

1st problem: the entry titles are uninformative unless you know exactly
what each module is supposed to do (statistics, atom, multifit, etc.).

Knowing a little bit of IMP, I could filter the entries and remove all classes belonging
to the multifit and em modules, for instance. Let's take the first seven entries that
might do what I want:

Member IMP::statistics::ConfigurationSetRMSDMetric::ConfigurationSetRMSDMetric
class IMP::atom::RMSDCalculator
Member IMP::atom::RMSDCalculator::RMSDCalculator
Member IMP::atom::RMSDCalculator::RMSDCalculators
class IMP::statistics::ConfigurationSetRMSDMetric
Member IMP::atom::get_pairwise_rmsd_score
IMP::atom::get_rmsd

2nd problem: I see a lot of redundancy in the list, and a lot of confusion:
classes and members are mixed together... why is that? Wouldn't it be cleaner
to separate them into two different lists?

Now, let's clean up the list a little. My eyes land on these candidates:

class IMP::atom::RMSDCalculator
class IMP::statistics::ConfigurationSetRMSDMetric
Member IMP::atom::get_pairwise_rmsd_score
IMP::atom::get_rmsd

3rd problem: there is no single function that does a simple task
like an RMSD calculation; there are many, with different flavors...
Probably many people implemented the same thing many times
because they didn't understand what had been implemented before?

Let's have a look at the functions and see if they do what I want...
(as a side note, reading the documentation of IMP functions,
I really would like to leave notes on many of them....)

Let's start with IMP::atom::RMSDCalculator

Detailed Description

Fast rmsd calculation. Used to calculate rmsd between multiple 
transformation that operate on the same particles


Well, that is not detailed.
What is a "fast rmsd"? No structural fitting, I guess? What is the "rmsd between multiple
transformations"? Maybe rigid body transformations? I start to doubt that this rmsd is
calculated between particles at all...
Let's try to rewrite it. This is what I would like to read:

Short Description: Calculates the rmsd of a list of particles.

Detailed Description: Calculates the root mean square deviation (rmsd) of particles
subjected to rigid-body transformations. The rmsd calculation does
not perform structural best-fit alignment.
Usage: 
1) construct the class using a list of particles:
RMSDCalculator(particles)
2) get the rmsd using the method get_rmsd(trans3D1, trans3D2),
where trans3D1 and trans3D2 are rigid body transformations of the
reference and displaced configurations, respectively.
Simple Example: ....
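
To make the "no best-fit alignment" point concrete, here is a minimal sketch in plain
Python of what I understand such a calculation to be. It is not IMP code: apply_transform,
rmsd_no_fit, and the (rotation, translation) tuple layout are names I made up for
illustration, and I am only assuming that RMSDCalculator behaves this way.

```python
import math


def apply_transform(rotation, translation, point):
    """Apply a rigid-body transformation: rotate with a 3x3 matrix, then translate."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def rmsd_no_fit(points, transform_a, transform_b):
    """RMSD between two transformed copies of the same points.

    No superposition is performed: the two copies are compared exactly
    where the transformations put them.
    """
    if not points:
        raise ValueError("need at least one point")
    rot_a, trans_a = transform_a
    rot_b, trans_b = transform_b
    total = 0.0
    for p in points:
        a = apply_transform(rot_a, trans_a, p)
        b = apply_transform(rot_b, trans_b, p)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(points))


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Hypothetical input: three points and two rigid-body placements of them.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reference = (IDENTITY, (0.0, 0.0, 0.0))
shifted = (IDENTITY, (3.0, 4.0, 0.0))  # pure translation by a vector of length 5

print(rmsd_no_fit(points, reference, shifted))  # prints 5.0
```

A fitted RMSD would report 0 here, because a pure translation can be superposed away.
That difference is exactly what the description should state up front.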

It would be cool if the short description appeared in the search
page, along with the class name.

Let's go to the second candidate: IMP::statistics::ConfigurationSetRMSDMetric

Detailed Description

Compute the RMSD between specified sets of particles in pairs of configurations, within a configuration set

This is even more cryptic. Maybe:

Calculates the RMSD of a list of particles between all possible pairs of configurations in a "configuration set", which is....

Strangely, this class has no get_rmsd() method, only a get_distance() method....
Is that the same?
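
My guess is that get_distance(i, j) is simply the RMSD between configurations i and j,
exposed under a generic "metric" name. Here is a minimal sketch of that reading in plain
Python. PairwiseRMSDMetric and its methods are hypothetical stand-ins that I wrote for
illustration, not the IMP class, and the coordinates are made up.

```python
import math


def rmsd(config_a, config_b):
    """RMSD between two configurations given as equal-length lists of (x, y, z)."""
    if len(config_a) != len(config_b) or not config_a:
        raise ValueError("configurations must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2
        for pa, pb in zip(config_a, config_b)
        for xa, xb in zip(pa, pb)
    )
    return math.sqrt(total / len(config_a))


class PairwiseRMSDMetric:
    """Hypothetical metric over a set of configurations (not the IMP API)."""

    def __init__(self, configurations):
        self._configs = list(configurations)

    def get_number_of_items(self):
        return len(self._configs)

    def get_distance(self, i, j):
        # Under my assumption, the "distance" is just the RMSD of the pair (i, j).
        return rmsd(self._configs[i], self._configs[j])


# Three made-up configurations of the same two particles.
configs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # both particles shifted by 1 along z
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],  # only the second particle moved
]
metric = PairwiseRMSDMetric(configs)
n = metric.get_number_of_items()
for i in range(n):
    print([round(metric.get_distance(i, j), 3) for j in range(n)])
```

If this reading is right, one sentence in the class description ("the distance between
two configurations is their RMSD") would remove the doubt.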

Let's go to another example: IMP::atom::get_pairwise_rmsd_score
The measure quantifies the RMSD between the relative placements of two components compared to a reference relative placement. First, the two compared structures are brought into the same frame of reference by superposing the first pair of equivalent domains (ref1 and mdl1). Next, the RMSD is calculated for the second component

What are the components? Maybe subunits? What are the domains? Why is the function called rmsd_score? Is that different from the rmsd?

OK, I could go on like this for almost every function and method in IMP.

In the end, I'm completely unsure which function I should use
for my task.... they all look the same.

Here's my proposal: the documentation of every function must have these entries:

Short Description: (appears in the search page)
Detailed Description: 
[Algorithm Description: in some cases]
Usage: 
Simple Example: 

The developer might leave these fields empty, of course.
When I search for something, the first entries should be the
ones that are most relevant and best documented.
Or maybe the search page should separate Documented and Undocumented results
(where Undocumented means a function that lacks a long documentation page).
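
As a rough illustration of how such a ranking could work, here is a minimal sketch in
plain Python. Everything in it is an assumption on my side: the entry names and
documentation strings are made up, and parse_fields, documentation_score, is_documented,
and rank_results are hypothetical helpers, not part of IMP or of the manual's search tool.

```python
FIELDS = (
    "Short Description",
    "Detailed Description",
    "Algorithm Description",
    "Usage",
    "Simple Example",
)


def parse_fields(doc):
    """Split a documentation string into {field name: text} for the proposed fields."""
    fields = {name: "" for name in FIELDS}
    current = None
    for line in doc.splitlines():
        stripped = line.strip()
        for name in FIELDS:
            prefix = name + ":"
            if stripped.startswith(prefix):
                current = name
                stripped = stripped[len(prefix):].strip()
                break
        # Text before the first recognized field is ignored.
        if current is not None and stripped:
            fields[current] = (fields[current] + " " + stripped).strip()
    return fields


def documentation_score(doc):
    """Number of proposed fields that are actually filled in."""
    return sum(1 for text in parse_fields(doc).values() if text)


def is_documented(doc):
    """An entry counts as Documented only if it has a detailed description."""
    return bool(parse_fields(doc)["Detailed Description"])


def rank_results(results):
    """Order entry names so the best documented come first (ties keep input order)."""
    return sorted(results, key=lambda name: -documentation_score(results[name]))


# Made-up search results: entry name -> documentation text.
results = {
    "entry_stub": "Short Description:\nDetailed Description:\n",
    "entry_partial": "Short Description: Pairwise RMSD.\n",
    "entry_full": (
        "Short Description: Calculates the rmsd of a list of particles.\n"
        "Detailed Description: No best-fit alignment is performed.\n"
        "Usage: construct from particles, then call get_rmsd.\n"
    ),
}

for name in rank_results(results):
    label = "Documented" if is_documented(results[name]) else "Undocumented"
    print(name, documentation_score(results[name]), label)
```

Counting filled fields is the crudest possible score, but even that would push empty
stubs to the bottom of the list.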

Of course we cannot force people to write comprehensive documentation,
but at least we can give the user the option of choosing the functions that
are better documented. That will be bad for developers who write
undocumented code, since their code will never be used by anybody else.
As a user, I will be skeptical about using something whose documentation fields
are empty!

Sorry, that was long. I hope to hear your feedback.