Re: [IMP-dev] on documentation

6 Aug 2012


I'd like to amend that a bit. Google search does work rather better, e.g. googling
"site:salilab.org/imp/nightly/doc/html/ compute rmsd between two
hierarchies" does get you to the page with the right function as the top
hit (although it is a very long page; I'm trying to figure out something
for that).
Perhaps we should just drop the doxygen search entirely. Does anyone find
it useful?
For people who don't know it, "site:xxxx" restricts Google to only return
hits whose URL matches the prefix.
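
In case it helps to bookmark or script such searches, here is a minimal sketch of building the restricted query URL. The helper name site_search_url is made up for illustration; the prefix is the one from the search quoted above.

```python
from urllib.parse import urlencode

# Prefix of the nightly IMP docs, taken from the search quoted above.
DOC_PREFIX = "salilab.org/imp/nightly/doc/html/"


def site_search_url(terms, prefix=DOC_PREFIX):
    """Return a Google search URL restricted to pages under `prefix`.

    Hypothetical helper, not part of IMP.
    """
    query = "site:{} {}".format(prefix, terms)
    # urlencode percent-escapes ':' and '/' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = site_search_url("compute rmsd between two hierarchies")
print(url)
```
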
On Mon, Aug 6, 2012 at 11:03 AM, Daniel Russel drussel@gmail.com wrote:
> To add my two cents:
> - the search in doxygen kind of sucks and google definitely isn't better
> on this. There is no good way that I can think of to prioritize search
> results, so I'm not sure where to go to make this aspect better. And,
> unfortunately, as one adds more to the API and documentation, you just get
> more hits in more or less random order that you have to look through.
> Anyone have any good ideas on this? We can try going back to the doxygen
> live search as that may allow one to experiment more interactively with
> search terms (it had severe limitations before, but these may have been
> fixed).
>
> - in general, you need to read the documentation of all the base classes
> of a class and the module before you will understand the class. I think
> this cannot be reasonably avoided. Otherwise content would have to be
> duplicated in many places, which invariably results in it having more
> errors/being even less complete (or requiring a great deal more time for the
> same amount of content). Hopefully something
> like ConfigurationSetRMSDMetric would make more sense in light of
> understanding statistics::Metric. For example, it has no get_rmsd() method
> since it is a specialization of the Metric base class and that defines a
> get_distance() virtual method, so having a get_rmsd() method would be
> useless where it is supposed to be used.
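
To make the base-class point concrete, here is a rough Python sketch. The classes below are stand-ins written for this mail, not IMP's actual statistics::Metric or ConfigurationSetRMSDMetric; only the get_distance() name comes from the discussion above. Generic code is written against the base class, so a get_rmsd() method on the subclass would never be called by it.

```python
import math
from abc import ABC, abstractmethod


class Metric(ABC):
    """Stand-in for a generic metric base class (not IMP's actual API)."""

    @abstractmethod
    def get_distance(self, i, j):
        """Return the distance between items i and j."""


class RMSDMetric(Metric):
    """Hypothetical specialization: items are stored configurations."""

    def __init__(self, configurations):
        # Each configuration is a list of (x, y, z) tuples, same order and length.
        self.configurations = configurations

    def get_distance(self, i, j):
        a, b = self.configurations[i], self.configurations[j]
        squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
        return math.sqrt(squared / len(a))


def farthest_pair(metric, n):
    """Generic code written against Metric; it never needs to know about RMSD."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda ij: metric.get_distance(*ij))


configs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
]
metric = RMSDMetric(configs)
print(farthest_pair(metric, len(configs)))
```
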
>
> - What I would really like to see is that when someone spends the time to
> figure something out like this, they add an example/patch the comments in
> the files and then send the patch off to someone to integrate :-)
>
> - I'd like to move to a more structured commit model for IMP with some
> more review of things that go in so that we can prod people (and me) more
> to improve docs/merge redundant things. I typed up some thoughts on
> modifying the commit model here:
> <https://github.com/salilab/imp/wiki/A-proposed-commit-model-for-IMP>. Feel
> free to edit (or request permissions to edit, I'm a bit unclear on how
> those are regulated :-) ). The main idea would be that if things, in general,
> have two people look at them before going into most modules in IMP, they
> should be a bit more coherent and documented. And, if one is able to share
> things prior to committing them to the SVN repository, they can stay in
> purgatory a bit longer (and will hopefully be worked on a bit longer),
> before they are considered good enough and work on them ceases (as tends to
> happen). Not sure if this will work :-)
>
>
> On Mon, Aug 6, 2012 at 9:56 AM, Daniel Russel drussel@gmail.com wrote:
>
>> ---------- Forwarded message ----------
>> From: Riccardo Pellarin pellarin.riccardo@gmail.com
>> Date: Mon, Aug 6, 2012 at 12:05 AM
>> Subject: Re: on documentation
>> To: Daniel Russel drussel@gmail.com
>>
>> Hi Guys,
>>
>> I would like to share my thoughts on IMP documentation, maybe repeating
>> what we've already said. I think it is important, though, to share our
>> experience.
>>
>> Let's suppose I want to fit two structures and calculate the Calpha-rmsd,
>> a very simple task.
>>
>> I typed RMSD into the IMP manual search field and got 36 entries.
>>
>> 1st problem: the entry titles are uninformative, unless you know exactly
>> what each module is supposed to do (statistics, atom, multifit, etc.).
>>
>> Knowing a little bit of IMP I could filter the entries and remove all
>> classes belonging to the multifit and em modules, for instance. Let's take
>> the first seven entries which might do what I want to do:
>>
>> Member IMP::statistics::ConfigurationSetRMSDMetric::ConfigurationSetRMSDMetric
>> class IMP::atom::RMSDCalculator
>> Member IMP::atom::RMSDCalculator::RMSDCalculator
>> Member IMP::atom::RMSDCalculator::RMSDCalculators
>> class IMP::statistics::ConfigurationSetRMSDMetric
>> Member IMP::atom::get_pairwise_rmsd_score
>> IMP::atom::get_rmsd
>>
>> 2nd problem: I see a lot of redundancy in the list, and a lot of
>> confusion: classes and members are mixed together... why is that?
>> Wouldn't it be cleaner to separate them into two different lists?
>>
>> Now, let's clean up the list a little bit; my eyes go to these candidates:
>>
>> class IMP::atom::RMSDCalculator
>> class IMP::statistics::ConfigurationSetRMSDMetric
>> Member IMP::atom::get_pairwise_rmsd_score
>> IMP::atom::get_rmsd
>>
>> 3rd problem: there is not one single function that does a task as simple
>> as an RMSD calculation; there are many, with different flavors...
>> Probably many people implemented the same thing many times
>> because they didn't understand what was implemented before?
>>
>> Let's have a look at the functions and see if they do what I want...
>> (as a side note, reading the documentation of IMP functions,
>> I really would like to leave notes on many of them....)
>>
>> Let's start with IMP::atom::RMSDCalculator
>>
>> Detailed Description
>>
>> Fast rmsd calculation. Used to calculate rmsd between multiple
>> transformation that operate on the same particles
>>
>>
>> Well, that is not detailed.
>> What is a "fast rmsd"? No structural fitting, I guess? What is the "rmsd
>> between multiple transformations"? Maybe rigid body transformations? I
>> start to doubt that this rmsd function is calculated between particles
>> at all...
>> Let's try to rewrite it. This is what I would like to read:
>>
>> *Short Description:* Calculates the rmsd of a list of particles.
>>
>> *Detailed Description:* Calculates the root mean square displacement
>> (rmsd) of particles
>> subjected to rigid-body transformations. The rmsd calculation does
>> not perform structural best-fit alignment.
>> *Usage:*
>> 1) construct the class using a list of particles:
>> RMSDCalculator(particles)
>> 2) get the rmsd using the method get_rmsd(trans3D1, trans3D2)
>> where trans3D1 and trans3D2 are rigid body transformations of the
>> reference and displaced configurations, respectively.
>> *Simple Example: ....*
>>
>> It would be cool if the short description appeared on the search
>> page, along with the class name.
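
For reference, the calculation described in the proposed text (RMSD of the same particles under two rigid-body transformations, with no best-fit alignment) can be sketched in plain Python as below. This is an illustration only, not IMP's RMSDCalculator: the function names are made up, and a transformation is simplified here to a rotation about the z axis plus a translation.

```python
import math


def apply_transform(point, angle, shift):
    """Rotate `point` about the z axis by `angle` radians, then translate by `shift`."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def rmsd_between_transforms(points, t1, t2):
    """RMSD between two rigidly transformed copies of `points`.

    No structural best-fit alignment is performed: the two copies are
    compared exactly where the transformations put them.
    """
    total = 0.0
    for p in points:
        a = apply_transform(p, *t1)
        b = apply_transform(p, *t2)
        total += sum((u - v) ** 2 for u, v in zip(a, b))
    return math.sqrt(total / len(points))


particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
identity = (0.0, (0.0, 0.0, 0.0))
shifted = (0.0, (0.0, 0.0, 2.0))  # pure translation by 2 along z
print(rmsd_between_transforms(particles, identity, shifted))
```

A pure translation moves every particle by the same amount, so the RMSD here equals the length of the translation.
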
>>
>> Let's go to the second function: IMP::statistics::ConfigurationSetRMSDMetric
>>
>> Detailed Description
>>
>> Compute the RMSD between specified sets of particles in pairs of
>> configurations, within a configuration set
>>
>> This is even more cryptic. Maybe:
>>
>> Calculates the RMSD of a list of particles between all possible
>> configuration pairs in a "configuration set", which is....
>>
>> Strangely, this class has no get_rmsd() method, only a get_distance()
>> method.... Is that the same?
>>
>> Let's go to another example: IMP::atom::get_pairwise_rmsd_score
>> The measure quantifies the RMSD between the relative placements of two
>> components compared to a reference relative placement. First, the two
>> compared structures are brought into the same frame of reference by
>> superposing the first pair of equivalent domains (ref1 and mdl1). Next, the
>> RMSD is calculated for the second component
>>
>> What are the components? Maybe subunits? What are the domains? Why is the
>> function called rmsd_score? Is that different from the rmsd?
>>
>> OK, I could go on like this for almost every function and method in IMP.
>>
>> In the end, I'm completely unsure which function I should use
>> for my task.... they all look the same.
>>
>> Here's my proposal: the documentation of every function must have these entries:
>>
>> *Short Description:* (appears in the search page)
>> *Detailed Description:*
>> [*Algorithm Description:* in some cases]
>> *Usage:*
>> *Simple Example:*
>>
>> The developer might leave these fields empty, of course.
>> When I search for something, the first entries should be the
>> ones which are most relevant and best documented.
>> Or maybe the search page should have separate Documented and Undocumented
>> results
>> (where Undocumented means a function which is lacking a long documentation
>> page).
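
A rough sketch of how that ordering could work, assuming each search hit carries the fields from the proposed template. DocEntry and rank_hits are hypothetical names, nothing like this exists in doxygen or IMP as far as I know, and the entries below are placeholders rather than real IMP functions.

```python
from dataclasses import dataclass


@dataclass
class DocEntry:
    """One search hit with the fields from the proposed template."""
    name: str
    short_description: str = ""
    detailed_description: str = ""
    algorithm_description: str = ""  # optional in the proposal
    usage: str = ""
    simple_example: str = ""

    def filled_fields(self):
        """Count the non-empty documentation fields."""
        fields = (self.short_description, self.detailed_description,
                  self.algorithm_description, self.usage, self.simple_example)
        return sum(1 for text in fields if text.strip())

    def is_documented(self):
        # "Undocumented" in the proposal means lacking a long description.
        return bool(self.detailed_description.strip())


def rank_hits(hits):
    """Sort documented entries first, then by number of filled fields."""
    return sorted(hits, key=lambda e: (not e.is_documented(), -e.filled_fields()))


hits = [
    DocEntry("bare_function"),
    DocEntry("well_documented", short_description="Does X.",
             detailed_description="Does X by doing Y.", usage="call it"),
    DocEntry("short_only", short_description="Does Z."),
]
ranked = [entry.name for entry in rank_hits(hits)]
print(ranked)
```
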
>>
>> Of course we cannot force people to write comprehensive documentation,
>> but at least we can give the user the option of choosing the functions
>> which are better documented: that will be bad for developers who write
>> undocumented code, since their code will never be used by anybody else.
>> As a user, I will be skeptical of using something where the documentation
>> fields are empty!
>>
>> Sorry, that was long. Hope to hear your feedback.