[Coarse grain modeling] a bunch of questions and comments regarding the nup84 cg example - IMP-users

23 Jun 2011


      Hi list,
I am currently evaluating IMP's functionality for coarse-grained modeling, and the nup84 cg example is really helpful; thanks a lot for that!
Nonetheless, it raises some questions and remarks, which I detail below. I know this mail is quite long, but I hope it helps improve not only my understanding but perhaps also the IMP documentation.
I. Concerning the representation
I discovered the IMP.atom helper functions, which seem to hide many technical details. The exact behavior of each of these functions is nevertheless still a bit elusive to me: IMP.atom.create_protein, IMP.atom.create_connectivity_restraint, IMP.atom.create_distance_restraint.
1. I think there is a minor bug in the documentation for IMP.atom.create_protein, which states that a hierarchy of balls is created and that "The balls are held together by a ConnectivityRestraint with the given spring constant.". The second statement seems erroneous to me. I would also add some details on the hierarchy returned by the first variant of IMP.atom.create_protein, since it differs from the one returned by the second variant (an additional hierarchy level seems to be inserted).
2. In the nup84 cg example, the hierarchy of balls for a protein is created with
        h=IMP.atom.create_protein(m, name, resolution, ds)
and the balls of that protein are then connected by a call to
        r=IMP.atom.create_connectivity_restraint([IMP.atom.Selection(c) for c in h.get_children()],k)
Even after dissecting the function's code, I am still wondering what exactly is done.
In my understanding, when applied to a hierarchy returned by IMP.atom.create_protein(), the generated restraint is always created through a ConnectingPairContainer, which connects balls in a tree-like structure, and I cannot see how and when this tree is built.
Besides, I was expecting something much simpler, such as a distance restraint applied between successive fragments of the molecule.
3. Nothing very important, just a bit noisy/confusing: in the example's create_protein() sub-function, the leaves variable
        leaves= IMP.atom.get_leaves(h)
is never used. So why not just remove it?
4. Minor bug in the documentation: some occurrences of create_connectivity_restraint() do not mention a return type.
5. When it comes to inserting inter-molecule restraints, I think I understand the meaning of the two functions add_connectivity_restraint and add_distance_restraint, but I'd like to be sure:
The first one enforces that the specified molecules are somehow connected. Technically: consider each molecule as a node in a complete graph, weight each edge (A,B) with the smallest distance computed over the pairs of particles in AxB (thanks to KClosePairsPairScore), then compute the minimum spanning tree (MST) and derive the score from it (thanks to ConnectivityRestraint).
The second one merely favors configurations in which the two molecular hierarchies are in contact: given the two molecules A and B, find the closest pair of particles in AxB and return the score for that pair of particles.
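To make my reading of add_connectivity_restraint in point 5 concrete, here is a minimal pure-Python sketch of what I have in mind. It does not call IMP at all: the function names (closest_distance, mst_connectivity_score), the bead representation as (x, y, z, radius) tuples, and the harmonic upper-bound penalty are my own assumptions, not the actual IMP implementation.

```python
import math
from itertools import combinations


def closest_distance(mol_a, mol_b):
    """Smallest surface-to-surface distance over all bead pairs in A x B.

    Each bead is a tuple (x, y, z, radius).
    """
    return min(
        math.dist(a[:3], b[:3]) - a[3] - b[3]
        for a in mol_a
        for b in mol_b
    )


def mst_connectivity_score(molecules, k=1.0):
    """Hypothetical connectivity score (an assumption, not IMP's code).

    Molecules are nodes of a complete graph, each edge is weighted by the
    closest bead-pair distance, a minimum spanning tree is grown with
    Prim's algorithm, and a harmonic penalty is summed over the MST edges
    whose beads are not yet in contact.
    """
    n = len(molecules)
    if n < 2:
        return 0.0, []
    weights = {
        (i, j): closest_distance(molecules[i], molecules[j])
        for i, j in combinations(range(n), 2)
    }
    in_tree = {0}
    mst_edges = []
    while len(in_tree) < n:
        # Cheapest edge with exactly one endpoint already in the tree.
        best = min(
            (e for e in weights if (e[0] in in_tree) != (e[1] in in_tree)),
            key=weights.get,
        )
        mst_edges.append(best)
        in_tree.update(best)
    score = sum(0.5 * k * max(0.0, weights[e]) ** 2 for e in mst_edges)
    return score, mst_edges


# Three one-bead "molecules" on a line, each bead of radius 1.
mol_a = [(0.0, 0.0, 0.0, 1.0)]
mol_b = [(3.0, 0.0, 0.0, 1.0)]
mol_c = [(10.0, 0.0, 0.0, 1.0)]
score, edges = mst_connectivity_score([mol_a, mol_b, mol_c])
```

In this toy case only the edges A-B and B-C contribute, so moving C towards B lowers the score without requiring C to touch A. That is the behavior I expect from a connectivity restraint, as opposed to a set of pairwise distance restraints.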
II. Concerning the sampling part
I am not sure I understand how the MCCG sampler works.
In my understanding, the sampler uses an optimizer to improve a set of initial random solutions, hence generating several putative solutions, or at least not-so-bad ones (a sample).
In this context, the two lines
    sampler.set_number_of_conjugate_gradient_steps(100)
    sampler.set_number_of_monte_carlo_steps(50)
merely control each optimization run, whereas
    sampler.set_number_of_attempts(40)
controls the number of initial (or final?) retained solutions. Am I correct, or at least close?
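Here is my mental model of the sampler as a small pure-Python sketch, so that it is easier to point out where I am wrong. Nothing here is IMP code: the toy score, the function names, the max_score filter and the use of plain gradient descent as a stand-in for conjugate gradients are all assumptions of mine.

```python
import math
import random


def score(x, y):
    """Toy score with a single minimum of 0 at (1, -2)."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def gradient(x, y):
    return 2.0 * (x - 1.0), 2.0 * (y + 2.0)


def local_descent(x, y, steps, rate=0.1):
    """Plain gradient descent, standing in for conjugate gradients."""
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - rate * gx, y - rate * gy
    return x, y


def mccg_like_sample(attempts, mc_steps, cg_steps, max_score, rng,
                     temperature=1.0):
    """One random start per attempt; each Monte Carlo step is a random
    move followed by local descent and a Metropolis acceptance test."""
    kept = []
    for _ in range(attempts):
        x, y = rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)
        current = score(x, y)
        for _ in range(mc_steps):
            tx, ty = x + rng.gauss(0.0, 1.0), y + rng.gauss(0.0, 1.0)
            tx, ty = local_descent(tx, ty, cg_steps)
            trial = score(tx, ty)
            accept = trial <= current or (
                rng.random() < math.exp(-(trial - current) / temperature)
            )
            if accept:
                x, y, current = tx, ty, trial
        # Only attempts that end below the score threshold are retained.
        if current <= max_score:
            kept.append((x, y, current))
    return kept


solutions = mccg_like_sample(attempts=40, mc_steps=50, cg_steps=100,
                             max_score=1e-6, rng=random.Random(0))
```

Under this model the number of attempts is the number of random starts, and the number of retained solutions is at most that number, depending on how many attempts end below the threshold. On this trivial convex score every attempt succeeds.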
III. Concerning the analysis part of the example
1. The last argument of
    embed= IMP.statistics.ConfigurationSetXYZEmbedding(cs,
                 IMP.container.ListSingletonContainer(IMP.atom.get_leaves(all)), True)
is not documented.
Even though the name of the parameter, "bool align=false", is quite suggestive, I have trouble guessing which type of alignment is meant here. Maybe a simple question can help resolve my problem: let's say I have two configurations that can be derived from one another through a simple rotation plus translation (a rigid-body transform); does setting this parameter to true give the same embedding for both conformations, and hence classify both in the same cluster?
2. In the analyze_conformations() function, I think the line
        cs.load_configuration(i)
ought to be replaced by
        cs.load_configuration( cluster.get_cluster_representative(i) )
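Regarding question III.1, the behavior I am hoping for with align=True can be sketched as follows. This is a 2D pure-Python simplification with hypothetical helper names (centered, align_to, embedding, rigid_move). I do not know whether IMP performs this kind of rigid superposition, which is precisely what I am asking.

```python
import math


def centered(points):
    """Translate the points so that their centroid is at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def align_to(reference, mobile):
    """Least-squares rigid superposition of mobile onto reference (2D).

    Both sets are centered first; the optimal rotation angle then has a
    closed form in two dimensions.
    """
    ref = centered(reference)
    mob = centered(mobile)
    s = sum(mx * ry - my * rx for (rx, ry), (mx, my) in zip(ref, mob))
    c = sum(mx * rx + my * ry for (rx, ry), (mx, my) in zip(ref, mob))
    theta = math.atan2(s, c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(mx * cos_t - my * sin_t, mx * sin_t + my * cos_t)
            for mx, my in mob]


def embedding(points):
    """Flatten the coordinates into one vector, one entry per coordinate."""
    return [v for p in points for v in p]


def rigid_move(points, angle, shift):
    """Rotate by angle (radians) about the origin, then translate."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [(x * ca - y * sa + shift[0], x * sa + y * ca + shift[1])
            for x, y in points]


conf_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (5.0, 3.0)]
conf_b = rigid_move(conf_a, 0.7, (4.0, -3.0))

# Distance between the two embeddings without and with alignment.
raw_gap = math.dist(embedding(conf_a), embedding(conf_b))
aligned_gap = math.dist(embedding(centered(conf_a)),
                        embedding(align_to(conf_a, conf_b)))
```

Without alignment the two embeddings are far apart, so a clustering on raw coordinates would treat them as different solutions. After superposition the gap vanishes up to rounding, and both would fall in the same cluster.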
Thanks a lot for all your work and effort.
--Ben