Hi list,
I am currently exploring IMP's functionality for coarse-grained modeling, and the nup84 cg example is really helpful; thanks a lot for that! Nonetheless, it raises some questions and remarks, which I detail below. I know this mail is quite long, but I hope it can help improve not only my understanding but maybe also the IMP documentation.
I. Concerning the representation
I discovered the IMP.atom helper functions, which help a lot in hiding many technical details. Nevertheless, the exact behavior of each of these functions is still a bit elusive to me: IMP.atom.create_protein, IMP.atom.create_connectivity_restraint, IMP.atom.create_distance_restraint.
1. I think there is a minor bug in the documentation for IMP.atom.create_protein, which states that a hierarchy of balls is created and that "The balls are held together by a ConnectivityRestraint with the given spring constant.". The second statement seems wrong to me. I would also add some details on the hierarchy returned by the first documented occurrence of IMP.atom.create_protein, since it differs from the one returned by the second (an extra hierarchy level seems to be inserted).
2. In the nup84 cg example, the balls of a single protein, created with h=IMP.atom.create_protein(m, name, resolution, ds), are connected with a call to r=IMP.atom.create_connectivity_restraint([IMP.atom.Selection(c) for c in h.get_children()],k). Even after dissecting the function's code, I am still wondering exactly what it does. As I understand it, when applied to a hierarchy returned by IMP.atom.create_protein(), the generated restraint is always built on a ConnectingPairContainer, which connects the balls in a tree-like structure, but I cannot see how and when this tree is built. Moreover, I was expecting something much simpler, such as a distance restraint between successive fragments of the molecule.
3. Nothing very important, just a bit noisy/confusing: in the example's create_protein() function, the variable leaves= IMP.atom.get_leaves(h) is never used, so why not just remove it?
4. A minor bug in the documentation: some occurrences of create_connectivity_restraint() have no return type listed.
5. When it comes to adding inter-molecule restraints, I think I understand the meaning of the two functions add_connectivity_restraint and add_distance_restraint, but I'd like to be sure. The first one forces the specified molecules to be connected somehow (technically: consider each molecule as a node in a complete graph, weight each edge (A,B) with the smallest distance computed over the pairs of particles in A×B (thanks to KClosePairsPairScore), then compute the MST and derive the score from it (thanks to ConnectivityRestraint)). The second one merely favors the two molecular hierarchies being in direct contact (given two molecules A and B, find the closest pair of particles in A×B and return the score for that pair).
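To make point 5 concrete, here is a minimal pure-Python sketch of the MST-based connectivity score as described in that point. It is not IMP code: beads are plain ((x, y, z), radius) tuples, and the per-pair score (a one-sided harmonic on the gap between sphere surfaces) and all function names are assumptions chosen for illustration. With only two molecules the tree has a single edge, so the score reduces to the closest-pair score.

```python
import math


def surface_gap(a, b):
    """Gap between the surfaces of two beads given as ((x, y, z), radius)."""
    (ca, ra), (cb, rb) = a, b
    return math.dist(ca, cb) - ra - rb


def one_sided_harmonic(x, k):
    """Penalise only positive gaps: 0.5 * k * x**2 if x > 0, else 0."""
    return 0.5 * k * x * x if x > 0 else 0.0


def closest_pair_score(mol_a, mol_b, k):
    """Score of the closest bead pair in mol_a x mol_b."""
    gap = min(surface_gap(a, b) for a in mol_a for b in mol_b)
    return one_sided_harmonic(gap, k)


def connectivity_score(molecules, k=1.0):
    """Sum of edge scores of a minimum spanning tree (Prim's algorithm)
    over the complete graph whose nodes are molecules."""
    n = len(molecules)
    if n < 2:
        return 0.0
    w = [[closest_pair_score(molecules[i], molecules[j], k) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    best = {j: w[0][j] for j in range(1, n)}  # cheapest edge into the tree
    total = 0.0
    while best:
        j = min(best, key=best.get)  # attach the cheapest outside node
        total += best.pop(j)
        for m in best:
            best[m] = min(best[m], w[j][m])
    return total


# Three single-bead "molecules" of radius 1 on the x axis.
a = [((0.0, 0.0, 0.0), 1.0)]
b = [((3.0, 0.0, 0.0), 1.0)]
c = [((10.0, 0.0, 0.0), 1.0)]
print(connectivity_score([a, b, c]))  # MST uses A-B and B-C: 0.5 + 12.5 = 13.0
```

Here the A-C edge (gap 8) is never scored, because the tree already links A to C through B.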
II. Concerning the sampling part
I am not sure I understand how the MCCG sampler works. As I understand it, the sampler uses an optimizer to improve a set of random initial solutions, thereby generating several putative (or at least not too bad) solutions: a sample. In this context, the two lines sampler.set_number_of_conjugate_gradient_steps(100) and sampler.set_number_of_monte_carlo_steps(50) merely control each optimization run, whereas sampler.set_number_of_attempts(40) controls the number of initial (or final?) retained solutions. Am I correct, or at least close enough?
III. Concerning the analysis part of the example:
1. The last argument of embed= IMP.statistics.ConfigurationSetXYZEmbedding(cs, IMP.container.ListSingletonContainer(IMP.atom.get_leaves(all)), True) is not documented. Even though the parameter name "bool align=false" is quite suggestive, I have trouble guessing what kind of alignment is performed. Maybe a simple question will help pin down my problem: say I have two configurations that can be derived from one another through a simple rotation followed by a translation; does setting this parameter to true give both conformations the same embedding, and hence put both in the same cluster?
2. In the analyze_conformations() function, I think the line cs.load_configuration(i) should be replaced by cs.load_configuration(cluster.get_cluster_representative(i)).
Thanks a lot for all your work and efforts.
--Ben
> 1. I think there is a minor bug in the documentation for IMP.atom.create_protein which states that a hierarchy of balls is created, and "The balls are held together by a ConnectivityRestraint with the given spring constant.". The second assertion seems erroneous to me.
Indeed, the function changed but the docs did not.
> I would probably also add some details on the hierarchy returned by the first occurrence of IMP.atom.create_protein, since it differs from the second one (another second hierarchical level seems indeed to be inserted).
I'll take a look. I've been purposefully vague about the internal structure of the hierarchies returned by various methods, to keep flexibility, but I'm not sure that is still worthwhile.
> 2. In the nup84 cg example, the connection between the balls of a same protein is made in
> h=IMP.atom.create_protein(m, name, resolution, ds)
> with a call to
> r=IMP.atom.create_connectivity_restraint([IMP.atom.Selection(c ) for c in h.get_children()],k)
> …And after dissection of the function's code I am still wondering what is exactly done.
> In my understanding, when applied to a hierarchy returned by IMP.atom.create_protein(), the generated restraint is always created through a ConnectingPairContainer, which connects balls in a tree like structure, and I cannot see how and when this tree is built.
> Plus, I basically was expecting something pretty much simple, such as a distance restraint applied between the successive fragments in the molecule.
It is supposed to apply the simplest restraint it can, based on what is passed. That is, one of:
- a distance restraint
- a KClosePairsPairScore-based restraint
- a connecting pair container with a distance pair score
- a connectivity restraint
If you have a case where it isn't doing the simplest thing, let me know.
> 3. Nothing very important, just a bit noisy/confusing : in create_protein() sub-function, the leaves variable
> leaves= IMP.atom.get_leaves(h)
> is never used… So why not just stripping it ?
I don't see that. Where is it?
> 4. minor bug in the documentation : some occurrences of create_connectivity_restraint() have no mentioned return type.
Where do you see this?
> 5. When it comes to inserting inter-molecules restraints, I think I understand the meaning of the two functions :
> add_connectivity_restraint and add_distance_restraint, but I'd like to be sure of that :
> the first one enforces the specified molecules to be somehow connected (technically : consider each molecule as a node in a complete graph, weight each edge (A,B) with the smallest distance computed for a pair of particles belonging to AxB (thanks to KClosePairsPairScore), then computing MST and deriving score (thanks to ConnectivityRestraint).)
> the second one merely favors the two molecular hierarchies to be in exact contact (given the two molecules A and B test the closest pair of particles in AxB, and return the score for that pair of particles).
In practice they are more or less the same thing (modulo implementation details) when both are passed a pair of selections. The names are just different for consistency with other parts of IMP. I'm not sure that was a good decision.
> II. Concerning the sampling part
>
> I am not sure to understand how the MCCG sampler works.
> In my understanding, the sampler uses an optimizer to improve a set of initial random solutions, hence generating several putative solutions, or at least not so bad ones (a sample).
> In this context, the two lines :
> sampler.set_number_of_conjugate_gradient_steps(100)
> sampler.set_number_of_monte_carlo_steps(50)
> Merely control each optimization step, whereas
> sampler.set_number_of_attempts(40)
> controls the number of initial (or final ?) retained solutions. Am I correct or at least near enough ?
Basically there are three nested loops:
1) attempts
2) MC steps
3) CG steps
I should add that to the docs to make it clearer.
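The three nested loops described in this reply can be pictured with the following toy pure-Python sketch. It is not IMP's implementation: the function and parameter names are hypothetical, plain gradient descent stands in for conjugate gradients, and the assumption that each Monte Carlo move is locally minimised before the Metropolis test is mine. Under that reading, the number of attempts would set how many independent runs (one result each) you get, while the MC and CG step counts set the length of each run.

```python
import math
import random


def mc_cg_sample(score, grad, dim, n_attempts=40, n_mc_steps=50,
                 n_cg_steps=100, move_size=0.5, lr=0.01, temperature=1.0,
                 seed=0):
    """Toy sampler with three nested loops: attempts > MC steps > local
    minimisation steps. Plain gradient descent stands in for CG here."""
    rng = random.Random(seed)
    solutions = []
    for _ in range(n_attempts):                             # 1) attempts
        x = [rng.uniform(-10.0, 10.0) for _ in range(dim)]  # random start
        current = score(x)
        for _ in range(n_mc_steps):                         # 2) MC steps
            trial = [xi + rng.gauss(0.0, move_size) for xi in x]
            for _ in range(n_cg_steps):                     # 3) "CG" steps
                trial = [t - lr * g for t, g in zip(trial, grad(trial))]
            s = score(trial)
            # Metropolis criterion applied to the locally minimised trial
            if s <= current or rng.random() < math.exp((current - s) / temperature):
                x, current = trial, s
        solutions.append((current, x))                      # one result per attempt
    solutions.sort(key=lambda sol: sol[0])
    return solutions


def quadratic(x):
    return sum(xi * xi for xi in x)


def quadratic_grad(x):
    return [2.0 * xi for xi in x]


sols = mc_cg_sample(quadratic, quadratic_grad, dim=2)
print(len(sols), sols[0][0])  # 40 solutions, best score close to 0
```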
> III. Concerning the analysis part of the example :
>
> 1. The last argument of :
> embed= IMP.statistics.ConfigurationSetXYZEmbedding(cs,
> IMP.container.ListSingletonContainer(IMP.atom.get_leaves(all)), True)
> is not documented.
> Even though the name of the variable "bool align=false" is quite suggestive, I have an issue to guess the type of alignment that is considered here. Maybe a simple question can help to leverage my problem : Let's say I have two configurations that can be derived from one another through a simple rotation°translation; does setting this parameter to true help me to have the same embedding for each conformations, and hence classify both in the same cluster ?
It controls whether rigid alignment is performed. Currently, the alignment is against the first configuration, which may not be the best option. I'll add a note to the docs.
> 2. In the analyze_conformations() function, I think the line
> cs.load_configuration(i)
> ought to be replaced by
> cs.load_configuration( cluster.get_cluster_representative(i) )
Yup. Thanks.
Hi Daniel, and thanks for the answers.
> It is supposed to apply the simplest restraint it can based on what is passed. That is, one of:
> - distance restraint
> - kclosepairspairscore based restraint
> - connected pair container with distance pair score
> - connectivity restraint
I think I got the technical aspect of the function, but I am still puzzled by the concrete interpretation :)
Consider a coarse representation of a protein as a succession of 4 bead domains, obtained through create_protein() with the domain limits [0,100,200,320,456]. I'd like connectivity to be enforced only between successive domains, and I have the feeling this is not what the nup84 cg example achieves. There, atom.create_connectivity_restraint() is called on a list of Selection objects, each resulting in a single particle, hence the use of a ConnectingPairContainer, whose effect is to create a connection tree (?). And I have to confess I understood this container's behavior neither from the documentation nor from its code.
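As an illustration of the "connection tree" idea (not of IMP's actual code), the sketch below assumes a ConnectingPairContainer-like object keeps the edges of a Euclidean minimum spanning tree over the current bead centres, recomputed as the beads move; IMP's real container may differ in details. The function name is hypothetical. The example shows why such a tree need not follow the chain order: bead 3 sits next to bead 0, so the tree links 0-3 and 3-1 and never restrains 0-1 or 2-3 directly.

```python
import math


def spanning_tree_pairs(centres):
    """Index pairs of a Euclidean minimum spanning tree over bead centres,
    built with Prim's algorithm."""
    n = len(centres)
    if n < 2:
        return []
    # For each bead outside the tree: (distance to the tree, closest tree bead)
    best = {j: (math.dist(centres[0], centres[j]), 0) for j in range(1, n)}
    pairs = []
    while best:
        j = min(best, key=lambda m: best[m][0])
        _, i = best.pop(j)
        pairs.append((min(i, j), max(i, j)))
        for m in best:
            d = math.dist(centres[j], centres[m])
            if d < best[m][0]:
                best[m] = (d, j)
    return sorted(pairs)


# Four beads in chain order 0-1-2-3, but bead 3 has folded back next to bead 0.
beads = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(spanning_tree_pairs(beads))  # [(0, 3), (1, 2), (1, 3)]
```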
>> 3. Nothing very important, just a bit noisy/confusing : in create_protein() sub-function, the leaves variable
>> leaves= IMP.atom.get_leaves(h)
>> is never used… So why not just stripping it ?
> I don't see that. Where is it?
kernel/src/nup84_cg line 28
http://salilab.org/imp/nightly/doc/html/kernel_examples.html nup84 cg example
>> 4. minor bug in the documentation : some occurrences of create_connectivity_restraint() have no mentioned return type.
> Where do you see this?
http://salilab.org/imp/nightly/doc/html/namespaceIMP_1_1atom.html#dc57f58d75...
For instance, the first occurrence of create_connectivity_restraint reads
  Restraint* create_connectivity_restraint ( const Selections & s, double x0, double k )
and the next one:
  IMP::atom::create_connectivity_restraint ( const Selections & s, double k )
The same thing seems to happen for every overloaded function.
>> III. Concerning the analysis part of the example :
>>
>> 1. The last argument of :
>> embed= IMP.statistics.ConfigurationSetXYZEmbedding(cs,
>> IMP.container.ListSingletonContainer(IMP.atom.get_leaves(all)), True)
>> is not documented.
>> Even though the name of the variable "bool align=false" is quite suggestive, I have an issue to guess the type of alignment that is considered here. Maybe a simple question can help to leverage my problem : Let's say I have two configurations that can be derived from one another through a simple rotation°translation; does setting this parameter to true help me to have the same embedding for each conformations, and hence classify both in the same cluster ?
> It is whether rigid alignment is performed. Currently, this alignment is against the first configuration, which may not be the best option. I'll add a note to the docs.
OK, let me try to get this right:
1. With align set to True, before being embedded in dimension 3N, all configurations (each comprising N particles in 3D) are first aligned onto configuration 0.
2. I guess the alignment is "merely" an RMSD minimization.
And a few more questions:
1. Based on my experiments, it seems this alignment does not affect the configurations; that is, the rigid transformations are only applied to the embeddings and not to the configurations themselves. Correct? Is there a way to retrieve the applied transformations, or to have them applied to the configurations too?
2. Is there any IMP functionality to align configurations or models?
Thanks for your valuable help.
--Ben
On Jun 24, 2011, at 5:04 AM, Benjamin SCHWARZ wrote:
> Consider we have a coarse representation of a protein as a succession of 4 bead-domains, obtained through create_protein() with the provided indexes of the domain limits [0,100,200,320,456]. Somehow I'd like the connectivity to be enforced only between the successive domains… And I have the feeling this is not what is achieved in the nup84 cg example. Here, atom.create_connectivity_restraint() is called on a list of selection objects each resulting in a single particle, hence the usage of a ConnectedPairContainer, whose effect is to create a connection tree (?)… And basically, I have to confess I didn't really understand this specific container behavior neither from the documentation, nor from its code.
If you just want each successive pair to be connected, just add a distance restraint for each successive pair. You can do this in various ways; probably the simplest is to list the pairs and create a pairs restraint with that list of pairs and a HarmonicSphereDistancePairScore with a distance of 0.
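A minimal pure-Python sketch of this suggestion, assuming HarmonicSphereDistancePairScore scores 0.5 * k * (gap - x0)**2, where gap is the distance between the two sphere surfaces. The helper names and the bead coordinates are hypothetical; only the four-domain split [0,100,200,320,456] comes from Ben's example. With x0 = 0, each successive pair is pulled into contact (and, the harmonic being two-sided, pushed apart if the beads overlap), while non-successive pairs are left alone.

```python
import math


def successive_pairs(domain_limits):
    """Index pairs (i, i + 1) of consecutive domains, one bead per domain."""
    n_beads = len(domain_limits) - 1
    return [(i, i + 1) for i in range(n_beads - 1)]


def harmonic_sphere_distance(a, b, x0=0.0, k=1.0):
    """0.5 * k * (gap - x0)**2, gap = distance between the sphere surfaces."""
    (ca, ra), (cb, rb) = a, b
    gap = math.dist(ca, cb) - ra - rb
    return 0.5 * k * (gap - x0) ** 2


def chain_score(beads, pairs, k=1.0):
    """Total score of a pairs restraint over the listed pairs, with x0 = 0."""
    return sum(harmonic_sphere_distance(beads[i], beads[j], 0.0, k)
               for i, j in pairs)


limits = [0, 100, 200, 320, 456]   # four domains -> four beads
pairs = successive_pairs(limits)   # [(0, 1), (1, 2), (2, 3)]
beads = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 1.0),
         ((5.0, 0.0, 0.0), 1.0), ((7.0, 0.0, 0.0), 1.0)]
print(chain_score(beads, pairs))   # only the 1-2 pair has a gap (of 1): 0.5
```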
>>> 3. Nothing very important, just a bit noisy/confusing : in create_protein() sub-function, the leaves variable
>>> leaves= IMP.atom.get_leaves(h)
>>> is never used… So why not just stripping it ?
>> I don't see that. Where is it?
>
> kernel/src/nup84_cg line 28
>
> http://salilab.org/imp/nightly/doc/html/kernel_examples.html
> nup84 cg exampl
Indeed, thanks. I'm not sure how I missed that.
>>> 4. minor bug in the documentation : some occurrences of create_connectivity_restraint() have no mentioned return type.
>> Where do you see this?
>
> http://salilab.org/imp/nightly/doc/html/namespaceIMP_1_1atom.html#dc57f58d75...
> for instance, the first occurrence of create_connectivity_restraint reads
> Restraint* create_connectivity_restraint ( const Selections & s, double x0, double k )
> and the next one :
> IMP::atom::create_connectivity_restraint ( const Selections & s, double k )
>
> The same behavior seem to happen for each polymorphic function.
Odd; thanks for pointing it out. I'll look into it.
> OK… Let me try to put it right :
> 1. With align set to True, prior to their embbeding in dimension 3N, all configurations (comprising N particles in dimension 3) are firstly aligned on configuration0.
> 2. I guess the alignment is "merely" an RMSD minimization
Yes, rigid, RMSD minimization.
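The behavior confirmed here (each configuration rigidly superposed on configuration 0 by RMSD minimization, then flattened to a 3N vector) can be sketched with NumPy and the Kabsch algorithm. This is an illustration under that assumption, not IMP's code, and the function names are hypothetical. The sketch leaves the input configurations untouched and returns the (R, t) applied to each, which is the kind of information asked about in the follow-up question.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation r and translation t minimising the RMSD between
    mobile @ r.T + t and target (both (N, 3) arrays, rows matched)."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)       # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # avoid returning a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tc - r @ mc
    return r, t


def align_and_embed(configurations):
    """Superpose every configuration onto configurations[0] and return the
    3N-dimensional embeddings plus the (r, t) used for each one."""
    ref = np.asarray(configurations[0], dtype=float)
    embeddings, transforms = [], []
    for conf in configurations:
        x = np.asarray(conf, dtype=float)     # inputs are not modified
        r, t = kabsch(x, ref)
        embeddings.append((x @ r.T + t).ravel())
        transforms.append((r, t))
    return np.array(embeddings), transforms


conf0 = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)  # 90 deg about z
conf1 = conf0 @ rot_z.T + np.array([1.0, 2.0, 3.0])          # rigid copy
emb, transforms = align_and_embed([conf0, conf1])
print(np.allclose(emb[0], emb[1]))  # True: same embedding after alignment
```

So, under this reading, two configurations related by a rotation and a translation end up with the same embedding and would fall in the same cluster.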
>
> And add a few questions :
> 1. Based on my experiments it seems this alignment does not impact the configurations, I mean the rigid transformations is only applied to the embeddings and not to the configurations themselves. Correct ? Is there a way to retrieve the applied transformations, or a way to have them applied to the configurations too ?
Good point. I'll add a method to get them back out.
> 2. Are there any IMP functionalities to perform configurations or model alignments ?
There is functionality to align sets of particles and points: get_transformation_aligning_first_to_second. For technical reasons, it is in IMP.core when using Python and in IMP.algebra when using C++. I bet there is a way to insert the Python functions into IMP.algebra so that it is symmetric. I'll look into that.