The helper module contains various atom.Hierarchy-specific code (get_bounding_box, clone, destroy, and the functions for creating simplified versions). Externally, this doesn't make much sense; the code is only there because some of it internally uses em for computations. In addition, Elina's new code for simplified restraint management forms a nice, coherent set of functionality which could take over the helper module by itself. As a result, I propose that:
- helper.get_bounding_box, helper.clone and helper.destroy be moved to atom.
- the helper.create_simplified function be renamed em.create_simplified_from_density, putting it in the em module and making it clearer why it is there (its current name is too vague anyway). It should also create the rigid bodies it needs internally.
- helper.create_simplified_by_residue needs more thought, since its current method of assigning radii doesn't make sense for anything other than density-based restraints (so it may make sense to move it to em).
I don't think any of the code is too widely used, so changing things now should be minimally disruptive. Thoughts?
> - helper.create_simplified_by_residue needs more thought, since its current method of assigning radii doesn't make sense for anything other than density-based restraints (so it may make sense to move it to em).
How else do you propose to define radius other than the particles' sphere cover?
On Nov 1, 2009, at 9:47 AM, Keren Lasker wrote:
>> - helper.create_simplified_by_residue needs more thought, since its current method of assigning radii doesn't make sense for anything other than density-based restraints (so it may make sense to move it to em).

And, I should add, when you know residue-residue proximities.
> How else do you propose to define radius other than the particles' sphere cover?

Radius of gyration, or a radius such that the sphere volume matches that of the k residues, or something that doesn't go all crazy when (depending on the scale) you have beta strands or alpha helices.
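For concreteness, the alternative radius definitions mentioned here could be sketched as follows. These are hypothetical helpers, not existing IMP code; a residue group is just a list of 3D coordinate tuples:

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def cover_radius(points):
    # Radius of a sphere centered at the centroid that covers all the
    # points (an approximation to the minimal enclosing sphere).
    c = centroid(points)
    return max(math.dist(p, c) for p in points)

def radius_of_gyration(points):
    # Root-mean-square distance of the points from their centroid;
    # much less sensitive to a single outlying atom than the cover.
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def volume_matched_radius(per_residue_radius, k):
    # Radius of one sphere whose volume equals that of k residue spheres
    # of the given radius: k * r**3 == R**3.
    return per_residue_radius * k ** (1.0 / 3.0)
```

For an elongated group of points (a strand, say) the cover radius grows with the length while the radius of gyration grows more slowly, which is the "goes all crazy" behavior being complained about.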
Given a molecule, interesting geometric aspects include:
1) residue locations
2) regions of space occupied by the molecule
3) regions of space free from the protein
4) centers of mass
5) total volume
Each of these is required for different sorts of restraints. For example, EM fitting requires that 4 have bounded error, residue proximities require that 1 have bounded error, and packing a bunch of molecules to form a complex requires that bounds on 5 be accurate, and preferably bounds on 2 and 3 as well.
If we are generating a rigid model to approximate a given pdb we should be able to get all of them (helper.create_simplified() can be trivially modified to do so, but is slow). Given your experience with clustering for finding centers for em-fitting, a faster approach might be to cluster the density and then put spheres at these cluster centers. We can then measure the error for all of the above and increase the number of spheres as needed until the error matches the tolerance passed as the parameter to the function.
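The cluster-then-check loop described here could be sketched as follows, with plain Lloyd's k-means over point coordinates standing in for the actual density clustering (hypothetical helpers, not IMP code):

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    # Plain Lloyd's algorithm; good enough for a sketch, not production code.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep the old center
        # if the cluster happens to be empty.
        centers = [
            tuple(sum(q[d] for q in cl) / len(cl) for d in range(3)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def simplify_to_tolerance(points, tolerance):
    # Increase the number of sphere centers until every point lies within
    # `tolerance` of some center, per the error-driven loop described above.
    for k in range(1, len(points) + 1):
        centers = kmeans(points, k)
        error = max(min(math.dist(p, c) for c in centers) for p in points)
        if error <= tolerance:
            return centers, error
    return list(points), 0.0
```

Here the error metric is just the worst point-to-center distance (aspect 1 above); a real version would also check the volume and occupancy criteria.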
I don't see that doing it along the backbone makes any sense after 4ish residues, as the set of shapes that those residues occupy can vary too much to be represented by a sphere. And if you are holding the structure rigid (or only letting it change a bit), you don't gain anything from having particles represent consecutive residues (and if it is non-rigid, we will have some serious issues with preventing it from blowing up). Is there something I'm missing?
One issue that this raises again is that we use radius for several different purposes:
- for proximity detection, we want to know the maximum extents of an object: that is, the size of the space a residue could possibly be in
- for packing, we want the core set of space that it occupies, which will always be smaller than the maximum extent
We could separate the two, but that would be a reasonably significant amount of work making sure the various classes use the right one. It might be worth it, though. If we do that, then:
- restraints that force things to be close together (residue-residue proximities, for example) could use the extents radius
- restraints that force things apart (excluded volume) could use the core radius
Then, a simple simplification procedure which
- uses the cover of the residues to produce an extents radius
- uses the volume of the residues to produce a core radius
would be pretty OK for almost any way one splits the residues. Clustering them would still be better than chopping along the backbone when coarsening a lot.
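A sketch of that two-radius procedure for a single group of residues, assuming each residue is already represented as a sphere (hypothetical helper, not IMP API; `spheres` is a list of (center, radius) pairs):

```python
import math

def two_radii_for_group(spheres):
    # Compute the two radii proposed above for one group of residue spheres:
    # - extents radius: a centroid-centered sphere covering all residue
    #   spheres (for proximity-style restraints)
    # - core radius: the radius of a sphere whose volume equals the summed
    #   residue volume (for excluded-volume-style restraints)
    centers = [c for c, _ in spheres]
    n = len(centers)
    centroid = tuple(sum(c[i] for c in centers) / n for i in range(3))
    extents = max(math.dist(c, centroid) + r for c, r in spheres)
    total_volume = sum((4.0 / 3.0) * math.pi * r ** 3 for _, r in spheres)
    core = (3.0 * total_volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return extents, core
```

For a single residue the two radii coincide; as the group gets more elongated the extents radius grows while the core radius only tracks total volume, which is exactly the split between "force together" and "force apart" restraints described above.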
Does this make sense?
On Nov 1, 2009, at 11:16 AM, Daniel Russel wrote:
Daniel - sorry - that was an empty reply - pressed send by mistake. Still figuring out how to summarize my thoughts on that... ;)
Daniel,
I think we should separate the discussion into fine coarsening (up to 5 residues) and coarse coarsening (more than 5 residues).
For fine coarsening I think the helper function is fine and most restraints would work well with it. For coarse coarsening I agree that a covering sphere is not the best solution and that clustering-based techniques (such as the GMM used in MultiFit) are better. I tested a few cases today and indeed GMM is able to overcome outliers (such as helices) and generate better spheres. Once I migrate my code into IMP we can use GMM as a helper function to generate low-resolution representations of proteins.
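As an illustration of the GMM idea only (not MultiFit's actual, more sophisticated density-based implementation), a minimal EM fit of spherical Gaussians that turns a point set into a few spheres might look like:

```python
import math

def spherical_gmm(points, k, iters=60):
    # Minimal EM for a mixture of k spherical Gaussians in 3D. Means are
    # initialized from the first k points to keep the sketch deterministic.
    means = [points[i] for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E step: responsibility of each component for each point.
        resp = []
        for p in points:
            dens = [
                weights[j]
                * math.exp(-math.dist(p, means[j]) ** 2 / (2.0 * variances[j]))
                / (2.0 * math.pi * variances[j]) ** 1.5
                for j in range(k)
            ]
            s = sum(dens) or 1e-300
            resp.append([d / s for d in dens])
        # M step: re-estimate weights, means and (spherical) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(points)
            means[j] = tuple(
                sum(r[j] * p[d] for r, p in zip(resp, points)) / nj for d in range(3)
            )
            variances[j] = max(
                sum(r[j] * math.dist(p, means[j]) ** 2 for r, p in zip(resp, points))
                / (3.0 * nj),
                1e-6,
            )
    # One sphere per component: center at the mean, radius at roughly
    # two standard deviations.
    return [(means[j], 2.0 * math.sqrt(variances[j])) for j in range(k)]
```

Because each component's weight and variance adapt to the data, a few outlying points (a protruding helix, say) pull a single covering sphere badly but perturb a mixture fit much less.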
Coarse coarsening, on the other hand, is needed mostly for flexible chain representation, for which we cannot apply either of these methods, as we do not have a structure, and so I think that for now relaxing the restraints is sufficient. Anyway - representation is a work in progress, so let's first finish testing the various alternatives before imposing solutions in IMP. As for moving the residue-based one from helper to em, I do not think it is necessary, as it will be used by others for fine coarsening (from a discussion I had today with Hao and Jeremy) - but as long as the function is there, I care less about its exact position :)
Keren.
> I think we should separate the discussion into
> fine coarsening (up to 5 residues) and
> coarse coarsening (more than 5 residues).
>
> For fine coarsening I think the helper function is fine and most restraints would work well with it.

I still have the question of why bother keeping consecutive residues together? As far as I can tell, it produces uniformly worse results than allowing them to be separate. Unless there is some advantage, it isn't something that should be there.
> Coarse coarsening, on the other hand, is needed mostly for flexible chain representation, for which we cannot apply either of these methods, as we do not have a structure, and so I think that for now relaxing the restraints is sufficient.

It also makes sense in the case where you have structures of some of the components but not others, and you just want to preserve the overall shape.
On Nov 2, 2009, at 6:10 AM, Daniel Russel wrote:
> I still have the question of why bother keeping consecutive residues together? As far as I can tell, it produces uniformly worse results than allowing them to be separate. Unless there is some advantage, it isn't something that should be there.
This is a way of accelerating the optimization. We can benchmark your updated excluded volume restraint, for example, to see how well it performs with large assemblies - let's look at it today together - sounds good?
On Nov 2, 2009, at 6:28 AM, Keren Lasker wrote:
> This is a way of accelerating the optimization. We can benchmark your updated excluded volume restraint, for example, to see how well it performs with large assemblies - let's look at it today together - sounds good?

That isn't the question I have. Clearly fewer particles make things faster :-) My question is:
- We have a function which guarantees that consecutive residues are kept together along the backbone. By providing such a guarantee, it limits the set of simplified structures that it can produce.
- The more limited set is worse than a set not constrained by that guarantee, under the various conditions and metrics discussed before.
- If one has a group of residues that really need to be kept together, it is easy enough to simplify them separately from the other residues.
- As far as I can tell, the limited set is not better under any metrics/conditions that we care about. If this is the case, then we shouldn't have a function which simplifies along the backbone. And if this is not the case, I'm wondering when it is not :-)
So the question is: when is it useful to guarantee that consecutive residues are kept together?
participants (2)
- Daniel Russel
- Keren Lasker