On Nov 1, 2009, at 9:47 AM, Keren Lasker wrote:
- helper.create_simplified_by_residue needs to be thought about, since its current method of assigning radii doesn't make sense for anything other than density-based restraints (so it may make sense to move it to em).
And, I should add, when you know residue-residue proximities.
How else do you propose to define the radius, other than as the particles' sphere cover?
The radius of gyration, or a radius chosen so that the sphere volume matches that of the k residues, or something else that doesn't go all crazy when (depending on the scale) you have beta strands or alpha helices.
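To make the difference between these candidates concrete, here is a minimal Python sketch. It is not IMP code: the function names, the per-residue radius of 3.0 and the 3.5 spacing are made-up values for illustration, and the volume-matched radius simply sums sphere volumes and ignores overlaps.

```python
import math

def centroid(points):
    """Unweighted mean position of the points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))

def cover_radius(points, radii):
    """Radius of a sphere at the centroid that encloses every input sphere."""
    c = centroid(points)
    return max(math.dist(c, p) + r for p, r in zip(points, radii))

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(c, p) ** 2 for p in points) / len(points))

def volume_matched_radius(radii):
    """Radius of one sphere whose volume equals the summed sphere volumes.

    Assumption: overlaps between the input spheres are ignored.
    """
    return sum(r ** 3 for r in radii) ** (1.0 / 3.0)

# Toy extended (strand-like) segment: 4 residues on a line.
strand = [(3.5 * i, 0.0, 0.0) for i in range(4)]
radii = [3.0] * 4

print("cover:          ", cover_radius(strand, radii))
print("gyration:       ", radius_of_gyration(strand))
print("volume-matched: ", volume_matched_radius(radii))
```

For an extended segment like this one the cover radius grows with the length of the segment, while the volume-matched radius only grows with the cube root of the number of residues.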
Given a molecule, interesting geometric aspects include
1) residue locations
2) regions of space occupied by the molecule
3) regions of space free from the protein
4) centers of mass
5) total volume
Each of these is needed for a different sort of restraint. For example, EM fitting requires that 4 have bounded error, residue proximities require that 1 have bounded error, and packing a bunch of molecules to form a complex requires that the bounds on 5 be accurate, and preferably the bounds on 2 and 3 as well.
If we are generating a rigid model to approximate a given pdb, we should be able to get all of them (helper.create_simplified() can be trivially modified to do so, but is slow). Given your experience with clustering for finding centers for em-fitting, a faster approach might be to cluster the density and then put spheres at these cluster centers. We can then measure the error for all of the above and increase the number of spheres as needed until the error matches the tolerance passed as a parameter to the function.
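The loop I have in mind could be sketched roughly as follows. This is plain Python, not IMP: it clusters point coordinates with a simple k-means as a stand-in for clustering the density, all function names are hypothetical, and it only checks error measure 1 (residue locations). The other measures would each need their own error function in the same loop.

```python
import math

def farthest_point_centers(points, k):
    """Greedy seeding: each new center is the point farthest from those chosen."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def lloyd(points, centers, iterations=20):
    """A fixed number of k-means (Lloyd) refinement steps."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(q[d] for q in g) / len(g) for d in range(3)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def location_error(points, centers):
    """Worst-case distance from any point to its nearest sphere center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)

def simplify(points, tolerance, max_spheres=None):
    """Add spheres one at a time until the location error meets the tolerance."""
    max_spheres = max_spheres or len(points)
    for k in range(1, max_spheres + 1):
        centers = lloyd(points, farthest_point_centers(points, k))
        err = location_error(points, centers)
        if err <= tolerance:
            break
    return centers, err

# Toy input: 10 points on a line, one unit apart.
points = [(float(i), 0.0, 0.0) for i in range(10)]
centers, err = simplify(points, tolerance=1.0)
print(len(centers), "spheres, location error", err)
```

Restarting the clustering for every k is wasteful but keeps the sketch short. Splitting the worst cluster instead would reuse the earlier work.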
I don't see that doing it along the backbone makes any sense after 4ish residues, as the set of shapes that those residues occupy can vary too much to be represented by a sphere. And if you are holding the structure rigid (or only letting it change a bit), you don't gain anything from having particles represent consecutive residues (and if it is non-rigid, we will have some serious issues with preventing it from blowing up). Is there something I am missing?
One issue that this raises again is that we use radius for several different purposes.
- for proximity detection, we want to know the maximum extents of an object: that is, the size of the space a residue could possibly be in
- for packing, we want the core set of space that it occupies, which will always be smaller than the maximum extent
We could separate the two, but that would be a reasonably significant amount of work making sure various classes use the right one. But might be worth it. If we do that then
- restraints that force things to be close together (residue-residue proximities for example) could use the extents
- restraints that force things apart (excluded volume) could use the core radius
Then, a simple simplification procedure which
- uses the cover of the residues to produce an extents radius
- uses the volume of the residues to produce a core radius
would be pretty OK for most any way one splits the residues. Clustering them would still be better than chopping along the backbone when coarsening a lot.
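As a rough illustration of the split, here is a small Python sketch of a bead that carries both radii and of which restraint would read which one. None of this is existing IMP API: Bead, make_bead and the two violation functions are hypothetical names, the coordinates are toy values, and the core radius ignores overlaps between residue spheres.

```python
import math
from dataclasses import dataclass

@dataclass
class Bead:
    center: tuple
    extents_radius: float  # cover of the residues: where they could possibly be
    core_radius: float     # volume-matched: the space they actually fill

def make_bead(points, radii):
    """Build one bead from residue centers and per-residue radii."""
    n = len(points)
    c = tuple(sum(p[d] for p in points) / n for d in range(3))
    extents = max(math.dist(c, p) + r for p, r in zip(points, radii))
    core = sum(r ** 3 for r in radii) ** (1.0 / 3.0)  # overlaps ignored
    return Bead(c, extents, core)

def proximity_violation(a, b, max_distance):
    """Pull-together restraint: uses the extents radii."""
    gap = math.dist(a.center, b.center) - a.extents_radius - b.extents_radius
    return max(0.0, gap - max_distance)

def excluded_volume_violation(a, b):
    """Push-apart restraint: uses the core radii."""
    overlap = a.core_radius + b.core_radius - math.dist(a.center, b.center)
    return max(0.0, overlap)

# Two toy strand-like segments of 4 residues each, 12 units apart.
seg_a = [(3.5 * i, 0.0, 0.0) for i in range(4)]
seg_b = [(3.5 * i + 12.0, 0.0, 0.0) for i in range(4)]
a = make_bead(seg_a, [3.0] * 4)
b = make_bead(seg_b, [3.0] * 4)

print("proximity violation:      ", proximity_violation(a, b, max_distance=0.0))
print("excluded volume violation:", excluded_volume_violation(a, b))
```

In this toy case the two beads overlap in extents but not in core, so both restraints are satisfied at once. With a single cover radius used for both purposes, the excluded volume term would be pushing them apart.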
Does this make sense?