With Keren, we ran into a problem caused by the fact that em restraints go to 1 as they are increasingly violated, while hard sphere restraints go to larger numbers (and are dependent on the number of particles). The result was that the em restraint would get more or less ignored by the optimizer unless a large scaling factor was added to EM. This experience raises an important question: Do we want a convention about how restraint scores behave as they are increasingly violated? A convention could be something like:
- all satisfiable restraints (ones whose optimum is at 0) should have scores that go as x^2 (or just as x) per atom, where x is the displacement in angstroms from a satisfying conformation.
- a restraint score can stop increasing at a value where there is no longer any signal.
Obviously, one could only very roughly approximate this in practice, but a rough approximation might be good enough to get many of the benefits.
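A minimal sketch of what the proposed convention might look like (the function name and the cap value are hypothetical illustrations, not IMP API): a per-atom score that grows as x^2 with displacement and plateaus once the displacement exceeds the range where the data still carry signal:

```python
def convention_score(displacements, signal_range=10.0):
    """Score under the proposed convention: sum over atoms of
    min(x^2, signal_range^2), where x is each atom's displacement
    (in angstroms) from a satisfying conformation.  The cap models
    'stop increasing where there is no longer any signal'."""
    cap = signal_range ** 2
    return sum(min(x * x, cap) for x in displacements)

# Per-atom scaling means the score grows linearly with atom count
# for a fixed per-atom displacement:
print(convention_score([2.0] * 10))   # 10 atoms, each 2 A off -> 40.0
print(convention_score([2.0] * 100))  # 100 atoms, same violation -> 400.0
print(convention_score([50.0]))       # far beyond signal range -> capped at 100.0
```

Because the score is per-atom, two restraints following this convention start out on the same footing regardless of how many particles each one touches.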
Advantages:
- You have a natural starting point for weight values between the various restraints. Variations from this directly upweight or downweight data relative to equal importance.
- The relative importance of restraints does not depend on how far from optimal you are.
- The relative importance of restraints does not depend on how many particles they are restraining.
- The importance of various terms won't depend on the scale of the representation.
Feasibility:
- core restraints such as Diameter would require using a per-particle scaling factor (to incorporate the number of atoms). Excluded volume is approximately correct too, given such a scaling (at least if the model isn't too messed up).
- EM would require adding a term to penalize things for escaping far from the map, as well as some experiments on how things break down under random displacements of various amounts (with the cross correlation rescaled accordingly), plus scaling by the number of particles. This could easily be added as a wrapper around a CC based restraint. An understanding of those issues seems important anyway.
- others?
Daniel Russel wrote:
> With Keren, we ran into a problem caused by the fact that em restraints
> go to 1 as they are increasingly violated while hard sphere restraints
> go to larger numbers (and are dependent on the number of particles).
> The result was that the em restraint would get more or less ignored by
> the optimizer unless a large scaling factor was added to EM.
This is just basic physics - most restraints are extensive, in that they scale with the system size (e.g. excluded volume, stereochemistry). Any CC-like restraint is intensive, and doesn't scale with system size. Obviously that doesn't work when you combine the two.
For EM we solved this years ago with a scaling factor. Ideally the scale would simply be N^2 where N is the number of particles in the system.
I don't much like the idea of scaling "regular" restraints by the number of atoms or similar, since that would break pretty much everything else where the assumption is made that the sum of the restraints is a score that can be safely minimized (this assumes that the score does increase as you add atoms, and restraints on multiple atoms should have more weight than those on just pairs). But for intensive restraints like EM it would certainly make sense to automatically weight them in a sensible way so optimization works correctly.
Ben
> This is just basic physics - most restraints are extensive, in that
> they scale with the system size (e.g. excluded volume, stereochemistry).
> Any CC-like restraint is intensive, and doesn't scale with system size.

I wouldn't say it has anything to do with physics, but yes, it is obvious once you look at it :-)
> Obviously that doesn't work when you combine the two.

The challenge is either making sure people pay attention to the issue early on or making it go away as a problem so IMP developers don't have to help them individually :-)
> For EM we solved this years ago with a scaling factor. Ideally the
> scale would simply be N^2 where N is the number of particles in the
> system.

Why quadratic rather than linear?
> I don't much like the idea of scaling "regular" restraints by the
> number of atoms or similar, since that would break pretty much
> everything else where the assumption is made that the sum of the
> restraints is a score that can be safely minimized

I don't see that being able to minimize the score cares about whether it increases as you add atoms:
- for most cases the number of atoms is constant or otherwise not interesting (i.e. if you are docking proteins you don't want a larger protein to automatically score worse)
- and if you care about minimizing the number of atoms, you can always add that as a term in your scoring function
> (this assumes that the score does increase
> as you add atoms, and restraints on multiple atoms should have more
> weight than those on just pairs).

Currently the weight scales with the number of particles rather than the number of atoms. The convention I proposed would make it scale with the number of atoms.
A key invariant is that changing the resolution of the representation should not change things too much.
Daniel Russel wrote:
>> For EM we solved this years ago with a scaling factor. Ideally the
>> scale would simply be N^2 where N is the number of particles in the
>> system.
> Why quadratic rather than linear?
Most physics-based scores are interaction energies between pairs of particles. But not all of course, otherwise this would be a solved problem already.
> I don't see that being able to minimize the score cares about whether
> it increases as you add atoms
That's not my point. The point is that physics-based forcefields are balanced that way (and making up a new forcefield is a decades-long endeavor). Rescaling all the terms is not likely to give correct behavior. I'm not talking about Grand Canonical type approaches here, although they should certainly be considered too.
> A key invariant is that changing the resolution of the representation
> should not change things too much.
That certainly makes sense.
Ben
>> Why quadratic rather than linear?
>
> Most physics-based scores are interaction energies between pairs of
> particles. But not all of course, otherwise this would be a solved
> problem already.

Sure, but for what we do (namely, not gravitation), the number of pairs scales linearly with the number of atoms rather than quadratically (since we have terms with finite cutoffs and packing constraints).
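A quick sanity check of the linear-scaling claim (a toy sketch, not IMP code): counting pairs within a finite cutoff for particles at fixed packing density, the pair count grows roughly in proportion to N rather than as N^2:

```python
import itertools

def pairs_within_cutoff(n, cutoff=1.5):
    """Count particle pairs closer than `cutoff` in an n x n x n
    grid of unit-spaced particles (a stand-in for atoms at fixed
    packing density)."""
    pts = list(itertools.product(range(n), repeat=3))
    count = 0
    for a, b in itertools.combinations(pts, 2):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        if d2 < cutoff ** 2:
            count += 1
    return count

p4 = pairs_within_cutoff(4)   # 64 particles
p6 = pairs_within_cutoff(6)   # 216 particles
# N grew by 216/64 = 3.375x; with a finite cutoff the pair count
# grows by a similar factor, nowhere near the ~11.4x of N^2 scaling.
print(p4, p6, p6 / p4)  # 360 1440 4.0
```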
>> I don't see that being able to minimize the score cares about
>> whether it increases as you add atoms
>
> That's not my point. The point is that physics-based forcefields are
> balanced that way (and making up a new forcefield is a decades-long
> endeavor). Rescaling all the terms is not likely to give correct
> behavior.

Rescaling a physics forcefield is harmless if all you are interested in doing is preserving minima. That said, looking like existing physics force fields is a reasonable criterion. But that requires that the other terms scale with the number of atoms too (since all of the force fields have finite cutoffs).
Daniel Russel wrote:
>> Most physics-based scores are interaction energies between pairs of
>> particles. But not all of course, otherwise this would be a solved
>> problem already.
> Sure, but for what we do (namely, not gravitation), the number of
> pairs scales linearly with the number of atoms rather than
> quadratically (since we have terms with finite cutoffs and packing
> constraints).
That is not true for Modeller-style homology-derived restraints, as one example.
> Rescaling a physics forcefield is harmless if all you are interested
> in doing is preserving minima.
Of course, but rescaling different parts of the forcefield by different amounts (e.g. bond terms vs. torsions, since the latter act on twice as many atoms) will really break things, and that was what I read your proposal as.
> That said, looking like existing physics force fields is a reasonable
> criterion. But that requires that the other terms scale with the
> number of atoms too (since all of the force fields have finite
> cutoffs).
Molecular mechanics people have worked with such nonbonded interactions in their forcefields for many years: the effects of such cutoffs on the energies and dynamics are well understood. I don't think the same could be said for a rescaled term. This is why I suggest rescaling terms such as EM and SAXS rather than sterics and nonbonds.
Ben
> Of course, but rescaling different parts of the forcefield by
> different amounts (e.g. bond terms vs. torsions, since the latter act
> on twice as many atoms) will really break things, and that was what I
> read your proposal as.

Sorry if I was unclear. I would consider a FF a single restraint, as it doesn't make sense to mix and match the terms (the fact that it is implemented using other things is an internal detail). I also explicitly excluded them, since they are not satisfiable restraints (there is no sense of fit/not fit with a raw force field). Further, you can't really use a FF far from a good configuration, and when you are close to the optimum the scaling issues don't matter so much (since all the satisfiable restraints must be near 0 anyway).
> Molecular mechanics people have worked with such nonbonded
> interactions in their forcefields for many years: the effects of such
> cutoffs on the energies and dynamics are well understood. I don't
> think the same could be said for a rescaled term. This is why I
> suggest rescaling terms such as EM and SAXS rather than sterics and
> nonbonds.

As far as behavior is concerned, it doesn't make any difference whether you rescale one or the other, as either one can be used to give the same shaped energy landscape.
If one chooses the rescaling option - from others' experience, should the derivatives be scaled the same as the score? I suspect not, as for example x^2 becomes 2x.
Are there any insights from previous projects?
Keren.
Keren Lasker wrote:
> If one chooses the rescaling option - from others' experience, should
> the derivatives be scaled the same as the score?
Yes.
> I suspect not, as for example x^2 becomes 2x.
That's only the case if you're taking the partial derivative with respect to X, and X here is "number of particles" or something similar, not the X coordinate.
Ben
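To illustrate Ben's point with a toy sketch (not IMP code): the rescaling factor is a constant with respect to the coordinates, so by the chain rule it multiplies the derivative by exactly the same amount as the score. Only a factor that itself depended on the coordinates would need separate treatment:

```python
def harmonic(x):
    """A simple satisfiable restraint: score x^2, derivative 2x."""
    return x * x, 2 * x

def rescaled(x, scale):
    """The same restraint with a constant rescaling factor applied.
    By the chain rule, d/dx (k * f(x)) = k * f'(x): the score and
    its coordinate derivative are scaled by the same constant."""
    score, deriv = harmonic(x)
    return scale * score, scale * deriv

print(harmonic(3.0))        # (9.0, 6.0)
print(rescaled(3.0, 5.0))   # (45.0, 30.0) - both scaled by 5
```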
makes sense - thank you.

On Jun 8, 2009, at 1:30 PM, Ben Webb wrote:
> Keren Lasker wrote:
>> If one chooses the rescaling option - from others' experience,
>> should the derivatives be scaled the same as the score?
>
> Yes.
>
>> I suspect not, as for example x^2 becomes 2x.
>
> That's only the case if you're taking the partial derivative with
> respect to X, and X here is "number of particles" or something
> similar, not the X coordinate.
the scaling factor will not be differentiated. at least, i don't.
f
On Mon, Jun 8, 2009 at 10:28 PM, Keren Lasker kerenl@salilab.org wrote:
> If one chooses the rescaling option - from others' experience, should
> the derivatives be scaled the same as the score?
> I suspect not, as for example x^2 becomes 2x.
>
> Are there any insights from previous projects?
>
> Keren.
if you really want to provide a solution making most people happy i'd suggest learning from x-ray crystallography. there restraints are commonly scaled by doing a number of optimizations to get an estimate for the scaling. a similar solution would be greatly appreciated, but it is a considerable amount of work.
i am pretty sure that any default scaling would by far be insufficient for most cases, not to speak of 80%. already ben's example with the homology-based restraints should make it obvious that a generic scaling factor is probably impossible to derive in a rather heuristic framework such as imp or modeller.
therefore, if you really intend to make a one-size-fits-all solution i'd advocate for a serious effort in analogy to established x-ray protocols. fudge solutions won't buy anything, i predict. e.g., what about the resolution of em maps, just to name one problem ...
cheers
frido
On Mon, Jun 8, 2009 at 10:10 PM, Ben Webb ben@salilab.org wrote:
Just as a clarification, the point is not to have a be-all and end-all solution; it is to provide a more reasonable starting point and make it clear that there is a scaling issue. In addition, establishing a convention about how the error scales with the number of atoms means that the scaling will have to be changed less after modifications to the representation (a local optimization, rather than a global one :-)
The fact that some restraint types (especially ones which are not near a minimum of the restraint at the minimum of the whole function) are difficult isn't really relevant IF some restraints are easy. We can provide functions/guidelines to handle the easy cases, serving the dual purpose of automating what can be automated and providing a reminder of what cannot. And what is easy for us is not necessarily easy for others. Whether some are easy outside of diameter and excluded volume is still under contention though :-)
On Jun 8, 2009, at 1:29 PM, Friedrich Foerster wrote:
funny how the same questions always re-occur ... anyways, in theory i guess we all agreed a long time ago that isd (inferential structure determination) is the way to go. in practice, not much has happened in that direction other than mike drawing spheres.
in my eyes, a theoretically sound scale is roughly the number of voxels, which is the amount of information. a fudge solution would probably be a scale ~N, which is also what i did for the saxs restraint. however, as the problem of scaling different kinds of restraints is not really solved anyway, i do not quite see the point of putting in a pre-fixed solution, as it pretends to have found a solution for something where one has actually not been found.
cheers
frido
On Mon, Jun 8, 2009 at 8:59 PM, Daniel Russel drussel@gmail.com wrote:
> in my eyes, a theoretically sound scale roughly by the number of
> voxels, which is the amount of information. a fudge solution would
> probably be a scale ~N, which i also did for the saxs restraint.
> however, as the solution of scaling different kinds of restraints is
> not really solved anyways, i do not quite see the point of putting a
> pre-fixed solution, as it pretends to have found a solution for
> something where it has actually not been found.

I agree that we aren't at a point where we can have a bulletproof solution. However, given that the problem comes up again and again, it seems reasonable to take people's experience so far and use it to develop guidelines for how restraints should behave, and rules of thumb to provide better starting points for other users. In addition, the more awareness of an issue like this permeates things, the less likely it is that someone will be surprised. For example, if the docs for the EM restraint say something like "The EM restraint should be wrapped by the ScaledEMRestraint wrapper, which scales the CC by the number of particles and a factor for X so as to bring it into scale with the other restraints", then anyone who uses the EM restraint will know to think about scaling.
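A sketch of what such a wrapper might look like (all names here are hypothetical illustrations, not actual IMP API, and the scale is just the ~N fudge factor discussed above):

```python
class ScaledEMRestraint:
    """Hypothetical wrapper turning an intensive, CC-based score
    (which saturates near 1 as the fit degrades) into a score that
    scales with the number of particles, so that it can be summed
    with extensive restraints like excluded volume."""

    def __init__(self, cc_restraint, n_particles, weight=1.0):
        self.cc_restraint = cc_restraint   # callable returning cc in [0, 1]
        self.scale = weight * n_particles  # the ~N fudge factor

    def evaluate(self):
        cc = self.cc_restraint()
        # 1 - cc is 0 at a perfect fit, so the wrapped score is a
        # satisfiable restraint with its optimum at 0.
        return self.scale * (1.0 - cc)

# Usage with a stand-in cc function (a real one would come from a
# density-map cross correlation):
fake_cc = lambda: 0.75
r = ScaledEMRestraint(fake_cc, n_particles=500)
print(r.evaluate())  # 500 * 0.25 = 125.0
```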
As it is, people will, as Keren did, occasionally simply add restraints to a Model without thinking about scaling. Or will change their representation scale, and wonder why nothing works any more.
I suspect that we can get 80% of the way there (with something like what I suggested), which should be good enough for many applications and to keep a bunch of users happy.
participants (4)
- Ben Webb
- Daniel Russel
- Friedrich Foerster
- Keren Lasker