problems with a sample script (basic IMP optimization)
Hi list,
As a first step towards a serious usage of IMP, I tried to write a sample code using rigid bodies, restraints and optimizers. With MonteCarlo optimizer, at the end of optimization the model seems to remains unchanged. With ConjugateGradients or SteepestDescent, when using too many steps in the optimizer (under 100 though !?) the script generally crashes with one of these error messages :
File "/usr/local/lib/python2.6/site-packages/IMP/core/__init__.py", line 642, in optimize def optimize(self, *args): return _IMP_core.ConjugateGradients_optimize(self, *args) _IMP.UsageException: Attempting to construct a rotation from a non-quaternion value. The coefficient vector must have a length of 1. Got: 0.165858 -0.916284 0.55176 0.327904 gives 1.27905
File "/usr/local/lib/python2.6/site-packages/IMP/core/__init__.py", line 642, in optimize def optimize(self, *args): return _IMP_core.ConjugateGradients_optimize(self, *args) _IMP.UsageException: Attempting to construct a rotation from a non-quaternion value. The coefficient vector must have a length of 1. Got: 0.389486 -0.600546 0.553302 0.99031 gives 1.79921
Moreover when the script finishes, its outputs seem not relevant at all, though the score is better (smaller). maybe it is normal though, due to the choice of my restraints or to the lack of optimization steps… I think my understanding of IMP usage is still very far from perfect, will you help me sorting that out by having a look at my code ?
You will fin the script attached to this message, its outlet is really basic : - a structure is loaded ( I choose 1RDT, a complex comprising two protein domains and two peptides) - a map is sampled at 10A for that structure - chains are extracted from the structure, rigidified and randomly moved in space - two restraints are added to the model in order to tale into account - the fit of all atoms to the sampled map - no steric clashes between rigid bodies - optimization
thank you very much
Dr. Benjamin SCHWARZ Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
…And it is even better with the script.
Dr. Benjamin SCHWARZ Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
The bug sounds like something Elina found and patched after the 1.0 release. I'll test your script with the current SVN.
As general comments: - with monte-carlo, you have to provide the move set. We really should put up an error if there is no move set :-) I'll fix that. - Conjugate gradients can get stuck very easily (which is why we are working on a better optimizer). This is especially true with rigid bodies composed of my particles as the atom balls can get mixed in together and CG can never separate them.
On Jul 23, 2010, at 7:00 AM, Benjamin SCHWARZ wrote:
> Hi list, > > As a first step towards a serious usage of IMP, I tried to write a sample code using rigid bodies, restraints and optimizers. > With MonteCarlo optimizer, at the end of optimization the model seems to remains unchanged. With ConjugateGradients or SteepestDescent, when using too many steps in the optimizer (under 100 though !?) the script generally crashes with one of these error messages : > > File "/usr/local/lib/python2.6/site-packages/IMP/core/__init__.py", line 642, in optimize > def optimize(self, *args): return _IMP_core.ConjugateGradients_optimize(self, *args) > _IMP.UsageException: Attempting to construct a rotation from a non-quaternion value. The coefficient vector must have a length of 1. Got: 0.165858 -0.916284 0.55176 0.327904 gives 1.27905 > > > File "/usr/local/lib/python2.6/site-packages/IMP/core/__init__.py", line 642, in optimize > def optimize(self, *args): return _IMP_core.ConjugateGradients_optimize(self, *args) > _IMP.UsageException: Attempting to construct a rotation from a non-quaternion value. The coefficient vector must have a length of 1. Got: 0.389486 -0.600546 0.553302 0.99031 gives 1.79921 > > Moreover when the script finishes, its outputs seem not relevant at all, though the score is better (smaller). maybe it is normal though, due to the choice of my restraints or to the lack of optimization steps… I think my understanding of IMP usage is still very far from perfect, will you help me sorting that out by having a look at my code ? > > You will fin the script attached to this message, its outlet is really basic : > - a structure is loaded ( I choose 1RDT, a complex comprising two protein domains and two peptides) > - a map is sampled at 10A for that structure > - chains are extracted from the structure, rigidified and randomly moved in space > - two restraints are added to the model in order to tale into account > - the fit of all atoms to the sampled map > - no steric clashes between rigid bodies > - optimization > > > thank you very much > > > > Dr. Benjamin SCHWARZ > Biocomputing group > Email : schwarz@igbmc.fr > Voice : +33 (0)3 68 85 47 30 > FAX : +33 (0)3 68 85 47 18 > > <logo-uds-signature.gif> > > > Structural Biology & Genomics Dept. - IGBMC > 1 rue Laurent Fries > BP 10142 > F - 67404 Illkirch CEDEX > > _______________________________________________ > IMP-users mailing list > IMP-users@salilab.org > https://salilab.org/mailman/listinfo/imp-users
Thanks a lot for the answer
> The bug sounds like something Elina found and patched after the 1.0 release. I'll test your script with the current SVN. The good news for me, is that you seem not to blame my code. I guess it means, my understanding of IMP's usage tends to improve :)
> As general comments: > - with monte-carlo, you have to provide the move set. We really should put up an error if there is no move set :-) I'll fix that. OK, I will have a serious glance on these aspects
> - Conjugate gradients can get stuck very easily (which is why we are working on a better > optimizer). OK, I will have a look at the C++ code for IMP.em.local_rigid_fitting in order to understand how you mixed MC and CG there.
> This is especially true with rigid bodies composed of my particles as the atom balls can get mixed in together and CG can never separate them. I am sorry, I don't understand this point.
Please, let me know when you have tested my script with the latest IMP code.
--Ben
Following Daniel's suggestion I added a set of Movers to my MC optimizer (see code attached). Now the optimizer actually does something, though the results are very strange. It is much probable I poorly chose my parameters, and 1000 steps are not so much for a MC optimization.
It also appears I have a problem with the cost function : it evaluates to 18.36 for the native configuration and to 1371 for an initial "noisy" configuration not so far from the initial configuration. It then evaluates to 1.03 for an "optimized" configuration where all four chains are placed far outside the sampled density. I did not discover yet if and how it is possible to set parameters for the global cost function, maybe my problem has something to do with it.
Anyhow, I would be very grateful if someone would glance at my script, just to check I did use IMP in a proper manner.
--Ben
Le 23 juil. 2010 à 19:44, Daniel Russel a écrit :
> The bug sounds like something Elina found and patched after the 1.0 release. I'll test your script with the current SVN. > > As general comments: > - with monte-carlo, you have to provide the move set. We really should put up an error if there is no move set :-) I'll fix that. > - Conjugate gradients can get stuck very easily (which is why we are working on a better optimizer). This is especially true with rigid bodies composed of my particles as the atom balls can get mixed in together and CG can never separate them.
Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
Hi list,
In order to have a better insight I modified my script, keeping only the em.FitRestraint in my model (script attached). The score for the initial model (5.96046447754e-08) is reassuringly near 0; which is reasonable when comparing two maps sampled from the same structure. Nevertheless, when chains are randomly moved, the score does not radically improve, even when the transformations move them far away from their initial position. Moreover, I have an example where the score is minimized from 0.37 for the initial (moved chains) position to 0.49 for the optimized solution !?
I may have missed something, and I would need some technical support to fix the problem quickly.
--Ben
Dr. Benjamin SCHWARZ Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
I think them main problem is the roughness of the scoring function caused by the excluded volume terms (as I had cryptically mentioned before). Your scoring function looks something like EM+sum over many pairs, where EM goes from 0 to 1 and each of the terms being summed over goes from, more of less 0 to 1 also. So the EM score is swamped by the excluded volume terms as soon as there is any overlap. And MC just moves things randomly, hoping for a good solution. So it will take a while to find anything good. In addition, if you use Conjugate Gradients, the derivatives of the excluded volume score on atoms are pretty useless since they are computed using each atom ball (explanation of the problem would probably benefit from a picture, I'll work on that :-).
My suggestions are - don't use a full atomic representation for your proteins (eg, simplify them using IMP.atom.create_simplified_along_backbone()). This will smooth the excluded volume scoring function out and making scoring faster. - use Conjugate Gradients in conjunction with MC (create a conjugate gradients optimizer and add it to the MC one using mc.set_local_optimizer()). This will then perform local minimization after each MC step. - when using monte carlo for optimization (as opposed to sampling), you should call set_return_best(True) on the object so that it saves the best state it finds, rather than just returning the last state accepted (which may have higher score).
On Jul 27, 2010, at 6:30 AM, Benjamin SCHWARZ wrote:
> Hi list, > > In order to have a better insight I modified my script, keeping only the em.FitRestraint in my model (script attached). > The score for the initial model (5.96046447754e-08) is reassuringly near 0; which is reasonable when comparing two maps sampled from the same structure. > Nevertheless, when chains are randomly moved, the score does not radically improve, even when the transformations move them far away from their initial position. Moreover, I have an example where the score is minimized from 0.37 for the initial (moved chains) position to 0.49 for the optimized solution !? > > I may have missed something, and I would need some technical support to fix the problem quickly. > > --Ben > > > <fitWithRestraints.py> > > Dr. Benjamin SCHWARZ > Biocomputing group > Email : schwarz@igbmc.fr > Voice : +33 (0)3 68 85 47 30 > FAX : +33 (0)3 68 85 47 18 > > <logo-uds-signature.gif> > > > Structural Biology & Genomics Dept. - IGBMC > 1 rue Laurent Fries > BP 10142 > F - 67404 Illkirch CEDEX > > _______________________________________________ > IMP-users mailing list > IMP-users@salilab.org > https://salilab.org/mailman/listinfo/imp-users
Hello Daniel,
> I think them main problem is the roughness of the scoring function caused by the excluded volume terms (as I had cryptically mentioned before). Your scoring function looks something like EM+sum over many pairs, where EM goes from 0 to 1 and each of the terms being summed over goes from, more of less 0 to 1 also. So the EM score is swamped by the excluded volume terms as soon as there is any overlap. And MC just moves things randomly, hoping for a good solution. So it will take a while to find anything good.
It perfectly makes sense… So, I understand that diverse restraints may have diverse scoring schemes and images ( e.g. [0,1] for FittingRestraint and [0,+M] for ExcludedVolumeRestraint, with M potentially very high)… And now, what to do with it ? :) There seem to be no "pythonic" way to modify constants in front of the diverse restraints, or to compose a restraint with a function to modulate its score. My lucky guess is : go for C++ to create a new restraint by composing existing restraints… Am I correct ?
> In addition, if you use Conjugate Gradients, the derivatives of the excluded volume score on atoms are pretty useless since they are computed using each atom ball (explanation of the problem would probably benefit from a picture, I'll work on that :-).
I am afraid I need a picture... and a lot more explanation on IMP internal computations ;) So don't spend too much time on this unless you think it can benefit to others.
> My suggestions are > - don't use a full atomic representation for your proteins (eg, simplify them using IMP.atom.create_simplified_along_backbone()). This will smooth the excluded volume scoring function out and making scoring faster. I'll try that
> - use Conjugate Gradients in conjunction with MC (create a conjugate gradients optimizer and add it to the MC one using mc.set_local_optimizer()). This will then perform local minimization after each MC step. I missed the set_local_optimizer() trick… I'll have a closer look at that.
> - when using monte carlo for optimization (as opposed to sampling), you should call set_return_best(True) on the object so that it saves the best state it finds, rather than just returning the last state accepted (which may have higher score).
I missed that to...
Thanks a lot for your answer, I will test that ASAP
Dr. Benjamin SCHWARZ Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
On 7/27/10 9:10 AM, Benjamin SCHWARZ wrote: > So, I understand that diverse restraints may have diverse scoring > schemes and images ( e.g. [0,1] for FittingRestraint and [0,+M] for > ExcludedVolumeRestraint, with M potentially very high)… And now, what to > do with it ? :) > There seem to be no "pythonic" way to modify constants in front of the > diverse restraints, or to compose a restraint with a function to > modulate its score. My lucky guess is : go for C++ to create a new > restraint by composing existing restraints… Am I correct ?
Yes, when combining very different restraints, weighting may need to be considered. You can weight a Restraint by putting it in a RestraintSet and calling that object's set_weight() method.
Ben
> It perfectly makes sense… > So, I understand that diverse restraints may have diverse scoring schemes and images ( e.g. [0,1] for FittingRestraint and [0,+M] for ExcludedVolumeRestraint, with M potentially very high)… And now, what to do with it ? :) > There seem to be no "pythonic" way to modify constants in front of the diverse restraints, or to compose a restraint with a function to modulate its score. My lucky guess is : go for C++ to create a new restraint by composing existing restraints… Am I correct ?
As Ben pointed out, you can use RestraintSets to weight things. This does get to a more fundamental problem that weighting inhomogeneous restraints relative to one another is fundamentally broken and should be avoided. We are working on solutions to that (using an inference based optimizer Keren developed), but for now you have to fool around with weights.
> >> In addition, if you use Conjugate Gradients, the derivatives of the excluded volume score on atoms are pretty useless since they are computed using each atom ball (explanation of the problem would probably benefit from a picture, I'll work on that :-). > > I am afraid I need a picture... and a lot more explanation on IMP internal computations ;) > So don't spend too much time on this unless you think it can benefit to others. I tried with a picture, but couldn't make anything that wasn't confusingly cluttered.
To try to explain better, for a pair of intersecting rigid bodies, the derivatives of the excluded volume restraint are computed by adding up the derivatives for each of the balls which constitute the rigid body. The sum of these individual derivatives does not necessarily point in a direction which separates the two rigid bodies unless the inter-penetration of bodies is small compared to the size of the balls. As a result CG would get stuck in local minima. We are working on a simple alternatively, but I don't think it is there yet. Keren?
--Daniel
> As Ben pointed out, you can use RestraintSets to weight things. This does get to a more fundamental problem that weighting inhomogeneous restraints relative to one another is fundamentally broken and should be avoided. We are working on solutions to that (using an inference based optimizer Keren developed), but for now you have to fool around with weights.
Just to be 100% sure to understand : what you mean when saying this approach is "broken" is that rather a technical / software problem, it is a fundamental / theoretical evilness / badness to mix "things" that do not leave in the same space. Though, since a more theoretically justified technique is not yet at hand, a weighted sum of restraints will do the trick.
Concerning Keren's development, has it got anything to do with Domino ?
>>> In addition, if you use Conjugate Gradients, the derivatives of the excluded volume score on atoms are pretty useless since they are computed using each atom ball (explanation of the problem would probably benefit from a picture, I'll work on that :-). > I tried with a picture, but couldn't make anything that wasn't confusingly cluttered. > To try to explain better, for a pair of intersecting rigid bodies, the derivatives of the excluded volume restraint are computed by adding up the derivatives for each of the balls which constitute the rigid body. The sum of these individual derivatives does not necessarily point in a direction which separates the two rigid bodies unless the inter-penetration of bodies is small compared to the size of the balls. As a result CG would get stuck in local minima.
That's perfectly clear, thanks a lot :)
Dr. Benjamin SCHWARZ Biocomputing group Email : schwarz@igbmc.fr Voice : +33 (0)3 68 85 47 30 FAX : +33 (0)3 68 85 47 18
Structural Biology & Genomics Dept. - IGBMC 1 rue Laurent Fries BP 10142 F - 67404 Illkirch CEDEX
On Jul 28, 2010, at 9:09 AM, Benjamin SCHWARZ wrote:
>> As Ben pointed out, you can use RestraintSets to weight things. This does get to a more fundamental problem that weighting inhomogeneous restraints relative to one another is fundamentally broken and should be avoided. We are working on solutions to that (using an inference based optimizer Keren developed), but for now you have to fool around with weights. > > Just to be 100% sure to understand : what you mean when saying this approach is "broken" is that rather a technical / software problem, it is a fundamental / theoretical evilness / badness to mix "things" that do not leave in the same space. Though, since a more theoretically justified technique is not yet at hand, a weighted sum of restraints will do the trick. The latter. It is a fundamental problem, rather than a software problem. Different restraints have different minima, grown to different values and at different rates. As a result there isn't really a set of weights that makes sense over a whole optimization process.
> > Concerning Keren's development, has it got anything to do with Domino ? Yes, We are working on a version (domino2) which is based on filtering of possible conformations based on the scores of individual restraints. --Daniel
As evoked in a former message, I couldn't manage to have GC optimizer working with the binary MAC OSX version (IMP 1.0). So I compiled the latest version on one of our linux machines. The compilation was easily achieved with the default parameters, though I guess it could be possible to enhance IMP speed by tweaking some magic compilation parameters… I'll see this point later.
I am thus using IMP version SVN 6211
With this version CG optimizer seem to work fine, though I always get warning messages WARNING could not find parameters for radius:1.85 WARNING Setting params for radius :1.85 WARNING could not find parameters for radius:2.275 WARNING Setting params for radius :2.275 WARNING could not find parameters for radius:2 WARNING Setting params for radius :2 WARNING could not find parameters for radius:1.7 WARNING Setting params for radius :1.7 WARNING could not find parameters for radius:2.175 WARNING Setting params for radius :2.175 WARNING could not find parameters for radius:2.06 WARNING Setting params for radius :2.06 WARNING could not find parameters for radius:1.77 WARNING Setting params for radius :1.77 WARNING could not find parameters for radius:1.9924 WARNING Setting params for radius :1.9924 WARNING could not find parameters for radius:1.8 WARNING Setting params for radius :1.8 WARNING Object "Nameless" was never used. See the IMP::Object documentation for an explanation.
But when trying to use CG in conjunction with MC ( mco.set_local_optimizer( cgo ) ), I still experience problems; sometimes it seems to pole in an infinite loop, sometimes it prematurely ends with this message :
Traceback (most recent call last): File "./fitWithRestraints.py", line 222, in <module> opti.optimize(params["optimizer"]["maxSteps"]) File "/usr/local/imp/lib64/python2.6/site-packages/IMP/__init__.py", line 1569, in optimize def optimize(self, *args): return _IMP.Optimizer_optimize(self, *args) _IMP.UsageException: Attempt to use uninitialized vector.
Ultimately, I am wondering if trying to get CG working is not a waste of time for me… I mean, maybe there are better approaches... So, If you have any comment or general advice that I should check, feel free to do so.
--Ben
Ben - from the warning messages I see that you are using the FitRestraint. Are you using any other restraints as well? It will be useful to get your optimization script so we can take a look. Keren. On Aug 12, 2010, at 7:00 AM, Benjamin SCHWARZ wrote:
> As evoked in a former message, I couldn't manage to have GC > optimizer working with the binary MAC OSX version (IMP 1.0). So I > compiled the latest version on one of our linux machines. The > compilation was easily achieved with the default parameters, though > I guess it could be possible to enhance IMP speed by tweaking some > magic compilation parameters… I'll see this point later. > > I am thus using IMP version SVN 6211 > > With this version CG optimizer seem to work fine, though I always > get warning messages > WARNING could not find parameters for radius:1.85 > WARNING Setting params for radius :1.85 > WARNING could not find parameters for radius:2.275 > WARNING Setting params for radius :2.275 > WARNING could not find parameters for radius:2 > WARNING Setting params for radius :2 > WARNING could not find parameters for radius:1.7 > WARNING Setting params for radius :1.7 > WARNING could not find parameters for radius:2.175 > WARNING Setting params for radius :2.175 > WARNING could not find parameters for radius:2.06 > WARNING Setting params for radius :2.06 > WARNING could not find parameters for radius:1.77 > WARNING Setting params for radius :1.77 > WARNING could not find parameters for radius:1.9924 > WARNING Setting params for radius :1.9924 > WARNING could not find parameters for radius:1.8 > WARNING Setting params for radius :1.8 > WARNING Object "Nameless" was never used. See the > IMP::Object documentation for an explanation. > > > But when trying to use CG in conjunction with MC > ( mco.set_local_optimizer( cgo ) ), I still experience problems; > sometimes it seems to pole in an infinite loop, sometimes it > prematurely ends with this message : > > Traceback (most recent call last): > File "./fitWithRestraints.py", line 222, in <module> > opti.optimize(params["optimizer"]["maxSteps"]) > File "/usr/local/imp/lib64/python2.6/site-packages/IMP/ > __init__.py", line 1569, in optimize > def optimize(self, *args): return _IMP.Optimizer_optimize(self, > *args) > _IMP.UsageException: Attempt to use uninitialized vector. > > > Ultimately, I am wondering if trying to get CG working is not a > waste of time for me… I mean, maybe there are better approaches... > So, If you have any comment or general advice that I should check, > feel free to do so. > > --Ben > _______________________________________________ > IMP-users mailing list > IMP-users@salilab.org > https://salilab.org/mailman/listinfo/imp-users
Hello Keren,
As you can see in the attached script (lines 168-210) I tested several configurations : FitRestraint alone or with an ExcludedVolumeRestraint; with or without weights (through a RestraintSet as suggested by Ben and Daniel).
Le 12 août 2010 à 17:40, Keren Lasker a écrit :
> Ben - > from the warning messages I see that you are using the FitRestraint. > Are you using any other restraints as well? > It will be useful to get your optimization script so we can take a look. > Keren. > On Aug 12, 2010, at 7:00 AM, Benjamin SCHWARZ wrote:
participants (4)
-
Ben Webb
-
Benjamin SCHWARZ
-
Daniel Russel
-
Keren Lasker