First significant code review

Daniel Russel

12 Dec 2007 12 Dec '07

2:18 p.m.

I changed the interfaces of DistanceRestraint and ExclusionRestraint to make them more consistent and more general.

Warning, it is quite long due to many similar changes in the test cases. Yeah for sed, but I wouldn't want to read through the diff :-)

DistanceRestraint now takes a pair of ParticleIndexes and a ScoreFunc which it then owns. I also removed the constructor with a passed radius as that does add any power and adds another path which has to be tested and maintained.

ExclusionRestraint takes a list or lists of ParticleIndexes (instead of ints). Various unused member variables are removed.

Restraint::show now has a default argument so it can be called from python.

Applying it requires first applying the decorators patch I sent before (since I removed the redundant creation of the FloatKeys in DistanceRestraint). It might also require the more minor kernel patch also included.

Attachments:

restraints.a (application/octet-stream — 43.0 KB)
kernel.a (application/octet-stream — 14.8 KB)

Show replies by date

Ben Webb

12 Dec 12 Dec

5:58 p.m.

Daniel Russel wrote: > DistanceRestraint now takes a pair of ParticleIndexes and a ScoreFunc > which it then owns.

OK, so the ScoreFunc change should be a separate patch, because it is a large change. And first we need to decide what the proper solution is, because your patch appears to only address part of the issues.

So what is a ScoreFunc? It is essentially what in the Modeller world is called a math form - given some value, it uses its own state (e.g. mean and standard deviation) to return a score. IMP has harmonic and upper/lower bound ScoreFuncs which are essentially identical to the Modeller equivalents.

Some restraints (e.g. DistanceRestraint) internally keep a pointer to a ScoreFunc object and use it in the score calculation.

Originally, Bret had DistanceRestraint take a ScoreFunc, a mean, and a standard deviation. I complained about this because it makes things like DOPE impossible - because those distance restraints do not have means or standard deviations, just a cubic spline. So he replaced ScoreFunc with ScoreFuncParams, which is a generator that can make ScoreFunc objects internally. For example, something like the ExclusionVolumeRestraint goes through all pairs of particles and builds distance restraints for each pair, calling ScoreFuncParams::set_mean to set the mean for each DistanceRestraint generated.

I don't like this patch, because it now requires you to use ScoreFuncParams in some places and regular ScoreFuncs in others, and to call the generator function yourself (even from Python). We should decide on one or the other. (I will quite happily write the code, unless Daniel really wants to, for the solution.)

One suggestion: we have a hierarchy of ScoreFunc classes, and just have the Restraint constructors create local copies to keep their own state (we can just use the copy constructor for this).

I'd like to hear people's thoughts on this, because other scoring functions (e.g. Lennard-Jones style 6-12 potential) are on my immediate todo list.

I also removed the constructor with a passed radius > as that does add any power and adds another path which has to be tested > and maintained.

Sure, I can commit that.

> ExclusionRestraint takes a list or lists of ParticleIndexes (instead of > ints).

Do others have any opinions on this? If people hate ParticleIndexes, now would be the time to say so - otherwise I'll commit this part of Daniel's patch too.

> Restraint::show now has a default argument so it can be called from python.

I didn't see a unit test for this, but I guess I can write one for you when I commit it.

Ben

-- ben@salilab.org http://salilab.org/~ben/ "It is a capital mistake to theorize before one has data." - Sir Arthur Conan Doyle

Daniel Russel

6:12 p.m.

> I don't like this patch, because it now requires you to use > ScoreFuncParams in some places and regular ScoreFuncs in others, and > to > call the generator function yourself (even from Python). We should > decide on one or the other.

The reason I introduced the split, and, I think the reason for having both the Params and the SF itself are that some restraints simply do use a function, where as others generate a family of functions. They are conceptually different, and so I think, as long as there are two different class, the interfaces should reflect that. (the other reason is I didn't want to have to change BasicScoreFuncParams every time I added a new subclass).

> (I will quite happily write the code, unless > Daniel really wants to, for the solution.) I'll pass :-)

> One suggestion: we have a hierarchy of ScoreFunc classes, and just > have > the Restraint constructors create local copies to keep their own state > (we can just use the copy constructor for this). I don't think I understand the proposal.

Two proposal (perhaps the first is the same as yours)

1) to have ScoreFunc which has a clone method and translate method (and perhaps a scale method). Then, a compound restraint can take one, clone it and shift the the origin to handle generating a family.

2) Have a factory per ScoreFunc which then takes a shift and a scale parameter and creates a corresponding score function. The factory can be easily generated with a macro.

As a side note, I propose we change the name to UnaryFunction because it is simply a function of one variable and I can see wanting functions of more than one. And having an abbreviation in the name is inconsistent. Also, the names mean and stdev have to go, since they don't make sense with statistical potentials.

Ben Webb

14 Dec 14 Dec

12:27 a.m.

Daniel Russel wrote: >> I don't like this patch, because it now requires you to use >> ScoreFuncParams in some places and regular ScoreFuncs in others, and to >> call the generator function yourself (even from Python). We should >> decide on one or the other. > > The reason I introduced the split, and, I think the reason for having > both the Params and the SF itself are that some restraints simply do use > a function, where as others generate a family of functions.

Yes, this is true. For example, a DistanceRestraint uses a 'raw' ScoreFunc, while an ExclusionVolumeRestraint uses a ScoreFunc generator (to generate a separate ScoreFunc for each DistanceRestraint). But don't you think the average user will be confused by this distinction? I have no problem with making it internally.

>> One suggestion: we have a hierarchy of ScoreFunc classes, and just have >> the Restraint constructors create local copies to keep their own state >> (we can just use the copy constructor for this). > I don't think I understand the proposal. > > Two proposal (perhaps the first is the same as yours)

Yes, the first is essentially the same as my proposal.

> 1) to have ScoreFunc which has a clone method and translate method (and > perhaps a scale method). Then, a compound restraint can take one, clone > it and shift the the origin to handle generating a family. > > 2) Have a factory per ScoreFunc which then takes a shift and a scale > parameter and creates a corresponding score function. The factory can be > easily generated with a macro.

There'd have to be a few more methods than that, though - for example an atomistic statistical potential would need to know the atom types of the two atoms it acts on, so there'd need to be a method to provide it with the two Particle pointers or indexes.

> As a side note, I propose we change the name to UnaryFunction

That's a rather vague name IMHO, and doesn't say anything about its use as a scoring function... you may as well just call it MathForm (although the comments in ScoreFunc suggest that's what it was called originally anyway). Why don't you like MathForm? We could have UnaryMathForm if you prefer. ;)

> because it > is simply a function of one variable and I can see wanting functions of > more than one.

That's not a huge stretch of the imagination, because we already have functions of more than one variable in Modeller (ND spline, Mike's MDG, multi bionormal). It's a shame that this was thrown away in the original IMP design, because now we've got to stick it back in. I guess you're arguing for a different base class for these functions, since the operator() call signature would be different? (std::vector<Float> rather than Float, I guess.)

> Also, the names mean and stdev have to go, since they don't make sense > with statistical potentials.

That's certainly true for statistical potentials, but most of the restraints that currently generate DistanceRestraints (e.g. ExclusionVolume) explicitly build restraints for a minimum or mean distance. So there'd need to at least be a ScoreFunc class that supported mean/stdev, that we can derive from, yes?

Ben

-- ben@salilab.org http://salilab.org/~ben/ "It is a capital mistake to theorize before one has data." - Sir Arthur Conan Doyle

Daniel Russel

7:42 a.m.

On Dec 14, 2007, at 12:27 AM, Ben Webb wrote:

> Yes, this is true. For example, a DistanceRestraint uses a 'raw' > ScoreFunc, while an ExclusionVolumeRestraint uses a ScoreFunc > generator > (to generate a separate ScoreFunc for each DistanceRestraint). But > don't > you think the average user will be confused by this distinction? I > have > no problem with making it internally. That I can't really say. It makes sense to me. But everyone things my mind works in strange ways. In addition, it needs to be able to support statistical potentials, which is a problem if you want to stick with the names mean and stddev.

Creating and destroying distance restraints is, at least in my code, a relatively common operation (in one of the inner loops) and so I would like it to be as cheap as possible. On that note, we should make the x_,y_,z_ static. > >> As a side note, I propose we change the name to UnaryFunction > > That's a rather vague name IMHO, and doesn't say anything about its > use > as a scoring function... you may as well just call it MathForm > (although > the comments in ScoreFunc suggest that's what it was called originally > anyway). Why don't you like MathForm? We could have UnaryMathForm if > you > prefer. ;) UnaryMathForm is the same. I don't really know what a math form is a priori. UnaryScoreFunction? A bit long, but quite explicit.

> I guess you're > arguing for a different base class for these functions, since the > operator() call signature would be different? (std::vector<Float> > rather > than Float, I guess.) I would say we have BinaryX, which takes two and perhaps NaryX which takes a vector, yes. Where X is ScoreFunction or MathForm.

>> Also, the names mean and stdev have to go, since they don't make >> sense >> with statistical potentials. > > That's certainly true for statistical potentials, but most of the > restraints that currently generate DistanceRestraints (e.g. > ExclusionVolume) explicitly build restraints for a minimum or mean > distance. So there'd need to at least be a ScoreFunc class that > supported mean/stdev, that we can derive from, yes? Well, the setting the mean at least is just translation which is an operation that can be applied to anything. For the current functions stddev just involves a scaling (and isn't actually stddev anyway).

Ben Webb

5:14 p.m.

Daniel Russel wrote: > DistanceRestraint now takes a pair of ParticleIndexes and a ScoreFunc > which it then owns.

On this subject, did you make any more progress with the reference counting code? It seems like this would be a obvious candidate for reference counting (e.g. an excluded volume may create thousands of distance restraints, all of which share a single scoring function by increasing its reference count). I seem to recall that you were stuck at the Python interface. If this is still the case, I can take a look, as I (probably) know a bit about the SWIG internals.

If others think reference counting of IMP objects is a bad idea, now's the time to say...

Ben

-- ben@salilab.org http://salilab.org/~ben/ "It is a capital mistake to theorize before one has data." - Sir Arthur Conan Doyle

Daniel Russel

5:25 p.m.

I gave up on reference counting as the existing swig support wasn't doing what we needed by itself. Getting things working right would not be too hard technically, but would take a bit of time and perhaps per- class entries in IMP.i.

Here is what I posted to the swig list. The only responses I got were "Yeah, that would be nice"

I would like to have reference counted objects that work across C++ and Python both for objects which are purely C++ objects and Python objects which inherit from C++ base classes. This seems relatively easily to implement by: - Purely C++ objects or python objects extending C++ objects which don't have virtual functions are just treated like they currently are-- their reference count is incremented if there is a python reference to them and decremented if there is no python reference. If the python proxy object goes away before the C++ object, it is fine. The C++ object can't go away without all python references being destroyed. - With Python objects which inherit from C++ objects with virtual functions, swig has to keep the python proxy object alive until the C++ object is destroyed. This can be done by internally keeping a reference to the python object until the C++ object is destroyed. Then the director can use the call to the virtual C++ destructor (which needs to be there on the C++ side anyway) to remove the reference to the python object so that it too goes away.

While it is straight forward, I don't see any way to easily turn on or provide this functionality for a large array of classes. I am quite new to SWIG so I might be missing something. Any suggestions or alternatives? Thanks.

On Dec 14, 2007, at 5:14 PM, Ben Webb wrote:

> Daniel Russel wrote: >> DistanceRestraint now takes a pair of ParticleIndexes and a ScoreFunc >> which it then owns. > > On this subject, did you make any more progress with the reference > counting code? It seems like this would be a obvious candidate for > reference counting (e.g. an excluded volume may create thousands of > distance restraints, all of which share a single scoring function by > increasing its reference count). I seem to recall that you were > stuck at > the Python interface. If this is still the case, I can take a look, > as I > (probably) know a bit about the SWIG internals. > > If others think reference counting of IMP objects is a bad idea, now's > the time to say... > > Ben > -- > ben@salilab.org http://salilab.org/~ben/ > "It is a capital mistake to theorize before one has data." > - Sir Arthur Conan Doyle > _______________________________________________ > IMP-dev mailing list > IMP-dev@salilab.org > https://salilab.org/mailman/listinfo/imp-dev

6319

Age (days ago)

6322

Last active (days ago)

List overview

Download

6 comments

2 participants

tags (0)

participants (2)

Ben Webb
Daniel Russel