Re: Helper functions

2 Nov 2007


      Rather than answer Ben's questions one by one, I'll try to address  
things in bulk.
The route we chose to go down for representing particles in IMP is  
very non-structured. You just give it a string and get back a value.  
This makes things very flexible (for example I can trivially  
implement the hierarchy on top of it) but makes it hard to maintain  
invariants. The best way to maintain them is to provide helper functions  
and beat users into using them. For example an add_child helper  
function adds a child to a node and makes sure the parent_index is  
correct. A compute_coordinates_from_center_of_mass_of_children  
function does exactly that. Unfortunately, there is no way of making  
sure that it gets called every time the set of children changes  
(although add_child/remove_child could of course be made more clever  
and we can use a State object to make sure it gets called after  
coordinates are updated by the optimizer).
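
As a rough sketch of what such a helper could look like: the Particle and Model types and the "num_children" key below are made up for illustration and are not the actual IMP classes. Only "child_<i>" and "parent_index" are names from this discussion.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a particle: a flat table of integer
// attributes looked up by string name.
struct Particle {
    std::map<std::string, int> int_attrs;
};

// Hypothetical model: a particle is referred to by its index here.
using Model = std::vector<Particle>;

// Append `child` to `parent` and keep both sides of the link consistent:
// "child_<i>" and "num_children" on the parent, "parent_index" on the child.
void add_child(Model& m, int parent, int child) {
    assert(parent != child);
    // operator[] yields 0 the first time "num_children" is read.
    int n = m[parent].int_attrs["num_children"];
    m[parent].int_attrs["child_" + std::to_string(n)] = child;
    m[parent].int_attrs["num_children"] = n + 1;
    m[child].int_attrs["parent_index"] = parent;
}
```

The invariant only holds as long as nobody writes "child_<i>" or "parent_index" directly, which is why users have to be pushed towards the helper.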
Another issue is that all lookups involve searching for a string in  
a table. This can be expensive. The cost of generating the string  
should be trivial as they can easily be cached (I do so in my  
get_child helper function).
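
To illustrate the caching, here is a minimal sketch. The child_key name is hypothetical and this is not the code from my get_child helper, just the same idea: each "child_<i>" string is built once and reused afterwards.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Return the attribute name for the i-th child ("child_0", "child_1", ...).
// Each name is formatted the first time it is needed and then reused.
// A deque is used because push_back never invalidates references to
// existing elements, so returned references stay valid as the cache grows.
// Not thread safe as written.
const std::string& child_key(std::size_t i) {
    static std::deque<std::string> cache;
    while (cache.size() <= i) {
        cache.push_back("child_" + std::to_string(cache.size()));
    }
    return cache[i];
}
```

This removes the concatenation and formatting cost, but the table lookup with the resulting string still has to happen on every access.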
The alternative would have been to use an object hierarchy and have  
the objects manage everything internally. Then we can have all sorts  
of types of objects which allow you to get and set attributes  
directly (hiding the Model_data object and the indirection provided  
by the IntIndex sort of things from users of the various Particle  
classes). Then we would have a GeometricParticle which has methods  
x() and y() which return floats for the coordinates and a  
HierarchyParticle which has child(i) etc. The main disadvantage is  
that you have to cast all over the place (but now that C++ has RTTI  
this isn't too bad). The other disadvantage is that loading data from  
files is more tricky as the mapping between the text string in the  
file and the attribute no longer happens for free (you have to know  
"X" corresponds to the function set_x()). We can provide macros to  
make this mapping easier though.
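
A minimal sketch of what I mean by the macro mapping, assuming a GeometricParticle with set_x()/set_y() setters. The SKETCH_ATTRIBUTE macro, the setters() table and set_from_file_key() are hypothetical names for illustration, not existing IMP code.

```cpp
#include <map>
#include <string>

// Hypothetical class in the style described above.
class GeometricParticle {
public:
    float x() const { return x_; }
    float y() const { return y_; }
    void set_x(float v) { x_ = v; }
    void set_y(float v) { y_ = v; }
private:
    float x_ = 0.0f;
    float y_ = 0.0f;
};

using Setter = void (GeometricParticle::*)(float);

// Hypothetical macro: one table entry tying a file key to set_<name>().
#define SKETCH_ATTRIBUTE(key, name) { key, &GeometricParticle::set_##name }

// The mapping from text keys in a file to setters, written out once.
const std::map<std::string, Setter>& setters() {
    static const std::map<std::string, Setter> table = {
        SKETCH_ATTRIBUTE("X", x),
        SKETCH_ATTRIBUTE("Y", y),
    };
    return table;
}

// Apply a (key, value) pair read from a file; false if the key is unknown.
bool set_from_file_key(GeometricParticle& p, const std::string& key, float value) {
    auto it = setters().find(key);
    if (it == setters().end()) return false;
    (p.*(it->second))(value);
    return true;
}
```

The string lookup then only happens when loading a file, and everything else goes through ordinary member calls.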
Personally I think the class based approach is better, but Brett  
liked databases and went with the former. The one thing I think we  
should not do is mix the two. Either everything is an object and you  
get things through C++ calls or everything is as it is currently and  
you manipulate things through helper functions. If we mix, it is hard  
to keep track of what everything is and to make sure that things like  
saving and restoring state happen properly, and it is also just ugly.
On Nov 2, 2007, at 4:54 PM, Ben Webb wrote:
> Daniel Russel wrote:
>>> - Is Residue just an example of a member of a hierarchy, or would  
>>> chains and proteins be treated differently?
>> A tree node is a tree node. It can happen to also have some  
>> biological function, but that is orthogonal to being a hierarchy  
>> node.
>
> I think you misunderstood my question.
Quite likely :-)
> The wiki page has a description of what attributes a Residue has,  
> but nothing about chains or proteins, so I was just trying to  
> ascertain whether you just put in Residue as an example (and just  
> haven't done chains/proteins yet) or whether you think they should  
> be treated specially. I think your answer means the former, yes?
Yes, the former. I just haven't had any reason to add more fields to  
chains or proteins other than what they have from being in the  
hierarchy and being a generic object (i.e. they have a name, a type  
and children and parents).
> Well, sure, but let's say I have a rigid body containing 500 atoms.  
> It has 7 attributes - the xyz of its center of mass, and an  
> orientation quaternion. These would both have to be updated if  
> particles were added to or removed from the rigid body. By making  
> these 'dumb' attributes, the only way to do that is to do the  
> update every time you want to use the rigid body, which seems  
> inefficient to me. In contrast, a ParticleContainer object could  
> have a method to add/remove particles, so that it could do the  
> update when necessary.
To not answer your question, for updates to locations caused by the  
optimizer, a State object would handle things quite nicely.
I see your point that we need somewhere to put the functionality to  
call it when you add or remove a point. Personally I would prefer a  
free floating function that you call passing a particle in the  
hierarchy (like my hierarchy helper functions for getting the ith  
child). Then you could easily provide your own function if you want  
to do something slightly different or could apply the "compute center  
of mass of all children" function to a body which didn't happen to be  
rigid.
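
Roughly what I have in mind, as a sketch only. The Node struct and the function name are invented for illustration, and a real version would read coordinates, mass and children through the particle attributes rather than struct fields.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node: coordinates, a mass, and the indices of its children.
struct Node {
    double x = 0, y = 0, z = 0;
    double mass = 0;
    std::vector<std::size_t> children;
};

// Free function: move node `i` to the mass-weighted center of its children
// and set its mass to their total. Returns false (leaving the node
// untouched) if the children have no mass.
bool set_to_center_of_mass_of_children(std::vector<Node>& nodes, std::size_t i) {
    double total = 0, sx = 0, sy = 0, sz = 0;
    for (std::size_t c : nodes[i].children) {
        const Node& child = nodes[c];
        total += child.mass;
        sx += child.mass * child.x;
        sy += child.mass * child.y;
        sz += child.mass * child.z;
    }
    if (total <= 0) return false;
    nodes[i].x = sx / total;
    nodes[i].y = sy / total;
    nodes[i].z = sz / total;
    nodes[i].mass = total;
    return true;
}
```

Nothing here knows whether the node is a rigid body, so the same function can be called from an add/remove helper, from a State object after an optimizer step, or replaced by a user-supplied variant.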
>
>>> - If I wanted to pull out every atom in residue 1, I'd really  
>>> have to scan through every single particle to figure out which  
>>> ones a) have a residue attribute and b) have it = 1 ? That seems  
>>> inefficient.
>> You would find the particle for residue 1 and get "child_0",  
>> "child_1"...
>> I don't think you should ever have to scan through all particles  
>> (and, personally, I don't think you should be able to as it would  
>> encourage bad habits).
>
> Ah, I see - it wasn't clear to me from the wiki page. Then my  
> concerns here are 1) you have the information in two locations, so  
> you will need to do consistency checks to make sure that the  
> child/parent pointers all point to the right thing;
Yes, that is true.
> 2) that seems grossly inefficient - imagine a container with 10000  
> atoms, doing the string concatenation and formatting to get child_0  
> through child_9999, then the hashtable lookup, as opposed to just  
> iterating through a std::vector<int>.
Well, you wouldn't actually do the string concatenation since that  
can be trivially cached (in fact I currently do it in my helper  
function). You would have to do the table lookup though. This is a  
general problem with our architecture which may prove to be a problem  
in the long run. Even if we special case the children in the  
hierarchy, you still have the same problem when you want to do  
anything other than look at the children/parents of a hierarchy node  
(such as the coordinates).