Rather than answer Ben's questions one by one, I'll try to address things in bulk.
The route we chose for representing particles in IMP is very unstructured: you give it a string and get back a value. This makes things very flexible (for example, I can trivially implement the hierarchy on top of it) but makes it hard to maintain invariants. The best way to deal with that is to provide helper functions and beat users into using them. For example, an add_child helper function adds a child to a node and makes sure the parent_index is correct, and a compute_coordinates_from_center_of_mass_of_children function does exactly what its name says. Unfortunately, there is no way of making sure that it gets called every time the set of children changes (although add_child/remove_child could of course be made more clever, and we can use a State object to make sure it gets called after the optimizer updates coordinates).
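To make the "helper functions maintain the invariants" idea concrete, here is a minimal sketch. It assumes a toy stand-in for the string-keyed storage (a `std::map` per particle) and hypothetical attribute names `num_children`, `child_<i>` and `parent_index`; it is not IMP's actual Model_data API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the string-keyed design: each particle is just
// a table from attribute name to value.
struct Particle {
  std::map<std::string, int> ints;
  std::map<std::string, float> floats;
};

using ParticleIndex = int;

// Hypothetical add_child helper: appends `child` to `parent` and updates
// both sides of the link, so callers never touch the raw keys themselves.
void add_child(std::vector<Particle>& ps, ParticleIndex parent,
               ParticleIndex child) {
  assert(parent != child);
  // Precondition: the child must not already belong to another node.
  assert(ps[child].ints.count("parent_index") == 0);
  int n = ps[parent].ints["num_children"];  // operator[] yields 0 if absent
  ps[parent].ints["child_" + std::to_string(n)] = child;
  ps[parent].ints["num_children"] = n + 1;
  ps[child].ints["parent_index"] = parent;
  // Note: nothing here recomputes the parent's coordinates; that still has
  // to be triggered separately (e.g. by a State object).
}
```

The invariant only holds as long as everyone goes through `add_child`; nothing stops code from writing `child_3` directly, which is exactly the weakness described above.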
Another issue is that every lookup involves searching for a string in a table, which can be expensive. The cost of generating the strings themselves should be trivial, as they can easily be cached (I do so in my get_child helper function).
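One way such key caching could look is sketched below; `child_key` and this `get_child` are hypothetical illustrations, not the actual helper. The cache is a `std::deque` because `push_back` on a deque keeps references to existing elements valid, so a returned reference stays usable after later calls grow the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Returns the attribute name "child_<i>", formatting each name only once.
const std::string& child_key(std::size_t i) {
  static std::deque<std::string> cache;  // deque: references stay valid on growth
  while (cache.size() <= i) {
    cache.push_back("child_" + std::to_string(cache.size()));
  }
  return cache[i];
}

// Hypothetical minimal particle with string-keyed integer attributes.
struct Particle {
  std::map<std::string, int> ints;
};

// Hypothetical get_child helper: the key string is cached, but the table
// lookup (string comparisons inside the map) still happens on every call.
int get_child(const Particle& p, std::size_t i) {
  return p.ints.at(child_key(i));
}
```

Caching removes the formatting cost but not the lookup cost, which is the part that remains a concern.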
The alternative would have been to use an object hierarchy and have the objects manage everything internally. Then we could have all sorts of object types that let you get and set attributes directly (hiding the Model_data object and the indirection provided by things like IntIndex from users of the various Particle classes). For example, a GeometricParticle would have methods x() and y() that return floats for the coordinates, and a HierarchyParticle would have child(i) etc. The main disadvantage is that you have to cast all over the place (but now that C++ has RTTI this isn't too bad). The other disadvantage is that loading data from files is trickier, as the mapping between the text string in the file and the attribute no longer happens for free (you have to know that "X" corresponds to the function set_x()). We could provide macros to make this mapping easier, though.
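A rough sketch of what the class-based alternative might look like follows. The class names come from the paragraph above, but every member, the `x_or_zero` function and the `REGISTER_SETTER` macro are hypothetical illustrations of the two costs mentioned: run-time casts and an explicit string-to-setter mapping for file loading.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Common base so generic code can hold any particle.
class Particle {
 public:
  virtual ~Particle() = default;  // polymorphic, so dynamic_cast works
};

class GeometricParticle : public Particle {
 public:
  float x() const { return x_; }
  float y() const { return y_; }
  void set_x(float v) { x_ = v; }
  void set_y(float v) { y_ = v; }
 private:
  float x_ = 0, y_ = 0;
};

class HierarchyParticle : public Particle {
 public:
  Particle* child(std::size_t i) const { return children_.at(i); }
  std::size_t number_of_children() const { return children_.size(); }
  void add_child(Particle* c) { children_.push_back(c); }
 private:
  std::vector<Particle*> children_;  // non-owning
};

// Cost 1: generic code holds Particle* and must cast (checked at run time
// via RTTI) to reach type-specific accessors.
float x_or_zero(Particle* p) {
  if (auto* g = dynamic_cast<GeometricParticle*>(p)) return g->x();
  return 0.0f;
}

// Cost 2: "X" in a file no longer maps to an attribute for free. A
// (hypothetical) macro can generate the string-to-setter table.
#define REGISTER_SETTER(table, Class, name, setter) \
  table[name] = [](Class& obj, float v) { obj.setter(v); }

using SetterTable =
    std::map<std::string, std::function<void(GeometricParticle&, float)>>;

SetterTable make_geometric_setters() {
  SetterTable t;
  REGISTER_SETTER(t, GeometricParticle, "X", set_x);
  REGISTER_SETTER(t, GeometricParticle, "Y", set_y);
  return t;
}
```

A file loader would then look up each column name in the table returned by `make_geometric_setters()` and call the stored setter.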
Personally I think the class-based approach is better, but Brett liked databases and went with the string-keyed approach. The one thing I think we should not do is mix the two. Either everything is an object and you get things through C++ calls, or everything stays as it is currently and you manipulate things through helper functions. If we mix them, it is hard to keep track of what everything is and to make sure that things like saving and restoring state happen properly, on top of just being ugly.
On Nov 2, 2007, at 4:54 PM, Ben Webb wrote:
> Daniel Russel wrote:
>>> - Is Residue just an example of a member of a hierarchy, or would
>>> chains and proteins be treated differently?
>> A tree node is a tree node. It can happen to also have some
>> biological function, but that is orthogonal to being a hierarchy
>> node.
>
> I think you misunderstood my question.

Quite likely :-)
> The wiki page has a description of what attributes a Residue has,
> but nothing about chains or proteins, so I was just trying to
> ascertain whether you just put in Residue as an example (and just
> haven't done chains/proteins yet) or whether you think they should
> be treated specially. I think your answer means the former, yes?

Yes, the former. I just haven't had any reason to add more fields to chains or proteins beyond what they get from being in the hierarchy and being generic objects (i.e. they have a name, a type, children and parents).
> Well, sure, but let's say I have a rigid body containing 500 atoms.
> It has 7 attributes - the xyz of its center of mass, and an
> orientation quaternion. These would both have to be updated if
> particles were added to or removed from the rigid body. By making
> these 'dumb' attributes, the only way to do that is to do the
> update every time you want to use the rigid body, which seems
> inefficient to me. In contrast, a ParticleContainer object could
> have a method to add/remove particles, so that it could do the
> update when necessary.

Not quite answering your question: for updates to positions made by the optimizer, a State object would handle things quite nicely.
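A sketch of how a State hook could keep a rigid body's center current after each optimizer move is below. The `State` interface, `RigidBodyCenterState` and `optimizer_step` are hypothetical stand-ins, not IMP classes. The sketch assumes equal masses and leaves the orientation quaternion out.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vec3 {
  double x = 0, y = 0, z = 0;
};

// Hypothetical hook: the optimizer calls update() on every registered
// State after it has moved particles.
class State {
 public:
  virtual ~State() = default;
  virtual void update() = 0;
};

// Keeps a rigid body's stored center equal to the mean of its member
// positions (equal masses assumed; orientation omitted in this sketch).
class RigidBodyCenterState : public State {
 public:
  RigidBodyCenterState(const std::vector<Vec3>& members, Vec3& center)
      : members_(members), center_(center) {}
  void update() override {
    if (members_.empty()) return;
    Vec3 sum;
    for (const Vec3& m : members_) {
      sum.x += m.x;
      sum.y += m.y;
      sum.z += m.z;
    }
    double n = static_cast<double>(members_.size());
    center_.x = sum.x / n;
    center_.y = sum.y / n;
    center_.z = sum.z / n;
  }
 private:
  const std::vector<Vec3>& members_;
  Vec3& center_;
};

// Hypothetical optimizer step: move every particle along x, then let each
// State bring derived quantities up to date.
void optimizer_step(std::vector<Vec3>& positions,
                    const std::vector<std::unique_ptr<State>>& states,
                    double dx) {
  for (Vec3& p : positions) p.x += dx;
  for (const auto& s : states) s->update();
}
```

This covers moves made by the optimizer; changes to membership (adding or removing atoms) still need their own trigger.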
I see your point that we need somewhere to put the code that gets called when you add or remove a particle. Personally I would prefer a free-floating function that you call on a particle in the hierarchy (like my hierarchy helper functions for getting the ith child). Then you could easily provide your own function if you want to do something slightly different, or apply the "compute center of mass of all children" function to a body that doesn't happen to be rigid.
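A minimal sketch of the free-function style follows, using a hypothetical plain `Node` struct in place of the string-keyed particle so that only the shape of the idea shows. The function needs nothing but a node, so it applies equally to rigid and non-rigid bodies, and a caller can substitute their own variant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal hierarchy node (in the string-keyed design the
// children and coordinates would come from attribute lookups instead).
struct Node {
  double x = 0, y = 0, z = 0;
  double mass = 1;
  std::vector<Node*> children;  // non-owning
};

// Free function: sets `n`'s coordinates to the mass-weighted center of
// its children. Leaves `n` unchanged if the total child mass is zero.
void set_coordinates_to_children_center_of_mass(Node& n) {
  double total = 0, cx = 0, cy = 0, cz = 0;
  for (const Node* c : n.children) {
    total += c->mass;
    cx += c->mass * c->x;
    cy += c->mass * c->y;
    cz += c->mass * c->z;
  }
  if (total == 0) return;
  n.x = cx / total;
  n.y = cy / total;
  n.z = cz / total;
}

// Hypothetical wrapper: adding a child also refreshes the derived position.
void add_child_and_update(Node& parent, Node& child) {
  parent.children.push_back(&child);
  set_coordinates_to_children_center_of_mass(parent);
}
```

For example, a child of mass 1 at x = 0 and a child of mass 2 at x = 3 put the parent at x = (0 + 6) / 3 = 2.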
>
>>> - If I wanted to pull out every atom in residue 1, I'd really
>>> have to scan through every single particle to figure out which
>>> ones a) have a residue attribute and b) have it = 1 ? That seems
>>> inefficient.
>> You would find the particle for residue 1 and get "child_0",
>> "child_1"...
>> I don't think you should ever have to scan through all particles
>> (and, personally, I don't think you should be able to as it would
>> encourage bad habits).
>
> Ah, I see - it wasn't clear to me from the wiki page. Then my
> concerns here are 1) you have the information in two locations, so
> you will need to do consistency checks to make sure that the child/
> parent pointers all point to the right thing;

Yes, that is true.
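Such a consistency check could be as simple as the sketch below. It assumes the same hypothetical attribute names as elsewhere (`num_children`, `child_<i>`, `parent_index`) and a toy map-based particle, not the real storage.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal particle with string-keyed integer attributes.
struct Particle {
  std::map<std::string, int> ints;
};

// Checks that every "child_<i>" of every particle exists, refers to a
// valid particle, and that the child's "parent_index" points back.
// Returns false on the first broken link.
bool hierarchy_is_consistent(const std::vector<Particle>& ps) {
  for (std::size_t p = 0; p < ps.size(); ++p) {
    auto it = ps[p].ints.find("num_children");
    int n = (it == ps[p].ints.end()) ? 0 : it->second;
    for (int i = 0; i < n; ++i) {
      auto c = ps[p].ints.find("child_" + std::to_string(i));
      if (c == ps[p].ints.end()) return false;
      int child = c->second;
      if (child < 0 || static_cast<std::size_t>(child) >= ps.size()) {
        return false;
      }
      auto back = ps[child].ints.find("parent_index");
      if (back == ps[child].ints.end() ||
          back->second != static_cast<int>(p)) {
        return false;
      }
    }
  }
  return true;
}
```

It only checks the parent-to-child direction; a child that claims a parent which does not list it would need a second pass.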
> 2) that seems grossly inefficient - imagine a container with 10000
> atoms, doing the string concatenation and formatting to get child_0
> through child_9999, then the hashtable lookup, as opposed to just
> iterating through a std::vector<int>.

Well, you wouldn't actually do the string concatenation, since the strings can be trivially cached (in fact I currently do so in my helper function). You would still have to do the table lookup, though. This is a general issue with our architecture, which may prove to be a problem in the long run. Even if we special-case the children in the hierarchy, you still have the same problem whenever you want to do anything other than look at the children/parents of a hierarchy node (such as reading the coordinates).