I have various helper functions that are probably generally useful. I am ambivalent about what to do with them.
One set is utilities for getting and setting attributes which look like /** Set the value of the string attribute name to val, whether or not it was already there. */ bool set_string(ModelData *m, Particle *p, const String& name, std::string val) and /** Get the string attribute name or return default_value if the particle does not have any such attribute. */ const String& get_string(ModelData *m, Particle *p, const String& name, const String& default_value=String()) Obviously there are int and float versions.
Should such things go in to imp proper? On one hand they are extremely useful. On the other it is increasing the size of the interface and while it makes it more convenient, it doesn't add any power.
Next I have some helpers for manipulating hierarchy and bond particles (getting the number of children, getting specific children etc). These should go somewhere to establish norms for hierarchies.
Daniel Russel wrote: > I have various helper functions that are probably generally useful. I am > ambivalent about what to do with them. ... > bool set_string(... > const String& get_string(... ... > Should such things go in to imp proper? On one hand they are extremely > useful. On the other it is increasing the size of the interface and > while it makes it more convenient, it doesn't add any power.
My vote is no, for the same reasons. Plus, it seems like a conceptual division between add/remove of the attributes themselves and get/set of the values is a good thing to have, and these helper functions would erode that. Can't you structure your code so that you don't create the necessary attributes at the first step, and so don't need to do this query every time?
> Next I have some helpers for manipulating hierarchy and bond particles > (getting the number of children, getting specific children etc). These > should go somewhere to establish norms for hierarchies.
I don't think we've agreed on how hierarchy should be handled yet. Could you put an outline of your suggestion into the wiki? Then Keren and I can tweak it and add our own ideas. I certainly have reservations right now with how I imagine you think hierarchy should be handled (guessing from looking at your attribute names wiki page).
Ben
On Oct 25, 2007, at 11:38 PM, Ben Webb wrote:
> Daniel Russel wrote: >> I have various helper functions that are probably generally >> useful. I am ambivalent about what to do with them. > ... >> bool set_string(... >> const String& get_string(... > ... >> Should such things go in to imp proper? On one hand they are >> extremely useful. On the other it is increasing the size of the >> interface and while it makes it more convenient, it doesn't add >> any power. > > My vote is no, for the same reasons. Plus, it seems like a > conceptual division between add/remove of the attributes themselves > and get/set of the values is a good thing to have, and these helper > functions would erode that. Can't you structure your code so that > you don't create the necessary attributes at the first step, and so > don't need to do this query every time? In certain cases it is much easier to not worry about whether it has an attribute yet and not practical to add every attribute that something might have and which you would need to use later. For example, if you want to add a child to a node, it saves a few lines not not have to check if the child count is there already or not.
> > I don't think we've agreed on how hierarchy should be handled yet. > Could you put an outline of your suggestion into the wiki? Then > Keren and I can tweak it and add our own ideas. I certainly have > reservations right now with how I imagine you think hierarchy > should be handled (guessing from looking at your attribute names > wiki page). I think the wiki page with the attributes says it all--it is just a general tree. You have parent and child "pointers", a counter for how many children and a parent index and a string naming what sort of thing you are. I am pretty sure we shouldn't have anything less general (unless we want to fix that a residue always has a chain as a parent) and unless you want to support DAGs, I am not sure what a more general thing to support is.
Daniel Russel wrote: > For example, if you > want to add a child to a node, it saves a few lines not not have to > check if the child count is there already or not.
Granted, it would be useful there.
> I think the wiki page with the attributes says it all--it is just a > general tree. You have parent and child "pointers", a counter for how > many children and a parent index and a string naming what sort of thing > you are. I am pretty sure we shouldn't have anything less general > (unless we want to fix that a residue always has a chain as a parent) > and unless you want to support DAGs, I am not sure what a more general > thing to support is.
Well, I was thinking from the other direction - a container class which can contain particles and/or other containers. So for example, a rigid body would be a simple container of particles, a residue would be similar, while a chain would be a container of residues. (Whether you have an actual Residue class derived from Container, and whether it's done at the C++ or Python level, is debatable.) The two approaches are, of course, essentially equivalent.
If the wiki page is going to be the only information I can work on, here are my concerns with it thus far: ;) - It seems odd to me to treat a bond as a 'particle'. Would you do angles and dihedrals in the same way? - Is Residue just an example of a member of a hierarchy, or would chains and proteins be treated differently? - How would you deal with references to particle IDs to turned-off particles? "PI" is just an Int variable, right? - If I wanted to pull out every atom in residue 1, I'd really have to scan through every single particle to figure out which ones a) have a residue attribute and b) have it = 1 ? That seems inefficient.
Ben
>> I think the wiki page with the attributes says it all--it is just >> a general tree. You have parent and child "pointers", a counter >> for how many children and a parent index and a string naming what >> sort of thing you are. I am pretty sure we shouldn't have anything >> less general (unless we want to fix that a residue always has a >> chain as a parent) and unless you want to support DAGs, I am not >> sure what a more general thing to support is. > > Well, I was thinking from the other direction - a container class > which can contain particles and/or other containers. So for > example, a rigid body would be a simple container of particles, a > residue would be similar, while a chain would be a container of > residues. (Whether you have an actual Residue class derived from > Container, and whether it's done at the C++ or Python level, is > debatable.) The two approaches are, of course, essentially equivalent. But containers need to support attributes. So they are then simply a type of particle which also has a list of other particles in it (and a separate list of other containers, since we need containers to be nested). I really don't see the need over just storing the things in attributes.
> > If the wiki page is going to be the only information I can work on, > here are my concerns with it thus far: ;) > - It seems odd to me to treat a bond as a 'particle'. Would you do > angles and dihedrals in the same way? Sure. We need a place to put things. We provide a class of objects (namely Particles) which have centralized memory management which allows for fields to be optimized and (eventually) for the whole Model to be stored and reloaded and such. So we should use these objects. Adding another type of object just complicates things without adding any benefit.
> - Is Residue just an example of a member of a hierarchy, or would > chains and proteins be treated differently? A tree node is a tree node. It can happen to also have some biological function, but that is orthogonal to being a hierarchy node.
> - How would you deal with references to particle IDs to turned-off > particles? "PI" is just an Int variable, right? Check if the particle is turned on or off? Seems the same as with any other system.
> - If I wanted to pull out every atom in residue 1, I'd really have > to scan through every single particle to figure out which ones a) > have a residue attribute and b) have it = 1 ? That seems inefficient. You would find the particle for residue 1 and get "child_0", "child_1"... I don't think you should ever have to scan through all particles (and, personally, I don't think you should be able to as it would encourage bad habits).
Daniel Russel wrote: >> - Is Residue just an example of a member of a hierarchy, or would >> chains and proteins be treated differently? > A tree node is a tree node. It can happen to also have some biological > function, but that is orthogonal to being a hierarchy node.
I think you misunderstood my question. The wiki page has a description of what attributes a Residue has, but nothing about chains or proteins, so I was just trying to ascertain whether you just put in Residue as an example (and just haven't done chains/proteins yet) or whether you think they should be treated specially. I think your answer means the former, yes?
>> - How would you deal with references to particle IDs to turned-off >> particles? "PI" is just an Int variable, right? > Check if the particle is turned on or off? Seems the same as with any > other system.
Well, sure, but let's say I have a rigid body containing 500 atoms. It has 7 attributes - the xyz of its center of mass, and an orientation quaternion. These would both have to be updated if particles were added to or removed from the rigid body. By making these 'dumb' attributes, the only way to do that is to do the update every time you want to use the rigid body, which seems inefficient to me. In contrast, a ParticleContainer object could have a method to add/remove particles, so that it could do the update when necessary.
>> - If I wanted to pull out every atom in residue 1, I'd really have to >> scan through every single particle to figure out which ones a) have a >> residue attribute and b) have it = 1 ? That seems inefficient. > You would find the particle for residue 1 and get "child_0", "child_1"... > I don't think you should ever have to scan through all particles (and, > personally, I don't think you should be able to as it would encourage > bad habits).
Ah, I see - it wasn't clear to me from the wiki page. Then my concerns here are 1) you have the information in two locations, so you will need to do consistency checks to make sure that the child/parent pointers all point to the right thing; 2) that seems grossly inefficient - imagine a container with 10000 atoms, doing the string concatenation and formatting to get child_0 through child_9999, then the hashtable lookup, as opposed to just iterating through a std::vector<int>.
Ben
Rather than answer Ben's questions one by one, I'll try to address things in bulk.
The route we chose to go down for representing particles in IMP is very non-structured. You just give it a string and get back a value. This makes things very flexible (for example I can trivially implement the hierarchy on top of it) but makes it hard to maintain invariants. The best way to do this is to provide helper functions and beat users into using them. For example an add_child helper function adds a child to a node and makes sure the parent_index is correct. A compute_coordinates_from_center_of_mass_of_children function does exactly that. Unfortunately, there is no way of making sure that it gets called every time the the set of children change (although add_child/remove_child could of course be made more clever and we can use a State object to make sure it gets called after coordinates are updated by the optimizer).
Another issues is that all lookups involve searching for a string in a table. This can be expensive. The cost of generating the string should be trivial as they can easily be cached (I do so in my get_child helper function).
The alternative would have been to use an object hierarchy and have the objects manage everything internally. Then we can have all sorts of types of objects which allow you to get and set attributes directly (hiding the Model_data object and the indirection provided by the IntIndex sort of things from users of the various Particle classes). Then we would have a GeometricParticle which has methods x () and y() which return floats for the coordinates and a HierarchyParticle() which has child(i) etc. The main disadvantage is that you have to cast all over the place (but now that C++ has RTTI this isn't too bad). The other disadvantage is that loading data from files is more tricky as the mapping between the text string in the file and the attribute no longer happens for free (you have to know "X" corresponds to the function set_x()). We can provide macros to make this mapping easier though.
Personally I think the class based approach is better, but Brett liked databases and went with the former. The one thing I think we should not do is mix the two. Either everything is an object and you get things through C++ calls or everything is as it is currently and you manipulate things through helper functions. If we mix, it is hard to keep track of what everything is and make sure that things like saving and restoring state happen properly as well as just being ugly.
On Nov 2, 2007, at 4:54 PM, Ben Webb wrote:
> Daniel Russel wrote: >>> - Is Residue just an example of a member of a hierarchy, or would >>> chains and proteins be treated differently? >> A tree node is a tree node. It can happen to also have some >> biological function, but that is orthogonal to being a hierarchy >> node. > > I think you misunderstood my question. Quite likely :-)
> The wiki page has a description of what attributes a Residue has, > but nothing about chains or proteins, so I was just trying to > ascertain whether you just put in Residue as an example (and just > haven't done chains/proteins yet) or whether you think they should > be treated specially. I think your answer means the former, yes? Yes, the former. I just haven't had any reason to add more fields to chains or proteins other than what they have from being in the hierarchy and being a generic object (i.e. they have a name, a type and children and parents).
> Well, sure, but let's say I have a rigid body containing 500 atoms. > It has 7 attributes - the xyz of its center of mass, and an > orientation quaternion. These would both have to be updated if > particles were added to or removed from the rigid body. By making > these 'dumb' attributes, the only way to do that is to do the > update every time you want to use the rigid body, which seems > inefficient to me. In contrast, a ParticleContainer object could > have a method to add/remove particles, so that it could do the > update when necessary. To not answer your question, for updates to locations caused by the optimizer, a State object would handle things quite nicely.
I see your point that we need somewhere to put the functionality to call it when you add or remove a point. Personally I would prefer a free floating function that you call passing a particle in the hierarchy (like my hierarchy helper functions for getting the ith child). Then you could easily provide your own function if you want to do something slightly different or could apply the "compute center of mass of all children" function to a body which didn't happen to be rigid.
> >>> - If I wanted to pull out every atom in residue 1, I'd really >>> have to scan through every single particle to figure out which >>> ones a) have a residue attribute and b) have it = 1 ? That seems >>> inefficient. >> You would find the particle for residue 1 and get "child_0", >> "child_1"... >> I don't think you should ever have to scan through all particles >> (and, personally, I don't think you should be able to as it would >> encourage bad habits). > > Ah, I see - it wasn't clear to me from the wiki page. Then my > concerns here are 1) you have the information in two locations, so > you will need to do consistency checks to make sure that the child/ > parent pointers all point to the right thing; Yes, that is true.
> 2) that seems grossly inefficient - imagine a container with 10000 > atoms, doing the string concatenation and formatting to get child_0 > through child_9999, then the hashtable lookup, as opposed to just > iterating through a std::vector<int>. Well, you wouldn't actually do the string concatenation since that can be trivially cached (in fact I currently do it in my helper function). You would have to do the table lookup though. This is a general problem with our architecture which may prove to be a problem in the long run. Even if we special case the children in the hierachy, you still have the same problem when you want do to anything other than look at the children/parents of a hierarchy node (such as the coordinates).
Daniel Russel wrote: (excellent summary snipped) > Personally I think the class based approach is better, but Brett liked > databases and went with the former. The one thing I think we should not > do is mix the two. Either everything is an object and you get things > through C++ calls or everything is as it is currently and you manipulate > things through helper functions. If we mix, it is hard to keep track of > what everything is and make sure that things like saving and restoring > state happen properly as well as just being ugly.
I agree 100%, and I guess I wasn't being very clear in my previous emails, because this is exactly my position. We can do attributes, or we can do objects, and shouldn't mix them. If it becomes clear that we have to switch from attributes to objects, I'd much rather we do that now, because it'll be really hard to do it later.
Ben
There are a couple questions which need to be resolved before redo particles.
1) Multiple inheritance It would be nice to be able to arbitrarily mix and match Particle types-e.g. if we have a GeometricParticle and a HierarchyParticle to be able to produce a GeometricHierarchyParticle by inheriting from both. This causes several complications: Either: - We have a Particle base class which contains a ModelData pointer (and a bool for active, although I am still not entirely sure what this does). Then we have to use virtual inheritance (which people are probably not that familiar with) and (unless we disallow multiple inheritance of anything derived from Particle) careful bookkeeping on IO (I think C++ gets the constructors and destructors right for us)
- Use templates. When you want to declare a new particle type you declare it as template <class ParentParticle=Particle> struct NamedParticle: public ParentParticle { NamedParticle(ModelData* md): Parent(md){ IMP_ALLOCATE_STRING_ATTRIBUTE(name); // something like string_name_index_= ParentParticle::get_model_data()->add_string("") } IMP_STRING_ATTRIBUTE(name); // declares void set_name(String), String get_name() and StringIndex string_name_index_ virtual void show(std::ostream &out) const { IMP_SHOW_STRING(name); // out << "name: " << ParentParticle::get_model_data()- >get_string(string_name_index_) << std::endl; }
}; Then if you want a composite type you declare typedef GeometricParticle<NamedParticle<> > GeometricNamedParticle;
Not sure what we would do with this in Python. Perhaps just export the most derived particle types (and Particle) as those are all that can be used anyway. Python wouldn't then have to deal with the inheritance hierachy. This approach has all the nice properties we want, but requires that people use templates. Since particles are likely to be fairly simple, using templates might not be too bad.
- or we could disallow multiple inheritance all together. Doing so certainly simplifies things, but makes we worry that things would get too rigid or duplicated too much.
2) IO Currently particles can be trivially read from an XML file since there is only one object type and an easy mapping between attribute names and function calls. Things get more complicated on reading with Particles as objects since we have to figure out how to create the correct type of Particle. There are ways of doing this which can probably be done more or less invisibly within C++ (but I don't want to write test code for them). But we may want to see if there is some library which makes things cleaner. Perhaps doing everything through python pickling? We want to be able to support both python and C++ declared particles, I think.
On Nov 2, 2007, at 6:12 PM, Ben Webb wrote:
> Daniel Russel wrote: > (excellent summary snipped) >> Personally I think the class based approach is better, but Brett >> liked databases and went with the former. The one thing I think we >> should not do is mix the two. Either everything is an object and >> you get things through C++ calls or everything is as it is >> currently and you manipulate things through helper functions. If >> we mix, it is hard to keep track of what everything is and make >> sure that things like saving and restoring state happen properly >> as well as just being ugly. > > I agree 100%, and I guess I wasn't being very clear in my previous > emails, because this is exactly my position. We can do attributes, > or we can do objects, and shouldn't mix them. If it becomes clear > that we have to switch from attributes to objects, I'd much rather > we do that now, because it'll be really hard to do it later. > > Ben > -- > ben@salilab.org http://salilab.org/~ben/ > "It is a capital mistake to theorize before one has data." > - Sir Arthur Conan Doyle
I lied. The second proposal doesn't give us everything we want as there is no well-defined GeometricParticle to cast to. So the first or no multiple inheritance are the only two options. And I would favor keeping it simple for now as much as I want to be able to mix things together.
On Nov 2, 2007, at 10:56 PM, Daniel Russel wrote:
> There are a couple questions which need to be resolved before redo > particles. > > 1) Multiple inheritance > It would be nice to be able to arbitrarily mix and match Particle > types-e.g. if we have a GeometricParticle and a HierarchyParticle > to be able to produce a GeometricHierarchyParticle by inheriting > from both. This causes several complications: > Either: > - We have a Particle base class which contains a ModelData pointer > (and a bool for active, although I am still not entirely sure what > this does). Then we have to use virtual inheritance (which people > are probably not that familiar with) and (unless we disallow > multiple inheritance of anything derived from Particle) careful > bookkeeping on IO (I think C++ gets the constructors and > destructors right for us) > > - Use templates. When you want to declare a new particle type you > declare it as > template <class ParentParticle=Particle> > struct NamedParticle: public ParentParticle { > NamedParticle(ModelData* md): Parent(md){ > IMP_ALLOCATE_STRING_ATTRIBUTE(name); > // something like string_name_index_= > ParentParticle::get_model_data()->add_string("") > } > IMP_STRING_ATTRIBUTE(name); > // declares void set_name(String), String get_name() and > StringIndex string_name_index_ > virtual void show(std::ostream &out) const { > IMP_SHOW_STRING(name); > // out << "name: " << ParentParticle::get_model_data()- > >get_string(string_name_index_) << std::endl; > } > > }; > Then if you want a composite type you declare > typedef GeometricParticle<NamedParticle<> > GeometricNamedParticle; > > Not sure what we would do with this in Python. Perhaps just export > the most derived particle types (and Particle) as those are all > that can be used anyway. Python wouldn't then have to deal with the > inheritance hierachy. > This approach has all the nice properties we want, but requires > that people use templates. Since particles are likely to be fairly > simple, using templates might not be too bad. > > - or we could disallow multiple inheritance all together. Doing so > certainly simplifies things, but makes we worry that things would > get too rigid or duplicated too much. > > > 2) IO > Currently particles can be trivially read from an XML file since > there is only one object type and an easy mapping between attribute > names and function calls. Things get more complicated on reading > with Particles as objects since we have to figure out how to create > the correct type of Particle. There are ways of doing this which > can probably be done more or less invisibly within C++ (but I don't > want to write test code for them). But we may want to see if there > is some library which makes things cleaner. Perhaps doing > everything through python pickling? We want to be able to support > both python and C++ declared particles, I think. > > > > > > On Nov 2, 2007, at 6:12 PM, Ben Webb wrote: > >> Daniel Russel wrote: >> (excellent summary snipped) >>> Personally I think the class based approach is better, but Brett >>> liked databases and went with the former. The one thing I think >>> we should not do is mix the two. Either everything is an object >>> and you get things through C++ calls or everything is as it is >>> currently and you manipulate things through helper functions. If >>> we mix, it is hard to keep track of what everything is and make >>> sure that things like saving and restoring state happen properly >>> as well as just being ugly. >> >> I agree 100%, and I guess I wasn't being very clear in my previous >> emails, because this is exactly my position. We can do attributes, >> or we can do objects, and shouldn't mix them. If it becomes clear >> that we have to switch from attributes to objects, I'd much rather >> we do that now, because it'll be really hard to do it later. >> >> Ben >> -- >> ben@salilab.org http://salilab.org/~ben/ >> "It is a capital mistake to theorize before one has data." >> - Sir Arthur Conan Doyle
> Well, I was thinking from the other direction - a container class > which can contain particles and/or other containers. So for > example, a rigid body would be a simple container of particles, a > residue would be similar, while a chain would be a container of > residues. (Whether you have an actual Residue class derived from > Container, and whether it's done at the C++ or Python level, is > debatable.) The two approaches are, of course, essentially equivalent. To cover a loose end in my last email, I am not sure that providing a subclass of Particle called Container which provides more direct access to children and parents would be such a good idea. Why should we just do it for Containers? Why not geometric particles? Spherical particles? If we head that route we should probably just chuck the get_string, get_int sort of access entirely and just have a real class hierarchy.
Daniel Russel wrote: > To cover a loose end in my last email, I am not sure that providing a > subclass of Particle called Container which provides more direct access > to children and parents would be such a good idea. Why should we just do > it for Containers? Why not geometric particles? Spherical particles? If > we head that route we should probably just chuck the get_string, get_int > sort of access entirely and just have a real class hierarchy.
That is certainly a possibility. I'm trying to see how we'd use the existing attributes to encode hierarchy without having to add a lot of logic to them, and I agree with you that if we need to do that, we'll probably either be going down the inner platform road or will need to use real classes.
Ben
participants (3)
-
Ben Webb
-
Daniel Russel
-
Daniel Russel