These methods are not a very good solution, as they will be true in 99% of cases, or even 100% (why should I call get_is_molecule() if I know it is going to be true all the time?). It is far more difficult, but far more useful, to agree on what we expect to get after reading a PDB, and on how we are going to organize our structures in IMP into hierarchies. Together with Keren, I can give suggestions for low-resolution structures.

Javi


2009/10/9 Daniel Russel <drussel@gmail.com>
As a follow-up to my email, Javi had raised issues concerning the choice of names of HierarchyType. It is problematic to have such a type label (a protein is a molecule, but it is complicated to express such a relationship with a single label). I propose removing the Hierarchy::get_type() method and the HierarchyType type and replacing them with methods like
- get_is_protein() (true for any piece of protein)
- get_is_molecule() (true for any molecule)
- get_is_residue()
- get_is_atom()
- get_is_assembly() (not sure if we want this)
etc...
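
As a rough illustration of the idea (not IMP code), here is a minimal Python sketch of predicates replacing a single type label. The Node class, its "kinds" field and the label strings are all hypothetical; only the get_is_*() names come from the list above. The sketch covers just the "a protein is a molecule" relationship, and does not attempt the "true for any piece of protein" behaviour.

```python
class Node:
    """Hypothetical hierarchy node; this is NOT the IMP Hierarchy class."""

    def __init__(self, name, kinds, children=()):
        self.name = name
        self.kinds = frozenset(kinds)  # a node may carry several labels
        self.children = list(children)

    def get_is_atom(self):
        return "atom" in self.kinds

    def get_is_residue(self):
        return "residue" in self.kinds

    def get_is_protein(self):
        return "protein" in self.kinds

    def get_is_molecule(self):
        # Every protein is also a molecule, which a single enum value
        # cannot express but two predicates can.
        return "molecule" in self.kinds or self.get_is_protein()


ca = Node("CA", {"atom"})
gly = Node("GLY1", {"residue"}, [ca])
chain_a = Node("A", {"protein"}, [gly])
water = Node("HOH", {"molecule"})

print(chain_a.get_is_protein(), chain_a.get_is_molecule())
print(water.get_is_protein(), water.get_is_molecule())
```

With a single get_type() the protein node would have to pick one of PROTEIN or MOLECULE; with predicates it can answer yes to both.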

Something else to think about for Tuesday.

I'm not entirely sure I agree (witness the last developers meeting :-), but I am ambivalent. The discussion could be structured better, though, which would help things converge faster.

That said, we have several outstanding questions:
- do we have PROTEINs which can contain CHAINs, or just PROTEINs (which are chains)? For this we should probably just find something authoritative and use it. I don't much care either way.

- what are the most useful things for one or more read_pdb functions to return? For this we should come up with standard usage cases. I would propose a couple here:
  - someone is running through lots of PDB files and wants to load one protein from each file. To do this, it would be nice to have a function which loads a protein from the PDB and returns a hierarchy containing only that protein. Whether this protein has one or more chains depends on the answer to the first question.
  - load the whole structure from a PDB, complete with many proteins, ligands and other molecules. For this it is useful to be able to read everything from one PDB model record.
  - take one piece of the PDB and use it (such as a chain or ligand). For this it is nice not to have to dissect a hierarchy.
  - load a bunch of model records from a single PDB and deal with all the molecules in each record.

Any other cases? Think about it and we will discuss it on Tuesday.

A proposal to think about, which handles the above cases, is:
- one function which reads a protein from a pdb
- one function which reads everything from one model record in a pdb and returns it in a list/vector
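
To make the two proposed functions concrete, here is a minimal Python sketch working on plain PDB text rather than on IMP hierarchies. The names read_models(), read_protein() and read_model(), the dictionary-based atom records and the helper _line() are all hypothetical. It also assumes that every ATOM record belongs to "the protein" and that HETATM records can be grouped by chain, which is a simplification.

```python
def read_models(lines):
    """Return one list of atom records per MODEL/ENDMDL block."""
    models, current = [], []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            current.append({
                "hetero": record == "HETATM",
                "atom": line[12:16].strip(),     # PDB columns 13-16
                "residue": line[17:20].strip(),  # PDB columns 18-20
                "chain": line[21],               # PDB column 22
            })
        elif record == "ENDMDL":
            models.append(current)
            current = []
    if current:  # file without MODEL/ENDMDL records
        models.append(current)
    return models


def read_protein(lines, model=0):
    """Only the ATOM records of one model (assumed to be one protein)."""
    return [a for a in read_models(lines)[model] if not a["hetero"]]


def read_model(lines, model=0):
    """Everything in one model, as a list of (kind, chain, atoms) groups."""
    groups = {}
    for a in read_models(lines)[model]:
        key = ("ligand" if a["hetero"] else "protein", a["chain"])
        groups.setdefault(key, []).append(a)
    return [(kind, chain, atoms) for (kind, chain), atoms in groups.items()]


def _line(record, serial, atom, residue, chain, resseq):
    # Build a column-aligned PDB line (first 26 columns only) for the example.
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}".format(
        record, serial, atom, residue, chain, resseq)


SAMPLE = [
    "MODEL        1",
    _line("ATOM", 1, "N", "GLY", "A", 1),
    _line("ATOM", 2, "CA", "GLY", "A", 1),
    _line("HETATM", 3, "O", "HOH", "B", 2),
    "ENDMDL",
    "MODEL        2",
    _line("ATOM", 1, "N", "GLY", "A", 1),
    "ENDMDL",
]

print(len(read_protein(SAMPLE)))
print([(kind, chain, len(atoms)) for kind, chain, atoms in read_model(SAMPLE)])
```

In this sketch the first use case maps to read_protein(), the second and third to read_model() (pick the group you want from the returned list), and the fourth to looping over the model index.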
