I'm not entirely sure I agree (witness the last developers' meeting :-), but I am ambivalent. It could be structured better, though, which would help things converge faster.
That said, we have several outstanding questions:
- do we have PROTEINs which can contain CHAINs, or just PROTEINs (which are chains)? For this we should probably just find something authoritative and use it. I don't much care either way.
- what are the most useful things for one or more read_pdb functions to return? For this we should come up with standard use cases. I would propose a few here:
  - someone is running through lots of PDB files and wants to load one protein from each file. To do this, it would be nice to have a function which loads a protein from the PDB and returns a hierarchy containing only that protein. Whether this protein has one or more chains depends on the answer to the first question.
  - load the whole structure from a PDB, complete with many proteins, ligands and other molecules. For this it is useful to be able to read everything from one PDB model record.
  - take one piece of the PDB (such as a chain or ligand) and use it. For this it is nice not to have to dissect a hierarchy.
  - load a bunch of model records from a single PDB and deal with all the molecules in each record.
Any other cases? Think about it and we will discuss it on Tuesday.
A proposal to think about, which handles the above cases, is:
- one function which reads a protein from a PDB
- one function which reads everything from one model record in a PDB and returns it in a list/vector
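
To make the two proposed functions a bit more concrete, here is a rough Python sketch. It is only an illustration of the shape of the interface, not existing code: the names read_protein, read_model, split_models and Molecule are all made up for this example, and it assumes the "a protein is a chain" answer to the first question. It looks only at MODEL, ENDMDL, ATOM and HETATM records, using the fixed column positions of the PDB format.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical flat container; a real hierarchy would carry more."""
    kind: str        # "chain" (from ATOM records) or "hetero" (from HETATM)
    chain_id: str
    name: str        # chain ID for chains, residue name + number for hetero
    atoms: list = field(default_factory=list)  # (atom_name, x, y, z) tuples


def split_models(lines):
    """Return one list of ATOM/HETATM lines per model record."""
    models, current = [], []
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:  # file without MODEL/ENDMDL records: treat as one model
        models.append(current)
    return models


def read_model(lines, model=0):
    """Read everything from one model record and return it as a list."""
    groups = {}  # insertion-ordered, so molecules keep file order
    for line in split_models(lines)[model]:
        chain_id = line[21]
        if line.startswith("ATOM"):
            key = ("chain", chain_id, chain_id)
        else:
            # Assumption: each hetero residue is its own molecule.
            key = ("hetero", chain_id,
                   line[17:20].strip() + line[22:26].strip())
        mol = groups.setdefault(key, Molecule(*key))
        mol.atoms.append((line[12:16].strip(),
                          float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54])))
    return list(groups.values())


def read_protein(lines, chain_id=None, model=0):
    """Return a single protein chain: the first one, or the one requested."""
    for mol in read_model(lines, model):
        if mol.kind == "chain" and chain_id in (None, mol.chain_id):
            return mol
    raise ValueError("no matching protein chain found")
```

With this shape, the first use case is read_protein(open(path)) in a loop over files, the second is read_model(...), the third is read_protein(..., chain_id="B") or picking one entry out of the list, and the fourth is a loop over model indices. If we decide that a PROTEIN contains CHAINs instead, read_protein would return a container of chains rather than a single Molecule.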