Daniel Russel wrote:
> I think Ben and I have a disagreement about the significance of picking
> a PDB reader at this point. As I see it, any PDB reading should be
> hidden behind some very general interface (I have a proposal using the
> MolecularHierarchyDecorator).

Agreed. We can certainly use whichever PDB reader you're happy with in the short term. But I have a lot of experience with reading dodgy PDB files (although Eashwar is probably the lab expert in this regard), so I know there are a lot of corner cases to worry about, and I don't relish fixing all of these again with a new PDB reader. And that's before we start worrying about heterogens.

> I picked my pdb reader since it is small and so can be stuck in the with
> rest of imp so that no one has to worry about installing external
> libraries and it does what I want, namely give me a hierarchy for
> proteins and a bond information.

We're talking about two different things here. You want to distribute your PDB reader with IMP. I don't want to include the code in IMP SVN. The two issues are orthogonal; if you really want to bundle the code, we can do it at makedist time (for tarball releases) or we can use an SVN externals definition if you want it for SVN users. For the latter, just point me to the URL of your SVN repository (presumably this is a path within the CGAL SVN, or if you prefer I can make a repository for your PDB reader at svn.salilab.org).

> the one Frido sent- I don't see how to get bonds out of it, but
> otherwise fine. The documentation really sucks, so I might be missing
> something.

How are you "getting bonds" out of a PDB file? PDB files don't provide that information. (The most you can get is the CONECT and SSBOND records.) For that you generally need a description of the topology, which is usually covered by the topology file portion of most MM forcefields. I really hope this stuff isn't hard-coded, because hard-coded connectivity would have real trouble with patches and other residue modifications (think covalently bonded ligands, acetylated termini, disulfides, MSEs, cyclic proteins, nucleic acids, etc.).
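
To make the point concrete, here is a rough Python sketch of reading the explicit connectivity that a PDB file does carry, namely the CONECT records. The function name conect_bonds is made up for illustration and is not part of IMP, Modeller, or either reader under discussion; SSBOND handling is left out. It assumes the fixed-column layout from the PDB format (atom serial in columns 7-11, up to four bonded serials in columns 12-31).

```python
def conect_bonds(pdb_lines):
    """Return the set of bonds listed in CONECT records.

    Each bond is a tuple (low_serial, high_serial), so a bond that is
    listed from both of its atoms is only stored once.
    """
    bonds = set()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        atom = int(line[6:11])              # columns 7-11: this atom
        for start in (11, 16, 21, 26):      # columns 12-31: bonded atoms
            field = line[start:start + 5].strip()
            if field:
                partner = int(field)
                bonds.add((min(atom, partner), max(atom, partner)))
    return bonds
```

CONECT records are normally only written for bonds that are not implied by the standard residues, so for an ordinary protein chain this returns little or nothing. Everything else has to come from a topology description.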

>> Hao's project absolutely requires HETATMs, for example. And I
>> don't share your concern for runtime checks, since PDB reading is not
>> performance-critical.
> My concern on checks was not for efficiency, it was for correctness.
> Depending on strings is poor as capitalization or abbreviation errors
> don't easily get caught.

I didn't mean you wouldn't have actual atom type objects, just that they shouldn't be hardcoded. For example, Modeller reads a set of residue types from its parameter files at runtime, and after that maps every residue type in the PDB file from the string to an integer residue type. Unknown residue types result in a warning, and the generation of a new integer residue type at runtime. You could of course use Residue objects rather than integer types.
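
A minimal sketch of that scheme in Python, to show the idea rather than Modeller's actual code: the class name ResidueTypes and its lookup method are hypothetical, and the list of known names stands in for whatever the parameter files would provide at runtime.

```python
import warnings


class ResidueTypes:
    """Map residue type names to integer types, loaded at runtime."""

    def __init__(self, known_names):
        # known_names would come from a parameter file, not from code
        self._index = {}
        for name in known_names:
            self._index.setdefault(name.strip(), len(self._index))

    def lookup(self, name):
        """Return the integer type for name, creating one if unknown."""
        key = name.strip()
        if key not in self._index:
            warnings.warn("unknown residue type %r; assigning a new type"
                          % key)
            self._index[key] = len(self._index)
        return self._index[key]
```

After the string is mapped once at read time, the rest of the code only ever sees the integer (or an object wrapping it), so a misspelled name shows up as a warning when the file is read rather than as a silent mismatch later.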
Ben