>> At any rate, this PDB reader stuff needs to be discussed on imp-dev
>> before we proceed. For example, what's wrong with the BALL stuff you
>> were playing with before?
> BALL is dead. No activity on email list. No response to bugs. No move to
> actually document their newest version even though it was released a
> year ago. I don't think we want to tie ourselves to it. Sure we can take
> it to IMP dev. No one else seems to care much :-)
People certainly care (they keep coming to talk to me, anyway). But I guess they don't like writing emails.
If that's really the case for BALL, then we should probably explore other possibilities, as per Frido's email. I know that BALL's Python interface is rather lacking, certainly.
> I have looked around and asked around and couldn't find any decent PDB
> readers (in C or C++) which are not buried in some huge project.
>> Why can't we link against this PDB library, rather than
>> cut-and-pasting thousands of lines of code?
> The nice thing about it is that it is small and simple and mine so we
> can just ship it along with IMP and not worry about dependencies, name
> collisions etc. I don't want people to have to get another library from
> somewhere else, hence my desire to put a copy into imp svn. Soon enough
> the lib will make it to fedora extras (whenever the next CGAL release
> is) so we could potentially just use that.
If it's an external library, it should be a dependency, not part of IMP. Otherwise, regardless of whether you describe it as a "fork", it'll fork as versions of it elsewhere change. CGAL source control sounds like the best place for it if it's going to be part of CGAL. Embedded copies of other projects are a great way to ensure that bugs never get fixed (think of all the projects that bundle zlib).
>> and 3. from a brief reading, it looks like a not-very-good PDB library
>> anyway (hard-coded atom names - what's with that?)
> Well, it is either that or use strings which pushes the checks to
> runtime rather than compile time. Adding to an enum and recompiling is
> trivial (and adding a constant externally works just as well for must
> purposes). Checking everywhere than an object falls in a small set of
> allowed strings is hard (especially if you can't specify that set of
> strings anywhere). BALL has hardcoded atoms for that matter (just a lot
> more of them :-)
A PDB reader which needs to be recompiled for every new HETATM type is simply not going to work. See http://www.bmrb.wisc.edu/elec_dep/pdb_het_library/pdbhetn.htm for example. Hao's project absolutely requires HETATMs. And I don't share your concern about runtime checks, since PDB reading is not performance-critical.
Any PDB reader that we adopt needs to be extensible at runtime. Even Modeller can do that. PyMOL, for example, has a library of HETATM fragments (stored as Python pickles, I believe). The reader also needs to be extensible enough to read PDBML or possibly mmCIF.
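To make the runtime point concrete, here is a minimal sketch in Python of what I have in mind. Every name in it (ResidueTypeRegistry, parse_atom_line) is hypothetical and made up for illustration; it is not the API of IMP, BALL, CGAL, or any other existing library. It only assumes the standard fixed-column layout of ATOM/HETATM records.

```python
# Illustrative sketch only: ResidueTypeRegistry and parse_atom_line are
# hypothetical names, not part of any existing library.

class ResidueTypeRegistry:
    """A set of allowed residue names that can grow at runtime."""

    def __init__(self, names=()):
        self._names = set(n.strip().upper() for n in names)

    def register(self, name):
        # New HETATM types are added here; no recompilation involved.
        self._names.add(name.strip().upper())

    def __contains__(self, name):
        return name.strip().upper() in self._names


def parse_atom_line(line, registry):
    """Parse one ATOM/HETATM record using the fixed PDB columns.

    Returns None for any other record type, and raises ValueError if the
    residue name is not in the registry (this is the runtime check).
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    res_name = line[17:20].strip()      # columns 18-20
    if res_name not in registry:
        raise ValueError("unknown residue type: %r" % res_name)
    return {
        "record": record,
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": res_name,
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }
```

The allowed set lives in exactly one place, so the "is this one of the allowed strings?" check is a single lookup in the parser rather than something scattered through the code, and a user can call register("HEM") (or load a whole het dictionary) before reading a file.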
Everybody and his dog has written a PDB reader. Andrej wrote one. Maya wrote one. Javi wrote one. Keren wrote one. There's one in Biopython, one in BALL, one in PyMOL, one in Chimera, and one in Biskit, all free and widely available software. I can't believe we have to burden the world with another one.
Ben