Re: [IMP-dev] [Fwd: PDB lib]

14 Nov 2007


>> At any rate, this PDB reader stuff needs to be discussed on imp-dev
>> before we proceed. For example, what's wrong with the BALL stuff you 
>> were playing with before? 
> BALL is dead. No activity on email list. No response to bugs. No move to
> actually document their newest version even though it was released a
> year ago. I don't think we want to tie ourselves to it. Sure we can take
> it to IMP dev. No one else seems to care much :-)
People certainly care (they keep coming to talk to me, anyway). But I 
guess they don't like writing emails.
If that's really the case for BALL, then we should probably explore 
other possibilities, as per Frido's email. I know that BALL's Python 
interface is rather lacking, certainly.
> I have looked around and asked around and couldn't find any decent PDB
> readers (in C or C++) which are not buried in some huge project.
>> Why can't we link against this PDB library, rather than 
>> cut-and-pasting thousands of lines of code?
> The nice thing about it is that it is small and simple and mine so we
> can just ship it along with IMP and not worry about dependencies, name
> collisions etc. I don't want people to have to get another library from
> somewhere else, hence my desire to put a copy into imp svn. Soon enough
> the lib will make it to fedora extras (whenever the next CGAL release
> is) so we could potentially just use that.
If it's an external library, it should be a dependency, not part of IMP. 
Otherwise, regardless of whether you describe it as a "fork", it'll fork 
as versions of it elsewhere change. CGAL source control sounds like the 
best place for it if it's going to be part of CGAL. Embedded copies of 
other projects are a great way to ensure that bugs never get fixed 
(think of all the projects that bundle zlib).
>> and 3. from a brief reading, it looks like a not-very-good PDB library 
>> anyway (hard-coded atom names - what's with that?)
> Well, it is either that or use strings which pushes the checks to
> runtime rather than compile time. Adding to an enum and recompiling is
> trivial (and adding a constant externally works just as well for most
> purposes). Checking everywhere that an object falls in a small set of
> allowed strings is hard (especially if you can't specify that set of
> strings anywhere). BALL has hardcoded atoms for that matter (just a lot
> more of them :-)
A PDB reader which needs to be recompiled for every new HETATM type is 
simply not going to work. See 
http://www.bmrb.wisc.edu/elec_dep/pdb_het_library/pdbhetn.htm for 
example. Hao's project absolutely requires HETATMs, for example. And I 
don't share your concern for runtime checks, since PDB reading is not 
performance-critical.
Any PDB reader that we adopt needs to be extendable at runtime. Even 
Modeller can do that. PyMol, for example, has a library of HETATM 
fragments (stored as Python pickles, I believe). It also needs to be 
extensible to be able to read PDBML or possibly mmCIF.
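
To make "extendable at runtime" concrete, here is a minimal sketch in Python of a residue-name table that can grow while the program runs, so a new HETATM type needs no recompile. The names ResidueRegistry and resname_of are hypothetical and are not taken from IMP, Modeller, PyMol, or any other library mentioned here; the only outside fact assumed is that the PDB format puts the residue name in columns 18-20 of ATOM and HETATM records.

```python
# Sketch only: ResidueRegistry and resname_of are hypothetical names,
# not part of IMP, Modeller, PyMol, or any existing PDB library.

class ResidueRegistry:
    """Set of known residue names that can grow while the program runs."""

    def __init__(self, names=()):
        self._names = {n.strip().upper() for n in names}

    def register(self, name):
        """Add a residue name (e.g. a new HETATM type) at runtime."""
        self._names.add(name.strip().upper())

    def __contains__(self, name):
        return name.strip().upper() in self._names


def resname_of(line):
    """Return the residue name of an ATOM/HETATM record, else None.

    The PDB format stores the residue name in columns 18-20 (1-based).
    """
    if line.startswith(("ATOM  ", "HETATM")):
        return line[17:20].strip()
    return None


known = ResidueRegistry(["ALA", "GLY"])   # built-in defaults
line = "HETATM 1357 FE   HEM A 154"       # truncated record, for illustration
name = resname_of(line)
if name not in known:
    known.register(name)                  # extended at runtime, no rebuild
```

The membership check still happens in one place (the registry), which addresses the worry about scattering string comparisons through the code, at the cost of moving the check from compile time to runtime.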
Everybody and his dog has written a PDB reader. Andrej wrote one. Maya 
wrote one. Javi wrote one. Keren wrote one. There's one in biopython, 
one in BALL, one in PyMol, one in Chimera, and one in Biskit, all free 
and widely available software. I can't believe we have to burden the 
world with another one.
Ben
-- 
ben@salilab.org                      http://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
    - Sir Arthur Conan Doyle

Ben Webb