Forwarding to the list:
-------- Original Message --------
Subject: Re: [IMP-dev] [Fwd: PDB lib]
Date: Fri, 16 Nov 2007 12:18:04 -0800
From: Eswar Narayanan eswar@salilab.org
To: Ben Webb ben@salilab.org
CC: imp-dev@salilab.org
References: 473A42F7.4060005@salilab.org 473BAF19.30207@salilab.org B3063E98-E0A6-4032-A70F-CF72419F8441@salilab.org 473DE358.4000203@salilab.org
Somebody called me an expert, so I'm compelled to reply! I must confess though that I haven't been paying attention to the discussions before this and couldn't locate the origin of this PDB thread. So I'm probably missing some critical information here - for instance, I have no clue why so much is being discussed about reading and interpreting a PDB file. Absolutely nothing in the PDB file is believable (I can give you any number of examples to prove this). So by definition, if someone claims they have the ultimate reader, I think that person should seriously consider taking astrology as a vocation.
But my two cents: I think the PDB parser of Modeller is already quite good and could be made better with just a few tweaks. More than anything else, I think we completely understand where the problems are. So even if it comes to recoding it from scratch (if licensing is an issue), in the long run, it is probably much easier and faster than tinkering with someone else's code. Ben and I have already had long email conversations on some of the limitations of the Modeller implementation and at least in principle know how to fix it. And even if Ben pokes fun at it, it has withstood the test of time for close to two decades.
Is there a reason you don't want to use the PDB reader from Modeller?
E.
On Nov 16, 2007, at 10:37 AM, Ben Webb wrote:
> Daniel Russel wrote:
>> I think Ben and I have a disagreement about the significance of picking
>> a PDB reader at this point. As I see it, any PDB reading should be
>> hidden behind some very general interface (I have a proposal using the
>> MolecularHierarchyDecorator).
>
> Agreed. We can certainly use whichever PDB reader you're happy with in
> the short term. But I have a lot of experience with reading dodgy PDB
> files (although Eashwar is probably the lab expert in this regard) so
> know there's a lot of corner cases to worry about, and I don't relish
> fixing all of these again with a new PDB reader. And that's before we
> start worrying about heterogens.
>
>> I picked my pdb reader since it is small and so can be stuck in the with
>> rest of imp so that no one has to worry about installing external
>> libraries and it does what I want, namely give me a hierarchy for
>> proteins and a bond information.
>
> We're talking about two different things here. You want to distribute
> your PDB reader with IMP. I don't want to include the code in IMP SVN.
> The two issues are orthogonal; if you really want to bundle the code, we
> can do it at makedist time (for tarball releases) or we can use an SVN
> externals definition if you want it for SVN users. For the latter, just
> point me to the URL of your SVN repository (presumably this is a path
> within the CGAL SVN, or if you prefer I can make a repository for your
> PDB reader at svn.salilab.org).
>
>> the one Frido sent- I don't see how to get bonds out of it, but
>> otherwise fine. The documentation really sucks, so I might be missing
>> something.
>
> How are you "getting bonds" out of a PDB file? PDB files don't provide
> that information. (The most you can get is the CONECT and SSBOND
> records.) For that you generally need a description of the topology,
> which is generally covered by the topology file portion of most MM
> forcefields. I really hope this stuff isn't hard-coded, because that
> would really have trouble with patches and other residue modifications
> (think covalently-bonded ligands, acetylated termini, disulfides, MSE's,
> cyclic proteins, nucleic acids, etc.).
>
>>> Hao's project absolutely requires HETATMs, for example. And I
>>> don't share your concern for runtime checks, since PDB reading is not
>>> performance-critical.
>> My concern on checks was not for efficiency, it was for correctness.
>> Depending on strings is poor as capitalization or abbreviation errors
>> don't easily get caught.
>
> I didn't mean you wouldn't have actual atom type objects, just that they
> shouldn't be hardcoded. For example, Modeller reads a set of residue
> types from its parameter files at runtime, and after that maps every
> residue type in the PDB file from the string to an integer residue type.
> Unknown residue types result in a warning, and the generation of a new
> integer residue type at runtime. You could of course use Residue objects
> rather than integer types.
>
> Ben
> --
> ben@salilab.org http://salilab.org/~ben/
> "It is a capital mistake to theorize before one has data."
> - Sir Arthur Conan Doyle
> _______________________________________________
> IMP-dev mailing list
> IMP-dev@salilab.org
> https://salilab.org/mailman/listinfo/imp-dev
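The runtime mapping described in the quoted message (known residue types loaded up front, each name mapped to an integer, unknown names triggering a warning and a fresh integer) might look roughly like the sketch below. This is only an illustration of the idea, not Modeller's code: the class name `ResidueTypeRegistry`, its methods, and the upper-casing of names are all assumptions made for the example, and the known names would in practice come from a parameter file rather than a literal list.

```python
import warnings


class ResidueTypeRegistry:
    """Hypothetical helper: map residue-name strings to integer type ids."""

    def __init__(self, known_names):
        self._ids = {}
        for name in known_names:
            self._register(name)

    @staticmethod
    def _normalize(name):
        # Assumption: names are compared case-insensitively, ignoring padding.
        return name.strip().upper()

    def _register(self, name):
        key = self._normalize(name)
        # setdefault keeps the existing id if the name was already registered.
        return self._ids.setdefault(key, len(self._ids))

    def lookup(self, name):
        key = self._normalize(name)
        if key not in self._ids:
            # Unknown types are not an error: warn and allocate a new id.
            warnings.warn(f"unknown residue type {key!r}; assigning a new id")
            return self._register(key)
        return self._ids[key]


# In real use the known names would be read from a parameter file at runtime.
registry = ResidueTypeRegistry(["ALA", "GLY"])
ala_id = registry.lookup("ala")
```

Because the table is built at runtime, a modified residue such as MSE needs no code change: the first lookup warns and assigns an id, and later lookups reuse it.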
I was assuming the modeller reader was out for IP reasons, but if we can use it, it would be great if someone could write a bridge for it soon.
On Nov 16, 2007, at 12:57 PM, Ben Webb wrote:
> Forwarding to the list:
> [...]
Daniel Russel wrote:
> I was assuming the modeller reader was out for IP reasons
On the contrary, we should be using whatever we can from Modeller. If Modeller is in future determined to be deficient in some way, we can dump that functionality and write a new module in IMP, but if Modeller's up to the job, we should use it by all means. We're not setting out to rewrite everything from scratch - that would be silly.
> it would be great if someone could write a bridge for it soon.
    import modeller.scripts
    e = modeller.environ()
    e.libs.topology.read('${LIB}/top_heav.lib')
    e.libs.parameters.read('${LIB}/par.lib')
    m = modeller.scripts.complete_pdb(e, "my.pdb")
after which all the coordinates are available in m.atoms[], all the bonds in m.bonds[], angles in m.angles[], etc.
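As a rough idea of what the requested bridge could start from, here is a minimal sketch that flattens a model's atoms into plain tuples which IMP-side code could consume. The function name `atom_records` is hypothetical, and the sketch assumes each atom exposes `name`, `x`, `y`, `z` and a `residue` with a `name`. A stand-in object is used in place of a real Modeller model so the example runs without Modeller installed.

```python
from types import SimpleNamespace


def atom_records(model):
    """Return (atom name, residue name, x, y, z) tuples from a model-like object."""
    return [(a.name, a.residue.name, a.x, a.y, a.z) for a in model.atoms]


# Stand-in for a Modeller model (assumption: same attribute names as above).
ala = SimpleNamespace(name="ALA")
fake_model = SimpleNamespace(atoms=[
    SimpleNamespace(name="N", residue=ala, x=0.0, y=0.0, z=0.0),
    SimpleNamespace(name="CA", residue=ala, x=1.458, y=0.0, z=0.0),
])
records = atom_records(fake_model)
```

With Modeller available, the same function would presumably be called on the `m` returned by `complete_pdb`. Bonds and angles would need a similar accessor, which is not sketched here.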
Ben