Loading multiple molecules from a PDB file
We need to clean up what happens when multiple molecules are loaded from a PDB. My current favorite is to:
- make the current read_pdb only support loading a single protein (chain) from the first model of a PDB file and return it as a Hierarchy with the root being a PROTEIN. It can spit out warnings if there are other molecules in there.
- add a read_model_from_pdb which loads all molecules from a given model in the PDB (defaulting to the first) and which returns an array of Hierarchy decorators, each of which has a root of either a PROTEIN or a MOLECULE (or some other name to specify a molecule that is not a protein or nucleic acid).
Thoughts?
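Daniel's two-function proposal above can be sketched in plain Python, assuming the atoms have already been parsed into simple records. `AtomRecord`, `split_molecules`, `read_model_sketch` and `read_pdb_sketch` are hypothetical stand-ins, not IMP's API, and "molecule" is approximated as one PROTEIN per ATOM chain plus one MOLECULE per HETATM residue.

```python
import warnings
from collections import namedtuple

# Hypothetical pre-parsed atom record; a stand-in, not an IMP type.
AtomRecord = namedtuple("AtomRecord", "model record chain res_name res_seq name")


def split_molecules(atoms, model):
    """Group one model's atoms into molecules, keeping first-seen order.

    ATOM records are grouped by chain ID (one PROTEIN per chain); each
    HETATM residue becomes its own MOLECULE.
    """
    molecules = {}
    for atom in atoms:
        if atom.model != model:
            continue
        if atom.record == "ATOM":
            key = ("PROTEIN", atom.chain)
        else:
            key = ("MOLECULE", atom.chain, atom.res_name, atom.res_seq)
        molecules.setdefault(key, []).append(atom)
    return molecules


def read_model_sketch(atoms, model=None):
    """Like the proposed read_model_from_pdb: every molecule of one model.

    Defaults to the lowest-numbered model; returns (kind, atoms) pairs.
    """
    if model is None:
        model = min(atom.model for atom in atoms)
    return [(key[0], group) for key, group in split_molecules(atoms, model).items()]


def read_pdb_sketch(atoms):
    """Like the proposed read_pdb: only the first protein chain of the first model."""
    molecules = read_model_sketch(atoms)
    proteins = [group for kind, group in molecules if kind == "PROTEIN"]
    if not proteins:
        raise ValueError("no protein chain in the first model")
    if len(molecules) > 1:
        # The proposal says read_pdb "can spit out warnings" about the rest.
        warnings.warn(f"ignoring {len(molecules) - 1} other molecule(s) in the first model")
    return "PROTEIN", proteins[0]
```

The split keeps the single-protein case trivial for callers, while read_model_sketch exposes everything in the model for those who need it.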
A PDB file might contain the chains of an assembly; not all files with multiple chains are NMR structures. I think read_pdb is fine as is, but if we decide to follow your suggestion, I vote for an additional function: read_pdb_assembly.
On Oct 8, 2009, at 6:55 PM, Daniel Russel wrote:
> We need to clean up what happens when multiple molecules are loaded from a PDB. [...]
> _______________________________________________
> IMP-dev mailing list
> IMP-dev@salilab.org
> https://salilab.org/mailman/listinfo/imp-dev
On Oct 8, 2009, at 7:02 PM, Keren Lasker wrote:
> a pdb might contain chains of an assembly, not all multiple chains are NMR structures.
> I think read_pdb is fine as is, but in case it would be decided to follow your suggestion, I vote for an additional function: read_pdb_assembly.

Currently it makes a PROTEIN consisting of anything it finds in the PDB (including random other molecules), which then throws an exception when it discovers that this is not a valid hierarchy :-)
So something needs to be fixed.
A fixed version of the current approach (returning a UNIVERSE analogue containing the chains and other molecules) is annoying whenever you want to handle the different molecules separately, since you have to remove each of them from the current hierarchy and then do something with them (rather than just sticking them into the new place). We could provide helper functions to merge UNIVERSEs (or whatever we end up calling them, since Javi doesn't like the name). You also have to make sure you get rid of all the water and other stuff in there yourself.
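A minimal sketch of the helpers described above (detaching one molecule, stripping water, merging two UNIVERSE-like roots), on a hypothetical `Node` class rather than a real IMP Hierarchy. The function names and the set of water residue names (HOH, WAT, DOD) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchy node, a stand-in for an IMP Hierarchy."""
    kind: str  # e.g. "UNIVERSE", "PROTEIN", "MOLECULE"
    name: str
    children: list = field(default_factory=list)


# Residue names treated as water; an assumption, extend as needed.
WATER_NAMES = {"HOH", "WAT", "DOD"}


def remove_waters(universe):
    """Drop water molecules from a UNIVERSE-like root; return how many were removed."""
    kept = [c for c in universe.children if c.name not in WATER_NAMES]
    removed = len(universe.children) - len(kept)
    universe.children = kept
    return removed


def take_out(universe, name):
    """Detach the first child called `name` so it can be placed elsewhere."""
    for i, child in enumerate(universe.children):
        if child.name == name:
            return universe.children.pop(i)
    raise KeyError(name)


def merge_universes(target, source):
    """Move every molecule from `source` into `target`, leaving `source` empty."""
    target.children.extend(source.children)
    source.children = []
    return target
```

Every one of these steps is bookkeeping the caller has to remember, which is the annoyance the paragraph above points out.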
So maybe a better solution is:
- read_pdb, which returns a UNIVERSE consisting of everything found in the first model in the PDB and makes people sort out the proteins, ligands and waters. Making this switch broke a lot of example code, so it might not be a trivial change.
- read_protein_from_pdb, which reads the first chain and returns a PROTEIN, for people who just want a protein and don't want to worry about the junk.

If we feel like it, we could then provide:
- read_molecules_from_pdb, which returns a Hierarchies with one Hierarchy per molecule. This can be implemented later (but is easy).
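The layering in this proposal can be sketched as two thin wrappers over a read_pdb that returns everything. The `_sketch` functions and `Node` class are hypothetical stand-ins, and the `found` list of (kind, name) pairs replaces real PDB parsing to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchy node; kind is "UNIVERSE", "PROTEIN" or "MOLECULE"."""
    kind: str
    name: str
    children: list = field(default_factory=list)


def read_universe_sketch(found):
    """Stand-in for the proposed read_pdb: one root holding everything in the first model.

    `found` is a list of (kind, name) pairs in file order, standing in for real parsing.
    """
    return Node("UNIVERSE", "model 1", [Node(kind, name) for kind, name in found])


def read_protein_sketch(found):
    """Stand-in for read_protein_from_pdb: the first PROTEIN; ligands and waters are dropped."""
    for child in read_universe_sketch(found).children:
        if child.kind == "PROTEIN":
            return child
    raise ValueError("no protein in the first model")


def read_molecules_sketch(found):
    """Stand-in for read_molecules_from_pdb: one independent root per molecule."""
    universe = read_universe_sketch(found)
    molecules, universe.children = universe.children, []
    return molecules
```

Because both convenience functions only post-process the UNIVERSE, they can be added later without touching the parser.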
On 10/08/2009 06:55 PM, Daniel Russel wrote:
> We need to clean up what happens when multiple molecules are loaded from a PDB.
Do you mean multiple chains in one model (potentially, but not necessarily, separated by TER records), multiple models separated by ENDMDL records, or both?
Ben
On Oct 8, 2009, at 7:03 PM, Ben Webb wrote:
> On 10/08/2009 06:55 PM, Daniel Russel wrote:
>> We need to clean up what happens when multiple molecules are loaded from a PDB.
>
> Do you mean multiple chains in one model (potentially, but not necessarily, separated by TER records), multiple models separated by ENDMDL records, or both?

Within one PDB MODEL record. Not just multiple chains of protein but heterogens too.
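The scope settled here (one MODEL, polymer chains plus heterogens) can be sketched with standard-library Python, assuming standard fixed-column ATOM/HETATM lines. The function names are hypothetical, not IMP's. Chains are split by the chain ID column rather than by TER records, since TER is not guaranteed to separate them. One caveat: modified residues such as selenomethionine (MSE) are often written as HETATM inside a chain, and this sketch would treat them as separate molecules.

```python
# Residue names treated as water; an assumption, extend as needed.
WATERS = {"HOH", "WAT", "DOD"}


def read_model_records(pdb_text, model_number=1):
    """Collect the ATOM/HETATM records of one model.

    Records between MODEL n and ENDMDL belong to model n; a file with no
    MODEL records is treated as a single model 1.
    """
    current = 1
    records = []
    for line in pdb_text.splitlines():
        tag = line[0:6].strip()
        if tag == "MODEL":
            current = int(line[6:].split()[0])  # model serial number
        elif tag == "ENDMDL":
            current = None
        elif tag in ("ATOM", "HETATM") and current == model_number:
            records.append({
                "record": tag,
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain": line[21],                # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
            })
    return records


def group_molecules(records):
    """Split one model into molecules without relying on TER records.

    ATOM records are grouped by chain ID; each non-water HETATM residue
    becomes its own molecule. Order follows first appearance in the file.
    """
    molecules = {}
    for rec in records:
        if rec["record"] == "ATOM":
            key = ("chain", rec["chain"])
        elif rec["res_name"] in WATERS:
            continue  # drop waters
        else:
            key = ("het", rec["chain"], rec["res_name"], rec["res_seq"])
        molecules.setdefault(key, []).append(rec)
    return molecules
```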
That is what it currently reads.
On Thu, Oct 8, 2009 at 8:11 PM, Daniel Russel drussel@gmail.com wrote:
> Within one PDB MODEL record. Not just multiple chains of protein but heterogens too.
Just to clarify:
1st cent)
Why limit ourselves by saying that we read models, produce models, or do whatever with models? It is self-defeating. We read, produce or work with STRUCTURES, and that is what a PDB file contains: a chemical structure. That is the term I am proposing instead of UNIVERSE (which is confusing).
2nd cent)
Daniel's alternative of accessing Hierarchies[1].Residues[1] is clearly less understandable than my proposal, Structure.Proteins[1].Residues[1].Atoms[1].
My proposal, by the way, is Dina's 4 levels with more appropriate chemical terms, imho. Substitute Nucleic acids (or whatever is appropriate) for Proteins, of course.
Javi
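Javi's four-level naming can be illustrated with plain Python dataclasses. The class and field names (`Structure`, `proteins`, `nucleic_acids`, `ligands`, and so on) are hypothetical illustrations of the proposal, not an existing IMP API. Python lists are 0-based, so if `[1]` in his example means "the first", the equivalent access is `structure.proteins[0].residues[0].atoms[0]`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str
    element: str


@dataclass
class Residue:
    name: str
    number: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Protein:
    chain: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Structure:
    """Top level: everything read from one PDB model (proposed replacement name for UNIVERSE)."""
    proteins: List[Protein] = field(default_factory=list)
    nucleic_acids: list = field(default_factory=list)  # same shape as proteins, omitted here
    ligands: List[Residue] = field(default_factory=list)


# Structure -> Protein -> Residue -> Atom: the four levels in the proposal.
structure = Structure(proteins=[
    Protein("A", [Residue("ALA", 1, [Atom("N", "N"), Atom("CA", "C")])]),
])
first_atom = structure.proteins[0].residues[0].atoms[0]
```

Each level has a chemically meaningful name, which is the readability argument Javi makes against the generic Hierarchies form.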
2009/10/8 Dina Schneidman duhovka@gmail.com
> that is what it currently reads
Participants (5):
- Ben Webb
- Daniel Russel
- Dina Schneidman
- Javier Ángel Velázquez Muriel
- Keren Lasker