Multichain proteins

Daniel Russel

8 Oct 2009

6:13 p.m.

Does it make sense to talk about a protein which consists of more than one chain? I've heard people use the words that way (and there are google hits, but not a huge number), but it was suggested that this is a misuse of the words. It would make the atom hierarchy a bit simpler to say a protein is a single chain and has HierarchyType PROTEIN (and to remove the CHAIN type).

Authoritative answers? Votes?


Keren Lasker

8 Oct

6:15 p.m.

For me, more than one chain is an assembly (or complex). I would leave Chain, because in modeling people sometimes take domains from different places (with different chain IDs), and this information might be useful.

Daniel Russel

6:18 p.m.

On Oct 8, 2009, at 6:15 PM, Keren Lasker wrote:

> for me more than one chain is an assembly (or complex)
> I would leave Chain because in modeling sometimes people take domains from different places (with different chain ids) and this information might be useful.

We would leave the Chain decorator around (and the chain ID), even if we remove the CHAIN type, if that is what you are worried about.


Keren Lasker

6:32 p.m.

ok - if you mean that Chain should not be part of the Hierarchy, I guess it makes sense, as usually protein == chain.

Daniel Russel

6:35 p.m.

On Oct 8, 2009, at 6:32 PM, Keren Lasker wrote:

> ok - if you mean that Chain should not be part of the Hierarchy, I guess it makes sense, as usually protein == chain.

To make things clear, I'm using the IMP names, so CHAIN and PROTEIN are HierarchyTypes and Chain is a decorator. So there would not be a CHAIN hierarchy type, but a PROTEIN could be a Chain (if it has a chain designator). Sounds a bit icky...
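To make the distinction concrete, here is a minimal sketch of a type label versus a decorator. It is not IMP code: `Node`, `HierarchyType`, `Chain.setup`, `Chain.is_instance` and `Chain.get_id` are hypothetical names chosen for illustration, and the enum is an assumed subset with CHAIN deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class HierarchyType(Enum):
    # Hypothetical subset of the labels; note there is no CHAIN entry.
    ATOM = 0
    RESIDUE = 1
    PROTEIN = 6


@dataclass
class Node:
    """A plain hierarchy node: one type label plus free-form attributes."""
    name: str
    type: HierarchyType
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class Chain:
    """Decorator-style view, valid only for nodes that carry a chain ID."""

    def __init__(self, node):
        if not Chain.is_instance(node):
            raise ValueError(f"{node.name} has no chain ID")
        self.node = node

    @staticmethod
    def is_instance(node):
        return "chain_id" in node.attributes

    @staticmethod
    def setup(node, chain_id):
        # Attach the chain ID to the node and return the decorated view.
        node.attributes["chain_id"] = chain_id
        return Chain(node)

    def get_id(self):
        return self.node.attributes["chain_id"]


protein = Node("lysozyme", HierarchyType.PROTEIN)
was_chain_before = Chain.is_instance(protein)  # no chain designator yet
chain = Chain.setup(protein, "A")
print(was_chain_before, protein.type.name, chain.get_id())
```

In this sketch the node keeps the label PROTEIN throughout; being a Chain is a separate, optional property of the same node, which is the "PROTEIN could be a Chain" situation described above.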


Keren Lasker

6:41 p.m.

sounds good to me

Dina Schneidman

8:35 p.m.

A protein is more than a chain. A chain corresponds to tertiary structure; a protein's quaternary structure can have more than one chain! A classic example is hemoglobin, with 4 chains. Another classic is the antibody, built from heavy and light chains. So we need chains around! Also, how can we add bonds without chains? Do you plan to connect them together?

And let me put in two more cents: the PDB format does not define any hierarchy. It is a set of atoms. If we want to build a hierarchy out of a PDB file, it should clearly follow from the format. So the best way is to have 4 levels that are well defined by the corresponding PDB fields:

Atom, Residue, Chain, Root

I think all other assumptions are only assumptions and a good source of bugs.
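A minimal sketch of how those four levels could be read straight from the fixed-column fields of ATOM/HETATM records (atom name, residue name, chain ID, residue number, coordinates). The function names `parse_pdb_hierarchy` and `atom_line` are hypothetical, the nested-dict layout is just one possible representation of Root/Chain/Residue/Atom, and `atom_line` exists only to build toy input with the right column positions.

```python
def atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Build a toy ATOM record with fields in the standard PDB columns."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain_id}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_pdb_hierarchy(lines):
    """Return root = {chain_id: {(res_seq, res_name): [(atom_name, xyz)]}}."""
    root = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # only coordinate records define the hierarchy
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21]               # column 22
        res_seq = int(line[22:26])        # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        chain = root.setdefault(chain_id, {})
        residue = chain.setdefault((res_seq, res_name), [])
        residue.append((atom_name, xyz))
    return root


toy_lines = [
    atom_line(1, "N", "VAL", "A", 1, 1.0, 2.0, 3.0),
    atom_line(2, "CA", "VAL", "A", 1, 2.0, 3.0, 4.0),
    atom_line(3, "N", "LEU", "A", 2, 3.0, 4.0, 5.0),
    atom_line(4, "N", "VAL", "B", 1, 4.0, 5.0, 6.0),
]
root = parse_pdb_hierarchy(toy_lines)
for chain_id, residues in root.items():
    print(chain_id, [key for key in residues])
```

Nothing beyond the record fields is assumed here: there is no notion of protein, domain or fragment, which is the point of the four-level proposal.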


Keren Lasker

8:48 p.m.

ok - fair enough .... all the proteins I worked with so far had a single chain :)

1. by Root you mean Protein?
2. Fragment is a good level to have, so I vote we keep it.

Just to engage others in the discussion, the current atom hierarchy is:

** \name Hierarchy Types
The various valid levels for the atom Hierarchy:
- ATOM (0) an atom
- RESIDUE (1) a residue
- NUCLEICACID (2) a nucleic acid
- FRAGMENT (3) an arbitrary fragment
- DOMAIN (4) a domain of a protein
- CHAIN (5) a chain of a protein
- PROTEIN (6) a protein
- NUCLEOTIDE (7) a nucleotide
- MOLECULE (8) an arbitrary molecule
- ASSEMBLY (9) an assembly
- COLLECTION (10) a group of assemblies
- UNIVERSE is all the molecules in existence at once.
- UNIVERSES is a set of universes
- TRAJECTORY is an ordered set of UNIVERSES


Keren Lasker

9:06 p.m.

New subject: design discussions

hi,

I think that it would be much more productive to discuss major design changes in IMP meetings rather than via email. It would be preferable to raise issues in an email before the meeting and to write an email with the conclusions after the meeting.

So - for example, I suggest that the discussion about proper atom Hierarchy should take place in our next IMP meeting. If Frido has suggestions he can email them.

Just to clarify, I do not think that all discussions should move to IMP meetings, just the major ones.

Keren.

Dina Schneidman

9:21 p.m.

New subject: design discussions

completely agree. these are resolved much faster by talking :)


Daniel Russel

9:59 p.m.

New subject: design discussions

On Oct 8, 2009, at 9:21 PM, Dina Schneidman wrote:

> completely agree. these are resolved much faster by talking :)

I'm not entirely sure I agree (witness the last developers meeting :-), but I am ambivalent. The meetings could be structured better, though, which would help things converge faster.

That said, we have several outstanding questions:

- Do we have PROTEINs which can contain CHAINs, or just PROTEINs (which are chains)? For this we should probably just find something authoritative and use it. I don't much care either way.
- What are the most useful things for one or more read_pdb functions to return? For this we should come up with standard use cases. I would propose a few here:
  - Someone is running through lots of PDB files and wants to load one protein from each file. To do this, it would be nice to have a function which loads a protein from the PDB file and returns a hierarchy containing only that protein. Whether this protein has one or more chains depends on the answer to the first question.
  - Load the whole structure from a PDB file, complete with many proteins, ligands and other molecules. For this it is useful to be able to read everything from one PDB model record.
  - Take one piece of the PDB file and use it (such as a chain or ligand). For this it is nice not to have to dissect a hierarchy.
  - Load a bunch of model records from a single PDB file and deal with all the molecules in each record.

Any other cases? Think about it and we will discuss it on Tuesday.

A proposal to think about, which handles the above cases:

- one function which reads a protein from a PDB file
- one function which reads everything from one model record in a PDB file and returns it in a list/vector

Good night.
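A rough sketch of the "one model record at a time" idea discussed above, operating on raw lines only. `split_models` and `select_chain` are hypothetical names, not existing read_pdb functions; the sketch assumes models are delimited by MODEL/ENDMDL records and treats a file without them as one implicit model. The toy records below are padded only far enough to carry a chain ID in column 22.

```python
def split_models(lines):
    """Group ATOM/HETATM records by MODEL/ENDMDL into a list of models."""
    models, current = [], []
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:  # records outside any MODEL block: implicit single model
        models.append(current)
    return models


def select_chain(model_lines, chain_id):
    """Take one piece (here: one chain) without dissecting a hierarchy."""
    return [line for line in model_lines if line[21] == chain_id]


def toy_record(chain_id):
    # Toy record: record name padded so the chain ID lands in column 22.
    return f"{'ATOM':<21}{chain_id}"


toy_file = [
    "MODEL        1", toy_record("A"), toy_record("B"), "ENDMDL",
    "MODEL        2", toy_record("A"), "ENDMDL",
]
models = split_models(toy_file)
print(len(models), len(select_chain(models[0], "B")))
```

Returning a plain list of models keeps the caller in control of which model, and which piece of it, gets turned into a hierarchy.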

Javier Ángel Velázquez Muriel

10:11 p.m.

New subject: design discussions

2009/10/8 Daniel Russel drussel@gmail.com

> On Oct 8, 2009, at 9:21 PM, Dina Schneidman wrote:
>> completely agree. these are resolved much faster by talking :)
>
> I'm not entirely sure I agree (witness the last developers meeting :-). But am ambivalent. It could be structured better though which would help things converge faster.

IMP meetings, and meetings in general, tend to focus on explaining things that are already done, or on convincing others of them, rather than on discussing future development or listening to others. Ideas are rarely discussed.

> That said, we have several outstanding questions:
> - do we have PROTEINs which can contain CHAINs or just PROTEINS (which are chains). This we should probably just find something authoritative and use it. I don't much care either way.

Or, easier, agree on something. But everybody has to agree.

> - what are the most useful things for one or more read_pdb functions to return? For this we should come up with standard usage cases. I would propose a couple here:
>   - someone is running through lots of PDB files and wants to load one protein from each file. To do this, it would be nice to have a function which loads a protein from the pdb and returns a hierarchy containing only that protein. Whether this protein has one or more chains depends on the answer to the first question
>   - load the whole structure from a pdb complete with many proteins and ligands and other molecules. For this it is useful to be able to read everything from one PDB model record.
>   - take one piece of the pdb and use it (such as a chain or ligand). For this it is nice not to have to dissect a hierarchy.
>   - load a bunch of model records from a single pdb and deal with all the molecules in each record.

Everybody agrees on that, and I would bet that everybody agrees on 4 levels for the hierarchy. What needs to be clarified is what we get after doing each of the proposed tasks, and how we call them so that the result is as clear as possible.


Francisco Melo

9 Oct

5:38 a.m.

New subject: design discussions

Hello Guys,

I agree with Javier's suggestions.

Think of IMP as a product. Think of future IMP users as customers. Any successful development should always be focused on the customer. I know that in this case the end user would be a biologist, a physicist, a chemist, an engineer, etc. (very broad). However, you should try to adopt a logic that is familiar to the user, depending on the problem that is being attacked. Certainly, the case of proteins will be the most difficult to define clearly (biology is full of exceptions).

You should sit and discuss ideas, as Javier suggested. I know you have a great team down there, and of course, you will end up with a great application !

Cheers,

Pancho.


Keren Lasker

2:02 a.m.

New subject: design discussions

cool!

Daniel - I assume that most people just gave up on reading this entire thread (I know I almost did ...), as it is too long, with many branches that were never resolved. It would be a good idea to send a reminder on Monday with the topics for discussion and the proposed solutions, or even better to put them on a wiki page so others can add their solutions as well.

I will add the low-res perspective once you post your ideas :)

Keren.

Daniel Russel

9:40 a.m.

New subject: design discussions

As a followup to my email, Javi had raised issues concerning the choice of names of HierarchyType. It is problematic to have such a type label (a protein is a molecule, but it is complicated to express such a relationship). I propose removing the Hierarchy::get_type() method and the HierarchyType type and replacing them with methods like

- get_is_protein() (true for any piece of protein)
- get_is_molecule() (true for any molecule)
- get_is_residue()
- get_is_atom()
- get_is_assembly() (not sure if we want this)

etc...

Something else to think about for Tuesday.
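A small sketch of what replacing a single type label with the query methods proposed above could look like. This is an illustration only, not IMP's implementation: the `Node` class and its `roles` argument are assumptions, and only the `get_is_*` method names come from the proposal.

```python
class Node:
    """Hierarchy node that answers role queries instead of exposing one type."""

    def __init__(self, name, roles, children=()):
        self.name = name
        self._roles = frozenset(roles)   # a node may play several roles at once
        self.children = list(children)

    def get_is_atom(self):
        return "atom" in self._roles

    def get_is_residue(self):
        return "residue" in self._roles

    def get_is_protein(self):
        # True for any piece of a protein, not only the top-level node.
        return "protein" in self._roles

    def get_is_molecule(self):
        return "molecule" in self._roles


# A residue inside a protein is both "residue" and "protein";
# the whole protein is both "protein" and "molecule".
met1 = Node("MET1", {"residue", "protein"})
protein = Node("lysozyme", {"protein", "molecule"}, children=[met1])
water = Node("HOH", {"molecule"})

for node in (protein, met1, water):
    print(node.name, node.get_is_protein(), node.get_is_molecule())
```

With a single label, the protein node would have to be either PROTEIN or MOLECULE; with independent queries it can answer yes to both.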


Javier Ángel Velázquez Muriel

10:02 a.m.

New subject: design discussions

These methods are not a very good solution, as they will be true in 99 percent of the cases, or even 100% (why should I call get_is_molecule() if I know that it is going to be true all the time?). It is far more difficult, but far more useful, to agree on what we expect to get after reading a PDB file, and on how we are going to organize our structures in IMP into hierarchies. Together with Keren, I can give suggestions for low resolution structures.

Javi


Daniel Russel

10:42 a.m.

New subject: design discussions

> These methods are not a very good solution, as they will be true in 99% percent of the cases, or even 100% (why should I call get_is_molecule() if I know that is going to be true all the time ? ). It is far more difficult, but far more useful, to agree of what we expect to get after a PDB, and how we are going to organize our structures in IMP into hierarchies.

You need some way of traversing the resulting hierarchy and telling what you are looking at. Say read_pdb returns a Hierarchy. You then have to go through the children of the node and determine which is a protein and which is a ligand if you want to do much that is useful with it. The docs for the read_pdb function can't possibly specify the answer. If you are writing code that knows the PDB file that is being loaded, then you can hard-code it, but that is not always possible (say you are writing something like modpipe, or an application to handle fitting into EM maps).

Or you load a protein and decide you want to define subdomains. You then modify the hierarchy to have a node corresponding to the structure. Some code which later traverses the hierarchy (such as code to build a simplified representation) may need to know when it has gotten to the atoms or residues (it can't assume a constant structure).

> Together with Keren, I can give suggestions for low resolution structures.

Looking forward to it.
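The traversal argument above can be sketched as follows. The dict-based nodes, the `kind` key and the `collect` helper are all hypothetical stand-ins for a real hierarchy; the point is only that the code asks each node what it is instead of assuming residues sit at a fixed depth.

```python
def collect(node, predicate):
    """Depth-first search returning the topmost nodes that satisfy predicate."""
    if predicate(node):
        return [node]
    found = []
    for child in node.get("children", []):
        found.extend(collect(child, predicate))
    return found


def is_residue(node):
    return node["kind"] == "residue"


# Same protein twice: once flat, once after a subdomain node was inserted.
flat = {"kind": "protein", "name": "P", "children": [
    {"kind": "residue", "name": "MET1"},
    {"kind": "residue", "name": "LYS2"},
]}
with_domain = {"kind": "protein", "name": "P", "children": [
    {"kind": "domain", "name": "D1", "children": [
        {"kind": "residue", "name": "MET1"},
    ]},
    {"kind": "residue", "name": "LYS2"},
]}

names_flat = [n["name"] for n in collect(flat, is_residue)]
names_domain = [n["name"] for n in collect(with_domain, is_residue)]
print(names_flat, names_domain)
```

Code that hard-coded "residues are the grandchildren of the root" would work on only one of the two trees; the predicate-driven walk handles both.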

Javier Ángel Velázquez Muriel

8 Oct

9:30 p.m.

2009/10/8 Dina Schneidman duhovka@gmail.com

Perhaps not the best source, but wikipedia says:

In biochemistry, *quaternary structure* is the arrangement of multiple folded protein molecules in a multi-subunit complex.

1 protein = 1 chain. more chains = complex.

> and let me put two more cents:
> PDB format does not define any hierarchy. it is a set of atoms. if we want to build an hierarchy out of PDB it should clearly follow from the format. So the best way is to have 4 levels that are well defined by the corresponding PDB fields:
> Atom, Residue, Chain, Root
> I think all other assumptions are only assumptions and a good source for bugs.

The problem is that root is not well defined either. We can agree on how to define it, but please please please avoid the name UNIVERSE. Otherwise I'm going to decorate all my universes with decorators called God.

On Thu, Oct 8, 2009 at 6:41 PM, Keren Lasker kerenl@salilab.org wrote: > > sounds good to me > > On Oct 8, 2009, at 6:35 PM, Daniel Russel wrote: > > > >> > >> On Oct 8, 2009, at 6:32 PM, Keren Lasker wrote: > >> > >>> ok - if you mean that Chain should not be part of the Hierarchy, I > guess > >>> it makes sense, as usually protein == chain. > >> > >> To make things clear, I'm using the IMP names, so CHAIN, PROTEIN are > >> HierarchyTypes and Chain is a decorator. So there would not be a CHAIN > >> hierarchy type, but a PROTEIN could be a Chain (if it has a chain > >> designator). Sounds a bit icky... > >> > >>> On Oct 8, 2009, at 6:15 PM, Keren Lasker wrote: > >>> > >>>> for me more then one chain is an assembly ( or complex) > >>>> I would leave Chain because in modeling sometimes people takes domains > >>>> from different places ( with different chain ids) and this information > might > >>>> be useful. > >>>> On Oct 8, 2009, at 6:13 PM, Daniel Russel wrote: > >>>> > >>>>> Does it make sense to talk about a protein which consists of more > than > >>>>> one chain? I've heard people use the words that way (and there are > google > >>>>> hits, but not a huge number), but it was suggested that this is a > misuse of > >>>>> the words. It would make the atom hierarchy a bit simpler to say a > protein > >>>>> is a single chain and has HierarchyType PROTEIN (and to remove the > CHAIN > >>>>> type). > >>>>> > >>>>> Authoritative answers? Votes? 
Daniel Russel

9:42 p.m.

> Otherwise I'm going to decorate all my universes with decorators > called God. Yeah, but think of how difficult the discussions of what that should look like will be :-)

Dina Schneidman

9:52 p.m.

I used "Introduction to Protein Structure" by Branden & Tooze as a source (nice book!). If I remember correctly, the same story appears in biochemistry textbooks.

Javier Ángel Velázquez Muriel

10:01 p.m.

2009/10/8 Dina Schneidman duhovka@gmail.com

> I used "introduction to protein structure" branden &tooze as a source > (nice book!). If I remember correctly the same story appears in > biochemistry textbooks. The very same definition, taken to the extreme, would allow us to call an entire fiber of "proteins" in the cell one protein (think of filaments, etc.), something I have seen nobody do. I guess the definition is rather loose, but in terms of our software it is not going to help us to call something with a large number of chains a PROTEIN. I would slightly prefer complex[1].protein[1] to protein[1].chain[1], but not enough to discuss it too much.

Daniel Russel

9:41 p.m.

On Oct 8, 2009, at 8:35 PM, Dina Schneidman wrote:

> So we need chains around! and also how can we add bonds without > chains? do you plan to connect them together? No one said anything about getting rid of chains. Just about getting rid of the PROTEIN/CHAIN distinction.

> and let me put two more cents: > PDB format does not define any hierarchy. it is a set of atoms. It does too define a hierarchy:
a pdb file contains models
models contain chains and heterogens
chains contain residues
residues contain atoms
heterogens contain atoms
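The containment described above can be written down directly as data types. This is only a sketch in Python with made-up class names (`PdbFile`, `Model`, `Chain`, `Heterogen`, `Residue`, `Atom`); it mirrors the five containment statements and is not how IMP or any PDB library represents a file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str


@dataclass
class Residue:
    name: str
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Heterogen:
    name: str
    atoms: List[Atom] = field(default_factory=list)  # atoms directly, no residues


@dataclass
class Model:
    chains: List[Chain] = field(default_factory=list)
    heterogens: List[Heterogen] = field(default_factory=list)


@dataclass
class PdbFile:
    models: List[Model] = field(default_factory=list)


def count_atoms(pdb: PdbFile) -> int:
    """Walk both branches of the containment tree and count the atoms."""
    total = 0
    for model in pdb.models:
        for chain in model.chains:
            for residue in chain.residues:
                total += len(residue.atoms)
        for het in model.heterogens:
            total += len(het.atoms)
    return total


example = PdbFile(models=[Model(
    chains=[Chain("A", [Residue("ALA", [Atom("N"), Atom("CA")])])],
    heterogens=[Heterogen("HEM", [Atom("FE")])],
)])
```

Note that nothing in these types says which chains belong to the same molecule, which is the gap discussed in the rest of this message.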

> if we > want to build an hierarchy out of PDB it should clearly follow from > the format. I don't see that the second follows at all from the first. In fact I would say quite the opposite. But I disagree with the first, so it doesn't really matter :-)

> So the best way is to have 4 levels that are well defined > by the corresponding PDB fields: > Atom, Residue, Chain, Root > I think all other assumptions are only assumptions and a good source > for bugs. Plus, in parallel, the various ligands and such, which also need to get attached to root (preventing root from being a protein or molecule).

To put one of the problems another way, the big problem is that, ultimately, one would like a hierarchy with a molecule (protein) containing multiple chains. The PDB reader can't, in general, create such a thing since it doesn't know how the chains are grouped into molecules. As a result, it has to return something intermediate. Currently it returns a hierarchy that would have to be broken apart and put back together in order to get the presumably desired result. I would suggest producing a vector of molecules instead so that the user can filter them/assemble them as needed. We could provide special versions of the reader to handle easy cases (like where you know the pdb file only contains one protein).
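A rough Python sketch of what "a vector of molecules" plus a convenience reader could look like. All names are hypothetical (`Molecule`, `molecules_from_records`, `read_single_protein`), the input is a list of already-parsed records rather than a real PDB file, and treating each chain of ATOM records as one protein is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (record type, chain id, residue name, residue number, atom name):
# a stand-in for parsed ATOM/HETATM lines, not a real parser.
Record = Tuple[str, str, str, int, str]


@dataclass
class Molecule:
    kind: str  # "protein" or "ligand" (hypothetical labels)
    chain_id: str
    name: str
    atoms: List[str] = field(default_factory=list)


def molecules_from_records(records: List[Record]) -> List[Molecule]:
    """Return one Molecule per chain of ATOM records and one per heterogen."""
    found: Dict[Tuple[str, str, str, int], Molecule] = {}
    for rec_type, chain_id, res_name, res_num, atom_name in records:
        if rec_type == "ATOM":
            key = ("protein", chain_id, "", 0)
            name = "chain " + chain_id
        else:  # HETATM: each heterogen residue becomes its own molecule
            key = ("ligand", chain_id, res_name, res_num)
            name = res_name
        mol = found.setdefault(key, Molecule(key[0], chain_id, name))
        mol.atoms.append(atom_name)
    return list(found.values())  # dicts keep insertion order


def read_single_protein(records: List[Record]) -> Molecule:
    """Convenience wrapper for the easy case of exactly one protein."""
    proteins = [m for m in molecules_from_records(records) if m.kind == "protein"]
    if len(proteins) != 1:
        raise ValueError(f"expected exactly one protein, found {len(proteins)}")
    return proteins[0]


records = [
    ("ATOM", "A", "VAL", 1, "N"),
    ("ATOM", "A", "VAL", 1, "CA"),
    ("ATOM", "B", "VAL", 1, "N"),
    ("HETATM", "A", "HEM", 142, "FE"),
]
molecules = molecules_from_records(records)
proteins = [m for m in molecules if m.kind == "protein"]  # user-side filtering
```

The caller decides how to group `proteins` into larger molecules or complexes; the reader itself makes no claim about that grouping.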

Javier Ángel Velázquez Muriel

9:47 p.m.

2009/10/8 Daniel Russel drussel@gmail.com

> > On Oct 8, 2009, at 8:35 PM, Dina Schneidman wrote: > > Protein is more than a chain. Chain corresponds to tertiary structure. >> Protein's quaternary structure can have more than one chain! >> A classic example is hemoglobin, 4 chains. Another classic is >> antibody, 2 chains. >> > That is what I always assumed :-) > > So we need chains around! and also how can we add bonds without >> chains? do you plan to connect them together? >> > No one said anything about getting rid of chains. Just about getting rid of > the PROTEIN/CHAIN distinction. > > and let me put two more cents: >> PDB format does not define any hierarchy. it is a set of atoms. >> > It does too define a hierarchy: > a pdb file contains models > models contain chains and heterogens > chains contain residues > residues contain atoms > heterogens contain atoms > > if we >> want to build an hierarchy out of PDB it should clearly follow from >> the format. >> > I don't see that the second follows at all from the first. In fact I would > say quite the opposite. But I disagree with the first, so it doesn't really > matter :-) > > So the best way is to have 4 levels that are well defined >> by the corresponding PDB fields: >> Atom, Residue, Chain, Root >> I think all other assumptions are only assumptions and a good source for >> bugs. >> > Plus the parallel various ligands and stuff which also need to get attached > to root (preventing root from being a protein or molecule). > > To put one of the problems another way, the big problem is that, > ultimately, one would like a hierarchy with a molecule (protein) containing > multiple chains.

Perhaps not the best source, but wikipedia says:

1 protein = 1 chain. more chains = complex.

> The PDB reader can't, in general, create such a thing since it doesn't know > how the chains are grouped into molecules. As a result, it has to return > something intermediate.

True

> Currently it returns a hierarchy that would have to be broken apart and > put back together in order to get the presumably desired result.

True

> I would suggest producing a vector of molecules instead so that the user > can filter them/assemble them as needed.

Yes

> We could provide special versions of the reader to handle easy cases (like > where you know the pdb file only contains one protein).

Dina Schneidman

9:48 p.m.

> To put one of the problems another way, the big problem is that, ultimately, > one would like a hierarchy with a molecule (protein) containing multiple > chains. The PDB reader can't, in general, create such a thing since it > doesn't know how the chains are grouped into molecules. As a result, it has > to return something intermediate. Currently it returns a hierarchy that > would have to be broken apart and put back together in order to get the > presumably desired result. I would suggest producing a vector of molecules > instead so that the user can filter them/assemble them as needed. We could > provide special versions of the reader to handle easy cases (like where you > know the pdb file only contains one protein).

How about a simpler solution: producing a vector of atoms (very well defined!). The user can then do whatever he wants with it later: build hierarchies, assemblies or whatever...

Javier Ángel Velázquez Muriel

9:49 p.m.

> How about more simple solution: producing a vector of atoms (very well > defined!) > And the user can do with that what he wants later, build hierarchies, > assemblies or whatever...

That solves the problem but makes reading PDBs tedious as hell
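To make the trade-off concrete, here is a small Python sketch of the bookkeeping a caller would need if the reader returned only a flat list of atoms. `FlatAtom` and `build_hierarchy` are invented for this example and assume each atom record carries its chain id, residue number and residue name.

```python
from typing import Dict, List, NamedTuple, Tuple


class FlatAtom(NamedTuple):
    # Hypothetical flat record; one per atom, with no hierarchy attached.
    chain_id: str
    res_num: int
    res_name: str
    atom_name: str


def build_hierarchy(
    atoms: List[FlatAtom],
) -> Dict[str, Dict[Tuple[int, str], List[str]]]:
    """Regroup flat atoms as chain id -> (residue number, name) -> atom names."""
    chains: Dict[str, Dict[Tuple[int, str], List[str]]] = {}
    for atom in atoms:
        residues = chains.setdefault(atom.chain_id, {})
        residues.setdefault((atom.res_num, atom.res_name), []).append(atom.atom_name)
    return chains


atoms = [
    FlatAtom("A", 1, "ALA", "N"),
    FlatAtom("A", 1, "ALA", "CA"),
    FlatAtom("A", 2, "GLY", "N"),
    FlatAtom("B", 1, "SER", "N"),
]
tree = build_hierarchy(atoms)
```

Every caller that wants chains or residues would have to repeat something like `build_hierarchy`, which is the tedium objected to above.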

participants (5)

Daniel Russel
Dina Schneidman
Francisco Melo
Javier Ángel Velázquez Muriel
Keren Lasker