Including hetatm when hetatm does not immediately follow ATOM list
Hi
I am trying to thread a target rhodopsin sequence from humans to the template bovine rhodopsin structure (pdb1f88).
The rhodopsin crystal structure exists in the pdb file as a crystal dimer represented by chain A and B. Each chain is 348 amino acids, but there are missing residues on both chains.
I used the simple automodel script, and it worked well. When I tried to include both the ligands and the carbohydrates (hetatms) in the structure by setting env.io.hetatm = True, I got errors that do not make much sense to me. My structure pdb differs from the hetatm example pdb in that the ligands' hetatm residue numbers do not immediately follow the atom residue sequence.
Here is a summary of the heteroatoms that follow the seqres:
HET NAG A 501 14
HET NAG A 502 14
HET NAG B 601 14
HET NAG B 602 14
HET MAN B 603 11
HET NAG A 701 14
HET NAG A 702 14
HET NAG B 801 14
HET NAG B 802 14
HET HG A 901 1
HET HG B 902 1
HET HG A 903 1
HET HG B 904 1
HET HG A 905 1
HET HG B 906 1
HET ZN A 907 1
HET ZN B 908 1
HET ZN B 909 1
HET ZN A 910 1
HET RET A 977 20
HET RET B 978 20
There are no amino acid residue numbers between the hetatm numbers. That is, there are no amino acids numbered from 803 to 900. These entries are left unspecified.
All hetatms are described after chain B in the ATOM section.
ATOM 5067 OD1 ASN B 326 30.599 31.795 -7.049 1.00107.30 O
ATOM 5068 ND2 ASN B 326 30.914 33.473 -8.519 1.00107.62 N
TER 5069 ASN B 326
HETATM 5070 C1 NAG A 501 37.620 3.757 -28.808 1.00 65.21 C
HETATM 5071 C2 NAG A 501 38.420 5.021 -29.058 1.00 65.24 C
HETATM 5072 C3 NAG A 501 37.477 6.163 -29.455 1.00 66.66 C
I first tried to include all hetatms as '.' with the unspecified residues past the end of the protein (aa 348) replaced with '-'.
>P1;B._taurus
structureY:B._taurus:1 :A: 977:A:ground state rhodopsin:Bovine: :
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLA
VADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFT
WVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAA----S
ATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTT
LCCGKNP------STTVSKTETSQVAPA----------------------------------------------------
--------------------------------------------------------------------------------
--------------------..----------------------------------------------------------
--------------------------------------------------------------------------------
------------------------------------------------------------..------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------.-.-.----.--------------------------------------------------
----------------.*
It results in this error:
_modeller.ModellerError: read_te_290E> Number of residues in the alignment and pdb files are different: 347 650
For alignment entry: 1 B._taurus
When I forgo the trailing '-' and simply append hetatms to the end in a series of '.':
>P1;B._taurus
structureY:B._taurus:1 :A: 977:A:ground state rhodopsin:Bovine: :
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAA----SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNP------STTVSKTETSQVAPA........*
I get the following error:
_modeller.ModellerError: read_te_290E> Number of residues in the alignment and pdb files are different: 347 650
For alignment entry: 1 B._taurus
I've seen the following thread and tried to follow the advice enclosed, but I am still having trouble.
http://salilab.org/archives/modeller_usage/2008/msg00183.html
http://salilab.org/archives/modeller_usage/2008/msg00186.html
The structure ID of bovine rhodopsin is pdb1f88.
Thanks in advance,
On 09/02/2009 01:54 PM, Shuo Huai Johnny Wu wrote:
> I used the simple automodel script, and it worked well. When I tried
> to include both the ligands and the carbohydrates (hetatms) into the
> structure by setting env.io.hetatm = True, I get errors that do not
> make much sense to me. My structure pdb is different from the hetatm
> example pdb in that the ligands hetatm residue numbers do not
> immediately follow the atom residue sequence.
...
The way Modeller handles this situation is very simple and predictable. (Admittedly it would be nice if it were more complex and did the "right thing", but then it would be less predictable. ;)
When Modeller reads a PDB file, it reads it sequentially, from beginning to end. Each residue (ATOM or HETATM or water, if you have env.io.hetatm and/or env.io.water turned on) is read in, in exactly the order given in the PDB file. So if your PDB file contains 10 amino acid residues in chain A, then 10 more in chain B, then two ligands in chain A, then two ligands in chain B, Modeller will read the following sequence from the PDB file, where a and b are amino acids and A and B ligands:

aaaaaaaaaa/bbbbbbbbbb/AA/BB
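To make the file-order behaviour concrete, here is a toy sketch in plain Python. It does not call Modeller; the function name and the (chain, record type) tuples are made up for illustration, and the only assumption is that a '/' is written wherever the chain ID changes.

```python
def read_in_file_order(records):
    """Toy model of sequential reading.

    records: (chain_id, record_type) tuples in the order they appear in
    the file; record_type is 'ATOM' or 'HETATM'. Amino acids are written
    as the lower-case chain letter, ligands as the upper-case chain
    letter, and '/' marks every change of chain ID.
    """
    out = []
    prev_chain = None
    for chain, kind in records:
        if prev_chain is not None and chain != prev_chain:
            out.append('/')
        out.append(chain.lower() if kind == 'ATOM' else chain.upper())
        prev_chain = chain
    return ''.join(out)


# 10 amino acids in A, 10 in B, then two ligands in A, two in B.
records = ([('A', 'ATOM')] * 10 + [('B', 'ATOM')] * 10
           + [('A', 'HETATM')] * 2 + [('B', 'HETATM')] * 2)
print(read_in_file_order(records))
```

The point of the sketch is that nothing is regrouped by chain: the ligands of chain A come out after all of chain B, simply because that is where they sit in the file.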
When you read an alignment that contains a structure, Modeller needs to read the PDB file to match the sequence. This sequence must match exactly. Because you often only want a subset of the PDB, the alignment file header can specify the first residue and chain to start reading at, and the residue and chain to finish at. So if in the example above your A chain amino acids are numbered 1 through 10, the B chain also 1 through 10, and the four ligands are labeled 11:A, 12:A, 11:B, and 12:B, and you tell Modeller to read from 5:A to 11:A, you will get (remember that it reads the PDB file sequentially):

aaaaaa/bbbbbbbbbb/A

i.e. the sequence of residues starting at 5:A and ending at 11:A (residues 5 through 10 of chain A, all of chain B, then the first ligand). Note that since the entire B chain lies between 10:A and 11:A, it'll also read that.
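The start/end selection can be sketched the same way. This is again a toy model in plain Python rather than Modeller code; select_range and as_string are hypothetical helper names, and the residue list mirrors the numbering described above.

```python
def select_range(residues, start, end):
    """Return the residues from `start` to `end` inclusive, in file order.

    residues: list of (resnum, chain, symbol) tuples in file order.
    start, end: (resnum, chain) pairs, like the alignment header fields.
    """
    keys = [(num, chain) for num, chain, _ in residues]
    first = keys.index(start)
    last = keys.index(end, first)  # search forward from the start residue
    return residues[first:last + 1]


def as_string(residues):
    """Join symbols, writing '/' wherever the chain ID changes."""
    out = []
    prev_chain = None
    for _, chain, symbol in residues:
        if prev_chain is not None and chain != prev_chain:
            out.append('/')
        out.append(symbol)
        prev_chain = chain
    return ''.join(out)


toy = ([(n, 'A', 'a') for n in range(1, 11)]      # chain A amino acids 1-10
       + [(n, 'B', 'b') for n in range(1, 11)]    # chain B amino acids 1-10
       + [(11, 'A', 'A'), (12, 'A', 'A'),         # chain A ligands
          (11, 'B', 'B'), (12, 'B', 'B')])        # chain B ligands
print(as_string(select_range(toy, (5, 'A'), (11, 'A'))))
```

Counting inclusively, residues 5 through 10 of chain A contribute six 'a' characters, followed by the whole of chain B and the first chain A ligand.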
> I first tried to include all hetatms as '.' with the unspecified
> residues past the end of the protein (aa 348) replaced with '-'.
>
> >P1;B._taurus
> structureY:B._taurus:1 :A: 977:A:ground state rhodopsin:Bovine: :
> MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLA
> VADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFT
> WVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAA----S
> ATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTT
> LCCGKNP------STTVSKTETSQVAPA----------------------------------------------------
> --------------------------------------------------------------------------------
> --------------------..----------------------------------------------------------
> --------------------------------------------------------------------------------
> ------------------------------------------------------------..------------------
> --------------------------------------------------------------------------------
> --------------------------------------------------------------------------------
> --------------------.-.-.----.--------------------------------------------------
> ----------------.*
You have a typo here - 'structureY' should be 'structureX'. But you are asking Modeller to read residues 1:A through 977:A, so it'll read all of the A chain, then all of the B chain (since it hasn't reached 977:A yet), then all the HETATM residues until it gets to 977:A (a bunch of NAG, HG, and ZN residues).
> It results in this error:
>
> _modeller.ModellerError: read_te_290E> Number of residues in the
> alignment and pdb files are different: 347 650
> For alignment entry: 1 B._taurus
Remember, Modeller needs to match the alignment sequence against the PDB sequence. So you need the full sequence of the B chain in your B._taurus alignment entry - you can't have any "unspecified residues". (So your solution above is close, but you need the gaps in the target sequence, not the template structure.)
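As a quick sanity check before running Modeller, you can count the residues in an alignment entry yourself and compare that number with what you expect from the PDB range. The helper below is a hypothetical plain-Python sketch, not part of Modeller; it assumes that gaps ('-'), chain breaks ('/') and the terminating '*' are not residues, while '.' (a generic HETATM residue) is.

```python
def count_alignment_residues(sequence):
    """Count residue positions in a PIR-style sequence string.

    Gaps ('-'), chain breaks ('/'), the terminator ('*') and whitespace
    are skipped; every other character, including '.', counts as one
    residue.
    """
    return sum(1 for ch in sequence
               if ch not in "-/*" and not ch.isspace())


# Made-up example: three amino acids, a gap, a chain break,
# two more amino acids and two ligands.
print(count_alignment_residues("AC-D/EF..*"))
```

If the count for the template entry is smaller than the number of residues in the PDB range given in the header, residues that Modeller reads from the file are missing from the entry.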
Alternatively, you can edit the PDB file in a text editor and make a modified PDB file that only contains the residues you're interested in, if you don't want to specify the whole B sequence in your alignment.
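If you would rather script that edit than do it by hand, something along these lines should work. It is a minimal sketch in plain Python, not a Modeller feature: filter_pdb and pdb_line are hypothetical names, the choice of chain A plus the RET ligand is only an example, and the serial numbers in the demo are made up. It relies on the standard fixed PDB columns (record name in columns 1-6, residue name in 18-20, chain ID in 22).

```python
def pdb_line(record, serial, atom, resname, chain, resseq):
    """Build a minimal fixed-column coordinate line (coordinates omitted)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def filter_pdb(lines, keep_chain="A", keep_het=frozenset({"RET"})):
    """Keep ATOM records of one chain plus selected HETATM residues of it.

    All other records (other chains, other ligands, TER, remarks) are
    dropped in this sketch.
    """
    kept = []
    for line in lines:
        record = line[0:6].strip()
        resname = line[17:20].strip()
        chain = line[21:22]
        if chain != keep_chain:
            continue
        if record == "ATOM" or (record == "HETATM" and resname in keep_het):
            kept.append(line)
    return kept


demo = [
    pdb_line("ATOM", 1, "N", "MET", "A", 1),
    pdb_line("ATOM", 2, "ND2", "ASN", "B", 326),
    pdb_line("HETATM", 3, "C1", "NAG", "A", 501),
    pdb_line("HETATM", 4, "C1", "RET", "A", 977),
    pdb_line("HETATM", 5, "C1", "RET", "B", 978),
]
for line in filter_pdb(demo):
    print(line)
```

With a file reduced like this, the alignment entry only has to contain the chain A sequence followed by one '.' per ligand residue that was kept.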
> I've seen the following thread and tried to follow the advice
> enclosed, but I am still having trouble.
> http://salilab.org/archives/modeller_usage/2008/msg00183.html
> http://salilab.org/archives/modeller_usage/2008/msg00186.html
The problem in that thread is different - they had the wrong ending residue number in their alignment file header, so Modeller simply stopped reading the PDB file before it reached the ligands.
Ben Webb, Modeller Caretaker