Re: [modeller_usage] Including hetatm when hetatm does not immediately follow ATOM list

4 Sep 2009


On 09/02/2009 01:54 PM, Shuo Huai Johnny Wu wrote:
>   I used the simple automodel script, and it worked well. When I tried
> to include both the ligands and the carbohydrates (hetatms) into the
> structure by setting env.io.hetatm = True, I get errors that do not
> make much sense to me. My structure pdb is different from the hetatm
> example pdb for in that the ligands hetatm residue numbers do not
> immediately follow the atom residue sequence.
...
The way Modeller handles this situation is very simple and predictable. 
(Admittedly it would be nice if it were more complex and did the "right 
thing", but then it would be less predictable. ;)
When Modeller reads a PDB file, it reads it sequentially, from beginning 
to end. Each residue (ATOM or HETATM or water, if you have env.io.hetatm 
and/or env.io.water turned on) is read in, in exactly the order given in 
the PDB file. So if your PDB file contains 10 amino acid residues in 
chain A, then 10 more in chain B, then two ligands in chain A then two 
ligands in chain B, Modeller will read the following sequence from the 
PDB file, where a and b are amino acids and A and B ligands:
aaaaaaaaaa/bbbbbbbbbb/AA/BB
When you read an alignment that contains a structure, Modeller needs to 
read the PDB file to match the sequence. This sequence must match 
exactly. Because you often only want a subset of the PDB, the alignment 
file header can specify the first residue and chain to start reading at, 
and the residue and chain to finish at. So if in the example above your 
A chain amino acids are numbered 1 through 10, the B chain also 1 
through 10, and the four ligands are labeled 11:A, 12:A, 11:B, and 12:B 
and you tell Modeller to read from 5:A to 11:A, you will get (remember 
that it reads the PDB file sequentially):
aaaaaa/bbbbbbbbbb/A
i.e. the sequence of residues starting at 5:A and ending at 11:A. Note 
that since the entire B chain lies between 10:A and 11:A, it'll also 
read that.
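
The sequential reading described above can be mimicked in a few lines of 
plain Python. This is only a toy sketch, not Modeller code: the names 
read_range, to_sequence and pdb_order are made up for illustration, and the 
residue list is the hypothetical 10+10+2+2 example from this message. It 
assumes a chain break ('/') is written wherever the chain ID changes, as in 
the sequences shown above. Note that 5:A through 10:A inclusive is six 
chain A residues.

```python
# Toy model of reading a PDB file sequentially between two residues.
# Not Modeller code; all names here are hypothetical.

def read_range(residues, start, end):
    """Return the residues from `start` through `end` (inclusive),
    in file order. Each residue is a (label, code) pair such as
    ("5:A", "a")."""
    selected = []
    reading = False
    for label, code in residues:
        if label == start:
            reading = True
        if reading:
            selected.append((label, code))
            if label == end:
                break
    return selected


def to_sequence(residues):
    """Join residue codes, writing '/' wherever the chain ID changes."""
    parts = []
    previous_chain = None
    for label, code in residues:
        chain = label.split(":")[1]
        if previous_chain is not None and chain != previous_chain:
            parts.append("/")
        parts.append(code)
        previous_chain = chain
    return "".join(parts)


# File order: chain A amino acids, chain B amino acids, then the ligands.
pdb_order = (
    [(f"{i}:A", "a") for i in range(1, 11)]
    + [(f"{i}:B", "b") for i in range(1, 11)]
    + [("11:A", "A"), ("12:A", "A"), ("11:B", "B"), ("12:B", "B")]
)

print(to_sequence(read_range(pdb_order, "1:A", "12:B")))
# aaaaaaaaaa/bbbbbbbbbb/AA/BB
print(to_sequence(read_range(pdb_order, "5:A", "11:A")))
# aaaaaa/bbbbbbbbbb/A
```

The point of the sketch is that the start and end labels only mark where 
reading begins and stops; nothing in between is skipped, whatever chain it 
belongs to.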
> I first tried to include all hetatms as '.' with the unspecified
> residues past the end of the protein (aa 348) replaced with '-'.
>
> >P1;B._taurus
> structureY:B._taurus:1    :A:  977:A:ground state rhodopsin:Bovine: :
> MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLA
> VADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFT
> WVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAA----S
> ATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTT
> LCCGKNP------STTVSKTETSQVAPA----------------------------------------------------
> --------------------------------------------------------------------------------
> --------------------..----------------------------------------------------------
> --------------------------------------------------------------------------------
> ------------------------------------------------------------..------------------
> --------------------------------------------------------------------------------
> --------------------------------------------------------------------------------
> --------------------.-.-.----.--------------------------------------------------
> ----------------.*
You have a typo here - 'structureY' should be 'structureX'. More importantly, 
you are asking Modeller to read residues 1:A through 977:A, so it'll read all 
of the A chain, then all of the B chain (since it hasn't reached 977:A 
yet), then all the HETATM residues until it gets to 977:A (a bunch of 
NAG, HG, and ZN residues).
> It results in this error:
>
> _modeller.ModellerError: read_te_290E>  Number of residues in the
> alignment and  pdb files are different:      347      650 For
> alignment entry:        1  B._taurus
Remember, Modeller needs to match the alignment sequence against the PDB 
sequence, residue for residue. So the B._taurus alignment entry must list 
every residue Modeller reads between 1:A and 977:A, including the full 
sequence of the B chain - you can't have any "unspecified residues". (So your 
solution above is close, but you need the gaps in the target sequence, 
not the template structure.)
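
The two numbers in the error message are residue counts, so a quick way to 
check an alignment entry is to count its residues yourself. The helper below 
is a plain-Python sketch (count_alignment_residues is a made-up name, not 
part of Modeller). It assumes that gaps ('-'), chain breaks ('/') and the 
terminating '*' are not residues, and that every other character, including 
'.', counts as one. Counting the entry quoted above this way gives 347 
(338 amino acid letters plus 9 '.' characters), which matches the first 
number in the error message.

```python
# Count the residues in the sequence part of a PIR alignment entry.
# Hypothetical helper for checking an entry by hand; not Modeller code.

def count_alignment_residues(sequence_block):
    """Count characters that stand for residues.

    Gaps ('-'), chain breaks ('/'), the terminating '*' and any
    whitespace are skipped; everything else, including '.', counts.
    """
    non_residue = set("-/*")
    return sum(
        1
        for ch in sequence_block
        if not ch.isspace() and ch not in non_residue
    )


# Toy template entry that gaps out a whole chain: six residues,
# ten gaps, a chain break and one ligand.
toy_entry = """
aaaaaa----------/.
--*
"""

print(count_alignment_residues(toy_entry))  # 7
```

If this count is smaller than the number of residues between the start and 
end residues in the PDB file, the template entry is missing residues that 
Modeller will read.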
Alternatively, you can edit the PDB file in a text editor and make a 
modified PDB file that only contains the residues you're interested in, 
if you don't want to specify the whole B sequence in your alignment.
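
If you would rather script that edit than do it by hand, something along 
these lines may work. It is a minimal sketch in plain Python, not a Modeller 
feature: filter_pdb_lines and make_line are hypothetical helpers, the sample 
records are invented, and the code relies only on the standard fixed PDB 
columns (record name in columns 1-6, chain ID in column 22, residue number 
in columns 23-26). It keeps one chain's ATOM records plus the HETATM 
residues you list for that chain, drops everything else, and ignores 
insertion codes.

```python
# Trim a PDB file to one chain plus selected HETATM residues.
# Hypothetical helpers for illustration; not part of Modeller.

def filter_pdb_lines(lines, keep_chain, keep_hetatm_resnums):
    """Keep ATOM records of `keep_chain`, and HETATM records of that
    chain whose residue number is in `keep_hetatm_resnums`."""
    kept = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # drop headers, TER records, etc.
        chain = line[21]
        resnum = int(line[22:26])
        if chain != keep_chain:
            continue
        if record == "HETATM" and resnum not in keep_hetatm_resnums:
            continue
        kept.append(line)
    kept.append("END")
    return kept


def make_line(record, serial, name, resname, chain, resnum):
    """Build a toy coordinate record with the fields in the standard
    columns (atom name alignment is simplified; coordinates are zero)."""
    return (
        f"{record:<6}{serial:>5} {name:<4} {resname:>3} "
        f"{chain}{resnum:>4}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


sample = [
    "HEADER    TOY EXAMPLE",
    make_line("ATOM", 1, "CA", "ALA", "A", 1),
    make_line("ATOM", 2, "CA", "GLY", "A", 2),
    make_line("ATOM", 3, "CA", "SER", "B", 1),
    make_line("HETATM", 4, "C1", "NAG", "A", 11),
    make_line("HETATM", 5, "ZN", "ZN", "A", 12),
    make_line("HETATM", 6, "C1", "NAG", "B", 11),
]

filtered = filter_pdb_lines(sample, keep_chain="A", keep_hetatm_resnums={11})
for line in filtered:
    print(line)
```

For a real file you would read the lines from your own PDB file and write 
the result to a new file, then point the alignment entry at that new file. 
The start and end residues in the alignment header, and the sequence itself, 
then have to describe the trimmed file rather than the original.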
> I've seen the following thread and tried to follow the advice
> enclosed, but I am still having trouble.
> http://salilab.org/archives/modeller_usage/2008/msg00183.html
> http://salilab.org/archives/modeller_usage/2008/msg00186.html
The problem in that thread is different - they had the wrong ending 
residue number in their alignment file header, so Modeller simply 
stopped reading the PDB file before it reached the ligands.
Ben Webb, Modeller Caretaker
-- 
modeller-care@salilab.org             http://www.salilab.org/modeller/
Modeller mail list: http://salilab.org/mailman/listinfo/modeller_usage