Re: [modeller_usage] Including hetatm when hetatm does not immediately follow ATOM list
To: Shuo Huai Johnny Wu <>
Subject: Re: [modeller_usage] Including hetatm when hetatm does not immediately follow ATOM list
From: Modeller Caretaker <>
Date: Fri, 04 Sep 2009 15:23:34 -0700
On 09/02/2009 01:54 PM, Shuo Huai Johnny Wu wrote:
I used the simple automodel script, and it worked well. When I tried
to include both the ligands and the carbohydrates (hetatms) into the
structure by setting env.io.hetatm = True, I get errors that do not
make much sense to me. My structure pdb is different from the hetatm
example pdb in that the ligands' hetatm residue numbers do not
immediately follow the atom residue sequence.
...
The way Modeller handles this situation is very simple and predictable.
(Admittedly it would be nice if it were more complex and did the "right
thing", but then it would be less predictable. ;)
When Modeller reads a PDB file, it reads it sequentially, from beginning
to end. Each residue (ATOM or HETATM or water, if you have env.io.hetatm
and/or env.io.water turned on) is read in, in exactly the order given in
the PDB file. So if your PDB file contains 10 amino acid residues in
chain A, then 10 more in chain B, then two ligands in chain A then two
ligands in chain B, Modeller will read the following sequence from the
PDB file, where a and b are amino acids and A and B ligands:
aaaaaaaaaa/bbbbbbbbbb/AA/BB
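To make the reading order concrete, here is a minimal Python sketch of the
toy example above. It is not Modeller's implementation; the residue tuples
and the function name sequence_in_file_order are made up for illustration,
and it assumes a chain break ('/') is written whenever the chain ID changes
between consecutive residues.

```python
# Toy model of sequential PDB reading; NOT Modeller's actual code.
# Each tuple is (chain_id, residue_number, one_letter_code), in file order.
residues = (
    [("A", i, "a") for i in range(1, 11)]      # 10 amino acids, chain A
    + [("B", i, "b") for i in range(1, 11)]    # 10 amino acids, chain B
    + [("A", i, "A") for i in (11, 12)]        # 2 ligands labeled chain A
    + [("B", i, "B") for i in (11, 12)]        # 2 ligands labeled chain B
)


def sequence_in_file_order(residues):
    """Join residue codes in file order, adding '/' when the chain ID changes."""
    out = []
    prev_chain = None
    for chain, _num, code in residues:
        # Assumption: a chain break is emitted on every change of chain ID.
        if prev_chain is not None and chain != prev_chain:
            out.append("/")
        out.append(code)
        prev_chain = chain
    return "".join(out)


print(sequence_in_file_order(residues))  # aaaaaaaaaa/bbbbbbbbbb/AA/BB
```

The point is that nothing is reordered or grouped by chain: the ligands end
up after both protein chains simply because that is where they sit in the
file.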
When you read an alignment that contains a structure, Modeller needs to
read the PDB file and match its sequence against the alignment entry. The
two sequences must match exactly. Because you often only want a subset of
the PDB, the alignment
file header can specify the first residue and chain to start reading at,
and the residue and chain to finish at. So if in the example above your
A chain amino acids are numbered 1 through 10, the B chain also 1
through 10, and the four ligands are labeled 11:A, 12:A, 11:B, and 12:B
and you tell Modeller to read from 5:A to 11:A, you will get (remember
that it reads the PDB file sequentially):
aaaaaa/bbbbbbbbbb/A
i.e. the sequence of residues starting at 5:A and ending at 11:A. Note
that since the entire B chain lies between 10:A and 11:A, it'll also
read that.
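The range behavior can be sketched in a few lines of Python. Again this is
only a toy model under stated assumptions, not Modeller's code: the names
toy_pdb and read_range are hypothetical, both endpoints are taken as
inclusive, and a '/' is written on every change of chain ID. With inclusive
endpoints the toy model returns six chain A amino acids (5 through 10), all
ten B chain amino acids, and the single ligand 11:A.

```python
# Toy model of reading a residue range from a PDB file; NOT Modeller's code.
# Each tuple is (chain_id, residue_number, one_letter_code), in file order.
toy_pdb = (
    [("A", i, "a") for i in range(1, 11)]      # chain A amino acids 1-10
    + [("B", i, "b") for i in range(1, 11)]    # chain B amino acids 1-10
    + [("A", i, "A") for i in (11, 12)]        # ligands 11:A and 12:A
    + [("B", i, "B") for i in (11, 12)]        # ligands 11:B and 12:B
)


def read_range(residues, start, end):
    """Return the sequence from `start` to `end` (inclusive), in file order.

    `start` and `end` are (residue_number, chain_id) pairs, e.g. (5, "A").
    """
    out = []
    reading = False
    prev_chain = None
    for chain, num, code in residues:
        if not reading and (num, chain) == start:
            reading = True                      # first residue of the range
        if reading:
            if prev_chain is not None and chain != prev_chain:
                out.append("/")                 # chain ID changed
            out.append(code)
            prev_chain = chain
            if (num, chain) == end:
                break                           # last residue of the range
    return "".join(out)


print(read_range(toy_pdb, (5, "A"), (11, "A")))  # aaaaaa/bbbbbbbbbb/A
```

Everything that sits between the start and end residues in the file is
included, whatever its chain ID.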
I first tried to include all hetatms as '.' with the unspecified
residues past the end of the protein (aa 348) replaced with '-'.
You have a typo here - 'structureY' should be 'structureX'. But you are
asking Modeller to read residues 1:A through 977:A, so it'll read all of
the A chain, then all of the B chain (since it hasn't reached 977:A
yet), then all the HETATM residues until it gets to 977:A (a bunch of
NAG, HG, and ZN residues).
It results in this error:
_modeller.ModellerError: read_te_290E> Number of residues in the
alignment and pdb files are different: 347 650 For
alignment entry: 1 B._taurus
Remember, Modeller needs to match the alignment sequence against the PDB
sequence. So you need the full sequence of the B chain in your B._taurus
alignment entry - you can't have any "unspecified residues". (So your
solution above is close, but you need the gaps in the target sequence,
not the template structure.)
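A quick way to see why the counts disagree is to count the residues in the
alignment entry yourself and compare against what the PDB range contains.
The sketch below is a hypothetical helper, not part of Modeller; it assumes
that '-' (gap), '/' (chain break) and '*' (end of sequence) are not residues
and that everything else, including '.' for a HETATM residue, is. The
sequences are the toy ones from the 5:A to 11:A example, not your real ones.

```python
# Hypothetical sanity check; NOT part of Modeller.
def count_alignment_residues(aligned_seq):
    """Count residues in one alignment row.

    Assumption: '-', '/', '*' and whitespace are not residues; every other
    character (including '.' for a HETATM residue) counts as one residue.
    """
    return sum(1 for ch in aligned_seq if ch not in "-/*" and not ch.isspace())


def check_counts(aligned_seq, n_pdb_residues):
    """Raise ValueError if the alignment row and the PDB range disagree."""
    n_aln = count_alignment_residues(aligned_seq)
    if n_aln != n_pdb_residues:
        raise ValueError(
            f"alignment has {n_aln} residues, PDB range has {n_pdb_residues}"
        )
    return n_aln


# Toy PDB range 5:A to 11:A holds 6 + 10 + 1 = 17 residues.
n_pdb = 17

# Template row with the B chain replaced by gaps: only 7 residues, mismatch.
print(count_alignment_residues("aaaaaa/----------/.*"))

# Template row with the full B chain spelled out: 17 residues, matches.
print(check_counts("aaaaaa/bbbbbbbbbb/.*", n_pdb))
```

If the template row comes up short, the missing residues have to be added
to the template row, with the gaps going in the target row instead.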
Alternatively, you can edit the PDB file in a text editor and make a
modified PDB file that only contains the residues you're interested in,
if you don't want to specify the whole B sequence in your alignment.
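If a text editor is too tedious, the same trimming can be scripted. Below
is a minimal sketch in plain Python (no Modeller calls); filter_pdb_lines,
make_toy_line and the keep rule are hypothetical names chosen for this
example. It relies only on the fixed PDB columns: record name in columns
1-6, chain ID in column 22, residue number in columns 23-26. It drops every
line that is not a selected ATOM or HETATM record, and it ignores insertion
codes, so check the output before using it.

```python
# Minimal sketch for trimming a PDB file; hypothetical helper names.
def filter_pdb_lines(lines, keep):
    """Return the ATOM/HETATM lines for which keep(record, chain, resnum) is True.

    All other record types are dropped in this sketch.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()               # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue
        chain = line[21]                        # column 22: chain ID
        resnum = int(line[22:26])               # columns 23-26: residue number
        if keep(record, chain, resnum):
            kept.append(line)
    return kept


def make_toy_line(record, serial, name, resname, chain, resnum):
    """Build a truncated toy PDB line (no coordinates), for demonstration only."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}"


toy_lines = [
    make_toy_line("ATOM", 1, " CA", "ALA", "A", 1),
    make_toy_line("ATOM", 2, " CA", "GLY", "B", 1),
    make_toy_line("HETATM", 3, "ZN", "ZN", "A", 11),
    make_toy_line("HETATM", 4, "ZN", "ZN", "B", 11),
]

# Example rule: keep chain A only, protein and ligands alike.
chain_a_only = filter_pdb_lines(toy_lines, lambda rec, chain, num: chain == "A")
for line in chain_a_only:
    print(line)
```

For a real file you would read the lines from the original PDB, pass them
through the filter, and write the result to a new file that the alignment
header then points at.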
The problem in that thread is different - they had the wrong ending
residue number in their alignment file header, so Modeller simply
stopped reading the PDB file before it reached the ligands.