
Re: [modeller_usage] Including hetatm when hetatm does not immediately follow ATOM list

On 09/02/2009 01:54 PM, Shuo Huai Johnny Wu wrote:
> I used the simple automodel script, and it worked well. When I tried
> to include both the ligands and the carbohydrates (hetatms) into the
> structure by setting env.io.hetatm = True, I get errors that do not
> make much sense to me. My structure pdb is different from the hetatm
> example pdb in that the ligands' hetatm residue numbers do not
> immediately follow the atom residue sequence.

The way Modeller handles this situation is very simple and predictable. (Admittedly it would be nice if it were more complex and did the "right thing", but then it would be less predictable. ;)

When Modeller reads a PDB file, it reads it sequentially, from beginning to end. Each residue (ATOM or HETATM or water, if you have env.io.hetatm and/or env.io.water turned on) is read in, in exactly the order given in the PDB file. So if your PDB file contains 10 amino acid residues in chain A, then 10 more in chain B, then two ligands in chain A, then two ligands in chain B, Modeller will read the following sequence from the PDB file, where a and b are amino acids and A and B are ligands:

aaaaaaaaaabbbbbbbbbbAABB
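To make the reading order concrete, here is a small Python sketch of the behaviour described above. It is not Modeller code: build_example_pdb and read_sequence are hypothetical helpers, and each residue is reduced to a (number, chain, one-letter code) tuple kept in file order.

```python
from typing import List, Tuple

Residue = Tuple[str, str, str]  # (residue number, chain ID, one-letter code)


def build_example_pdb() -> List[Residue]:
    """Hypothetical file: 10 amino acids in chain A, 10 in chain B,
    then two ligands in chain A and two ligands in chain B."""
    residues: List[Residue] = []
    residues += [(str(i), "A", "a") for i in range(1, 11)]
    residues += [(str(i), "B", "b") for i in range(1, 11)]
    residues += [("11", "A", "A"), ("12", "A", "A")]
    residues += [("11", "B", "B"), ("12", "B", "B")]
    return residues


def read_sequence(residues: List[Residue]) -> str:
    """Concatenate residue codes in file order; nothing is regrouped by chain."""
    return "".join(code for _, _, code in residues)


print(read_sequence(build_example_pdb()))  # aaaaaaaaaabbbbbbbbbbAABB
```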

When you read an alignment that contains a structure, Modeller needs to read the PDB file to match the sequence, and the two must match exactly. Because you often want only a subset of the PDB file, the alignment file header can specify the residue and chain to start reading at, and the residue and chain to finish at. So if in the example above your A chain amino acids are numbered 1 through 10, the B chain also 1 through 10, and the four ligands are labeled 11:A, 12:A, 11:B, and 12:B, and you tell Modeller to read from 5:A to 11:A, you will get (remember that it reads the PDB file sequentially):

aaaaaabbbbbbbbbbA

i.e. the sequence of residues starting at 5:A and ending at 11:A. Note that since the entire B chain lies between 10:A and 11:A in the file, it will be read as well.
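The same idea extends to the start and end residues in the alignment header. The sketch below is hypothetical (read_segment is not Modeller's implementation): it finds the start residue, then the first occurrence of the end residue after it, and returns everything in between in file order, whatever chain it belongs to.

```python
from typing import List, Tuple

Residue = Tuple[str, str, str]  # (residue number, chain ID, one-letter code)

# Hypothetical file order from the example: A-chain amino acids 1-10,
# B-chain amino acids 1-10, then ligands 11:A, 12:A, 11:B, 12:B.
PDB_ORDER: List[Residue] = (
    [(str(i), "A", "a") for i in range(1, 11)]
    + [(str(i), "B", "b") for i in range(1, 11)]
    + [("11", "A", "A"), ("12", "A", "A"), ("11", "B", "B"), ("12", "B", "B")]
)


def read_segment(residues: List[Residue], start: Tuple[str, str],
                 end: Tuple[str, str]) -> str:
    """Return residue codes from `start` to `end` inclusive, in file order.

    `start` and `end` are (residue number, chain ID) pairs, as in the
    alignment header. Anything lying between them in the file is included.
    """
    ids = [(num, chain) for num, chain, _ in residues]
    first = ids.index(start)      # raises ValueError if the start residue is absent
    last = ids.index(end, first)  # first match of the end residue at or after the start
    return "".join(code for _, _, code in residues[first:last + 1])


print(read_segment(PDB_ORDER, ("5", "A"), ("11", "A")))  # aaaaaabbbbbbbbbbA
```

Changing the end residue to ("10", "A") in this sketch returns only the six A-chain amino acids, which is how an ending residue that comes too early can silently leave out the ligands.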

> I first tried to include all hetatms as '.' with the unspecified
> residues past the end of the protein (aa 348) replaced with '-'.
>
> structureY:B._taurus:1    :A:  977:A:ground state rhodopsin:Bovine: :

You have a typo here - 'structureY' should be 'structureX'. But you are asking Modeller to read residues 1:A through 977:A, so it'll read all of the A chain, then all of the B chain (since it hasn't reached 977:A yet), then all the HETATM residues until it gets to 977:A (a bunch of NAG, HG, and ZN residues).
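For reference, the fields that matter here are the first six of the colon-separated header. A minimal parsing sketch, assuming only the standard PIR field order (entry type, code, start residue, start chain, end residue, end chain); the PirHeader and parse_header names are made up for illustration:

```python
from typing import NamedTuple


class PirHeader(NamedTuple):
    """First six fields of a PIR-format structure header (the rest are ignored here)."""
    kind: str          # entry type, e.g. "structureX" for an X-ray template
    code: str          # entry code ("B._taurus" in this case)
    start_res: str     # first residue number to read
    start_chain: str   # chain of the first residue
    end_res: str       # last residue number to read
    end_chain: str     # chain of the last residue


def parse_header(line: str) -> PirHeader:
    """Split on ':' and strip the padding Modeller allows around each field."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 ':'-separated fields, got {len(fields)}")
    return PirHeader(*fields[:6])


header = parse_header("structureY:B._taurus:1    :A:  977:A:ground state rhodopsin:Bovine: :")
print(header.kind)  # structureY (the typo; it should be structureX)
print(f"read from {header.start_res}:{header.start_chain} "
      f"to {header.end_res}:{header.end_chain}")  # read from 1:A to 977:A
```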

> It results in this error:
>
> _modeller.ModellerError: read_te_290E>  Number of residues in the
> alignment and  pdb files are different:      347      650 For
> alignment entry:        1  B._taurus

Remember, Modeller needs to match the alignment sequence against the PDB sequence. So you need the full sequence of the B chain in your B._taurus alignment entry - you can't have any "unspecified residues". (So your solution above is close, but you need the gaps in the target sequence, not the template structure.)
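A rough way to picture the check that fails here, under the assumption that '-' (gap), '/' (chain break) and '*' (end of sequence) are the only non-residue characters and that each '.' stands for one HETATM residue. template_residues and matches_pdb are hypothetical helpers, not Modeller's own read routine:

```python
# Characters in an alignment line that are not residues (assumed here):
# '-' gap, '/' chain break, '*' end of sequence. '.' counts as one HETATM residue.
NON_RESIDUES = str.maketrans("", "", "-/*")


def template_residues(alignment_seq: str) -> str:
    """Return only the residue characters of a template's alignment line."""
    return alignment_seq.translate(NON_RESIDUES)


def matches_pdb(alignment_seq: str, pdb_seq: str) -> bool:
    """True only if the template residues equal the PDB segment, one for one."""
    return template_residues(alignment_seq) == pdb_seq


# Hypothetical PDB segment: 4 A-chain residues, 4 B-chain residues, 2 HETATMs.
pdb_seq = "AGLV" + "KTSW" + ".."

bad_template = "AGLV/----..*"   # B chain replaced by gaps in the template
good_template = "AGLV/KTSW..*"  # full template sequence, as read from the PDB

print(len(template_residues(bad_template)), len(pdb_seq))  # 6 10
print(matches_pdb(bad_template, pdb_seq), matches_pdb(good_template, pdb_seq))  # False True
```

With the B chain blanked out in the template, 6 residues are compared against 10, the same kind of mismatch as the 347 vs 650 in the error above. Writing "AGLV/----..*" on the target line instead, and keeping the template complete, avoids it.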

Alternatively, you can edit the PDB file in a text editor and make a modified PDB file that only contains the residues you're interested in, if you don't want to specify the whole B sequence in your alignment.
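If you go the text-editor route, the same edit can be scripted. This sketch assumes standard fixed-column PDB records (record name in columns 1-6, chain ID in column 22, residue number in columns 23-26) and ignores insertion codes; filter_pdb, demo_record and the sample records are hypothetical:

```python
from typing import Iterable, List, Set, Tuple


def filter_pdb(lines: Iterable[str], keep: Set[Tuple[str, str]]) -> List[str]:
    """Keep ATOM/HETATM records whose (chain ID, residue number) is in `keep`.

    Column positions are the standard fixed PDB columns (1-based): record
    name 1-6, chain ID 22, residue number 23-26. Insertion codes (column 27)
    are ignored. All other records (HEADER, TER, END, ...) pass through unchanged.
    """
    out: List[str] = []
    for line in lines:
        if line[:6].strip() in ("ATOM", "HETATM"):
            chain = line[21]
            resnum = line[22:26].strip()
            if (chain, resnum) not in keep:
                continue  # drop residues outside the selection
        out.append(line)
    return out


def demo_record(record: str, serial: int, name: str, resname: str,
                chain: str, resnum: int) -> str:
    """Build a minimal fixed-column record for the demo (no coordinates)."""
    return f"{record:<6}{serial:>5} {name:4} {resname:>3} {chain}{resnum:>4}"


sample = [
    demo_record("ATOM", 1, " CA ", "ALA", "A", 1),
    demo_record("ATOM", 2, " CA ", "GLY", "B", 1),
    demo_record("HETATM", 3, " C1 ", "NAG", "A", 11),
    demo_record("HETATM", 4, "ZN  ", "ZN", "A", 977),
    "END",
]

for line in filter_pdb(sample, {("A", "1"), ("A", "11")}):
    print(line)
```

For a real file you would read the lines with open(...).read().splitlines(), pass the (chain, residue number) pairs you want to keep, and write the result to a new PDB file. TER records are passed through as-is, so they may need tidying by hand.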

> I've seen the following thread and tried to follow the advice
> enclosed, but I am still having trouble.

The problem in that thread is different - they had the wrong ending residue number in their alignment file header, so Modeller simply stopped reading the PDB file before it reached the ligands.

	Ben Webb, Modeller Caretaker
Modeller mail list: http://salilab.org/mailman/listinfo/modeller_usage