
Re: [modeller_usage] more sequence difference issues



Hi Irene,

On Thu, 18 Nov 2010 10:54:07 -1000 Irene Newhouse wrote:

> In the interests of appearing to make progress, I dropped the alignment of
> which I wrote an hour or so ago from the multiple alignments considered.
> Now I have another sequence difference issue with 1x06.pdb. This is how it
> aligns with my sequence [clustal w server pir output, with line 2 edited by
> hand]:
> 
> >P1;1x06
> structureX:1x06:12:A:240:A:UPP       :Erischeria coli   : :
> -------------------------------------MMLSATQPLSEKLPAHGCR-HVAIIMDGNGRWAKKQGK-IRAFGHKAGAKSVRRAVSFAANNGIEALTLYAFSSENWNRPAQEVSALMELFVWALD---SEVKSLHRHNVRLRIIGDTSRFNSRLQERIRKSEALTAGNTGLTLNIAANYGGRWDIVQGVRQLAEKVQQ----GNLQPDQIDEEMLNQHVCMHELA------------------PVDLVIRTGGEHRISNFLLWQIAYAELYFTDVLWPDFDEQDFEGALNAFANRERRFGGTEPGDETA
> 
> I checked it myself against the original fasta sequence deposited in the pdb
> & can't find any differences. The only missing residues are those below 12
> & above 240. I scrolled through the pdb file & can't find any residues
> missing that might not have been noted in the header. Is there a way to get
> more detailed information on the problem region, or a tool to check fasta
> sequences against a pdb file?
> 
> Thanks!
> Irene
> 
> The relevant end of the error log is:
> 
> Dynamically allocated memory at amaxstructure [B,KiB,MiB]:      3879167    3788.249     3.699
> read_te_291E> Sequence difference between alignment and  pdb :
>                   x  (mismatch at alignment position      1)
>  Alignment   MMLSATQPLSEKLPAHGCRHVAIIMDGNGRWAKKQGKIRAFGHKAGAKSVRRAVSF
>        PDB   KLPAHGCRHVAIIMDGNGRWAKKQGKIRAFGHKAGAKSVRRAVSFAANNGIEALTL
>      Match                          *             *   *         *     *
>   Alignment residue type   11 (M, MET) does not match pdb
>   residue type    9 (K, LYS),
>   for align code 1x06 (atom file 1x06), pdb residue number "12", chain "A"
> 
>   Please check your alignment file header to be sure you correctly specified
>   the starting and ending residue numbers and chains. The alignment sequence
>   must match that from the atom file exactly.
> 
>   Another possibility is that some residues in the atom file are missing,
>   perhaps because they could not be resolved experimentally. (Note that Modeller
>   reads only the ATOM and HETATM records in PDB, NOT the SEQRES records.)
>   In this case, simply replace the section of your alignment corresponding
>   to these missing residues with gaps.
> read_te_288W> Protein not accepted:        3  1x06

If you look carefully at the PDB file 1x06, you'll note that while the SEQRES
entry shows the sequence you have used, the structure does not start until
the sequence "KLPAHGC".  Note the section of the PDB file with the
"REMARK 465" lines headed "MISSING RESIDUES".  The residues before the Lys
were undoubtedly not visible in the crystal structure even though they may
have been present in the protein that was crystallized.  
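
To list those unresolved residues without scrolling through the file, something like the following Python sketch should work. It is only an illustration, not part of Modeller or of my scripts: the function name missing_residues is made up here, and it assumes the usual crystal-structure REMARK 465 layout, where a "M RES C SSSEQI" header line is followed by one residue per line.

```python
def missing_residues(pdb_lines):
    """Return (chain, residue_number, residue_name) tuples from REMARK 465.

    Assumes the X-ray style table that follows the "M RES C SSSEQI"
    header line; NMR entries use a slightly different header.
    """
    missing = []
    in_table = False
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if fields[:3] == ["M", "RES", "C"]:
            # Header row of the table; residue rows follow.
            in_table = True
            continue
        if in_table and len(fields) >= 3:
            # The last three fields are residue name, chain, number
            # (an optional model number may come first).
            res_name, chain, res_num = fields[-3:]
            missing.append((chain, res_num, res_name))
    return missing


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for chain, res_num, res_name in missing_residues(handle):
            print(chain, res_num, res_name)
```

Run it with the PDB file name as the only argument. Every residue it prints has no coordinates, so it needs to be removed from the alignment sequence or replaced by a gap, as the Modeller error message suggests.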

No doubt there are other programs that can do this, but I have a little Python
script that extracts the sequence from the coordinates.  It can be found at my
website:    http://pldserver1.biochem.queensu.ca/~rlc/work/scripts/

It is called "seq_convert.py" (and it also needs my MyPDB.py script
for reading PDB files).
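
If you just want the idea without downloading anything, here is a minimal stand-alone sketch. It is not seq_convert.py and does not use MyPDB.py; the function name atom_sequence is made up for this example. It assumes standard fixed-column PDB ATOM records, reads only the first model, ignores HETATM records, and prints X for any residue name outside the 20 standard amino acids.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_sequence(pdb_lines, chain_id):
    """Return (first_residue_number, sequence) for one chain.

    Only ATOM records are used, so residues without coordinates
    never show up in the result.
    """
    sequence = []
    first_number = None
    last_key = None
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # first model only
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:
            continue
        key = line[22:27]  # residue number plus insertion code
        if key == last_key:
            continue  # another atom of the residue already counted
        last_key = key
        if first_number is None:
            first_number = key.strip()
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return first_number, "".join(sequence)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        start, seq = atom_sequence(handle, sys.argv[2])
    print("first residue:", start)
    print(seq)
```

Call it with the PDB file name and the chain identifier as arguments. The printed sequence is what the non-gap part of the structure's line in the alignment has to match, and the first residue number is what belongs in the start field of the PIR header.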

Cheers,
Rob
-- 
Robert L. Campbell, Ph.D.
Senior Research Associate/Adjunct Assistant Professor 
Botterell Hall Rm 644
Department of Biochemistry, Queen's University, 
Kingston, ON K7L 3N6  Canada
Tel: 613-533-6821            Fax: 613-533-2497
http://pldserver1.biochem.queensu.ca/~rlc