Hi Irene,
On Thu, 18 Nov 2010 10:54:07 -1000 Irene Newhouse irenenew@hawaii.edu wrote:
> In the interests of appearing to make progress, I dropped the alignment of
> which I wrote an hour or so ago from the mulitple alignments considered.
> Now I have another sequence difference issue with 1x06.pdb. This is how it
> aligns with my sequence [clustal w server pir output, with line 2 edited by
> hand]
>
> >P1;1x06
> structureX:1x06:12:A:240:A:UPP :Erischeria coli : :
> -------------------------------------MMLSATQPLSEKLPAHGCR-HVAIIMDGNGRWAKKQGK-IRAFGHKAGAKSVRRAVSFAANNGIEALTLYAFSSENWNRPAQEVSALMELFVWALD---SEVKSLHRHNVRLRIIGDTSRFNSRLQERIRKSEALTAGNTGLTLNIAANYGGRWDIVQGVRQLAEKVQQ----GNLQPDQIDEEMLNQHVCMHELA------------------PVDLVIRTGGEHRISNFLLWQIAYAELYFTDVLWPDFDEQDFEGALNAFANRERRFGGTEPGDETA
>
> I checked it myself against the original fasta sequence deposited in the pdb
> & can't find any differences. The only missing residues are those below 12
> & above 240. I scrolled through the pdb file & can't find any residues
> missing that might not have been noted in the header. Is there a way to get
> more detailed information on the problem region, or a tool to check fasta
> sequences against a pdb file?
>
> Thanks!
> Irene
>
> The relevant end of the error log is:
>
> Dynamically allocated memory at amaxstructure [B,KiB,MiB]: 3879167 3788.249 3.699
> read_te_291E> Sequence difference between alignment and pdb :
>               x (mismatch at alignment position 1)
> Alignment MMLSATQPLSEKLPAHGCRHVAIIMDGNGRWAKKQGKIRAFGHKAGAKSVRRAVSF
> PDB       KLPAHGCRHVAIIMDGNGRWAKKQGKIRAFGHKAGAKSVRRAVSFAANNGIEALTL
> Match     * * * * *
> Alignment residue type 11 (M, MET) does not match pdb residue type 9
> (K, LYS), for align code 1x06 (atom file 1x06), pdb residue number "12",
> chain "A"
>
> Please check your alignment file header to be sure you correctly
> specified the starting and ending residue numbers and chains. The
> alignment sequence must match that from the atom file exactly.
>
> Another possibility is that some residues in the atom file are missing,
> perhaps because they could not be resolved experimentally.
> (Note that Modeller reads only the ATOM and HETATM records in PDB, NOT
> the SEQRES records.) In this case, simply replace the section of your
> alignment corresponding to these missing residues with gaps.
>
> read_te_288W> Protein not accepted: 3 1x06
If you look carefully at the PDB file 1x06, you'll note that while the SEQRES entry shows the sequence you have used, the structure does not start until the sequence "KLPAHGC". Note the section of the PDB file with the "REMARK 465" lines headed "MISSING RESIDUES". The residues before the Lys were undoubtedly not visible in the crystal structure even though they may have been present in the protein that was crystallized.
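If it helps to see those records without scrolling through the file, here is a minimal Python sketch that collects the REMARK 465 entries. The function name missing_residues is made up for this example, and the parsing assumes the usual layout, in which the data rows follow a "M RES C SSSEQI" column header; older or non-standard files may need different handling.

```python
def missing_residues(pdb_lines):
    """Return (chain, residue number, residue name) tuples from REMARK 465.

    Assumes the standard layout: a 'M RES C SSSEQI' column header
    followed by one row per unobserved residue.
    """
    missing = []
    in_table = False
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if fields[:3] == ["M", "RES", "C"]:
            in_table = True  # column header found; data rows follow
            continue
        if not in_table or len(fields) < 3:
            continue
        # Rows read "[model] RES C SSSEQ[I]"; the model number is optional,
        # so take the last three fields. The number is kept as a string
        # because it may carry an insertion code.
        resname, chain, number = fields[-3:]
        missing.append((chain, number, resname))
    return missing
```

Called as missing_residues(open("1x06.pdb")), it should list the residues that the header reports as not located in the experiment.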
No doubt there are other programs that can do this, but I have a little Python script that pulls the sequence out of the coordinates. It can be found at my website: http://pldserver1.biochem.queensu.ca/~rlc/work/scripts/
It is called "seq_convert.py", and it also needs my MyPDB.py script, which reads the PDB files.
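The idea is simple enough to sketch here. The following is not seq_convert.py itself, only a minimal illustration under a few assumptions: it reads ATOM records only (Modeller also reads HETATM, so modified residues such as MSE would be skipped here), it uses the fixed PDB column positions, it stops at the first model, and the names THREE_TO_ONE and atom_sequence are invented for the example.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_sequence(pdb_lines, chain_id):
    """Return the one-letter sequence actually present in the ATOM records."""
    sequence = []
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only look at the first model
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        key = line[22:27]  # columns 23-27: residue number + insertion code
        if key in seen:
            continue  # further atoms of a residue already counted
        seen.add(key)
        # Columns 18-20 hold the residue name; unknown names become "X".
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```

Running atom_sequence(open("1x06.pdb"), "A") and comparing the result with the structure line of the alignment (with the gap characters removed) should show which residues have to be replaced by gaps.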
Cheers, Rob