
[modeller_usage] Alignment file not recognizing a chain break?



Hello,

I am attempting to model a target sequence from multiple templates. When I check the alignment, I'm getting an error I can't resolve either on my own or by consulting previous entries on the mailing list. It appears that the program is concatenating the end of one chain directly onto the beginning of the other and ignoring a gap. The manual page on alignment.append() says "This command can raise ... a SequenceMismatchError if a 'PIR' sequence does not match that read from PDB (when an empty range is given)" - does this mean that automodel will not run on the alignment? What is the appropriate fix?

I suspect I may also have trouble down the line with matching the chains together, given that 3TNP is formatted as 1-2-1-2 and 3J4Q is formatted as 3-1-1-2-2. Will this be an issue, and if so, what is the workaround?

Thanks,
Elizabeth


Details below:

Parser commands:

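from modeller import *
from modeller.scripts import complete_pdb

env = environ()
# Read HETATM records as well as ATOM records from the template atom files
env.io.hetatm = True

# Load every entry from the PIR alignment file
aln = alignment(env)
aln.append(file='hybrid-msa.ali', align_codes='all', remove_gaps=False)
# Run Modeller's alignment checks; this is the call that raises the error below
aln.check()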

Error message:

read_te_291E> Sequence difference between alignment and  pdb :
                  x  (mismatch at alignment position    338)
 Alignment   EFTEFRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRG
       PDB   EFTEFINRFTRRASVCAEAYNPDRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDA
     Match   ***** **            * *        *                  *
  Alignment residue type   15 (R, ARG) does not match pdb
  residue type    8 (I, ILE),
  for align code 3TNP (atom file 3TNP), pdb residue number "104", chain "B"

  Please check your alignment file header to be sure you correctly specified
  the starting and ending residue numbers and chains. The alignment sequence
  must match that from the atom file exactly.

  Another possibility is that some residues in the atom file are missing,
  perhaps because they could not be resolved experimentally. (Note that Modeller
  reads only the ATOM and HETATM records in PDB, NOT the SEQRES records.)
  In this case, simply replace the section of your alignment corresponding
  to these missing residues with gaps.
read_te_288W> Protein not accepted:        2  3TNP
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 208, in check
    self.check_structure_structure(io=io)
  File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 217, in check_structure_structure
    return f(self.modpt, io.modpt, self.env.libs.modpt, eqvdst)
_modeller.SequenceMismatchError: read_te_291E> Sequence difference between alignment and  pdb :
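
Since the message above notes that Modeller reads only the ATOM and HETATM records, one way to see where the alignment and the atom file diverge is to list the sequence of each chain as it appears in the ATOM records and compare it by eye with the 3TNP entry. The following is a minimal sketch that does not use Modeller at all; the names chain_sequences and THREE_TO_ONE are hypothetical, it assumes standard fixed-column PDB formatting and a single model, and it skips HETATM records (so residues that appear as '.' in the alignment will simply be absent from its output).

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from ATOM records only.

    Assumes fixed-column PDB records and a single model (hypothetical helper).
    """
    residues = {}   # chain id -> list of one-letter codes, in file order
    seen = set()    # (chain id, residue number + insertion code) already counted
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # HETATM, SEQRES, and everything else is ignored here
        resname = line[17:20].strip()
        chain = line[21]
        resid = line[22:27]  # residue number plus insertion code
        if (chain, resid) in seen:
            continue  # another atom of a residue we already recorded
        seen.add((chain, resid))
        # Unknown residue names are reported as 'X' rather than guessed.
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(resname, "X"))
    return {chain: "".join(codes) for chain, codes in residues.items()}


def print_chain_sequences(path):
    """Print each chain's ATOM-record sequence for the PDB file at `path`."""
    with open(path) as handle:
        for chain, seq in chain_sequences(handle).items():
            print(chain, len(seq), seq)
```

Calling print_chain_sequences() on the 3TNP atom file (whatever its local file name is) would show, chain by chain, which residues are actually present, which can then be lined up against the segments between the '/' characters in the alignment.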

hybrid-msa.ali (line breaks introduced for readability, formatting to highlight error location):

>P1;target
sequence:target: FIRST:@ : END:::: :
MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQENERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPM
RSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEEEDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMS
QVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALW
GLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMK
RKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALF
GTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHY
AMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQI
VLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEM
AAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEA
PFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF/MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQE
NERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPMRSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEE
EDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCD
GVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLK
VVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKRKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAI
GTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALFGTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKW
ETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFK
DNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVK
GRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLL
QVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF*

>P1;3TNP
structureX:3TNP: FIRST:@: END:::::
-------------SVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVK
LKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDL
IYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQP
IQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDT
SNFDDYEEEEIRV.INEKCGKEFTEF/------------------------------------------------------
---------------------------------------------------------------------------------
-----RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDN
RGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYN
DGEQIIAQGDLADSFFIVESGEVKITMKV-------------EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDV
QAFERLLGPCMEIMKRN-----------------------/-------------SVKEFLAKAKEDFLKKWETPSQNTAQL
DQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVME
YVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTP
EYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGN
LKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRV.INEKCGKEFTEF/-------------
---------------------------------------------------------------------------------
---------------------------------------------RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKE
GEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVK
NNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKV------------
EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRN-----------------------*

>P1;3J4Q
structureN:3J4Q:FIRST:@:END:::::
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
--------------------------------------------------------------------/--------GLTE
LLQGYTVEVLRQQPPDLVDFAVEYFTRL-----------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
----------------------------------------------/--------GLTELLQGYTVEVLRQQPPDLVDFAV
EYFTRL---------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
------------------------/--------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
--/------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
----*
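
To compare how the chain breaks line up across the three entries above, a small sketch like the following can report, for each entry, the aligned length and the positions of the '/' characters. It is plain Python rather than Modeller code; read_pir, chain_break_positions, and summarize are hypothetical helper names, and the parser assumes the simple layout used in this file (a ">P1;code" line, one header line, then sequence lines ending with '*').

```python
def read_pir(text):
    """Parse PIR text into {code: (header, sequence)} with line breaks removed.

    Hypothetical helper; assumes one header line per entry and a final '*'.
    """
    entries = {}
    code = None
    header = None
    parts = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">P1;"):
            # Start of a new entry: remember its code and reset the buffers.
            code = line[4:]
            header = None
            parts = []
        elif code is not None and header is None:
            if line:
                header = line  # first non-empty line after ">P1;" is the header
        elif code is not None:
            parts.append(line)
            if line.endswith("*"):
                # '*' terminates the entry; drop it from the stored sequence.
                entries[code] = (header, "".join(parts)[:-1])
                code = None
    return entries


def chain_break_positions(sequence):
    """Return 1-based alignment positions of every '/' in the sequence."""
    return [i + 1 for i, ch in enumerate(sequence) if ch == "/"]


def summarize(entries):
    """Return {code: (aligned length, chain break positions)}."""
    return {
        code: (len(seq), chain_break_positions(seq))
        for code, (_header, seq) in entries.items()
    }
```

Running summarize(read_pir(open('hybrid-msa.ali').read())) would show at a glance whether the entries have the same aligned length and the same number of chain breaks, and whether those breaks fall at the same alignment positions.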