Alignment file not recognizing a chain break?
Hello,
I am attempting to model a target sequence from multiple templates. When I check the alignment, I'm getting an error I can't resolve either on my own or by consulting previous entries on the mailing list. It appears that the program is concatenating the end of one chain directly onto the beginning of the other and ignoring a gap. The manual page on alignment.append() says "This command can raise ... a SequenceMismatchError if a 'PIR' sequence does not match that read from PDB (when an empty range is given)" - does this mean that automodel will not run on the alignment? What is the appropriate fix?
I suspect I may also have trouble down the line with matching the chains together, given that 3TNP is formatted as 1-2-1-2 and 3J4Q is formatted as 3-1-1-2-2. Will this be an issue, and if so, what is the workaround?
Thanks, Elizabeth
Details below:
Parser commands:
from modeller import *
from modeller.scripts import complete_pdb

env = environ()
env.io.hetatm = True

aln = alignment(env)
aln.append(file='hybrid-msa.ali', align_codes='all', remove_gaps=False)
aln.check()
Error message:
read_te_291E> Sequence difference between alignment and pdb :
x (mismatch at alignment position 338)
Alignment EFTEFRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRG
PDB EFTEFINRFTRRASVCAEAYNPDRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDA
Match ***** ** * * * *
Alignment residue type 15 (R, ARG) does not match pdb residue type 8 (I, ILE),
for align code 3TNP (atom file 3TNP), pdb residue number "104", chain "B"

Please check your alignment file header to be sure you correctly specified the starting and ending residue numbers and chains. The alignment sequence must match that from the atom file exactly.

Another possibility is that some residues in the atom file are missing, perhaps because they could not be resolved experimentally. (Note that Modeller reads only the ATOM and HETATM records in PDB, NOT the SEQRES records.) In this case, simply replace the section of your alignment corresponding to these missing residues with gaps.
read_te_288W> Protein not accepted: 2 3TNP
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 208, in check
    self.check_structure_structure(io=io)
  File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 217, in check_structure_structure
    return f(self.modpt, io.modpt, self.env.libs.modpt, eqvdst)
_modeller.SequenceMismatchError: read_te_291E> Sequence difference between alignment and pdb :
hybrid-msa.ali (line breaks introduced for readability, formatting to highlight error location):
>P1;target sequence:target: FIRST:@ : END:::: : MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQENERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPM RSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEEEDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMS QVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALW GLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMK RKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALF GTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHY AMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQI VLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEM AAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEA PFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF/MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQE NERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPMRSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEE EDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCD GVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLK VVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKRKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAI GTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALFGTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKW ETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFK DNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVK GRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLL QVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF*
>P1;3TNP structureX:3TNP: FIRST:@: END::::: -------------SVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVK LKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDL IYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQP IQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDT SNFDDYEEEEIRV.INEKCGKEFTEF/------------------------------------------------------ --------------------------------------------------------------------------------- -----RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDN RGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYN DGEQIIAQGDLADSFFIVESGEVKITMKV-------------EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDV QAFERLLGPCMEIMKRN-----------------------/-------------SVKEFLAKAKEDFLKKWETPSQNTAQL DQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVME YVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTP EYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGN LKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRV.INEKCGKEFTEF/------------- --------------------------------------------------------------------------------- ---------------------------------------------RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKE GEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVK NNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKV------------EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRN-----------------------*
>P1;3J4Q structureN:3J4Q:FIRST:@:END::::: --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------/--------GLTE LLQGYTVEVLRQQPPDLVDFAVEYFTRL----------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ----------------------------------------------/--------GLTELLQGYTVEVLRQQPPDLVDFAV EYFTRL--------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ------------------------/-------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --/------------------------------------------------------------------------------ --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ----*
On 06/09/2015 08:32 AM, Kolmus, Elizabeth K. wrote:
> I am attempting to model a target sequence from multiple templates. When
> I check the alignment, I'm getting an error I can't resolve either on my
> own or by consulting previous entries on the mailing list. It appears
> that the program is concatenating the end of one chain directly onto the
> beginning of the other and ignoring a gap.
Modeller doesn't do anything special. It reads the PDB file strictly sequentially. Every residue in the PDB file needs to also be in the alignment file, in the same order. Gaps (and chain breaks) are used to align the template sequence with the target - they have no effect on the matching between alignment file and PDB.
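The matching rule described above can be checked outside Modeller before running aln.check(). What follows is a minimal sketch, not Modeller code: the helper names (pdb_sequence, strip_alignment, first_mismatch) are hypothetical, it reads only ATOM records for the 20 standard amino acids (anything else becomes "X", and HETATM records are ignored even though the original script sets env.io.hetatm = True), and it assumes standard fixed-column PDB formatting. It strips gaps and chain breaks from the alignment row, since those play no part in the matching, and reports the first position where the two sequences diverge.

```python
# Hypothetical helpers for illustration only; not part of the Modeller API.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(pdb_text):
    """Return the one-letter sequence of ATOM records, in file order."""
    seq = []
    previous = None
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # Chain ID (column 22) plus residue number and insertion code
        # (columns 23-27) identify a residue.
        residue_id = (line[21], line[22:27])
        if residue_id != previous:
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
            previous = residue_id
    return "".join(seq)


def strip_alignment(aln_seq):
    """Drop gaps, chain breaks, the terminator and whitespace."""
    return "".join(c for c in aln_seq if c not in "-/*" and not c.isspace())


def first_mismatch(aln_seq, pdb_seq):
    """Return the 0-based index of the first differing residue, or None."""
    residues = strip_alignment(aln_seq)
    for i, (a, p) in enumerate(zip(residues, pdb_seq)):
        if a != p:
            return i
    if len(residues) != len(pdb_seq):
        # One sequence is a strict prefix of the other.
        return min(len(residues), len(pdb_seq))
    return None


# Fragments taken from the error output in this thread.
print(first_mismatch("EFTEF/-----RNRLQ", "EFTEFINRFTRRASV"))  # prints 5
```

In the error output quoted earlier, the PDB sequence continues after EFTEF with INRFTRR..., while the 3TNP row of the alignment goes straight to RNRLQ; the sketch flags exactly that position.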
> "This command can raise ... a
> SequenceMismatchError if a 'PIR' sequence does not match that read from
> PDB (when an empty range is given)" - does this mean that automodel will
> not run on the alignment? What is the appropriate fix?
The fix is simple - your alignment sequence must match the primary sequence from the PDB. Looking at 3TNP from PDB, it does not match your alignment.
> I suspect I may also have trouble down the line with matching the chains
> together, given that 3TNP is formatted as 1-2-1-2 and 3J4Q is formatted
> as 3-1-1-2-2. Will this be an issue, and if so, what is the workaround?
Modeller doesn't care what order the chains appear in within your PDB files, as long as you list them in the same order in the alignment file. All the information it uses to build the model is structural. If you want to change the order, edit the input PDB files accordingly.
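If the chain order does need changing, the PDB edit can be scripted rather than done by hand. This is a rough sketch under stated assumptions: reorder_chains is a hypothetical helper, not a Modeller function; the chain IDs in the usage line are placeholders, since the actual IDs in 3TNP and 3J4Q are not given in this thread; and it keeps only ATOM and HETATM records, dropping headers and any chain not named in the requested order. Atom serial numbers are left as they are.

```python
def reorder_chains(pdb_text, order):
    """Return ATOM/HETATM records regrouped so chains follow `order`.

    `order` is an iterable of one-character chain IDs. Each chain is
    followed by a bare TER line, and the output ends with END.
    """
    by_chain = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            # The chain ID sits in column 22 of a fixed-column PDB record.
            by_chain.setdefault(line[21], []).append(line)

    missing = [c for c in order if c not in by_chain]
    if missing:
        raise ValueError("chains not found in PDB text: %s" % ", ".join(missing))

    out = []
    for chain_id in order:
        out.extend(by_chain[chain_id])
        out.append("TER")
    out.append("END")
    return "\n".join(out) + "\n"
```

A call such as reorder_chains(open("input.pdb").read(), "BA") (file name and chain IDs are placeholders) returns text that can be written to a new file and used as the template; the alignment row must then list the chains in that same order.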
Ben Webb, Modeller Caretaker