SequenceMismatchError when adding missing residues in LoopModel()
Dear Ben and Modeller Mailing List,
Hello, I was hoping to get some help on an error I get when trying to add missing residues to a structure using the AutoModel/LoopModel classes.
The error looks like:
x (mismatch at alignment position 9)
Alignment NSFTRVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHNPVLPFNDGVYFASTKSNIIRGWI
PDB       NSFTRGVYYPDKVFRSSVL.STQDLFLPFFSNVTWF.AI.VSGTNGTKRFDNPVLPFNDG
Match     ***** * * *
Alignment residue type 18 (V, VAL) does not match pdb
residue type 6 (G, GLY),
for align code 6z43 (atom file structure_aligned), pdb residue number " 35", chain "A"
When I check my alignment file, I can begin to see what’s going on. Here is the first entry of the alignment file:
>P1;6z43
structureM:structure_aligned:27:A:+3600:C:::-1.00:-1.00
--------------------------AYTNSFTR-VYYPDKVFRSSVLHSTQDLFLPFFS
NVTWFH--------------NPVLPFNDGVYFAST
>P1;6z43_fill
sequence:6z43_fill:.:.:.:.::::
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS
NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFAST
It appears that Modeller does not recognize the dashes “-” as actual missing residues. Instead, it appears to read the alignment as NSFTR()V…, rather than NSFTRG. However, isn’t the dash supposed to signal AutoModel/LoopModel to treat that as a missing residue to fill?
Thank you for your help!
Best wishes,
Steven Truong
sdt45@cam.ac.uk
Cambridge University
On 4/5/21 5:05 PM, Steven Truong wrote:
> x (mismatch at alignment position 9)
> Alignment NSFTRVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHNPVLPFNDGVYFASTKSNIIRGWI
> PDB       NSFTRGVYYPDKVFRSSVL.STQDLFLPFFSNVTWF.AI.VSGTNGTKRFDNPVLPFNDG
> Match     ***** * * *
> Alignment residue type 18 (V, VAL) does not match pdb
> residue type 6 (G, GLY),
> for align code 6z43 (atom file structure_aligned), pdb residue number " 35", chain "A"
The template sequence in the alignment has to match exactly the sequence of ATOM/HETATM records in the PDB file. You are missing a glycine.
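One way to catch this before running Modeller is to strip the gaps from the template row and compare it, residue by residue, with the sequence actually present in the structure file. The sketch below is plain Python and not part of Modeller; `first_mismatch` is a hypothetical helper name, and the two strings are short fragments assumed from the alignment and the error message in this thread.

```python
def first_mismatch(template_row, pdb_sequence):
    """Return the 1-based position of the first residue in the gap-stripped
    template row that disagrees with the PDB sequence, or None if they match."""
    # Gaps only position residues within the alignment, so drop them (and the
    # PIR terminator '*') before comparing with the structure.
    residues = template_row.replace("-", "").replace("*", "")
    for position, (aln_res, pdb_res) in enumerate(
            zip(residues, pdb_sequence), start=1):
        if aln_res != pdb_res:
            return position
    if len(residues) != len(pdb_sequence):
        # One sequence is a prefix of the other: report where the shorter ends.
        return min(len(residues), len(pdb_sequence)) + 1
    return None


# Template row as written in the alignment (the glycine replaced by a gap) ...
row = "AYTNSFTR-VYYPDKVFRSSVL"
# ... versus the residues assumed to be present in the structure file.
pdb = "AYTNSFTRGVYYPDKVFRSSVL"
print(first_mismatch(row, pdb))  # prints 9: V in the row, G in the structure
```

A result of `None` only means the two strings agree; it says nothing about whether the residue range in the PIR header is also correct.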
> It appears that Modeller does not recognize the dashes “-” as actual
> missing residues.
This is an alignment. Dashes have no meaning except to align residues in one sequence with those in another. Obviously Modeller ignores them when reading the PDB file, as there are no "-" residues there.
> However, isn’t the dash supposed to
> signal AutoModel/LoopModel to treat that as a missing residue to fill?
No. A residue in your target sequence that isn't aligned with a residue in the template sequence (e.g. it lines up with a gap or chain break, or extends past the end of the template sequence) will be constructed from first principles.
If you have a residue in your template which for some reason you don't want in your target, align it with a gap in the target.
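To see which target residues fall into that category for a given alignment, the two rows can be walked column by column. This is a minimal sketch in plain Python, not a Modeller call; `unaligned_target_ranges` is a hypothetical name, and the example rows are a fragment of the 1qg8 alignment from the Modeller "Missing residues" tutorial.

```python
def unaligned_target_ranges(template_row, target_row):
    """List (first, last) 1-based target residue ranges that face a gap in the
    template row, i.e. residues for which the template offers no coordinates."""
    template_row = template_row.rstrip("*")
    target_row = target_row.rstrip("*")
    if len(template_row) != len(target_row):
        raise ValueError("aligned rows must have the same number of columns")

    positions = []
    target_pos = 0
    for template_res, target_res in zip(template_row, target_row):
        if target_res == "-":
            continue  # template residue left out of the target; no target residue here
        target_pos += 1
        if template_res == "-":
            positions.append(target_pos)

    # Merge consecutive residue numbers into ranges.
    ranges = []
    for pos in positions:
        if ranges and pos == ranges[-1][1] + 1:
            ranges[-1][1] = pos
        else:
            ranges.append([pos, pos])
    return [tuple(r) for r in ranges]


template = "KTYHLN---DIVKETV"   # structure row: three residues not resolved
target   = "KTYHLNENRDIVKETV"   # full sequence to be modelled
print(unaligned_target_ranges(template, target))  # [(7, 9)]
```

The residue numbers count along the target row only, starting at 1; they are not PDB residue numbers.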
Ben Webb, Modeller Caretaker
Dear Ben,
I apologize for not understanding correctly here. Currently, I am following the .PIR file setup I see in the tutorial (https://salilab.org/modeller/wiki/Missing%20residues):
>P1;1qg8
structureX:1qg8: 2 :A: 256 :A:undefined:undefined:-1.00:-1.00
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLN---DIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYIT---------
-----EFVRNLPPQRNCRELRESLKKLGMG*
>P1;1qg8_fill
sequence:::::::::
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLNENRDIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYITDQSIHFQLF
ELEKNEFVRNLPPQRNCRELRESLKKLGMG*
I think I have the template sequence alignment matching exactly to the PDB file, in the same way the second entry “1qg8_fill” has a full sequence. However, I am indicating missing residues with “1qg8.” Should I fill in those dashes with missing residues? I’m likely misunderstanding how AutoModel/LoopModel works then. When you refer to “template sequence,” is that “1qg8” or “1qg8_fill” in the align.pir file on the tutorial?
Pasting my version of align.pir again for reference (I’ve bolded/colored red the glycine Modeller is detecting an error on):
>P1;6z43
structureM:6z43:27:A:+3600:C:::-1.00:-1.00
AYTNSFTR-VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFH--------------NPVLPF
NDGVYFAST
[…]
>P1;6z43_fill
sequence:6z43_fill::::::::
AYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF
NDGVYFAST
Thank you for your help again! Apologies for the misunderstanding.
Best wishes, Steven Truong
On 4/5/21 5:42 PM, Steven Truong wrote:
> I think I have the template sequence alignment matching exactly to the
> PDB file, in the same way the second entry “1qg8_fill” has a full
> sequence. However, I am indicating missing residues with “1qg8.”
> Should I fill in those dashes with missing residues? I’m likely
> misunderstanding how AutoModel/LoopModel works then. When you refer to
> “template sequence,” is that “1qg8” or “1qg8_fill” in the align.pir file
> on the tutorial?
The template is the known structure, i.e. the PDB file, 1qg8 in the example.
> >P1;6z43
> structureM:6z43:27:A:+3600:C:::-1.00:-1.00
> AYTNSFTR*-*VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFH--------------NPVLPF
> NDGVYFAST
Well, this simply isn't correct. The "missing" glycine here isn't missing in the 6z43 PDB file - it is there (residue 35 in chain A). What are you trying to accomplish here?
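Whether a residue is really absent from the structure can be checked by reading the residue numbers straight from the coordinate records. The following is a rough sketch in plain Python, not Modeller's own reader: `observed_sequence`, `numbering_breaks`, and `ca_line` are hypothetical helpers, only standard amino acids in ATOM records are handled (HETATM residues are ignored here), and the demo records are synthetic lines without coordinates, not taken from 6z43.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def observed_sequence(pdb_lines, chain_id):
    """Return [(residue_number, one_letter_code), ...] for one chain, taking
    one C-alpha atom per residue from fixed-column ATOM records."""
    residues, seen = [], set()
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16] != " CA " or line[21] != chain_id:
            continue
        key = (int(line[22:26]), line[26:27])  # residue number + insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        residues.append((key[0], THREE_TO_ONE.get(line[17:20], "X")))
    return residues


def numbering_breaks(residues):
    """Return (first, last) residue numbers skipped between consecutive
    observed residues. A skip is only a hint that residues are unresolved."""
    numbers = [number for number, _ in residues]
    return [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def ca_line(serial, resname, chain, resseq):
    """Build a synthetic, coordinate-free ATOM record for the demo below."""
    return f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resseq:4d}"


demo = [ca_line(1, "ARG", "A", 34), ca_line(2, "GLY", "A", 35),
        ca_line(3, "VAL", "A", 36), ca_line(4, "TYR", "A", 40)]
seq = observed_sequence(demo, "A")
print(seq)                    # residue 35 is present as G
print(numbering_breaks(seq))  # [(37, 39)]
```

Only residues that fall inside a reported break (and are listed in the full sequence) are candidates for a gap in the template row; a residue that appears in the list, like the glycine at 35 in the synthetic example, has to stay in that row.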
Ben Webb, Modeller Caretaker
Dear Ben,
Okay, I think I see where I’m making the mistake then! Currently, I am following the tutorial, and my interpretation was that “1qg8_fill” would be the template. It seems like that is NOT the case. Instead, we can ignore 1qg8_fill for our current purposes.
>P1;1qg8
structureX:1qg8: 2 :A: 256 :A:undefined:undefined:-1.00:-1.00
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLN---DIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYIT---------
-----EFVRNLPPQRNCRELRESLKKLGMG*
>P1;1qg8_fill
sequence:::::::::
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLNENRDIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYITDQSIHFQLF
ELEKNEFVRNLPPQRNCRELRESLKKLGMG*
I mistook the above and created a “6z43” template myself which contained the dashes like in 1qg8 in the tutorial. I’ll replace the dashes in that case with actual residues then. Thank you for the help!
Many thanks, Steven Truong
Dear Ben,
I was hoping to follow up with a related question. Thank you for your help in the previous issue; I’ve been able to make it work finally.
If I have multiple structure templates, how would I tell when AutoModel should use which template? I see in the documentation (https://salilab.org/modeller/manual/node21.html) that you can build structures using AutoModel with multiple templates, but is there a way to specify which missing residues to fill in the alignment file? For example, can I use template1.pdb for all residues, except resid 50-60 and 100-130, and then use template2.pdb for resid 50-60 and 100-130?
Thank you for your help again! Steven Truong
On 4/7/21 9:00 AM, Steven Truong wrote:
> If I have multiple structure templates, how would I tell when
> AutoModel should use which template?
Modeller uses all aligned templates for a given residue.
> For example, can I use template1.pdb for all residues, except resid
> 50-60 and 100-130, and then use template2.pdb for resid 50-60 and
> 100-130?
If you don't want Modeller to use information from one or more templates for a given residue range, just make sure your alignment is such that the target isn't aligned with those templates. See also https://salilab.org/modeller/FAQ.html#1 and https://salilab.org/modeller/FAQ.html#2
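One way to produce such an alignment without deleting residues from a template row (which would bring back the sequence mismatch discussed earlier in this thread) is to shift the unwanted template segment so that it faces gaps in every other row. The sketch below is plain Python; `unalign_template` is a hypothetical helper, columns are 0-based with an exclusive end, and the row names and letters are placeholders rather than real sequences.

```python
def unalign_template(rows, template, start, end):
    """Return new aligned rows in which `template` no longer shares any column
    with the other rows between alignment columns start (inclusive) and end
    (exclusive). Every row keeps exactly the same residues as before."""
    lengths = {len(row) for row in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of columns")
    if not 0 <= start <= end <= lengths.pop():
        raise ValueError("column range lies outside the alignment")

    out = {name: row[:start] for name, row in rows.items()}

    # Block 1: the template's own residues, facing gaps in every other row.
    own = rows[template][start:end].replace("-", "")
    for name in rows:
        out[name] += own if name == template else "-" * len(own)

    # Block 2: the original columns of the other rows, facing gaps in the template.
    for col in range(start, end):
        others = [rows[name][col] for name in rows if name != template]
        if all(res == "-" for res in others):
            continue  # skip columns that would contain only gaps
        for name in rows:
            out[name] += "-" if name == template else rows[name][col]

    for name in rows:
        out[name] += rows[name][end:]
    return out


rows = {"template1": "ABCDEFGH", "template2": "ABCDEFGH", "target": "ABCDEFGH"}
for name, row in unalign_template(rows, "template1", 2, 5).items():
    print(f"{name:10s} {row}")
# template1  ABCDE---FGH
# template2  AB---CDEFGH
# target     AB---CDEFGH
```

In the shifted columns only template2 is aligned with the target, so only template2 contributes there; applying the same function to template2 over the remaining columns would leave template1 as the sole source elsewhere. Whether this matches the intended setup should be checked against the two FAQ entries linked above.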
Ben Webb, Modeller Caretaker