Hi,
Thanks for your previous kind reply.
I have come across a problem that seems quite strange to me:
When running Modeller to model a sequence on the template PDB 4QTA (obtained from BLAST against the pdbaa database), I found that the 4QTA sequence in the BLAST output differs from the sequence in the original PDB file. Please see below the error Modeller reports because of this:
Alignment sequence:
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKIL
LRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDL
KPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNR
PIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS

PDB sequence matching range provided in alignment:
GPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENII
GINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNT
TCDLKICDFGLARVADTRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPS
QEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEP
IAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY

Traceback (most recent call last):
  File "modified-comp-align.py", line 11, in <module>
    a.auto_align() # get an automatic alignment
  File "/data/deepak-protein-modeling/modlib/modeller/automodel/automodel.py", line 146, in auto_align
    self.sequence, matrix_file, overhang, write_fit)
  File "/data/deepak-protein-modeling/modlib/modeller/scripts/align_strs_seq.py", line 8, in align_strs_seq
    aln = alignment(env, file=segfile, align_codes=knowns)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 20, in __init__
    self.append(**vars)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 79, in append
    allow_alternates)
_modeller.SequenceMismatchError: get_ran_648E> Alignment sequence does not match that in PDB file:
1 ./4QTA.pdb
(You didn't specify the starting and ending residue numbers and chain IDs in the alignment, so Modeller tried to guess these from the PDB file.)
Suggestion: put in the residue numbers and chain IDs (see the manual) and run again for more detailed diagnostics.
You could also try running with allow_alternates=True to accept alternate one-letter code matches (e.g. B to N, Z to Q).
When I check the alignment file, the alignment (obtained from the BLAST result and converted to PIR format) looks fine:
>P1;NP_002736.3.35214
sequence:NP_002736.3.35214:1 : :360 : :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*

>P1;4QTA
structure:4QTA: :A : :A :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*
The problem arises because chain A in the original 4QTA PDB file has a different sequence from the chain A sequence obtained from BLAST.
Could you please comment on how this problem can be solved, and explain why it occurs?
Thanks much, Deepak
On Sun, Oct 30, 2016 at 4:18 AM, Modeller Caretaker <modeller-care@salilab.org> wrote:
> On 10/28/16 9:33 AM, deepak kumar wrote:
>
>> However, does not
>> mentioning the "start" and "end" residue of the PDB file influence the
>> quality of the structure? or influence the model any way?
>>
>
> No.
>
>> Am I right, if I say, that modeller by default takes the residues of
>> chain "B", finds the corresponding residues in chain B and correctly
>> models the sequence as per the alignment.
>>
>
> That's correct.
>
> Ben Webb, Modeller Caretaker
> --
> modeller-care@salilab.org https://salilab.org/modeller/
> Modeller mail list: https://salilab.org/mailman/listinfo/modeller_usage
On 11/2/16 7:43 AM, deepak kumar wrote:
> When running modeller to model a sequence on a template pdb 4QTA
> (obtained from BLAST, pdbaa database), I figured out that the sequence
> of the pdb from blast output is different from the original pdb
> sequence
BLAST works with the full primary amino acid sequence. Modeller is only interested in the sequence for which there is structure in the PDB file (ATOM records). If you look at the REMARK 465 records in the 4qta PDB file you'll see that a number of residues at the N and C termini, plus a few around residues 176-189, are missing in the PDB file.
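To see those gaps without reading the file by eye, here is a minimal sketch in plain Python (standard library only). The helper names `missing_residues` and `ranges` are hypothetical, not part of Modeller. It assumes the usual layout of REMARK 465 data lines: residue name, chain ID, residue number and optional insertion code, optionally preceded by a model number.

```python
import re

# A REMARK 465 data line looks like "REMARK 465     MET A     1": residue
# name, chain ID, residue number and optional insertion code, optionally
# preceded by a model number. The header lines of the block do not match.
_REMARK465 = re.compile(
    r"^REMARK 465\s+(?:\d+\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Z]?)\s*$"
)


def missing_residues(pdb_lines):
    """Return (chain, resnum, icode, resname) for each REMARK 465 residue."""
    missing = []
    for line in pdb_lines:
        m = _REMARK465.match(line.rstrip("\n"))
        if m:
            resname, chain, num, icode = m.groups()
            missing.append((chain, int(num), icode, resname))
    return missing


def ranges(numbers):
    """Collapse residue numbers into (start, end) runs of consecutive values."""
    runs = []
    for n in sorted(numbers):
        if runs and n <= runs[-1][1] + 1:
            runs[-1][1] = n  # extend the current run (n is never smaller)
        else:
            runs.append([n, n])
    return [tuple(r) for r in runs]
```

For example, `missing_residues(open("4QTA.pdb"))` lists every unobserved residue, and `ranges(n for c, n, _, _ in missing if c == "A")` reduces chain A to start/end pairs, which should show the missing N and C termini and the internal stretch.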
> _modeller.SequenceMismatchError: get_ran_648E> Alignment sequence does
> not match that in PDB file: 1 ./4QTA.pdb (You didn't specify the
> starting and ending residue numbers and chain IDs in the alignment, so
> Modeller tried to guess these from the PDB file.) Suggestion: put in the
> residue numbers and chain IDs (see the manual) and run again for more
> detailed diagnostics. You could also try running with
> allow_alternates=True to accept alternate one-letter code matches (e.g.
> B to N, Z to Q).
...
> May I please ask your comments and solution to how this problem can be
> solved? , and please let me know why is this problem occurring?
Put residue numbers into your alignment file, as suggested by Modeller's error message, and then it will show you where the two sequences diverge.
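Once you know where they diverge, the template entry must contain only residues that have coordinates, with '-' where the full sequence has residues that do not. Below is a minimal Python sketch of that step. `gapped_template` and `pir_entry` are hypothetical helpers, not Modeller functions. It assumes the only differences between the full BLAST sequence and the sequence from the ATOM records are missing stretches. difflib's matching is a heuristic rather than a true sequence alignment, so any other difference raises an error instead of being silently gapped.

```python
from difflib import SequenceMatcher


def gapped_template(full_seq, atom_seq):
    """Pad atom_seq with '-' so it lines up with full_seq residue by residue.

    Assumes atom_seq is full_seq with some stretches deleted (residues that
    have no ATOM records). Any other difference raises ValueError.
    """
    # autojunk=False: otherwise, for sequences of 200+ residues, difflib treats
    # very frequent residues as "popular" junk and ignores them when matching.
    matcher = SequenceMatcher(None, full_seq, atom_seq, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(atom_seq[j1:j2])
        elif tag == "delete":
            # Present in the full sequence but absent from the structure.
            out.append("-" * (i2 - i1))
        else:
            # 'replace' or 'insert': a real sequence difference, not a gap.
            raise ValueError(
                f"sequences differ: full[{i1}:{i2}]={full_seq[i1:i2]!r}, "
                f"atom[{j1}:{j2}]={atom_seq[j1:j2]!r}"
            )
    return "".join(out)


def pir_entry(code, seq, kind="sequence", start="", chain="", end="", end_chain=""):
    """Format one PIR entry with an explicit residue range and chain IDs."""
    # Header fields: type, code, start residue, start chain, end residue,
    # end chain, protein name, source, resolution, R-factor.
    fields = [kind, code, str(start), chain, str(end), end_chain or chain,
              "", "", "-1.00", "-1.00"]
    return f">P1;{code}\n{':'.join(fields)}\n{seq}*\n"
```

The target entry keeps the full, ungapped sequence, so both entries stay the same length. The `start` and `end` arguments for 4QTA should be the first and last residue numbers actually present in the ATOM records of chain A. They are left as parameters here rather than guessed.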
Ben Webb, Modeller Caretaker
participants (2):
- deepak kumar
- Modeller Caretaker