Hi,
Thanks for your previous kind reply.
I have run into a problem that seems strange to me.
When running Modeller to model a sequence on the template PDB 4QTA (obtained from a BLAST search against the pdbaa database), I found that the template sequence in the BLAST output differs from the sequence in the original PDB file. Below is the error I get from Modeller because of this:
Alignment sequence:
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKIL
LRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDL
KPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNR
PIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
PDB sequence matching range provided in alignment:
GPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENII
GINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNT
TCDLKICDFGLARVADTRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPS
QEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEP
IAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
Traceback (most recent call last):
  File "modified-comp-align.py", line 11, in <module>
    a.auto_align() # get an automatic alignment
  File "/data/deepak-protein-modeling/modlib/modeller/automodel/automodel.py", line 146, in auto_align
    self.sequence, matrix_file, overhang, write_fit)
  File "/data/deepak-protein-modeling/modlib/modeller/scripts/align_strs_seq.py", line 8, in align_strs_seq
    aln = alignment(env, file=segfile, align_codes=knowns)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 20, in __init__
    self.append(**vars)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 79, in append
    allow_alternates)
_modeller.SequenceMismatchError: get_ran_648E> Alignment sequence does not match that in PDB file: 1 ./4QTA.pdb (You didn't specify the starting and ending residue numbers and chain IDs in the alignment, so Modeller tried to guess these from the PDB file.) Suggestion: put in the residue numbers and chain IDs (see the manual) and run again for more detailed diagnostics. You could also try running with allow_alternates=True to accept alternate one-letter code matches (e.g. B to N, Z to Q).
When I check the alignment file (built from the BLAST result and converted to PIR format), it looks fine to me:
>P1;NP_002736.3.35214
sequence:NP_002736.3.35214:1 : :360 : :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*
>P1;4QTA
structure:4QTA: :A : :A :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*
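
To pin down where the two sequences printed in the error output differ, a minimal sketch along the following lines may help. It uses only Python's standard difflib module and is not part of Modeller; the function names missing_segments and gapped_template are made up for illustration, and the sketch assumes the PDB sequence differs from the alignment sequence only by missing residues (no substitutions).

```python
from difflib import SequenceMatcher

# "Alignment sequence" as printed in the Modeller error output.
ALN_SEQ = (
    "MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKIL"
    "LRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDL"
    "KPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNR"
    "PIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK"
    "RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS"
)

# "PDB sequence matching range provided in alignment" from the same output.
PDB_SEQ = (
    "GPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENII"
    "GINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNT"
    "TCDLKICDFGLARVADTRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPS"
    "QEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEP"
    "IAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY"
)


def missing_segments(aln_seq, pdb_seq):
    """Return (start, end, residues) for stretches of aln_seq absent from pdb_seq.

    start and end are 1-based positions within aln_seq.
    """
    matcher = SequenceMatcher(None, aln_seq, pdb_seq, autojunk=False)
    segments = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            segments.append((i1 + 1, i2, aln_seq[i1:i2]))
    return segments


def gapped_template(aln_seq, pdb_seq):
    """Return aln_seq with '-' wherever pdb_seq has no matching residue."""
    matcher = SequenceMatcher(None, aln_seq, pdb_seq, autojunk=False)
    pieces = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(aln_seq[i1:i2])
        elif tag == "delete":
            pieces.append("-" * (i2 - i1))
        else:
            # Substitutions or insertions break the assumption stated above.
            raise ValueError("sequences differ by more than missing residues")
    return "".join(pieces)


# List each stretch that is in the alignment but not in the PDB sequence.
for start, end, residues in missing_segments(ALN_SEQ, PDB_SEQ):
    print(start, end, residues)

# Template sequence with gaps in place of the residues missing from the PDB.
print(gapped_template(ALN_SEQ, PDB_SEQ))
```

Each reported stretch is present in the alignment entry but absent from the sequence Modeller read from 4QTA.pdb. Whether replacing the sequence of the 4QTA entry with the gapped string resolves the error is untested here.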