
[modeller_usage] PDB problem



Hi,

Thanks for your previous kind reply.


I came across a problem which is quite strange to me:


When running Modeller to model a sequence on the template PDB 4QTA (obtained from a BLAST search of the pdbaa database), I found that the template sequence in the BLAST output differs from the sequence in the original PDB file. Because of this, Modeller gives the error below:

Alignment sequence:
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKIL
LRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDL
KPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNR
PIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS

PDB sequence matching range provided in alignment:
GPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENII
GINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNT
TCDLKICDFGLARVADTRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPS
QEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEP
IAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
Traceback (most recent call last):
  File "modified-comp-align.py", line 11, in <module>
    a.auto_align() # get an automatic alignment
  File "/data/deepak-protein-modeling/modlib/modeller/automodel/automodel.py", line 146, in auto_align
    self.sequence, matrix_file, overhang, write_fit)
  File "/data/deepak-protein-modeling/modlib/modeller/scripts/align_strs_seq.py", line 8, in align_strs_seq
    aln = alignment(env, file=segfile, align_codes=knowns)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 20, in __init__
    self.append(**vars)
  File "/data/deepak-protein-modeling/modlib/modeller/alignment.py", line 79, in append
    allow_alternates)
_modeller.SequenceMismatchError: get_ran_648E> Alignment sequence does not match that in PDB file:        1  ./4QTA.pdb (You didn't specify the starting and ending residue numbers and chain IDs in the alignment, so Modeller tried to guess these from the PDB file.) Suggestion: put in the residue numbers and chain IDs (see the manual) and run again for more detailed diagnostics. You could also try running with allow_alternates=True to accept alternate one-letter code matches (e.g. B to N, Z to Q).

When I check the alignment file (obtained from the BLAST result and converted to PIR format), it looks fine:

>P1;NP_002736.3.35214
sequence:NP_002736.3.35214:1 : :360 : :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*
>P1;4QTA
structure:4QTA: :A : :A :::-1.00:-1.00
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS*


The problem seems to arise because chain A in the original PDB file 4QTA has a different sequence from the chain A sequence obtained from BLAST.
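To see exactly where the two chain A sequences differ, they can be compared directly. The following is a minimal sketch using only the Python standard library; it assumes the two sequences are copied verbatim from the error output above, and the helper name `missing_segments` is hypothetical (it is not part of Modeller).

```python
from difflib import SequenceMatcher

# Sequence from the alignment file (as echoed in the Modeller error).
alignment_seq = (
    "MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKIL"
    "LRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDL"
    "KPSNLLLNTTCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNR"
    "PIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK"
    "RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS"
)

# Sequence Modeller read from ./4QTA.pdb (as echoed in the Modeller error).
pdb_seq = (
    "GPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFRHENII"
    "GINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNT"
    "TCDLKICDFGLARVADTRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPS"
    "QEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPSDEP"
    "IAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY"
)


def missing_segments(full, observed):
    """Return (start, end, residues) for stretches of `full` that have no
    counterpart in `observed`. Positions are 1-based indices into `full`."""
    # autojunk=False: otherwise difflib may treat frequent residues as junk
    # in sequences longer than 200 characters and miss the real matches.
    matcher = SequenceMatcher(None, full, observed, autojunk=False)
    segments = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            segments.append((i1 + 1, i2, full[i1:i2]))
    return segments


segments = missing_segments(alignment_seq, pdb_seq)
for start, end, residues in segments:
    print(f"{start}-{end}: {residues}")
```

For the two sequences above, this reports three stretches that are present in the alignment but absent from the PDB file: residues 1-9 (MAAAAAAGA), 176-189 (PDHDHTGFLTEYVA), and 359-360 (RS). That pattern is consistent with residues that are part of the deposited sequence but have no coordinates in the structure, although the sketch itself cannot confirm the cause.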


Could you please explain why this problem occurs and suggest how it can be solved?


Thanks much,
Deepak

 
On Sun, Oct 30, 2016 at 4:18 AM, Modeller Caretaker wrote:
On 10/28/16 9:33 AM, deepak kumar wrote:
However, does not
mentioning the "start" and "end" residues of the PDB file influence the
quality of the structure, or influence the model in any way?

No.

Am I right to say that Modeller by default takes the residues of
chain "B", finds the corresponding residues in chain B, and correctly
models the sequence as per the alignment?

That's correct.