Generating chimeric protein from templates
Dear Users,
I would like to generate a chimeric protein from templates, following the tips in FAQ 1 (https://salilab.org/modeller/9.10/manual/node36.html) and also https://salilab.org/modeller/9.10/manual/node21.html.
To better understand the code and its meaning, I combined two simple sequences as a test case. The PIR file contains:
>P1;2xkmA
structureN:2xkm
GVIDTSAVESAITDGQGDMKAIGGYIVGALVILAVAGLIYSMLRKA
-------------------------------------------*
>P1;4xknE
structureX:4xkh
----------------------------------------------
NGLSEDEALQRALELSLAEAKPQVLSSQEEDDLALAQALSASE*
>P1;test
sequence:test
GVIDTSAVESAITDGQGDMKAIGGYIVGALVILAVAGLIYSMLRKA
NGLSEDEALQRALELSLAEAKPQVLSSQEEDDLALAQALSASE*
The ali file is almost the same.
However, running build_profile.py gives the first error:
_modeller.FileFormatError: parse_pir__E> Invalid PIR file header line:
structureN:2xkm
There should be 10 fields separated by colons, :
This line actually contains 2 fields.
Here is my question: from the FAQ, I thought two fields were enough for the template. Even when I added enough colons to make 10 fields, I got a second error saying that the PDB file could not be found. To fix this, I copied the complete structureN (and structureX) header lines from pdb_95.pir, which I downloaded from the Modeller website. With the same build_profile.py, a third error appears:
_modeller.StatisticsError: regress_657E> Not enough bins in histogram - cannot calculate statistics, nbins: 1
In the end, I don't understand what exactly causes these errors. Did I miss something in the PIR or ali file, or did I use the wrong script? I hope to hear any suggestions from you.
Best wishes.
On 2/3/23 9:54 PM, Ta-Chou Huang via modeller_usage wrote:
> However, using the build_profile.py, it shows the first error:
> _modeller.FileFormatError: parse_pir__E> Invalid PIR file header line:
> structureN:2xkm
> There should be 10 fields separated by colons, :
> This line actually contains 2 fields.
This is correct - you need to provide information on the structure so that Modeller knows which PDB to read, and which range of residues in that PDB file match the sequence. The alignment file format is documented at https://salilab.org/modeller/10.4/manual/node501.html
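As a rough illustration of the ten-field header, here is a minimal plain-Python sketch (it does not call Modeller). The helper names `make_pir_header` and `count_pir_fields` are made up for this example, and the residue numbers and chain IDs are placeholders rather than the real ranges for 4xkh; the field order follows the alignment file documentation linked above.

```python
# Hypothetical helpers for illustration only; they are not part of Modeller.

def make_pir_header(entry_type, code, start_residue="", start_chain="",
                    end_residue="", end_chain="", protein_name="",
                    source="", resolution="", r_factor=""):
    """Join the ten PIR header fields with colons (empty fields stay empty)."""
    fields = [entry_type, code, start_residue, start_chain, end_residue,
              end_chain, protein_name, source, resolution, r_factor]
    return ":".join(str(field) for field in fields)


def count_pir_fields(header):
    """Count colon-separated fields the way the error message describes."""
    return len(header.split(":"))


# The header from the original file has only two fields.
short_header = "structureN:2xkm"

# Placeholder residue range and chain ID (assumed values, check the PDB file).
template_header = make_pir_header("structureX", "4xkh", 1, "E", 43, "E")

# The target sequence needs no residue range, but still needs ten fields.
target_header = make_pir_header("sequence", "test")

print(count_pir_fields(short_header), short_header)
print(count_pir_fields(template_header), template_header)
print(count_pir_fields(target_header), target_header)
```

Padding a template header with empty fields satisfies the parser but leaves Modeller without a residue range, so for template entries the range fields should be filled in and the PDB file should be somewhere Modeller can find it.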
> Now here comes my question, from the FAQ, I thought it was enough to use
> two fields for the template?
I had thought it was clear that the alignment file in the FAQ was not a "real" alignment (obviously the sequence isn't really "aaaa" either!) but I will try to clarify the text.
> with the same build_profile.py, the third error comes:
> _modeller.StatisticsError: regress_657E> Not enough bins in histogram -
> cannot calculate statistics, nbins: 1
The purpose of profile.build() is to build a profile by scanning your sequence against a large database of sequences - usually something like PDB95 or UniProt90. This error suggests that you have given it a much smaller database, perhaps even a single sequence.
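One quick sanity check before running a profile search is to count how many entries the database file actually contains. The sketch below is plain Python, not a Modeller call; `count_pir_entries` is a hypothetical helper, and it assumes every entry in the PIR file starts with a `>P1;` line, as in the alignment shown earlier in this thread.

```python
# Hypothetical helper for illustration only; it is not part of Modeller.

def count_pir_entries(path):
    """Count entries in a PIR file by counting lines that start with '>P1;'."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">P1;"):
                count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage: python count_pir.py pdb_95.pir
    for filename in sys.argv[1:]:
        print(filename, count_pir_entries(filename))
```

If this reports only a handful of entries, the file is probably the hand-written alignment rather than a full database such as pdb_95.pir.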
Ben Webb, Modeller Caretaker