Missing residues
Latest revision as of 21:17, 16 August 2022
Often, you will encounter files in the PDB which have missing residues. Special care must be taken in this case, as MODELLER only reads the ATOM and HETATM records, not the SEQRES records, and so will not handle missing residues automatically. (Unfortunately, PDB files are not reliable enough for MODELLER to rely on SEQRES automatically.)
One example is PDB code 1qg8 (https://salilab.org/modeller/archive/pdb1qg8.ent), which is missing residues 134-136 and 218-231 (see the REMARK 465 lines in the PDB file). We can use Modeller to 'fill in' these missing residues by treating the original structure (without the missing residues) as a template, and building a comparative model using the full sequence.
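The REMARK 465 lines can also be read programmatically. The following is a minimal sketch in plain Python (it does not use Modeller); the function names `missing_residues` and `to_ranges` are hypothetical, and `SAMPLE` is a hand-written excerpt that imitates the usual REMARK 465 layout rather than a copy of the real 1qg8 file. It assumes data rows end in a plain integer residue number, so residues with insertion codes would be skipped.

```python
def missing_residues(pdb_lines):
    """Collect (chain, residue number, residue name) from REMARK 465 rows."""
    missing = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        # Data rows look like "GLU A 134" (optionally preceded by a model
        # number); header and free-text rows do not end in an integer.
        if len(fields) >= 3 and fields[-1].lstrip("-").isdigit():
            resname, chain, resnum = fields[-3:]
            missing.append((chain, int(resnum), resname))
    return missing


def to_ranges(numbers):
    """Collapse residue numbers into inclusive (first, last) ranges."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return [tuple(r) for r in ranges]


# Hand-written excerpt for illustration only (not the complete record).
SAMPLE = """\
REMARK 465 MISSING RESIDUES
REMARK 465   M RES C SSSEQI
REMARK 465     GLU A   134
REMARK 465     ASN A   135
REMARK 465     ARG A   136
REMARK 465     ASP A   218
REMARK 465     GLN A   219
""".splitlines()

found = missing_residues(SAMPLE)
print(to_ranges(num for _chain, num, _name in found))
```

With a real file, the lines could be read with `open('pdb1qg8.ent').readlines()` and passed to the same function.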
First, we obtain the sequence of residues with known structure:
from modeller import *
# Get the sequence of the 1qg8 PDB file, and write to an alignment file
code = '1qg8'
e = Environ()
m = Model(e, file=code)
aln = Alignment(e)
aln.append_model(m, align_codes=code)
aln.write(file=code+'.seq')
This produces a sequence file, 1qg8.seq:
>P1;1qg8
structureX:1qg8: 2 :A: 256 :A:undefined:undefined:-1.00:-1.00
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLNDIVKETVRPAAQVTWNAP
CAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYITEFVRNLPPQRNC
RELRESLKKLGMG*
From the PDB REMARKs or SEQRES records, we know the missing residues, so now we can make an alignment between the original 1qg8 structure (as the template), with gap characters corresponding to the missing residues, and the full sequence. This we place in a new alignment file, alignment.ali:
>P1;1qg8
structureX:1qg8: 2 :A: 256 :A:undefined:undefined:-1.00:-1.00
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLN---DIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYIT---------
-----EFVRNLPPQRNCRELRESLKKLGMG*
>P1;1qg8_fill
sequence:::::::::
PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR
YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLNENRDIVKETVRPAAQVTW
NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYITDQSIHFQLF
ELEKNEFVRNLPPQRNCRELRESLKKLGMG*
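Writing the gapped template line by hand is error-prone, so it can help to generate it from the full sequence and the missing residue ranges. The sketch below is plain Python, not a Modeller routine; `gap_template` is a hypothetical helper, and it assumes the template is numbered continuously from residue 2 with no insertion codes, as in this example.

```python
def gap_template(full_seq, first_resnum, missing_ranges):
    """Replace residues inside the missing ranges with '-' gap characters.

    full_seq       -- complete one-letter sequence (the 1qg8_fill entry)
    first_resnum   -- PDB number of the first residue in full_seq
    missing_ranges -- inclusive (first, last) PDB residue number pairs
    """
    gapped = []
    for i, aa in enumerate(full_seq):
        resnum = first_resnum + i
        is_missing = any(lo <= resnum <= hi for lo, hi in missing_ranges)
        gapped.append("-" if is_missing else aa)
    return "".join(gapped)


# Full sequence, copied from the 1qg8_fill entry of alignment.ali
FULL = (
    "PKVSVIMTSYNKSDYVAKSISSILSQTFSDFELFIMDDNSNEETLNVIRPFLNDNRVRFYQSDISGVKERTEKTR"
    "YAALINQAIEMAEGEYITYATDDNIYMPDRLLKMVRELDTHPEKAVIYSASKTYHLNENRDIVKETVRPAAQVTW"
    "NAPCAIDHCSVMHRYSVLEKVKEKFGSYWDESPAFYRIGDARFFWRVNHFYPFYPLDEELDLNYITDQSIHFQLF"
    "ELEKNEFVRNLPPQRNCRELRESLKKLGMG"
)

template = gap_template(FULL, 2, [(134, 136), (218, 231)])

# Print in 75-column lines, matching the layout of the alignment file
for start in range(0, len(template), 75):
    print(template[start:start + 75])
```

The printed lines can be compared against the 1qg8 entry in alignment.ali before running the modeling script.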
We can now use the standard Modeller 'LoopModel' class (https://salilab.org/modeller/10.0/manual/node33.html) to generate a model with all residues, and then refine the loop regions:
from modeller import *
from modeller.automodel import *    # Load the AutoModel and LoopModel classes
log.verbose()
env = Environ()
# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']
a = LoopModel(env, alnfile = 'alignment.ali',
              knowns = '1qg8', sequence = '1qg8_fill')
# Build a single comparative model
a.starting_model = 1
a.ending_model = 1
# Build two loop-refined models from it
a.loop.starting_model = 1
a.loop.ending_model = 2
a.loop.md_level = refine.fast
a.make()
If you do not want loop refinement, simply use the AutoModel class rather than LoopModel, and remove the three lines which set a.loop parameters.
⚠️ Note that loop modeling will only refine the shorter of the two loops by default. You can modify the select_loop_atoms routine to refine both loops, but you are not likely to get good results with the longer, 14-residue insertion. In this case, you should probably try to find another template for this part of the sequence, or impose secondary structure restraints if you have reason to believe the insertion is not a loop.
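To see how long each inserted segment is before deciding whether loop refinement is worthwhile, the gap runs in the template line of the alignment can be listed. This is a plain-Python sketch with a hypothetical helper name, `gap_runs`; the input used here is a toy string, not the real alignment. The positions it reports are alignment columns, which match model residue numbers only when the target sequence itself contains no gaps (as is the case for 1qg8_fill).

```python
from itertools import groupby


def gap_runs(template_seq):
    """Return (first, last, length) for each run of '-' characters.

    Positions are 1-based alignment columns.
    """
    runs = []
    pos = 1
    for is_gap, group in groupby(template_seq, key=lambda c: c == "-"):
        n = len(list(group))
        if is_gap:
            runs.append((pos, pos + n - 1, n))
        pos += n
    return runs


# Toy aligned template sequence, for illustration only
toy = "ACD---EFGH-----K"
for first, last, length in gap_runs(toy):
    print("gap at columns %d-%d (%d residues)" % (first, last, length))
```

Applied to the 1qg8 template line of alignment.ali (with the line breaks and trailing `*` removed), this should report runs at columns 133-135 and 217-230, of 3 and 14 residues respectively.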
💡 Because either AutoModel or LoopModel will build a comparative model using your input PDB as a template, potentially all of the atoms in your final model could move. If you really don't want the non-missing residues to move, you can override the select_atoms method to select only the missing residues with a script similar to that below. Note that the residue numbers are off by 1, since Modeller numbers the model starting at 1 in chain A, while the original PDB started numbering at 2; PDB residues 134-136 and 218-231 therefore become model residues 133-135 and 217-230:
from modeller import *
from modeller.automodel import * # Load the AutoModel class
log.verbose()
env = Environ()
# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']
class MyModel(AutoModel):
    def select_atoms(self):
        return Selection(self.residue_range('133:A', '135:A'),
                         self.residue_range('217:A', '230:A'))

a = MyModel(env, alnfile = 'alignment.ali',
            knowns = '1qg8', sequence = '1qg8_fill')
a.starting_model = 1
a.ending_model = 1
a.make()
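The off-by-one renumbering used in select_atoms can be computed rather than worked out by hand. The helper below is a plain-Python sketch with a hypothetical name, `model_range`; it assumes the template is numbered continuously from `pdb_start` with no insertion codes, and that the model is numbered from 1 in a single chain.

```python
def model_range(pdb_first, pdb_last, pdb_start=2, chain="A"):
    """Convert a PDB residue range to 'number:chain' strings in model numbering.

    pdb_start is the number of the first residue in the template PDB file;
    the model is assumed to start at residue 1.
    """
    offset = pdb_start - 1
    return ("%d:%s" % (pdb_first - offset, chain),
            "%d:%s" % (pdb_last - offset, chain))


# Missing residue ranges in original PDB numbering
for first, last in [(134, 136), (218, 231)]:
    print(model_range(first, last))
```

Each returned pair could then be unpacked into a call such as `self.residue_range(*model_range(134, 136))` inside select_atoms.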