Modeling missing residues in Modeller
Dear all,
I am trying to model the missing residues of a beta-barrel protein; 12 residues are missing from the original PDB file. Following the 1qg8 tutorial, I first obtained the protein's sequence from the PDB file using:
from modeller import *

# Get the sequence of the PDB file (code 'pdbid', as in the 1qg8 tutorial)
# and write it to an alignment file
code = 'pdbid'

e = Environ()
m = Model(e, file=code)
aln = Alignment(e)
aln.append_model(m, align_codes=code)
aln.write(file=code+'.seq')

This writes an output file named pdbid.seq.
The actual protein sequence contains 324 amino acid residues, 12 of which are missing from the pdbid.pdb file.
This is the alignment.ali file I prepared for adding the missing residues:
>P1;pdbid
structureX:pdbid:1:A:+312:A:MOL_ID 1;
ASDQRGYKP------------GGHVGTSVEYEDKVTRGFNNTDKKEKTITNEVFNFFYNNPQWNFMGFYSFKIENREQKEPGYYENEDGIKQLFSLNKGHDLGNGWATGLIYELEYTRSKVYSPDVSGLRKNLAEHSIRPYLTYWNNDYNMGFYSNLEYLLSKEDRNAWGKRQEQGYSALFKPYKRFGNWEVGVEFYYQIKTNDEKQPDGTINEKSDFNERYIEPIVQYSFDDAGTLYTRVRVGKNETKNTDRSGGGNAGINYFKDIRKATVGYEQSIGESWVAKAEYEYANEVEKKSRLSGWEARNKSELTQHTFYAQALYRF*

>P1;pdbid_fill
sequence:::::::::
ASDQRGYKPEDVAFDESFFSFGGHVGTSVEYEDKVTRGFNNTDKKEKTITNEVFNFFYNNPQWNFMGFYSFKIENREQKEPGYYENEDGIKQLFSLNKGHDLGNGWATGLIYELEYTRSKVYSPDVSGLRKNLAEHSIRPYLTYWNNDYNMGFYSNLEYLLSKEDRNAWGKRQEQGYSALFKPYKRFGNWEVGVEFYYQIKTNDEKQPDGTINEKSDFNERYIEPIVQYSFDDAGTLYTRVRVGKNETKNTDRSGGGNAGINYFKDIRKATVGYEQSIGESWVAKAEYEYANEVEKKSRLSGWEARNKSELTQHTFYAQALYRF*
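A quick sanity check on an alignment like this can be done in plain Python, without Modeller. The sketch below (the function name check_alignment is hypothetical) assumes the two entries are aligned strings of equal length ending in '*', and that, since template and target are the same protein here, every template residue should be identical to the target residue in the same column. It returns the number of residues in each entry.

```python
def check_alignment(template: str, target: str) -> tuple[int, int]:
    """Return (template residue count, target residue count).

    Both arguments are aligned PIR sequences, optionally ending in '*'.
    Raises ValueError if the lengths differ or if an aligned template
    residue differs from the target residue in the same column (expected
    to be identical when filling gaps in the same protein).
    """
    template = template.rstrip("*")
    target = target.rstrip("*")
    if len(template) != len(target):
        raise ValueError("aligned sequences must have the same length")
    for col, (t, s) in enumerate(zip(template, target), start=1):
        if t != "-" and s != "-" and t != s:
            raise ValueError(f"mismatch at alignment column {col}: {t} vs {s}")
    n_template = sum(1 for c in template if c != "-")
    n_target = sum(1 for c in target if c != "-")
    return n_template, n_target


# First 25 columns of the two entries above, to keep the example short
template = "ASDQRGYKP------------GGHV"
target = "ASDQRGYKPEDVAFDESFFSFGGHV"
print(check_alignment(template, target))  # (13, 25)
```

Run on the full sequence lines, and assuming the two entries are otherwise identical as they appear to be, it should report 312 template residues and 324 target residues, matching the +312 in the template header.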
However, the template header gives the residue count as 312 rather than 324. I copied this entry directly from the pdbid.seq file that Modeller wrote in the first step described above. Because 12 residues are missing, Modeller reports 312 residues in the second line of the template entry in alignment.ali:

structureX:pdbid:1:A:+312:A:MOL_ID 1;
On the other hand, in some examples in the Modeller tutorial the total number of residues (including the missing ones) seems to be written in the template entry of alignment.ali. So I could instead have written the template header as

structureX:pdbid:1:A:+324:A:MOL_ID 1;

giving 324 as the total number of residues for the template. My question is: does it matter whether I write +312 or +324 here?
I generated a new PDB file with all missing residues modeled by Modeller using:

from modeller import *
from modeller.automodel import *    # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Optional: restrict optimization to the previously missing residues 10-21
class MyModel(AutoModel):
    def select_atoms(self):
        return Selection(self.residue_range('10:A', '21:A'))

#a = MyModel(env, alnfile = 'alignment.ali',
#            knowns = 'pdbid', sequence = 'pdbid_fill')
a = AutoModel(env, alnfile = 'alignment.ali',
              knowns = 'pdbid', sequence = 'pdbid_fill')
a.starting_model = 1
a.ending_model = 5
a.make()
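The residue range '10:A' to '21:A' used in MyModel is the run of target residues that has no template residue aligned to it. As a hedged sketch, the hypothetical helper below derives such ranges from the two aligned sequences in plain Python. It assumes target residues are numbered from 1 in alignment order and that the model has a single chain 'A', as in the script above; it does not call Modeller, and its output strings are only meant to be passed to self.residue_range() inside select_atoms().

```python
def gap_ranges(template: str, target: str, chain: str = "A") -> list[tuple[str, str]]:
    """Return (start, end) residue IDs such as ('10:A', '21:A') for every
    run of target residues with no template residue aligned to it.

    Assumption: target residues are numbered 1, 2, 3, ... in alignment order.
    """
    template = template.rstrip("*")
    target = target.rstrip("*")
    ranges = []
    resnum = 0    # number of the current target residue
    start = None  # first residue of the gap currently being scanned
    for t, s in zip(template, target):
        if s == "-":
            continue  # no target residue in this column: nothing to model
        resnum += 1
        if t == "-":
            if start is None:
                start = resnum
        elif start is not None:
            ranges.append((f"{start}:{chain}", f"{resnum - 1}:{chain}"))
            start = None
    if start is not None:  # gap extends to the C-terminus
        ranges.append((f"{start}:{chain}", f"{resnum}:{chain}"))
    return ranges


template = "ASDQRGYKP------------GGHV"
target = "ASDQRGYKPEDVAFDESFFSFGGHV"
print(gap_ranges(template, target))  # [('10:A', '21:A')]
```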
I found that with either structureX:pdbid:1:A:+312:A:MOL_ID 1; or structureX:pdbid:1:A:+324:A:MOL_ID 1; as the second line of the template entry in alignment.ali, I obtain very similar, essentially identical, modeled structures. Can anybody tell me whether the number of residues written in that line for the template PDB matters, or whether only the template sequence and the gaps really matter? Any help would be much appreciated. Thank you.
On 11/3/24 11:16 AM, bbmpresi--- via modeller_usage wrote:
> I am trying to model the missing residues in a beta barrel shaped
> protein that contains 12 missing residues in the original pdb.
...
> The actual protein sequence contains 324 amino acid residues and 12
> residues are missing in the pdbid.pdb file
...
> But in the template I get residue number written as 312 instead of
> 324, I directly copied the entry from the pdbid.seq file that was
> provided by Modeller at the first step that I explained at the
> beginning. Due to 12 missing residues Modeller is reporting 312
> residues in the second line of the alignment.ali file:
> structureX:pdbid:1:A:+312:A:MOL_ID 1;
This instructs Modeller to read up to 312 residues from your PDB file, starting at residue 1 in chain A. See
https://salilab.org/modeller/10.6/manual/node501.html
Since you said your template contains 312 residues, this is correct.
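To make the header fields concrete, here is a small plain-Python sketch (the helper parse_structure_header is hypothetical, not part of Modeller) that splits a structure header line on its colons. It assumes the field order described on the linked manual page: entry type, file code, starting residue, starting chain, ending residue, ending chain, followed by descriptive fields that are ignored here. Consistent with the explanation above, a '+N' in the ending-residue field is treated as a count of N residues from the starting residue rather than as a residue number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateRange:
    code: str
    start_residue: str
    start_chain: str
    count: Optional[int]        # set when the end field is '+N'
    end_residue: Optional[str]  # set when the end field is a residue number
    end_chain: str


def parse_structure_header(line: str) -> TemplateRange:
    """Parse the first six colon-separated fields of a PIR structure header."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) < 6 or not fields[0].startswith("structure"):
        raise ValueError("not a structureX/structureN/structureM header line")
    end = fields[4]
    if end.startswith("+"):
        count, end_residue = int(end[1:]), None  # '+312' means 312 residues
    else:
        count, end_residue = None, end           # explicit last residue number
    return TemplateRange(code=fields[1], start_residue=fields[2],
                         start_chain=fields[3], count=count,
                         end_residue=end_residue, end_chain=fields[5])


hdr = parse_structure_header("structureX:pdbid:1:A:+312:A:MOL_ID 1;")
print(hdr.start_residue, hdr.start_chain, hdr.count)  # 1 A 312
```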
> However, I checked other examples on Modeller tutorial and it seems
> in some examples the total residue numbers (including the missing
> residue numbers) are written in the template section for the
> alignment.ali file, so I could have used structureX:pdbid:1:A:
> +324:A:MOL_ID 1; writing 324 as total number of residues for the
> template; so my question is that does it matter if write
> structureX:pdbid:1:A:+312:A:MOL_ID 1; or structureX:pdbid:1:A:
> +324:A:MOL_ID 1;
If you use +324 (or any other number higher than 312) Modeller will try to read more than 312 residues, but won't be able to, since it will hit the end of the file. So you'll get the same result (although if your PDB file contains more chains after the A chain, Modeller will now read those and you'll likely end up with a sequence mismatch). Your original alignment file is correct.
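The reasoning above can be illustrated with a toy model of the reading step. This is a deliberate simplification, not Modeller's actual code: the PDB file is represented as a list of (chain, residue number, residue code) tuples in file order, reading starts at the given chain and residue, and then at most N residues are taken for a '+N' end field. The residue numbering (1-9 and 22-324 in chain A) and the 50-residue chain B are hypothetical. With only chain A in the file, asking for 312 or 324 residues returns the same 312; if a chain B follows, the larger count also pulls in chain B residues, which would no longer match the alignment.

```python
def read_residues(residues, start_chain, start_resnum, count):
    """Toy model: return up to `count` residues from `residues`, starting at
    the residue with the given chain ID and residue number.

    `residues` is a list of (chain_id, resnum, one_letter_code) tuples in
    file order. Reading simply stops at the end of the list if fewer than
    `count` residues remain.
    """
    for i, (chain, resnum, _) in enumerate(residues):
        if chain == start_chain and resnum == start_resnum:
            return residues[i:i + count]
    raise ValueError("starting residue not found")


# Hypothetical chain A: residues 10-21 missing, so 312 residues remain
chain_a = [("A", n, "X") for n in list(range(1, 10)) + list(range(22, 325))]
# Hypothetical second chain following chain A in the same file
chain_b = [("B", n, "X") for n in range(1, 51)]

print(len(read_residues(chain_a, "A", 1, 312)))  # 312
print(len(read_residues(chain_a, "A", 1, 324)))  # 312: stops at end of file
extra = read_residues(chain_a + chain_b, "A", 1, 324)
print(len(extra), extra[-1][0])  # 324 B: chain B residues were read too
```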
Ben Webb, Modeller Caretaker