Missing Residues at the Start, As Well As in the Middle, of the Sequence of a Chain
Dear Modeller Discussion Forum Members,
I am trying to repair chain B of RCSB PDB entry 5BS8, the structure of DNA gyrase from Mycobacterium tuberculosis. For guidance, I used the example scripts for filling in missing residues with Modeller given at https://salilab.org/modeller/wiki/Missing_residues (in the Modeller Wiki), the "basic-example" tutorial on the main Modeller website, and a YouTube tutorial video. *Chain B is missing 2 residues at the start of the sequence associated with the chain in the PDB file: S423 and N424. It then contains the sequence "A(425)LVRRK(430)" (with atom records/coordinates), followed by a stretch of 6 missing residues, "S(431)ATDIG(436)".* I used the first script given at the above URL to generate a sequence file extracted from the PDB. *I then used the following as my alignment file, using the NCBI RefSeq entry NP_214519.2 for Mycobacterium tuberculosis gyrB (DNA gyrase subunit B):*
>P1;5bs8 structure:5bs8.pdb:FIRST:B:LAST:B:DNA Gyrase::: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ALVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
>P1;5bs8B_fill
sequence:::::::::
MAAQKKKAQDEYGAASITILEGLEAVRKRPGMYIGSTGERGLHHLIWEVVDNAVDEAMAGYATTVNVVLLEDGGVEVADDGRGIPVATHASGIPTVDVVMTQLHAGGKFDSDAYAISGGLHGVGVSVVNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGAPTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEVVDEVVSDVAEAPKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAGYSESVHTFANTINTHEGGTHEEGFRSALTSVVNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSEPQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKVVVNKAVSSAQARIAARKARELVRRKSATDIGGLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*

*I used the following as the script to run AutoModel to model only the selected residues:*
from modeller import *
from modeller.automodel import *    # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

class MyModel(AutoModel):
    def select_atoms(self):
        return Selection(self.residue_range('431:B', '436:B'))

a = MyModel(env, alnfile = '5bs8_B-alignment.ali',
            knowns = '5bs8', sequence = '5bs8B_fill')
a.starting_model= 1
a.ending_model  = 1
*This then raised the following error:*
    return Selection(self.residue_range('431:B', '436:B'))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files (x86)\Modeller10.5\modlib\modeller\coordinates.py", line 385, in residue_range
    start = self.residues[start]._num
            ~~~~~~~~~~~~~^^^^^^^
  File "C:\Program Files (x86)\Modeller10.5\modlib\modeller\coordinates.py", line 302, in __getitem__
    ret = modutil.handle_seq_indx(self, indx, self.mdl._indxres,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files (x86)\Modeller10.5\modlib\modeller\util\modutil.py", line 24, in handle_seq_indx
    int_indx = lookup_func(*args)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Program Files (x86)\Modeller10.5\modlib\modeller\coordinates.py", line 379, in _indxres
    self._report_bad_index(indx, suffix, "residue", 0)
  File "C:\Program Files (x86)\Modeller10.5\modlib\modeller\coordinates.py", line 372, in _report_bad_index
    raise KeyError("No such %s: %s" % (indxtyp, indx))
KeyError: 'No such residue: 431:B'
Next, I tried running it again after deleting the 424 "-"s that preceded the sequence in the structure-associated portion of the alignment file (>P1;5bs8 structure:5bs8.pdb:FIRST:B:LAST:B:DNA Gyrase:::) and replacing them with 2 "-"s for S423 and N424, and then once more without these 2 leading "-"s. Both times I got the same error: (...... *KeyError: 'No such residue: 431:B'*)
*Please advise me on how to fill in missing residues for a chain that (a) has coordinates only for a middle portion/domain of the full-length protein sequence (say, because only that portion/domain was crystallised and subjected to X-ray crystallography) and (b) is missing residues at the start of the chain (say, due to high B-factors) relative to the sequence associated with the solved structure of that chain, as can be seen in PDB viewers such as UCSF Chimera. Chain B of RCSB PDB 5BS8 is an example.*
Thanks, and regards, Siddhartha A. Barua, Ph.D.
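Before editing the alignment, it can help to confirm which residue numbers actually have coordinates in the template chain. The following is a minimal sketch in plain Python (no Modeller calls). The function names and the synthetic ATOM records are made up for illustration, and insertion codes and alternate locations are ignored.

```python
def residue_numbers(pdb_lines, chain_id):
    """Return the sorted residue numbers that have ATOM records in one chain."""
    seen = set()
    for line in pdb_lines:
        # Fixed PDB columns: record name 1-6, chain ID 22, residue number 23-26.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain_id:
            seen.add(int(line[22:26]))
    return sorted(seen)


def internal_gaps(numbers):
    """Return (first_missing, last_missing) pairs between consecutive residues."""
    return [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def make_atom_line(serial, res_name, chain_id, res_num):
    """Build a synthetic CA-only ATOM record (hypothetical, dummy coordinates)."""
    return "ATOM  %5d  CA  %3s %s%4d    %8.3f%8.3f%8.3f" % (
        serial, res_name, chain_id, res_num, 0.0, 0.0, 0.0)


if __name__ == "__main__":
    # Toy chain B with coordinates for 425-430 and 437-439 only.
    present = list(range(425, 431)) + [437, 438, 439]
    lines = [make_atom_line(i, "ALA", "B", n) for i, n in enumerate(present, 1)]
    numbers = residue_numbers(lines, "B")
    print("first residue with coordinates:", numbers[0])
    print("internal gaps:", internal_gaps(numbers))
```

With a real file, the lines would come from something like open("5bs8.pdb"), and residues missing before the first number reported (here 423 and 424) have to be read from the deposited sequence, since they leave no ATOM records.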
Hi Siddhartha, I hope you're well :).
I only skimmed your question, but I have successfully modelled missing residues with Modeller by following the link below. If you follow it step by step, you should be able to build these missing residues (if you try again and it still does not work, feel free to ask further and I will assist you :)).
Best, Joel
https://salilab.org/modeller/wiki/Missing_residues
Dear Ben (Ben Webb, Modeller Caretaker) and Joel (Subach),
Thanks a lot for your tips!
I tinkered with the alignment and Python script files and got Modeller to model the missing residues.
I found two possible solutions to the problem:
*1) Use of a dash at the beginning of the structure-derived sequence portion of the alignment file, for each of the residues that were missing relative to the full-length protein sequence, as per NCBI's RefSeq (Reference Sequence):*
For this, I used the following alignment file (shown with additional formatting at the relevant portions for emphasis; I used the plain-text version for modelling), in which I *explicitly specified the starting and ending residue positions of the structure segment* that has coordinates (apart from the short 6-residue stretch S(431)ATDIG(436), whose coordinates are missing):
>P1;5bs8_B structure:5bs8_B.pdb:*425:B:675:B*:DNA Gyrase::: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *A* LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV* >P1;5bs8B_fill sequence::::::::: MAAQKKKAQDEYGAASITILEGLEAVRKRPGMYIGSTGERGLHHLIWEVVDNAVDEAMAGYATTVNVVLLEDGGVEVADDGRGIPVATHASGIPTVDVVMTQLHAGGKFDSDAYAISGGLHGVGVSVVNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGAPTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEVVDEVVSDVAEAPKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAGYSESVHTFANTINTHEGGTHEEGFRSALTSVVNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSEPQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKVVVNKAVSSAQARIAARKAR *E**LVRRK**SATDIG* *GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV* *
To obtain a string of 424 dashes ("-"s) to paste at the start of the structure-derived sequence in the alignment file, without having to type and count them manually, I used the following short Python script. (It uses Python 3 syntax; older versions of Python use a different syntax for print statements, e.g. print "hello" vs print("hello"), so it would need to be modified accordingly.)

"""This script generates dashes. You need to enter the number of dashes to print, when prompted to do so."""

dashes = ""

n = int(input("Please enter the number of dashes that you want to print as a contiguous stretch of dashes. Enter a non-zero, positive integer: "))
for i in range(1, (n + 1)):
    dashes += "-"

print(dashes)
print("\n")
print(f"The number of dashes stored in the variable 'dashes' is {len(dashes)}.")
This modelled the long stretch of 424 missing residues at the start of the structure-derived sequence portion of the alignment file (the first of the two sequences in the file) as a long loop region, without secondary structures. I then simply deleted the unnecessary residues at the N-terminal part of each Modeller-generated model, in UCSF Chimera (i.e., I deleted residues 1-422) and saved the modified PDB file.
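The gap padding described above can also be done programmatically, which avoids counting dashes by hand and checks that both alignment rows end up the same length. This is only a sketch under stated assumptions: pad_template_row is a hypothetical helper, not part of Modeller, and the sequences in the example are toy strings rather than the real gyrB sequence.

```python
def pad_template_row(template_residues, first_residue_number, full_length):
    """Pad a template row with gaps so it lines up with the full-length target.

    template_residues: template sequence, already containing '-' for internal gaps
    first_residue_number: full-length number of the first template residue
    full_length: number of residues in the full-length target sequence
    """
    leading = "-" * (first_residue_number - 1)
    trailing = "-" * (full_length - len(leading) - len(template_residues))
    row = leading + template_residues + trailing
    if len(row) != full_length:
        raise ValueError("template row does not fit inside the target sequence")
    return row


if __name__ == "__main__":
    target = "MKTAYIAKQR"   # toy full-length target, 10 residues
    template = "AY--KQ"     # toy template covering residues 4-9, 2 missing inside
    row = pad_template_row(template, 4, len(target))
    print(row)
    print(target)
```

For the case in this thread, first_residue_number would be 425 (giving 424 leading dashes) and full_length would be the length of the RefSeq sequence in the second alignment entry.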
2) Use of only a portion of the full-length protein sequence from NCBI (NCBI RefSeq) as the target sequence (the second sequence listed in the alignment file). I used only residues 425-675, which match the residues present in the atom/structure file (a PDB file generated from the original PDB 5BS8 by selecting chain B and saving only the selected atoms as a separate PDB file), plus the 6 residues missing inside this chain (S(431)ATDIG(436)):
For this, in the alignment file, I specified the structure segment bearing atom records (coordinates) as 425:B:675:B, as shown below:
>P1;5bs8_B
structure:5bs8_B.pdb:425:B:675:B:DNA Gyrase:::
ALVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
>P1;5bs8B_fill
sequence:::::::::
ELVRRKSATDIGGLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
*In the Python script file, I only replaced the following line in the definition of the select_atoms function [def select_atoms(self):]*

return Selection(self.residue_range('431:B', '436:B'))

*with*

return Selection(self.residue_range('7:A', '12:A'))
This specifies the portion that is allowed to move during model generation/refinement; the rest of the atoms are not allowed to move. *The residue ranges '431:B', '436:B' and '7:A', '12:A' both refer to S(431)ATDIG(436)*. The former uses the numbering of the full-length sequence (NCBI RefSeq) and the chain ID of the structure file, whereas the latter uses the *numbering that Modeller gives to each newly generated model*, which *starts at residue number 1* in chain A.
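The conversion from full-length numbering to the model's default numbering is a simple offset, sketched below. The helper names are hypothetical (they are not Modeller functions), and the sketch assumes a single-chain model with Modeller's default numbering (residues from 1, chain A) and no insertion codes.

```python
def to_model_number(full_length_number, first_target_number):
    """Convert a full-length residue number to the model's default numbering.

    first_target_number is the full-length number of the first residue of the
    target sequence in the alignment (425 when the target starts at residue
    425, or 1 when the whole full-length sequence is used as the target).
    """
    return full_length_number - first_target_number + 1


def model_selection_range(start, end, first_target_number, model_chain="A"):
    """Return the two 'number:chain' strings to pass to residue_range()."""
    first = to_model_number(start, first_target_number)
    last = to_model_number(end, first_target_number)
    return ("%d:%s" % (first, model_chain), "%d:%s" % (last, model_chain))


if __name__ == "__main__":
    # Target truncated to residues 425-675 (second solution).
    print(model_selection_range(431, 436, 425))
    # Full-length target (first solution): numbers are unchanged, chain is A.
    print(model_selection_range(431, 436, 1))
```

The two strings returned would replace '431:B' and '436:B' in the select_atoms method.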
The two residues at positions 423 and 424 (as per the full-length sequence) could then be modelled as a dipeptide using UCSF Chimera's Build Structure tool and saved as a PDB file. This dipeptide could then be opened in UCSF Chimera along with the Modeller-generated model, and the two chains (Chimera-generated dipeptide and Modeller-generated model) joined into a single model by forming a peptide bond between them using the Join Model function/tool in UCSF Chimera.
Note that the start of the sequence with atom records/coordinates in chain B of PDB 5BS8 (*A*LVRRK...) differs from the corresponding sequence in the NCBI RefSeq (*E*LVRRK...) at a single residue. Modeller places E rather than A at this position in the model, because the model is built from the target sequence given as the second sequence in the alignment file. So, if I wanted "A" in the model, as in the structure file's sequence, I would need to make this change in the second (target) sequence in the alignment file.
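Differences like this one can be found before modelling by comparing the two alignment rows column by column. The following is a rough sketch, not a Modeller feature: aligned_mismatches is a hypothetical helper, the rows are given without the trailing "*", and the reported numbers assume the target row itself contains no gaps.

```python
def aligned_mismatches(template_row, target_row, first_number=1):
    """List columns where both rows have a residue but the residues differ.

    Returns (number, template_residue, target_residue) tuples, where number is
    first_number plus the column offset.
    """
    if len(template_row) != len(target_row):
        raise ValueError("alignment rows must have the same length")
    mismatches = []
    for offset, (t, s) in enumerate(zip(template_row, target_row)):
        # Gap columns are missing residues, not substitutions, so skip them.
        if t != "-" and s != "-" and t != s:
            mismatches.append((first_number + offset, t, s))
    return mismatches


if __name__ == "__main__":
    # First 16 columns of the truncated alignment from this thread.
    template = "ALVRRK------GLPG"
    target = "ELVRRKSATDIGGLPG"
    print(aligned_mismatches(template, target, first_number=425))
```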
Thanks, and regards, Siddhartha
On 5/28/24 11:48 PM, Siddhartha Barua via modeller_usage wrote: > *KeyError: 'No such residue: 431:B'*
Residues in the model are by default numbered starting at 1 and the chains labeled alphabetically starting at A. Since you only have a single chain, it will be labeled A, not B. See
https://salilab.org/modeller/10.5/manual/node23.html
If you want to number the residues differently, see
https://salilab.org/modeller/10.5/manual/node30.html
It looks like you are mistakenly using the template residue numbering here.
Ben Webb, Modeller Caretaker
Dear Ben (Ben Webb, Modeller Caretaker) and Joel (Subach),
Thanks a lot for your tips!
I tinkered with the alignment and Python script files and got Modeller to model the missing residues.
I found two possible solutions to the problem:
*1) Use of a dash at the beginning of the structure-derived sequence portion of the alignment file, for each of the residues that were missing relative to the full-length protein sequence, as per NCBI's RefSeq (Reference Sequence):*
For this, I used the following alignment file (with additional formatting at the relevant portions, for emphasis- but I used the plain text version for modelling), where I *explicitly specified the starting and ending residue positions of the model segment* that had coordinates (except for the short 6-residue stretch at S(431)ATDIG(436) (with missing coordinates)):
>P1;5bs8_B structure:5bs8_B.pdb:*425:B:675:B*:DNA Gyrase::: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *A* LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV* >P1;5bs8B_fill sequence::::::::: MAAQKKKAQDEYGAASITILEGLEAVRKRPGMYIGSTGERGLHHLIWEVVDNAVDEAMAGYATTVNVVLLEDGGVEVADDGRGIPVATHASGIPTVDVVMTQLHAGGKFDSDAYAISGGLHGVGVSVVNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGAPTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEVVDEVVSDVAEAPKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAGYSESVHTFANTINTHEGGTHEEGFRSALTSVVNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSEPQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKVVVNKAVSSAQARIAARKAR *E**LVRRK**SATDIG* *GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV* *
To obtain the string of 424 dashes ("-") for the above alignment file without typing and counting them by hand, I used the following short Python script and pasted its output at the start of the structure-derived sequence in the alignment file. It is written for Python 3 (the f-string in the last line needs Python 3.6 or later); in Python 2, print is a statement rather than a function [e.g.: print "hello" vs print("hello")], so the print lines would need to be adapted:
"""This script generates dashes. Enter the number of dashes to print when prompted."""
# Read the desired number of dashes from the user.
n = int(input("Please enter the number of dashes that you want to print as a contiguous stretch of dashes. Enter a non-zero, positive integer: "))
# String repetition builds the run of dashes without an explicit loop.
dashes = "-" * n
print(dashes)
print("\n")
# Report the length so that the count can be double-checked.
print(f"The number of dashes stored in the variable 'dashes' is {len(dashes)}.")
This modelled the long stretch of 424 missing residues at the start of the structure-derived sequence portion of the alignment file (the first of the two sequences in the file) as a long loop region, without secondary structures. I then simply deleted the unnecessary residues at the N-terminal part of each Modeller-generated model, in UCSF Chimera (i.e., I deleted residues 1-422) and saved the modified PDB file.
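For anyone who prefers to script the trimming step instead of deleting the residues by hand in UCSF Chimera, here is a minimal sketch in plain Python. It is not part of Modeller or Chimera: the names drop_residues and atom_line are hypothetical, it assumes standard fixed-column PDB records (chain ID in column 22, residue number in columns 23-26), and the example records use dummy residue names and coordinates.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Hypothetical helper: build a fixed-column PDB ATOM record (dummy coordinates)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def drop_residues(pdb_lines, first, last, chain=None):
    """Return the PDB lines without ATOM/HETATM records of residues first..last.

    If chain is given, only residues of that chain are removed.
    All other record types are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21]          # column 22: chain identifier
            resnum = int(line[22:26])    # columns 23-26: residue sequence number
            in_range = first <= resnum <= last
            if in_range and (chain is None or chain_id == chain):
                continue                 # skip this atom record
        kept.append(line)
    return kept


# Dummy example: residues 1 and 422 are removed, 423 and 424 are kept.
example_pdb = [
    atom_line(1, "CA", "ALA", "A", 1),
    atom_line(2, "CA", "ALA", "A", 422),
    atom_line(3, "CA", "ALA", "A", 423),
    atom_line(4, "CA", "ALA", "A", 424),
    "END",
]
trimmed = drop_residues(example_pdb, 1, 422, chain="A")
kept_numbers = [int(l[22:26]) for l in trimmed if l.startswith("ATOM")]
print(kept_numbers)
```

To use it on a real model, the lines would be read from the Modeller-generated PDB file and the result written to a new file. TER records are passed through unchanged here and may need checking by hand.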
2) Use of only a portion of the full-length protein sequence from NCBI (NCBI RefSeq) as the target sequence (the second sequence listed in the alignment file): I used only the residues corresponding to the region 425-675. This matches exactly the range of residues present in the atom/structure file (a PDB file generated from the original PDB 5BS8 by selecting chain B and saving only the selected atoms as a separate PDB file), except for the 6 missing residues inside this chain (S(431)ATDIG(436)):
For this, in the alignment file, I specified the segment bearing atom records (coordinates) as 425:B:675:B, as shown below (the asterisks around the first residue of each sequence are only for emphasis):
>P1;5bs8_B
structure:5bs8_B.pdb:425:B:675:B:DNA Gyrase:::
*A*LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
>P1;5bs8B_fill
sequence:::::::::
*E*LVRRKSATDIGGLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
In the Python script file, I only replaced the following line in the definition of the select_atoms function [def select_atoms(self):]
return Selection(self.residue_range('431:B', '436:B'))
with
return Selection(self.residue_range('7:A', '12:A'))
This restricts movement during model generation/refinement to the selected residues; the rest of the atoms stay fixed. Both residue ranges refer to the same stretch, S(431)ATDIG(436). The range '431:B' to '436:B' uses the template numbering (which follows the full-length NCBI RefSeq sequence) and the template's chain ID, whereas '7:A' to '12:A' uses the numbering that Modeller gives to each newly generated model, which *starts with residue number 1* and labels the single chain A.
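The conversion from template numbering to model numbering can be sketched in plain Python. This is only an illustration, not something taken from the Modeller documentation: the function names are hypothetical, the aligned strings are truncated to the first 17 columns of the alignment above, and it assumes the model is numbered from 1 along the target sequence with chain ID A.

```python
def to_model_number(template_resnum, template_start):
    """Convert a template residue number to model numbering (model starts at 1)."""
    return template_resnum - template_start + 1


def gap_runs_in_model_numbering(template_aln, target_aln):
    """Return (start, end) model residue numbers for each run of template gaps.

    A run is a stretch of columns where the template has '-' and the
    target has a residue, i.e. residues that must be built from scratch.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    model_pos = 0        # number of target residues seen so far
    run_start = None     # model number where the current gap run began
    for t, s in zip(template_aln, target_aln):
        if s == "-":
            # No model residue in this column; close any open run.
            if run_start is not None:
                runs.append((run_start, model_pos))
                run_start = None
            continue
        model_pos += 1
        if t == "-":
            if run_start is None:
                run_start = model_pos
        elif run_start is not None:
            runs.append((run_start, model_pos - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, model_pos))
    return runs


# Truncated example taken from the start of the second alignment.
template_aln = "ALVRRK------GLPGK"
target_aln = "ELVRRKSATDIGGLPGK"

runs = gap_runs_in_model_numbering(template_aln, target_aln)
first = to_model_number(431, template_start=425)
last = to_model_number(436, template_start=425)
# Residue identifiers in the 'number:chain' form used by residue_range.
selection_ids = (f"{first}:A", f"{last}:A")
print(runs, selection_ids)
```

Both routes give the same answer for this case: the six template gaps sit at model residues 7 to 12, which is the range passed to residue_range in the edited script.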
The two residues at positions 423 and 424 (as per the full-length sequence) could then be modelled as a dipeptide using UCSF Chimera's Build Structure tool and saved as a PDB file. This dipeptide could then be opened in UCSF Chimera along with the Modeller-generated model, and the two chains (the Chimera-generated dipeptide and the Modeller-generated model) could be joined into a single model by forming a peptide bond between them with the Join Model function/tool in UCSF Chimera.
Note that the start of the sequence of chain B in the PDB 5BS8 that has atom records/coordinates (sequence *A*LVRRK...) differs from the corresponding sequence in the NCBI RefSeq (where it is *E*LVRRK...) by the identity of a single residue. Modeller puts E rather than A at the start of this sequence in the model, because the model is built for the target sequence, which is the second sequence in the alignment file. So, if I wanted it to be "A" in the model, as in the structure file's sequence, I would need to make this change in the second (target) sequence listed in the alignment file.
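A quick way to spot such differences before running Modeller is to compare the two aligned sequences column by column. The sketch below is plain Python with a hypothetical function name, and it uses only the first 17 columns of the alignment as a truncated example; positions are reported in model numbering (counting target residues from 1).

```python
def aligned_mismatches(template_aln, target_aln):
    """List (model_position, template_residue, target_residue) for every
    aligned column where both sequences have a residue and they differ."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    mismatches = []
    model_pos = 0  # running count of target residues, i.e. model numbering
    for t, s in zip(template_aln, target_aln):
        if s != "-":
            model_pos += 1
        # Gap columns are not mismatches; only compare residue to residue.
        if t != "-" and s != "-" and t != s:
            mismatches.append((model_pos, t, s))
    return mismatches


# Truncated example taken from the start of the second alignment.
template_aln = "ALVRRK------GLPGK"
target_aln = "ELVRRKSATDIGGLPGK"

mismatches = aligned_mismatches(template_aln, target_aln)
print(mismatches)
```

For the truncated example this reports the single A/E difference at model residue 1; the gap columns are skipped because they are missing residues rather than substitutions.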
Thanks, and regards, Siddhartha
On Wed, May 29, 2024 at 12:30 PM Modeller Caretaker <modeller-care@salilab.org> wrote:
> On 5/28/24 11:48 PM, Siddhartha Barua via modeller_usage wrote:
> > *KeyError: 'No such residue: 431:B'*
>
> Residues in the model are by default numbered starting at 1 and the chains labeled alphabetically starting at A. Since you only have a single chain, it will be labeled A, not B.
> See https://salilab.org/modeller/10.5/manual/node23.html
> If you want to number the residues differently, see https://salilab.org/modeller/10.5/manual/node30.html
>
> It looks like you are mistakenly using the template residue numbering here.
>
> Ben Webb, Modeller Caretaker
> --
> modeller-care@salilab.org https://salilab.org/modeller/
> Modeller mail list: https://salilab.org/mailman/listinfo/modeller_usage
Hi Siddhartha, you're welcome :) and feel free to ask again any time you have followed the instructions and still get an error.
participants (3)
- Joel Subach
- Modeller Caretaker
- Siddhartha Barua