Hi Siddhartha, you're welcome :) Feel free to ask again whenever you have followed the instructions and still get an error.
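On the dash-counting step described in the message quoted below: Python's string repetition gives the same run of dashes as the loop, with no prompt needed. This is only a sketch. The helper names gap_run and pad_structure_sequence are made up for illustration, and 424 is the number of leading missing residues reported in the quoted message.

```python
def gap_run(n: int) -> str:
    """Return a contiguous run of n gap characters ('-')."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return "-" * n


def pad_structure_sequence(observed: str, n_missing_start: int) -> str:
    """Prefix a structure-derived sequence with one gap per missing leading residue."""
    return gap_run(n_missing_start) + observed


if __name__ == "__main__":
    gaps = gap_run(424)
    print(gaps)
    print(f"Number of dashes: {len(gaps)}")
```

Printing len(gaps) next to the string is a cheap check that the pasted run has the intended length before it goes into the alignment file.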
On Thu, May 30, 2024 at 9:10 AM Siddhartha Barua via modeller_usage <modeller_usage@salilab.org> wrote:
> Dear Ben (Ben Webb, Modeller Caretaker) and Joel (Subach), > > Thanks a lot for your tips! > > I tinkered with the alignment and Python script files and got Modeller to > model the missing residues. > > I found two possible solutions to the problem: > > *1) Use of a dash at the beginning of the structure-derived sequence > portion of the alignment file, for each of the residues that were missing > relative to the full-length protein sequence, as per NCBI's RefSeq > (Reference Sequence):* > > For this, I used the following alignment file (with additional formatting > at the relevant portions, for emphasis- but I used the plain text version > for modelling), where I *explicitly specified the starting and ending > residue positions of the model segment* that had coordinates (except for > the short 6-residue stretch at S(431)ATDIG(436) (with missing coordinates)): > > >P1;5bs8_B > structure:5bs8_B.pdb:*425:B:675:B*:DNA Gyrase::: > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > *A* > LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV* > >P1;5bs8B_fill > sequence::::::::: > > 
> MAAQKKKAQDEYGAASITILEGLEAVRKRPGMYIGSTGERGLHHLIWEVVDNAVDEAMAGYATTVNVVLLEDGGVEVADDGRGIPVATHASGIPTVDVVMTQLHAGGKFDSDAYAISGGLHGVGVSVVNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGAPTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEVVDEVVSDVAEAPKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAGYSESVHTFANTINTHEGGTHEEGFRSALTSVVNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSEPQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKVVVNKAVSSAQARIAARKAR
> *E**LVRRK**SATDIG*
> *GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
> *
>
> To obtain the string of 424 dashes ("-") for the above alignment file
> without typing and counting them by hand, and then paste it at the start of
> the structure-derived sequence part of the alignment file, I used the
> following short Python script. (It may need changes depending on the Python
> version, since some older versions use a different syntax for print
> statements, e.g. print "hello" vs print("hello").)
>
> """This script generates dashes. You need to enter the number of dashes to
> print, when prompted to do so."""
> dashes = ""
>
> n = int(input("Please enter the number of dashes that you want to print "
>               "as a contiguous stretch of dashes. Enter a non-zero, positive integer: "))
> for i in range(1, n + 1):
>     dashes += "-"
>
> print(dashes)
> print("\n")
> print(f"The number of dashes stored in the variable 'dashes' is "
>       f"{len(dashes)}.")
>
> This modelled the long stretch of 424 missing residues at the start of the
> structure-derived sequence portion of the alignment file (the first of the
> two sequences in the file) as a long loop region, without secondary
> structures.
> I then simply deleted the unnecessary residues at the N-terminal part of
> each Modeller-generated model in UCSF Chimera (i.e., I deleted residues
> 1-422) and saved the modified PDB file.
>
> 2) Use only the portion of the full-length NCBI RefSeq protein sequence
> that corresponds to residues 425-675 as the target sequence (the second
> sequence listed in the alignment file). This range matches the residues
> present in the structure file used (a PDB file generated from the original
> PDB entry 5BS8 by selecting chain B and saving only the selected atoms as a
> separate PDB file), except for the 6 residues missing inside this chain
> (S(431)ATDIG(436)):
>
> For this, in the alignment file, I specified the segment that has atom
> records (coordinates) as 425:B:675:B, as shown below:
>
> >P1;5bs8_B
> structure:5bs8_B.pdb:425:B:675:B:DNA Gyrase:::
> *A*
> LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
> >P1;5bs8B_fill
> sequence:::::::::
> *E*
> LVRRKSATDIGGLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
>
> In the Python script file, I only replaced the following line in the
> definition of the select_atoms function [def select_atoms(self):]
>
>     return Selection(self.residue_range('431:B', '436:B'))
>
> with
>
>     return Selection(self.residue_range('7:A', '12:A'))
>
> This restricts movement during model generation/refinement to the selected
> residues; the rest of the atoms are not allowed to move.
> The residue ranges '431:B', '436:B' and '7:A', '12:A' both refer to
> S(431)ATDIG(436). The first uses the numbering of the full-length sequence
> (NCBI RefSeq); the second uses the *numbering of residues given by Modeller
> to each of the newly generated models*, which *starts with residue number 1*.
>
> The two residues at positions 423 and 424 (as per the full-length sequence)
> could then be modelled as a dipeptide using UCSF Chimera's Build Structure
> and saved as a PDB file. With that file opened in UCSF Chimera alongside
> the Modeller-generated model, the two chains (Chimera-generated dipeptide
> and Modeller-generated model) could be joined into a single model by
> forming a peptide bond between them using the Join Model function/tool in
> UCSF Chimera.
>
> Note that the start of the sequence of residues in chain B of PDB 5BS8
> that has atom records/coordinates (sequence *A*LVRRK...) differs from
> the corresponding sequence in the NCBI RefSeq (where it is *E*LVRRK...)
> by the identity of a single residue. Modeller puts E rather than A at the
> start of this sequence because the model is built from the target sequence,
> which is the second sequence in the alignment file. So, if I wanted it to
> be "A" in the model, as in the structure file's sequence, I would need to
> make this change in the second sequence (target sequence) listed in the
> alignment file.
>
> Thanks, and regards,
> Siddhartha
>
> On Wed, May 29, 2024 at 12:30 PM Modeller Caretaker <modeller-care@salilab.org> wrote:
>
>> On 5/28/24 11:48 PM, Siddhartha Barua via modeller_usage wrote:
>> > *KeyError: 'No such residue: 431:B'*
>>
>> Residues in the model are by default numbered starting at 1 and the
>> chains labeled alphabetically starting at A. Since you only have a
>> single chain, it will be labeled A, not B.
>> See https://salilab.org/modeller/10.5/manual/node23.html
>> If you want to number the residues differently, see
>> https://salilab.org/modeller/10.5/manual/node30.html
>>
>> It looks like you are mistakenly using the template residue numbering
>> here.
>>
>> Ben Webb, Modeller Caretaker
>> --
>> modeller-care@salilab.org https://salilab.org/modeller/
>> Modeller mail list: https://salilab.org/mailman/listinfo/modeller_usage
>>
>
>
> --
> Siddhartha A. Barua, Ph.D.
> Mb.: +91 7777093994
> _______________________________________________
> modeller_usage mailing list
> modeller_usage@salilab.org
> https://salilab.org/mm/postorius/lists/modeller_usage.salilab.org/
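
The renumbering Ben describes in the quoted thread (model residues counted from 1, single chain labeled A) can be checked with a small helper. This is a sketch and not part of Modeller: the function names are hypothetical, and it assumes the default model numbering plus a target sequence whose first residue corresponds to template residue 425, as in the second alignment quoted above.

```python
def template_to_model_resnum(template_resnum: int, first_template_resnum: int) -> int:
    """Map a template (PDB) residue number to the default 1-based model numbering.

    Assumes the target sequence starts at first_template_resnum and that the
    template numbering is contiguous from there on.
    """
    if template_resnum < first_template_resnum:
        raise ValueError("residue lies before the start of the modelled segment")
    return template_resnum - first_template_resnum + 1


def model_residue_range(start: int, end: int, first_template_resnum: int,
                        chain: str = "A") -> tuple:
    """Return ('N:chain', 'M:chain') identifiers in model numbering."""
    first = template_to_model_resnum(start, first_template_resnum)
    last = template_to_model_resnum(end, first_template_resnum)
    return (f"{first}:{chain}", f"{last}:{chain}")


if __name__ == "__main__":
    # Template residues 431-436 (SATDIG), with the model starting at residue 425.
    print(model_residue_range(431, 436, 425))
```

With the model starting at template residue 425, residues 431 and 436 map to 7 and 12, which matches the '7:A', '12:A' range used in the quoted select_atoms change. If the full-length sequence is modelled instead (the first approach quoted above), first_template_resnum would be 1 and only the chain label changes.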