---------- Forwarded message ---------
From: Anwesha Mohapatra <anwesha.mohapatra11@gmail.com>
Date: Tue, Mar 31, 2020 at 11:58 PM
Subject: Re: [modeller_usage] reg: issue in importing modeller
To: Modeller Caretaker <modeller-care@salilab.org>


Dear Sir,
I read the Modeller forum posts on multi-chain proteins, but I am not sure whether I have understood them correctly. I am a novice in structural biology and therefore have many queries.
1) In the case of a homomeric protein:
For example, the protein with PDB ID 1sv6 is my template, and it has 5 chains (A-E). To align my target protein with this template, I copied and pasted the sequence of my target 5 times, separated by the delimiter '/'.
Aligning the template (1SV6) with my target gene ABM40476.1 gave the following result. Is this the right way to proceed for homomeric proteins?
>P1;1sv6.pdb
structureX:1sv6.pdb:   1 :A:+1305:E:MOL_ID  1; MOLECULE  2-KETO-4-PENTENOATE HYDRATASE; CHAIN  A, B, C, D, E; SYNONYM  2-HYDROXYPENTADIENOIC ACID HYDRATASE; EC  4.2.1.-; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  ESCHERICHIA COLI; ORGANISM_TAXID  562; GENE  MHPD, B0350; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562: 2.90: 0.24
MT--KHTLEQLAADLRRAAEQGEAIAPLRDLIGIDNAEAAYAIQHINVQHDVAQGRRVVGRKVGLTHPKVQQQLG
VDQPDFGTLFADMCYGDNEIIPFSRVLQPRIEAEIALVLNRDLPATDITFDELYNAIEWVLPALEVVGSRIRDWS
IQFVDTVADNASCGVYVIGGPAQRPAGLDLKNCAMKMTRNNEEVSSGRGSECLGHPLNAAVWLARKMASLGEPLR
TGDIILTGALGPMVAVNAGDRFEAHIEGIGSVAATFSS/
MT--KHTLEQLAADLRRAAEQGEAIAPLRDLIGIDNAEAAYAIQHINVQHDVAQGRRVVGRKVGLTHPKVQQQLGVD
QPDFGTLFADMCYGDNEIIPFSRVLQPRIEAEIALVLNRDLPATDITFDELYNAIEWVLPALEVVGSRIRDWSIQFVDT
VADNASCGVYVIGGPAQRPAGLDLKNCAMKMTRNNEEVSSGRGSECLGHPLNAAVWLARKMASLGEPLRTGDIILT-G
ALGPMVAVNAGDRFEAHIEGIGSVAATFSS/
MTKHTLEQLAADLRRAAEQGEAIAPLRD XXXXXXXXXXX repetition of the chain  XXXXXXXXXXXXX*

>P1;ABM40476.1_-_Acidovorax_sp._JS42
sequence:ABM40476.1_-_Acidovorax_sp._JS42:     : :     : ::: 0.00: 0.00
MTMTPALIEQLGDELYQALTQRRMLEPLTNRHADITIDDAYAIQQKMLARRLAAGEKVVGKKIGVTSKAVMDMLG
VFQPDFGWLTDGMVFNEGQAVQANTLIQPKAEGEIAFVLKKTLKGPGITAADVLAATEGVMACFEIVDSRIRDWK
IKIQDTVADNASCGVFVLGDRLVDPRDVDLGTCGMVLEKNGDIVATGAGAA------------------------
--------ALGH-PA-NA-------------------V/
MTMTPALIEQLGDELYQALTQRRMLEPLTNRHADIT
IDDAYAIQQKMLARRLAAGEKVVGKKIGVTSKAVMDMLGVFQPDFGWLTDGMVFNEGQAVQANTLIQPKAEGEIA
FVLKKTLKGPGITAADVLAATEGVMACFEIVDSRIRDWKIKIQDTVADNASCGVFVLGDRLVDPRDVDLGTCGMV
LEKNGDIVATGAGAAALGHPANA----------------------V/
MTMTPALIEQLGDE-------------LYQA-LTQR---RMLEPLTNR------HADIT----IDD---AYAIQQKMLARRLAAGEKVVGK
KIGVTSKAVMDMLGVFQPDFGWLTDGMVFNEGQAVQANTLIQPKAEGEIAFVLKKTLKGPGITAADVLAATEGVM
ACFEIVDSRIRDWKIKIQDTVADNASCGVFVLGDRLVDPRDVDLGTCGMVLEKNGDIVATGAGAAALGHPANA--------------
--------V/
(XXXX and so on )--------*
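
The copy-and-paste step described in point 1 (one copy of the target sequence per template chain, separated by '/', with '*' at the end) can be sketched in plain Python as follows. This is only a minimal sketch: the helper names `repeat_chains` and `pir_target_entry` are hypothetical, the 75-character line width is an assumption, and the entry it builds is unaligned (no gaps), so it would still have to be aligned against the template afterwards.

```python
def repeat_chains(sequence, n_chains):
    """Join n_chains copies of one sequence with '/' and terminate with '*'."""
    if n_chains < 1:
        raise ValueError("n_chains must be at least 1")
    return "/".join([sequence] * n_chains) + "*"


def pir_target_entry(code, sequence, n_chains, width=75):
    """Build an unaligned PIR 'sequence' entry for a homomeric target.

    The description line has 10 colon-separated fields, as in the
    'sequence:' line shown in the alignment above.
    """
    body = repeat_chains(sequence, n_chains)
    # Wrap the long one-line sequence to a fixed width (assumed: 75 columns).
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    header = ">P1;" + code
    description = "sequence:" + code + ":::::::0.00: 0.00"
    return "\n".join([header, description] + wrapped) + "\n"


if __name__ == "__main__":
    # Short placeholder sequence, not the real target.
    print(pir_target_entry("target", "MTMTPALIEQ", 5))
```

Generating the entry this way avoids the uneven line breaks and stray gaps that creep in when the chains are pasted by hand.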

2.) To model this protein further, should I use MyModel instead of automodel?
If so, do I need to change this part of the code according to the number of chains in the template (highlighted below)?
  s1 = selection(self.chains['A']).only_atom_types('CA')
  s2 = selection(self.chains['B']).only_atom_types('CA') #Should the C,D,E chains be appended to self.restraints.symmetry?
  self.restraints.symmetry.append(symmetry(s1, s2, 1.0))
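
A minimal sketch of how the snippet above might be extended to all five chains. It assumes that the model chains are labelled A-E and that restraining consecutive pairs (A-B, B-C, C-D, D-E) is an acceptable choice; neither assumption is confirmed in this thread. `adjacent_chain_pairs` and `make_symmetric_model_class` are hypothetical helper names, while `selection`, `symmetry`, `automodel`, `special_restraints` and `self.restraints.symmetry` are the Modeller names already used in the quoted code.

```python
def adjacent_chain_pairs(chain_ids):
    """Return consecutive chain pairs, e.g. 'ABC' -> [('A','B'), ('B','C')]."""
    ids = list(chain_ids)
    return list(zip(ids, ids[1:]))


def make_symmetric_model_class(chain_ids):
    """Build a MyModel class that restrains consecutive chains to be similar.

    Modeller is imported here, not at module level, so the helper above
    can be used and tested without a Modeller installation.
    """
    from modeller import selection, symmetry
    from modeller.automodel import automodel

    class MyModel(automodel):
        def special_restraints(self, aln):
            # One CA-only symmetry restraint per consecutive chain pair
            # (assumption: this pairing is sufficient for the homomer).
            for first, second in adjacent_chain_pairs(chain_ids):
                s1 = selection(self.chains[first]).only_atom_types('CA')
                s2 = selection(self.chains[second]).only_atom_types('CA')
                self.restraints.symmetry.append(symmetry(s1, s2, 1.0))

        def user_after_single_model(self):
            # Report symmetry violations larger than 1 angstrom.
            self.restraints.symmetry.report(1.0)

    return MyModel
```

With Modeller installed, `MyModel = make_symmetric_model_class("ABCDE")` would then be used in place of automodel in the modelling script.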
3). I also have a heteromeric protein, 1O7G, which has a large and a small subunit as chains A and B.
The target gene that I want to model shows sequence similarity only to chain A. What should my alignment file look like in this case?
For example, if the template is
>P1;1o7g
structure::::
MNOPTHSWYRTY....XXXXXXXXX...../
MRSF...........* 
then should my target gene, which has aligned only to chain A, be written as
>P1;TargetGene
sequence:::::::
MNOPTYIKL--DFT--WANH--XXX--/
-----------------------------------------------*

Sir, I request you to please guide me on this issue, as I am stuck and unable to proceed further.
Thanks & Regards
Anwesha


On Sat, Mar 28, 2020 at 6:21 PM Anwesha Mohapatra <anwesha.mohapatra11@gmail.com> wrote:
Hello Sir,

Could you please suggest how to proceed? I have attached my files and described their corresponding scenarios in the previous mail.

Kindly guide me.

Thanks & Regards
Anwesha

On Fri, Mar 27, 2020, 7:25 PM Anwesha Mohapatra <anwesha.mohapatra11@gmail.com> wrote:
Dear Sir,

Firstly, thank you. The previous Fortran error was resolved when I used relative paths instead of absolute ones and shorter file names.
I have a few queries, which I have listed below:
-----------------------------------------------------------------------
1. As per our previous discussion regarding multi-chain proteins, I have used the template 1o7g to align against my query gene sequence.
In the alignment I do see a chain break '/' in the template sequence; however, the query protein is aligned to the entire template protein (to both chains A and B).
But I know that this same query sequence aligns to chain A of the template, as per my Blastp result. I can see lots of gaps added to the query gene when it is aligned to the entire protein. Is this correct?
I have attached the alignment files in 'query.zip'. One file shows the alignment of the target with all chains of the template, and the other shows it aligned to chain A only.

2. Since the PDB file (1o7g) that I am using as a template has all the HETATM records after the chains, I have added the HETATM information after the chains in the alignment file as well.
On modelling, however, I don't see the HETATM records in the PDB file that is created. Moreover, I am unable to open the file in PyMOL (*** buffer overflow detected ***: python2.7 terminated).
I have attached 'ModelError_Test1.zip', which contains the input alignment file and the generated model in .pdb format. Could you tell me whether I have made any mistakes in the alignment file?

3. Since Modeller only reads ATOM and HETATM records, I tried rewriting the template PDB file so that the HETATM records come directly after the chain.
I have changed the alignment files accordingly.
However, in this case Modeller is not able to open the file and says it is a corrupt PDB file. I have attached the PDB, alignment and log files in the folder named "Query3.zip".

4. I do not understand the concept of missing residues.
If some residues are missing in the template, how should I proceed?
Do I need to check for missing residues in each template file?
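
One way to check a template for missing residues (point 4) is to read the REMARK 465 records that PDB-format files use to list residues not located in the experiment. The following is a minimal sketch under stated assumptions: `missing_residues` is a hypothetical helper, it expects whitespace-separated "residue name, chain, number" entries (optionally preceded by a model number), and it skips entries whose residue number carries an insertion code.

```python
def missing_residues(pdb_lines):
    """Return (chain, residue number, residue name) for each REMARK 465 entry."""
    found = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if len(fields) == 4:
            # Drop the optional leading model number.
            fields = fields[1:]
        if len(fields) != 3:
            continue  # explanatory text or blank remark line
        res_name, chain_id, seq = fields
        if len(chain_id) != 1:
            continue
        try:
            seq_num = int(seq)
        except ValueError:
            continue  # column header, or a number with an insertion code
        found.append((chain_id, seq_num, res_name))
    return found


if __name__ == "__main__":
    # Illustrative records only, not taken from 1o7g or 1sv6.
    example = [
        "REMARK 465 MISSING RESIDUES",
        "REMARK 465   M RES C SSSEQI",
        "REMARK 465     MET A     1",
        "REMARK 465     THR A     2",
        "ATOM      1  N   LYS A   3      11.104  13.207   9.100  1.00 20.00           N",
    ]
    print(missing_residues(example))
```

For a real template this could be called as `missing_residues(open("template.pdb"))`, where the file name is a placeholder.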

Kindly help me with these queries.
Thanks & Regards
Anwesha

On Tue, Mar 24, 2020 at 7:51 AM Anwesha Mohapatra <anwesha.mohapatra11@gmail.com> wrote:
Thank you Sir, I will try with shorter names and relative paths

On Tue, Mar 24, 2020, 7:48 AM Modeller Caretaker <modeller-care@salilab.org> wrote:
On 3/21/20 11:15 PM, Anwesha Mohapatra wrote:
> I am sending the input files and the code I have been using .

I think that what's happening here is that your alignment file names are
really long, and you're hitting an internal limit inside Modeller (your
input files work for me, but I can reproduce your problem if I put the
files inside a directory with a very long name). This will be fixed in
the next Modeller release. In the meantime you can work around it by
using shorter file or directory names, or using relative rather than
absolute paths to your alignment files.
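
The relative-path workaround can be sketched with the Python standard library. This is only an illustration: `short_relative_path` is a hypothetical helper, the example directory names are made up, and the size of the internal limit is not stated in this thread, so no specific length is checked.

```python
import os


def short_relative_path(path, start=None):
    """Return path relative to start (default: the current working directory)."""
    return os.path.relpath(path, start if start is not None else os.getcwd())


if __name__ == "__main__":
    # Made-up example paths: run the script from inside the long directory
    # and pass Modeller only the short relative file name.
    long_dir = "/home/user/a_very_long_project_directory_name/run1"
    print(short_relative_path(os.path.join(long_dir, "aln.ali"), long_dir))
```

Running the modelling script from the directory that holds the alignment file, and passing only the short relative name, keeps the path handed to Modeller as short as possible.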

        Ben Webb, Modeller Caretaker
--
modeller-care@salilab.org             https://salilab.org/modeller/
Modeller mail list: https://salilab.org/mailman/listinfo/modeller_usage