[modeller_usage] Fibrillar collagen structure model 3hr2

27 Jan 2021


      Dear Modeller Users,
I am trying to use Modeller to build a comparative structure model of
type I collagen.
I came to this while trying to reproduce a model from this paper: doi
10.1021/nl103943u
( *Nano Lett.* 2011, 11, 2, 757–766 )
That work also used Modeller to create such a model. I contacted the first
author for instructions on how he created it, but unfortunately he said he
no longer remembers (it has been 10 years).
I also want to mention that before posting this, I read the Modeller manual
and several questions and answers posted on the mailing list. They helped
me get to this point.
Below I describe in some detail what I have done, with which files and
how.
The files needed for reproduction are attached to this email (I tried to
attach everything - pdb, target fasta, outputs, etc. - but that exceeded
the limit of 100 KB).
*1 - Target:* P02452 (CO1A1_HUMAN) and P08123 (CO1A2_HUMAN)
from https://www.uniprot.org/uniprot/P02452 and
https://www.uniprot.org/uniprot/P08123
The targets are in FASTA format. I converted them to .ali files in PIR
format, as suggested by Ben Webb in
https://salilab.org/archives/modeller_usage/2010/msg00072.html
(see FASTAtoPIR.py in the attached files).
I manually edited the protein code and field 2 of the .ali files (based on
what I read in the manual - *File formats B.1 Alignment file (PIR)*).
This is what the file looks like now:
* COL1A1.ali*
___________________________________________________________________
>P1;COL1A1
sequence:COL1A1:     : :     : :::-1.00:-1.00
MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRICVCDNGKVLCDDVIC
...*
___________________________________________________________________
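For readers without the attachment, the FASTA-to-PIR step can be sketched in plain Python. This is not the attached FASTAtoPIR.py, only a minimal illustration. The function name fasta_to_pir, the 75-character line width and the -1.00 placeholders in the last two header fields are assumptions chosen to match the file shown above.

```python
def fasta_to_pir(fasta_text, code, width=75):
    """Convert a single-record FASTA string into a PIR 'sequence' entry.

    'code' is the protein code written after '>P1;' and in field 2.
    The header line has 10 colon-separated fields; fields 3-8 are left
    empty and the last two are set to -1.00 (assumed placeholders).
    """
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    residues = "".join(lines[1:])
    if not residues:
        raise ValueError("FASTA record contains no sequence")

    header = ">P1;%s" % code
    fields = ":".join(["sequence", code, "", "", "", "", "", "",
                       "-1.00", "-1.00"])
    # Wrap the sequence and terminate it with '*', as PIR requires.
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    body[-1] += "*"
    return "\n".join([header, fields] + body) + "\n"
```

Calling fasta_to_pir(text, "COL1A1") on the contents of the downloaded UniProt FASTA file returns the text of an .ali file like the one shown above.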
*2 - Template:* 3hr2.pdb
from https://www.rcsb.org/structure/3HR2
It is very important to me to get a model as close to this structure as
possible, since it contains the collagen fibrillar structure.
*3 - Alignment: *I decided to align each chain and sequence individually,
as many answers on the mailing list suggested.
First, I tried to align COL1A1.ali with chain A of the 3hr2.pdb
structure.
I aligned the sequences using salign (align2d was taking too long).
The 3hr2.pdb file contains 2 non-standard amino acids:
HYP hydroxyproline
LYZ hydroxylysine
I added - env.io.hetatm = True - in order to read them.
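As a cross-check on the list above, the HETATM residue types in a chain can be counted directly from the PDB text. The sketch below is plain Python and assumes standard fixed-column PDB records; the function name hetatm_residue_counts is hypothetical.

```python
from collections import Counter


def hetatm_residue_counts(pdb_text, chain=None):
    """Count HETATM residues per residue name, optionally for one chain.

    Assumes fixed-column PDB records: residue name in columns 18-20,
    chain ID in column 22, residue number plus insertion code in
    columns 23-27.
    """
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()
        chain_id = line[21:22]
        resid = line[22:27]
        if chain is not None and chain_id != chain:
            continue
        # One entry per residue, not per atom.
        seen.add((chain_id, resid, resname))
    return Counter(name for _, _, name in seen)
```

Running it on the text of 3hr2.pdb with chain="A" should list every HETATM residue type that env.io.hetatm = True makes Modeller read for that chain.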
*1A_salign.py*
___________________________________________________________________
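from modeller import *  # provides environ, alignment and model

env = environ()
aln = alignment(env)
env.io.hetatm = True  # read HETATM records (HYP, LYZ) from the PDB file
# Read only chain A of the template
mdl = model(env, file='3hr2', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='3hr2A', atom_files='3hr2.pdb')
aln.append(file='COL1A1.ali', align_codes='COL1A1')
aln.salign()  # default SALIGN settings
# Write the alignment in PIR (for automodel) and PAP (for inspection)
aln.write(file='COL1A1-3hr2A.ali', alignment_format='PIR')
aln.write(file='COL1A1-3hr2A.pap', alignment_format='PAP')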
___________________________________________________________________
I added to restyp.lib:
HETATM | HYP                 | O |   | HYP  | hydroxyproline
HETATM | LYZ                  | K |   | LYS  | lysine, +1
I have topology and parameters for HYP, but since I do not have them for
LYZ I chose to treat LYZ as LYS (lysine).
I am not sure whether this is acceptable. Is it better to do so, or to
simply let Modeller continue with the following warning?
*read_pd_459W> Residue type  LYZ not recognized. 'automodel' model building
will treat this residue as a rigid body.*
*4 - Automodel: *I created 10 models and chose the one with the lowest DOPE
score (all had a GA341 score of 1.0); in my case, COL1A1.B99990010.pdb.
I added the HYP topology and parameter information to the *top_heav*.lib
and *par*.lib files from the Modeller library. Furthermore, I added the HB1
atom type (a new type in the HYP topology) to *solv*.lib, *radii*.lib and
*radii14*.lib.
The HYP information was taken from *top_all36_prot_modify_res*.str and
*par_all36_prot_modify_res*.str in the CHARMM force field.
*2A_model-single.py*
___________________________________________________________________
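from modeller import *  # provides environ
from modeller.automodel import *  # provides automodel and assess

env = environ()
env.io.hetatm = True  # read HETATM records (HYP, LYZ) from the template
# Load the edited topology and parameter libraries (with HYP added)
env.libs.topology.read('$(LIB)/top_heav.lib')
env.libs.parameters.read('$(LIB)/par.lib') # from manual
a = automodel(env, alnfile='COL1A1-3hr2A.ali',
              knowns='3hr2A', sequence='COL1A1',
              assess_methods=(assess.DOPE,
                              assess.GA341))
# Build models 1 to 10
a.starting_model = 1
a.ending_model = 10
a.make()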
___________________________________________________________________
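The "lowest DOPE score" selection described above can also be done in code rather than by reading the log. The sketch below assumes that, after a.make(), a.outputs is a list of dictionaries with 'name', 'failure' and 'DOPE score' keys, as in the Modeller tutorial examples; the function name best_model is hypothetical.

```python
def best_model(outputs, key="DOPE score"):
    """Return the successfully built model with the lowest score.

    'outputs' is assumed to look like automodel's a.outputs: a list of
    dicts in which 'failure' is None for models that were built.
    """
    ok = [m for m in outputs if m.get("failure") is None]
    if not ok:
        raise ValueError("no successfully built models")
    # Lower (more negative) DOPE is better.
    return min(ok, key=lambda m: m[key])
```

Appended to the script above, best_model(a.outputs)['name'] would give the file name of the selected model.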
*5 - Final model: *I also ran the 3A_evaluate_model and 4A_plot_profiles
Python scripts and then opened the 3hr2 A chain and the final model
in VMD.
The model looks good; however, it is not as close to the 3hr2 structure as
I had hoped. As a mechanical engineer I am not used to these models or to
homology modeling, and I have been learning everything here by reading the
manual, the instructions and the mailing list. Did I make any mistakes? If
so, where and what? Can the model be made closer to the 3hr2 template
structure? If yes, how? How do I know whether a model is "good enough"? Can
I proceed in the same way to create the models for chains B and C and
complete the collagen structure?
I am afraid of losing the fibrillar collagen structure (which in this case
is similar to a crystalline periodic structure) from 3hr2.pdb if the model
is not very similar to the 3hr2 structure.
Thank you in advance for reading this far. I look forward to your
observations, suggestions and corrections.
They are all very welcome.
Amadeus

Amadeus Cavalcanti Salvador de Alcântara