Dear Modeller Users,
I am trying to use Modeller to model a comparative protein structure for the type I collagen. I came to this trying to reproduce a model from this paper: doi 10.1021/nl103943u ( *Nano Lett.* 2011, 11, 2, 757–766 ) It also used Modeller to create such a model. I contacted the 1st author for instructions on how he created the model, but unfortunately he said not to remember anymore (it has been 10 years since). I also want to mention that before posting this, I read the Modeller manual and several questions and answers posted in the mail listing. They helped me get to this point.
In the following I will describe in some details what I have done, with what files and how: All necessary files for reproduction are attached in this email (I tried to attach all files - pdb, target fasta, outputs, etc. - , but it exceeded the limit of 100 KB).
*1 - Target:* P02452 (CO1A1_HUMAN) and P08123 (CO1A2_HUMAN) from https://www.uniprot.org/uniprot/P02452 and https://www.uniprot.org/ uniprot/P08123
The Targets are in the Fasta format. I converted them to .ali files in the PIR format, as suggested by Ben Webb in https://salilab.org/archives/modeller_usage/2010/msg00072.html . (see FASTAtoPIR.py in the attached files). I manually edited the protein code and field2 of the .ali files (based on what I read in the manual - *File formats B.1 Alignment file (PIR)*) That is how it looks like now:
* COL1A1.ali* ___________________________________________________________________ >P1;COL1A1 sequence:COL1A1: : : : :::-1.00:-1.00 MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRICVCDNGKVLCDDVIC ...* ___________________________________________________________________
*2 - Template:* 3hr2.pdb from https://www.rcsb.org/structure/3HR2 It is very important to me to get a model as close to this structure as possible, since it comprises the collagen fibrillar structure.
*3 - Alignment: *I decided to align each chain and sequence individually, as many answers in the mail listing suggested.
First of all I tried to align the COL1A1.ali with the chain A of the 3hr2. pdb structure. I aligned the sequences using salign (align2d was taking too long).
The 3hr2.pdb contains 2 non standard amino acids: HYP hydroxyproline LYZ Hydroxylysine
I added - env.io.hetatm = True - in order to read them.
*1A_salign.py* ___________________________________________________________________ env = environ() aln = alignment(env) env.io.hetatm = True
mdl = model(env, file='3hr2', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='3hr2A', atom_files='3hr2.pdb') aln.append(file='COL1A1.ali', align_codes='COL1A1') aln.salign() aln.write(file='COL1A1-3hr2A.ali', alignment_format='PIR') aln.write(file='COL1A1-3hr2A.pap', alignment_format='PAP') ___________________________________________________________________
I added to restyp.lib: HETATM | HYP | O | | HYP | hydroxyproline HETATM | LYZ | K | | LYS | lysine, +1
I have HYP topology and parameters, but since I don't have them for LYZ I chose to treat LYZ as LYS (Lysine). I do not know if I can do it. Is it better to do so or to simply let Modeller continue with the following warning?
*read_pd_459W> Residue type LYZ not recognized. 'automodel' model building will treat this residue as a rigid body.*
*4 - Automodel: *I created 10 models and chose the one with lowest DOPE score (all scored 1.0 in the GA341 score), in my case, COL1A1.B99990010.pdb.
I included HYP topology and parameters information to the *top_heav*.lib and *par*.lib files from the Modeller Library. Furthermore I added the HB1 (new type in HYP top) atom type information to *solv*.lib, *radii*.lib and *radii14*.lib. The HYP information was taken from *top_all36_prot_modify_res*.str and *par_all36_prot_modify_res*.str in the CHARMm FF.
*2A_model-single.py* ___________________________________________________________________ from modeller.automodel import * env = environ() env.io.hetatm = True env.libs.topology.read('${LIB}/top_heav.lib') env.libs.parameters.read('$(LIB)/par.lib') # from manual
a = automodel(env, alnfile='COL1A1-3hr2A.ali', knowns='3hr2A', sequence='COL1A1', assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1 a.ending_model = 10 a.make() ___________________________________________________________________
*5 - Final model: *I also ran the 3A_evaluate_model and 4A_plot_profiles python scripts and after that opened the 3hr2 A chain and the final model in VMD.
The model looks good, however, it is not as close to the 3hr2 structured as I wished. As a mechanical engineer I am no used to those models nor to homology modeling and have been learning all here by reading the manual, instructions and mail listing. Did I make many/any mistakes? Where and what? Can the model become better (more close to the 3hr2 template structure)? If yes, how? How to know if a model is "good enough"? Can I proceed like that and create the models for the chains B and C and complete the collagen structure?
I am afraid to lose the fibrillar collagen structure (which in this case is similar to a crystalline periodic structure) from 3hr2.pdb if the model is not very similar to the the 3hr2 structure.
Thank you in advance for reading up to here. I am looking forward to reading your observations, suggestions and corrections. They are all very welcome.
Amadeus