Fibrillar collagen structure model 3hr2 Inbox
Dear Modeller Users,
I am trying to use Modeller to model a comparative protein structure for the type I collagen. I came to this trying to reproduce a model from this paper: doi 10.1021/nl103943u ( *Nano Lett.* 2011, 11, 2, 757–766 ) It also used Modeller to create such a model. I contacted the 1st author for instructions on how he created the model, but unfortunately he said not to remember anymore (it has been 10 years since). I also want to mention that before posting this, I read the Modeller manual and several questions and answers posted in the mail listing. They helped me get to this point.
In the following I will describe in some details what I have done, with what files and how: All necessary files for reproduction are attached in this email (I tried to attach all files - pdb, target fasta, outputs, etc. - , but it exceeded the limit of 100 KB).
*1 - Target:* P02452 (CO1A1_HUMAN) and P08123 (CO1A2_HUMAN) from https://www.uniprot.org/uniprot/P02452 and https://www.uniprot.org/ uniprot/P08123
The Targets are in the Fasta format. I converted them to .ali files in the PIR format, as suggested by Ben Webb in https://salilab.org/archives/modeller_usage/2010/msg00072.html . (see FASTAtoPIR.py in the attached files). I manually edited the protein code and field2 of the .ali files (based on what I read in the manual - *File formats B.1 Alignment file (PIR)*) That is how it looks like now:
* COL1A1.ali* ___________________________________________________________________ >P1;COL1A1 sequence:COL1A1: : : : :::-1.00:-1.00 MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRICVCDNGKVLCDDVIC ...* ___________________________________________________________________
*2 - Template:* 3hr2.pdb from https://www.rcsb.org/structure/3HR2 It is very important to me to get a model as close to this structure as possible, since it comprises the collagen fibrillar structure.
*3 - Alignment: *I decided to align each chain and sequence individually, as many answers in the mail listing suggested.
First of all I tried to align the COL1A1.ali with the chain A of the 3hr2. pdb structure. I aligned the sequences using salign (align2d was taking too long).
The 3hr2.pdb contains 2 non standard amino acids: HYP hydroxyproline LYZ Hydroxylysine
I added - env.io.hetatm = True - in order to read them.
*1A_salign.py* ___________________________________________________________________ env = environ() aln = alignment(env) env.io.hetatm = True
mdl = model(env, file='3hr2', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='3hr2A', atom_files='3hr2.pdb') aln.append(file='COL1A1.ali', align_codes='COL1A1') aln.salign() aln.write(file='COL1A1-3hr2A.ali', alignment_format='PIR') aln.write(file='COL1A1-3hr2A.pap', alignment_format='PAP') ___________________________________________________________________
I added to restyp.lib: HETATM | HYP | O | | HYP | hydroxyproline HETATM | LYZ | K | | LYS | lysine, +1
I have HYP topology and parameters, but since I don't have them for LYZ I chose to treat LYZ as LYS (Lysine). I do not know if I can do it. Is it better to do so or to simply let Modeller continue with the following warning?
*read_pd_459W> Residue type LYZ not recognized. 'automodel' model building will treat this residue as a rigid body.*
*4 - Automodel: *I created 10 models and chose the one with lowest DOPE score (all scored 1.0 in the GA341 score), in my case, COL1A1.B99990010.pdb.
I included HYP topology and parameters information to the *top_heav*.lib and *par*.lib files from the Modeller Library. Furthermore I added the HB1 (new type in HYP top) atom type information to *solv*.lib, *radii*.lib and *radii14*.lib. The HYP information was taken from *top_all36_prot_modify_res*.str and *par_all36_prot_modify_res*.str in the CHARMm FF.
*2A_model-single.py* ___________________________________________________________________ from modeller.automodel import * env = environ() env.io.hetatm = True env.libs.topology.read('${LIB}/top_heav.lib') env.libs.parameters.read('$(LIB)/par.lib') # from manual
a = automodel(env, alnfile='COL1A1-3hr2A.ali', knowns='3hr2A', sequence='COL1A1', assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1 a.ending_model = 10 a.make() ___________________________________________________________________
*5 - Final model: *I also ran the 3A_evaluate_model and 4A_plot_profiles python scripts and after that opened the 3hr2 A chain and the final model in VMD.
The model looks good, however, it is not as close to the 3hr2 structured as I wished. As a mechanical engineer I am no used to those models nor to homology modeling and have been learning all here by reading the manual, instructions and mail listing. Did I make many/any mistakes? Where and what? Can the model become better (more close to the 3hr2 template structure)? If yes, how? How to know if a model is "good enough"? Can I proceed like that and create the models for the chains B and C and complete the collagen structure?
I am afraid to lose the fibrillar collagen structure (which in this case is similar to a crystalline periodic structure) from 3hr2.pdb if the model is not very similar to the the 3hr2 structure.
Thank you in advance for reading up to here. I am looking forward to reading your observations, suggestions and corrections. They are all very welcome.
Amadeus
On 1/27/21 1:48 PM, Amadeus Cavalcanti Salvador de Alcântara wrote: > The 3hr2.pdb contains 2 non standard amino acids: > HYP hydroxyproline > LYZ Hydroxylysine
I would be inclined to edit the PDB file and replace the coordinates for these modified residues with the corresponding standard amino acids. You can certainly use the '.' residue type, and treat them as ligands, but that will simply copy them from target to template, allowing no relaxation (or mutation), usually not what you want for modified residues.
> The model looks good, however, it is not as close to the 3hr2 structured > as I wished. As a mechanical engineer I am no used to those models nor > to homology modeling and have been learning all here by reading the > manual, instructions and mail listing. Did I make many/any mistakes? > Where and what? Can the model become better (more close to the 3hr2 > template structure)? If yes, how?
Modeller builds models that resemble the template, by construction. If you're seeing a big divergence, you likely have an alignment error. You should carefully inspect the alignment and look for obvious problems, such as lots of gaps, sequences of different lengths, or gaps in regions of known secondary structure.
> How to know if a model is "good enough"? This depends on what you want to use it for. But DOPE is only a ranking - it doesn't tell you how "good" the model is, only whether it's better than your other models. A score like normalized DOPE or TSVMod would give you a better idea; these can be calculated at https://salilab.org/evaluation/
> Can I proceed like that and create the models for the chains B > and C and complete the collagen structure?
To build a multi-chain structure, it's usually best to build all chains simultaneously, so you get any inter-chain interactions correct.
Ben Webb, Modeller Caretaker
participants (2)
-
Amadeus Cavalcanti Salvador de Alcântara
-
Modeller Caretaker