structure refinement and loop optimization protocol
Dear Modellers,
I've read previous posts on the same topic and concluded that it is better to generate multiple models with moderate refinement and loop optimization levels, rather than a few with very thorough parameterization. I've also noticed myself that with thorough parameterization, parts of the secondary structure become distorted.
I have settled on the optimum alignment after a lot of experimentation and would now like to set up a very effective optimization process. However, I'm not sure about the output files. My code looks like this:
a = MyLoopModel(env, alnfile=alignment,
                knowns=known_templates,
                assess_methods=(assess.DOPEHR, assess.normalized_dope),
                sequence='target')
a.starting_model = 1
a.ending_model = 2
# Normal VTFM model optimization:
a.library_schedule = autosched.normal
a.max_var_iterations = 200   # 200 by default
# Very thorough MD model optimization:
a.md_level = refine.slow
a.repeat_optimization = 1

a.loop.starting_model = 1    # First loop model
a.loop.ending_model = 5      # Last loop model
a.loop.md_level = refine.slow   # Loop model refinement level
Which generates the following pdb files:
target.B99990001.pdb  target.B99990002.pdb  target.BL00040002.pdb
target.IL00000001.pdb  target.IL00000002.pdb
I thought the above should perform model refinement twice and write 5 loop-optimized conformations for each model. So my questions are the following:
1) Can you explain what's happening with the .pdb files?
2) I'd like to ask your opinion about the most effective way to find a near-native protein conformation at low sequence identity. How should the parameters shown above be set? I don't mind if it runs for a day or so, as long as I get good results.
3) I also attempted to cluster the models with a.cluster(cluster_cut=1.5), which generated a representative structure containing the parts of the protein that remained similar in most of the models, but without the variable parts (files cluster.ini and cluster.opt). Does it make sense to select the model that is closest to that consensus structure? If so, is there a way to do it with Modeller? I know it can be done with the MaxCluster program. Or, alternatively, do you reckon it is better to select the best model based on the normalized DOPE z-score?
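In case it makes the question clearer, this is roughly what I had in mind (just a sketch; I'm assuming cluster.opt can be read back as an ordinary model, and that selection.superpose() gives the CA RMSD I'm after):

from modeller import *

env = environ()
consensus = model(env, file='cluster.opt')            # cluster representative
candidate = model(env, file='target.BL00010001.pdb')  # one of the loop models

# Build a trivial alignment between the two models, then superpose
# on CA atoms; r.rms is the RMSD to the consensus structure.
aln = alignment(env)
aln.append_model(consensus, align_codes='consensus', atom_files='cluster.opt')
aln.append_model(candidate, align_codes='candidate',
                 atom_files='target.BL00010001.pdb')
aln.align()
r = selection(consensus).only_atom_types('CA').superpose(candidate, aln)
print(r.rms)

Repeating this over all models and keeping the one with the lowest r.rms would give the model closest to the consensus.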
Hope to get some answers to these questions, because I've been struggling to find the best refinement/optimization protocol for several weeks.
thanks, Thomas
On 6/15/10 3:29 AM, Thomas Evangelidis wrote:
> I've read previous posts on the same topic and concluded that it is
> better to generate multiple models with moderate refinement and loop
> optimization levels, rather than a few with very thorough
> parameterization.
Indeed.
> I have settled on the optimum alignment after a lot of experimentation
> and would now like to set up a very effective optimization process.
> However, I'm not sure about the output files. My code looks like this:
>
> a = MyLoopModel(env, alnfile=alignment,
>                 knowns=known_templates,
>                 assess_methods=(assess.DOPEHR, assess.normalized_dope),
>                 sequence='target')
> a.starting_model = 1
> a.ending_model = 2
This seems to contradict your statement above. Two models is probably not going to give you sufficient sampling - as John W suggested, you should be building many more models - perhaps 100.
> a.loop.starting_model = 1    # First loop model
> a.loop.ending_model = 5      # Last loop model
This should generate 12 models in total - 2 comparative models (*.B999*.pdb files), and then 5 further loop models for each comparative model (*.BL*.pdb files).
> Which generates the following pdb files:
>
> target.B99990001.pdb  target.B99990002.pdb  target.BL00040002.pdb
> target.IL00000001.pdb  target.IL00000002.pdb
The .IL files are initial (unoptimized) loops, so they are of little utility. But there should be many more loop (.BL) models generated. The log file will tell you why a particular model optimization failed.
> I thought the above should perform model refinement twice and write 5
> loop-optimized conformations for each model.
Indeed.
> 2) I'd like to ask your opinion about the most effective way to find a
> near-native protein conformation at low sequence identity. How should
> the parameters shown above be set? I don't mind if it runs for a day or
> so, as long as I get good results.
Build more models - that's the most effective way.
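For example, something along these lines (a sketch only: I've used the stock loopmodel class in place of your MyLoopModel subclass, and the alignment file name and template codes are placeholders to adapt to your system):

from modeller import *
from modeller.automodel import *    # loopmodel, assess, refine

env = environ()
known_templates = ('template1', 'template2')   # placeholder template codes

a = loopmodel(env, alnfile='alignment.ali',    # placeholder alignment file
              knowns=known_templates, sequence='target',
              assess_methods=(assess.DOPEHR, assess.normalized_dope))
a.starting_model = 1
a.ending_model = 100        # many comparative models...
a.loop.starting_model = 1
a.loop.ending_model = 10    # ...and several loop models for each
a.loop.md_level = refine.slow
a.make()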
Ben Webb, Modeller Caretaker
>> a = MyLoopModel(env, alnfile=alignment,
>>                 knowns=known_templates,
>>                 assess_methods=(assess.DOPEHR, assess.normalized_dope),
>>                 sequence='target')
>> a.starting_model = 1
>> a.ending_model = 2
>
> This seems to contradict your statement above. Two models is probably
> not going to give you sufficient sampling - as John W suggested, you
> should be building many more models - perhaps 100.

That was test code to explain why I don't get enough models. Of course, the numbers I actually use are much higher.
>> a.loop.starting_model = 1    # First loop model
>> a.loop.ending_model = 5      # Last loop model
>
> The .IL files are initial (unoptimized) loops, so they are of little
> utility. But there should be many more loop (.BL) models generated. The
> log file will tell you why a particular model optimization failed.

Indeed, I have 9 messages like this:
target.BL00010001.pdb
check_inf__E> Atom 1 has out-of-range coordinates (usually infinity).
              The objective function can thus not be calculated.
>> 2) I'd like to ask your opinion about the most effective way to find
>> a near-native protein conformation at low sequence identity. How
>> should the parameters shown above be set? I don't mind if it runs for
>> a day or so, as long as I get good results.
>
> Build more models - that's the most effective way.

You mean without loop optimization? From your experience, what values should be assigned to the following parameters:
# Normal VTFM model optimization:
a.library_schedule = autosched.normal
a.max_var_iterations = 200   # 200 by default
# Very thorough MD model optimization:
a.md_level = refine.slow
a.repeat_optimization = 1
a.loop.md_level = refine.slow   # Loop model refinement level
On 6/16/10 7:45 AM, Thomas Evangelidis wrote:
...
> From your experience, what values should be assigned to the following
> parameters:
>
> # Normal VTFM model optimization:
> a.library_schedule = autosched.normal
> a.max_var_iterations = 200   # 200 by default
> # Very thorough MD model optimization:
> a.md_level = refine.slow
> a.repeat_optimization = 1
> a.loop.md_level = refine.slow   # Loop model refinement level
In my experience the defaults are generally fine here - there's little substitute for simply building more models to get a better sampling.
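If it helps, this is roughly how the best model can be picked out once the run finishes (a sketch: I'm assuming assess.DOPE is among your assess_methods, so that each entry in a.loop.outputs carries a 'DOPE score' key; check the log for the exact key names your assess_methods produce):

# After a.make(), keep the loop models that optimized successfully
# and rank them by DOPE score (lower is better).
ok_models = [m for m in a.loop.outputs if m['failure'] is None]
ok_models.sort(key=lambda m: m['DOPE score'])
best = ok_models[0]
print("Best model: %s (DOPE %.3f)" % (best['name'], best['DOPE score']))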
Ben Webb, Modeller Caretaker