This is quite similar to the question I posted recently, quoted below:
3) I also attempted to cluster the models with dopehr_loopmodel.cluster(cluster_cut=1.5), which generated a representative structure
with the parts of the protein that remained similar in most of the
models but without the variable parts (files cluster.ini and
cluster.opt). Does it make sense to select the model that is closer to
that consensus structure? If yes is there a way to do it with Modeller? I
know it can be found with the Maxcluster program. Or alternatively, do
you reckon it is better to select the model based on the
normalized DOPE z-score?
According to posts in this mailing list as well as the background
information to ModLoop, the best loop-model is chosen by lowest
pseudo-energy score
(http://modbase.compbio.ucsf.edu/modloop/ - Fiser's and Sali's papers cited
at the bottom of the page).
However, the tutorial of Modeller indicates that "it is important to note
that a most accurate approach to loop refinement requires the modeling of
hundreds of independent conformations and their clustering to select the
most representative structures of the loop" (http://www.salilab.org/modeller/tutorial/advanced.html).
I have been comparing different loop-models generated by loop.model for a
selected region of a pdb-file and I am tempted to simply choose the best
DOPE-HR-scoring model. Yet the clustering idea does make sense. So far, the
largest cluster of models often contains (one of) the best-scoring
model(s), but not in every case.
My question is therefore: Should the best-scoring model overall be chosen, or
should the best-scoring model of the largest cluster be chosen?
I wonder about your opinions regarding this issue.
In case anyone is voting for the clustering method: Which method is readily
available for clustering? Unfortunately, the loop.model-class does not seem
to have an integrated clustering option, does it?
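
To make the two options concrete, here is a minimal sketch in plain Python of what I mean. It does not use any Modeller call. The model names, scores and pairwise RMSD values are made up for illustration, and the clustering is a simple greedy neighbor-count scheme that I chose only as an example, not necessarily what Modeller or Maxcluster do internally. In practice the scores would be the DOPE-HR values and the RMSDs would come from superposing the loop residues of each pair of models.

```python
# Sketch only: hypothetical data, no Modeller API involved.

def greedy_clusters(rmsd, cutoff):
    """Greedy neighbor-count clustering on a symmetric RMSD matrix.

    Repeatedly take the unassigned model with the most unassigned
    neighbors within `cutoff` as a cluster center, then remove that
    center and its neighbors from the pool.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        def neighbors(i):
            # Includes i itself, since rmsd[i][i] == 0.
            return sorted(j for j in remaining if rmsd[i][j] <= cutoff)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        center = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        members = neighbors(center)
        clusters.append(members)
        remaining -= set(members)
    return clusters


def pick_models(names, scores, rmsd, cutoff=1.5):
    """Return (best-scoring model overall, best-scoring model of the largest cluster)."""
    clusters = greedy_clusters(rmsd, cutoff)
    largest = max(clusters, key=len)
    best_overall = min(range(len(names)), key=lambda i: scores[i])
    best_in_largest = min(largest, key=lambda i: scores[i])
    return names[best_overall], names[best_in_largest]


# Hypothetical example: five loop models, lower score = better.
names = ["loop1", "loop2", "loop3", "loop4", "loop5"]
scores = [-100.0, -120.0, -110.0, -130.0, -90.0]
rmsd = [
    [0.0, 1.0, 1.0, 3.0, 3.0],
    [1.0, 0.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 0.0, 3.0, 3.0],
    [3.0, 3.0, 3.0, 0.0, 1.0],
    [3.0, 3.0, 3.0, 1.0, 0.0],
]

best_overall, best_in_largest = pick_models(names, scores, rmsd, cutoff=1.5)
print("best overall:", best_overall)
print("best in largest cluster:", best_in_largest)
```

With these made-up numbers the two criteria disagree: the best-scoring model overall sits in the smaller cluster, which is exactly the situation I am unsure about.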