Lucky16: Ordered search across multiple models?
hi Ben, i'm finding an odd pattern across what i think of as random trials, and i wonder why.
Suppose i run N independently seeded runs from the same alignment and generate K models each time; i.e., i have a distribution of N * K models. Now consider a small fraction EPSILON of these with good GA341 scores. i would expect the fraction of models which happened to be model "target.B ..._k" (i.e., the model that gets generated the k-th time by Modeller) to be uniform over the choice of k, wouldn't you?
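
for concreteness, a minimal sketch of the kind of script i mean (not my exact one), assuming Modeller's standard automodel interface with a different rand_seed per run; 'target.ali', 'templ' and 'target' are placeholders, and the a.outputs keys used here ('num', 'GA341 score') are from memory and may differ between Modeller versions:

from modeller import environ
from modeller.automodel import automodel, assess

N_RUNS, K_MODELS = 10, 20
scores = {}                                  # (run, model index k) -> GA341 score

for run in range(N_RUNS):
    env = environ(rand_seed=-(2 + run))      # independent negative seed per run
    a = automodel(env, alnfile='target.ali', knowns='templ',
                  sequence='target', assess_methods=assess.GA341)
    a.starting_model = 1
    a.ending_model = K_MODELS                # writes target.B99990001.pdb ... .B99990020.pdb
    a.make()
    for out in a.outputs:
        if out['failure'] is None:
            # GA341 reports several terms; the first is the overall score
            scores[(run, out['num'])] = out['GA341 score'][0]
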
instead, i find that some values of k account for ~50% of the very good GA341 scores (across various random seed instances), while others account for none. here's a sample distribution with K=20:
ModNum  Freq
     1  0.01
     3  0.03
     4  0.02
     5  0.01
     6  0.09
     7  0.01
     8  0.01
     9  0.05
    11  0.06
    12  0.08
    15  0.01
    16  0.50
    19  0.05
    20  0.01
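
(each Freq above is the fraction of all good-scoring models that carry that model number; roughly, assuming the scores dict from the sketch above and an illustrative GA341 cutoff of 0.7:)

GOOD = 0.7                                   # illustrative "good" GA341 cutoff
good = [k for (run, k), s in scores.items() if s >= GOOD]

print("ModNum  Freq")
for k in sorted(set(good)):
    print("%6d  %.2f" % (k, good.count(k) / float(len(good))))
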
what makes model #16 so consistently good?
rik
On 11/7/11 2:31 PM, R K Belew wrote:
> Suppose i run N independently seeded runs from the same alignment and
> generate K models each time; i.e., i have a distribution of N * K models.
> Now consider a small fraction EPSILON of these with good GA341 scores.
> i would expect the fraction of models which happened to be model
> "target.B ..._k" (i.e., the model that gets generated the k-th time by
> Modeller) to be uniform over the choice of k, wouldn't you?
Yes, I would expect that.
> what makes model #16 so consistently good?
That is rather puzzling, since Modeller doesn't know how to generate "good" models (if it did, it would do it every time, not only for model #16!). All it can do is try to build models that violate the restraints as little as possible. The only difference between model #15 and model #16 is the starting conformation (the restraints and optimization are the same), but since that is generated randomly (by default) there shouldn't be any difference in the final statistics. I would suspect a bug in your procedure somewhere...
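
(one quick consistency check along those lines, as a sketch: confirm that the model number used in the bookkeeping really is the one encoded in the output filename, so scores aren't being joined to the wrong index somewhere downstream. This assumes the default target.B9999NNNN.pdb naming scheme and the same a.outputs dictionaries as in the earlier sketch:)

import re

for out in a.outputs:
    if out['failure'] is not None:
        continue
    m = re.search(r'B9999(\d{4})\.pdb$', out['name'])
    assert m is not None, "unexpected model filename: %s" % out['name']
    assert int(m.group(1)) == out['num'], \
        "index mismatch: %s vs model %d" % (out['name'], out['num'])
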
Ben Webb, Modeller Caretaker