Dear Andrej,
I have a few questions on which I would greatly value your opinion. Below
them is a status report on my evaluation of fold models, with my opinions
on the methods I am using. My models are based on fold predictions, not
sequence homology.
Questions.
1. I am going to evaluate my models objectively, and for this I need some
homology models and some incorrect (misthreaded) models, refined or highly
refined, for comparison. If you could send me about 100 pdb files of each
type, varying in length and including lengths in the 100-400 range, that
would be extremely helpful.
2. Where can I get other models to evaluate? I can find some homology
models at Brookhaven; can you refer me elsewhere?
3. Please comment on my evaluation plan. I will calculate the z-scores of
combined energy using the zscore command of ProsaII from Sippl's group or
nqachk from Whatif. ProsaII comes with pre-calculated z-scores for a
database of proteins, and shows plots of z-score versus length. I want to
add points for 1) my models, 2) "correct" homology models, and 3)
misthreaded but well energy-minimized models. My hope is that my models
will lie in the range of the homology models, intermediate between the
real structures and the misthreaded models, and clearly better than the
misthreaded models. I would dearly love to have a statistical test showing
that a model is correctly threaded, i.e., one that significantly rejects
the null hypothesis of a misthreaded model.
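One way such a test could look is sketched below in Python. It is only an illustration under stated assumptions: the function name, the length window of 50 residues, and the idea of using the misthreaded models themselves as the null distribution are all hypothetical choices, not features of ProsaII or Whatif. Because z-scores depend on length, the model is compared only to misthreaded models of similar length.

```python
from statistics import mean, stdev

def misthreading_test(model_z, model_len, decoys, window=50):
    """Empirical one-sided test against misthreaded models (hypothetical sketch).

    decoys is a list of (length, z_score) pairs for misthreaded models.
    Returns (p_value, shift), where shift is the model's distance from the
    decoy mean in units of the decoy standard deviation.
    """
    # z-scores depend on length, so keep only decoys of comparable length.
    matched = [z for length, z in decoys if abs(length - model_len) <= window]
    if len(matched) < 2:
        raise ValueError("need at least two length-matched misthreaded models")
    # More negative z-scores are better: count decoys at least as good as the model.
    as_good = sum(1 for z in matched if z <= model_z)
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (as_good + 1) / (len(matched) + 1)
    shift = (model_z - mean(matched)) / stdev(matched)
    return p_value, shift
```

With about 100 length-matched misthreaded models, the smallest p-value this can report is roughly 0.01, so the number of decoys sets the resolution of the test.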
Comments. I have compared the structure evaluation tools of Prosa and
Whatif. Whatif has a second-generation quality control feature called
nqachk (new quachk). Prosa and nqachk appear to discriminate models from
X-ray structures better than quachk does. The z-scores for both Prosa and
nqachk depend on the length of the protein. My models are n11, d11, and
r11. They are compared to two templates, mo6_s2 and 1tbgB, both of which
have native scores. My models do well on surface potentials, but have
sc-sc and bb-sc contacts that lower their pairwise z-scores and nqachk
values. My models are better than any of the modules in the polyprotein
of ProsaII, ranking first for combined energy in all cases.
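The contrast between surface and pairwise terms can be put in numbers with a short Python sketch that uses the z-scores from the ProsaII output in this letter. Expressing each model z-score as a fraction of the mean template z-score is purely an illustrative summary chosen here; it is not a quantity reported by ProsaII, and the names below are hypothetical.

```python
# z-scores copied from the ProsaII output in this letter: (zp-comb, zp-pair, zp-surf)
MODELS = {
    "n11": (-5.48, -2.33, -5.17),
    "d11": (-5.34, -2.65, -4.42),
    "r11": (-5.71, -2.42, -5.11),
}
TEMPLATES = {
    "mo6_s2": (-9.74, -6.26, -6.68),
    "1tbgB": (-9.67, -5.79, -7.25),
}
TERMS = ("comb", "pair", "surf")

def relative_to_templates(models, templates):
    """Return each model z-score as a fraction of the mean template z-score."""
    n = len(templates)
    template_mean = [
        sum(t[i] for t in templates.values()) / n for i in range(len(TERMS))
    ]
    return {
        name: {term: z[i] / template_mean[i] for i, term in enumerate(TERMS)}
        for name, z in models.items()
    }

if __name__ == "__main__":
    for name, ratios in relative_to_templates(MODELS, TEMPLATES).items():
        print(name, {term: round(r, 2) for term, r in ratios.items()})
```

A ratio near 1 means the model scores about as well as the templates on that term; for all three models the surface ratio is higher than the pairwise ratio.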
ProsaII (Sippl) output
Hide & Seek on polyprotein pII3.0.short.ply - selection of parameters
molecule  seq-l  zp-comb  zp-pair  zp-surf  rk-comb  rk-pair  rk-surf  ep-comb  ep-pair  ep-surf
n11         265    -5.48    -2.33    -5.17        1      191        1    -3.34    11.72    -6.41
d11         264    -5.34    -2.65    -4.42        1       56        1     7.33     4.88     1.13
r11         260    -5.71    -2.42    -5.11        1      125        1   -11.04     6.57    -9.05
mo6_s2      257    -9.74    -6.26    -6.68        1        1        1   116.35   -65.75   -26.03
1tbgB       296    -9.67    -5.79    -7.25        1        1        1   112.59   -53.15   -30.41
Whatif (Vriend) output ("Average for range" is quachk, "All contacts" and
following 4 lines are nqachk)
n11.pdb.txt
Average for range 1 - 265 : -1.032
All contacts : Average = -0.685 Z-score = -4.37
BB-BB contacts : Average = -0.066 Z-score = -0.46
BB-SC contacts : Average = -0.968 Z-score = -5.24
SC-BB contacts : Average = -0.156 Z-score = -0.78
SC-SC contacts : Average = -0.695 Z-score = -3.51
d11.pdb.txt
Average for range 1 - 264 : -0.569
All contacts : Average = -0.576 Z-score = -3.65
BB-BB contacts : Average = 0.026 Z-score = 0.20
BB-SC contacts : Average = -0.946 Z-score = -5.12
SC-BB contacts : Average = -0.039 Z-score = -0.07
SC-SC contacts : Average = -0.580 Z-score = -2.88
r11.pdb.txt
Average for range 1 - 260 : -0.743
All contacts : Average = -0.551 Z-score = -3.49
BB-BB contacts : Average = 0.038 Z-score = 0.28
BB-SC contacts : Average = -0.868 Z-score = -4.69
SC-BB contacts : Average = -0.131 Z-score = -0.63
SC-SC contacts : Average = -0.537 Z-score = -2.64
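To put the nqachk values for many models on one plot, the contact lines above could be collected with a small Python parser along these lines. This is a sketch that assumes the line layout exactly as pasted in this letter; it is not based on any documented Whatif output format, and the names are hypothetical.

```python
import re

# Matches lines such as "BB-SC contacts : Average = -0.968 Z-score = -5.24"
CONTACT = re.compile(
    r"(All|BB-BB|BB-SC|SC-BB|SC-SC) contacts\s*:\s*"
    r"Average\s*=\s*(-?\d+\.\d+)\s+Z-score\s*=\s*(-?\d+\.\d+)"
)

def parse_nqachk(text):
    """Map each contact class to its (average, z_score) pair."""
    return {
        kind: (float(average), float(z_score))
        for kind, average, z_score in CONTACT.findall(text)
    }

if __name__ == "__main__":
    sample = """n11.pdb.txt
Average for range 1 - 265 : -1.032
All contacts : Average = -0.685 Z-score = -4.37
BB-SC contacts : Average = -0.968 Z-score = -5.24
"""
    print(parse_nqachk(sample))
```

The quachk line ("Average for range") is deliberately skipped, since it has a different layout and would need its own pattern.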
*******************************************************************************
* Timothy A. Springer, Ph.D. *
* Latham Family Professor of Pathology *
* Harvard Medical School e-mail: *
* Center for Blood Research phone: 617-278-3200 *
* 200 Longwood Ave. Room 251 fax: 617-278-3232 *
* Boston MA 02115 *
*******************************************************************************