Dear Andrej,
I have a few questions on which I would highly value your opinion. Below them is a status report on how I am evaluating my fold models, with my impressions of the methods I am using. My models are based on fold predictions, not sequence homology.
Questions.
1. I am going to objectively evaluate my models, and for this I need some homology models and some incorrect (misthreaded) models that have been refined or highly refined, to compare against. If you could send me about 100 PDB files of each type, varying in length and covering the 100-400 range, that would be extremely helpful.
2. Where can I get other models to evaluate? I can find some homology models at Brookhaven; can you refer me elsewhere?
3. Please comment on my evaluation plan. I will calculate the z-scores of combined energy using the zscore command of ProsaII from Sippl's group, or nqachk from Whatif. ProsaII comes with pre-calculated z-scores for a database of proteins and shows plots of z-score versus length. I want to add points for 1) my models, 2) "correct" homology models, and 3) misthreaded but well energy-minimized models. My hope is that my models will lie in the range of the homology models, intermediate between the real structures and the misthreaded models, and clearly better than the misthreaded models. I would dearly love to have a statistical test showing that a model is correctly threaded, i.e. one that significantly rejects the null hypothesis of a misthreaded model.
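One possible shape for such a test, offered only as a rough sketch: treat the z-scores of misthreaded models of similar length as an empirical null distribution, and ask how often a misthreaded model scores at least as well as mine. The function names, the 50-residue length window, and the misthreaded values in the example are all placeholders invented for illustration; only the n11 length and z-score come from my ProsaII run.

```python
def length_matched(entries, length, window=50):
    """Keep z-scores of reference models within `window` residues of `length`.

    `entries` is a list of (sequence_length, z_score) pairs.
    """
    return [z for n, z in entries if abs(n - length) <= window]


def empirical_p_value(model_z, null_zs):
    """One-sided empirical p-value against a null set of z-scores.

    More negative z-scores are better, so we count null models scoring
    at least as negative as the model. The +1 terms keep p above zero.
    """
    if not null_zs:
        raise ValueError("need at least one null z-score")
    at_least_as_good = sum(1 for z in null_zs if z <= model_z)
    return (at_least_as_good + 1) / (len(null_zs) + 1)


# Placeholder (length, z-score) pairs for misthreaded models: NOT real data.
misthreaded = [(250, -1.2), (270, -0.4), (240, -2.9), (300, -1.8), (120, -0.9)]

null_zs = length_matched(misthreaded, length=265)
p = empirical_p_value(-5.48, null_zs)  # n11: 265 residues, zp-comb -5.48
print(f"{len(null_zs)} length-matched misthreaded models, p = {p:.3f}")
```

With n null models the smallest attainable p-value is 1/(n+1), so a set of about 100 misthreaded models would resolve p down to roughly 0.01.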
Comments. I have compared the structure evaluation tools of ProsaII and Whatif. Whatif has a second-generation quality control feature called nqachk (new quachk). ProsaII and nqachk appear to discriminate models from X-ray structures better than quachk does. The z-scores from both ProsaII and nqachk depend on the length of the protein. My models are n11, d11, and r11. They are compared to two templates, mo6_s2 and 1tbgB, both of which have native scores. My models do well on surface potentials, but have sc-sc and bb-sc contacts that lower their pairwise z-scores and nqachk values. My models are better than any of the modules in the polyprotein of ProsaII, ranking first for combined energy in all cases.
ProsaII (Sippl) output

Hide & Seek on polyprotein pII3.0.short.ply - selection of parameters

molecule  seq-l  zp-comb  zp-pair  zp-surf  rk-comb  rk-pair  rk-surf  ep-comb  ep-pair  ep-surf
n11         265    -5.48    -2.33    -5.17        1      191        1    -3.34    11.72    -6.41
d11         264    -5.34    -2.65    -4.42        1       56        1     7.33     4.88     1.13
r11         260    -5.71    -2.42    -5.11        1      125        1   -11.04     6.57    -9.05
mo6_s2      257    -9.74    -6.26    -6.68        1        1        1   116.35   -65.75   -26.03
1tbgB       296    -9.67    -5.79    -7.25        1        1        1   112.59   -53.15   -30.41
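To put numbers on how the models compare with the templates, here is a small sketch that expresses each model z-score as a fraction of the mean template z-score. The values are copied from the ProsaII output above; the dictionary layout and function names are my own illustration, and dividing by the template mean is just one convenient normalization, not something ProsaII provides.

```python
# (zp-comb, zp-pair, zp-surf) copied from the ProsaII output above.
zscores = {
    "n11":    (-5.48, -2.33, -5.17),
    "d11":    (-5.34, -2.65, -4.42),
    "r11":    (-5.71, -2.42, -5.11),
    "mo6_s2": (-9.74, -6.26, -6.68),
    "1tbgB":  (-9.67, -5.79, -7.25),
}
COLUMNS = ("comb", "pair", "surf")
TEMPLATES = ("mo6_s2", "1tbgB")
MODELS = ("n11", "d11", "r11")


def template_mean(column_index):
    """Mean z-score of the two templates for one column."""
    values = [zscores[name][column_index] for name in TEMPLATES]
    return sum(values) / len(values)


def fraction_of_template(model):
    """Model z-score divided by the mean template z-score, per column."""
    return {
        col: zscores[model][i] / template_mean(i)
        for i, col in enumerate(COLUMNS)
    }


fractions = {model: fraction_of_template(model) for model in MODELS}
for model, frac in fractions.items():
    line = "  ".join(f"{col}={frac[col]:.2f}" for col in COLUMNS)
    print(f"{model:4s} {line}")
```

For all three models the surface fraction comes out higher than the pairwise fraction, which is the same pattern described in the comments above.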
Whatif (Vriend) output ("Average for range" is quachk; "All contacts" and the following 4 lines are nqachk)

n11.pdb.txt
Average for range 1 - 265 : -1.032
All contacts   : Average = -0.685  Z-score = -4.37
BB-BB contacts : Average = -0.066  Z-score = -0.46
BB-SC contacts : Average = -0.968  Z-score = -5.24
SC-BB contacts : Average = -0.156  Z-score = -0.78
SC-SC contacts : Average = -0.695  Z-score = -3.51

d11.pdb.txt
Average for range 1 - 264 : -0.569
All contacts   : Average = -0.576  Z-score = -3.65
BB-BB contacts : Average =  0.026  Z-score =  0.20
BB-SC contacts : Average = -0.946  Z-score = -5.12
SC-BB contacts : Average = -0.039  Z-score = -0.07
SC-SC contacts : Average = -0.580  Z-score = -2.88

r11.pdb.txt
Average for range 1 - 260 : -0.743
All contacts   : Average = -0.551  Z-score = -3.49
BB-BB contacts : Average =  0.038  Z-score =  0.28
BB-SC contacts : Average = -0.868  Z-score = -4.69
SC-BB contacts : Average = -0.131  Z-score = -0.63
SC-SC contacts : Average = -0.537  Z-score = -2.64

*******************************************************************************
* Timothy A. Springer, Ph.D.                                                  *
* Latham Family Professor of Pathology                                        *
* Harvard Medical School        e-mail: springer@sprsgi.med.harvard.edu       *
* Center for Blood Research     phone:  617-278-3200                          *
* 200 Longwood Ave. Room 251    fax:    617-278-3232                          *
* Boston MA 02115                                                             *
*******************************************************************************