evaluation of models

26 Mar 1998


      Dear Andrej,
I would highly value your opinion on a few questions.  Below them is a
status report on my evaluation of fold models, with my opinions on the
methods I am using.  My models are based on fold predictions, not sequence
homology.
Questions.
1.  To evaluate my models objectively, I need some homology models and some
incorrect (misthreaded) models, refined or highly refined, to compare them
to.  If you could send me about 100 pdb files of each type, varying in
length and including lengths in the 100-400 range, that would be extremely
helpful.
2.  Where can I get other models to evaluate?  I can find some homology
models at Brookhaven; can you refer me elsewhere?
3.  Please comment on my evaluation plan.  I will calculate the z-scores of
combined energy using the zscore command of ProsaII from Sippl's group or
nqachk from Whatif.  ProsaII comes with pre-calculated z-scores for a
database of proteins, and shows plots of z-score versus length.  I want to
add points for 1) my models, 2) "correct" homology models, and 3)
misthreaded but well energy-minimized models.  My hope is that my models
will lie in the range of the homology models, intermediate between the real
structures and the misthreaded models, and clearly better than the
misthreaded models.  I would dearly love to have a statistical test showing
that a model is correctly threaded, i.e. one that significantly rejects the
null hypothesis of a misthreaded model.
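A minimal sketch of one way such a test could be set up, offered as an
assumption rather than an established protocol: treat the combined z-scores
of misthreaded models of similar length as an empirical null distribution,
and report the fraction of them that score at least as well (as negative)
as the model under test.  The function names, the length window of 25
residues, and the add-one correction are all illustrative choices, not part
of ProsaII or Whatif.

```python
from typing import Iterable, List, Sequence, Tuple


def null_scores_near_length(
    misthreaded: Iterable[Tuple[int, float]], length: int, window: int = 25
) -> List[float]:
    """Keep z-scores of misthreaded models within `window` residues of `length`.

    Each input item is (sequence length, combined z-score). The window size is
    a hypothetical choice to limit the length dependence of the z-scores.
    """
    return [z for n, z in misthreaded if abs(n - length) <= window]


def empirical_p_value(model_z: float, null_z: Sequence[float]) -> float:
    """One-sided empirical p-value under the misthreaded null.

    More negative z-scores are treated as better, so this counts the null
    models scoring at least as negative as the model. The add-one correction
    keeps the estimate away from exactly zero.
    """
    if len(null_z) == 0:
        raise ValueError("need at least one misthreaded z-score")
    as_good = sum(1 for z in null_z if z <= model_z)
    return (as_good + 1) / (len(null_z) + 1)
```

With this estimator the smallest attainable p-value is 1/(N+1) for N
misthreaded models in the length window, so about 100 models of comparable
length would be needed to reach roughly 0.01.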
Comments.  I have compared the structure evaluation tools of ProsaII and
Whatif.  Whatif has a second-generation quality control feature called
nqachk (new quachk).  ProsaII and nqachk appear to discriminate models from
x-ray structures better than quachk does.  z-scores for both ProsaII and
nqachk are dependent on the length of the protein.  My models are n11, d11,
and r11.  They are compared to two templates, mo6_s2 and 1tbgB, both of
which have native scores.  My models do well on surface potentials, but
have sc-sc and bb-sc contacts that worsen their pairwise z-scores and
nqachk values.  My models are better than any of the modules in the
polyprotein of ProsaII, ranking first for combined energy in all cases.
ProsaII (Sippl) output
Hide & Seek on polyprotein pII3.0.short.ply - selection of parameters
molecule  seq-l  zp-comb  zp-pair  zp-surf  rk-comb  rk-pair  rk-surf  ep-comb  ep-pair  ep-surf
n11         265    -5.48    -2.33    -5.17        1      191        1    -3.34    11.72    -6.41
d11         264    -5.34    -2.65    -4.42        1       56        1     7.33     4.88     1.13
r11         260    -5.71    -2.42    -5.11        1      125        1   -11.04     6.57    -9.05
mo6_s2      257    -9.74    -6.26    -6.68        1        1        1   116.35   -65.75   -26.03
1tbgB       296    -9.67    -5.79    -7.25        1        1        1   112.59   -53.15   -30.41
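For adding these points to a z-score versus length plot, the table has to be
read into records first.  The sketch below assumes only what is visible
above: whitespace-separated output with eleven columns per molecule, possibly
wrapped across lines.  It is not a ProsaII interface; the names COLUMNS,
INT_COLUMNS and parse_hide_and_seek are hypothetical.

```python
from typing import Dict, List, Union

# Column names as they appear in the Hide & Seek header above.
COLUMNS = [
    "molecule", "seq-l", "zp-comb", "zp-pair", "zp-surf",
    "rk-comb", "rk-pair", "rk-surf", "ep-comb", "ep-pair", "ep-surf",
]
INT_COLUMNS = {"seq-l", "rk-comb", "rk-pair", "rk-surf"}

Value = Union[str, int, float]


def parse_hide_and_seek(text: str) -> List[Dict[str, Value]]:
    """Parse the table into one dict per molecule.

    Tokens are grouped eleven at a time, so rows that were wrapped onto a
    second line are handled the same way as unwrapped rows.
    """
    tokens = text.split()
    width = len(COLUMNS)
    if tokens[:width] == COLUMNS:  # drop the header row if present
        tokens = tokens[width:]
    if len(tokens) % width != 0:
        raise ValueError("table does not contain a whole number of rows")
    rows: List[Dict[str, Value]] = []
    for start in range(0, len(tokens), width):
        chunk = tokens[start:start + width]
        row: Dict[str, Value] = {"molecule": chunk[0]}
        for name, raw in zip(COLUMNS[1:], chunk[1:]):
            row[name] = int(raw) if name in INT_COLUMNS else float(raw)
        rows.append(row)
    return rows
```

Each record then gives a (seq-l, zp-comb) pair that can be plotted next to
the pre-calculated database points.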
Whatif (Vriend) output ("Average for range" is quachk, "All contacts" and
following 4 lines are nqachk)
n11.pdb.txt
Average for range    1 - 265 :  -1.032
 All   contacts    : Average = -0.685 Z-score =  -4.37
 BB-BB contacts    : Average = -0.066 Z-score =  -0.46
 BB-SC contacts    : Average = -0.968 Z-score =  -5.24
 SC-BB contacts    : Average = -0.156 Z-score =  -0.78
 SC-SC contacts    : Average = -0.695 Z-score =  -3.51
d11.pdb.txt
Average for range    1 - 264 :  -0.569
 All   contacts    : Average = -0.576 Z-score =  -3.65
 BB-BB contacts    : Average =  0.026 Z-score =   0.20
 BB-SC contacts    : Average = -0.946 Z-score =  -5.12
 SC-BB contacts    : Average = -0.039 Z-score =  -0.07
 SC-SC contacts    : Average = -0.580 Z-score =  -2.88
r11.pdb.txt
Average for range    1 - 260 :  -0.743
 All   contacts    : Average = -0.551 Z-score =  -3.49
 BB-BB contacts    : Average =  0.038 Z-score =   0.28
 BB-SC contacts    : Average = -0.868 Z-score =  -4.69
 SC-BB contacts    : Average = -0.131 Z-score =  -0.63
 SC-SC contacts    : Average = -0.537 Z-score =  -2.64
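The nqachk lines above can be collected per contact class with a small
parser, which makes it easier to compare the BB-SC and SC-SC z-scores across
models.  This is a sketch that assumes the line layout shown in this letter
("<class> contacts : Average = x Z-score = y"); the regular expression and
the name parse_nqachk are illustrative, not part of Whatif.

```python
import re
from typing import Dict, Tuple

# Matches lines such as " BB-SC contacts    : Average = -0.968 Z-score =  -5.24".
CONTACT_LINE = re.compile(
    r"^\s*(\S+)\s+contacts\s*:\s*Average\s*=\s*(-?\d+\.\d+)"
    r"\s+Z-score\s*=\s*(-?\d+\.\d+)",
    re.MULTILINE,
)


def parse_nqachk(text: str) -> Dict[str, Tuple[float, float]]:
    """Map each contact class (All, BB-BB, ...) to (average, z-score).

    Lines that do not follow the assumed layout, such as the quachk
    "Average for range" line, are ignored.
    """
    return {
        kind: (float(avg), float(z))
        for kind, avg, z in CONTACT_LINE.findall(text)
    }
```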
*******************************************************************************
*  Timothy A. Springer, Ph.D.                                                 *
*  Latham Family Professor of Pathology                                       *
*  Harvard Medical School            e-mail: springer@sprsgi.med.harvard.edu  *
*  Center for Blood Research            phone:  617-278-3200                  *
*  200 Longwood Ave.  Room 251          fax:    617-278-3232                  *
*  Boston MA 02115                                                            *
*******************************************************************************
