Sali lab decoy sets for model assessment


1997 Azat Badretdinov's
The models are organized in groups of different accuracy; they were used for testing statistical potentials.
1998 Melo's good and bad sets
Models for proteins of known structure, organized in a GOOD group containing models based on correct templates and approximately correct alignments, and a BAD group containing models based on incorrect templates or very poor alignments.
This decoy set was used in the following publications:
Sanchez & Sali, PNAS (1998)
Melo et al., Prot. Sci. (2002)
2000 Fiser's loop sets
Sets of loops derived from known structures, ranging in size from 4 to 12 residues.
This decoy set was used in the following publications:
Fiser et al., Prot. Sci. (2000)
2003 John's MOULDER set
Twenty sequences were randomly selected from the Fischer set of 68 pairs of remotely related protein structures, which range from 51 to 568 residues in size. For each sequence, 300 comparative models were built using its closest structurally related sequence as the template. The models were built using alignments that shared no more than 95% of identically aligned positions or had at least 5 different alignment positions. A single comparative model of the target sequence, containing all non-hydrogen atoms, was built for each alignment by MODELLER-6, applying the default model-building routine (model) with fast refinement.
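The alignment-diversity criterion above can be illustrated with a minimal Python sketch. It is not the MOULDER implementation: the function names, the representation of an alignment as a set of (target index, template index) pairs, and the choice to measure the shared fraction relative to the longer of the two alignments are all assumptions made for illustration.

```python
def aligned_pairs(target_row, template_row, gap="-"):
    """Return the set of (target_index, template_index) pairs for every
    alignment column in which both sequences have a residue.
    Hypothetical helper; rows are equal-length gapped strings."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    pairs = set()
    i = j = 0  # residue counters for target and template
    for a, b in zip(target_row, template_row):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sufficiently_different(pairs_a, pairs_b,
                           max_shared_fraction=0.95, min_different=5):
    """Assumed reading of the criterion: two alignments count as distinct if
    they share no more than 95% of identically aligned positions OR differ
    in at least 5 aligned positions."""
    shared = len(pairs_a & pairs_b)
    longest = max(len(pairs_a), len(pairs_b))
    # Assumption: the shared fraction is taken relative to the longer alignment.
    shared_fraction = shared / longest if longest else 1.0
    # Assumption: "different positions" are pairs present in only one alignment.
    different = len(pairs_a ^ pairs_b)
    return shared_fraction <= max_shared_fraction or different >= min_different


# Toy example: the same sequence aligned in register and shifted by one residue.
in_register = aligned_pairs("ACDEFG", "ACDEFG")
shifted = aligned_pairs("ACDEFG-", "-ACDEFG")
```

With these toy alignments, `sufficiently_different(in_register, shifted)` is true because the shifted alignment shares no aligned pairs with the in-register one, whereas comparing an alignment with itself is false.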
This decoy set was used in the following publications:
John & Sali, NAR (2003)
2005 Topf's ModEM sets
The Mod-EM benchmark set includes native proteins, comparative models, and density maps. The benchmark for testing the new moulding protocol consists of 20 pairs of proteins of known structure sharing between 10% and 31% sequence identity (17% on average), including target-template pairs from the two original studies as well as several new pairs. These proteins range in size from 81 to 388 residues (203 on average) and represent all major fold classes. For each of the native structures of the 20 target proteins, a density map was simulated at 10 Å resolution, an achievable resolution for single-particle cryoEM, using the PDB2MRC command in the EMAN package. For 3 proteins in the benchmark, additional density maps were simulated at 5, 15, 20, and 25 Å resolution.
This decoy set was used in the following publications:
Topf et al., J. Struct. Biol. (2005)
Topf et al., J. Mol. Biol. (2006)
2006 Eramian's SVMod sets
MOULDER set
Twenty target/template pairs of protein sequences with known structures ranging from 81 to 340 residues in length were randomly selected from the Fischer set of remotely related homologs. The 20 targets do not share significant structural similarity with each other. For each of the 20 targets, the structural template specified by the Fischer set was used as the template. MOULDER (see above) was used with MODELLER to create 300 different target-template alignments for each target. The 300 alignments uniformly ranged from approximately 0 to 100% of both the native overlap and the correctly aligned positions with respect to the CE structure-based alignment. A comparative model was built from each target-template alignment using the default parameters for the model routine in MODELLER. Thus, the final decoy set consisted of a total of 300 models for each of the 20 targets. All scores for models in this set generated for the SVMod paper can be found here (~4Mb).

MODPIPE set
A total of 168,632 comparative models were calculated by our automated comparative modeling protocol MODPIPE for the PDB-select40 list (6,877 sequences as of March 2005). All models shorter than 100 residues or longer than 250 residues were removed from the testing set. This length restriction reduced the set size to 80,593 models for 4,011 different sequences. The RMSD binning of the models in the MODPIPE set shows that ~5% of models are within 1 Å RMSD of the native structure (very good models), ~13% are within 1-3 Å RMSD (good models), ~20% are within the RMSD range 3-5 Å (acceptable models), and ~62% superimpose on the native structure with an RMSD >5 Å (bad models). All scores for models in this set generated for the SVMod paper can be found here (~31Mb).
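The length filter and RMSD binning described above can be sketched in Python as follows. This is a minimal illustration, not the code used to prepare the set: the function names and the (length, RMSD) input format are hypothetical, and the assignment of values that fall exactly on a bin edge (for example, 3.0 Å counted as "good") is an assumption, since the text does not specify it.

```python
from collections import Counter

# Upper bin edges in Å and their labels, taken from the description above.
# Assumption: a value exactly on an edge falls into the lower (better) bin.
RMSD_BINS = [(1.0, "very good"), (3.0, "good"), (5.0, "acceptable")]
LABELS = ["very good", "good", "acceptable", "bad"]


def rmsd_category(rmsd):
    """Map an RMSD to the native structure (in Å) onto a quality label."""
    if rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    for upper, label in RMSD_BINS:
        if rmsd <= upper:
            return label
    return "bad"


def within_length_window(n_residues, min_len=100, max_len=250):
    """Keep models of 100 to 250 residues; shorter or longer ones are removed."""
    return min_len <= n_residues <= max_len


def bin_models(models):
    """Return the fraction of length-filtered models in each RMSD category.
    `models` is an iterable of (n_residues, rmsd) tuples (hypothetical format)."""
    kept = [(n, r) for n, r in models if within_length_window(n)]
    if not kept:
        return {}
    counts = Counter(rmsd_category(r) for _, r in kept)
    return {label: counts[label] / len(kept) for label in LABELS}


# Toy input: the 90-residue model is dropped by the length filter.
fractions = bin_models([(150, 0.5), (150, 2.0), (90, 0.5), (200, 6.0), (200, 8.0)])
```

Applied to the real score files, the same tally would be expected to give fractions close to the ~5%, ~13%, ~20%, and ~62% quoted above, subject to the boundary assumption.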
These decoy sets were used in the following publications:
Eramian et al., Prot. Sci. (2006)