Sali lab decoy sets for model assessment
The models are organized in groups of different accuracy; they were used for testing statistical potentials.
1998 Melo's good and bad sets
Models for proteins of known structure, organized in a GOOD group containing models based on
correct templates and approximately correct alignments, and a BAD group containing models based on incorrect
templates or very poor alignments.
Sets of loops derived from known structures, spanning a range of sizes from 4 to 12 residues.
Twenty sequences were randomly selected from the Fischer set of 68 pairs of remotely related protein structures ranging from 51 to 568 residues in size. For each sequence, 300 comparative models were built using its closest structurally related sequence as the template. The models were built using alignments that shared no more than 95% of identically aligned positions or had at least 5 different alignment positions. For each alignment, MODELLER-6 built a single comparative model of the target sequence containing all non-hydrogen atoms, applying the default model-building routine "model" with fast refinement.
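The alignment diversity criterion above can be illustrated with a minimal sketch. It shows one possible reading of the rule, not the procedure actually used to build the set. The representation of an alignment as a dict mapping target residue index to template residue index, the function names, and the choice of denominator for the shared fraction are all assumptions made for illustration.

```python
def compare_alignments(a, b):
    """Return (fraction of identically aligned positions, number of differing positions).

    Each alignment is a dict {target_index: template_index} holding only
    aligned (non-gap) positions. This representation is hypothetical.
    """
    positions = set(a) | set(b)
    if not positions:
        return 1.0, 0
    same = sum(1 for p in positions if p in a and p in b and a[p] == b[p])
    different = len(positions) - same
    return same / len(positions), different


def is_distinct(a, b, max_shared=0.95, min_different=5):
    """One reading of the rule: two alignments count as distinct if they share
    no more than 95% of identically aligned positions OR differ in at least
    5 positions."""
    shared, different = compare_alignments(a, b)
    return shared <= max_shared or different >= min_different


def select_diverse(alignments, max_shared=0.95, min_different=5):
    """Greedily keep alignments that are distinct from every one kept so far."""
    kept = []
    for aln in alignments:
        if all(is_distinct(aln, k, max_shared, min_different) for k in kept):
            kept.append(aln)
    return kept
```

Under this reading, the second condition matters mainly for long alignments, where 5 differing positions can still leave more than 95% of positions identical.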
The Mod-EM benchmark set includes native proteins, comparative models, and density maps. The benchmark for testing the new moulding protocol consists of 20 pairs of proteins of known structure sharing between 10% and 31% sequence identity (17% on average), including target-template pairs from the two original studies as well as several new pairs. These proteins range in size from 81 to 388 residues (203 on average) and represent all major fold classes. For each of the native structures of the 20 target proteins, a density map was simulated at 10 Å resolution, a resolution achievable by single-particle cryoEM, using the PDB2MRC command in the EMAN package. For 3 proteins in the benchmark, additional density maps were simulated at 5, 15, 20, and 25 Å resolution.
2006 Eramian's SVMod sets
Twenty target/template pairs of protein sequences with known structures
ranging from 81 to 340 residues in length were randomly selected from the Fischer set of remotely related homologs.
The 20 targets do not share significant structural similarity to each other. For each of the 20 targets, the
structural template specified by the Fischer set was used as the template. The target-template alignments were
obtained using MOULDER (see above) with MODELLER to create 300 different target-template alignments. The 300
alignments uniformly ranged from approximately 0 to 100% of both the native overlap and the correctly aligned positions
with respect to the CE structure-based alignment. A comparative model was built from each target-template alignment
using the default parameters for the model routine in MODELLER. Thus, the final decoy set consisted of a
total of 300 models for each of the 20 targets. All scores for models in this set generated for the SVMod paper
can be found here
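As an illustration of one of the two measures mentioned above, the percentage of correctly aligned positions relative to a reference (here, structure-based) alignment could be computed roughly as follows. This is a sketch under stated assumptions: the dict representation of an alignment and the function name are hypothetical, and the exact definition used for the decoy set may differ.

```python
def percent_correctly_aligned(alignment, reference):
    """Percentage of reference-aligned positions reproduced by `alignment`.

    Both arguments are dicts {target_index: template_index} containing only
    aligned (non-gap) positions. This representation is an assumption.
    """
    if not reference:
        raise ValueError("reference alignment has no aligned positions")
    correct = sum(
        1 for target_idx, template_idx in reference.items()
        if alignment.get(target_idx) == template_idx
    )
    return 100.0 * correct / len(reference)
```

For example, an alignment that reproduces two of the four aligned pairs in the reference scores 50%.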
A total of 168,632 comparative models were calculated by our automated comparative modeling
protocol MODPIPE for the PDB-select40 list (6,877 sequences as of March 2005). All models shorter than 100 residues
or longer than 250 residues were removed from the testing set. This length restriction reduced the set size to
80,593 models for 4,011 different sequences. The RMSD binning of the models in the MODPIPE set shows that ~5% of
models are within 1 Å RMSD of the native structure (very good models), ~13% are within 1-3 Å RMSD (good models),
~20% are within the RMSD range 3-5 Å (acceptable models), and ~62% superimpose on the native structure with an
RMSD >5 Å (bad models). All scores for models in this set generated for the SVMod paper can be found here
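The length filter and RMSD binning described above can be sketched as follows. The class names and cutoffs come from the text; the handling of values exactly on a bin boundary, the function names, and the (length, RMSD) tuple input are assumptions for illustration.

```python
from collections import Counter


def length_ok(n_residues, min_len=100, max_len=250):
    """Keep models of 100 to 250 residues; shorter or longer ones are removed."""
    return min_len <= n_residues <= max_len


def rmsd_class(rmsd):
    """Map an RMSD to the native structure (in Å) to an accuracy class.

    Boundary values are assigned to the better class; that choice is an
    assumption, since the text does not say how ties are handled.
    """
    if rmsd <= 1.0:
        return "very good"
    if rmsd <= 3.0:
        return "good"
    if rmsd <= 5.0:
        return "acceptable"
    return "bad"


def bin_models(models):
    """Count accuracy classes for (n_residues, rmsd) pairs that pass the length filter."""
    return Counter(rmsd_class(rmsd) for n_residues, rmsd in models if length_ok(n_residues))
```

A call such as `bin_models([(150, 0.8), (90, 0.5), (200, 6.0)])` would count one very good and one bad model, with the 90-residue model dropped by the length filter.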