Evaluating a model

After a model is built, it is important to check it for possible errors. The quality of a model can be approximately predicted from the sequence similarity between the target and the template (Figure 3). Sequence identity above 30% is a relatively good predictor of the expected accuracy of a model. However, other factors, including the environment, can strongly influence the accuracy of a model. For instance, some calcium-binding proteins undergo large conformational changes when bound to calcium. If a calcium-free template is used to model the calcium-bound state of a target, the model is likely to be incorrect irrespective of the target-template similarity. The same caveat applies to the determination of protein structure by experiment: a structure must be determined in the functionally meaningful environment. If the target-template sequence identity falls below 30%, it becomes significantly less reliable as a measure of the expected accuracy of a single model, because individual models at this level often deviate significantly, in both directions, from the average accuracy. It is in such cases that model evaluation methods are most informative.
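As a rough illustration of the 30% rule of thumb above, the following sketch computes the percentage sequence identity of a pairwise target-template alignment and flags cases that fall below the threshold. The function names and the example sequences are hypothetical, and the choice of denominator (aligned positions where neither sequence has a gap) is an assumption; other conventions exist and give somewhat different values.

```python
def percent_identity(target: str, template: str, gap: str = "-") -> float:
    """Percent identity over aligned positions where neither sequence has a gap.

    Assumption: both strings are rows of the same pairwise alignment,
    so they must have equal length.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences contain a residue.
    pairs = [
        (a, b)
        for a, b in zip(target.upper(), template.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def needs_careful_evaluation(identity: float, threshold: float = 30.0) -> bool:
    """True when identity is below the threshold discussed in the text."""
    return identity < threshold


if __name__ == "__main__":
    # Made-up aligned fragments, for illustration only.
    identity = percent_identity("MKT-AYIAKQR", "MKSLAY-AKHR")
    print(f"{identity:.1f}")                    # 77.8
    print(needs_careful_evaluation(identity))   # False
```

A value below the threshold does not mean the model is wrong; it only indicates that sequence identity alone is a weak predictor and that the evaluation methods described below become more important.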

Two types of evaluation can be carried out. "Internal" evaluation of self-consistency checks whether or not a model satisfies the restraints used to calculate it. "External" evaluation relies on information that was not used in the calculation of the model [68,45].

Assessment of the model's stereochemistry (e.g., bonds, bond angles, dihedral angles, and non-bonded atom-atom distances) with programs such as PROCHECK [69] and WHATCHECK [70] is an example of internal evaluation. Although errors in stereochemistry are rare and less informative than errors detected by methods for external evaluation, a cluster of stereochemical errors may indicate that the corresponding region also contains other, larger errors (e.g., alignment errors).

When the model is based on less than about 30% sequence identity to the template, the first purpose of the external evaluation is to test whether or not a correct template was used. This test is especially important when the alignment is only marginally significant or when several alternative templates with different folds are to be evaluated. A complication is that at low similarities the alignment generally contains many errors, making it difficult to distinguish between an incorrect template and an incorrect alignment with a correct template. It is generally possible to recognize a correct template only if the alignment is at least approximately correct. This complication can sometimes be overcome by testing models from several alternative alignments for each template. One way to predict whether or not a template is correct is to compare the PROSAII Z-score [45] for the model and the template structure(s). Since the Z-score of a model is a measure of compatibility between its sequence and structure, the model Z-score should be comparable to that of the template. However, this evaluation does not always work. For example, a well-modeled part of a domain is likely to have a bad Z-score because some interactions that stabilize the fold are not present in the model. Correct models for some membrane proteins and small disulfide-rich proteins also tend to be evaluated incorrectly, apparently because these structures have distributions of residue accessibility and residue-residue distances that differ from those of the larger globular domains, which were the source of the PROSAII statistical potential functions.
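The comparison of model and template Z-scores can be sketched as follows. This is not the PROSAII implementation: it uses only the generic definition of a Z-score (the energy of a structure expressed in standard deviations from the mean of a reference distribution of energies), and the function names, the reference energies, and the tolerance are all hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(energy: float, reference_energies: Sequence[float]) -> float:
    """Generic Z-score of an energy relative to a reference distribution.

    Assumption: reference_energies holds energies of the same sequence in
    alternative (e.g., decoy) conformations; lower energy is better, so a
    good structure has a strongly negative Z-score.
    """
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)
    if sigma == 0:
        raise ValueError("reference energies have zero spread")
    return (energy - mu) / sigma


def z_scores_comparable(model_z: float, template_z: float,
                        tolerance: float = 2.0) -> bool:
    """True when the two Z-scores differ by no more than the tolerance.

    The default tolerance is an arbitrary illustrative value, not a
    published cutoff.
    """
    return abs(model_z - template_z) <= tolerance


if __name__ == "__main__":
    reference = [0.0, 2.0, 4.0, 6.0, 8.0]       # made-up reference energies
    model_z = z_score(-10.0, reference)
    template_z = z_score(-12.0, reference)
    print(z_scores_comparable(model_z, template_z))
```

In practice the Z-scores would come from the evaluation program itself; the sketch only makes explicit what "comparable to that of the template" means as a numerical test, subject to the exceptions noted above.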

The second, more detailed kind of external evaluation is the prediction of unreliable regions in the model. One way to approach this problem is to calculate a "pseudo energy" profile of a model, such as that produced by PROSAII. The profile reports the energy for each position in the model, and peaks in the profile frequently correspond to errors in the model. There are several pitfalls in the use of energy profiles for local error detection. For example, a region can be identified as unreliable only because it interacts with an incorrectly modeled region; there are also more fundamental problems [24].
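The idea of locating peaks in a per-residue energy profile can be sketched as below. The profile is smoothed with a sliding-window average and contiguous stretches whose smoothed energy exceeds a threshold are reported. The window size, the threshold of zero, and the function names are assumptions made for illustration; they are not parameters taken from PROSAII.

```python
from typing import List, Sequence, Tuple


def smooth(profile: Sequence[float], window: int = 5) -> List[float]:
    """Centered sliding-window average; the window is truncated at the ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        segment = profile[max(0, i - half): min(len(profile), i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def flag_regions(profile: Sequence[float], threshold: float = 0.0,
                 window: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (0-based, inclusive) of stretches
    whose smoothed energy is above the threshold."""
    regions = []
    start = None
    for i, energy in enumerate(smooth(profile, window)):
        if energy > threshold and start is None:
            start = i                        # a high-energy stretch begins
        elif energy <= threshold and start is not None:
            regions.append((start, i - 1))   # the stretch ended at i - 1
            start = None
    if start is not None:                    # stretch runs to the last residue
        regions.append((start, len(profile) - 1))
    return regions


if __name__ == "__main__":
    # Made-up profile with one high-energy stretch in the middle.
    profile = [-1, -1, -1, -1, 3, 3, 3, -1, -1, -1, -1]
    print(flag_regions(profile, threshold=0.0, window=3))  # [(3, 7)]
```

Note that smoothing widens the flagged stretch by a residue on each side in this example, which mirrors one of the pitfalls mentioned above: positions near an incorrectly modeled region can be flagged even when they are themselves correct.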

Finally, a model should be consistent with experimental observations, such as those from site-directed mutagenesis, cross-linking, and ligand-binding studies.

Are comparative models "better" than their templates? In general, models are as close to the target structure as the templates, or slightly closer if the alignment is correct [44]. This is not a trivial achievement because of the many residue substitutions, deletions and insertions that occur when the sequence of one protein is transformed into the sequence of another. Even in a favorable modeling case with a template that is 50% identical to the target, half of the side chains change and have to be packed in the protein core such that they avoid atom clashes and violations of stereochemical restraints. When more than one template is used for modeling, it is sometimes possible to obtain a model that is significantly closer to the target structure than any of the templates [43,44]. This improvement occurs because the model tends to inherit the best regions from each template. Alignment errors are the main factor that may make models worse than the templates. However, to represent the target, it is always better to use a comparative model rather than the template. The reason is that alignment errors degrade the template as a representation of the target just as much as they degrade a comparative model based on that template [44].

Andras Fiser
2001-08-09