Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3D model for a protein (target) that is related to at least one known protein structure (template) [1,2,3,4,5,6,7].
Despite progress in ab initio protein structure prediction [8], comparative modeling remains the only method that can reliably predict the 3D structure of a protein with an accuracy comparable to a low-resolution experimentally determined structure [6]. Even models with errors may be useful, because some aspects of function can be predicted from only coarse structural features of a model. Typical uses of comparative models are listed in Table 1 [4,6].
The 3D structure of proteins from the same family is more conserved than their primary sequences [9]. Therefore, if similarity between two proteins is detectable at the sequence level, structural similarity can usually be assumed. Moreover, proteins that share low or even non-detectable sequence similarity often also have similar structures. Currently, the probability of finding a related protein of known structure for a sequence picked at random from a genome ranges from approximately 20% to 65%, depending on the genome [10,11]. Approximately one half of all known sequences have at least one domain that is detectably related to at least one protein of known structure [10]. Since the number of known protein sequences is approximately 600,000 [12,13], comparative modeling can be applied to domains in approximately 300,000 proteins. This number is an order of magnitude larger than the number of experimentally determined protein structures deposited in the Protein Data Bank (PDB) [14]. Furthermore, the usefulness of comparative modeling is steadily increasing, both because the number of different structural folds that proteins adopt is limited [15,16,17,18] and because the number of experimentally determined new structures is increasing exponentially [19]. This trend is accentuated by the recently initiated structural genomics project, which aims to determine at least one structure for most protein families [20,21]. It is conceivable that this aim will be substantially achieved in less than 10 years, making comparative modeling applicable to most protein sequences.
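The claim that detectable sequence similarity usually implies structural similarity can be made concrete with the simplest such measure, percent sequence identity over a target-template alignment. The sketch below is illustrative only and assumes a pairwise alignment already written as two equal-length strings with `-` marking gaps; the function name `percent_identity` and the toy sequences are hypothetical. Identity is counted here over columns where neither sequence has a gap, which is one of several conventions in use. In practice, relatedness is usually judged from alignment scores and their statistical significance rather than from raw identity alone.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Hypothetical helper; other conventions divide by the length of the
    shorter sequence or by the full alignment length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy target-template alignment (made-up sequences, not real proteins).
target = "MKV-LAAGIE"
template = "MRVQLASGLE"
print(f"{percent_identity(target, template):.1f}%")  # 6 of 9 ungapped columns match
```

Excluding gapped columns keeps the value from being diluted by insertions in either sequence, at the cost of ignoring how much of each sequence is actually aligned.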
Comparative modeling usually consists of the following five steps: search for related protein structures, selection of one or more templates, target-template alignment, model building, and model evaluation (Figure 1). If the model is not satisfactory, some or all of the steps can be repeated.
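The iterative structure of this workflow, in which the first four steps produce a model, the fifth step evaluates it, and some or all of the steps are repeated when the model is unsatisfactory, can be sketched as a simple control loop. This is a hypothetical skeleton, not the interface of MODELLER or of any server: the step functions below are toy stand-ins that perform no real search, alignment, or model building, and the names `comparative_model`, `restart_from`, and `toy_score` are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], None]  # each step updates the shared state in place


def comparative_model(target: str, steps: List[Step], evaluate: Callable[[State], float],
                      threshold: float, restart_from: int = 0, max_rounds: int = 3) -> State:
    """Run the model-producing steps, evaluate the model, and repeat from `restart_from` if needed."""
    state: State = {"target": target, "accepted": False}
    start = 0  # the first round always runs every step
    for round_no in range(1, max_rounds + 1):
        state["round"] = round_no
        for step in steps[start:]:
            step(state)
        state["score"] = evaluate(state)  # step 5: model evaluation
        if state["score"] >= threshold:
            state["accepted"] = True
            break
        start = restart_from  # later rounds repeat only some of the steps
    return state


# Toy stand-ins for steps 1-4 (hypothetical; nothing is actually modeled).
def search_templates(s: State) -> None:
    s["hits"] = ["templateA", "templateB"]

def select_templates(s: State) -> None:
    s["templates"] = s["hits"][:1]

def align_target(s: State) -> None:
    s["alignment_version"] = s.get("alignment_version", 0) + 1

def build_model(s: State) -> None:
    s["model"] = f"{s['target']}_model_v{s['alignment_version']}"

def toy_score(s: State) -> float:
    # Pretend that each revision of the alignment improves the model.
    return 0.4 + 0.3 * s["alignment_version"]


result = comparative_model("T1", [search_templates, select_templates, align_target, build_model],
                           toy_score, threshold=0.9, restart_from=2)
print(result["round"], result["model"], result["accepted"])
```

Here `restart_from=2` sends a failed model back to the alignment step while keeping the templates already chosen, which mirrors the common case where the alignment, rather than the template choice, is the main source of error; setting it to 0 repeats the whole procedure.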
Several computer programs and web servers automate the comparative modeling process. The first web server for automated comparative modeling was the Swiss-Model server (http://www.expasy.ch/swissmod/), followed by CPHModels (http://www.cbs.dtu.dk/services/CPHmodels/), SDSC1 (http://cl.sdsc.edu/hm), FAMS (http://physchem.pharm.kitasato-u.ac.jp/FAMS/fams.html), and MODWEB (http://guitar.rockefeller.edu/modweb/). These servers accept a sequence from a user and return an all-atom comparative model when possible. In addition to modeling a given sequence, MODWEB can also return comparative models for all sequences in the TrEMBL database that are detectably related to a user-provided input structure. While the web servers are convenient and useful, the best results in difficult or unusual modeling cases, such as problematic alignments, modeling of loops, existence of multiple conformational states, and modeling of ligand binding, are still obtained by non-automated, expert use of the various modeling tools. A number of resources useful in comparative modeling are listed in Table 2.
Next, we describe general considerations for each of the five steps of comparative modeling (Section 2). We then illustrate these considerations in practice by discussing three applications of our program MODELLER [22,23,24] to specific modeling problems (Section 3). This chapter does not review the comparative modeling field in general; for such a review, see [6].