
Introduction

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3D model for a protein (target) that is related to at least one known protein structure (template) [1,2,3,4,5,6,7].

Despite progress in ab initio protein structure prediction [8], comparative modeling remains the only method that can reliably predict the 3D structure of a protein with an accuracy comparable to a low-resolution experimentally determined structure [6]. Even models with errors may be useful, because some aspects of function can be predicted from only coarse structural features of a model. Typical uses of comparative models are listed in Table 1 [4,6].

The 3D structures of proteins from the same family are more conserved than their primary sequences [9]. Therefore, if similarity between two proteins is detectable at the sequence level, structural similarity can usually be assumed. Moreover, proteins that share low or even non-detectable sequence similarity often also have similar structures. Currently, the probability of finding a related protein of known structure for a sequence picked randomly from a genome ranges from approximately 20% to 65%, depending on the genome [10,11]. Approximately one half of all known sequences have at least one domain that is detectably related to at least one protein of known structure [10]. Since the number of known protein sequences is approximately 600,000 [12,13], comparative modeling can be applied to domains in approximately 300,000 proteins. This number is an order of magnitude larger than the number of experimentally determined protein structures deposited in the Protein Data Bank (PDB) ($\sim 15,000$) [14]. Furthermore, the usefulness of comparative modeling is steadily increasing because the number of different structural folds that proteins adopt is limited [15,16,17,18] and because the number of experimentally determined new structures is increasing exponentially [19]. This trend is accentuated by the recently initiated structural genomics project, which aims to determine at least one structure for most protein families [20,21]. It is conceivable that this aim will be substantially achieved in less than 10 years, making comparative modeling applicable to most protein sequences.
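The coverage estimate above is simple arithmetic, and it can be checked with a few lines of Python. The numbers are the approximate figures quoted in the text, not current database sizes, and the variable names are chosen here only for illustration.

```python
# Approximate figures quoted in the text; variable names are illustrative.
known_sequences = 600_000                # known protein sequences [12,13]
fraction_with_related_structure = 0.5    # "approximately one half" [10]
pdb_structures = 15_000                  # experimentally determined structures in the PDB [14]

# Proteins with at least one domain that can be modeled by comparison.
modelable = int(known_sequences * fraction_with_related_structure)

# How many times larger this is than the number of experimental structures.
ratio = modelable / pdb_structures

print(modelable)  # 300000
print(ratio)      # 20.0
```

A ratio of about 20 is consistent with the "order of magnitude" statement in the paragraph above.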

Comparative modeling usually consists of the following five steps: search for related protein structures, selection of one or more templates, target-template alignment, model building, and model evaluation (Figure 1). If the model is not satisfactory, some or all of the steps can be repeated.
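The control flow of these five steps, including the repetition when a model is unsatisfactory, can be sketched in Python. This is only a toy illustration of the loop and not how MODELLER or any of the servers works: the `Template` class, the `identity` and `model_target` functions, the thresholds, and the sequences are all hypothetical. Each template is assumed to be supplied as a string already aligned to the target (same length, with "-" marking gaps), and "evaluation" is reduced to the fraction of target positions covered by the template.

```python
from dataclasses import dataclass
from typing import List, Optional

GAP = "-"


@dataclass(frozen=True)
class Template:
    name: str
    aligned_sequence: str  # assumed pre-aligned to the target; "-" marks a gap


def identity(target: str, template: Template) -> float:
    """Fraction of identical residues over the aligned (non-gap) positions."""
    pairs = [(a, b) for a, b in zip(target, template.aligned_sequence) if b != GAP]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def model_target(target: str, library: List[Template],
                 min_identity: float = 0.3,
                 min_coverage: float = 0.8) -> Optional[str]:
    """Return the name of the first template that yields a satisfactory toy model."""
    # Step 1: search for related structures (toy filter on sequence identity).
    related = [t for t in library if identity(target, t) >= min_identity]
    # Step 2: template selection, trying the most similar candidate first.
    ranked = sorted(related, key=lambda t: identity(target, t), reverse=True)
    for template in ranked:
        # Step 3: target-template alignment (here the strings are pre-aligned).
        alignment = list(zip(target, template.aligned_sequence))
        # Step 4: model building (toy: a position is modeled only if the
        # template has a residue there; gaps stay unmodeled as None).
        model = [b if b != GAP else None for _, b in alignment]
        # Step 5: model evaluation (toy: fraction of modeled positions).
        score = sum(r is not None for r in model) / len(model)
        if score >= min_coverage:
            return template.name
        # Unsatisfactory model: repeat steps 2-5 with the next template.
    return None


target = "MKTAYIAKQR"
library = [
    Template("tmplA", "MKTAYI----"),  # identical where aligned, but covers only 60%
    Template("tmplB", "MKSAYLAKQR"),  # 80% identical, full coverage
    Template("tmplC", "GGGGGGGGGG"),  # unrelated
]
chosen = model_target(target, library)
print(chosen)  # tmplB
```

In this example the most similar template is tried first, fails the evaluation because it covers too little of the target, and the procedure repeats with the next candidate.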

There are several computer programs and web servers that automate the comparative modeling process. The first web server for automated comparative modeling was the Swiss-Model server (http://www.expasy.ch/swissmod/), followed by CPHModels (http://www.cbs.dtu.dk/services/CPHmodels/), SDSC1 (http://cl.sdsc.edu/hm), FAMS (http://physchem.pharm.kitasato-u.ac.jp/FAMS/fams.html) and MODWEB (http://guitar.rockefeller.edu/modweb/).
These servers accept a sequence from a user and return an all-atom comparative model when possible. In addition to modeling a given sequence, MODWEB is also capable of returning comparative models for all sequences in the TrEMBL database that are detectably related to an input, user-provided structure. While the web servers are convenient and useful, the best results in difficult or unusual modeling cases, such as problematic alignments, modeling of loops, existence of multiple conformational states, and modeling of ligand binding, are still obtained by non-automated, expert use of the various modeling tools. A number of resources useful in comparative modeling are listed in Table 2.

Next, we describe generic considerations in all five steps of comparative modeling (Section 2). We then illustrate these considerations in practice by discussing three applications of our program MODELLER [22,23,24] to specific modeling problems (Section 3). This chapter does not review the comparative modeling field in general; for such a review, see [6].


Andras Fiser
2001-08-09