I'm glad to hear that some reviewers take the time to read the
methodology in depth and make detailed comments. I have read quite a few
low-quality homology modeling studies in the past. Recently, I read one
where the authors claimed they used experimental structures to model some
very flexible regions missing from every crystal structure of a
particular enzyme (probably to get published more easily), yet the given
templates possess no evident sequence or structural similarity to
their final model.
We have constructed homology models based on templates of low sequence
identity (27-28%). The target is an enzyme, and we also did a
ligand-enzyme interaction study using Modeller. We superposed the
template and target structures and the deviation was below 1 Angstrom. We
have submitted this to a journal and got a few comments. I need your
advice in this regard. As you will notice, these comments concern
assessing and improving the quality of a homology model, so I would be
grateful for a few tips on how to further improve the model
quality in light of the listed parameters. A few of the points are
conceptual, i.e. sequence identity and homology. Kindly comment on
those. Here is the summary:
RMSD related:
The backbone could deviate from the true structure by a few angstroms
and the impact of such deviation for enzyme catalysis could be huge.
Therefore, the application of such models is often limited. Trying to
deduce the specific interactions between the active site residues and a
substrate based on such a model is thus questionable. That the rmsd
between the model and the template was ~1 angstrom is not evidence of
high quality of the models. The term rmsd is being misunderstood. In
structure prediction, the rmsd between a homology model of the target
protein and the X-ray structure of the same target protein is usually
calculated as the indication of the quality of the models, not the
rmsd between the model and the X-ray structure of the template.
In principle MODELLER "copies" the
coordinates of the atoms in the template to the corresponding residues
(as defined by the alignment) in the target sequence. If multiple
templates are used, it tries to find a middle solution. As you
realise, the amino-acid sequence of your target protein has little in
common with the amino-acid sequences of your templates, so it is
plausible to expect that the native structure of your enzyme will
deviate from the templates.
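To make the reviewer's point about rmsd concrete, here is a minimal
Python sketch. It assumes the structures are already superposed and
that the atoms (e.g. C-alpha atoms) are matched one to one; the function
name and the toy coordinates are made up for illustration and are not
taken from MODELLER or from your data.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes both structures are already superposed and that point i in
    coords_a corresponds to point i in coords_b (e.g. matched C-alpha atoms).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up toy coordinates in Angstrom, for illustration only.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
native = [(0.0, 3.0, 0.0), (3.8, 3.0, 0.0), (7.6, 3.0, 0.0)]

# A model built from the template stays close to the template by construction...
print("model vs template:", rmsd(model, template))
# ...which says nothing about how far it is from the true target structure.
print("model vs native:  ", rmsd(model, native))
```

In this toy case the model sits close to the template it was built from
but much further from the (hypothetical) native structure, which is
exactly why the model-template rmsd is not a quality measure.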
The following paragraph is taken from the review with PMID
16510277:
Two important factors influence the ability to predict accurate models:
the extent of structural conservation between target and template, and
the correctness of alignment [4 and 14: A. Kryshtafovych, C. Venclovas,
K. Fidelis and J. Moult, Progress over the first decade of CASP
experiments, Proteins 61 (2005) (suppl 7), pp. 225–236]. Models based on
templates with more than 50% sequence identity are generally very
accurate and can exhibit 1 Å Cα atom rmsd from the experimental
structure. Proteins with 30–50%
sequence identity share at least 80% of their structures; the best CASP
models within this range usually do not exceed 4 Å rmsd (typically
2–3 Å) from the native structure, with errors located mainly in loop
regions. Structural conservation can be as low as 55% for proteins that
display 20–30% sequence identity or even lower when sequence identity
drops below 20%. Whereas alignments are most often near optimal for
targets with more than 30% sequence identity to template structures
(easy targets), below this threshold (mainly difficult targets),
alignment quality sharply decreases and even as many as half of all
residues may be misaligned when sequence identity is less than 20% [14].
Sequence Related:
The multiple alignment was only used for the phylogenetic analysis,
whereas the pairwise alignments were used as a starting point to create
the models. This procedure is conceptually wrong, since the alignment
created by PSI-BLAST is not necessarily the best one. In fact, PSI-BLAST
makes a pairwise alignment to find the most similar sequences, not to
find the best alignment between the sequences. Moreover, with such a low
sequence similarity (less than 30%) the best procedure to be sure of a
good starting alignment is to perform a multiple alignment and, where
possible, also to use predictions of the positions of secondary
structure elements, etc. in order to improve the quality of the
alignment as much as possible (see refs. to the CASP competitions).
Then, the alignment between template(s) and model should be extracted
from the multiple sequence alignment.
The explanation is very punchy here. The proper way to create the
alignment for homology modeling is to collect your templates and align
them together with the target sequence using a more sophisticated
algorithm that takes into account structural similarity in addition to
sequence similarity, like that employed by the S-Align or T-Coffee
programs. Then you have to carefully inspect the multiple sequence
alignment (MSA) and correct local mismatches by looking at the template
structures and using your intuition. This is an
iterative process of correcting the MSA and creating the model until it
satisfies the common structural features that enzymes of this family
share (i.e. secondary structure, inter-atomic distances, H-bonds,
conformation of biologically important residues). Any structural
information reported in the literature about your target protein can be
utilised as a means of validation or even as restraints in homology
modeling.
Homology Definition:
The term "homology" simply indicates the presence of a common
evolutionary origin between two biological entities: therefore, two
proteins are either homologous or not. Somebody said that talking about
"X% of homology" would be more or less the same as talking about a woman
who is "X% pregnant". Instead, it is correct to say that two proteins
have X% of their amino acids identical.
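As an illustration of what "X% identical" means in practice, here is a
small Python sketch that counts identical residues in a pairwise
alignment. The choice of denominator (only columns where neither
sequence has a gap) is an assumption; other conventions exist, e.g.
dividing by the length of the shorter sequence, and they give different
numbers. The function name and the two toy sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length. Columns with a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned_pairs = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap column: not counted in the denominator
        aligned_pairs += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned_pairs == 0:
        return 0.0
    return 100.0 * identical / aligned_pairs


# Made-up toy alignment, for illustration only.
target_row = "MKT-AYIAKQR"
template_row = "MRTLAY-GKHR"
print("sequence identity: %.1f%%" % percent_identity(target_row, template_row))
```

Whichever convention you use, state it in the manuscript, and report
the number as sequence identity rather than as "homology".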
Energy Minimization:
The authors do not have an energy-minimized structure, which is a must
for carrying out interaction analyses and, if not done, leads to
incorrect interpretations. They have assumed that the program Modeller
provides a properly energy-minimized structure with the 'automodel'
environment. However, published studies that use Modeller in such an
environment have still energy-minimized the structure before using it
further.
As I explained above, MODELLER in some way "copies" the coordinates of
the templates to the target sequence, thus introducing steric clashes,
phi and psi angle deviations, etc. To alleviate these issues one should
relax the structure so that it adopts a native-like conformation, which
is distinctive for that particular amino acid sequence. I suggest
reading some previous threads about minimization options like the
following:
http://salilab.org/archives/modeller_usage/2010/msg00382.html
IMO, homology modeling at such a low sequence identity level should
always end with MD in order to get an average conformation(s) that
resembles the native one.
My impression is that you have approached a very difficult problem
superficially. I would recommend starting from some good reviews about
homology modeling, and then reading very thoroughly the publications
that describe the experimental structures that you used as templates.
Thomas