Re: [modeller_usage] MD Simulation

We have constructed homology models based on templates of low sequence identity (27-28%). The target is an enzyme, and we also carried out a ligand-enzyme interaction study using Modeller. We superposed the template and target structures, and the deviation was below 1 Angstrom. We submitted this work to a journal and received several reviewer comments, on which I would appreciate your advice. As these comments concern assessing and improving the quality of a homology model, I would be grateful for a few tips on how to further improve the model quality in light of the points listed. Some of the points are conceptual, i.e. about sequence identity and homology; kindly comment on those as well. Here is the summary:

RMSD related:
The backbone could deviate from the true structure by a few angstroms, and the impact of such a deviation on enzyme catalysis could be huge. Therefore, the application of such models is often limited, and trying to deduce the specific interactions between the active-site residues and a substrate from such a model is questionable. An rmsd of ~1 angstrom between the model and the template is not evidence of high model quality; the term rmsd is being misunderstood here. In structure prediction, the quality of a homology model is usually assessed by the rmsd between the model of the target protein and the X-ray structure of that same target protein, not by the rmsd between the model and the X-ray structure of the template.
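To make the distinction concrete, here is a minimal pure-Python sketch of a C-alpha rmsd calculation. It assumes the two coordinate sets are already superposed (a real comparison would first superpose them, e.g. with the Kabsch algorithm) and uses hypothetical toy coordinates; the `rmsd` helper is illustrative and not part of MODELLER.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    tuples. Assumes the structures are already superposed and that
    coords_a[i] and coords_b[i] are equivalent atoms (e.g. C-alpha atoms of
    residues paired by the alignment)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy C-alpha coordinates (Angstrom), already superposed.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model    = [(0.1, 0.0, 0.0), (3.9, 0.0, 0.0), (7.7, 0.0, 0.0)]  # built from the template
native   = [(0.0, 2.0, 0.0), (3.8, 2.5, 0.0), (7.6, 3.0, 0.0)]  # true target structure

print(round(rmsd(model, template), 2))  # → 0.1  (model vs template: says little)
print(round(rmsd(model, native), 2))    # → 2.54 (model vs native: actual quality)
```

A model built from a template will almost always sit close to that template, so the first number says little; only the second comparison, which requires an experimental structure of the target itself, measures model quality.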

In principle, MODELLER "copies" the coordinates of the template atoms to the corresponding residues (as defined by the alignment) in the target sequence. If multiple templates are used, it tries to find an intermediate solution. As you realise, the amino-acid sequence of the target protein has little in common with those of your templates, so it is plausible to expect that the native structure of your enzyme deviates from the templates.
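The "copying" idea can be pictured with the simplified sketch below, which walks a gapped pairwise alignment and gives each aligned target residue the C-alpha coordinate of its template partner. It is only an illustration under stated assumptions ('-' as the gap character, hypothetical helper names and toy data); MODELLER itself derives spatial restraints from the templates and optimizes the model against them rather than literally copying coordinates.

```python
def aligned_pairs(target_row, template_row, gap="-"):
    """Return (target_index, template_index) pairs for alignment columns where
    both gapped sequences have a residue. Indices are 0-based positions in
    the ungapped sequences."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    pairs = []
    i = j = 0  # running residue counters for target and template
    for a, b in zip(target_row, template_row):
        if a != gap and b != gap:
            pairs.append((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs

def transfer_coordinates(target_row, template_row, template_ca, gap="-"):
    """Naive 'copy' step (hypothetical helper): each aligned target residue
    receives the C-alpha coordinate of its template partner; target residues
    without a partner get None and would have to be built some other way."""
    n_target = len(target_row.replace(gap, ""))
    model = [None] * n_target
    for i, j in aligned_pairs(target_row, template_row, gap):
        model[i] = template_ca[j]
    return model

# Hypothetical toy alignment and template coordinates.
target_row   = "MK-LVA"
template_row = "MRGL-A"
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
               (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
model = transfer_coordinates(target_row, template_row, template_ca)
print(model[3])  # None: target residue V has no template partner
```

The sketch makes the consequence explicit: wherever the alignment pairs residues, the model can only be as close to the native target structure as the template is.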

The following paragraph is taken from the review with PMID 16510277:

Two important factors influence the ability to predict accurate models: the extent of structural conservation between target and template, and the correctness of alignment [4,14]. Models based on templates with more than 50% sequence identity are generally very accurate and can exhibit ∼1 Å Cα atom rmsd from the experimental structure. Proteins with 30–50% sequence identity share at least 80% of their structures; the best CASP models within this range usually do not exceed 4 Å rmsd (typically 2–3 Å) from the native structure, with errors located mainly in loop regions. Structural conservation can be as low as 55% for proteins that display 20–30% sequence identity or even lower when sequence identity drops below 20%. Whereas alignments are most often near optimal for targets with more than 30% sequence identity to template structures (easy targets), below this threshold (mainly difficult targets), alignment quality sharply decreases and even as many as half of all residues may be misaligned when sequence identity is less than 20% [14].

[14] A. Kryshtafovych, C. Venclovas, K. Fidelis and J. Moult, Progress over the first decade of CASP experiments, Proteins 61 (Suppl 7) (2005), pp. 225–236. (A description of the progress made in protein structure prediction during the course of the CASP experiments.)
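For the 27-28% identity range mentioned at the start of this thread, the rule-of-thumb bands quoted above can be encoded as a small lookup. This is a hypothetical helper, and the handling of the band edges is an assumption, since the review's ranges meet at 20%, 30% and 50% without saying which side each boundary belongs to.

```python
def expected_model_quality(identity_pct):
    """Rule-of-thumb expectations from the review quoted above (CASP-based).
    Boundary handling is an assumption: >50, 30-50, 20-30 and <20 percent."""
    if not 0 <= identity_pct <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 50:
        return "very accurate; ~1 A C-alpha rmsd from the experimental structure"
    if identity_pct >= 30:
        return ("at least 80% of structure shared; best models typically 2-3 A rmsd, "
                "errors mainly in loops")
    if identity_pct >= 20:
        return ("structural conservation can drop to ~55%; "
                "alignment quality decreases sharply below 30%")
    return "conservation even lower; up to half of residues may be misaligned"

print(expected_model_quality(27.5))
```

For a 27-28% identity target, the lookup lands in the band where both structural divergence and alignment errors are expected, which is exactly the reviewers' concern.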

Sequence Related:
The multiple alignment was used only for the phylogenetic analysis, whereas the pairwise alignments were used as the starting point to create the models. This procedure is conceptually wrong, since the alignment created by PSI-BLAST is not necessarily the best one: PSI-BLAST makes pairwise alignments to find the most similar sequences, not to find the best alignment between them. Moreover, with such low sequence similarity (less than 30%), the best way to ensure a good starting alignment is to perform a multiple alignment and, where needed, also to use predictions of the positions of secondary structure elements, etc., in order to improve the quality of the alignment as much as possible (see the references to the CASP experiments). The alignment between the template(s) and the target sequence should then be extracted from the multiple sequence alignment.
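The last step, deriving the template-target alignment from the multiple alignment rather than from a separate pairwise run, can be sketched as follows. The dictionary-of-rows MSA format, the '-' gap character and the sequence names are assumptions for illustration; the point is that the pairwise alignment inherits the register fixed by the whole MSA, and only columns where both sequences are gapped are dropped.

```python
def pairwise_from_msa(msa, target_id, template_id, gap="-"):
    """Extract the target-template pairwise alignment implied by a multiple
    sequence alignment given as a dict of id -> equal-length gapped rows."""
    target, template = msa[target_id], msa[template_id]
    if len(target) != len(template):
        raise ValueError("MSA rows must have equal length")
    # Drop columns where both sequences are gapped (gaps caused by other MSA members).
    kept = [(a, b) for a, b in zip(target, template) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

# Hypothetical three-sequence MSA; template2 carries an extra residue at column 3.
msa = {
    "target":    "MK--LVAG",
    "template1": "M-R-LIAG",
    "template2": "MKRSLVAG",
}
target_aln, template_aln = pairwise_from_msa(msa, "target", "template1")
print(target_aln)    # MK-LVAG
print(template_aln)  # M-RLIAG
```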

The explanation here is very much to the point. The proper way to create the alignment for homology modeling is to collect your templates and align them together with the target sequence using a more sophisticated algorithm that takes structural similarity into account in addition to sequence similarity, such as those employed by the S-Align or T-Coffee programs. Then inspect the multiple sequence alignment (MSA) carefully and correct local misalignments by looking at the template structures and using your intuition. This is an iterative process of correcting the MSA and rebuilding the model until the model satisfies the common structural features shared by enzymes of this family (i.e. secondary structure, inter-atomic distances, H-bonds, conformation of biologically important residues). Any structural information reported in the literature about your target protein can be used for validation or even as restraints in homology modeling.
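One concrete check during that inspection is to look for insertions or deletions that fall inside helices or strands of the template, since indels are usually expected in loops. The sketch below flags such alignment columns. It assumes a per-residue secondary-structure string for the template (for example, a DSSP assignment reduced to H, E and C) and is a hypothetical heuristic, not a MODELLER function.

```python
def indels_in_secondary_structure(target_row, template_row, template_ss, gap="-"):
    """Return 0-based alignment columns where an indel falls inside a template
    helix ('H') or strand ('E'). template_ss holds one state per ungapped
    template residue. An insertion (gap in the template row) is flagged only
    if the template residues on both sides share the same H or E state."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    # Map each alignment column to its template residue index (None for gaps).
    col_to_res, j = [], 0
    for b in template_row:
        if b == gap:
            col_to_res.append(None)
        else:
            col_to_res.append(j)
            j += 1
    if j != len(template_ss):
        raise ValueError("template_ss must have one state per template residue")
    flagged = []
    for col, (a, b) in enumerate(zip(target_row, template_row)):
        if a == gap and b != gap:
            # Deletion in the target: a template residue is skipped.
            if template_ss[col_to_res[col]] in ("H", "E"):
                flagged.append(col)
        elif b == gap and a != gap:
            # Insertion in the target: inspect the flanking template residues.
            before = [r for r in col_to_res[:col] if r is not None]
            after = [r for r in col_to_res[col + 1:] if r is not None]
            if before and after:
                left, right = template_ss[before[-1]], template_ss[after[0]]
                if left == right and left in ("H", "E"):
                    flagged.append(col)
    return flagged

# Hypothetical example: both indels sit inside a template helix.
target_row   = "MKT-VAGK"
template_row = "MRTL-AGK"
template_ss  = "CHHHHHC"  # states for template residues M R T L A G K
print(indels_in_secondary_structure(target_row, template_row, template_ss))  # [3, 4]
```

Flagged columns are candidates for shifting the gap into a neighbouring loop and rebuilding the model, in the iterative spirit described above.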

Homology Definition:
The term "homology" simply indicates a common evolutionary origin of two biological entities: two proteins are therefore either homologous or not. Somebody said that talking about "X% homology" is more or less the same as talking about a woman who is "X% pregnant". Instead, it is correct to say that two proteins have X% of their amino acids identical (X% sequence identity).
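Sequence identity, the quantity recommended here in place of "X% homology", can be computed from an alignment as below. This sketch divides by the number of aligned (non-gap) positions; that denominator is an assumption, since other conventions use the length of the shorter sequence or of the whole alignment, and the reported percentage depends on the choice.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percentage of aligned (non-gap) positions at which two gapped
    sequences carry the same amino acid."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a.upper() == b.upper())
    return 100.0 * identical / len(aligned)

# Hypothetical toy alignment: 5 aligned positions, 4 identical.
print(percent_identity("MK-LVAG", "M-RLIAG"))  # 80.0
```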

Energy Minimization:
The authors did not energy-minimize the structure, which is a must before carrying out interaction analyses; skipping this step can lead to incorrect interpretations. They have assumed that Modeller provides a properly energy-minimized structure with the 'automodel' environment. However, published studies that use Modeller in this way have still been found to energy-minimize the structure before using it further.