Non-member submission forwarded by the list owner
This is a reply to "modeling question" by Douglas Kojetin, but may
be useful for others interested in the difference between models
and crystal structures:
One way you can check for similarity to the experimental structure
whilst including experimental error in a rigorous way is to
calculate the RMSD weighted by the B-factors.
I have a description of a method to do this in a paper in Proteins
that is due to be online/published in a couple of days. Ref:
Forrest LR, Woolf TB, "Discrimination of native loop conformations
in membrane proteins: Decoy library design and evaluation of
effective energy scoring functions", 2003.
Essentially the method is this:
define the 'experimental uncertainty' as a sphere around the atom
coordinate, whose radius is dependent on the B-factor of that atom.
This depends on the relationship between root mean square
fluctuation and B-factor (B = 8 * pi^2 * RMSF^2 / 3). If the
distance between the model atom and the xray atom (centers) is less
than the radius of the sphere, then you define the effective
distance as zero (the atom is within the 'experimental
uncertainty'). If the model atom falls outside the sphere, then you
subtract the radius of the sphere from the distance between the
centers to get the effective distance.
Then you calculate the root-mean-square deviation of these
effective distances, rather than of the distances between the centers.
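
The steps above can be sketched in Python roughly as follows. This is only
a minimal sketch and not the code from the paper: it assumes the two
structures are already superimposed, that the coordinate lists are in the
same atom order, and that the sphere radius is simply the RMSF obtained by
inverting B = 8 * pi^2 * RMSF^2 / 3. The function names are made up for
illustration.

```python
import math


def bfactor_radius(b_factor):
    """Radius of the 'experimental uncertainty' sphere for one atom.

    Assumption: the radius equals the RMSF implied by the B-factor,
    i.e. RMSF = sqrt(3 * B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))


def bfactor_weighted_rmsd(model_xyz, xray_xyz, b_factors):
    """RMSD over effective distances (hypothetical helper).

    model_xyz, xray_xyz: sequences of (x, y, z) tuples, same atom order,
    already superimposed. b_factors: B-factors of the xray atoms.
    """
    if not (len(model_xyz) == len(xray_xyz) == len(b_factors)):
        raise ValueError("inputs must have the same length")
    if len(b_factors) == 0:
        raise ValueError("need at least one atom")

    total = 0.0
    for m, x, b in zip(model_xyz, xray_xyz, b_factors):
        d = math.dist(m, x)  # distance between the centers
        # Inside the sphere: effective distance is zero.
        # Outside the sphere: distance minus the sphere radius.
        effective = max(0.0, d - bfactor_radius(b))
        total += effective * effective
    return math.sqrt(total / len(b_factors))
```

With these assumptions, an atom with B = 20 A^2 gets a radius of about
0.87 A, so a model atom closer than that to the xray position contributes
nothing to the sum.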
I bet you find that the model of the 100% identical structure has a
B-factor-weighted RMSD of less than 0.1A.
for question 1, i think it is normal and expected that a model,
built on a sequentially 100% identical template, will be somewhat
different compared to an experimental solution. Although it should
not go beyond let us say 0.5, or certainly not beyond 1.0 Ang RMSD.
It is below the "experimental error" i.e. if the same protein is
experimentally in different crystal forms, or at different
levels, or solved at high resolution but once by X-ray and once by
you will still see an approx <1 Ang RMSD difference among the
structures. So there is nothing special to see that your model is
not exactly identical to the experimental one. For a reference you can
look up figure 6 (and text) in chapter 7 (pp.167-206), book: Protein
Structure (determination, analysis and applications for drug
discovery), editor: DI Chasman, 2003 Marcel Dekker.
question 2: it is a very interesting and useful survey that you did.
Unfortunately it is difficult to generalize, because in each
case the set of available templates (their sequence identity to the
target and structural variability with each other) is different.
However your experiment about a proper "assay" is near exhaustive for
your specific experiment, so you are certainly in a position to
Of course the best would be to use instead of Procheck or other
programs the actual experimental structures to verify the
e.g. re-model your protein A without the 100 % identical template and
explore the same question you did for protein B. In this case you can
compare your resulting models with the actual X-ray structure.
On Mon, 2003-05-12 at 15:52, Douglas Kojetin wrote:
> please see the message, originally directed towards dr. sali,
> if anyone has any comments, please send them!
> many thanks,
> doug kojetin
> Begin forwarded message:
> > Dr. Sali:
> > I am a graduate student in the Department of Molecular and
> > Biochemistry at North Carolina State University. I have a
> > more about modeling process itself rather than the program
> > I have used your program, MODELLER, to create models of a
> > proteins our lab and collaborators are interested in (total ~
> > There are approximately 10 solved structures to the domain of
> > interest. One of these solved structures (structure A) is in
> > subfamily within the same species of proteins we are modeling
> > A), whereas the other 29 proteins are of unknown solved
> > question concerning the use of templates in the modeling
> > ##############
> > my main question
> > ##############
> > (if this is confusing, please let me know and i will
> > Would using a solved structure (structure A) to model a protein
> > exact sequence (model A) which will be used in a comparison of
> > other structures with no known structures (and lower 'homology'
> > compared to that of structure A to model A -- which is 100%)
> > model A? Overall, we are interested in comparing all 30
> > This comes mostly from outside comments that our modeled
> > not look 'exactly' like the solved structure. As one would like
> > look as close as possible to the solved structure, it is a
> > all, and perhaps we just need to be more descriptive in
> > results, especially pertaining to this specific model.
> > #####################
> > how i modeled the proteins
> > #####################
> > I performed a 'modeling parameter assay' to find the number of
> > templates to use to model a protein (model B), ranging from 1
> > templates. In addition, I 'assayed' the amount of refinement to
> > Overall, I had an assay 'shaped' like a matrix with, for
> > refinement across the top and # of templates going down. I
> > models for each and ran a variety of analyses on the models
> > Ca RMSD to the most homologous protein, ERRAT, PROCHECK, etc)
> > computed the average 'value' output from the respective
> > All in all, using four (4) templates and a refinement value of 1
> > produced the 'stereochemically best' models.
> > I applied the same rationale to another protein of interest
> > and the same trends were extrapolated.
> > question
> > --> is this rationale 'acceptable'? or how would you do
> > similar?
> > Many thanks for your input, and I'm sorry for the long-winded
> > Douglas Kojetin
------ End of Forwarded Message