Dear Modeller users,
I would be grateful for your help or an explanation, as I am quite at a
loss with calculating structure-structure alignments and optimal RMSD
superpositions with Modeller.
At first, I found it quite difficult to produce homology models
properly superimposed on their templates. I started with a seemingly
simple idea using the automodel class, namely adding:
a.final_malign3d = True
to the model-default.py script from the Modeller examples. The result was
that the final superposition identified just 8 equivalent CA-CA
pairs, for which the CA RMSD was evaluated as 2.79 Å.
However, when I superimposed the very same pair of structures (model
1fdx on the 5fd1 template) and calculated the CA RMSD in another program
using all 54 CA-CA pairs according to the alignment.ali used by
model-default.py, I got 0.5 Å.
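For reference, "all CA-CA pairs according to the alignment" can be derived from two gapped alignment rows roughly as in the sketch below. The function name aligned_pairs and the toy sequences are my own illustration, not part of Modeller, and the sketch assumes the two rows of alignment.ali have already been read into strings of equal length.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return (index_a, index_b) residue pairs for columns with no gap.

    Indices are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = []
    i = j = 0  # running residue counters for each sequence
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.append((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


# Toy rows (made up for illustration, not taken from alignment.ali).
print(aligned_pairs("AC-DE", "A-GDE"))
```

Only the residues paired in this way would enter the RMSD calculation; gapped columns contribute nothing.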
As the automodel class is intended for simple use, it surprised me
that such a basic task as CA RMSD superposition is not done correctly.
By the "correct", or at least the most basic, way I mean the superposition
that gives the minimum RMSD using all CA atoms equivalent according to the
alignment.
Maybe I used the wrong functionality, but I have not found an alternative
in the documentation for the automodel class.
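To be explicit about what I mean by the minimum RMSD over a fixed set of pairs, here is a minimal sketch in plain Python. It is not Modeller's implementation. It uses Horn's quaternion formulation and finds the largest eigenvalue by shifted power iteration, a technique chosen here only to avoid external libraries. The names centered and min_rmsd and the sample points are my own assumptions.

```python
import math


def centered(coords):
    """Return the coordinates shifted so that their centroid is at the origin."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return [[c[k] - centroid[k] for k in range(3)] for c in coords]


def min_rmsd(mobile, target, iterations=500):
    """Minimum RMSD over all rigid-body superpositions of paired points."""
    if len(mobile) != len(target) or not mobile:
        raise ValueError("need two non-empty point lists of equal length")
    p, q = centered(mobile), centered(target)
    n = len(p)
    # Sum of squared norms of both centred sets.
    g = sum(x * x for pt in p + q for x in pt)
    # Correlation matrix s[a][b] = sum_i p_i[a] * q_i[b].
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue fixes the optimum.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes every eigenvalue non-negative,
    # so power iteration converges towards the largest one.
    shift = math.sqrt(sum(x * x for row in m for x in row))
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: all points coincide
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix approximates the eigenvalue.
    lam = (sum(v[i] * m[i][j] * v[j] for i in range(4) for j in range(4))
           / sum(x * x for x in v))
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


# Made-up points: the second set is the first rotated 90 degrees about z
# and translated, so the minimum RMSD should be numerically zero.
pts = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
moved = [(-y + 5, x - 2, z + 1) for x, y, z in pts]
print(min_rmsd(pts, moved))
```

The point is that this value depends only on which pairs are declared equivalent, not on any distance cutoff applied afterwards.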
Finding the automated final multiple structure alignment unreliable,
I started looking into proper ways of superimposing structures. I have
read the documentation on alignment.malign3d and, finding it obsolete, on
alignment.salign. Now I am at least able to calculate alignments based
on sequence similarity only and use them to superimpose structures.
However, I am still not able to calculate or improve alignments based
on structure-structure similarity, and again this seems to be due to an
insufficient number of equivalent CA-CA pairs identified by Modeller.
This is the code I tested, using the structures from the Modeller examples.
First, sequence-sequence alignment (which works):
from modeller import *
env = environ()
env.io.atom_files_directory = ['.', 'atom_files']
mdl = model(env)
aln = alignment(env)
for code in '1fas', '2ctx':
    mdl.read(file=code)
    aln.append_model(mdl, align_codes=code, atom_files=code)
aln.salign(rr_file='${LIB}/blosum62.sim.mat',
           feature_weights=(1,0,0,0,0,0),
           improve_alignment=True,
           similarity_flag=True, # The score matrix is not rescaled
           rms_cutoff=300,
           current_directory=True, write_fit=True,
           fit=True, fit_atoms="CA",
           output='ALIGNMENT QUALITY')
aln.write(file='test1.ali')
The above part produces a sensible sequence alignment in test1.ali and
reports 61 equivalent CA-CA pairs, but the structures saved in the files
1fas_fit.pdb and 2ctx_fit.pdb are NOT superimposed.
I can, however, use the aln object from above to correctly superimpose
the structures with the model and selection classes:
mdl.read(file='1fas')
sel = selection(mdl).only_atom_types('CA')
mdl2 = model(env, file='2ctx')
sel.superpose(mdl2, aln)
mdl.write(file='1fas_fit2.pdb')
mdl2.write(file='2ctx_fit2.pdb')
Now the structures written out to 1fas_fit2.pdb and 2ctx_fit2.pdb ARE
superimposed. They are, however, different enough that only 5 equivalent
CA-CA pairs within the 3.5 Å cutoff are reported.
I am able to confirm that with a sufficiently large cutoff there are 61
CA-CA pairs, consistent with the alignment aln (checked by
examining the test1.ali file). This confirmation is produced with:
aln.compare_structures(rms_cutoffs=[999]*11)
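The dependence of the reported pair count on the cutoff can be illustrated with a short sketch. It only shows the counting logic as I understand it and is not Modeller code; the function name pairs_within and the coordinates are made up, and the two lists are assumed to hold already superimposed CA positions in alignment order.

```python
import math


def pairs_within(coords_a, coords_b, cutoff):
    """Count aligned point pairs whose distance does not exceed the cutoff."""
    count = 0
    for a, b in zip(coords_a, coords_b):
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        if dist <= cutoff:
            count += 1
    return count


# Toy data: the three pairs are 1, 4 and 10 units apart.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 4.0), (2.0, 0.0, 10.0)]
print(pairs_within(set_a, set_b, 3.5))    # strict cutoff keeps one pair
print(pairs_within(set_a, set_b, 999.0))  # huge cutoff keeps all three
```

With a cutoff this large, every aligned pair is counted, which is why I expect the count to equal the number of aligned positions.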
Finally, what does not work for me at all is improving the alignment
based on structure-structure similarity. The command below, a
continuation of the previous code snippets, fails with an exception
stating that the number of equivalent positions is 0:
aln.salign(fit=True, fit_atoms="CA",
           feature_weights=(0,1,0,0,0,0),
           rms_cutoff=1000.0,
           improve_alignment=True,
           current_directory=True, write_fit=True,
           output='ALIGNMENT QUALITY')
Questions:
1. Why does it fail, if the previous checks find at least 5 equivalent
positions even with the default cutoff (3.5), and with
rms_cutoff=1000.0 and fit_atoms="CA" there should be 61 equivalent
positions according to the alignment?
2. Why did the first salign invocation save structures that were not
superimposed, contrary to the options fit=True, write_fit=True?
3. How should one use the salign function to reliably calculate an optimal
structure-structure alignment, or optimize an initial one as I tried
to do?
Thanks in advance,
Paweł Kędzierski