Problem with structure-structure alignment with salign
Dear Modeller users,
I would be grateful for any help or explanation, as I feel quite at a loss with calculating structure-structure alignments and optimal RMSD superpositions with Modeller.
First, I found it quite difficult to produce homology models properly superimposed on their templates. I started with a seemingly simple idea using the automodel class, namely adding:
a.final_malign3d = True
to the model-default.py script from the Modeller examples. As a result, the final superposition identified just 8 equivalent CA-CA pairs, for which the CA RMSD was evaluated as 2.79 Å. However, when I superimposed the very same pair of structures (model 1fdx on template 5fd1) and calculated the CA RMSD in another program using all 54 CA-CA pairs according to the alignment.ali file used by model-default.py, I got 0.5 Å. Since the automodel class is intended for simple use, it surprised me that such a basic task as a CA RMSD superposition is not done correctly. By "correct", or at least the most basic approach, I mean the superposition giving the minimum RMSD over all CA atoms that are equivalent according to the alignment. Maybe I used the wrong functionality, but I have not found an alternative in the documentation for the automodel class.
Finding the automated final multiple structure alignment unreliable, I started looking into proper ways of superimposing structures. I read the documentation on alignment.malign3d and, finding it obsolete, on alignment.salign. I am now at least able to calculate alignments based on sequence similarity alone and use them to superimpose structures. However, I am still unable to calculate or improve alignments based on structure-structure similarity, and again this seems to be due to the insufficient number of equivalent CA-CA pairs identified by Modeller.
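For comparison with other programs, the "minimum RMSD over all CA atoms equivalent according to the alignment" described above is the standard least-squares (Kabsch) superposition. The sketch below is a NumPy illustration, not a Modeller API: it assumes the CA coordinates have already been extracted and paired according to the alignment, and `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Minimum-RMSD superposition of P onto Q (Kabsch algorithm).

    P, Q: (N, 3) arrays of paired CA coordinates; row i of P is the atom
    that the alignment makes equivalent to row i of Q.
    Returns (rmsd, R, t) such that P @ R.T + t best matches Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be (N, 3) arrays")
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centred coordinates and its SVD
    H = (P - p0).T @ (Q - q0)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    diff = P @ R.T + t - Q
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return rmsd, R, t


if __name__ == "__main__":
    # Synthetic check: Q rotated about z and shifted should fit back exactly
    rng = np.random.default_rng(0)
    Q = rng.normal(scale=5.0, size=(54, 3))
    c, s = np.cos(0.7), np.sin(0.7)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    P = Q @ Rz.T + np.array([10.0, -3.0, 2.0])
    print(kabsch_rmsd(P, Q)[0])  # close to 0 for an exact rigid-body copy
```

Because every supplied pair enters the fit, this gives the all-pairs minimum RMSD; any cutoff-based selection of pairs would be a separate step.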
This is the code I tested, using the structures from the Modeller examples. First, a sequence-sequence alignment (which works):
from modeller import *
env = environ()
env.io.atom_files_directory = ['.', 'atom_files']
mdl = model(env)
aln = alignment(env)
for code in '1fas', '2ctx':
    mdl.read(file=code)
    aln.append_model(mdl, align_codes=code, atom_files=code)
aln.salign(rr_file='${LIB}/blosum62.sim.mat',
           feature_weights=(1,0,0,0,0,0),
           improve_alignment=True,
           similarity_flag=True, # The score matrix is not rescaled
           rms_cutoff=300,
           current_directory=True, write_fit=True,
           fit=True, fit_atoms="CA",
           output='ALIGNMENT QUALITY')
aln.write(file='test1.ali')
The above code produces a sensible sequence alignment in test1.ali and reports 61 equivalent CA-CA pairs, but the structures saved in 1fas_fit.pdb and 2ctx_fit.pdb are NOT superimposed. I can, however, use the aln object from above to superimpose the structures correctly with the model and selection classes:
mdl.read(file='1fas')
sel = selection(mdl).only_atom_types('CA')
mdl2 = model(env, file='2ctx')
sel.superpose(mdl2, aln)
mdl.write(file='1fas_fit2.pdb')
mdl2.write(file='2ctx_fit2.pdb')
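The 61 equivalent CA-CA pairs follow directly from the alignment: a pair is counted for every alignment column in which neither sequence has a gap. A pure-Python sketch of that bookkeeping, assuming the two aligned sequences are given as equal-length strings with '-' for gaps (PIR terminators such as '*' and chain breaks '/' are assumed to have been stripped); `aligned_pairs` is a hypothetical helper, not part of Modeller:

```python
def aligned_pairs(seq_a, seq_b, gap="-"):
    """Return (i, j) residue-index pairs that an alignment makes equivalent.

    seq_a, seq_b: aligned sequences of equal length, gaps marked with `gap`.
    i and j are 0-based indices into the ungapped sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    i = j = 0
    for a, b in zip(seq_a, seq_b):
        # A column is an equivalent position only if both residues are present
        if a != gap and b != gap:
            pairs.append((i, j))
        # Advance each residue counter only past real residues
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


# Toy example (not the real 1fas/2ctx alignment):
print(aligned_pairs("AC-DEF", "ACGD-F"))  # [(0, 0), (1, 1), (2, 3), (4, 4)]
```

The length of the returned list is the number of equivalent positions, and the index pairs select which CA coordinates to pair up for an all-pairs fit.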
Now the structures written to 1fas_fit2.pdb and 2ctx_fit2.pdb ARE superimposed. However, they are different enough that only 5 equivalent CA-CA pairs within the 3.5 Å cutoff are reported. I can confirm that with a sufficiently large cutoff there are 61 CA-CA pairs, consistent with the alignment aln (checked by examining the test1.ali file). This confirmation is produced with:
aln.compare_structures(rms_cutoffs=[999]*11)
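The gap between 61 aligned pairs and the 5 pairs "within 3.5 Å" is consistent with the second number counting only aligned pairs whose CA-CA distance after superposition falls below the cutoff. A pure-Python sketch of such a count, assuming already superposed coordinates paired row by row; it illustrates the idea and is not Modeller's internal code, and `count_within_cutoff` is a hypothetical name:

```python
import math


def count_within_cutoff(coords_a, coords_b, cutoff=3.5):
    """Count paired points whose distance is below `cutoff`.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples that are
    already superposed and paired row by row (e.g. aligned CA atoms).
    Returns (n_within, rms_all, rms_within); rms_within is NaN if no pair
    falls below the cutoff.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    # Squared distance for every aligned pair
    d2 = [sum((p - q) ** 2 for p, q in zip(a, b))
          for a, b in zip(coords_a, coords_b)]
    within = [x for x in d2 if x < cutoff ** 2]
    rms_all = math.sqrt(sum(d2) / len(d2))
    rms_within = math.sqrt(sum(within) / len(within)) if within else float("nan")
    return len(within), rms_all, rms_within


# Toy data: three aligned pairs, one of them 5 units apart
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 5.0), (2.0, 0.0, 0.0)]
print(count_within_cutoff(a, b)[0])              # 2
print(count_within_cutoff(a, b, cutoff=999)[0])  # 3
```

A small cutoff therefore reports few pairs even when every aligned pair took part in the fit, while a very large cutoff such as 999 counts them all.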
Finally, what does not work for me at all is trying to improve the alignment based on structure-structure similarity. The command below, as a continuation of the previous code snippets, fails with an exception saying that the number of equivalent positions is 0:
aln.salign(fit=True, fit_atoms="CA",
           feature_weights=(0,1,0,0,0,0),
           rms_cutoff=1000.0,
           improve_alignment=True,
           current_directory=True, write_fit=True,
           output='ALIGNMENT QUALITY')
Questions:
1. Why does it fail, if the previous checks find at least 5 equivalent positions even with the default cutoff (3.5 Å), and with rms_cutoff=1000.0 and fit_atoms="CA" there should be 61 equivalent positions according to the alignment?
2. Why did the first salign invocation save structures that were not superimposed, contrary to the options fit=True and write_fit=True?
3. How should one use the salign function to reliably calculate an optimal structure-structure alignment, or to optimize an initial one, as I tried to do?
Thanks in advance, Paweł Kędzierski
Dear modellers,
perhaps my previous message was too long for anyone to get through to my questions. Let me reduce it to the basics, then:
How to reliably calculate the minimum RMSD (and superposition) between CA carbons, based on sequence alignment?
This simply does not work for me. Steps to reproduce the problem using the Modeller examples are described in the message cited below.
Thanks in advance, Paweł Kędzierski
On 06.06.2015 at 10:49, Pawel Kedzierski wrote:
> [...]
Dear Pawel,
Could you try the web-based version of MODELLER to accomplish this task and see if this suits your needs:
1) Go to http://structuropedia.org
2) Under the TRANSFORM menu, choose "Superposition" and follow the instructions by clicking on the "i" icon.
With warmest regards,
AMJAD FAROOQ PhD DIC | Associate Professor
Dept of Biochemistry, University of Miami School of Medicine
Location: Gautier Building, Suite 217
Address: 1011 NW 15th Street #217, Miami, FL 33136, USA
Contact: amjad@farooqlab.net | 305-243-2429
Labpage: farooqlab.net | structuropedia.org
On Mon, Jun 15, 2015 at 6:18 AM, Pawel Kedzierski <pawel.kedzierski@pwr.edu.pl> wrote:
> [...]
Dear Amjad,
Thank you for your reply. Following your suggestion, I prepared a PDB file containing the 1fas and 2ctx structures (from the Modeller examples) and submitted it to structuropedia.org. It does seem to work: it reports a Modeller-calculated alignment with a CA RMSD of 1.32 Å (http://dalton.med.miami.edu/4ki/models/modXX.txt). Although it does not report how many pairs were used for the RMSD calculation, both the number and the visualization look correct.
So now I know that I am doing something wrong :-) but I am still at a loss as to why. I would be grateful for any hint to get it straight. What I am after is an automated/batch solution for comparing multiple structures. The Modeller salign function seemed promising, but I was stopped at the simplest attempt, and I am trying to understand how to use it correctly. The function does calculate the sequence alignment for me, and I am able to use this alignment object to superimpose structures, but:
1. the fit=True and rms_cutoff options of alignment.salign do not work as described in the documentation;
2. when I calculate the superposition using selection.superpose(model, alignment) or automodel.final_malign3d, Modeller does not use all residue pairs from the alignment, only a few; the RMSD and the superposition are not optimal, and sometimes it fails outright with "0 equivalent positions".
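One generic mechanism that produces symptoms like those in point 2 is cutoff-based refinement: fit, drop pairs farther apart than a cutoff, and refit on the survivors. Such a fit gives a low RMSD over a conserved core but, by construction, never a lower all-pairs RMSD than a single least-squares fit over every aligned pair. The NumPy sketch below contrasts the two; it is a generic illustration under that assumption, not a description of what selection.superpose or final_malign3d do internally, and all function names are hypothetical.

```python
import numpy as np


def superpose(P, Q):
    """Kabsch fit: return (R, t) minimising the RMSD of P @ R.T + t to Q."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - p0).T @ (Q - q0))
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q0 - p0 @ R.T


def rmsd(A, B):
    """RMSD between two (N, 3) arrays paired row by row."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def iterative_core_fit(P, Q, cutoff=3.5, max_iter=20, min_pairs=3):
    """Generic cutoff-based refinement (hypothetical, for illustration).

    Fit on the current subset of pairs, keep only pairs closer than `cutoff`,
    and refit until the subset stops changing. Returns the boolean mask of
    kept pairs, the RMSD over kept pairs and the RMSD over all pairs, both
    measured after a final fit on the kept pairs.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = np.ones(len(P), dtype=bool)
    for _ in range(max_iter):
        R, t = superpose(P[mask], Q[mask])
        dist = np.linalg.norm(P @ R.T + t - Q, axis=1)
        new_mask = dist < cutoff
        if new_mask.sum() < min_pairs or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    R, t = superpose(P[mask], Q[mask])  # final fit on the kept subset
    fitted = P @ R.T + t
    return mask, rmsd(fitted[mask], Q[mask]), rmsd(fitted, Q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(scale=8.0, size=(60, 3))
    P = Q.copy()
    P[40:] += rng.normal(scale=6.0, size=(20, 3))  # simulate divergent regions
    R, t = superpose(P, Q)
    print("all-pairs fit, RMSD over all 60 pairs:", rmsd(P @ R.T + t, Q))
    mask, core, overall = iterative_core_fit(P, Q)
    print("core fit on", int(mask.sum()), "pairs, core RMSD:", core,
          "RMSD over all pairs:", overall)
```

Both numbers are legitimate, but they answer different questions: the core RMSD describes the well-conserved subset, while only the all-pairs fit gives the minimum RMSD over every position the alignment declares equivalent.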
With gratitude, Pawel
On 15.06.2015 at 17:04, Amjad Farooq wrote:
> [...]
On 6/15/15 3:18 AM, Pawel Kedzierski wrote:
> How to reliably calculate the minimum RMSD (and superposition) between
> CA carbons, based on sequence alignment?
If you don't want to change the existing alignment, there is only one way to do it in Modeller: use selection.superpose(). The alignment routines (salign, malign3d) will modify the alignment.
Ben Webb, Modeller Caretaker