modeller_usage June 2015

modeller_usage@salilab.org

4 participants
2 discussions

Problem with structure-structure alignment with salign
by Pawel Kedzierski 01 Jul '15

Dear Modeller users,

I would appreciate your help or an explanation, as I am at a loss with the calculation of structure-structure alignments and optimal RMSD superpositions in Modeller.

At first, I found it difficult to produce homology models properly superimposed on their templates. I started with a seemingly simple idea using the automodel class, namely adding:

    a.final_malign3d = True

to the model-default.py script from the Modeller examples. The result was that the final superposition identified just 8 equivalent CA-CA pairs, for which the CA RMSD was evaluated as 2.79 Å. However, when I superimposed the very same pair of structures (the 1fdx model on the 5fd1 template) and calculated the CA RMSD in another program, using all 54 CA-CA pairs according to the alignment.ali file used by model-default.py, I got 0.5 Å.

Since the automodel class is intended for simple use, it surprised me that a task as basic as CA RMSD superposition is not done correctly. By "correct", or at least the most basic approach, I mean the superposition that gives the minimum RMSD using all CA atoms that are equivalent according to the alignment. Maybe I used the wrong functionality, but I have not found an alternative in the documentation for the automodel class.

Finding the automated final multiple structure alignment unreliable, I started looking into proper ways of superimposing structures. I read the documentation on alignment.malign3d and, finding it obsolete, on alignment.salign. I am now at least able to calculate alignments based on sequence similarity only and use them to superimpose structures. However, I am still not able to calculate or improve alignments based on structure-structure similarity, and again this seems to be due to an insufficient number of equivalent CA-CA pairs identified by Modeller.

This is the code I tested, using the structures from the Modeller examples. First, the sequence-sequence alignment (which works):

    from modeller import *

    env = environ()
    env.io.atom_files_directory = ['.', 'atom_files']
    mdl = model(env)
    aln = alignment(env)
    for code in '1fas', '2ctx':
        mdl.read(file=code)
        aln.append_model(mdl, align_codes=code, atom_files=code)
    aln.salign(rr_file='${LIB}/blosum62.sim.mat',
               feature_weights=(1,0,0,0,0,0),
               improve_alignment=True,
               similarity_flag=True,  # The score matrix is not rescaled
               rms_cutoff=300,
               current_directory=True,
               write_fit=True, fit=True, fit_atoms="CA",
               output='ALIGNMENT QUALITY')
    aln.write(file='test1.ali')

The above part produces a sensible sequence alignment in test1.ali and reports 61 equivalent CA-CA pairs, but the structures saved in the files 1fas_fit.pdb and 2ctx_fit.pdb are NOT superimposed.

I can, however, use the aln object from above to correctly superimpose the structures with the model and selection classes:

    mdl.read(file='1fas')
    sel = selection(mdl).only_atom_types('CA')
    mdl2 = model(env, file='2ctx')
    sel.superpose(mdl2, aln)
    mdl.write(file='1fas_fit2.pdb')
    mdl2.write(file='2ctx_fit2.pdb')

Now the structures written out to 1fas_fit2.pdb and 2ctx_fit2.pdb ARE superimposed. They are, however, different enough that only 5 equivalent CA-CA pairs within the 3.5 Å cutoff are reported. I am able to confirm that with a sufficiently large cutoff there are 61 CA-CA pairs, consistent with the alignment aln (checked by examining the test1.ali file). This confirmation is produced with:

    aln.compare_structures(rms_cutoffs=[999]*11)

Finally, what does not work for me at all is improving the alignment based on structure-structure similarity. The command below, run as a continuation of the previous code snippets, fails with an exception stating that the number of equivalent positions is 0:

    aln.salign(fit=True, fit_atoms="CA",
               feature_weights=(0,1,0,0,0,0),
               rms_cutoff=1000.0,
               improve_alignment=True,
               current_directory=True,
               write_fit=True,
               output='ALIGNMENT QUALITY')

Questions:

1. Why does it fail, if the previous checks find at least 5 equivalent positions even with the default cutoff (3.5), and with rms_cutoff=1000.0 and fit_atoms="CA" there should be 61 equivalent positions according to the alignment?
2. Why did the first salign invocation save structures that were not superimposed, which is inconsistent with the options fit=True, write_fit=True?
3. How should one use the salign function to reliably calculate an optimal structure-structure alignment, or to optimize an initial one as I tried to do?

Thanks in advance,
Paweł Kędzierski
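The two RMSD figures quoted in this post (one over 8 pairs, one over all 54 aligned pairs) are computed over different sets of CA-CA pairs, so they are not directly comparable. The following is a minimal sketch in plain Python, not Modeller code, of how the two numbers can be computed side by side. It assumes the two coordinate lists are already superimposed and already paired according to the alignment; the function names and the toy coordinates are hypothetical, and the cutoff-based count only mirrors the 3.5 Å cutoff mentioned in the post rather than reproducing Modeller's internal procedure.

```python
import math


def pair_distances(coords_a, coords_b):
    """Distances between CA coordinates that are already paired one-to-one."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [
        math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        for a, b in zip(coords_a, coords_b)
    ]


def rmsd(distances):
    """Root-mean-square of a non-empty list of pair distances."""
    if not distances:
        raise ValueError("need at least one pair")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def summarize(coords_a, coords_b, cutoff=3.5):
    """Report RMSD over all aligned pairs and over pairs within the cutoff.

    The coordinates are assumed to be superimposed already; no fitting
    is performed here.
    """
    dists = pair_distances(coords_a, coords_b)
    close = [d for d in dists if d <= cutoff]
    return {
        "n_all": len(dists),
        "rmsd_all": rmsd(dists),
        "n_within_cutoff": len(close),
        "rmsd_within_cutoff": rmsd(close) if close else None,
    }


if __name__ == "__main__":
    # Toy coordinates (hypothetical): two pairs coincide, one is 5 A apart.
    toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    toy_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 5.0)]
    print(summarize(toy_a, toy_b))
```

Printing both counts next to both RMSD values makes it easier to see whether a reported RMSD refers to all aligned positions or only to those within the distance cutoff.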

Alignment file not recognizing a chain break?
by Kolmus, Elizabeth K. 09 Jun '15

Hello,

I am attempting to model a target sequence from multiple templates. When I check the alignment, I get an error that I can't resolve, either on my own or by consulting previous entries on the mailing list. It appears that the program is concatenating the end of one chain directly onto the beginning of the other and ignoring a gap.

The manual page on alignment.append() says "This command can raise ... a SequenceMismatchError if a 'PIR' sequence does not match that read from PDB (when an empty range is given)". Does this mean that automodel will not run on the alignment? What is the appropriate fix?

I suspect I may also have trouble down the line with matching the chains together, given that 3TNP is formatted as 1-2-1-2 and 3J4Q is formatted as 3-1-1-2-2. Will this be an issue, and if so, what is the workaround?

Thanks,
Elizabeth

Details below:

Parser commands:

    from modeller import *
    from modeller.scripts import complete_pdb

    env = environ()
    env.io.hetatm = True
    aln = alignment(env)
    aln.append(file='hybrid-msa.ali', align_codes='all', remove_gaps=False)
    aln.check()

Error message:

    read_te_291E> Sequence difference between alignment and pdb :
    x (mismatch at alignment position 338)
    Alignment EFTEFRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRG
    PDB EFTEFINRFTRRASVCAEAYNPDRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDA
    Match ***** ** * * * *
    Alignment residue type 15 (R, ARG) does not match pdb residue type 8 (I, ILE),
    for align code 3TNP (atom file 3TNP), pdb residue number "104", chain "B"

    Please check your alignment file header to be sure you correctly specified the starting and ending residue numbers and chains. The alignment sequence must match that from the atom file exactly.

    Another possibility is that some residues in the atom file are missing, perhaps because they could not be resolved experimentally. (Note that Modeller reads only the ATOM and HETATM records in PDB, NOT the SEQRES records.)
    In this case, simply replace the section of your alignment corresponding to these missing residues with gaps.

    read_te_288W> Protein not accepted: 2 3TNP
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 208, in check
        self.check_structure_structure(io=io)
      File "/usr/lib64/python2.7/site-packages/modeller/alignment.py", line 217, in check_structure_structure
        return f(self.modpt, io.modpt, self.env.libs.modpt, eqvdst)
    _modeller.SequenceMismatchError: read_te_291E> Sequence difference between alignment and pdb :

hybrid-msa.ali (line breaks introduced for readability, formatting to highlight error location):

>P1;target
sequence:target: FIRST:@ : END:::: :
MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQENERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPM
RSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEEEDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMS
QVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALW
GLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMK
RKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALF
GTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHY
AMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQI
VLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEM
AAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEA
PFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF/MSIEIPAGLTELLQGFTVEVLRHQPADLLEFALQHFTRLQQE
NERKGAARFGHEGRTWGDAGAAAGGGIPSKGVNFAEEPMRSDSENGEEEEAAEAGAFNAPVINRFTRRASVCAEAYNPDEE
EDDAESRIIHPKTDDQRNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCD
GVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLK
VVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKRKGKSEVEENGAVEIARCFRGQYFGELALVTNKPRAASAHAI
GTVKCLAMDVQAFERLLGPCMEIMKRNIATYEEQLVALFGTNMDIVEPTA/GNAAAAKKGSEQESVKEFLAKAKEDFLKKW
ETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFK DNSNLYMVMEYVAGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVK GRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLL QVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF* >P1;3TNP structureX:3TNP: FIRST:@: END::::: -------------SVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVK LKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDL IYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQP IQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDT SNFDDYEEEEIRV.INEKCGKEFTEF/------------------------------------------------------ --------------------------------------------------------------------------------- -----RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKEGEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDN RGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVKNNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYN DGEQIIAQGDLADSFFIVESGEVKITMKV-------------EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDV QAFERLLGPCMEIMKRN-----------------------/-------------SVKEFLAKAKEDFLKKWETPSQNTAQL DQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVME YVAGGEMFSHLRRIGRF.EPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTW.LCGTP EYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGN LKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRV.INEKCGKEFTEF/------------- --------------------------------------------------------------------------------- ---------------------------------------------RNRLQEACKDILLFKNLDPEQMSQVLDAMFEKLVKE GEHVIDQGDDGDNFYVIDRGTFDIYVKCDGVGRCVGNYDNRGSFGELALMYNTPKAATITATSPGALWGLDRVTFRRIIVK NNAKKRKMYESFIESLPFLKSLEVSERLKVVDVIGTKVYNDGEQIIAQGDLADSFFIVESGEVKITMKV------------EIARCFRGQYFGELALVTNKPRAASAHAIGTVKCLAMDVQAFERLLGPCMEIMKRN-----------------------* >P1;3J4Q structureN:3J4Q:FIRST:@:END::::: 
--------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------/--------GLTE LLQGYTVEVLRQQPPDLVDFAVEYFTRL----------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ----------------------------------------------/--------GLTELLQGYTVEVLRQQPPDLVDFAV EYFTRL--------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ------------------------/-------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --/------------------------------------------------------------------------------ --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- --------------------------------------------------------------------------------- ----*
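The error message above compares the alignment sequence, with gaps removed, against the residues read from the atom file. A quick way to locate such a disagreement outside Modeller is to strip the gap, chain-break and terminator characters from the alignment entry and find the first position where it differs from a residue string taken from the PDB file. The following is a minimal sketch in plain Python under those assumptions; the function names are hypothetical, the characters are compared literally (so a `.` in the alignment only matches a `.` in the other string), and obtaining the one-letter sequence from the atom file is left to the reader.

```python
def strip_alignment(seq):
    """Remove gaps ('-'), chain breaks ('/'), the terminator ('*') and whitespace."""
    return "".join(ch for ch in seq if ch not in "-/*" and not ch.isspace())


def first_mismatch(alignment_seq, pdb_seq):
    """Return (index, alignment_residue, pdb_residue) for the first difference.

    The index is 0-based and counts residues in the ungapped alignment
    sequence, not alignment columns. Returns None if the sequences agree.
    If one sequence is a prefix of the other, the missing residue is None.
    """
    ungapped = strip_alignment(alignment_seq)
    for i, (a, p) in enumerate(zip(ungapped, pdb_seq)):
        if a != p:
            return i, a, p
    if len(ungapped) != len(pdb_seq):
        i = min(len(ungapped), len(pdb_seq))
        a = ungapped[i] if i < len(ungapped) else None
        p = pdb_seq[i] if i < len(pdb_seq) else None
        return i, a, p
    return None


if __name__ == "__main__":
    # Fragments copied from the error message in the post; the gaps and the
    # chain break in the first string are illustrative only.
    alignment_fragment = "EFTEF/-----RNRLQ"
    pdb_fragment = "EFTEFINRFT"
    print(first_mismatch(alignment_fragment, pdb_fragment))
```

Running the check per template entry, before calling aln.check(), narrows down which stretch of the alignment needs gaps added or removed.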
