modeller_usage June 2006

modeller_usage@salilab.org

16 participants
22 discussions

Query
by Madhan Kumar A 30 Jun '06


Sir(s), I had written to you yesterday, reporting an error message. As you had suggested, I have attached the alignment file, PDB file, script file and the log file. Kindly let me know what the problem is. Thanking you in advance. Madhan Kumar


align_2d vs salign
by Mike White 30 Jun '06


Hi there,

I am modeling a homopentamer using three structures of homologous homopentamers as the templates. The monomers are 210 aa each. Following the advanced tutorial, I first aligned the three known structures using salign with the three cycles of refinement, and got a good alignment. In the tutorial, the next step was to align the test sequence with the aligned templates using align_2d. After approximately 3900 seconds this produced what looks like a nice alignment, with structural features such as cys involved in disulfides and conserved aromatic residues lining up between all four sequences (three templates, one test).

I then did the alignment using salign instead of align_2d, following the description in the online manual for salign (examples/salign.saling_align_2d). Once again, a nice alignment is obtained (slightly different in the regions without much homology, but with the other features retained). The surprising thing is that this alignment took only 17 seconds, a roughly 200-fold improvement in time. Both alignments gave acceptable-looking models.

Are the two algorithms so different that salign is that much faster? Is there any advantage to using align_2d instead of salign?

Thanks
Mike White

Here are the two scripts:

**************************************************************************
ALIGN-2D:

# Demonstrating ALIGN2D, aligning with variable gap penalty
# uses a version of the 5HT3ARXD with first 12 residues removed (no overlap with templates)
# default gap_penalties_2d: (3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0., 0.)
# default gap_penalties_1d: (-450, -50)

from modeller import *

log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')

# Read aligned structure(s):
aln = alignment(env)
aln.append(file='achbpsPentamer.ali', align_codes='all')
aln_block = len(aln)

# Read aligned sequence(s):
aln.append(file='5HT3ARXDPentamerTrim.ali', align_codes='5HT3ARXDPentamerTrim')

# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.align2d(overhang=0, gap_penalties_1d=(-450, -50),
            gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0., 0.),
            align_block=aln_block)

# write files. In *.pap files indicate helices and beta structures
aln.write(file='5HT3ARXDPentamerTrim-mult.ali', alignment_format='PIR')
aln.write(file='5HT3ARXDPentamerTrim-mult.pap', alignment_format='PAP',
          alignment_features=' INDICES HELIX BETA')

****************************************************************************
SALIGN:

# Demonstrating ALIGN2D using salign commands, aligning with variable gap penalty
# uses a version of the 5HT3ARXD with first 12 residues removed (no overlap with templates)
# default gap_penalties_2d: (3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0., 0.)
# default gap_penalties_1d: (-450, -50)

from modeller import *

log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')

# Read aligned structure(s):
aln = alignment(env)
aln.append(file='achbpsPentamer.ali', align_codes='all')
aln_block = len(aln)

# Read aligned sequence(s):
aln.append(file='5HT3ARXDPentamerTrim.ali', align_codes='5HT3ARXDPentamerTrim')

# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign(rr_file='$(LIB)/as1.sim.mat',
           output='', max_gap_length=20, gap_function=True,
           feature_weights=(1., 0., 0., 0., 0., 0.),
           gap_penalties_1d=(-450, -50),
           gap_penalties_2d=(3.5, 3.5, 3.5, 0.2, 4.0, 6.5, 2.0, 0., 0.),
           similarity_flag=True, align_block=aln_block)

# write files. In *.pap files indicate helices and beta structures
aln.write(file='5HT3ARXDPentamerTrim-multSalign.ali', alignment_format='PIR')
aln.write(file='5HT3ARXDPentamerTrim-multSalign.pap', alignment_format='PAP',
          alignment_features=' INDICES HELIX BETA')

*****************************************************************************
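To put a number on how much two such alignments differ, one option is to compare which template residue each target residue is paired with in the two PIR files. The following is a minimal sketch in plain Python and is not part of Modeller. The function names (parse_pir, aligned_pairs, agreement), the codes "tmpl" and "targ", and the toy sequences are all made up for illustration, and the parser assumes simple PIR entries with exactly one description line after each ">P1;" header.

```python
def parse_pir(text):
    """Return {code: aligned sequence} for each '>P1;code' entry in PIR text."""
    entries = {}
    code = None
    skip_description = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">P1;"):
            code = line[4:]
            entries[code] = ""
            skip_description = True      # the line after the header is the description
        elif code is not None:
            if skip_description:
                skip_description = False
            else:
                entries[code] += line
    # A '*' terminates each sequence; drop it and anything after it.
    return {c: s.split("*")[0] for c, s in entries.items()}


def aligned_pairs(target, template):
    """Set of (target_index, template_index) for columns where both have a residue."""
    pairs = set()
    i = j = 0
    for a, b in zip(target, template):
        a_is_res = a not in "-/"         # '-' is a gap, '/' a chain break
        b_is_res = b not in "-/"
        if a_is_res and b_is_res:
            pairs.add((i, j))
        i += a_is_res
        j += b_is_res
    return pairs


def agreement(aln1, aln2, target, template):
    """Fraction of residue pairings shared by two alignments (1.0 = identical)."""
    p1 = aligned_pairs(aln1[target], aln1[template])
    p2 = aligned_pairs(aln2[target], aln2[template])
    return len(p1 & p2) / max(len(p1 | p2), 1)


# Toy data (hypothetical): the same two sequences aligned in two slightly different ways.
ALN_A = """
>P1;tmpl
structureX:tmpl
ACDEFG*
>P1;targ
sequence:targ
AC-EFG*
"""
ALN_B = """
>P1;tmpl
structureX:tmpl
ACDEFG*
>P1;targ
sequence:targ
A-CEFG*
"""

print(agreement(parse_pir(ALN_A), parse_pir(ALN_B), "targ", "tmpl"))
```

For the files in this post, the two inputs would be the contents of 5HT3ARXDPentamerTrim-mult.ali and 5HT3ARXDPentamerTrim-multSalign.ali, with the target code 5HT3ARXDPentamerTrim and each template code in turn.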


How does Modeller avoid CA clashes?
by Yun He 30 Jun '06


When the distance between two adjacent CA atoms (i.e. CA_i and CA_i+1) is shorter than 3.6 Angstrom, the two residues may be involved in a clash. Such geometry is severely unrealistic, although cis-proline is usually an exception. When we use Modeller to build a model, the default modelling routine (the automodel class) seems to check only for improper chain breaks in the templates (a distance between two adjacent CA atoms larger than 8 Angstrom), and not for CA clashes in the generated models. The models sometimes do have CA clashes. How can Modeller avoid such improper geometry? Should some distance restraints be added by hand?

A typical example: T0345, a recently released CASP7 target. Two templates with very high similarity (id% > 65%) were found, 2F8A and 1GP1. Here is my alignment file of 1GP1_A and T0345:

--------------
>P1;1GP1A
structure:1gp1:10:A:192:A:GLUTATHIONE PEROXIDASE:NA:2.00:0.171
---RTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASL-GTTVRDYTQMNDLQRRLGPRGLVV
LGFPCNQFGHQENAKNEEILNCLKYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTP
SDDATALMTDPKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLL-*
>P1;T0345
sequence:T0345:1: :185: :T0345: :2.00:-1.00
MIAKSFYDLSAINL-DGEKVDFNTFRGRAVLIENVASLCGTTTRDFTQLNELQCRF-PRRLVV
LGFPCNQFGHQENCQNEEILNSLKYVRPGGGYQPTFTLVQKCEVNGQNEHPVFAYLKDKLPYP
YDDPFSLMTDPKLIIWSPVRRSDVAWNFEKFLIGPEGEPFRRYSRTFPTINIEPDIKRLLK*
---------------

There are two cis-prolines (PRO:97, PRO:150) in each chain of 1GP1, and correspondingly there are two prolines at the same sites in T0345 (PRO:89, PRO:142). The residues around these prolines are highly conserved, so I think the prolines in T0345 should also be cis-prolines. After building models from this alignment, I checked the models for CA clashes and found two:

--
pair                        distance
 88 ARG: ===  89 PRO:       2.798 A
141 SER: === 142 PRO:       2.803 A
--

However, there are NO CA clashes in 1GP1. How do the two clashes arise? Is there some way to avoid them?

Best regards,
Yun
2006/06/30
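The check described in this post (flag adjacent CA atoms closer than 3.6 Angstrom) can be sketched in plain Python, independent of Modeller. The function names and the three demo records below are hypothetical; the coordinates are invented and are not taken from any model of T0345. The sketch assumes standard fixed-column PDB ATOM records, ignores alternate locations, and treats residues as adjacent only when they share a chain and have consecutive residue numbers. Because a cis peptide bond places consecutive CA atoms roughly 2.9 Angstrom apart (versus about 3.8 for trans), each hit also reports the second residue's name so that pairs ending in PRO can be reviewed separately.

```python
import math


def ca_atoms(pdb_lines):
    """Yield (chain, resseq, resname, (x, y, z)) for each CA atom in ATOM records."""
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[21], int(line[22:26]), line[17:20], xyz


def short_ca_pairs(pdb_lines, cutoff=3.6):
    """List (resseq_i, resname_i, resseq_j, resname_j, distance) for close CA pairs."""
    hits = []
    prev = None
    for atom in ca_atoms(pdb_lines):
        same_chain = prev is not None and prev[0] == atom[0]
        if same_chain and atom[1] == prev[1] + 1:
            dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(prev[3], atom[3])))
            if dist < cutoff:
                hits.append((prev[1], prev[2], atom[1], atom[2], round(dist, 3)))
        prev = atom
    return hits


def demo_ca_line(serial, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record for a CA atom (demo helper only)."""
    return "ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, resname, chain, resseq, x, y, z)


# Invented coordinates: 88-89 are 2.8 A apart, 89-90 are 3.8 A apart.
DEMO = [
    demo_ca_line(1, "ARG", "A", 88, 0.0, 0.0, 0.0),
    demo_ca_line(2, "PRO", "A", 89, 2.8, 0.0, 0.0),
    demo_ca_line(3, "GLY", "A", 90, 6.6, 0.0, 0.0),
]

for hit in short_ca_pairs(DEMO):
    print(hit)
```

On a real model, the input would simply be the lines of the PDB file, for example `short_ca_pairs(open(path))` with `path` pointing at the model file.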


error message
by Madhan Kumar A 27 Jun '06


Sir(s),

While trying to model my protein using Modeller 8v2, I encounter the following error message in the log file:

* Protein specified in ALIGN_CODES(i) was not found in the alignment file; ALIGN_CODES( 2) = ted *

I have checked the ALI and ATM files. Everything seems to be all right. Please let me know what the problem could be. Thanking you in advance.

Madhan Kumar

*My alignment file:*

>P1;1ABC
structureX:1ABC:2 :O:231 :O::::
GGSDGLQDVTIMNQDQEQIIFAFPPVLGYGLMYQNLSSRLPS-YKLCAFDFIEE-EDRLDRYADLIQKLQPEGPLTLFGYSAGCSLAFEAAKKLEGQGRIVQRIIMVDSYKKQGVSDLDGRTVESDVEALMNVNRDNEALNSEAVKHGLKQKTHAFYSYYVNLISTGQVKADIDLLTSGADFDIPEWLASWEEATTGAYRMKRGFGTHAEMLQGETLDRNAGILLEFLNTQT*
>P1;ted
sequence:ted: :: :::::
------------------PVFVFHPAGGSTVVYEPLLGRLPADTPMYGFERVEGSIEERAQQYVPKLIEMQGDGPYVLVGWSLGGVLAYACAIGLRRLGKDVRFVGLIDAVRAG-----------------------------------------------------------------------------------------------------------------------*

*My script file:*

# Homology modelling by the automodel class
from modeller.automodel import *    # Load the automodel class

log.verbose()    # request verbose output
env = environ()  # create a new MODELLER environment to build this model in

# directories for input atom files
env.io.atom_files_directory = ''

a = automodel(env,
              alnfile  = 'align_ted.ali',   # alignment filename
              knowns   = '1ABC',            # codes of the templates
              sequence = 'ted')             # code of the target
a.starting_model= 1                 # index of the first model
a.ending_model  = 1                 # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do the actual homology modelling

*Log file:*

openf5__224_> Open 11 OLD SEQUENTIAL $(LIB)/restyp.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/resdih.lib
rdrdih__263_> Number of dihedral angle types : 9
              Maximal number of dihedral angle optima: 3
              Dihedral angle names : Alph Phi Psi Omeg chi1 chi2 chi3 chi4 chi5
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/radii.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/radii14.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/af_mnchdef.lib
rdwilmo_274_> Mainchain residue conformation classes: APBLE
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/mnch.lib
rdclass_257_> Number of classes: 5
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/mnch1.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/mnch2.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/mnch3.lib
openf5__224_> Open 11 OLD SEQUENTIAL ${MODINSTALL8v2}/modlib/xs4.mat
rdrrwgh_268_> Number of residue types: 21
runcmd______> alignment.append(align_codes=['1ABC', 'ted'], atom_files=[],
              file='align_ted.ali', (def)remove_gaps=True,
              (def)alignment_format='PIR', add_sequence=True,
              (def)rewind_file=False, (def)close_file=True)
openf___224_> Open align_ted.ali
Dynamically allocated memory at amaxalignment [B,kB,MB]: 2124923 2075.120 2.026
Dynamically allocated memory at amaxalignment [B,kB,MB]: 2126623 2076.780 2.028
Dynamically allocated memory at amaxalignment [B,kB,MB]: 2130023 2080.101 2.031
Dynamically allocated memory at amaxalignment [B,kB,MB]: 2136823 2086.741 2.038
Dynamically allocated memory at amaxalignment [B,kB,MB]: 2147161 2096.837 2.048
read_al_373E> Protein specified in ALIGN_CODES(i) was not found in the alignment file; ALIGN_CODES( 2) = ted
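When a code is reported as missing from an alignment file, one quick sanity check outside Modeller is to list the codes the file actually declares, exactly as written, and compare them with the ones the script asks for. The sketch below is plain Python with hypothetical helper names (pir_codes, missing_codes); the shortened sequences and the trailing space after "ted" in the demo are invented to show what a strict comparison would catch. It does not reproduce Modeller's own parsing rules and is not a diagnosis of the file in this post.

```python
def pir_codes(text):
    """Return the codes on '>P1;' header lines, exactly as written in the file."""
    return [line[4:] for line in text.splitlines() if line.startswith(">P1;")]


def missing_codes(text, wanted):
    """Return the wanted codes that do not appear verbatim among the headers."""
    found = set(pir_codes(text))
    return [code for code in wanted if code not in found]


# Hypothetical file contents; note the invented trailing space after 'ted'.
EXAMPLE = "\n".join([
    ">P1;1ABC",
    "structureX:1ABC:2 :O:231 :O::::",
    "GGSDGL*",
    ">P1;ted ",
    "sequence:ted: :: :::::",
    "--PVFV*",
])

# repr() makes stray whitespace or invisible characters visible.
print([repr(code) for code in pir_codes(EXAMPLE)])
print(missing_codes(EXAMPLE, ["1ABC", "ted"]))
```

Run against the real file, the first argument would be the contents of align_ted.ali and the second the codes passed as knowns and sequence in the script.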


question about multiple alignment tutorial
by Mike White 20 Jun '06


I have a question about the example script salign.py in the Advanced tutorial for version 8v2. I understand the logic of the script, where salign is called three successive times, incorporating more information each time in the form of the "weights" parameter, and one sees an increase in alignment quality with each cycle. However, I am unclear about the purpose of the last call to salign after the alignment files are written. The tutorial text says that the last call is to generate a quality score, but after examining the log file, the quality of the alignment after this call is worse than after the first three calls. I noticed that the rms_cutoff for the first three calls is 3.5, while that for the final call is 1.0 (presumably more stringent), and that the alignment type is "progressive" rather than "tree." Since the alignment files are already written, and this is what is used in the alignment with the sequence to be modeled, why is this last step run?

As usual, thanks for all of your help
Mike White


Restraints to apply for nucleic acid
by Charlotte Habegger-Polomat 20 Jun '06


Hello! Modeller has been quite successful at modelling my protein of interest and I am very happy with the results. I am now using Modeller to model my protein in complex with a nucleic acid (tRNA), as there is a homologous complex of known structure. The family of models I obtained is quite good, but the nucleic acid seems a little out of shape. It looks like some of the base pairing and/or stacking has been lost. I have been thinking of adding restraints to the modelling process. Do you have any ideas on the best type of restraints to use (H-bonds, distance, etc.)? Thanks! -Charlotte Habegger-Polomat, Quebec


Using template's ligand as model's ligand
by Frederico Arnoldi 19 Jun '06


Dear Modeller users, Could I superimpose the template and the best model built from this template, and use the template's ligand as the model's ligand? What are the theoretical problems with doing this if I used Modeller to build the model? How precise is it? Thanks for the help. Fred UNESP - Brazil


homomultimers and symmetry
by Mike White 19 Jun '06


Hi there, I am new to hands-on experience with Modeller. I would like to create a homology model of the protein that I am interested in as a homopentamer, using known homopentamer X-ray structures of related proteins. I have run across several questions in the archives about imposing symmetry constraints in the modeling process. Is it necessary to impose such constraints if the template is already a homopentamer with 5-fold symmetry? Doesn't this by itself impose symmetry constraints? Or, is it possible that an energy-minimized structure based on a symmetric template may not retain all of the symmetry of the template and the symmetry restraints make sure that the original symmetry is forced to be retained? Finally, since model.symmetry.define() deals with pairs, does one enforce pentameric symmetry in the complex by forcing AB, BC, CD, DE, and EA symmetry, where A-E are the five individual subunits in the homopentamer? Thanks Mike White
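The pairing scheme described in this post (AB, BC, CD, DE, EA) can be written down generically. The sketch below only enumerates chain pairs in plain Python and makes no Modeller calls; ring_pairs and open_pairs are hypothetical helper names. Each pair would then be handed to whatever symmetry-restraint call the installed Modeller version provides, which is deliberately not shown here.

```python
def ring_pairs(chains):
    """Pair each chain with the next one, wrapping around from last to first."""
    chains = list(chains)
    n = len(chains)
    return [(chains[i], chains[(i + 1) % n]) for i in range(n)]


def open_pairs(chains):
    """Pair each chain with the next one, without the closing last-to-first pair."""
    chains = list(chains)
    return list(zip(chains, chains[1:]))


print(ring_pairs("ABCDE"))   # the five pairs listed in the post
print(open_pairs("ABCDE"))   # the same chain of pairs without E-A
```

The second helper is included only to make the difference explicit: whether the closing E-A pair is needed in addition to the other four is part of the question being asked.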


Questions on loop modelling
by hori koshii 16 Jun '06


My target sequence has two insertions in two loop regions (14 aa and 9 aa) compared with my template structure. I have three questions:

1. Since 12 aa is the limit of loop modelling, is it viable to build a "model" from the template sequence plus 2 aa of the target insertion, and then use this "template + 2 aa" model as the template for building the model of the target?
2. The residues flanking the insertions also belong to the loop regions (of the template). When we do loop modelling, do we need to model the whole loop for accuracy?
3. When generating loop models, do you model one loop and then the other? Since I have two loop regions, I wonder whether I should build (~500) models for one region and then another set for the other region. Does one loop region actually affect the conformation of the other loop? Do you think it's necessary to build them both at the same time?

Thank you very much for answering the questions. Have a great day.

Sincerely,
Jasmine


alignment using multiple templates
by Mike White 15 Jun '06


Hi there,

I am working my way through the tutorials, playing with the Python scripts to get a better understanding of how things work. I have successfully done a multiple structural alignment using a modified version of the salign.py script in the advanced tutorial, aligning multiple sequences for a protein from several different species using the pdb files as input. The protein is a homopentamer, and using the example script, I aligned the "A" chains and everything worked fine. Since I really want to work on modeling a homopentamer, I then tried to do the alignment on the pentamers. I receive the following error:

*************************************************************************
linux:/home/mike/Modeller_files/Modelling_projects/Test_run/Mult_align # mod8v2 salignPentamer.py
Traceback (most recent call last):
  File "salignPentamer.py", line 11, in ?
    mdl = model(env, file=code, model_segment=('FIRST:@, LAST:'))
  File "/usr/bin/modeller8v2/modlib/modeller/model.py", line 23, in __init__
    self.read(io, aln, libs, **vars)
  File "/usr/bin/modeller8v2/modlib/modeller/model.py", line 40, in read
    io=io.modpt, libs=libs.modpt, **vars)
  File "/usr/bin/modeller8v2/modlib/modeller/util/top.py", line 189, in read_model
    return _modeller.read_model(mdl, aln, io, libs, *args)
ValueError: Expected a sequence for model_segment
*************************************************************************

I assume that the problem is in reading the pdb files and getting the information. The pdb files do contain the chain IDs (A-E). Here is the Python script I used, which is essentially a slightly modified version of the example script:

*********************************************************************
from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:../atom_files/'

aln = alignment(env)
for (code, chain) in (('1i9b', 'A'), ('2br7', 'A'), ('2bj0', 'A')):
    mdl = model(env, file=code, model_segment=('FIRST:@, LAST:'))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

for (weights, write_fit, whole) in (((1., 0., 0., 0., 1., 0.), False, True),
                                    ((1., 0.5, 1., 1., 1., 0.), False, True),
                                    ((1., 1., 1., 1., 1., 0.), True, False)):
    aln.salign(rms_cutoffs=(3.5, 6., 60, 60, 15, 60, 60, 60, 60, 60, 60),
               normalize_pp_scores=False,
               rr_file='$(LIB)/as1.sim.mat', overhang=30,
               gap_penalties_1d=(-450, -50),
               gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
               dendrogram_file='fm00495.tree',
               alignment_type='tree',   # If 'progressive', the tree is not
                                        # computed and all structures will be
                                        # aligned sequentially to the first
               feature_weights=weights, # For a multiple sequence alignment only
                                        # the first feature needs to be non-zero
               improve_alignment=True, fit=True, write_fit=write_fit,
               write_whole_pdb=whole, output='ALIGNMENT QUALITY')

aln.write(file='achbpsPentamer.pap', alignment_format='PAP')
aln.write(file='achbpsPentamer.ali', alignment_format='PIR')

aln.salign(rms_cutoffs=(1.0, 6., 60, 60, 15, 60, 60, 60, 60, 60, 60),
           normalize_pp_scores=False,
           rr_file='$(LIB)/as1.sim.mat', overhang=30,
           gap_penalties_1d=(-450, -50),
           gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
           dendrogram_file='1is3A.tree',
           alignment_type='progressive', feature_weights=[0]*6,
           improve_alignment=False, fit=False, write_fit=True,
           write_whole_pdb=False, output='QUALITY')
**********************************************************************************

The only change that I made from the script that worked on the single subunits was to change the "model_segment" parameters from "('FIRST:'+chain, 'LAST:'+chain)" to "('FIRST:@, LAST:'))". I couldn't find anything in the online manual concerning the arguments on the line before (for (code,chain) in.....). Should I have replaced chain "A" with something else to read the entire set of chains?

Thanks for whatever help you can provide. I am sure that as I get deeper into things I'll have more questions, hopefully not so basic.

Mike White
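One Python detail is worth checking against the error text "Expected a sequence for model_segment". Parentheses around a single string literal do not create a tuple, so the model_segment value as posted is one string rather than a pair of strings. The snippet below shows this in plain Python with no Modeller calls. The two-element form is only assumed to be what was intended; whether it selects all five chains should be confirmed in the model.read entry of the manual.

```python
# Value exactly as posted: the quotes enclose the comma, so this is ONE string,
# and the surrounding parentheses are just grouping.
as_posted = ('FIRST:@, LAST:')

# Assumed intended form: two separate strings, which makes a 2-element tuple.
two_items = ('FIRST:@', 'LAST:')

print(type(as_posted).__name__)
print(type(two_items).__name__, len(two_items))
```

The earlier single-chain form, ('FIRST:'+chain, 'LAST:'+chain), has a comma outside the quotes, which is why it was a tuple and was accepted.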

