Re: [modeller_usage] Error running build_profile.py: Not enough bins in histogram
> Dear Modeller users,
>
> I am trying to search for the best template for my target molecule, and
> therefore I created a file in PIR format with my sequences and then
> I ran build_profile.py. I got the following error in my log file:
>
>
> MODELLER 9.10, 2011/09/28, r8346
>
> PROTEIN STRUCTURE MODELLING BY SATISFACTION OF SPATIAL RESTRAINTS
>
>
> Copyright(c) 1989-2011 Andrej Sali
> All Rights Reserved
>
> Written by A. Sali
> with help from
> B. Webb, M.S. Madhusudhan, M-Y. Shen, M.A. Marti-Renom,
> N. Eswar, F. Alber, M. Topf, B. Oliva, A. Fiser,
> R. Sanchez, B. Yerkovich, A. Badretdinov,
> F. Melo, J.P. Overington, E. Feyfant
> University of California, San Francisco, USA
> Rockefeller University, New York, USA
> Harvard University, Cambridge, USA
> Imperial Cancer Research Fund, London, UK
> Birkbeck College, University of London, London, UK
>
>
> Kind, OS, HostName, Kernel, Processor: 4, Linux pan 3.0.0-12-generic x86_64
> Date and time of compilation : 2011/09/28 17:58:44
> MODELLER executable type : x86_64-intel8
> Job starting time (YY/MM/DD HH:MM:SS): 2012/02/29 16:37:57
>
> openf___224_> Open $(LIB)/restyp.lib
> openf___224_> Open ${MODINSTALL9v10}/modlib/resgrp.lib
> rdresgr_266_> Number of residue groups: 2
> openf___224_> Open ${MODINSTALL9v10}/modlib/sstruc.lib
>
> Dynamically allocated memory at amaxlibraries [B,KiB,MiB]: 3234076 3158.277 3.084
>
> Dynamically allocated memory at amaxlibraries [B,KiB,MiB]: 3234604 3158.793 3.085
> openf___224_> Open ${MODINSTALL9v10}/modlib/resdih.lib
>
> Dynamically allocated memory at amaxlibraries [B,KiB,MiB]: 3283204 3206.254 3.131
> rdrdih__263_> Number of dihedral angle types : 9
> Maximal number of dihedral angle optima: 3
> Dihedral angle names : Alph Phi Psi Omeg chi1 chi2 chi3 chi4 chi5
> openf___224_> Open ${MODINSTALL9v10}/modlib/radii.lib
>
> Dynamically allocated memory at amaxlibraries [B,KiB,MiB]: 3292444 3215.277 3.140
> openf___224_> Open ${MODINSTALL9v10}/modlib/radii14.lib
> openf___224_> Open ${MODINSTALL9v10}/modlib/af_mnchdef.lib
> rdwilmo_274_> Mainchain residue conformation classes: APBLE
> openf___224_> Open ${MODINSTALL9v10}/modlib/mnch.lib
> rdclass_257_> Number of classes: 5
> openf___224_> Open ${MODINSTALL9v10}/modlib/mnch1.lib
> openf___224_> Open ${MODINSTALL9v10}/modlib/mnch2.lib
> openf___224_> Open ${MODINSTALL9v10}/modlib/mnch3.lib
> openf___224_> Open ${MODINSTALL9v10}/modlib/xs4.mat
> rdrrwgh_268_> Number of residue types: 21
> openf___224_> Open pdb.pir
>
> Dynamically allocated memory at amaxsequence_db [B,KiB,MiB]: 3293057 3215.876 3.141
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3294891 3217.667 3.142
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3296341 3219.083 3.144
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3299241 3221.915 3.146
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3305041 3227.579 3.152
>
> Dynamically allocated memory at amaxsequence_db [B,KiB,MiB]: 3310040 3232.461 3.157
>
> SEQ_DATABASE_FILE : pdb.pir
> SEQ_DATABASE_FORMAT : PIR
> CHAINS_LIST : ALL
> CLEAN_SEQUENCES : T
> MINMAX_DB_SEQ_LEN : 30 4000
> Number of sequences : 4
> Number of residues : 1060
> Length of longest sequence: 271
>
>
> SEQ_DATABASE_FILE : pdb_3.bin
> SEQ_DATABASE_FORMAT : BINARY
> SEARCH_CHAINS_LIST : ALL
> Number of sequences : 4
> Number of residues : 1060
> Length of longest sequence: 271
>
>
> SEQ_DATABASE_FILE : pdb_3.bin
> SEQ_DATABASE_FORMAT : BINARY
> CHAINS_LIST : ALL
> CLEAN_SEQUENCES : T
> MINMAX_DB_SEQ_LEN : 0 999999
> Number of sequences : 4
> Number of residues : 1060
> Length of longest sequence: 271
>
> openf___224_> Open chst11.ali
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3300046 3222.701 3.147
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3301496 3224.117 3.149
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3304396 3226.949 3.151
>
> Dynamically allocated memory at amaxalignment [B,KiB,MiB]: 3310196 3232.613 3.157
>
> Dynamically allocated memory at amaxsequence [B,KiB,MiB]: 3311600 3233.984 3.158
>
> Read the alignment from file : chst11.ali
> Total number of alignment positions: 352
>
> # Code #_Res #_Segm PDB_code Name
> -------------------------------------------------------------------------------
> 1 chst11 352 1 chst11
>
> Dynamically allocated memory at amaxprofile [B,KiB,MiB]: 3313325 3235.669 3.160
> openf___224_> Open ${LIB}/blosum62.sim.mat
> rdrrwgh_268_> Number of residue types: 21
> profile_iteration_> processing sequence: 1 352 1 0.0100000 0.0100000 0.0100000 1
> regress_657E> Not enough bins in histogram - cannot calculate statistics, nbins: 1
>
> My build_profile.py file is the following:
>
> from modeller import *
>
> log.verbose()
> env = environ()
>
> #-- Prepare the input files
>
> #-- Read in the sequence database
> sdb = sequence_db(env)
> sdb.read(seq_database_file='pdb.pir', seq_database_format='PIR',
>          chains_list='ALL', minmax_db_seq_len=(30, 4000),
>          clean_sequences=True)
>
> #-- Write the sequence database in binary form
> sdb.write(seq_database_file='pdb_3.bin', seq_database_format='BINARY',
>           chains_list='ALL')
>
> #-- Now, read in the binary database
> sdb.read(seq_database_file='pdb_3.bin', seq_database_format='BINARY',
>          chains_list='ALL')
>
> #-- Read in the target sequence/alignment
> aln = alignment(env)
> aln.append(file='chst11.ali', alignment_format='PIR', align_codes='ALL')
>
> #-- Convert the input sequence/alignment into
> #   profile format
> prf = aln.to_profile()
>
> #-- Scan sequence database to pick up homologous sequences
> prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
>           gap_penalties_1d=(-500, -50), n_prof_iterations=10,
>           check_profile=True, max_aln_evalue=0.01)
>
> #-- Write out the profile in text format
> prf.write(file='build_profile.prf', profile_format='TEXT')
>
> #-- Convert the profile back to alignment format
> aln = prf.to_alignment()
>
> #-- Write out the alignment file
> aln.write(file='build_profile.ali', alignment_format='PIR')
>
> If anyone could help me with this I would be grateful.
>
> Many thanks in advance,
>
> Susana
On 2/29/12 10:01 AM, Susana Tomasio wrote:
> I am trying to search for the best template for my target molecule,
> and therefore I created a file in PIR format with my sequences and then
> I ran build_profile.py. I got the following error in my log file:
build_profile is designed to search through a sequence database (e.g. all sequences from PDB or UniProt) to find matches for a single query sequence. You've built a database of only 4 sequences. This is not nearly large enough for build_profile to be able to determine accurate statistics. What are you trying to do? If you want to determine which of the 4 sequences is the best template, one possible solution is simply to build models for each of the 4 templates and then assess them.
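A minimal sketch of the "build one model per template, then assess" approach, assuming the Modeller 9.x automodel interface (lowercase class names, as in the script above), one PIR alignment file per template containing the target plus that single template, and the template PDB files in the current directory. DOPE is used here as the assessment score purely as an example (lower is generally better); it is one criterion among several, and sequence identity, coverage and visual inspection also matter. The function names build_and_assess and rank_templates are hypothetical helpers, not part of Modeller.

```python
# Hypothetical helper script: build one model per candidate template with
# Modeller's automodel class, then rank the templates by DOPE score.
import shutil


def build_and_assess(alnfile, template_code, target_code):
    """Build a single model of target_code from template_code and return
    its DOPE score (lower is better). Requires Modeller to be installed."""
    # Imported here so rank_templates() below can be used without Modeller.
    from modeller import environ, log
    from modeller.automodel import automodel, assess

    log.none()
    env = environ()
    env.io.atom_files_directory = ['.']  # assumed location of template PDBs

    a = automodel(env, alnfile=alnfile, knowns=template_code,
                  sequence=target_code, assess_methods=(assess.DOPE,))
    a.starting_model = 1
    a.ending_model = 1
    a.make()

    result = a.outputs[0]
    if result['failure'] is not None:
        raise RuntimeError('model building failed for %s' % template_code)
    # The model file name depends only on the target, so each run would
    # overwrite the previous one; keep a per-template copy.
    shutil.copy(result['name'], '%s-%s.pdb' % (target_code, template_code))
    return result['DOPE score']


def rank_templates(scores):
    """Return (template, score) pairs sorted from best (lowest DOPE) to worst."""
    return sorted(scores.items(), key=lambda item: item[1])
```

To use it, call build_and_assess once per template, for example with hypothetical alignment files such as chst11-templ1.ali through chst11-templ4.ali. Collect the returned scores in a dict keyed by template code, then pass that dict to rank_templates. Building a single model per template keeps the run short. Generating several models per template (a larger ending_model) and comparing the best of each would give a less noisy comparison.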
Ben Webb, Modeller Caretaker
Thank you very much for your reply. Yes, I am trying to determine which of the 4 sequences is the best template. I will build a model with each of the 4 templates, then.
Thank you.
Best wishes, Susana
On Wed, Feb 29, 2012 at 6:31 PM, Modeller Caretaker <modeller-care@salilab.org> wrote:

> On 2/29/12 10:01 AM, Susana Tomasio wrote:
>
>> I am trying to search for the best template for my target molecule,
>> and therefore I created a file in PIR format with my sequences and then
>> I ran build_profile.py. I got the following error in my log file:
>>
>
> build_profile is designed to search through a sequence database (e.g. all
> sequences from PDB or UniProt) to find matches for a single query sequence.
> You've built a database of only 4 sequences. This is not nearly large
> enough for build_profile to be able to determine accurate statistics. What
> are you trying to do? If you want to determine which of the 4 sequences is
> the best template, one possible solution is simply to build models for each
> of the 4 templates and then assess them.
>
> Ben Webb, Modeller Caretaker
> --
> modeller-care@salilab.org http://www.salilab.org/modeller/
> Modeller mail list: http://salilab.org/mailman/listinfo/modeller_usage