loops iterations and various DOPE scores
Folks,
I am having several difficulties with analysis of my loops.
I built a model and selected a region to refine as a loop.
I have successfully created 1000 loop files.
Based on the model_energies.py file from the examples, I am attempting to scan the 1000 loop model files. So far the script manages to read only about 250 of the set.
Here's the script. The file "loop.list" has the names of the 1000 files, one per line:
___________________________
from modeller.automodel import *    # Load the automodel class

log.verbose()    # request verbose output
env = environ()  # create a new MODELLER environment to build this model in
env.libs.topology.read(file='$(LIB)/top_heav.lib')  # read topology
env.libs.parameters.read(file='$(LIB)/par.lib')     # read parameters

# directories for input atom files
env.io.atom_files_directory = './:../atom_files'

def dope_profile(mdl, file):
    # DOPE energy parameters
    edat = energy_data(contact_shell=15.0, dynamic_modeller=True,
                       dynamic_lennard=False, dynamic_sphere=False,
                       excl_local=(False, False, False, False))
    # DOPE group restraints
    oldgprsr = mdl.group_restraints
    if not model.dope_restraints:
        model.dope_restraints = \
            group_restraints(classes='${LIB}/atmcls-mf.lib',
                             parameters='${LIB}/dist-mf.lib')
    mdl.group_restraints = model.dope_restraints
    molpdf = mdl.energy(output='SHORT', file=file,
                        edat=edat, residue_span_range=(1, 9999),
                        normalize_profile=True, smoothing_window=10)
    mdl.group_restraints = oldgprsr
    return molpdf

filename = 'loop.list'
in_file = open(filename, "r")
while 1:
    # read model file
    code = in_file.readline()
    mdl = model(env)
    mdl.read(file=code)
    aln = alignment(env)
    # generate topology
    aln.append_model(mdl, atom_files=code, align_codes=code)
    aln.append_model(mdl, atom_files=code, align_codes=code+'-ini')
    mdl.generate_topology(aln, sequence=code+'-ini')
    mdl.transfer_xyz(aln)
    if info.version_info == (8,0):
        dope_profile(mdl, code+'.profile')
    else:
        mdl.assess_dope(output='ENERGY_PROFILE NO_REPORT',
                        file=code+'.profile',
                        normalize_profile=True, smoothing_window=15)
_____________________________________
Here are the first five lines of an output profile:
__________________________________________________________
# Energy of each residue is written to: ABC31_HUMAN.BL00010001.pdb
.profile
# The profile IS normalized by the number of restraints.
# The profiles are smoothed over a window of residues: 13
# The sum of all numbers in the file: -15.0456
___________________________________________________
The actual profile file name looks like this: "ABC31_HUMAN.BL00010001.pdb?.profile"

1a) So, is there a limit to the ability of Modeller or Python to handle long file name strings?
1b) Why am I not able to digest all 1000 loop models?
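(An aside on that "?": a terminal typically displays a literal newline character that way, and Python's readline() keeps the trailing '\n' of each line, so code+'.profile' embeds one. A minimal sketch in plain Python, no Modeller required; the file name is just the one from the output above:)

```python
# readline() keeps the trailing newline, so a name built from it embeds '\n'
line = "ABC31_HUMAN.BL00010001.pdb\n"    # what readline() returns

profile_name = line + ".profile"          # embedded newline -> shown as '?'
fixed_name = line.strip() + ".profile"    # strip() removes the newline

print(repr(profile_name))  # 'ABC31_HUMAN.BL00010001.pdb\n.profile'
print(repr(fixed_name))    # 'ABC31_HUMAN.BL00010001.pdb.profile'
```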
2) In the log file that's produced I see a line like this for the first loop model:
DOPE score :-51270.746094
What is the relationship between the summed per-residue values from the profile output (i.e. -15.0456) and this DOPE score from the log file?
Thanks,
Starr
Starr Hazard wrote:
> I am having several difficulties with analysis of my loops.
>
> I built a model and selected a region to refine as a loop.
>
> I have successfully created 1000 loop files.
>
> Based on the model_energies.py file from the examples, I am attempting
> to scan the 1000 loop model files.
Sounds OK so far.
> def dope_profile(mdl, file):
>     # DOPE energy parameters
>     edat = energy_data(contact_shell=15.0, dynamic_modeller=True,
>                        dynamic_lennard=False, dynamic_sphere=False,
>                        excl_local=(False, False, False, False))
>     # DOPE group restraints
>     oldgprsr = mdl.group_restraints
>     if not model.dope_restraints:
>         model.dope_restraints = \
>             group_restraints(classes='${LIB}/atmcls-mf.lib',
>                              parameters='${LIB}/dist-mf.lib')
>     mdl.group_restraints = model.dope_restraints
>     molpdf = mdl.energy(output='SHORT', file=file,
>                         edat=edat, residue_span_range=(1, 9999),
>                         normalize_profile=True, smoothing_window=10)
>     mdl.group_restraints = oldgprsr
>     return molpdf
You actually only need this routine if you're using Modeller 8v0. 8v1 can produce a profile from the 'assess_dope' routine (see below). Makes the script a bit more readable. ;)
> filename = 'loop.list'
> in_file = open(filename, "r")
> while 1:
>     # read model file
>     code = in_file.readline()
>     mdl = model(env)
>     mdl.read(file=code)
>     aln = alignment(env)
>     # generate topology
>     aln.append_model(mdl, atom_files=code, align_codes=code)
>     aln.append_model(mdl, atom_files=code, align_codes=code+'-ini')
>     mdl.generate_topology(aln, sequence=code+'-ini')
>     mdl.transfer_xyz(aln)
Presumably all of your loop models were built with the same Modeller inputs, so the sequence of every model should be the same. In this case, you can take all of this code outside of the loop, and just put the mdl.read call within it. That should give you the same results in less time.
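The shape of that restructure can be sketched without Modeller itself (DummyModel below is a stand-in for the model object; note too that iterating over the file object gives the loop a clean end-of-file exit, which the `while 1:` version lacks):

```python
import io

# Dummy stand-in for Modeller's model object; counts how many get built.
class DummyModel:
    created = 0
    def __init__(self):
        DummyModel.created += 1
    def read(self, file):
        self.current = file   # pretend to load coordinates

# Stand-in for open('loop.list'):
loop_list = io.StringIO("loop0001.pdb\nloop0002.pdb\nloop0003.pdb\n")

mdl = DummyModel()            # built once, outside the loop
processed = []
for line in loop_list:        # terminates at end of file
    code = line.strip()       # drop the trailing newline
    mdl.read(file=code)       # only the read stays inside the loop
    processed.append(code)

print(DummyModel.created)     # 1
print(processed)
```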
> 1a) So, Is there a limit to ability of Modeller or python to handle long
> file name strings?
Python has no limit. Modeller (due to Fortran limitations) is in some cases limited to 600 characters, and will truncate file names if they're longer than this.
> 1b) Why am I not able to digest all 1000 loop models?
What's the error you get?
> 2) In the log file that's produced I see a line like this for the first
> loop model:
>
> DOPE score :-51270.746094
>
> What is the relationship between the summed per residue values from the
> profile output (ie -15.0456) and this DOPE score from the log file?
There is no direct relationship, as the per-residue values are divided by the number of restraints acting on each residue.
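A toy illustration of that normalization (all numbers invented, not real DOPE values): the score in the log is the plain sum of raw contributions, while the profile divides each residue's energy by its restraint count before summing, so the two totals need not match.

```python
raw_energy = [-120.0, -80.0, -200.0]   # per-residue raw contributions
n_restraints = [40, 20, 50]            # restraints acting on each residue

dope_score = sum(raw_energy)           # what the log would report
profile = [e / n for e, n in zip(raw_energy, n_restraints)]
profile_sum = sum(profile)             # what the profile file sums to

print(dope_score)    # -400.0
print(profile_sum)   # -11.0
```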
Ben Webb, Modeller Caretaker
Here are the last lines of a typical log file. This run stopped after reading 236 of the 1000 loop files. I have since finished what I wanted to do by reading the files in smaller batches; in all cases the log files record a similar memory problem.
openf5__224_> Open 11 OLD SEQUENTIAL ABC31_HUMAN.BL02370001.pdb
Dynamically allocated memory at amaxstructure [B,kB,MB]: -669621493 -653927.250 -638.601
openf5__224_> Open 11 OLD SEQUENTIAL ABC31_HUMAN.BL02370001.pdb
transfe_506_> MODEL is an average of all templates.
transfe_511_> Number of templates for coordinate transfer: 1
After transfering coordinates of the equivalent template atoms, there are defined, undefined atoms in MODEL: 4312 0
>> Model assessment by DOPE potential
iatmcls_286W> MODEL atom not classified: SER:OXT SER
runcmd______> model.energy((def)deviation=0.0, (def)asgl_output=False, normalize_profile=True, residue_span_range=(1, 9999), output='ENERGY_PROFILE NO_REPORT', file='RPE65_HUMAN.BL02370001.pdb\n.profile', (def)output_directory='', smoothing_window=15, (def)viol_report_cut=(4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 999, 999, 999, 999, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 999, 6.5, 4.5, 4.5, 4.5, 4.5, 4.5, 999, 999, 999, 4.5, 4.5), (def)viol_report_cut2=(2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0), schedule_scale=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
preppdf_453W> No fixed restraints selected; there may be some dynamic ones.
preppdf_454W> Restraints file was probably not read; use restraints.append().
Dynamically allocated memory at amaxhash_contac [B,kB,MB]: -668379505 -652714.375 -637.416
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -668276025 -652613.312 -637.318
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -668172537 -652512.250 -637.219
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -667965561 -652310.125 -637.022
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -667551609 -651905.875 -636.627
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -666723705 -651097.375 -635.837
Dynamically allocated memory at amaxrestraints [B,kB,MB]: -665067897 -649480.375 -634.258
alloc_r_215E> Dynamic memory allocation failed. Routine, variable, status: iapairs 2
On screen the following is echoed:
*** malloc: vm_allocate(size=6623232) failed (error code=3)
*** malloc[601]: error: Can't allocate region
Modeller 8v1 is running on a Mac G5 under OS X version 10.3.
Thanks,
Starr
--On Tuesday, December 06, 2005 07:56:24 PM -0800 Modeller Caretaker modeller-care@salilab.org wrote:
Starr Hazard wrote:
> Here are the last lines of a typical log file. This run stopped after
> reading 236 files out of 1000 loop files. I have finished all of what I
> wanted to do by reading the files in smaller batches. In all cases the
> log files record a similar memory problem
Yes, you're absolutely right: Modeller is running out of memory. This is because the Python interface is creating some circular references, so the objects aren't being cleaned up properly. You have three options:
1. Do what you did already - split the job into smaller batches.
2. Do what I suggested - move the creation of the mdl and aln objects outside of the loop.
3. Apply the (attached) patch to your Modeller 8v1 distribution, which should fix the problem.
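If you stick with option 1, the batching itself is easy to script; chunked() below is just a hypothetical helper (run each batch in a fresh Modeller process, so freed memory is actually returned to the system):

```python
def chunked(items, size):
    """Yield successive fixed-size batches from a list of file names."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 10 dummy file names standing in for the contents of loop.list
codes = ["loop%04d.pdb" % i for i in range(1, 11)]
batches = list(chunked(codes, 4))

print([len(b) for b in batches])  # [4, 4, 2]
```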
Ben Webb, Modeller Caretaker
participants (2)
- Modeller Caretaker
- Starr Hazard