
[modeller_usage] modeller.parallel.communicator.RemoteError



Hi, 

I got the following error when running an old Python script (see
below) that used to work well. A couple of days ago, I removed the
Java that came with the system (Fedora 13) and installed Java from
java.com (jdk-7u4-linux-i586.rpm, jre-7u4-linux-i586.rpm) to fix a
crash in Jalview.

I wonder whether the problem is related to the new Java or to
something else.

Thank you. 

XP 

############ my script  ########################

# Homology modeling by the automodel class
from modeller import *              # Load standard Modeller classes
from modeller.automodel import *    # Load the automodel class
from modeller.parallel import *     # Load the parallel classes,
                                    # to use multiple processors


# Use 5 CPUs in a parallel job on this machine
j = job() # Cluster
j.append(local_slave())
j.append(local_slave())
j.append(local_slave())
j.append(local_slave())
j.append(local_slave())

log.verbose()    # request verbose output
env = environ()  # create a new MODELLER environment to build this model in
env.io.hetatm = True


# directories for input atom files
env.io.atom_files_directory = ['.',
                               '/usr/share/doc/modeller-9v8/examples']


a = automodel(env,
              alnfile  = 'CC1295N.ali',     # alignment filename
              knowns   = ('1H6L', '1CVM1', '3AMR'),  # codes of the templates
              sequence = 'CC1295N')              # code of the target
a.starting_model= 1                 # index of the first model
a.ending_model  = 20                # index of the last model
                                    # (determines how many models to calculate)
a.use_parallel_job(j) 
a.make()                            # do the actual homology modeling



####################    Errors  ########################
Traceback (most recent call last):
  File "CC1295N_model.py", line 32, in ?
    a.make()                            # do the actual homology modeling
  File "/usr/lib/modeller9.10/modlib/modeller/automodel/automodel.py", line 107, in make
    self.multiple_models(atmsel)
  File "/usr/lib/modeller9.10/modlib/modeller/automodel/automodel.py", line 208, in multiple_models
    self.parallel_multiple_models(atmsel)
  File "/usr/lib/modeller9.10/modlib/modeller/automodel/automodel.py", line 228, in parallel_multiple_models
    self.outputs.extend(job.run_all_tasks())
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/job.py", line 131, in run_all_tasks
    for task in self._finish_all_tasks():
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/job.py", line 164, in _finish_all_tasks
    task = self._process_event(obj, s)
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/job.py", line 180, in _process_event
    task = obj.task_results()
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/slave.py", line 61, in task_results
    r = self.get_data(allow_heartbeat=True)
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/communicator.py", line 89, in get_data
    (cmdtype, obj) = self._recv()
  File "/usr/lib/modeller9.10/modlib/modeller/parallel/communicator.py", line 130, in _recv
    raise RemoteError(obj.exc, self)
modeller.parallel.communicator.RemoteError: IndexError: user_form__E> Functional form 8195 out of range from <Slave on localhost>
#####################################################






-----Original Message-----
From: Modeller Caretaker <>
To: 
Subject: Re: [modeller_usage] Error: Ran out of slaves to run tasks
Date: Thu, 23 Feb 2012 10:54:11 -0800

On 2/21/12 9:39 AM, Modeller Caretaker wrote:
> On 02/21/2012 08:55 AM, Xiao-Ping Zhang wrote:
>> When I was refining loops for a model in parallel, I asked modeller to
>> generate 9999 models. But modeller stopped at 1245 with the following
>> error. The computer has six cores, modeller uses four of them. The other
>> two cores were free.
> ...
>> ValueError: Ran out of slaves to run tasks
>
> This means exactly what it says: all of the slaves died, so it had
> nowhere to run the loop model building tasks. Each slave generates its
> own output file (look for files ending in .slave). Look in there to see
> what the problem was with each slave.

To conclude: it turned out that each slave was running out of memory. 
This is actually caused by a memory leak in Modeller that only affects 
parallel loopmodel runs. A patch is available to fix the problem at 
http://salilab.org/modeller/wiki/Patches

	Ben Webb, Modeller Caretaker
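
The advice quoted above (look in the per-slave output files to see what
went wrong) could be sketched roughly as follows. This is only an
illustration, not part of Modeller: the function name scan_slave_logs is
hypothetical, the "*.slave*" file pattern is an assumption based on the
"files ending in .slave" remark, and the "_E>" marker is taken from the
"user_form__E>" text in the error shown earlier. It is written for
Python 3 and is meant to be run on its own, not inside the Modeller
script.

```python
import glob
import os


def scan_slave_logs(directory=".", markers=("_E>", "Traceback")):
    """Collect suspicious lines from slave log files in `directory`.

    Returns a dict mapping each log file path to a list of
    (line_number, text) pairs for lines containing any marker.
    Files without matching lines are left out of the result.
    """
    hits = {}
    # Assumption: slave logs have ".slave" in the file name.
    for path in sorted(glob.glob(os.path.join(directory, "*.slave*"))):
        found = []
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    found.append((number, line.rstrip()))
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    # Print every flagged line, grouped by log file.
    for path, lines in scan_slave_logs().items():
        print(path)
        for number, text in lines:
            print("  line %d: %s" % (number, text))
```

Running it in the directory that holds the model script lists each slave
log containing an error marker, which is usually a quicker starting point
than the RemoteError summary printed by the master process.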