New subject: modeller.parallel.communicator.RemoteError

21 Feb 2012


Dear Modellers,

While refining loops for a model in parallel, I asked Modeller to
generate 9999 models, but the run stopped at 1245 with the error below.
The computer has six cores; Modeller was using four of them, and the
other two were free.

Thank you for your help.

<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1219> on <Slave on localhost> completed
<Loop model building task #1.1217> on <Slave on localhost> completed
<Loop model building task #1.1222> on <Slave on localhost> completed
<Loop model building task #1.1218> on <Slave on localhost> completed
<Loop model building task #1.1223> on <Slave on localhost> completed
<Loop model building task #1.1224> on <Slave on localhost> completed
<Loop model building task #1.1225> on <Slave on localhost> completed
<Loop model building task #1.1226> on <Slave on localhost> completed
<Loop model building task #1.1228> on <Slave on localhost> completed
<Loop model building task #1.1229> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1227> on <Slave on localhost> completed
<Loop model building task #1.1230> on <Slave on localhost> completed
<Loop model building task #1.1232> on <Slave on localhost> completed
<Loop model building task #1.1233> on <Slave on localhost> completed
<Loop model building task #1.1234> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1235> on <Slave on localhost> completed
<Loop model building task #1.1237> on <Slave on localhost> completed
<Loop model building task #1.1238> on <Slave on localhost> completed
<Loop model building task #1.1239> on <Slave on localhost> completed
<Loop model building task #1.1240> on <Slave on localhost> completed
<Loop model building task #1.1241> on <Slave on localhost> completed
<Loop model building task #1.1242> on <Slave on localhost> completed
<Loop model building task #1.1243> on <Slave on localhost> completed
<Loop model building task #1.1244> on <Slave on localhost> completed
<Loop model building task #1.1245> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
Traceback (most recent call last):
  File "refine_loop.py", line 41, in <module>
    a.make()
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 36, in make
    self.build_seq(self.inimodel, 1)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 190, in build_seq
    self.parallel_loop_models(atmsel, ini_model, num, sched)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 208, in parallel_loop_models
    self.loop.outputs.extend(job.run_all_tasks())
  File "/usr/local/lib/python2.7/site-packages/modeller/parallel/job.py", line 136, in run_all_tasks
    raise ValueError("Ran out of slaves to run tasks")
ValueError: Ran out of slaves to run tasks
XP
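
In case it helps anyone reading a long log like the one above, here is a minimal sketch for tallying it. It assumes the output was captured as text and that the lines look exactly like the ones quoted in this message. The function name summarize_log and the sample string are made up for illustration and are not part of Modeller.

```python
import re

# Patterns are based only on the log lines quoted in this message.
COMPLETED = re.compile(
    r"<Loop model building task #\d+\.(\d+)> on <Slave on [^>]+> completed"
)
FAILED = re.compile(r"^<Slave on [^>]+> failed \(")


def summarize_log(log_text):
    """Return (highest completed task number, completed count, slave failure count)."""
    completed = [int(m.group(1)) for m in COMPLETED.finditer(log_text)]
    # Count only the first line of each (wrapped) failure message.
    failures = sum(1 for line in log_text.splitlines() if FAILED.match(line))
    highest = max(completed) if completed else None
    return highest, len(completed), failures


# Hypothetical excerpt in the same shape as the log above.
sample = """<Loop model building task #1.1244> on <Slave on localhost> completed
<Loop model building task #1.1245> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
"""

print(summarize_log(sample))
```

In the log above, "failed" appears four times, once for each of the four slaves in the job, and the ValueError is raised right after the fourth.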

Error: Ran out of slaves to run tasks

Xiao-Ping Zhang

Modeller Caretaker
