[modeller_usage] Error: Ran out of slaves to run tasks
Subject: [modeller_usage] Error: Ran out of slaves to run tasks
From: Xiao-Ping Zhang
Date: Tue, 21 Feb 2012 08:55:54 -0800
Organization: UC Davis
Dear Modellers,
While refining loops for a model in parallel, I asked Modeller to
generate 9999 models, but it stopped at 1245 with the following
error. The computer has six cores; Modeller was using four of them and
the other two were free.
Thank you for your help.
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1219> on <Slave on localhost> completed
<Loop model building task #1.1217> on <Slave on localhost> completed
<Loop model building task #1.1222> on <Slave on localhost> completed
<Loop model building task #1.1218> on <Slave on localhost> completed
<Loop model building task #1.1223> on <Slave on localhost> completed
<Loop model building task #1.1224> on <Slave on localhost> completed
<Loop model building task #1.1225> on <Slave on localhost> completed
<Loop model building task #1.1226> on <Slave on localhost> completed
<Loop model building task #1.1228> on <Slave on localhost> completed
<Loop model building task #1.1229> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1227> on <Slave on localhost> completed
<Loop model building task #1.1230> on <Slave on localhost> completed
<Loop model building task #1.1232> on <Slave on localhost> completed
<Loop model building task #1.1233> on <Slave on localhost> completed
<Loop model building task #1.1234> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1235> on <Slave on localhost> completed
<Loop model building task #1.1237> on <Slave on localhost> completed
<Loop model building task #1.1238> on <Slave on localhost> completed
<Loop model building task #1.1239> on <Slave on localhost> completed
<Loop model building task #1.1240> on <Slave on localhost> completed
<Loop model building task #1.1241> on <Slave on localhost> completed
<Loop model building task #1.1242> on <Slave on localhost> completed
<Loop model building task #1.1243> on <Slave on localhost> completed
<Loop model building task #1.1244> on <Slave on localhost> completed
<Loop model building task #1.1245> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
Traceback (most recent call last):
  File "refine_loop.py", line 41, in <module>
    a.make()
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 36, in make
    self.build_seq(self.inimodel, 1)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 190, in build_seq
    self.parallel_loop_models(atmsel, ini_model, num, sched)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 208, in parallel_loop_models
    self.loop.outputs.extend(job.run_all_tasks())
  File "/usr/local/lib/python2.7/site-packages/modeller/parallel/job.py", line 136, in run_all_tasks
    raise ValueError("Ran out of slaves to run tasks")
ValueError: Ran out of slaves to run tasks
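To check a log like the one above, here is a minimal sketch in plain Python (Modeller is not needed) that counts the slave failures and lists the task numbers that never reported completion. The names summarize, TASK_DONE, SLAVE_FAILED and LOG_EXCERPT are made up for illustration, and the regular expressions assume the message format shown above. A missing number may only mean that the task finished outside the part of the output being scanned.

```python
import re

# Short excerpt in the same format as the log above (not the full output).
LOG_EXCERPT = """\
<Loop model building task #1.1233> on <Slave on localhost> completed
<Loop model building task #1.1234> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on
localhost>: [Errno 104] Connection reset by peer) - removing from
<Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on
localhost>, <Slave on localhost>]>
<Loop model building task #1.1235> on <Slave on localhost> completed
<Loop model building task #1.1237> on <Slave on localhost> completed
"""

# Assumed message formats, taken from the log lines quoted above.
TASK_DONE = re.compile(r"task #\d+\.(\d+)> on <Slave on \S+ completed")
SLAVE_FAILED = re.compile(r"^<Slave on \S+ failed", re.MULTILINE)


def summarize(log_text):
    """Return (completed task numbers, number of slave failures, gaps)."""
    done = sorted(int(n) for n in TASK_DONE.findall(log_text))
    failures = len(SLAVE_FAILED.findall(log_text))
    # Task numbers inside the observed range with no completion message.
    gaps = sorted(set(range(done[0], done[-1] + 1)) - set(done)) if done else []
    return done, failures, gaps


done, failures, gaps = summarize(LOG_EXCERPT)
print("completed:", done)
print("slave failures:", failures)
print("no completion message for:", gaps)
```

In the lines quoted above there are four failure messages, and tasks 1220, 1221, 1231 and 1236 have no completion line, although some of those may have completed before the quoted part begins.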
XP