Dear Modellers,
When I was refining loops for a model in parallel, I asked Modeller to generate 9999 models, but it stopped after 1245 with the error below. The computer has six cores; Modeller was using four of them, and the other two were free.
Thank you for your help.
<Slave on localhost> failed (Connection lost to slave <Slave on localhost>: [Errno 104] Connection reset by peer) - removing from <Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on localhost>, <Slave on localhost>]>
<Loop model building task #1.1219> on <Slave on localhost> completed
<Loop model building task #1.1217> on <Slave on localhost> completed
<Loop model building task #1.1222> on <Slave on localhost> completed
<Loop model building task #1.1218> on <Slave on localhost> completed
<Loop model building task #1.1223> on <Slave on localhost> completed
<Loop model building task #1.1224> on <Slave on localhost> completed
<Loop model building task #1.1225> on <Slave on localhost> completed
<Loop model building task #1.1226> on <Slave on localhost> completed
<Loop model building task #1.1228> on <Slave on localhost> completed
<Loop model building task #1.1229> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on localhost>: [Errno 104] Connection reset by peer) - removing from <Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on localhost>, <Slave on localhost>]>
<Loop model building task #1.1227> on <Slave on localhost> completed
<Loop model building task #1.1230> on <Slave on localhost> completed
<Loop model building task #1.1232> on <Slave on localhost> completed
<Loop model building task #1.1233> on <Slave on localhost> completed
<Loop model building task #1.1234> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on localhost>: [Errno 104] Connection reset by peer) - removing from <Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on localhost>, <Slave on localhost>]>
<Loop model building task #1.1235> on <Slave on localhost> completed
<Loop model building task #1.1237> on <Slave on localhost> completed
<Loop model building task #1.1238> on <Slave on localhost> completed
<Loop model building task #1.1239> on <Slave on localhost> completed
<Loop model building task #1.1240> on <Slave on localhost> completed
<Loop model building task #1.1241> on <Slave on localhost> completed
<Loop model building task #1.1242> on <Slave on localhost> completed
<Loop model building task #1.1243> on <Slave on localhost> completed
<Loop model building task #1.1244> on <Slave on localhost> completed
<Loop model building task #1.1245> on <Slave on localhost> completed
<Slave on localhost> failed (Connection lost to slave <Slave on localhost>: [Errno 104] Connection reset by peer) - removing from <Parallel job [<Slave on localhost>, <Slave on localhost>, <Slave on localhost>, <Slave on localhost>]>
Traceback (most recent call last):
  File "refine_loop.py", line 41, in <module>
    a.make()
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 36, in make
    self.build_seq(self.inimodel, 1)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 190, in build_seq
    self.parallel_loop_models(atmsel, ini_model, num, sched)
  File "/usr/local/lib/python2.7/site-packages/modeller/automodel/loopmodel.py", line 208, in parallel_loop_models
    self.loop.outputs.extend(job.run_all_tasks())
  File "/usr/local/lib/python2.7/site-packages/modeller/parallel/job.py", line 136, in run_all_tasks
    raise ValueError("Ran out of slaves to run tasks")
ValueError: Ran out of slaves to run tasks
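My reading of the log, sketched as a toy Python model: each slave whose connection is reset is removed from the job, and once all four are gone, run_all_tasks has nothing left to run the remaining tasks on, so it raises. This is a hypothetical illustration only, not Modeller's actual code in modeller/parallel/job.py; the function name, its arguments, and the fails_after mapping are my own assumptions used to mimic the pattern.

```python
from collections import deque


def run_all_tasks(tasks, slaves, fails_after):
    """Toy scheduler (hypothetical) that drops workers whose connection is lost.

    tasks: list of task ids to run
    slaves: list of worker names
    fails_after: dict mapping worker name -> number of tasks it completes
                 before its connection is reset (missing = never fails)
    """
    pending = deque(tasks)
    done = {name: 0 for name in slaves}
    active = list(slaves)
    completed = []
    while pending:
        if not active:
            # Every worker has been removed but work remains.
            raise ValueError("Ran out of slaves to run tasks")
        for name in list(active):
            if not pending:
                break
            if done[name] >= fails_after.get(name, float("inf")):
                # Simulated "Connection reset by peer": remove the worker;
                # no task is taken, so nothing is lost from the queue.
                active.remove(name)
                continue
            task = pending.popleft()
            done[name] += 1
            completed.append(task)
    return completed
```

With four workers that each drop out after a few tasks, the toy job finishes some tasks and then fails with the same message as in the traceback, even though no individual task failed.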
XP