Dear all,

I'm trying to run a parallel job with modeller. The script is:
from modeller import *
from modeller.automodel import *
from modeller.parallel import *
from modeller import soap_protein_od

# Parallel job with 16 slaves on the local machine
j = job()
for i in range(16):
    j.append(local_slave())

log.verbose()
env = environ()
env.io.atom_files_directory = ['.', '../atom_files']

a = dopehr_loopmodel(env, alnfile='OurAlignment.ali',
                     knowns='2p1mB', sequence='Rece',
                     assess_methods=(assess.DOPEHR,
                                     soap_protein_od.Scorer(),
                                     assess.GA341),
                     loop_assess_methods=assess.DOPEHR)
a.starting_model = 1
a.ending_model = 16
a.loop.starting_model = 1
a.loop.ending_model = 10
a.use_parallel_job(j)
a.make()
However, it doesn't seem to work properly. In particular, it creates 16 slave output files, each containing only the message 'import site' failed, and then it hangs there, doing nothing at all, until I kill it with CTRL-C. What am I doing wrong? For the sake of completeness, I'm running modeller 9.17, the distribution is Gentoo Linux, and single-CPU runs of modeller work fine without problems (so far).
Thanks!
On 10/21/16 5:00 AM, Del Genio, Charo wrote:
> Dear all, I'm trying to run a parallel job with modeller.
...
> a = dopehr_loopmodel(env, alnfile='OurAlignment.ali',
>                      knowns='2p1mB', sequence='Rece',
>                      assess_methods=(assess.DOPEHR,
>                                      soap_protein_od.Scorer(),
>                                      assess.GA341),
>                      loop_assess_methods=assess.DOPEHR)
...
> However, it doesn't seem to work properly. In particular, it creates 16 slave output files containing exclusively
> 'import site' failed
> and then it hangs there, doing nothing at all, until I kill it with CTRL-C.
That's odd. I don't see anything obviously wrong with your script. But the master process communicates with the slaves over the network, so it may be having trouble establishing a connection. You can try passing host='localhost' as a job() parameter. See https://salilab.org/modeller/9.17/manual/node459.html
I don't think soap_protein_od has ever been tried in a parallel job before. I can't think of an obvious reason why it wouldn't work, although note that each Modeller task runs in a separate process (there is no shared memory) so at best you'll end up with 16 copies of the SOAP library in memory. You could certainly run out of memory since even one copy of this library is quite large. You'd probably be better off skipping this assessment method here and instead rescoring your final models with it in serial once the job has finished.
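A minimal sketch of that serial rescoring step might look like the following. It is not taken from this thread: the file pattern 'Rece.B*.pdb' is an assumption based on Modeller's usual output names for the 'Rece' sequence, and the helper names find_models and rescore_with_soap are made up for illustration. It also assumes that selection.assess() returns the score, as I understand the 9.17 manual, so check that against your installation.

```python
import glob
import os


def find_models(directory='.', pattern='Rece.B*.pdb'):
    """Return matching model files in sorted order.

    The default pattern is an assumption: adjust it to whatever
    file names your modeling run actually produced.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


def rescore_with_soap(model_files):
    """Score each model with SOAP-Protein-OD in a single process.

    The Scorer is created once, so only one copy of the large SOAP
    library is held in memory. Returns a dict of file name -> score.
    """
    # Imported here so find_models() can be used without Modeller.
    from modeller import environ, selection, soap_protein_od
    from modeller.scripts import complete_pdb

    env = environ()
    env.libs.topology.read(file='$(LIB)/top_heav.lib')
    env.libs.parameters.read(file='$(LIB)/par.lib')

    scorer = soap_protein_od.Scorer()
    scores = {}
    for fname in model_files:
        mdl = complete_pdb(env, fname)
        # Assumption: assess() returns the numeric score.
        scores[fname] = selection(mdl).assess(scorer)
    return scores
```

Calling rescore_with_soap(find_models()) after a.make() has finished, with soap_protein_od.Scorer() removed from assess_methods, would then score all models one after another in the master process.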
Ben Webb, Modeller Caretaker
OK, I figured out what the problem was. My OS installation has python3.4 as the default interpreter. Thus, to properly parallelize the job, I need to create it with
j = job(modeller_path='modpy.sh python2.7 /usr/bin/modslave.py', host='localhost')
which is the way I normally run modeller. Now it works properly. Also, thanks for the suggestion to drop SOAP and just use it later to score the models at the end!
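For anyone else who sees the 'import site' failed message in the slave logs, a quick check of which interpreter a given command actually starts is sketched below. This is plain standard-library Python, not part of Modeller, and the function names interpreter_version and same_python are hypothetical. It only compares major.minor versions, on the assumption that a mismatch like the 2.7 versus 3.4 one here is what breaks the slaves.

```python
import subprocess
import sys


def interpreter_version(command):
    """Return the (major, minor) version reported by a Python command."""
    code = 'import sys; print("%d.%d" % sys.version_info[:2])'
    out = subprocess.check_output([command, '-c', code])
    major, minor = out.decode().strip().split('.')
    return int(major), int(minor)


def same_python(command):
    """True if `command` runs the same major.minor Python as this script."""
    return interpreter_version(command) == tuple(sys.version_info[:2])
```

Run under the interpreter you normally use for Modeller (python2.7 in my case), same_python('python') returns False whenever the bare python command resolves to a different version, which is a hint that modeller_path needs to name the interpreter explicitly.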
> From: Modeller Caretaker modeller-care@salilab.org
> Sent: 24 October 2016 17:15
> To: Del Genio, Charo; modeller_usage@salilab.org
> Subject: Re: [modeller_usage] Unable to run parallel job
>
> That's odd. I don't see anything obviously wrong with your script. But
> the master process communicates with the slaves over the network, so may
> be having trouble establishing a connection. You can try using
> host='localhost' as a job() parameter. See
> https://salilab.org/modeller/9.17/manual/node459.html
>
> I don't think soap_protein_od has ever been tried in a parallel job
> before. I can't think of an obvious reason why it wouldn't work,
> although note that each Modeller task runs in a separate process (there
> is no shared memory) so at best you'll end up with 16 copies of the SOAP
> library in memory. You could certainly run out of memory since even one
> copy of this library is quite large. You'd probably be better off
> skipping this assessment method here and instead rescoring your final
> models with it in serial once the job has finished.