Re: [modeller_usage] loop modeling parallel-queue

26 Jan 2011


On 26 January 2011 18:14, Modeller Caretaker modeller-care@salilab.org wrote:
> On 1/26/11 3:43 AM, Thomas Evangelidis wrote:
>
>> I had recently encountered a similar problem. There is also another
>> issue, when running 8 jobs concurrently in the same directory each
>> thread needs some input files (in my understanding) which are deleted or
>> modified by some other threads and that leads to deadlock.
>>
>
> I don't think that's the question David was asking. But what you say is not
> correct - 8 jobs will run concurrently in the same directory (in fact it is
> designed to work that way, since they will need the same initial
> conformation input file). The parallel loop modeling should not delete or
> modify any input files, so I don't know why you're seeing a deadlock - we
> have not seen such behavior here. Perhaps some problem with your network
> storage? The parallel framework also does not use threads - it uses
> independent processes - and that together with the master-slave design makes
> a deadlock (where A is waiting for B, but B is also waiting for A) rather
> unlikely.
>
> That said, the scripts you attach will get rather confused if you run them
> in the same directory, because they are rather less efficient than the
> parallel loop modeling included with Modeller. You are asking each slave to
> build the set of restraints and initial model before building the loop
> models. Obviously if one slave overwrites this initial model while another
> slave is trying to read it, things will get confused (but a deadlock should
> still not occur - the slave will simply raise an exception). But since the
> initial model and restraints are always the same, this is not necessary. The
> extra computational effort (and the potential for confusion) can be simply
> avoided by building the restraints and initial model on the master node -
> then all the slaves do is read them in and generate one or more loop models.
> This is in fact what happens if you call loopmodel's "use_parallel_job"
> method before you call make().

Yes, the exception I was getting was that the *.ini file couldn't be found. I
initially tried "use_parallel_job" to model loops on a master node from an
initial conformation, but MODELLER did not scale when I asked for many loop
models: my local machine has 8 CPUs and only 1 was busy. Surprisingly, it did
scale when I asked for just a few loop models (16). Since I couldn't invest
more than 1 day in debugging, I decided to work around it by creating one
directory for every slave node.

I do have a collection of scripts that does homology modelling and loop
modelling with the dopehr_loopmodel class using this "use_parallel_job"
method, but I couldn't adapt it to do just loop modelling from an initial
conformation... weird!
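
For anyone who wants to reproduce the one-directory-per-slave workaround, here is a minimal sketch of the bookkeeping part. It uses only the Python standard library and does not call MODELLER at all; the function names, the worker_NN directory naming and the range.txt file are hypothetical, made up for illustration. The idea is just to split the loop model indices into contiguous chunks and give every slave a private copy of the input files, so that no slave can overwrite a file another one is reading.

```python
import shutil
from pathlib import Path


def split_models(first, last, n_workers):
    """Split the inclusive index range [first, last] into at most
    n_workers contiguous (start, end) chunks of near-equal size."""
    if n_workers < 1 or last < first:
        raise ValueError("need n_workers >= 1 and last >= first")
    total = last - first + 1
    base, extra = divmod(total, n_workers)
    chunks = []
    start = first
    for i in range(n_workers):
        # The first `extra` workers take one additional model each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than models: leave this one idle
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def prepare_worker_dirs(root, input_files, chunks):
    """Create one directory per chunk under `root`, copy every input
    file into it, and record the assigned range in range.txt
    (a hypothetical convention, not something MODELLER reads)."""
    root = Path(root)
    dirs = []
    for i, (lo, hi) in enumerate(chunks):
        workdir = root / f"worker_{i:02d}"
        workdir.mkdir(parents=True, exist_ok=True)
        for f in input_files:
            shutil.copy2(f, workdir / Path(f).name)
        (workdir / "range.txt").write_text(f"{lo} {hi}\n")
        dirs.append(workdir)
    return dirs
```

Each slave's own loop modelling script would then be started inside its directory and build only the models in its assigned range. This costs some disk space and repeats the restraint building per slave, which is the inefficiency pointed out in the quoted reply, so it is a workaround rather than a replacement for "use_parallel_job".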
-- 

======================================================================
Thomas Evangelidis
PhD student
Biomedical Research Foundation, Academy of Athens
4 Soranou Ephessiou, 115 27 Athens, Greece

email: tevang@bioacademy.gr
       tevang3@gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/