On 26 January 2011 18:14, Modeller Caretaker <modeller-care@salilab.org> wrote:
On 1/26/11 3:43 AM, Thomas Evangelidis wrote:
I recently encountered a similar problem. There is also another
issue: when running 8 jobs concurrently in the same directory, each
thread needs some input files (in my understanding) which are deleted or
modified by other threads, and that leads to a deadlock.

I don't think that's the question David was asking. But what you say is not correct - 8 jobs will run concurrently in the same directory (in fact it is designed to work that way, since they will need the same initial conformation input file). The parallel loop modeling should not delete or modify any input files, so I don't know why you're seeing a deadlock - we have not seen such behavior here. Perhaps some problem with your network storage? The parallel framework also does not use threads - it uses independent processes - and that together with the master-slave design makes a deadlock (where A is waiting for B, but B is also waiting for A) rather unlikely.

That said, the scripts you attach will get rather confused if you run them in the same directory, because they are rather less efficient than the parallel loop modeling included with Modeller. You are asking each slave to build the set of restraints and initial model before building the loop models. Obviously if one slave overwrites this initial model while another slave is trying to read it, things will get confused (but a deadlock should still not occur - the slave will simply raise an exception). But since the initial model and restraints are always the same, this is not necessary. The extra computational effort (and the potential for confusion) can be simply avoided by building the restraints and initial model on the master node - then all the slaves do is read them in and generate one or more loop models. This is in fact what happens if you call loopmodel's "use_parallel_job" method before you call make().
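To make the pattern concrete, here is a minimal sketch (plain Python multiprocessing, not Modeller code, and all file names are illustrative) of the design described above: the master builds the shared input exactly once, and each independent worker process only reads that input and writes its own uniquely-named output, so concurrent workers in the same directory never clobber each other.

```python
# Sketch of the master/worker pattern: the master prepares the shared
# input once; workers only read it and write uniquely-named outputs.
import multiprocessing
import os
import tempfile

def build_initial_model(path):
    # Stand-in for the expensive restraint/initial-model build,
    # performed exactly once by the master.
    with open(path, "w") as f:
        f.write("INITIAL MODEL\n")

def worker(args):
    ini_path, model_index = args
    # Each worker only *reads* the shared input file...
    with open(ini_path) as f:
        ini = f.read()
    # ...and writes a uniquely-named output, so no two workers
    # ever write to the same file.
    out_path = f"{ini_path}.loop{model_index:04d}"
    with open(out_path, "w") as f:
        f.write(ini + f"LOOP MODEL {model_index}\n")
    return out_path

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    ini_path = os.path.join(workdir, "model.ini")
    build_initial_model(ini_path)          # master does this once
    with multiprocessing.Pool(4) as pool:  # independent processes, not threads
        outputs = pool.map(worker, [(ini_path, i) for i in range(8)])
    print(len(outputs))
```

Because the shared input is written before any worker starts and is never modified afterwards, there is nothing for the workers to race on; this is the same reason Modeller's own parallel loop modeling can safely run all slaves in one directory.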

Yes, the exception I was getting was that the *.ini file couldn't be found. I initially tried "use_parallel_job" to model loops on a master node from an initial conformation, but MODELLER wasn't scaling when trying to create many loop models. I have 8 CPUs on my local machine and only 1 was running. Surprisingly, it did scale when I tried to create just a few loop models, e.g. 16. Since I couldn't invest more than 1 day in debugging, I decided to do it by creating one directory for every slave node.
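For what it's worth, the per-directory workaround mentioned above can be sketched as below (plain Python, with illustrative directory names — not part of Modeller's API): each slave gets its own scratch directory, so concurrent jobs can never read or overwrite each other's files, at the cost of rebuilding the initial model in each directory.

```python
# Sketch of the per-directory workaround: one isolated scratch
# directory per slave, so concurrent jobs never share any files.
import os

def make_slave_dirs(base, n_slaves):
    """Create one isolated directory per slave and return the paths."""
    paths = []
    for i in range(n_slaves):
        d = os.path.join(base, f"slave{i:02d}")  # illustrative naming
        os.makedirs(d, exist_ok=True)
        paths.append(d)
    return paths
```

Each slave would then chdir into its own directory before running its script. This avoids the file clashes, but as noted above it is less efficient than letting the master build the restraints and initial model once for all slaves.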

I do have a collection of scripts that does homology modelling and loop modeling with the dopehr_loopmodel class using this "use_parallel_job" method, but I couldn't adapt it to do just loop modelling from an initial conformation...weird!



Thomas Evangelidis
PhD student
Biomedical Research Foundation, Academy of Athens
4 Soranou Ephessiou, 115 27 Athens, Greece
email: tevang@bioacademy.gr
website: https://sites.google.com/site/thomasevangelidishomepage/