On 26 January 2011 18:14, Modeller Caretaker modeller-care@salilab.org wrote:
> On 1/26/11 3:43 AM, Thomas Evangelidis wrote:
>
>> I had recently encountered a similar problem. There is also another
>> issue, when running 8 jobs concurrently in the same directory each
>> thread needs some input files (in my understanding) which are deleted or
>> modified by some other threads and that leads to deadlock.
>>
>
> I don't think that's the question David was asking. But what you say is not
> correct - 8 jobs will run concurrently in the same directory (in fact it is
> designed to work that way, since they will need the same initial
> conformation input file). The parallel loop modeling should not delete or
> modify any input files, so I don't know why you're seeing a deadlock - we
> have not seen such behavior here. Perhaps some problem with your network
> storage? The parallel framework also does not use threads - it uses
> independent processes - and that together with the master-slave design makes
> a deadlock (where A is waiting for B, but B is also waiting for A) rather
> unlikely.
>
> That said, the scripts you attach will get rather confused if you run them
> in the same directory, because they are rather less efficient than the
> parallel loop modeling included with Modeller. You are asking each slave to
> build the set of restraints and initial model before building the loop
> models. Obviously if one slave overwrites this initial model while another
> slave is trying to read it, things will get confused (but a deadlock should
> still not occur - the slave will simply raise an exception). But since the
> initial model and restraints are always the same, this is not necessary. The
> extra computational effort (and the potential for confusion) can be simply
> avoided by building the restraints and initial model on the master node -
> then all the slaves do is read them in and generate one or more loop models.
> This is in fact what happens if you call loopmodel's "use_parallel_job"
> method before you call make().
Yes, the exception I was getting was that the *.ini file couldn't be found. I initially tried "use_parallel_job" to model loops on a master node from an initial conformation, but MODELLER wasn't scaling when I tried to create many loop models: I have 8 CPUs on my local machine and only 1 was running. Surprisingly, it did scale when I tried to create just a few loop models, i.e. 16. Since I couldn't invest more than 1 day in debugging, I decided to work around it by creating one directory for every slave node.
I do have a collection of scripts that does homology modeling and loop modeling with the dopehr_loopmodel class using this "use_parallel_job" method, but I couldn't adapt it to do just loop modeling from an initial conformation... weird!
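
For what it's worth, the one-directory-per-slave workaround I described above can be sketched roughly as follows. This is only an illustration in plain Python (standard library only), not my actual scripts and not part of MODELLER: the helper names prepare_worker_dirs and run_in_dirs, the "slave_NN" directory naming and the input file names are all made up for the example. The idea is simply that every process gets its own private copy of the input files, so no process can overwrite a file that another one is reading.

```python
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def prepare_worker_dirs(base_dir, input_files, n_workers):
    """Create base_dir/slave_00, slave_01, ... and copy every input file
    into each of them. The directory naming is an arbitrary choice."""
    base = Path(base_dir)
    worker_dirs = []
    for i in range(n_workers):
        wdir = base / f"slave_{i:02d}"
        wdir.mkdir(parents=True, exist_ok=True)
        for src in input_files:
            # Each worker gets a private copy of every input file.
            shutil.copy2(src, wdir / Path(src).name)
        worker_dirs.append(wdir)
    return worker_dirs


def run_in_dirs(worker_dirs, command):
    """Run the same command once per directory, concurrently, each as an
    independent process whose working directory is that worker's directory.
    Returns the exit codes in the same order as worker_dirs."""
    def launch(wdir):
        result = subprocess.run(command, cwd=wdir,
                                capture_output=True, text=True)
        return result.returncode

    # Threads are used only to wait on the child processes in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(worker_dirs))) as pool:
        return list(pool.map(launch, worker_dirs))
```

With 8 CPUs one would call something like prepare_worker_dirs("run", ["initial.pdb"], 8) and then run_in_dirs(dirs, [sys.executable, "loop_refine.py"]), where "initial.pdb" and "loop_refine.py" are placeholder names for the initial conformation and the per-slave loop modeling script (the script would also have to be copied in, or given as an absolute path). The obvious cost is the one pointed out above: every slave redoes the same setup work, which the built-in "use_parallel_job" route avoids.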