Parallel Python (socket.error: (111, 'Connection refused'))
Hello, I have password-less ssh and rsh (I've tried both; same "Connection refused" error). My home directory is NFS mounted to all slave machines. I successfully ran parallel python with Modeller9v7 in the past with this setup but now since updating to Mod9v8 I'm not able to run the same scripts. It's been a while since I've used Modeller so I'm not sure if the problem is due to updating Modeller or changes that might have occurred in the computer configurations. Again ssh and rsh work seamlessly between all nodes without passwords.
Also:
I'm able to run the scripts using 'local_slave()'. I updated to Modeller9v8 on all machines. All nodes are correctly listed in the /etc/hosts files on all machines. We do run other software apps that require password-less ssh and they are working well. Nodes are quad core processors.
Finally:
I included below the code I'm running as well as the Modeller output in the *.slave files.
Any help would be greatly appreciated.
Regards,
Laurence Cooke
Arizona Cancer Center
Tucson, Arizona
******************************************
Start Parallel Task Remote file (one node)
******************************************
from modeller import *
from modeller.parallel import *
from buildmodeltask import BuildModelTask

log.minimal()
j = job()
j.append(ssh_slave('slave1', ssh_command='ssh'))
j.append(ssh_slave('slave1', ssh_command='ssh'))
j.append(ssh_slave('slave1', ssh_command='ssh'))
j.append(ssh_slave('slave1', ssh_command='ssh'))

j.queue_task(BuildModelTask('alignment_1.ali','pdb','sequence_1'))
j.queue_task(BuildModelTask('alignment_2.ali','pdb','sequence_2'))
j.queue_task(BuildModelTask('alignment_3.ali','pdb','sequence_3'))
j.queue_task(BuildModelTask('alignment_4.ali','pdb','sequence_4'))

results = j.run_all_tasks()

print "Done: ", results
******************************
BuildModelTask file
******************************
import random
from modeller import *
from modeller.parallel import task
from modeller.automodel import *

class BuildModelTask(task):
    def run(self, ali, pdb, unknown):
        env = environ(rand_seed=(random.randint(2,50000)*-1))
        env.io.hetatm = True
        a = automodel(env, alnfile=ali, knowns=pdb, sequence=unknown,
                      assess_methods=(assess.DOPE, assess.GA341))
        a.starting_model = 1
        a.ending_model = 250
        a.make()
*********************************
*.slave0-4 files
*********************************
Slave startup: connect to master at 127.0.0.1:59941:0:LMTRWVLB
Traceback (most recent call last):
  File "/usr/lib/modeller9v8/bin/modslave.py", line 27, in ?
    slaveloop(sys.argv[2])
  File "/usr/lib/python2.4/site-packages/modeller/parallel/slaveloop.py", line 57, in slaveloop
    master.connect_back(host, port, identifier)
  File "/usr/lib/python2.4/site-packages/modeller/parallel/slave_communicator.py", line 15, in connect_back
    s.connect((host, port))
  File "<string>", line 1, in connect
socket.error: (111, 'Connection refused')
On 6/9/10 12:30 AM, Laurence Cooke wrote:
> I have password-less ssh and rsh (I've tried both; same
> "Connection refused" error). My home directory is NFS mounted to all
> slave machines. I successfully ran parallel python with Modeller9v7 in
> the past with this setup but now since updating to Mod9v8 I'm not able
> to run the same scripts. It's been a while since I've used Modeller so
> I'm not sure if the problem is due to updating Modeller or changes that
> might have occurred in the computer configurations. Again ssh and rsh
> work seamlessly between all nodes without passwords.
It's not a change between 9v7 and 9v8 - in fact, the parallel framework was not modified at all between these two releases.
If the connection is refused, I can think of only two possibilities:
1. You have a firewall preventing such connections (in this case no Modeller jobs will run).
2. The master process died (thus closing the listening socket) before the slave was able to start up - perhaps due to a syntax error in your script. In this case you should be able to see it from the master output (not the slave output).
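Both possibilities come down to whether anything is accepting TCP connections at the address the slave was given. As a rough illustration only (this is not part of Modeller, and the helper name can_connect is made up for this sketch), the check below uses the standard socket module to show what "Connection refused" looks like when a listening socket has gone away. It is written for a current Python 3 interpreter rather than the Python 2.4 shown in the traceback. The demo uses a throwaway listener on the loopback interface as a stand-in for the master.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; not a Modeller function.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Stand-in for the master: listen on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    print("while the listener is open:", can_connect("127.0.0.1", port))

    # Closing the listener mimics a master that exited early.
    listener.close()
    print("after the listener is closed:", can_connect("127.0.0.1", port))
```

To try this against a real run, one would call can_connect from a slave node with the host and port printed on the "Slave startup" line of the *.slave file, while the master script is still running. A False result does not by itself distinguish a firewall from a dead master, so the master output still needs to be checked.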
Ben Webb, Modeller Caretaker