problem running jobs on the cluster
hi,
when running jobs on our shared cluster I am getting this error:
"ImportError: libhdf5.so.6: cannot open shared object file: No such file or directory"
things run fine on baton2; the problem only occurs on the nodes of the cluster. any idea how to resolve this?
thanks, keren.
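As a quick, minimal diagnostic (not from the original exchange, and nothing IMP-specific), you can ask the dynamic linker on a given node whether it can locate the library at all:

    # Run this on a node where the ImportError appears. If CDLL() raises
    # OSError, the runtime linker on that machine cannot find libhdf5.so.6
    # (e.g. it is not installed there and not on LD_LIBRARY_PATH).
    import ctypes

    try:
        ctypes.CDLL("libhdf5.so.6")
        print("libhdf5.so.6 was found by the dynamic linker")
    except OSError as err:
        print("libhdf5.so.6 not found: %s" % err)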
I don't think hdf5 (or at least an appropriate version of it) is installed on the cluster as a whole. I suggest building against the version in ~drussel/production/hdf5 (you can copy that to your home dir if you want).
thanks.
so how do you think others are running things on the cluster these days? :)
is everyone using your production version, or have people installed things locally?
also (you can answer offline if that's more appropriate): how does one link your hdf5 version with IMP? is there a flag in the config file or something like that?
On Sep 21, 2011, at 6:18 AM, Keren Lasker wrote:
> thanks.
> so how do you think others are running things on the cluster these days? :)
> is everyone using your production version, or have people installed things locally?

I know of various combinations: some people use mine, some use other local installs, Charles was going to ask Josh to update some build machines, and some don't use a shared setup at all. I like my production setup since you know that no other non-standard libraries are required. And I intend to keep things stable (most of the libraries there have version numbers in their paths).
> also (you can answer offline if that's more appropriate): how does one link your hdf5 version with IMP? is there a flag in the config file or something like that?

As with all dependencies, just make sure the includepath, libpath, and perhaps ldlibpath include the library.
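A minimal sketch of what that could look like in the config file, assuming the cluster-side path to Daniel's production hdf5 (the same location Dave uses below) and using only the option names mentioned in this thread:

    # Sketch only: point the build at Daniel's production hdf5 on the cluster.
    # The option names (includepath, libpath, ldlibpath) are the ones named
    # above; check your config file for the exact set it supports.
    includepath="/netapp/sali/drussel/production/hdf5/include"
    libpath="/netapp/sali/drussel/production/hdf5/lib"
    # If the .so still cannot be found at runtime, also add the lib directory
    # to ldlibpath (or to LD_LIBRARY_PATH in your job environment).
    ldlibpath="/netapp/sali/drussel/production/hdf5/lib"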
I was able to successfully build IMP on the cluster in fast mode. Right now, I believe IMP is built daily in release mode on /diva1 for people to use, but not in fast mode. Until we have something automated for that, I think people are doing it on their own.
The only thing that gave me trouble was the hdf5 stuff, which I corrected by adding Daniel's hdf5 into my config file:
libpath="/netapp/sali/drussel/production/hdf5/lib" cxxflags="-m64 -mmmx -msse -msse2 -isystem /netapp/sali/drussel/production/hdf5/include"
(I'm actually not sure if I needed both those lines or just one).
I asked Josh about updating hdf5 to version 1.8 on the cluster, and he said that there will soon be a major OS change and he'll do it after that. For now, hdf5 has dependencies which would probably need to be updated before hdf5 itself, so unless it's a priority he'll wait until after the OS change and do it all at once.
Maybe this is obvious to people, but since I had to ask around, here is what I did to build IMP-fast; maybe it will be useful for others:
1. Check out IMP on the head node (the svn repository is accessible from there).
2. Update the config file to include Daniel's hdf5 settings (he also has links to bullet and CGAL if you need those).
3. Log in to an interactive node and build there, to avoid abusing the head node. Since the disks are the same on the interactive node and the head node, you can build on one and have the result available on the other (see the quick check sketched below).
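As a small sanity check before starting a long build (a sketch only; the hdf5 path is the one from this thread), you can confirm that the shared production directories used in the config file are actually visible from the node you are building on:

    # Sketch: run on whichever node you build on.
    import os

    for path in ("/netapp/sali/drussel/production/hdf5/include",
                 "/netapp/sali/drussel/production/hdf5/lib"):
        print("%s %s" % (path, "OK" if os.path.isdir(path) else "MISSING"))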
Thanks to Dr. Rogers for helping me out with this.
db
It might make sense to set up a ccache for the cluster, like we have for the lab. The old netapp seems like a good place. That way only the first person has to wait around while a new checkout is being built. And we can provide some standard config.py files to make everything just work. Make sense?
sounds very very good :) and very very helpful for many people in the lab! :)
On 9/21/11 3:41 AM, Keren Lasker wrote:
> when running jobs on our shared cluster I am getting this error:
> "ImportError: libhdf5.so.6: cannot open shared object file: No such file or directory"
> things run fine on baton2; the problem only occurs on the nodes of the cluster.
Simple fix: copy libhdf5.so.6 from /usr/lib64/ on baton2 to the same directory containing your IMP .so files. Same goes for any other .so that IMP complains about on non-interactive nodes but not on interactive nodes (or baton2).
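A minimal sketch of that copy, assuming (hypothetically) that your IMP .so files live under ~/imp/build/lib; run it on baton2, where the source library exists, and since the build directory sits on shared storage the copy will also be visible from the cluster nodes:

    # Sketch only: run on baton2 (which has /usr/lib64/libhdf5.so.6).
    # ~/imp/build/lib is a placeholder; use whatever directory actually
    # contains your IMP .so files.
    import os
    import shutil

    imp_lib_dir = os.path.expanduser("~/imp/build/lib")
    shutil.copy("/usr/lib64/libhdf5.so.6", imp_lib_dir)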
Since Josh maintains the cluster for all of QB3, it is doubtful that all of the runtime libraries on his nodes will always (or indeed, ever) match our requirements. Those on baton2 will, however, since we maintain it just for us.
Ben
participants (4)
- Ben Webb
- Daniel Russel
- Dave Barkan
- Keren Lasker