Differences using Xeon vs Opteron systems
Hello.
I'm the systems administrator for a research lab using Modeller 9v3. I've installed the x86_64 RPM file on a pair of Xeon X5355-based systems and some Opteron 6136-based systems. The researchers are reporting radically different results between these two platforms. I've done some testing: Opteron 252-based systems give the same results as the Opteron 6136 systems, and Xeon X5570 systems give the same results as the Xeon X5355.
I'm working on getting a set of results I can post, but has anyone seen anything like this before? I was shown summaries where the final results differed in value by 1.5.
Thank you.
On 01/27/2011 12:15 PM, Robert Healey wrote:
> I'm the systems administrator for a research lab using Modeller 9v3.
While not the cause of the behavior you're seeing, 9v3 is really old. Several bugs which in some cases affect the quality of output models have been fixed over the past 3 years, and the latest version, 9v8, should be entirely compatible with scripts written for 9v3.
> I've installed the x86_64 RPM file on a pair of Xeon X5355 based systems
> and some Opteron 6136 based systems. The researchers are reporting
> getting radically different results between these two systems.
This is completely normal and expected. Due to differences from machine to machine, floating point results will differ. While these differences are very very small (say 10^-8), during an optimization of a rugged energy surface they can end up giving very different structures. (Imagine the system is at a local maximum on the energy surface, like a ball at the very top of a hill. The tiniest push will send it rolling down the hill to a local minimum. The "push" might be +10^-8 on one machine and -10^-8 on another, but the local minima could be angstroms apart.) These differences could occur because different processors order floating point instructions differently (e.g. a*b*c could be evaluated as (a*b)*c or a*(b*c)) or move data from memory to processor registers (which often have a different precision) at different times.
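Both effects mentioned above (evaluation order and register precision) are easy to reproduce on a single machine. This is a minimal illustration in plain Python, not Modeller code. The first part regroups the same three numbers; addition is used because it gives the most familiar example, but multiplication is not associative either. The second part mimics a wide register by keeping a running sum in a 64-bit double and rounding to 32 bits either after every step or only once at the end. The 32-bit/64-bit pairing and the helper names (`to_f32`, `sum_spilling_each_step`, `sum_in_wide_register`) are illustrative assumptions; real x87 hardware uses 80-bit registers with 64-bit doubles in memory, which Python cannot express directly.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 1) Evaluation order: the same three numbers, grouped two ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # the groupings differ by one unit in the last place


# 2) Intermediate precision: the narrow type plays "memory", the wide type a "register".
def sum_spilling_each_step(values):
    total = 0.0
    for v in values:
        total = to_f32(total + v)  # store to 32-bit "memory" after every add
    return total


def sum_in_wide_register(values):
    total = 0.0
    for v in values:
        total += v  # keep the running sum at 64-bit precision
    return to_f32(total)  # round to 32 bits only once, at the end


values = [1.0, 2.0**-24, 2.0**-24]
print(sum_spilling_each_step(values))  # each tiny term is rounded away
print(sum_in_wide_register(values))    # the tiny terms survive: 1 + 2**-23
```

Neither answer is "wrong"; they are two legitimate roundings of the same arithmetic, which is exactly the kind of seed difference that an optimizer can then amplify.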
Generally speaking, many models should be built for any modeling problem, and something like the average of the best-scoring cluster returned. Optimizations with the sorts of rugged energy surfaces common in molecular modeling are very unlikely to find the global minimum if only a single model is built. (Multiple models will also negate the effects of differences between processors.)
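A toy one-dimensional sketch of the hill analogy and of why building many models helps, assuming plain fixed-step gradient descent as a stand-in for Modeller's optimizer and two made-up double-well "energy" functions. None of the names here (`descend`, `best_of_ensemble`, and so on) are part of Modeller's API. A push of +1e-8 or -1e-8 away from the hilltop ends in minima 2 units apart, while an ensemble of 50 randomly started runs finds the same lowest-energy result either way. Keeping only the single lowest-energy result is a simplification of taking the average of the best-scoring cluster.

```python
import random


def descend(grad, x, step=0.01, iters=5000):
    """Fixed-step gradient descent; a toy stand-in for a real structure optimizer."""
    for _ in range(iters):
        x -= step * grad(x)
    return x


# Symmetric double well E(x) = (x^2 - 1)^2: hilltop at x = 0, minima at x = -1 and x = +1.
def grad_symmetric(x):
    return 4.0 * x * (x * x - 1.0)


# A "push" of +1e-8 vs -1e-8 away from the hilltop ends in minima 2 units apart.
x_plus = descend(grad_symmetric, +1e-8)
x_minus = descend(grad_symmetric, -1e-8)
print(round(x_plus, 6), round(x_minus, 6))


# Tilted double well E(x) = (x^2 - 1)^2 + 0.3x: the left minimum (near x = -1.04) is global.
def energy_tilted(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_tilted(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def best_of_ensemble(push, n_models=50, seed=2011):
    """Optimize n_models random starts (each shifted by `push`); keep the lowest-energy result."""
    rng = random.Random(seed)
    finals = [descend(grad_tilted, rng.uniform(-2.0, 2.0) + push) for _ in range(n_models)]
    return min(finals, key=energy_tilted)


# The "+1e-8 machine" and the "-1e-8 machine" agree on the best model.
best_plus = best_of_ensemble(+1e-8)
best_minus = best_of_ensemble(-1e-8)
print(round(best_plus, 4), round(best_minus, 4))
```

In an actual Modeller script, the number of models built is set through the `starting_model` and `ending_model` attributes of an `automodel` object; the ensemble above only illustrates the principle.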
Ben Webb, Modeller Caretaker
On 1/27/2011 4:38 PM, Modeller Caretaker wrote:
Having the researchers run some more comparisons, I've found that the modeller9v3-absoft build gives consistent results across both CPU platforms, unlike the default x86_64 RPM build, whose results differed dramatically on the AMD systems.
They also tried the x86_64 RPM of 9v8, and the results matched the 9v3 x86_64 RPM results on both platforms. We're switching to -absoft on all systems now for consistency, but it would be nice if that were available as a 64-bit build too.
Bob Healey Systems Administrator Biocomputation and Bioinformatics Constellation and Molecularium healer@rpi.edu (518) 276-4407
On 2/14/11 5:20 PM, Robert Healey wrote:
> Having the researchers run some more comparisons, I've found that the
> modeller9v3-absoft gives consistent results across both CPU platforms
Yes, because it does not use any of the SSE functionality or any other optimization that differs between platforms. We provide that binary for systems that don't support SSE, such as FreeBSD's Linux layer.
As I said, it really makes no sense to rely on exact results, since the accuracy of the force field is far less than machine precision. The absoft binary is also noticeably slower.
> They also tried the x86_64 RPM of 9v8 and the results matched the 9v3
> x86_64 RPM results on both platforms. We're switching to -absoft on all
> systems now for consistency, but it would be nice if that was available
> as a 64 bit build also.
You are of course welcome to do this if you want, but it is not a supported use case. And we certainly have no plans to build additional 64-bit binaries, where lack of SSE functionality is not an issue (since it is built into the x86_64 instruction set).
Ben Webb, Modeller Caretaker