I was a bit too brief with my explanation. It's still fundamentally correct, but let me clarify what I really meant.
TheGlassman wrote: GROMACS uses 99% floating point operations, which are identical in 32-bit code and 64-bit code.
Really?!?! I thought all this time (4 years?) Gromacs was SSE. I do know it slows to a crawl if SSE is disabled, intentionally or unintentionally. Of course SMP always uses SSE even if the client reports it as disabled (no slowdown).
When I said floating point, I was including SSE and/or SSE2, although technically they're different. SSE and SSE2 instructions are identical in 32-bit and 64-bit code, too. SSE accelerates single precision floating point and SSE2 accelerates double precision floating point by performing the same operation on multiple values in parallel, but the data is still stored as floating point data, not as 32-bit or 64-bit integers, and the actual arithmetic results end up being the same. (I apologize for my sloppy language.)
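To make that concrete, here is a minimal Python sketch (my own illustration, not anything taken from GROMACS or the FAH cores). It assumes only the standard IEEE 754 sizes and the 128-bit width of an SSE register; the helper names `lanes` and `single_precision_bits` are made up for this post. The pointer size is the part that changes between a 32-bit and a 64-bit build; the floating point sizes and bit patterns are not.

```python
import struct

# Size of a C pointer in this interpreter: 4 on a 32-bit build, 8 on a 64-bit build.
POINTER_BYTES = struct.calcsize("P")

# Standard IEEE 754 sizes ("<" selects the standard, platform-independent sizes).
FLOAT_BYTES = struct.calcsize("<f")    # single precision
DOUBLE_BYTES = struct.calcsize("<d")   # double precision

# An SSE register is 128 bits (16 bytes) wide.
SSE_REGISTER_BYTES = 16


def lanes(element_bytes):
    """How many elements of the given size fit in one 128-bit register."""
    return SSE_REGISTER_BYTES // element_bytes


def single_precision_bits(x):
    """Return the 32-bit IEEE 754 pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


print("pointer bytes:", POINTER_BYTES)                   # depends on the build
print("floats per register:", lanes(FLOAT_BYTES))        # 4
print("doubles per register:", lanes(DOUBLE_BYTES))      # 2
print("bits of 1.5 as a float:", hex(single_precision_bits(1.5)))  # 0x3fc00000
```

Only the first printed line should differ between a 32-bit and a 64-bit interpreter.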
Updated release of StressCPU - we had a new cluster to burn-in!
Version 2.0 now supports both ia32 (32bit) as well as x86-64/em64t (64bit) platforms. It is multithreaded (both pthreads and win32 threads) by default and will automatically sense the number of CPUs on Linux, Mac OS X, and Windows. It runs slightly hotter, in particular for x86-64 systems, the checks are better, and you can now set it for a fixed execution time, e.g. 12 hours. The package includes pre-compiled binaries for Windows, 32 and 64 bit Linux, and 32 as well as 64 bit OS X.
Don't know if this is the same code as being used in SMP by F@H, but Gromacs is working in that direction.
Good point. I don't know what StressCPU is doing or why.
A change from 32-bit integers to 64-bit integers would not make any measurable difference but would require the support of twice as many versions of FAH's software.
One actually, and the current Linux64 SMP would be replaced. Have you looked at the download page lately? If the code isn't ready you are of course correct.
Also a good point.
Since the Linux/MacOS SMP client must be 64-bit, only one version is required, and it might as well be 64-bit. I'm not sure what sort of incompatibilities that might bring to the servers. What are the implications of a trajectory that contains one Gen run on 32-bit Windows, the next Gen on 64-bit Linux, and the next Gen on something else? Would the WUs need to be different? Would there need to be a step that converts the length of the integers going in and going out? It's worth asking the Pande Group these questions.
I suspect they are already running two 32-bit integers at once through the SSE units. I find it hard to believe that the wider and extra 64-bit registers wouldn't help quite a bit in speed, and the Pande Group has no problem supporting a new core if they think there is a benefit.
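For what it's worth, the usual way to sidestep an integer-length conversion step is to write integers to disk in a fixed width regardless of the platform's native size. I have no idea how the actual WU files are laid out, so the following is only a hypothetical Python sketch; `encode_step_count` and `decode_step_count` are invented names, not anything from FAH.

```python
import struct

# Native C "long" size: platform dependent (e.g. 4 on 32-bit builds and on
# 64-bit Windows, 8 on 64-bit Linux).
NATIVE_LONG_BYTES = struct.calcsize("l")

# Standard-size 64-bit integer: always 8 bytes, whatever the platform.
FIXED_BYTES = struct.calcsize("<q")


def encode_step_count(steps):
    """Pack an integer as exactly 8 little-endian bytes on every platform."""
    return struct.pack("<q", steps)


def decode_step_count(payload):
    """Inverse of encode_step_count; reads the same 8-byte layout."""
    return struct.unpack("<q", payload)[0]


payload = encode_step_count(250000)
print(NATIVE_LONG_BYTES, FIXED_BYTES, len(payload), decode_step_count(payload))
```

If the files already use a fixed layout like this, a Gen written on a 32-bit client would read back unchanged on a 64-bit one; if they use native sizes, a conversion step would be needed. That is exactly the question for the Pande Group.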
Technically, the parallel integer operations are processed with Multimedia instructions (aka MMX), which all modern processors have, but that's not the point.
Let's assume that my guess of 99% floating point is correct (I really do not know). If we have one integer operation followed by 99 FP operations or 25 SSE operations or 50 SSE2 operations, which is then followed by the next integer operation, there is absolutely nothing to be gained by delaying one integer operation so it can be done simultaneously with the next one. The slowest of ALUs can finish the first operation while the floating point hardware is busy with the other 99/25/50 operations. As 7im has reported, there is a small gain, so the real code isn't as bad as my example, but I think you get the point.
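The numbers in that example can be checked with a small Python sketch. It is a toy model under loud assumptions: every operation costs one time unit, the integer unit and the floating point unit overlap perfectly, and the function names `packed_ops` and `block_time` are made up for this post. Real hardware is messier, which is presumably where the small measured gain comes from.

```python
import math


def packed_ops(scalar_ops, lanes):
    """Number of packed instructions needed to cover scalar_ops operations
    when each instruction handles `lanes` values at once."""
    return math.ceil(scalar_ops / lanes)


def block_time(int_cost, fp_ops):
    """Toy model: the integer work runs alongside the FP work, so the block
    takes as long as the slower of the two (one time unit per FP op)."""
    return max(int_cost, fp_ops)


sse_ops = packed_ops(99, 4)    # 4 single precision values per SSE instruction
sse2_ops = packed_ops(99, 2)   # 2 double precision values per SSE2 instruction
print(sse_ops, sse2_ops)       # 25 50

# Halving the cost of the single integer operation changes nothing here,
# because the FP work dominates in every case.
for fp_ops in (99, sse2_ops, sse_ops):
    print(fp_ops, block_time(1.0, fp_ops), block_time(0.5, fp_ops))
```

In this model the integer operation would have to cost more than the whole FP stretch before speeding it up made any difference.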
Anyway, Bruce, thanks for your time. It has been enjoyable and informative as always.
NP.