Linux SMP v6 compared to Windows SMP client

Moderators: Site Moderators, FAHC Science Team

uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: Linux SMP v6 compared to Windows SMP client

Post by uncle_fungus »

ICC vs GCC will make effectively no difference to the performance of Gromacs, since the innermost loops of the calculation are pure assembler (with Fortran loops as a fallback).

If you're not convinced, take a look at the comparative numbers in this mailing list entry: http://www.gromacs.org/pipermail/gmx-de ... 01473.html
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

The hand-coded inner loops in Gromacs make excellent use of the SSE(2) instruction sets. That is where a good compiler might have made some improvements, but not as effectively as hand-written assembly. That's quite different from what happens with SMP, which involves inter-CPU communication.

Any process consists of segments that can be parallelized and segments that must be serial.

Example: If there are 1000 atoms and you want to find the closest one to a selected atom, you can do 999 distance calculations in parallel and then you have to sort the results (mostly a serial process). If the distance calculations have to account for Einstein's curvature of space, then some distance calculations are going to take longer to compute than others, and you can't finish the sort until the last distance calculation is finished -- so 998 of the parallel processors have to wait while only #999 is busy, reducing utilization. Use all those distance values at Time=t1. Repeat at Time=t2 . . .

The outer loop runs over Time: one WU covers some T1 to T2, then a new WU is generated for T2 to T3, and so on, for Time = 0 up to a very large number.

(Of course Gromacs is doing a lot more than calculating distances and curvature of space isn't really an issue, nor is the explicit sort, but this is a fictitious example.)
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Fair enough. I generally don't resort to assembler if a compiler can do a reasonable job. GCC didn't, but ICC did, and if it was a choice between hand-optimizing assembler and using ICC, it was a no-brainer. For a start, it means I can still use the exact same code without ugly ifdefs on a different platform (e.g. PPC with XLC). But if innermost loops are already in good vectorized assembly, then I guess there's not much to be gained.

Thanks again for the explanation.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

shatteredsilicon wrote:if innermost loops are already in good vectorized assembly, then I guess there's not much to be gained.
That code is open-sourced at www.gromacs.org. Feel free to suggest improvements, but Eric is one of the best optimizing compilers that I know. 8-) ;)
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Great, thanks for pointing out the OSS code. I'll go have a dig through that. :)
Post Reply