Linux SMP v6 compared to Windows SMP client

Moderators: Site Moderators, FAHC Science Team

uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: Linux SMP v6 compared to Windows SMP client

Post by uncle_fungus »

ICC vs GCC will make effectively no difference to the performance of Gromacs, since the innermost loops of the calculation are pure assembler (with Fortran loops as a fallback).

If you're not convinced, take a look at the comparative numbers in this mailing list entry: http://www.gromacs.org/pipermail/gmx-de ... 01473.html
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

The hand-coded inner loops in Gromacs make excellent use of the SSE(2) instruction sets. That is where a good compiler might have made some improvements, but not as effectively as hand-written assembly. That's quite different from what happens with SMP, which involves inter-CPU communication.

Any process consists of segments that can be parallelized and segments that must be serial.

Example: If there are 1000 atoms and you want to find the closest one to a selected atom, you can do 999 distance calculations in parallel and then you have to sort the results (mostly a serial process). If the distance calculations have to account for Einstein's curvature of space, then some distance calculations are going to take longer to compute than others, and you can't finish the sort until the last distance calculation is finished -- so 998 of the parallel processors have to wait while only #999 is busy, reducing utilization. Use all those distance values at Time=t1. Repeat at Time=t2 . . .

The outer loop runs over Time: one WU covers some T1 to T2, then a new WU is generated for T2 to T3, and so on, for Time = 0 up to a very large number.

(Of course Gromacs is doing a lot more than calculating distances and curvature of space isn't really an issue, nor is the explicit sort, but this is a fictitious example.)
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Fair enough. I generally don't resort to assembler if a compiler can do a reasonable job. GCC didn't, but ICC did, and if it was a choice between hand-optimizing assembler and using ICC, it was a no-brainer. For a start, it means I can still use the exact same code without ugly ifdefs on a different platform (e.g. PPC with XLC). But if innermost loops are already in good vectorized assembly, then I guess there's not much to be gained.

Thanks again for the explanation.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

shatteredsilicon wrote:if innermost loops are already in good vectorized assembly, then I guess there's not much to be gained.
That code is open-sourced at www.gromacs.org. Feel free to suggest improvements, but Eric is one of the best optimizing compilers that I know. 8-) ;)
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Great, thanks for pointing out the OSS code. I'll go have a dig through that. :)
Post Reply