I don't know whether this is the right board for this question, as I can't find another board about core development.
Following the discussion in this thread (viewtopic.php?f=38&t=31377), I'd like to know whether the next-generation CPU core will be more flexible in its SIMD support, for example a single binary covering all SIMD instruction sets (SSEx/AVXx). More importantly, will the new core support AVX2 and/or AVX-512?
Thanks.
avx2 and avx512 support in next core
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: avx2 and avx512 support in next core
[I am not an authority, this is just my silly, wild assed guess]
It is my belief that core_a7 does support AVX2, as well as SSE2. GROMACS 2018 does support the following new CPU enhancements:
Achieved speedup on Intel KNL processors of around 11% for PME spread/gather on typical simulation systems.
In the simple case of leap-frog without pressure coupling and with at most one temperature-coupling group, the update of velocities and coordinates is now implemented with SIMD intrinsics for improved simulation rate.
AMD Ryzen appears to always perform slightly better with OpenMP than MPI, up to using all 16 threads on the 8-core die.
While Ryzen supports 256-bit AVX2, the internal units are organized to execute either a single 256-bit instruction or two 128-bit SIMD instructions per cycle. Since most of our kernels are slightly less efficient for wider SIMD, this improves performance by roughly 10%.
On AMD Zen, tabulated Ewald kernels are always faster than analytical. And with AVX2_256 2xNN kernels are faster than 4xN. These faster choices are now made based on CpuInfo at run time.
The group-scheme kernels can use AVX instructions from either the AVX_128_FMA or the AVX_256 extensions. But hardware that supports the new AVX2_128 extensions also supports AVX_256, so we enable such support for the group-scheme kernels.
Recent Intel x86 hardware can have multiple AVX-512 FMA units, and the number of those units and the way their use interacts with the way the CPU chooses its clock speed mean that it can be advantageous to avoid using AVX-512 SIMD support in GROMACS if there is only one such unit. Because there is no way to query the hardware to count the number of such units, we run code at CMake and mdrun time to compare the performance from using such units, and recommend the version that is best. This may mean that building GROMACS on the front-end node of the cluster might not suit the compute nodes, even when they are all from the same generation of Intel’s hardware. - http://manual.gromacs.org/documentation ... mance.html
Notice that some of these improvements look at CpuInfo at run time and make good choices, which is easy for F@H to use. Others have to be fixed at compile time (CMake), so they can make bad choices if the binary is executed on any CPU other than the one it was compiled on. F@H will not easily take advantage of those optimizations.
I do not believe F@H changes the GROMACS version within a core, but I do think they always use the latest stable version for a new core. (I do not know whether any new CPU core is far enough along to have locked in which version of GROMACS to use.)
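To make the run-time half of that distinction concrete, here is a minimal sketch of choosing kernel variants from CPU information, following the Zen rules quoted above (tabulated Ewald, and 2xNN with AVX2_256). The `CpuInfo` class, its fields, the `choose_kernels` function and the default choices are all hypothetical names and assumptions for illustration; this is not GROMACS's actual code or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuInfo:
    """Hypothetical stand-in for the CPU details a program detects at run time."""
    vendor: str  # e.g. "AMD" or "Intel"
    family: str  # e.g. "Zen"
    simd: str    # SIMD level the binary was compiled for, e.g. "AVX2_256"


def choose_kernels(cpu: CpuInfo) -> dict:
    """Pick kernel variants at run time; the SIMD level itself stays fixed at compile time."""
    # Defaults are an assumption made for this sketch only.
    choice = {"ewald": "analytical", "layout": "4xN"}
    if cpu.vendor == "AMD" and cpu.family == "Zen":
        # Quoted rule: on AMD Zen, tabulated Ewald kernels are faster than analytical.
        choice["ewald"] = "tabulated"
        if cpu.simd == "AVX2_256":
            # Quoted rule: with AVX2_256, 2xNN kernels are faster than 4xN.
            choice["layout"] = "2xNN"
    return choice


if __name__ == "__main__":
    print(choose_kernels(CpuInfo("AMD", "Zen", "AVX2_256")))
    print(choose_kernels(CpuInfo("Intel", "Skylake", "AVX2_256")))
```

The point of the sketch is that this kind of choice works in a single distributed binary, while the `simd` field represents something decided when the binary was built.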
Last edited by JimboPalmer on Wed Feb 06, 2019 10:51 pm, edited 3 times in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: avx2 and avx512 support in next core
The FAHCore_a7 that I have uses GROMACS version 5.0.4, which was the stable version at the time A7 went through development. At that time, SSE2 and AVX could not be supported in the same build, so there are two versions of FAHCore_a7, and the FAHClient selects between them. From what I read on gromacs.org, the various AVX versions would yield very similar performance levels, so a new FAHCore_a* incorporating a later stable version of GROMACS would be a new development effort with only minor improvements.
At some time in the future, a later version of GROMACS may be incorporated into a new FAHCore, but only when it's worth the extra developmental effort -- as opposed to spending that same developmental effort on some other aspect of FAH.
For additional information, consult http://gromacs.org
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Site Moderator
- Posts: 6359
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: avx2 and avx512 support in next core
As I remember the technical explanations, the issue is that GROMACS did not handle dynamic selection of optimized code well (at least when Core A7 was built). So the choice was made to hardcode the instruction set when the core is compiled. Of course, an SSE2 version has been built for older hardware, because the newer SSEx versions don't help in FAH and are not supported on all CPUs.
Also, an AVX core was chosen because, when the decision was made, AVX was supported on all CPUs, but newer iterations were not (or, for AVX2, only on a few CPUs).
If a GROMACS version that is able to dynamically select optimized code without issue is released, I'm pretty sure that a future version of FAHCore will use it.
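The two-build arrangement described above (an AVX build plus an SSE2 build for older hardware) can be sketched roughly as follows. This is only an illustration under stated assumptions: `read_cpu_flags` and `pick_core_build` are hypothetical helper names, the flag source is Linux's `/proc/cpuinfo`, and none of this is the real FAHClient logic.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by Linux, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                # On x86 Linux the feature list is on a line starting with "flags".
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def pick_core_build(flags):
    """Choose between two precompiled builds (labels here are hypothetical)."""
    if "avx" in flags:
        return "avx"
    if "sse2" in flags:
        return "sse2"
    return None  # neither instruction set is available


if __name__ == "__main__":
    print(pick_core_build(read_cpu_flags()))
```

Because each build has its instruction set fixed at compile time, supporting AVX2 or AVX-512 in this scheme would mean shipping and testing one more build per instruction set, rather than adding a branch inside one binary.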
Re: avx2 and avx512 support in next core
Right. GROMACS version 5.0.4 (single precision) could be compiled for SSE2 or for AVX, but the AVX version could not down-select to SSE, so two versions were compiled and built into two versions of FAHCore_A7. As I remember, adding AVX made a significant difference, but support for more than one AVX version made little difference (as I said above).
The original SSE code contained ALC code which would pack two SP data words into a DP register so that the SSE instructions that operated on both halves of the registers could be utilized and then unpacked the results. My guess is that the same logic would be used to use AVX512, whether the packing/unpacking was done by GROMACS code or by AVX* firmware.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: avx2 and avx512 support in next core
I hope AVX2 and AVX-512 stay disabled unless they bring a big performance boost, as CPUs get very hot when using these units and throttle their clock speed down.