Power consumption increases nonlinearly with CPU frequency: it rises at a rate roughly proportional to frequency cubed or more, because higher clocks also need higher voltage. To minimize the energy used per CPU instruction, it's in principle always better to increase the number of CPU cores and decrease their frequency to stay within the package's power limits. In other words, one core at 4 GHz and two cores at 2 GHz will ideally perform the same number of computations in the same period of time, but the former will use significantly more power to do so.
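As a rough illustration of the cubic claim: dynamic CPU power is commonly modelled as P ∝ C·V²·f, and if voltage has to rise roughly in step with frequency, that becomes P ∝ f³. The sketch below applies that toy model to the 4 GHz vs 2 × 2 GHz example. The function names and the unit-free scaling are assumptions for illustration only, and static/leakage power is ignored.

```python
def relative_power(cores: int, freq_ghz: float, exponent: float = 3.0) -> float:
    """Relative package power under the toy model P = cores * f**exponent."""
    return cores * freq_ghz ** exponent


def relative_throughput(cores: int, freq_ghz: float) -> float:
    """Idealised throughput: perfect scaling, work proportional to cores * f."""
    return cores * freq_ghz


# The two configurations from the post (hypothetical, perfectly parallel workload).
one_fast = (1, 4.0)   # one core at 4 GHz
two_slow = (2, 2.0)   # two cores at 2 GHz

same_work = relative_throughput(*one_fast) == relative_throughput(*two_slow)
power_ratio = relative_power(*one_fast) / relative_power(*two_slow)

print("same throughput:", same_work)
print("power ratio (one fast core / two slow cores):", power_ratio)
```

Under these assumptions both setups do the same work, but the single fast core draws four times the power. A higher exponent only widens the gap.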
But as we all know, GROMACS does not scale infinitely with the number of cores, as load balancing and reconciling forces between slices add more and more overhead. Also, more cores means smaller per-core caches and more cache misses. I don't know how relevant a hot cache is to FAH.
What is the sweet spot? Assuming no budgetary concerns, what is the approximate ideal number of cores for a HEDT/server processor if folding is the only activity it will be doing? Is there any data on this? I'm not sure how much I should trust lar.systems, and the GROMACS user forum is not particularly representative of the kind of simulations that FAH performs.
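One way to reason about the sweet spot without real data is a toy model: give every core an equal share of a fixed power budget (with power per core ∝ f³, as above) and let parallel overhead follow Amdahl's law. This is a sketch under those assumptions, not a model of how GROMACS actually scales. The `parallel_fraction` values are hypothetical and would have to be measured per project.

```python
def throughput(cores: int, parallel_fraction: float, power_budget: float = 1.0) -> float:
    """Relative throughput of `cores` cores sharing a fixed power budget."""
    # budget = cores * f**3  =>  each core runs at f = (budget / cores) ** (1/3)
    freq = (power_budget / cores) ** (1.0 / 3.0)
    # Amdahl's law: speedup over one core at the same frequency.
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return freq * speedup


def best_core_count(parallel_fraction: float, max_cores: int = 256) -> int:
    """Core count that maximises throughput in this toy model."""
    return max(range(1, max_cores + 1),
               key=lambda n: throughput(n, parallel_fraction))


for p in (0.90, 0.95):
    print(f"parallel fraction {p:.2f}: best core count = {best_core_count(p)}")
```

In this model the optimum works out to n = 2p / (1 - p), so it is very sensitive to the parallel fraction: 18 cores at p = 0.90 but 38 at p = 0.95. Cache effects and thread-count failures are not captured at all.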
Power optimization and number of cores
- Posts: 1722
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D, 7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Power optimization and number of cores
No data, we just fold.
16 threads/8 cores is a sweet spot. Past that, the performance increase is minimal, yet with the bonus point formula it still shows a point increase.
GROMACS craps out on random projects when running on 64+ threads.
It almost always craps out completely with anything above 96 threads.
CPUs scale better with new generations of products, which have improved FP pipelines. That stuff scales infinitely. For example, Zen 5 has an FPU that needs compiler tweaks to be utilised properly on desktops, yet FAH picked it up and said: No worries there, we will utilise this to the end of time.
GROMACS loves AMD 3D cache. However, the AMD 9000 series came with tweaked FPU and cache subsystems, and that thing blows 3D cache out of the water (or should I say matches and surpasses older 3D cache products).
Still, if there is a chance to buy a 3D cache product vs a normal one for FAH, go with 3D cache.
And Linux as the OS, since the Windows scheduler is pants for FAH.
My 9950x3d on Windows is equal in output to my 9950x on Linux. This shouldn't be the case.
FAH, and I suppose GROMACS, hate Intel's big.LITTLE architecture. You basically must fold only on P-cores, otherwise FAH will pick up the slowest cores and drop P-core performance to match them.
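For the "fold only on P-cores" advice, one generic way to do this on Linux is to restrict the folding process to a set of logical CPUs with the standard `os.sched_setaffinity` call. This is only a sketch: the `P_CORE_CPUS` numbering is a hypothetical example (check your own topology, for instance with `lscpu`), the helper name is made up, and nothing here is a documented FAH client feature.

```python
import os

# Hypothetical: logical CPUs 0-11 are the P-core threads on this machine.
P_CORE_CPUS = set(range(0, 12))


def pin_to_cores(pid: int, wanted) -> set:
    """Restrict `pid` (0 = the calling process) to the CPUs in `wanted`.

    Only CPUs the process is already allowed to use are kept.
    Linux-only: os.sched_getaffinity / os.sched_setaffinity.
    """
    allowed = os.sched_getaffinity(pid)
    target = set(wanted) & allowed
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(pid, target)
    return target


if __name__ == "__main__":
    # Pin the current process as a demonstration; pass a real PID to pin another one.
    print("now restricted to CPUs:", sorted(pin_to_cores(0, P_CORE_CPUS)))
```

Threads and child processes started afterwards inherit the mask, so launching the folding process from a pinned parent has the same effect.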
Re: Power optimization and number of cores
It looks like the 9000 series is Ryzen. I'm looking at Threadripper or EPYC for the PCIe lanes. But I suppose any Zen5 has the architecture changes you speak of.
Then I suppose I was right in getting a Threadripper 7960X with 24c/48t. GROMACS won't shit itself and it should still have superior energy efficiency compared to a system with the same TDP but only 16 threads running at a higher frequency. If 48t is too much, I could always disable SMT.
The Intel big.LITTLE really is horrible, although GROMACS could just detect P-cores and only fold on them if it was programmed to. The GROMACS load balancer is not equipped to handle such drastic differences in performance, so the E-cores all run at top speed while the P-cores get only brief bursts of use and otherwise sit idle waiting on the E-cores.
I've found that the best way to make use of that architecture is to run one WU on the P-cores and one on the E-cores, assuming the system has sufficient cooling capacity to maintain boost rates on all cores. Otherwise the heat from the E-cores causes the P-cores to throttle.
But I don't like Intel anyway. I prefer AMD.
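The P-core/E-core behaviour described above can be illustrated with a toy model in which a WU is split evenly across its threads, so every step takes as long as the slowest core needs. The core speeds below are hypothetical relative numbers, and the model ignores dynamic load balancing, SMT and thermal throttling.

```python
def single_wu_throughput(core_speeds):
    """Even split: each step ends when the slowest core finishes its share."""
    return len(core_speeds) * min(core_speeds)


def split_wu_throughput(*core_groups):
    """One independent WU per group of cores; the throughputs add up."""
    return sum(single_wu_throughput(group) for group in core_groups)


# Hypothetical 6P+8E chip: E-cores assumed to be half as fast as P-cores.
p_cores = [1.0] * 6
e_cores = [0.5] * 8

mixed = single_wu_throughput(p_cores + e_cores)   # one WU on all 14 cores
p_only = single_wu_throughput(p_cores)            # one WU on P-cores only
split = split_wu_throughput(p_cores, e_cores)     # one WU per core type

print("one WU on all cores:", mixed)
print("one WU on P-cores only:", p_only)
print("separate WUs on P and E:", split)
```

With these made-up speeds the mixed WU is held to E-core pace on every thread, while separate WUs let each core type run at its own speed.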
Re: Power optimization and number of cores
arisu wrote: ↑Mon May 05, 2025 9:58 am
I've found that the best way to make use of that architecture is to run one WU on the P-cores and one on the E-cores, assuming the system has sufficient cooling capacity to maintain boost rates on all cores, otherwise the heat from the E-cores causes the P-cores to throttle. But if cooling is sufficient, the E-cores can be used as well on a separate WU.

In my experience, running a second WU on the E-cores of a 6P+8E configuration limited to 80W reduces P-core frequency by only 300-400 MHz. It is indeed more efficient to utilize more processors, just as you said :^)
- Posts: 557
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4, Ryzen 2400G APU, 16 GB DDR4-3200, MSI GTX 1660 Super Gaming X
Re: Power optimization and number of cores
arisu wrote: ↑Mon May 05, 2025 9:58 am
It looks like the 9000 series is Ryzen. I'm looking at Threadripper or EPYC for the PCIe lanes. But I suppose any Zen5 has the architecture changes you speak of.
Then I suppose I was right in getting a Threadripper 7960X with 24c/48t. GROMACS won't shit itself and it should still have superior energy efficiency compared to a system with the same TDP but only 16 threads running at a higher frequency. If 48t is too much, I could always disable SMT.
The Intel big.LITTLE really is horrible, although GROMACS could just detect P-cores and only fold on them if it was programmed to. The GROMACS load balancer is not equipped to handle such drastic differences in performance, so it results in the E-cores all running at top speed while P-cores get only brief bursts of use, spending their time idle waiting on the E-cores otherwise.
I've found that the best way to make use of that architecture is to run one WU on the P-cores and one on the E-cores, assuming the system has sufficient cooling capacity to maintain boost rates on all cores, otherwise the heat from the E-cores causes the P-cores to throttle. But if cooling is sufficient, the E-cores can be used as well on a separate WU.
But I don't like Intel anyway. I prefer AMD.

You might find the link below interesting. The blog is run by the folder Paragon.
https://greenfoldingathome.com/2 ... ation/
Fold them if you get them!
Re: Power optimization and number of cores
That's a great blog! I've read his post about the 3090 but I forgot about his site. Thank you!