Hi everyone,
I’ve been a long-time folder on my main gaming rig, but I recently decided to shift the heavy lifting to a more permanent setup. I managed to get my hands on some enterprise surplus—a ProLiant DL360 equipped with a 14-core 2GHz Xeon. It’s a different experience compared to the high-clocked consumer chips I'm used to, especially when it comes to the sheer volume of threads available.
I’ve been digging through the forum archives, and one specific point I saw mentioned in a previous technical thread was the "Prime Number" issue—where certain core counts can cause the Gromacs cores to struggle with domain decomposition. With 14 physical cores (or 28 threads with Hyper-Threading), I’m trying to figure out the best way to map this out in the client to avoid any performance penalties.
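To make sure I understand the constraint, here's a toy sketch I put together (my own illustration, not the actual GROMACS algorithm, which also juggles PME ranks and cell-size limits): the ranks have to factor into a 3D grid, and counts whose only factors are 1 and a large prime leave almost no grid shapes to choose from.

```python
# Toy illustration (mine, not GROMACS's real logic): count the ways n ranks
# can be arranged into an x*y*z domain grid. Thread counts whose only
# factors are 1 and a large prime leave almost no grid shapes to choose from.

def grid_decompositions(n):
    """Return unordered triples (x, y, z), x <= y <= z, with x*y*z == n."""
    grids = []
    for x in range(1, n + 1):
        if n % x:
            continue
        for y in range(x, n // x + 1):
            if (n // x) % y:
                continue
            z = n // (x * y)
            if z >= y:
                grids.append((x, y, z))
    return grids

print(grid_decompositions(13))  # [(1, 1, 13)]  - prime: one shape only
print(grid_decompositions(12))  # four shapes: (1,1,12) (1,2,6) (1,3,4) (2,2,3)
print(grid_decompositions(14))  # [(1, 1, 14), (1, 2, 7)]
```

If that intuition is right, 14 at least factors as 2x7, but I don't know how much that matters on current cores.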
From my short time testing this server, I’ve noticed that while it's incredibly stable, the lower 2.0GHz clock speed means the Time Per Frame (TPF) can be quite high on larger work units. I’m torn between running one massive SMP slot to utilize the whole CPU or splitting the workload into two smaller slots to ensure that a single "stuck" work unit doesn't stall the entire machine's output. It’s a bit of a learning curve moving from a 5GHz i9 to a server that relies purely on parallelization rather than raw speed.
I’m really focused on contributing as much as possible to the research, but I want to make sure I’m not just spinning the fans and wasting power if the thread count is poorly optimized.
In your experience with these mid-range Xeon builds, do you find it more efficient to let the client handle all cores in one go, or is there a specific "sweet spot" thread count that handles work units more reliably at these lower frequencies?
Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
Moderators: Site Moderators, FAHC Science Team
-
wevaba6950
- Posts: 1
- Joined: Wed Apr 22, 2026 2:33 pm
Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
Last edited by Joe_H on Wed Apr 22, 2026 4:13 pm, edited 1 time in total.
Reason: Removed external link
-
Joe_H
- Site Admin
- Posts: 8362
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4 - Location: W. MA
Re: Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
I have approved your post even though you have multiple spam reports on the chance you are genuinely interested in folding and not just advertising the seller of used servers at the link I have removed.
Some of the info you dug up was valid for older CPU folding cores, where multiples of "large" primes could cause issues. That is no longer the case with the CPU folding cores currently in use, though there can still be efficiency peaks depending on the number of CPU threads and which specific projects are being processed.
With the 14-core Xeon you at least have a processor modern enough to support AVX and AVX2, which gives a boost to processing speed. The Xeon also supports HT, making up to 28 threads available. However, using the extra threads from HT may not increase WU processing throughput by much, while it does increase power usage and can result in thermal throttling. So I would recommend starting with a thread count of 14 and experimenting with different counts to see what gives the best results for you.
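If it helps, a fixed thread count can be set per slot in the v7 client's config.xml, along these lines (a sketch only, following the v7 slot-option format; merge it into your existing config rather than copying it verbatim):

```xml
<config>
  <!-- One CPU slot pinned to 14 threads (the physical cores, ignoring HT) -->
  <slot id='0' type='CPU'>
    <cpus v='14'/>
  </slot>
</config>
```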
-
appepi
- Posts: 154
- Joined: Wed Mar 18, 2020 2:55 pm
- Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3) HP Z4G4 (3) ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3) Dell GTX 1080 NVIDIA P1000 (2) K1200
- Location: Sydney Australia
Re: Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
wevaba6950 wrote: "14-core 2GHz Xeon"
So as not to bury the lede: the optimal allocation of FaH threads to these Xeons is zero, unless your electricity is free.
I assume that if we are talking about an older-generation DL360, then the 14-core Xeon at issue is something like a Xeon E5-2680 V4 or a Xeon E5-2690 V4. According to the LAR Systems CPU_PPD database, the former produced an average of 9,569 PPD per Logical Processor and the latter 8,993 PPD per Logical Processor in the CPU folding data submitted to LAR. Notionally this might imply something like 28 times that if all LPs were deployable at the same average efficiency, or about 250K PPD. They have TDPs of 120W and 135W respectively.
#    CPU                             LPs  PPD/LP  Total PPD  Vendor
132  XEON CPU E5-2680 V4 @ 2.40GHZ   28   9,569   267,932    Intel
143  XEON CPU E5-2690 V4 @ 2.60GHZ   28   8,993   251,804    Intel
For comparison, I show the corresponding LAR data for my (locally) top-of-the-line CPUs from each generation of HP Z-series workstations that I have bought on eBay since 2016. They have TDPs of 140W, 140W, and 130Wx2=260W respectively.
#    CPU                             LPs  PPD/LP  Total PPD  Vendor
176  XEON W-2145 CPU @ 3.70GHZ       16   13,197  211,152    Intel (in a Z4G4 with GTX 1070)
253  XEON CPU E5-1650 V3 @ 3.50GHZ   12   11,244  134,928    Intel (in a Z440 with RTX 2060)
317  XEON CPU X5690 @ 3.47GHZ        24   3,938   94,512     Intel (in a Z800 with K620)
For what it may be worth, on the rare occasions I fold with these CPUs I effectively disable hyperthreading, but only for FaH. Using the "Manage CPU Affinity" option in the HP Performance Advisor, I allocate the relevant FahCore (e.g. FahCore_A8.exe on the Z600 I am using at the moment) to every second thread. On the Z800 with 2x X5690, the 12 allocated threads all run at 100%, while all 24 remain available for higher-priority work, and (assuming that other work is not too demanding) the core maximum for each CPU stays around 75-80 deg C with stock cooling. I don't have to tell FaH anything in particular about the number of cores: it thinks there are 24, but it only ever finds that (at most) 12 are available to it. I choose this core allocation purely for cooling.
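The same "every second thread" idea can be sketched in code (my own illustration, assuming Windows; the mask arithmetic and the `start /affinity` line are just for exposition, not anything FaH itself provides):

```python
# Sketch: build an affinity mask selecting every second logical processor,
# i.e. one thread per physical core on a hyperthreaded machine.
# (Illustrative only; FaH itself knows nothing about this mask.)

def every_second_thread_mask(logical_cpus: int) -> int:
    """Bitmask with bits 0, 2, 4, ... set, one per physical core."""
    return sum(1 << cpu for cpu in range(0, logical_cpus, 2))

mask = every_second_thread_mask(24)  # dual X5690: 24 logical CPUs, 12 used
print(hex(mask))  # 0x555555
# The same mask could in principle be applied from a Windows prompt, e.g.:
#   start /affinity 555555 FahCore_A8.exe
# (illustrative; in practice I set this via HP Performance Advisor)
```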
I leave it as an exercise for the reader to verify that, using the LAR estimate of USD $0.10 per kWh, the costs per million PPD for the CPUs in the table above are respectively (USD) $1.07, $1.29, $1.59, $2.49 and $6.63, since the Excel table formats very badly here. Note that this is only true under the LAR calculation, in which all cores and threads are assumed to produce PPD at the average rate found in the LAR data, where some core limitations may have been applied by those submitting data (e.g. by me).
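For anyone who wants the exercise done for them, the calculation is just TDP x 24 hours x days-to-a-million-points x tariff. A small sketch using the LAR figures quoted above:

```python
def usd_per_million_ppd(tdp_watts: float, ppd: float, usd_per_kwh: float = 0.10) -> float:
    """Electricity cost (USD) of earning one million points, assuming the
    CPU draws its full TDP around the clock."""
    kwh_per_day = tdp_watts * 24 / 1000
    days_per_million = 1_000_000 / ppd
    return kwh_per_day * days_per_million * usd_per_kwh

# (TDP watts, total PPD) from the LAR figures quoted above
for name, tdp, ppd in [
    ("E5-2680 V4", 120, 267_932),
    ("E5-2690 V4", 135, 251_804),
    ("W-2145",     140, 211_152),
    ("E5-1650 V3", 140, 134_928),
    ("2x X5690",   260,  94_512),
]:
    print(f"{name}: ${usd_per_million_ppd(tdp, ppd):.2f} per MPPD")
```

The first four reproduce the $1.07, $1.29, $1.59 and $2.49 figures; the dual-X5690 result lands within a few cents of $6.63, depending on the exact TDP assumed.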
I have myself confirmed the reported results for LAR Systems GPU "Kilowatt efficiency" for those GPUs I actually use, namely RTX 2060 (TU106), GTX 1080 (9943), GTX 1070 (6463), GTX 1060 6GB (4372), Quadro P1000 (GP107GL), Quadro K1200 (GM107GL), Quadro K620 (GM107GL). On the same basis as with the CPUs, the USD per MPPD are $0.20, $0.27, $0.28, $0.35, $0.51, $1.22 and $2.57 respectively.
In other words, a Quadro P1000 at about USD $50 can be popped into almost any box and will turn out PPD at 80% of the rate of the 14C 28T Xeons in the DL360, at half the cost per million PPD (USD $0.51 vs $1.07). Thus the financially optimal thread allocation to these CPUs is zero.
CAVEATS: This ignores the overall cost of the supporting box in both cases. It also ignores "time of day" electricity rates, which (for example) divide my own day into three periods, with a much higher peak rate. That adds an incentive for hardware that can finish WUs in chunks small enough (say 4-5 hours) to complete within the "off peak" period, limiting over-runs into the "shoulder" rate and avoiding the "peak" rate altogether. That is pretty much impossible for all the CPUs mentioned, and for GPUs other than my GTX and RTX ones, or (for those folders with enough spare cash) better.
This relative cost may explain why I fold only with GPUs on these machines, except once a month or so when I have them running for other reasons (e.g. updating, proof of life, discouraging spiders): measured by electricity consumption and its cost, they are horribly inefficient at producing PPD relative to my GPUs (unless perhaps one has personal solar with an available excess). This is (of course) true of CPUs in general.
In other threads in this forum I see it said that CPU jobs are those simulations that simply cannot be done on GPUs. In that case it seems to me that GPU WUs and CPU WUs can't be put on a common "points" scale any more, and the current scale acts as a strong disincentive to running CPUs, creating a difference in support between the kinds of research that need one type of simulation rather than the other. Assuming the two should be seen to have equal value, this can hardly be a good thing in principle; and though the "points" are of course actually worth zip to most users, human behaviour suggests otherwise.
-
Joe_H
- Site Admin
- Posts: 8362
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4 - Location: W. MA
Re: Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
I looked up the server. The Xeon that matches his description of 14 cores and 2 GHz speed would be a Xeon® Gold 5117 Processor, a later generation than the ones you looked up. It also matched the link I removed.
-
appepi
- Posts: 154
- Joined: Wed Mar 18, 2020 2:55 pm
- Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3) HP Z4G4 (3) ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3) Dell GTX 1080 NVIDIA P1000 (2) K1200
- Location: Sydney Australia
Re: Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
Joe_H wrote: "Xeon® Gold 5117"
Unfortunately the LAR CPU database doesn't have a 5117, but it has a 5120 that seems to be much the same but a tiny bit better, when I compare
https://www.intel.com/content/www/us/en ... tions.html with
https://www.intel.com/content/www/us/en ... tions.html
#    CPU                             LPs  PPD/LP  Total PPD  Vendor
171  XEON GOLD 5120 CPU @ 2.20GHZ    28   7,821   218,988    Intel
The PPD is a bit lower than for the Xeons I had used, and not much better than my 8C 16T W-2145, but the TDP is much lower, so ....
It has a TDP of 105W, and repeating the USD calculation yields $1.15 per MPPD, so the conclusion is the same.