Optimizing Thread Allocation on Older Enterprise Hardware (ProLiant DL360 / 14-Core Xeon)
Posted: Wed Apr 22, 2026 3:00 pm
Hi everyone,
I’ve been a long-time folder on my main gaming rig, but I recently decided to shift the heavy lifting to a more permanent setup. I managed to get my hands on some enterprise surplus—a ProLiant DL360 equipped with a 14-core 2GHz Xeon. It’s a different experience compared to the high-clocked consumer chips I'm used to, especially when it comes to the sheer volume of threads available.
I’ve been digging through the forum archives, and one specific point I saw mentioned in a previous technical thread was the "Prime Number" issue—where thread counts containing a large prime factor can cause the GROMACS cores to struggle with domain decomposition. My chip is exactly that case: 14 physical cores factors as 2 × 7 (and 28 threads with Hyper-Threading is 2 × 2 × 7), so I’m trying to figure out the best way to map this out in the client to avoid any performance penalties.
From my short time testing this server, I’ve noticed that while it's incredibly stable, the lower 2.0GHz clock speed means the Time Per Frame (TPF) can be quite high on larger work units. I’m torn between running one massive SMP slot to utilize the whole CPU or splitting the workload into two smaller slots to ensure that a single "stuck" work unit doesn't stall the entire machine's output. It’s a bit of a learning curve moving from a 5GHz i9 to a server that relies purely on parallelization rather than raw speed.
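If I do go the two-slot route, I'm assuming the client's config.xml would look something like the snippet below—two CPU slots of 12 threads each, deliberately leaving 4 threads free to dodge the prime-factor issue. I pulled the slot/cpus syntax from older forum posts, so please correct me if it's off:

```xml
<config>
  <!-- Assumption on my part: two independent CPU slots so one stuck WU
       can't stall the whole machine. 12 threads each (2*2*3) avoids the
       factor-of-7 problem that 14 or 28 would have. -->
  <slot id="0" type="CPU">
    <cpus v="12"/>
  </slot>
  <slot id="1" type="CPU">
    <cpus v="12"/>
  </slot>
</config>
```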
I’m really focused on contributing as much as possible to the research, but I want to make sure I’m not just spinning the fans and wasting power if the thread count is poorly optimized.
In your experience with these mid-range Xeon builds, do you find it more efficient to let the client handle all cores in one go, or is there a specific "sweet spot" thread count that handles work units more reliably at these lower frequencies?