CPU folding without losing GPU PPD (a quick guide)



arisu
Posts: 315
Joined: Mon Feb 24, 2025 11:11 pm


Post by arisu »

It's no secret that Nvidia GPUs are notorious for being unhappy if CPU folding is going on at the same time. Even if a core or two is reserved for the GPU's thread on the CPU, Nvidia's folding PPD will drop and the CPU folding PPD won't make up for it. I've solved this issue on Linux and I'm writing down what I did in case anyone else is interested.

I've been doing testing on two RTX 4090 Mobile GPUs, one RTX 3090, and one GTX 770M and I've managed to eliminate this issue, making it possible to fold on an Nvidia GPU and on the CPU without upsetting the GPU. Here are the requirements:
  • All threads must only run on P-cores, never E-cores
  • The initial GPU thread (the one using 100% CPU) must be pinned to a single core
  • The CPU folding threads must be pinned to their own cores
  • The core that the GPU thread is on must have its maximum clock rate limited (*)
  • No folding threads may use the HyperThreading/SMT pair that the GPU thread runs on
  • Enough cores must be left unused that the GPU thread is not thermally throttled (*)
(* if your chipset can sustain its all-core boost indefinitely, these two steps can be skipped)

The most important parts are pinning the threads to cores, avoiding thermal throttling, and not loading the core that is the SMT pair of the one that the GPU feeding thread is running on.
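One way to find the SMT pairings programmatically is to group the `lscpu --extended` output by physical core. A minimal sketch (the sample output below is hypothetical, matching the 14/15 pairing on my machine; on a real system, replace the `sample` variable with the output of `lscpu --extended=CPU,CORE`):

```shell
# Group logical CPUs by physical core to reveal SMT pairs.
# Hypothetical lscpu --extended=CPU,CORE output for illustration:
sample='CPU CORE
0   0
1   0
14  7
15  7'
pairs=$(echo "$sample" | awk 'NR > 1 { p[$2] = p[$2] " " $1 }
  END { for (c = 0; c <= 255; c++) if (c in p) print "core " c ":" p[c] }')
echo "$pairs"
# core 0: 0 1
# core 7: 14 15
```

Any two logical CPUs printed on the same line share one physical core, so loading one of them steals execution resources from the other.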

Locking the clock of the core the GPU thread is on is not strictly necessary, but it is important for finding out how fast it has to be. For an RTX 4090 Mobile, I find that 2.5 GHz is the lowest I can limit one Intel i9 14900HX P-core before the GPU starts being starved (on some projects it can go lower, as low as 2.2 GHz). You'll want to test this out yourself. If in doubt, just lock it to the base clock.
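The sysfs cpufreq interface takes frequencies in kHz, so a cap chosen in GHz has to be multiplied by 10^6 before writing it. A quick sketch (core 15 and the 2.5 GHz figure are just the values from my setup):

```shell
# Convert a clock cap in GHz to the kHz units that
# /sys/devices/system/cpu/cpuN/cpufreq/scaling_max_freq expects.
ghz=2.5
khz=$(awk -v g="$ghz" 'BEGIN { printf "%d", g * 1000000 }')
echo "$khz"   # 2500000
# Then, as root (core 15 is the GPU-feeding core in my layout):
#   echo "$khz" > /sys/devices/system/cpu/cpu15/cpufreq/scaling_max_freq
```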

When you start up CPU folding, wait about ten seconds for the cores to come up to temperature, then check that the core the GPU thread is running on is still at the maximum you gave it. If it is being throttled, reduce the number of cores used for CPU folding by two and check again.
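That check can be scripted. `is_throttled` below is a hypothetical helper that compares the cap against a measured frequency, with a small slack for sampling jitter (the 50 MHz slack is my own assumption, not something from the driver); in practice the measured value would come from `scaling_cur_freq`:

```shell
# Given the cap and a measured frequency (both in kHz), report whether
# the GPU-feeding core is being thermally throttled below its cap.
is_throttled() {
  local cap_khz=$1 cur_khz=$2
  local slack_khz=50000   # ~50 MHz measurement jitter (assumption)
  [ "$cur_khz" -lt $((cap_khz - slack_khz)) ] && echo throttled || echo ok
}
# Real usage (core 15 is the GPU-feeding core in my layout):
#   is_throttled 2500000 "$(cat /sys/devices/system/cpu/cpu15/cpufreq/scaling_cur_freq)"
is_throttled 2500000 2494000   # prints: ok
is_throttled 2500000 2100000   # prints: throttled
```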

If all these requirements are met, you'll achieve GPU folding and CPU folding at the same time without performance loss. For an i9 14900HX with an RTX 4090 Mobile, these are the steps I use to fold on 12 cores, plus GPU:

Code: Select all

# taskset -acp 0-11 $(pgrep FahCore_a)
# taskset -acp 0-15 $(pgrep FahCore_2)
# taskset -cp 15 $(pgrep FahCore_2)
# echo 2500000 > /sys/devices/system/cpu/cpu15/cpufreq/scaling_max_freq
That pins all CPU folding threads to cores 0-11, pins all GPU threads to 0-15 (to speed up the periodic sanity checks done at each checkpoint), and then restricts the first GPU thread (the one that uses 100% CPU and feeds the GPU) to core 15 only. Cores 12 and 13 are left idle to prevent thermal throttling (I should probably leave 0 and 1 idle instead, because core 0 is the "main" core and is the one the scheduler favors). Core 14 is left idle because it is the SMT pair of core 15; that is, cores 14 and 15 are actually one physical core. (Which logical CPUs share a physical core can be found with "lscpu --extended".) Finally, core 15's clock speed is reduced to 2.5 GHz.

When I do this, the core running the GPU thread maintains 2.5 GHz and the cores doing CPU folding run at 2.6-2.7 GHz. Compared to only pinning the GPU thread and not folding on the CPU, I lose at most 1% GPU PPD (about 100k out of 10M total), but gain more than 300k PPD from the CPU folding. In most cases, I don't lose any noticeable PPD at all.

Quick tests suggest that a dedicated Linux folding machine can gain fairly significant additional PPD with a few extra steps: isolating certain CPUs from the scheduler and disabling timer interrupts on them.
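I haven't written those steps up yet, but the usual mechanism is kernel boot parameters. A hypothetical GRUB fragment for isolating core 15 (the GPU-feeding core in my layout) might look like this; it's a config sketch, untested here:

```shell
# /etc/default/grub (sketch): keep the scheduler and the timer tick off
# core 15 so only explicitly pinned threads ever run on it.
#   isolcpus  - remove the core from general scheduler load balancing
#   nohz_full - stop the periodic tick while one task is running there
#   rcu_nocbs - move RCU callback processing off the core
GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=15 nohz_full=15 rcu_nocbs=15"
# Then regenerate the GRUB config (update-grub or grub2-mkconfig) and reboot.
```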