Using MPS to dramatically increase PPD on big GPUs (Linux guide)

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

arisu
Posts: 586
Joined: Mon Feb 24, 2025 11:11 pm

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by arisu »

enroscado wrote: Thu Jul 31, 2025 12:31 pm I have a couple of 5090s that I could try this on. However, I am not running fah-client.service (my setup doesn't allow it), but rather as a command in a (Ubuntu) terminal window that remains open.

Can you please help me figure out a workaround with no service? I'll let it run for a week or so on both 5090s, and compare overall output after a week.
You should be able to do it like this:

Code: Select all

sudo nvidia-smi -c EXCLUSIVE_PROCESS
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=uuid --format=csv,noheader | paste -sd ',')
nvidia-cuda-mps-control -d
fah-client
That's assuming both GPUs are on the same computer. That puts the GPUs into exclusive-process compute mode, exports an environment variable that tells the MPS daemon and the folding client which GPUs they can use, starts the MPS control daemon (which forks itself into the background and spawns the MPS server on demand), and then starts folding.
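
If you want to confirm the setup took effect, something like this should work. Treat it as a sketch rather than gospel; note that MPS servers only show up in the list once a folding core has actually connected:

Code: Select all

# Each GPU should now report "Exclusive_Process"
nvidia-smi --query-gpu=index,compute_mode --format=csv
# Lists the PID of each running MPS server (one appears per GPU once work starts)
echo get_server_list | nvidia-cuda-mps-control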

To stop folding and MPS, either Control+C on the fah-client process or send it SIGTERM (which will tell it to gracefully shut down), then run this:

Code: Select all

sudo nvidia-smi -c DEFAULT
unset CUDA_VISIBLE_DEVICES
echo quit | nvidia-cuda-mps-control
That resets all GPUs to default compute mode, removes the environment variable, and tells the MPS control daemon to shut down (taking any MPS servers with it).
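
To double-check that the teardown worked, something along these lines should do (again, just a sketch):

Code: Select all

# Each GPU should be back to "Default"
nvidia-smi --query-gpu=index,compute_mode --format=csv
# Prints nothing once the MPS control daemon and servers are gone
pgrep -fa nvidia-cuda-mps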
arisu
Posts: 586
Joined: Mon Feb 24, 2025 11:11 pm

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by arisu »

Nicolas_orleans wrote: Thu Jul 31, 2025 12:56 pm To my knowledge (from fah-client --help) there is no command-line option to restart fah-client when not running it as a service.
I would say pause fah-client from the web UI (using the finish function to reach "no work" status), follow arisu's steps, and replace the service restart with a pkill of the fah-client process, then the usual start with ./fah-client.
That's what I plan to do on a 5080 instance when I find time... Pkill is ugly, but I have no better idea.
As calxalot says, both SIGTERM (the pkill default) and SIGINT (the signal sent when you press Ctrl+C while the process runs in a terminal) are safe.

In fact systemd just sends SIGTERM to the process when stopping it (source). The client detects the SIGTERM and relays the message to the cores, asking them to shut down too. Once the cores shut down and save their work, the client logs the event and the core's progress and then terminates itself. The client doesn't know who is responsible for shutting it down (systemd or pkill) and it doesn't care.

Avoid using SIGKILL unless the client refuses to shut down for many minutes and even after multiple SIGTERMs, because the kill signal will rip out its guts and not give it time to clean up, which can cause loss of progress or even database corruption if you're unlucky.

This is what I do:

Code: Select all

pkill -x fah-client
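
If you are scripting the restart, it can help to wait for the client to actually exit before tearing MPS down or starting a new instance. A minimal sketch; the 10-minute timeout is an arbitrary choice on my part, not an official number:

Code: Select all

#!/bin/sh
# Ask fah-client to shut down gracefully (SIGTERM is pkill's default anyway)
pkill -TERM -x fah-client

# Give it up to 10 minutes to let the cores checkpoint and exit
for i in $(seq 1 600); do
    pgrep -x fah-client > /dev/null || exit 0
    sleep 1
done

echo "fah-client still running after 10 minutes; check the log before escalating" >&2
exit 1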
umfaddi
Posts: 8
Joined: Fri Aug 19, 2016 10:49 am

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by umfaddi »

When using a two-GPU setup, what is the best way of adding new resource groups?

The default group with both GPUs enabled, plus...
another resource group with both GPUs enabled, for a total of 2 groups...
...OR:
one resource group with GPU no. 1 enabled and
one resource group with GPU no. 2 enabled, for a total of 3 groups.

Does it even matter which option I go for?
muziqaz
Posts: 2063
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by muziqaz »

You need as many resource groups as the number of slices you want to split the GPUs into.
For 2 GPUs each split in half, that means 4 resource groups: two of them assigned to the first GPU, and the other two to the second GPU.
FAH Omega tester
Albuquerquefx
Posts: 9
Joined: Wed Oct 01, 2025 3:05 am
Hardware configuration: AMD 9800X3D + 5090, Windows 11
AMD 5950X + 4070 Super, Fedora 42 VM on Proxmox 8.3
Intel i7-3930k + 4070 Super + 4090, Fedora 42

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by Albuquerquefx »

I created an account here on the forums specifically to say "Thank you!" to @arisu. If I may, here are some additional findings...

First, the box I'm running:
  • Intel DX79Si "Siler" motherboard; PCIe 3.0 was in "beta" at the time and only supported with very specific steppings of compatible processors
  • Intel i7-3930k C2-stepping CPU, which supports PCIe 3.0 and the VT-* instructions
  • Eight 4GB DDR3/1600 sticks of RAM in quad-channel 1T CL8 1600MT/s config.
  • Asus TUF Gaming 4090 at 350W power limit, and +180 GPU / -1000 memory offsets, plugged into a PCIe 3.0 x16 slot.
  • MSI Ventus 4070 Super at 200W power limit, and +90 GPU / -1000 memory offsets, plugged into a PCIe 3.0 x8 slot.
  • Fedora 42 running NVIDIA 580.82 drivers, with coolbits enabled, on Xorg desktop
Without MPS, the 4090 will reliably generate somewhere between 20-22MPPD, and the 4070 Super about 9-11MPPD, obviously depending on WU distribution. With MPS enabled for two sessions each, the 4090 delivers between 22-30MPPD in the aggregate, whereas the 4070 Super seems to choke down about 8-12MPPD in the aggregate. Without any context other than these scores, you would imagine the 4070 Super is simply ill-equipped for use with MPS.

However, digging further into the situation shows @arisu's concerns about the disconnect between artificial scoring (PPD!) and actual science output are well and truly the problem. When digging into the actual WU output, both cards increased their throughput by significant double-digit percentages: the 4090 saw an almost 55% increase in WU throughput, and the 4070S a 35% increase. Exactly as @arisu expected, the scoring's dependence on the wall-clock time of a single WU artificially understates the card's far-increased performance when it is cranking out more than one WU at a time.
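
To make the disconnect concrete, here's a rough back-of-the-envelope calculation using the published bonus formula as I understand it, final_points = base_points * max(1, sqrt(k * deadline / elapsed)). The base points, k factor, timeout, and times below are made-up illustration numbers, not from any real project:

Code: Select all

# Hypothetical project: 50,000 base points, k = 2.5, 3-day timeout.
# Solo: one WU every 4 hours. With MPS: two WUs in flight, each taking 6 hours.
awk 'BEGIN {
    base = 50000; k = 2.5; deadline_days = 3
    solo_hours = 4; mps_hours = 6

    solo_pts = base * sqrt(k * deadline_days / (solo_hours / 24))
    mps_pts  = base * sqrt(k * deadline_days / (mps_hours / 24))

    # 24/4 = 6 WU/day solo vs 2 * 24/6 = 8 WU/day with MPS
    printf "solo: %d WU/day, %.0f PPD\n", 24 / solo_hours, solo_pts * 24 / solo_hours
    printf "MPS:  %d WU/day, %.0f PPD\n", 2 * 24 / mps_hours, mps_pts * 2 * 24 / mps_hours
}'
With those invented numbers, WU throughput goes up by a third while PPD only moves about 9%, which is the same pattern I'm seeing: the science climbs much faster than the score.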

I did test three MPS instances with the 4090 (with 16,384 CUDA cores, it seemed like ~5460 CUDA cores would be more than enough to chew through a lot of work) and while the total WU output increased even further to about 65%, the scores dropped dramatically to the 12-15MPPD range. Again, almost a two-thirds increase in total output, but a whopping 30% decrease in allocated points seems a little too much.
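
For anyone who wants to experiment along this line without juggling resource groups, MPS also exposes a knob that caps how many SMs each client may occupy. This is only a sketch of the standard control commands (I have not verified whether it helps or hurts PPD), it has to be issued before the folding cores connect and spawn their MPS servers, and the 34% value is simply an example matching a three-way split:

Code: Select all

# Cap each new MPS client at roughly a third of the SMs
echo "set_default_active_thread_percentage 34" | nvidia-cuda-mps-control
# Confirm the value new clients will receive
echo "get_default_active_thread_percentage" | nvidia-cuda-mps-control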

I've since left the 4090 split into two MPS sessions, and have left MPS configured for the 4070S but for now only have one resource group assigned to it. A week ago, that one box would average about 33MPPD, it now averages right around 40MPPD. This is also on a motherboard and processor from literally 14 years ago (!). I have another 4070 Super with the same specs and config running on an ASRock B550 + 5950X rig and, again using the same exact everything, belts out 11-12MPPD. I have another ASRock B550 + Ryzen 5500 coming in the mail today and I'll get that i7-3930k put out to pasture shortly. I should find another ~10% or better performance hiding in there...
Last edited by Albuquerquefx on Wed Oct 01, 2025 6:42 pm, edited 1 time in total.
foldinghomealone
Posts: 146
Joined: Wed Feb 01, 2017 7:07 pm
Hardware configuration: 5900x + 5080

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by foldinghomealone »

Great summary, thanks guys.
I wish I had the Linux knowledge to set this up.
I'm expecting a 5080 in the next few days and would like to run some tests.

What about power draw in MPS "mode"?
(edit: this can be seen in the first article; it seems to be only a bit more)
Albuquerquefx
Posts: 9
Joined: Wed Oct 01, 2025 3:05 am
Hardware configuration: AMD 9800X3D + 5090, Windows 11
AMD 5950X + 4070 Super, Fedora 42 VM on Proxmox 8.3
Intel i7-3930k + 4070 Super + 4090, Fedora 42

Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)

Post by Albuquerquefx »

foldinghomealone wrote: Mon Oct 06, 2025 12:23 am I wish I had the Linux knowledge to set this up.
Which Linux distro are you running / are you planning to run?
foldinghomealone wrote: Mon Oct 06, 2025 12:23 am What about power draw in MPS "mode"?
The answer is "it depends" but you should expect a few more watts...

Your CPU will have additional core(s) in a spinlock state, servicing the extra F@H thread(s). Depending on your CPU, this is likely to be only a small handful of watts (less than double digits). Main system memory will also be a bit busier, which could add another watt or three of extra consumption at the wall socket.

As you're already aware, the GPU power consumption varies significantly based on WU. The question to ask first: is the GPU already power limited before enabling MPS? Because this is a 24/7 folding rig, and is also bundled with a 4070 Super, I had both cards power limited down a bit because the WU/watt (or PPD/watt) efficiency is significantly better at lower power levels. With only one CUDA worker thread, my 4090 would typically sit at my 350W power limit all the time. With MPS enabled, I only noticed a tiny bit of power difference, mostly because the card could remain loaded with one WU while another one finished uploading / downloading / preparing to start.
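
If you'd rather measure than guess, nvidia-smi can set the cap and log the draw while folding runs. A quick sketch; the 350 W figure just mirrors the limit I use on the 4090, so adjust for your own card:

Code: Select all

# Cap the power limit on GPU 0 (does not persist across reboots)
sudo nvidia-smi -i 0 -pl 350
# Log draw, limit and utilization every 5 seconds while folding runs
nvidia-smi --query-gpu=timestamp,power.draw,power.limit,utilization.gpu --format=csv -l 5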

All in all, I wager the addition of MPS added roughly another 25 watt-hours per hour of consumption to a rig that was already chewing through nearly 625 Wh per hour with the monitor off.