Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)
Posted: Sun Aug 03, 2025 2:22 am
by arisu
enroscado wrote: ↑Thu Jul 31, 2025 12:31 pm
I have a couple of 5090s that I could try this on. However, I am not running fah-client.service (my setup doesn't allow it), but rather running it as a command in an (Ubuntu) terminal window that remains open.
Can you please help me figure out a workaround with no service? I'll let it run for a week or so on both 5090s, and compare overall output after a week.
You should be able to do it like this:
Code:
sudo nvidia-smi -c EXCLUSIVE_PROCESS
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=uuid --format=csv,noheader | paste -sd ',')
nvidia-cuda-mps-control -d
fah-client
That assumes both GPUs are in the same computer. It puts the GPUs into exclusive-process mode, exports an environment variable that tells the MPS daemon and the folding client which GPUs they can use, starts the MPS control daemon (which forks itself into the background), and then starts folding.
To stop folding and MPS, either press Control+C in the terminal where fah-client is running or send the process SIGTERM (either one tells it to shut down gracefully), then run this:
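Before starting the client it can be worth confirming that the setup took effect. The following is a rough sketch, not part of the original steps: `all_exclusive` is a made-up helper name, and it assumes `nvidia-smi --query-gpu=compute_mode` prints one mode name per GPU (such as `Exclusive_Process` or `Default`). The live checks only run if `nvidia-smi` is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every line on stdin reads
# "Exclusive_Process" and at least one line was read.
all_exclusive() {
    count=0
    # Plain "read" trims leading and trailing blanks from each line.
    while read -r mode || [ -n "$mode" ]; do
        [ "$mode" = "Exclusive_Process" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Live checks, skipped on machines without the NVIDIA tools.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=compute_mode --format=csv,noheader | all_exclusive; then
        echo "All GPUs are in exclusive-process mode."
    else
        echo "At least one GPU is not in exclusive-process mode." >&2
    fi
    # Match on the full command line, because the kernel truncates
    # long process names.
    if pgrep -f nvidia-cuda-mps-control >/dev/null; then
        echo "MPS control daemon is running."
    else
        echo "MPS control daemon is not running." >&2
    fi
fi
```

If either check complains, rerun the corresponding setup command before launching fah-client.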
Code:
sudo nvidia-smi -c DEFAULT
unset CUDA_VISIBLE_DEVICES
echo quit | nvidia-cuda-mps-control
That resets all GPUs to default mode, removes the environment variable, and tells the MPS controller to terminate the MPS daemon.
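If you would rather not type the start and stop steps by hand each time, they can be wrapped in one script. This is only a sketch built from the commands in this post: the function names (`start_mps`, `stop_mps`, `run_folding`) are made up, and it assumes `fah-client` is on your PATH and that sudo can run `nvidia-smi`.

```shell
#!/bin/sh
# Sketch: start MPS, run the client in the foreground, then undo everything.

start_mps() {
    # Let only one process (the MPS server) own each GPU.
    sudo nvidia-smi -c EXCLUSIVE_PROCESS || return 1
    # Comma-separated UUIDs of every GPU in this machine.
    CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=uuid --format=csv,noheader | paste -sd ',' -)
    export CUDA_VISIBLE_DEVICES
    # Start the MPS control daemon; it detaches into the background.
    nvidia-cuda-mps-control -d
}

stop_mps() {
    sudo nvidia-smi -c DEFAULT
    unset CUDA_VISIBLE_DEVICES
    echo quit | nvidia-cuda-mps-control
}

run_folding() {
    # Trap the signals so the wrapper itself survives Control+C and still
    # reaches the cleanup below. fah-client receives Control+C directly
    # from the terminal and shuts down on its own.
    trap ':' INT TERM
    start_mps || return 1
    fah-client
    client_status=$?
    stop_mps
    trap - INT TERM
    return "$client_status"
}
```

Add a final line that calls `run_folding`, then start the script from the terminal you want to keep open. To stop, press Control+C or send SIGTERM to the fah-client process (not to the wrapper); the cleanup runs once the client has exited.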
Re: Using MPS to dramatically increase PPD on big GPUs (Linux guide)
Posted: Sun Aug 03, 2025 2:40 am
by arisu
Nicolas_orleans wrote: ↑Thu Jul 31, 2025 12:56 pm
To my knowledge (from fah-client --help) there is no command line to restart fah-client when not running it as a service.
I would say pause fah-client from the web UI (using the finish function to reach "no work" status), follow arisu's steps, and replace the service restart with a pkill of the fah-client process, then the usual start with ./fah-client
That's what I plan to do on a 5080 instance when I find time... Pkill is ugly but no better idea.
As calxalot says, both SIGTERM (the pkill default) and SIGINT (the signal the process receives if you press Control+C while it runs) are safe.
In fact, systemd just sends SIGTERM to the process when stopping it (source). The client detects the SIGTERM and relays the message to the cores, asking them to shut down too. Once the cores shut down and save their work, the client logs the event and the core's progress and then terminates itself. The client doesn't know who is responsible for shutting it down (systemd or pkill) and it doesn't care.
Avoid using SIGKILL unless the client refuses to shut down for many minutes and even after multiple SIGTERMs, because the kill signal will rip out its guts and not give it time to clean up, which can cause loss of progress or even database corruption if you're unlucky.
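The "SIGTERM first, be patient" advice can be put into a small helper. This is a hedged sketch: `stop_gracefully` is a made-up name and the 600-second default is just an arbitrary stand-in for "many minutes". It sends SIGTERM, polls until the process is gone, and reports a timeout instead of ever sending SIGKILL itself.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Returns 0 once the process has exited, 1 if it is still alive after
# the timeout. It never escalates to SIGKILL; that stays a manual decision.
stop_gracefully() {
    target_pid=$1
    timeout=${2:-600}
    # If the signal cannot be delivered, the process is already gone
    # (or is not ours to signal).
    kill -TERM "$target_pid" 2>/dev/null || return 0
    waited=0
    # "kill -0" delivers nothing; it only tests whether the PID exists.
    while kill -0 "$target_pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, `stop_gracefully "$(pgrep -o -x fah-client)"` would ask the oldest fah-client process to shut down and wait up to ten minutes for it to finish saving its work.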
This is what I do: