Running FAH 8.x in slurm jobs

Moderators: Site Moderators, FAHC Science Team

Tom_Servo
Posts: 1
Joined: Wed Apr 29, 2026 2:10 pm

Running FAH 8.x in slurm jobs

Post by Tom_Servo »

I am an HPC sysadmin at a university, and I often run FAH 7.x in Slurm jobs, both for testing and to make use of idle nodes. To do this, a Slurm job launches a container with FAH 7 installed and runs FAHClient with the options "--exit-when-done --max-units 1 --max-queue 1".

These options don't exist in 8.x. It would be great if 8.x had some way to limit the number of WUs, so that a Slurm job has a finite amount of work to do in X time.
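
A minimal sketch of the v7 job described above, assuming the container is run with Apptainer. The image name `fah7.sif`, the `#SBATCH` values, and the `run` dry-run helper are hypothetical; only the three FAHClient options come from the post.

```shell
#!/bin/bash
#SBATCH --job-name=fah7-one-wu
#SBATCH --nodes=1
#SBATCH --time=12:00:00        # hypothetical wall-time limit

# Hypothetical container image path; replace with your own.
FAH_IMAGE="${FAH_IMAGE:-fah7.sif}"

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# v7 options from the post: process at most one unit, keep at most one
# unit per slot in the queue, and exit when all slots are paused.
run apptainer exec "$FAH_IMAGE" FAHClient \
    --exit-when-done --max-units 1 --max-queue 1
```

Submitted as is, the script only prints the command it would run; setting `DRY_RUN=0` in the job environment makes it execute.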
calxalot
Site Moderator
Posts: 1840
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Running FAH 8.x in slurm jobs

Post by calxalot »

Relevant enhancement request: https://github.com/FoldingAtHome/fah-cl ... issues/251

See also https://github.com/JWhyFR/fah-v8

I think JWhy could help you with v8.
Meanwhile, there is nothing wrong with using v7.
JWhy
Posts: 40
Joined: Thu Nov 29, 2007 9:42 pm

Re: Running FAH 8.x in slurm jobs

Post by JWhy »

I can try to help but I am no expert ... and I didn't even know v7 had these options ;)

exit-when-done <boolean=false>
    Exit when all slots are paused.

max-queue <integer=16>
    Maximum units per slot in the work queue.

max-units <integer=0>
    Process at most this number of units, then pause.

Since none of these options exist in v8, I think we could create a startup script (adapted from firedfly’s or mine) that would:
- install python3 and the other prerequisites
- install lufah
- download F@H v8
- configure F@H (account token plus other parameters: GPU only? CPU only?) and launch it
- wait a few moments, then check with lufah whether a work unit has been downloaded and is being processed
- if so, send a "finish" command via lufah
- then wait (checking with lufah in a loop, every X minutes) until the status is "paused" and there are no units left
- check whether the WU has been credited (probably with lufah history)
- once all of this is OK, and if fahclient needs to be stopped explicitly, kill the fahclient process
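
The waiting steps in the list above could be sketched roughly as follows. This is not a tested script: `poll_until` and `run_one_wu` are hypothetical helper names, the try counts and intervals are placeholders, and it assumes `lufah finish` talks to the local client by default. The two check commands are left to the caller, since how to read "a WU is running" and "paused with no units" from lufah output depends on the lufah version. Installation, configuration, and the credit check are not covered.

```shell
#!/bin/bash
# Retry a command until it succeeds, at most MAX_TRIES times,
# sleeping INTERVAL seconds between attempts.
# Usage: poll_until MAX_TRIES INTERVAL COMMAND [ARGS...]
poll_until() {
    local max_tries=$1 interval=$2 i=0
    shift 2
    while [ "$i" -lt "$max_tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Process one work unit, then return once the client is idle.
# Usage: run_one_wu WU_STARTED_CHECK IDLE_CHECK
# Both arguments are commands or functions supplied by the caller
# (hypothetical; they would typically parse lufah output).
run_one_wu() {
    local wu_started_check=$1 idle_check=$2
    # Wait up to about 30 minutes for a work unit to start.
    poll_until 30 60 "$wu_started_check" || return 1
    # Ask the client to finish the current unit and fetch no more.
    lufah finish || return 1
    # Wait up to about 12 hours for "paused" with no units left.
    poll_until 720 60 "$idle_check" || return 1
}
```

A job script would define its own two check functions and call `run_one_wu my_wu_started my_client_idle`, then stop the client if the call returns 0.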

NB: a few things would need adjusting if you are running on a multi-GPU setup.

Let us know if you think this could work with your setup.
Last edited by JWhy on Thu Apr 30, 2026 10:48 am, edited 1 time in total.
calxalot
Site Moderator
Posts: 1840
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Running FAH 8.x in slurm jobs

Post by calxalot »

Although it is not 100% reliable, there is also

lufah wait-until-paused

which I sometimes use after sending finish.
One could stop the client job once it is paused, or use kill -TERM if not using systemd.
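
A rough sketch of that sequence for a client that is not managed by systemd. The function name `stop_after_current_wu` is hypothetical, and the client PID is assumed to come from the job script that started the client in the background. Because `wait-until-paused` is described above as not fully reliable, the sketch refuses to kill the client if that command fails.

```shell
#!/bin/bash
# Finish the current work unit, wait for the client to pause,
# then terminate the client process.
# Usage: stop_after_current_wu CLIENT_PID
stop_after_current_wu() {
    local fah_pid=$1
    lufah finish || return 1
    if ! lufah wait-until-paused; then
        echo "wait-until-paused failed; not stopping the client" >&2
        return 1
    fi
    kill -TERM "$fah_pid" || return 1
    # Reap the client if it is a child of this shell; ignore its exit status.
    wait "$fah_pid" 2>/dev/null || true
}
```

If the job script launched the client in the background, `$!` captured right after the launch is the PID to pass in.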
calxalot
Site Moderator
Posts: 1840
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Running FAH 8.x in slurm jobs

Post by calxalot »

@JWhy
I should point out that lufah's error messages may have changed from what you have in fah-watchdog.sh.
If not, you should expect them to change in the next version.
JWhy
Posts: 40
Joined: Thu Nov 29, 2007 9:42 pm

Re: Running FAH 8.x in slurm jobs

Post by JWhy »

Thanks for the heads-up!