I'm trying to do some backfilling on a farm machine, just like our friends at CERN are doing. My setup is Scientific Linux 7.7 on x86_64; the machines all have two Xeon CPUs and 6 or 8 NVIDIA GPUs of several generations, in this example six NVIDIA Tesla P4 cards. I'm using the latest CUDA, 10.2.
Folding@home isn't installed directly on the OS; instead I'm using CERN's Docker container, lukasheinrich/folding:latest, and running it with Singularity 3.5.3, using the --nv option to bind in the NVIDIA devices and libraries.
The CERN container is Ubuntu 18.04.1 with fahclient 7.5.1 installed.
I'm running everything in a batch system, requesting 1 CPU core and 1 GPU per job. The command-line options I've used are:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=$CUDA_VISIBLE_DEVICES --smp=false
CUDA_VISIBLE_DEVICES is set by the batch system, in this case to 0 for the first of 6 GPUs.
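For context, the batch wrapper effectively does something like the sketch below. This is illustrative, not the actual site script: the default value, the derivation of the index, and the commented-out invocation are assumptions; the flags themselves are the ones from the command line above.

```shell
# Illustrative wrapper sketch (not the real batch script).
# The batch system exports CUDA_VISIBLE_DEVICES; if it holds a
# comma-separated list, take the first entry as the device index
# to pass to FAHClient's --cuda-index option.
CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0}"
GPU_INDEX="${CUDA_VISIBLE_DEVICES%%,*}"   # first visible device

echo "Using CUDA device index: ${GPU_INDEX}"

# singularity exec --nv docker://lukasheinrich/folding:latest \
#   /usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 \
#     --gpu=true --cuda-index="${GPU_INDEX}" --smp=false
```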
It seems Folding@home still tries to fetch jobs for the GPUs it can see but already knows it cannot use:
Code:
08:43:22: Version: 7.5.1
08:43:22: Date: May 11 2018
08:43:22: Time: 19:59:04
08:43:22: Repository: Git
08:43:22: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
08:43:22: Branch: master
08:43:22: Compiler: GNU 6.3.0 20170516
08:43:22: Options: -std=gnu++98 -O3 -funroll-loops
08:43:22: Platform: linux2 4.14.0-3-amd64
08:43:22: Bits: 64
08:43:22: Mode: Release
08:43:22:******************************* System ********************************
08:43:22: CPU: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
08:43:22: CPU ID: GenuineIntel Family 6 Model 85 Stepping 4
08:43:22: CPUs: 64
08:43:22: Memory: 376.20GiB
08:43:22: Free Memory: 181.49GiB
08:43:22: Threads: POSIX_THREADS
08:43:22: OS Version: 3.10
08:43:22: Has Battery: false
08:43:22: On Battery: false
08:43:22: UTC Offset: 2
08:43:22: PID: 245167
08:43:22: CWD: /batch/57329060.1.gpu.q
08:43:22: OS: Linux 3.10.0-1062.7.1.el7.x86_64 x86_64
08:43:22: OS Arch: AMD64
08:43:22: GPUs: 0
08:43:22: CUDA Device 0: Platform:0 Device:0 Bus:59 Slot:0 Compute:6.1 Driver:10.2
08:43:22:OpenCL Device 0: Platform:0 Device:0 Bus:59 Slot:0 Compute:1.2 Driver:440.33
08:43:22:***********************************************************************
08:43:22:<config>
08:43:22: <!-- Folding Slots -->
08:43:22:</config>
08:43:22:Connecting to assign1.foldingathome.org:8080
08:43:22:Updated GPUs.txt
08:43:22:Read GPUs.txt
08:43:22:Trying to access database...
08:43:22:Successfully acquired database lock
08:43:22:FS00:Set client configured
08:43:22:Enabled folding slot 00: READY cpu:1
08:43:22:Enabled folding slot 01: READY gpu:0:GP104GL [Tesla P4]
08:43:22:Enabled folding slot 02: READY gpu:1:GP104GL [Tesla P4]
08:43:22:Enabled folding slot 03: READY gpu:2:GP104GL [Tesla P4]
08:43:22:Enabled folding slot 04: READY gpu:3:GP104GL [Tesla P4]
08:43:22:Enabled folding slot 05: READY gpu:4:GP104GL [Tesla P4]
08:43:22:Enabled folding slot 06: READY gpu:5:GP104GL [Tesla P4]
08:43:22:ERROR:No compute devices matched GPU #1 NVIDIA:5 GP104GL [Tesla P4]. You may need to update your graphics drivers.
08:43:22:ERROR:No compute devices matched GPU #2 NVIDIA:5 GP104GL [Tesla P4]. You may need to update your graphics drivers.
08:43:22:ERROR:No compute devices matched GPU #3 NVIDIA:5 GP104GL [Tesla P4]. You may need to update your graphics drivers.
08:43:22:ERROR:No compute devices matched GPU #4 NVIDIA:5 GP104GL [Tesla P4]. You may need to update your graphics drivers.
08:43:22:ERROR:No compute devices matched GPU #5 NVIDIA:5 GP104GL [Tesla P4]. You may need to update your graphics drivers.
Code:
08:43:25:WU03:FS03:Requesting new work unit for slot 03: READY gpu:2:GP104GL [Tesla P4] from 40.114.52.201
08:43:25:WU03:FS03:Connecting to 40.114.52.201:8080
How can I tell Folding@home not to try to use the GPUs it cannot use?