
Unable to run NVIDIA GPU with driver 535 [Solved]

Posted: Thu Aug 10, 2023 5:00 pm
by jloflin
I just installed FAH V8.1.18 this morning because V7.6.1 would not use my GPU. The CPU was found and started folding immediately, but my GPU, an NVIDIA RTX 2060, was not found on the setup page. When I dropped back to the NVIDIA 525 driver, everything worked. Here is my log from the attempt with the 535 driver. If anyone can point out why the GPU wasn't seen, I would appreciate it.

I am running Linux Mint V21.2.

Code:

16:06:31:I1:*********************** Folding@home Client ***********************
16:06:31:I1: Version: 8.1.18
16:06:31:I1: Author: Joseph Coffland 
16:06:31:I1: Org: foldingathome.org
16:06:31:I1: Copyright: 2023 foldingathome.org
16:06:31:I1: Homepage: https://foldingathome.org/
16:06:31:I1: License: https://www.gnu.org/licenses/gpl-3.0.txt
16:06:31:I1: Date: Apr 18 2023
16:06:31:I1: Time: 12:09:09
16:06:31:I1: Revision: 80a3d5eb8f60f7833de2954087682958b511895c
16:06:31:I1: Branch: master
16:06:31:I1: Compiler: GNU 10.2.1 20210110
16:06:31:I1: Options: -faligned-new -std=c++17 -fsigned-char -ffunction-sections
16:06:31:I1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:31:I1: Platform: linux 5.10.0-16-cloud-amd64
16:06:31:I1: Bits: 64
16:06:31:I1: Mode: Release
16:06:31:I1: Args: --log=/var/log/fah-client/log.txt
16:06:31:I1: --log-rotate-dir=/var/log/fah-client/
16:06:31:I1:****************************** CBang ******************************
16:06:31:I1: Version: 1.7.2
16:06:31:I1: Author: Joseph Coffland 
16:06:31:I1: Org: Cauldron Development LLC
16:06:31:I1: Copyright: Cauldron Development LLC, 2003-2023
16:06:31:I1: Homepage: https://cauldrondevelopment.com/
16:06:31:I1: License: GPL 2+
16:06:31:I1: Date: Apr 14 2023
16:06:31:I1: Time: 16:26:30
16:06:31:I1: Revision: ac8bbdd5bb93c01679a881f5962fed800bf29e58
16:06:31:I1: Branch: master
16:06:31:I1: Compiler: GNU 10.2.1 20210110
16:06:31:I1: Options: -faligned-new -std=c++17 -fsigned-char -ffunction-sections
16:06:31:I1: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
16:06:31:I1: Platform: linux 5.10.0-16-cloud-amd64
16:06:31:I1: Bits: 64
16:06:31:I1: Mode: Release
16:06:31:I1:***************************** System ******************************
16:06:31:I1: CPU: AMD Ryzen 9 3900X 12-Core Processor
16:06:31:I1: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:06:31:I1: CPUs: 24
16:06:31:I1: Memory: 31.27GiB
16:06:31:I1:Free Memory: 30.70GiB
16:06:31:I1: Threads: POSIX_THREADS
16:06:31:I1: OS Version: 5.15
16:06:31:I1:Has Battery: false
16:06:31:I1: On Battery: false
16:06:31:I1: UTC Offset: -6
16:06:31:I1: PID: 980
16:06:31:I1: CWD: /var/lib/fah-client
16:06:31:I1: Exec: /usr/bin/fah-client
16:06:31:I1:*******************************************************************
16:06:31:I2:
16:06:31:I1:Opening Database
16:06:31:I1:Listening for HTTP on 127.0.0.1:7396
16:06:31:I3:id = 1+sDgcOpZoS9yXJUQR5PI1od3HBjODohpyeXUQ6qa4E=
16:06:31:I3:Loading work unit 1 to group '' with ID 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8
16:06:31:I3:Loaded 1 wus.
16:06:31:E :Exception: clGetPlatformIDs() returned -1001
16:06:31:E :Exception: cuInit() returned 100
16:06:31:I3:gpus = {
16:06:31:I3: "gpu:09:00:00": {"vendor": 4318, "device": 7944, "type": "nvidia", "supported": false}
16:06:31:I3:}
16:06:31:I1:Loaded cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8
16:06:31:I3::WU1:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8 -suffix 01 -version 8.1.18 -lifeline 980 -np 23
16:06:31:I3::WU1:Started FahCore on PID 1047
16:06:32:I1::WU1:*********************** Log Started 2023-08-10T16:06:31Z ***********************
16:06:32:I1::WU1:************************** Gromacs Folding@home Core ***************************
16:06:32:I1::WU1: Core: Gromacs
16:06:32:I1::WU1: Type: 0xa8
16:06:32:I1::WU1: Version: 0.0.12
16:06:32:I1::WU1: Author: Joseph Coffland 
16:06:32:I1::WU1: Copyright: 2020 foldingathome.org
16:06:32:I1::WU1: Homepage: https://foldingathome.org/
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:24:44
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1: SIMD: avx2_256
16:06:32:I1::WU1: OpenMP: ON
16:06:32:I1::WU1: CUDA: OFF
16:06:32:I1::WU1: Args: -dir 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8 -suffix 01
16:06:32:I1::WU1: -version 8.1.18 -lifeline 980 -np 23
16:06:32:I1::WU1:************************************ libFAH ************************************
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:21:38
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1:************************************ CBang *************************************
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:21:24
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1:************************************ System ************************************
16:06:32:I1::WU1: CPU: AMD Ryzen 9 3900X 12-Core Processor
16:06:32:I1::WU1: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:06:32:I1::WU1: CPUs: 24
16:06:32:I1::WU1: Memory: 31.27GiB
16:06:32:I1::WU1:Free Memory: 30.63GiB
16:06:32:I1::WU1: Threads: POSIX_THREADS
16:06:32:I1::WU1: OS Version: 5.15
16:06:32:I1::WU1:Has Battery: false
16:06:32:I1::WU1: On Battery: false
16:06:32:I1::WU1: UTC Offset: -6
16:06:32:I1::WU1: PID: 1047
16:06:32:I1::WU1: CWD: /var/lib/fah-client/work
16:06:32:I1::WU1:******************************************************************************** 
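The `gpus` entry in the log above reports the card's PCI IDs in decimal. Converting them to the hex form that `lspci -nn` prints makes them easy to cross-check: 4318 is 0x10de, NVIDIA's PCI vendor ID, so the card itself was seen on the bus. That suggests (though it does not prove) the problem lies in initializing CUDA/OpenCL rather than in detecting the hardware. A minimal Python sketch; the helper name `pci_id_hex` is hypothetical:

```python
def pci_id_hex(vendor: int, device: int) -> str:
    """Format decimal PCI vendor/device IDs the way `lspci -nn` shows them."""
    # lspci prints both IDs as 4-digit lowercase hex, e.g. [10de:1f08]
    return f"{vendor:04x}:{device:04x}"


if __name__ == "__main__":
    # Values taken from: "gpu:09:00:00": {"vendor": 4318, "device": 7944, ...}
    print(pci_id_hex(4318, 7944))  # prints 10de:1f08
```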

Re: Unable to run NVIDIA GPU with driver 535

Posted: Fri Aug 11, 2023 9:12 pm
by toTOW
16:06:31:E :Exception: clGetPlatformIDs() returned -1001
16:06:31:E :Exception: cuInit() returned 100
These errors mean that the client can't open the OpenCL and CUDA libraries... they were probably the same with the v7 client. This is usually caused by permission issues.
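For reference, the numeric codes come from the OpenCL and CUDA headers: -1001 is `CL_PLATFORM_NOT_FOUND_KHR` (the OpenCL ICD loader found no platforms), 100 is `CUDA_ERROR_NO_DEVICE`, and 999 (which shows up later in this thread) is `CUDA_ERROR_UNKNOWN`. A small Python sketch that decodes such log lines; the table covers only these codes, and the `decode` helper is hypothetical, not part of the client:

```python
import re

# Only the codes that appear in this thread are listed.
# -1001 comes from the OpenCL cl_khr_icd extension; the rest from CUDA's cuda.h.
OPENCL_ERRORS = {
    -1001: "CL_PLATFORM_NOT_FOUND_KHR (ICD loader found no OpenCL platforms)",
}
CUDA_ERRORS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE (no CUDA-capable device detected)",
    999: "CUDA_ERROR_UNKNOWN",
}

# Matches e.g. "clGetPlatformIDs() returned -1001" or "cuInit() returned 999"
_CALL_RE = re.compile(r"\b(clGetPlatformIDs|cuInit)\(\) returned (-?\d+)")


def decode(line):
    """Return (call, code, meaning) for a client log line, or None if no match."""
    m = _CALL_RE.search(line)
    if m is None:
        return None
    call, code = m.group(1), int(m.group(2))
    # OpenCL entry points start with "cl", CUDA driver API ones with "cu"
    table = OPENCL_ERRORS if call.startswith("cl") else CUDA_ERRORS
    return call, code, table.get(code, "unrecognized code")


if __name__ == "__main__":
    for line in (
        "16:06:31:E :Exception: clGetPlatformIDs() returned -1001",
        "16:06:31:E :Exception: cuInit() returned 100",
        "CUDA not supported: cuInit() returned 999",
    ):
        print(decode(line))
```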

Is the client started with the service command created by the client installer?

Re: Unable to run NVIDIA GPU with driver 535

Posted: Sat Aug 12, 2023 12:52 am
by jloflin
toTOW wrote:
These errors mean that the client can't open the OpenCL and CUDA libraries... they were probably the same with the v7 client. This is usually caused by permission issues.

Is the client started with the service command created by the client installer?
Yes. I followed the directions at:
https://foldingathome.org/foldinghome-v ... de/?lng=en
and the GPU part of the program would not work with the NVIDIA 535 driver, but works perfectly with the NVIDIA 525 driver.

Re: Unable to run NVIDIA GPU with driver 535

Posted: Sat Aug 12, 2023 1:16 pm
by toTOW
There are no references to service commands or how to start/stop the client on this page, so how do you start the client?

It looks like this: viewtopic.php?p=361627#p361627

Re: Unable to run NVIDIA GPU with driver 535

Posted: Sat Aug 12, 2023 1:59 pm
by jloflin
I followed these directions from the V8 Client Guide:
Client home page

The client home page is the first screen you will encounter. After connecting, it will display the status of your F@H client. If you have not yet configured a username, team or passkey, a dialog will pop up asking you to do so via the Settings page or choose to fold anonymously.
This dialog appears if you have not yet configured your client.

Once configured, the header at the top of the page will show your username, team and the points earned so far. The buttons in the top right provide quick access to the client settings and log viewer.

Below the header, in the body of the page, is a large green Start Folding button. After configuring your user settings, click this button to get going. Click it again when you want to pause folding.
With NVIDIA driver 535, the GPU line on the Settings page showed as disabled and the enable box was greyed out, so I could not click it. With the NVIDIA 525 driver, I clicked the enable box on the Settings page and the GPU started folding.

Re: Unable to run NVIDIA GPU with driver 535

Posted: Sun Aug 13, 2023 10:23 am
by toTOW
If the drivers were the issue, we would know it: everyone would be complaining. And I have a v7 client running perfectly fine with those drivers.

If you look at this thread, you'll see that the client behaves differently right after installation than when it is restarted later: viewtopic.php?t=39636

Re: Unable to run NVIDIA GPU with driver 535

Posted: Sun Aug 13, 2023 3:26 pm
by jloflin
So, just to check things out, I paused folding (NVIDIA driver 525), went to the Driver Manager, selected the NVIDIA 535 driver, applied it, rebooted, restarted folding, and now it works. I have no idea what went wrong, or what fixed it.
Anyway, it's running correctly now.

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Posted: Sun Nov 03, 2024 5:02 pm
by AthanSpod
There's absolutely something weird going on with the v8 client.

I had this working a few months back, when preparing to use F@H as heating over the colder months. I then paused it and did nothing with it other than letting the service start on boot (this is my desktop, currently shut down each night).

Today I decided I needed that bit of heat, so unpaused the client. The CPU picked up a work unit immediately. The GPU did not.

The log of the failed attempts, across multiple `systemctl restart fah-client.service` runs, showed the same issue:

Code:

OpenCL not supported: clGetPlatformIDs() returned -1001
CUDA not supported: cuInit() returned 999

I then tried starting it directly, as the `fah-client` user, under strace to see if I could find what was going on. Magically, it started working and downloaded cores for the GPU. Now, after exiting that and running `systemctl start fah-client.service`, it's happily running on both CPU and GPU. Specifically, I used:

Code:

cd /var/lib/fah-client && strace -o fah-client -s 4096 -f -ff /usr/bin/fah-client --config=/etc/fah-client/config.xml --log=/var/log/fah-client/log.txt --log-rotate-dir=/var/log/fah-client/
(I should have given strace's -o argument a full path under /var/tmp/strace/fah-client/; not doing so only meant the strace output ended up in /var/lib/fah-client/.)
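With `-ff`, strace writes one file per process, named after the `-o` argument plus the PID (here `fah-client.<pid>`). A short Python sketch can sift those files for failed opens of NVIDIA device nodes or CUDA/OpenCL libraries, and for every `execve()` (where a setuid helper would show up). It assumes the common single-line strace format; `suspicious` is a hypothetical helper name:

```python
import glob
import re

# Paths of interest: NVIDIA device nodes and the CUDA/OpenCL user-space libraries
_INTERESTING = re.compile(r"nvidia|libcuda|libopencl", re.IGNORECASE)

# One strace syscall line with a quoted first path, e.g.
#   openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR) = -1 ENOENT (No such file or directory)
# An optional leading PID is accepted in case the trace was made with -f but without -ff.
_LINE = re.compile(
    r'^(?:\d+\s+)?(\w+)\(.*?"([^"]*)".*\)\s+=\s+(-?\d+)(?:\s+(E[A-Z]+))?'
)


def suspicious(lines):
    """Return (syscall, path, retval, errno) for failed calls on NVIDIA/CUDA/OpenCL
    paths, plus every execve() regardless of its result."""
    hits = []
    for line in lines:
        m = _LINE.match(line)
        if not m:
            continue
        call, path, ret, err = m.group(1), m.group(2), int(m.group(3)), m.group(4)
        if call == "execve" or (ret < 0 and _INTERESTING.search(path)):
            hits.append((call, path, ret, err))
    return hits


if __name__ == "__main__":
    # Run from the directory holding the traces (here /var/lib/fah-client).
    for name in sorted(glob.glob("fah-client.*")):
        with open(name, errors="replace") as fh:
            for hit in suspicious(fh):
                print(name, *hit)
```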

This is on a Debian 12/bookworm system, using the package from https://download.foldingathome.org/rele ... _amd64.deb. Output of `id fah-client`:

Code:

uid=997(fah-client) gid=996(fah-client) groups=996(fah-client),44(video),137(render)
but that seems moot, as the main device nodes are world-readable and world-writable anyway:

Code:

16:54:19 0$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Nov  3 08:25 /dev/nvidia-modeset
crw-rw-rw- 1 root root 238,   0 Nov  3 08:25 /dev/nvidia-uvm
crw-rw-rw- 1 root root 238,   1 Nov  3 08:25 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Nov  3 08:25 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Nov  3 08:25 /dev/nvidiactl

/dev/nvidia-caps:
total 0
cr-------- 1 root root 241, 1 Nov  3 08:25 nvidia-cap1
cr--r--r-- 1 root root 241, 2 Nov  3 08:25 nvidia-cap2
I did not install, uninstall or reinstall anything between it complaining about those CUDA calls and it starting to work again. This feels like some other issue in the v8 client code is causing the CUDA errors to be logged, rather than those errors pointing to the root cause.
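To check access from the service user's point of view rather than root's, a minimal Python sketch prints each `/dev/nvidia*` node's mode and whether the current user can open it read-write, and flags a missing `/dev/nvidia-uvm`. Run it as the service user, e.g. `sudo -u fah-client python3 check_nodes.py`. The function name `node_report` and the script name are hypothetical, and singling out `nvidia-uvm` is an assumption about which node matters to CUDA:

```python
import glob
import os
import stat


def node_report(pattern="/dev/nvidia*"):
    """Map each matching path to (ls-style mode string, read+write access for this user)."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        report[path] = (stat.filemode(st.st_mode), os.access(path, os.R_OK | os.W_OK))
    return report


if __name__ == "__main__":
    for path, (mode, ok) in node_report().items():
        print(f"{mode} {'rw-ok' if ok else 'NO ACCESS'} {path}")
    # Assumption: CUDA needs /dev/nvidia-uvm, so a missing node is worth flagging
    if not os.path.exists("/dev/nvidia-uvm"):
        print("/dev/nvidia-uvm is missing")
```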

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Posted: Mon Nov 04, 2024 11:43 am
by AthanSpod
Further investigation: the supplied systemd unit has some lines aimed at securing the system against side effects caused by the client:

Code:

PrivateTmp=yes
NoNewPrivileges=yes
ProtectSystem=full
ProtectHome=yes
So I first rebooted with the `PrivateTmp` line commented out: no change, and the web page showed "Resources not available" for the GPU. Doing the same with only `NoNewPrivileges` commented out resulted in it working.

This is with NVIDIA driver 535.216.01 (although I'll try the 550 series later today; I had been using the 560 series, but those drivers don't have the recent security fix). I'm using my own kernel, but its .config is copied from Debian's.

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Posted: Mon Nov 04, 2024 11:51 am
by AthanSpod
Edit: I brainfarted; it's `NoNewPrivileges` that makes the difference, not `ProtectSystem`. Editing ... done.

Using an override (i.e. `systemctl edit fah-client.service`) to set NoNewPrivileges to "no" also results in the GPU working, as expected. Now, going by the systemd.exec man page, this would mean that something in the code is calling `execve()` on a setuid or setgid binary. Is there any chance the libcuda calls are actually doing such a thing?
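One way to confirm which setting the running client actually has is the `NoNewPrivs:` field that Linux (4.10 and later) exposes in `/proc/<pid>/status`; the flag is inherited by child processes such as FahCore and preserved across `execve()`. A minimal Python sketch; the function names are hypothetical, and the PID can be obtained with `systemctl show -p MainPID fah-client.service`:

```python
import sys


def parse_no_new_privs(status_text):
    """Return True/False from a /proc/<pid>/status dump, or None if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            return line.split(":", 1)[1].strip() == "1"
    return None


def no_new_privs(pid="self"):
    """Read the no_new_privs flag of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_no_new_privs(fh.read())


if __name__ == "__main__":
    # Pass the client's MainPID; defaults to this script's own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(f"NoNewPrivs for {pid}: {no_new_privs(pid)}")
```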

Of course, it's possible there's something else going on, perhaps some sort of race condition, and this is a red herring. I'll report back if it stops working for me again. Right now I'm going to go install the latest 550-series driver.

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Posted: Mon Nov 04, 2024 3:34 pm
by Marcos FRM
AthanSpod wrote: Mon Nov 04, 2024 11:51 am
Using an override (i.e. `systemctl edit fah-client.service`) to set NoNewPrivileges to "no" also results in the GPU working, as expected. Now, going by the systemd.exec man page this would mean that something in the code is utilising `execve()` on a setuid or setgid binary. Is there any chance the libcuda calls are actually doing some such thing?
This might be the case.

https://manpages.ubuntu.com/manpages/or ... obe.1.html

I expected device nodes to be created by the modules themselves, or at least for the driver to have udev rules for this purpose.
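If the helper in question is a setuid-root binary such as `nvidia-modprobe` (an assumption; the man page link above is truncated), it is easy to check whether it is installed setuid. Under `NoNewPrivileges=yes` the kernel still executes a setuid binary but does not grant it the file owner's privileges, so a helper that needs root to create device nodes would fail. A minimal Python sketch; `is_setuid` is a hypothetical helper:

```python
import os
import shutil
import stat


def is_setuid(path):
    """True if the file at path has the set-user-ID bit set."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)


if __name__ == "__main__":
    # Hypothetical target: the helper name is an assumption, see the text above.
    helper = shutil.which("nvidia-modprobe")
    if helper is None:
        print("nvidia-modprobe not found on PATH")
    else:
        owner = os.stat(helper).st_uid
        print(f"{helper}: setuid={is_setuid(helper)} owner_uid={owner}")
```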

Removing NoNewPrivileges= from the systemd unit file is unfortunate.