CUDA: Not detected. Linux Debian Stable
-
- Posts: 7
- Joined: Sat Apr 17, 2021 2:38 am
CUDA: Not detected. Linux Debian Stable
Hello.
The program detects my GPU but not CUDA. The OS is Linux Debian Stable.
It is a pity because my CPU is almost always heavily loaded; I have little CPU power to share, but my GPU is almost unused.
This is from the log:
OS: Linux 4.19.0-16-amd64 x86_64
OS Arch: AMD64
GPUs: 1
GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
CUDA: Not detected: cuInit() returned 999
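For reference, 999 is CUDA_ERROR_UNKNOWN in the CUDA driver API, so the message by itself does not say why initialization failed. A minimal Python sketch can repeat the same cuInit() call outside the client. It assumes Python 3 and the usual driver library name libcuda.so.1. Run it both as yourself and as the account the client runs under, for example `sudo -u fahclient python3 probe_cuda.py` (the file name is hypothetical). That shows whether the failure follows one user.

```python
import ctypes

# A few CUresult values from the CUDA driver API (cuda.h), used as a fallback.
KNOWN_ERRORS = {0: "CUDA_SUCCESS", 100: "CUDA_ERROR_NO_DEVICE", 999: "CUDA_ERROR_UNKNOWN"}


def probe_cuda(libname="libcuda.so.1"):
    """Call cuInit(0) in the NVIDIA driver library and return (code, name).

    Returns None when the library itself cannot be loaded.
    """
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        return None  # libcuda is missing or not on the dynamic loader path
    cuda.cuInit.argtypes = [ctypes.c_uint]
    cuda.cuInit.restype = ctypes.c_int
    cuda.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    cuda.cuGetErrorName.restype = ctypes.c_int
    rc = cuda.cuInit(0)
    name = ctypes.c_char_p()
    # cuGetErrorName points `name` at a static string for any valid CUresult.
    if cuda.cuGetErrorName(rc, ctypes.byref(name)) == 0 and name.value:
        return rc, name.value.decode()
    return rc, KNOWN_ERRORS.get(rc, "unrecognized")


if __name__ == "__main__":
    print(probe_cuda())
```

If libcuda.so.1 cannot be loaded at all, the function returns None. That usually points at a missing or incomplete driver install rather than a permissions problem.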
Re: CUDA: Not detected. Linux Debian Stable
Have you installed Nvidia's proprietary drivers, or are you using the open source "nouveau" drivers? The Nvidia drivers can be installed as an apt package or as a binary install.
Debian has a tool called "nvidia-detect" which will recommend which driver to install, but before you can use that you have to make sure that "non-free" packages are allowed; you can do that by editing your sources.list files in /etc/apt/sources.list or the /etc/apt/sources.list.d directory: add "non-free" after "main contrib" and run "apt update" or hit "u" in aptitude.
https://wiki.debian.org/NvidiaGraphicsDrivers
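The sources.list change can be sketched in Python as below. It assumes the classic one-line `deb URL suite components` format; `buster` in the example is only an illustrative suite name. Print the result and compare it with the original before writing it back as root, then run `apt update`.

```python
def add_component(line, component="non-free"):
    """Append an APT component to a one-line 'deb'/'deb-src' entry if it is missing."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("deb", "deb-src"):
        return line  # comments, blank lines and other formats are left alone
    if component in fields[1:]:
        return line  # component already enabled
    return " ".join(fields + [component])


def add_component_to_text(text, component="non-free"):
    """Apply add_component to every line of a sources.list file's contents."""
    out = "\n".join(add_component(l, component) for l in text.splitlines())
    return out + "\n" if text.endswith("\n") else out
```

For example, `add_component("deb http://deb.debian.org/debian buster main contrib")` appends `non-free` to the end of the line. Lines that already contain it are returned unchanged.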
You possibly need the package "ocl-icd-opencl-dev" as well as the Nvidia drivers. Installing "clinfo" is also useful for debugging.
After you've installed the drivers and rebooted, if it still doesn't work, check the output of the commands "nvidia-smi" and possibly "clinfo".
If permissions are a problem, you can overcome that by making a Systemd startup file, explained here: viewtopic.php?p=339973#p339973
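One quick way to check whether permissions could be involved is to test whether the current user can open the NVIDIA device nodes. This Python sketch assumes the driver exposes them as /dev/nvidia* (for example /dev/nvidiactl and /dev/nvidia0). Run it as the same user the client runs as.

```python
import glob
import os


def nvidia_device_access(pattern="/dev/nvidia*"):
    """Map each NVIDIA device node matching `pattern` to whether this user can open it read/write."""
    return {path: os.access(path, os.R_OK | os.W_OK) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    nodes = nvidia_device_access()
    if not nodes:
        print("No /dev/nvidia* nodes found; the NVIDIA kernel module may not be loaded")
    for path, ok in nodes.items():
        print(f"{path}: {'ok' if ok else 'NO ACCESS'}")
```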
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Re: CUDA: Not detected. Linux Debian Stable
gunnarre wrote: Have you installed Nvidia's proprietary drivers, or are you using the open source "nouveau" drivers? The Nvidia drivers can be installed as an apt package or as a binary install.
Thank you for your answer.
I have the proprietary Nvidia drivers version 460.39-1.
I already had ocl-icd-opencl-dev.
Here is the output of clinfo. I lack the knowledge to understand if there is something wrong in it.
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 18.3.6
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Clover
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)
clCreateContext(NULL, ...) [default] No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
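The clinfo output above lists only one platform, Mesa's Clover, with zero devices. When the proprietary driver's OpenCL ICD is registered, clinfo normally also lists a platform named "NVIDIA CUDA". A small Python sketch can extract the platform names and device counts from clinfo's default text output. It assumes the "Platform Name" / "Number of devices" layout shown above. This makes it easy to compare output before and after a change.

```python
import re


def platform_device_counts(clinfo_text):
    """Map each OpenCL platform name in `clinfo` output to its device count."""
    counts = {}
    current = None
    for raw in clinfo_text.splitlines():
        line = raw.strip()
        m = re.match(r"Platform Name\s+(.+)$", line)
        if m:
            current = m.group(1).strip()  # remember the most recent platform
            continue
        m = re.match(r"Number of devices\s+(\d+)$", line)
        if m and current is not None:
            counts[current] = int(m.group(1))
    return counts
```

Feed it the text of `clinfo` run both with and without sudo. An NVIDIA platform that appears only under sudo would suggest a permissions problem.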
Re: CUDA: Not detected. Linux Debian Stable
Is the "libnvidia-compute-460" package installed? It should be there if your NVidia drivers are correctly installed.
Is there different output if you run "sudo clinfo"? In that case, it's a permissions issue (look at the link I posted at the end).
What is the output of the command "nvidia-smi"?
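Whether a given package is installed can be checked with `dpkg-query`. Below is a minimal Python wrapper, assuming a dpkg-based system such as Debian or Ubuntu. NVIDIA package names differ between distributions and driver releases, so treat the name you check as an assumption to adapt.

```python
import subprocess


def is_installed_status(status):
    """dpkg's ${Status} field reads 'install ok installed' for an installed package."""
    parts = status.split()
    return len(parts) == 3 and parts[2] == "installed"


def package_installed(name):
    """True/False from dpkg-query, or None when dpkg-query is not available."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f=${Status}", name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # not a dpkg-based system
    return is_installed_status(result.stdout)
```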
Re: CUDA: Not detected. Linux Debian Stable
gunnarre wrote: Is the "libnvidia-compute-460" package installed? It should be there if your NVidia drivers are correctly installed.
There is no libnvidia-compute-460 package in Debian: https://packages.debian.org/search?suit ... ia-compute
But I have solved part of the problem: when I upgraded from the Nvidia stable packages to the stable-backports, the package nvidia-egl-common was not upgraded. The dependencies should have dealt with that! After upgrading that package and restarting the computer the log looks much better:
Code: Select all
GPUs: 1
GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.2
OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:460.39
Code: Select all
ERROR:Exception: Error executing: 'PRAGMA synchronous=NORMAL': database is locked
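Since the culprit here was one NVIDIA package left behind on an older version, a quick overview of installed NVIDIA package versions can catch this kind of mismatch. The Python sketch below groups packages by version using `dpkg-query`. The `*nvidia*` pattern is an assumption and misses packages such as libcuda1 whose names do not contain "nvidia". Some packages, for example nvidia-settings, come from separate source packages and may legitimately carry a different version. Treat an odd one out as a lead rather than proof.

```python
import subprocess
from collections import defaultdict


def group_by_version(dpkg_output):
    """Group 'package<TAB>version' lines by version, skipping packages with no version."""
    groups = defaultdict(list)
    for line in dpkg_output.splitlines():
        name, _, version = line.partition("\t")
        if name and version:  # not-installed packages have an empty version
            groups[version].append(name)
    return dict(groups)


def nvidia_package_versions():
    """Ask dpkg for every package whose name contains 'nvidia' and group by version."""
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "-f=${Package}\t${Version}\n", "*nvidia*"],
            capture_output=True, text=True, check=False,
        ).stdout
    except FileNotFoundError:
        return {}  # not a dpkg-based system
    return group_by_version(out)
```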
Re: CUDA: Not detected. Linux Debian Stable
The output of "nvidia-smi" is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... On | 00000000:01:00.0 On | N/A |
| 0% 35C P8 8W / 200W | 521MiB / 6070MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1544 G /usr/lib/xorg/Xorg 237MiB |
| 0 N/A N/A 2561 G ...akonadi_archivemail_agent 1MiB |
| 0 N/A N/A 2572 G .../akonadi_mailfilter_agent 1MiB |
| 0 N/A N/A 2585 G ...n/akonadi_sendlater_agent 1MiB |
| 0 N/A N/A 2988 G ...b/firefox-esr/firefox-esr 1MiB |
| 0 N/A N/A 3031 G ...b/firefox-esr/firefox-esr 197MiB |
| 0 N/A N/A 11078 G nvidia-settings 0MiB |
| 0 N/A N/A 12708 G ...AAAAAAAAA= --shared-files 71MiB |
+-----------------------------------------------------------------------------+
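The table truncates the GPU name ("GeForce GTX 106..."). nvidia-smi's query mode prints untruncated fields as CSV, which is easier to paste and compare. Below is a Python sketch using the `--query-gpu` and `--format=csv,noheader` options; the chosen field list is just an example.

```python
import csv
import io
import subprocess

FIELDS = ["index", "name", "driver_version", "pci.bus_id", "memory.total"]


def parse_query(csv_text):
    """Turn `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in rows if row]


def query_gpus():
    """Run nvidia-smi in query mode (raises if nvidia-smi is missing or fails)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_query(out)
```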
Re: CUDA: Not detected. Linux Debian Stable
I'm not sure if that is a permissions issue. Is the output from "clinfo" and "sudo clinfo" different? If so, add fahclient to the "video" and/or "render" groups, and then look at the FAHClient.service startup-script detailed here: viewtopic.php?p=339973#p339973
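Group membership for the client's account can be checked with a short Python sketch. It assumes the account is called fahclient and that the relevant groups are video and render, as mentioned above. On older systems the render group may not exist, and the function simply skips groups that are missing. Any group it reports can be added with `sudo usermod -aG <group> fahclient`, followed by a restart of the client.

```python
import grp
import pwd


def missing_groups(user, wanted=("video", "render")):
    """Return the groups in `wanted` that exist here but that `user` is not a member of."""
    primary_gid = pwd.getpwnam(user).pw_gid  # raises KeyError if the user does not exist
    missing = []
    for name in wanted:
        try:
            group = grp.getgrnam(name)
        except KeyError:
            continue  # group not present on this system, nothing to add
        # Membership counts either via the group's member list or as the primary group.
        if user not in group.gr_mem and group.gr_gid != primary_gid:
            missing.append(name)
    return missing
```

Usage: `print(missing_groups("fahclient"))`.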
If that doesn't help, installing the driver from the ".run" file might. The ".run" file from https://docs.nvidia.com/cuda/cuda-insta ... ml#runfile lets you install only the driver, with
Code: Select all
sudo sh cuda_<version>_linux.run --driver
Avoid installing the whole CUDA toolkit - you don't need it, and it might conflict with the CUDA version that comes with the core.
Re: CUDA: Not detected. Linux Debian Stable
Regarding "database is locked": there are probably several reasons why the database may be locked. One common reason I've seen is having two copies of FAHClient running. A single FAHClient service should handle all of the devices in that OS that can fold. (That might not be the issue.)
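To check for duplicate clients, the Python sketch below scans /proc for processes named FAHClient. It is Linux-only, and the process name is an assumption based on the client's binary name. Depending on how the service starts the client, one instance may show up as a parent plus a child process. The sketch therefore also keeps only matches whose parent is not itself a match.

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return sorted (pid, ppid) pairs for processes whose /proc/<pid>/comm equals `name`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(base, "status")) as f:
                ppid = next(int(line.split()[1]) for line in f if line.startswith("PPid:"))
        except (OSError, StopIteration):
            continue  # process exited meanwhile or its files are unreadable
        found.append((int(entry), ppid))
    return sorted(found)


def independent_instances(procs):
    """Keep only matches whose parent is not another match (one entry per top-level instance)."""
    pids = {pid for pid, _ in procs}
    return [pid for pid, ppid in procs if ppid not in pids]
```

Usage: `print(independent_instances(find_processes("FAHClient")))`. More than one PID suggests two separate clients.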
Posting FAH's log:
How to provide enough info to get helpful support.
Re: CUDA: Not detected. Linux Debian Stable
I'm now having the same problem of CUDA not being detected on my system running Ubuntu 20.04.
About a week or so ago I noticed that my system's PPD had dropped from ~3.2M/day to ~2.4M/day. I've had a busy week, so I have been debugging this slowly. The system is a dual-card config with a GeForce 1080 Ti and a 1660 Ti. Looking at FAHControl, the 1660 Ti was generating the usual PPD while the 1080 Ti was way down, having gone from 2M+ PPD to around 800K.
The first thing I noticed in "Software & Updates | Additional Drivers" was that the 1080 Ti was using the manually installed NVIDIA driver version 390 while the 1660 Ti was using the automatically configured version 450. (Or 440, sorry.) Since it is difficult to switch the 1080's driver from the manually set driver to the higher version, I followed the suggestion of removing all the NVIDIA drivers (including CUDA) and installing the latest driver fresh. I installed driver 465.19.01, reinstalled CUDA and, well, there was no change to what FAHControl shows for the cards' individual PPDs.
Looking at the FAH log, the only mention of CUDA is in the System section:
16:27:49:******************************* System ********************************
16:27:49: CPU: AMD Athlon(tm) X4 860K Quad Core Processor
16:27:49: CPU ID: AuthenticAMD Family 21 Model 48 Stepping 1
16:27:49: CPUs: 4
16:27:49: Memory: 31.38GiB
16:27:49:Free Memory: 30.70GiB
16:27:49: Threads: POSIX_THREADS
16:27:49: OS Version: 5.8
16:27:49:Has Battery: false
16:27:49: On Battery: false
16:27:49: UTC Offset: -4
16:27:49: PID: 1057
16:27:49: CWD: /var/lib/fahclient
16:27:49: OS: Linux 5.8.0-50-generic x86_64
16:27:49: OS Arch: AMD64
16:27:49: GPUs: 2
16:27:49: GPU 0: NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
16:27:49: GPU 1: NVIDIA:7 TU116 [GeForce GTX 1660 Ti]
16:27:49: CUDA: 7.5
16:27:49:CUDA Driver: 11030
16:27:49:***********************************************************************
which is odd, since it shows CUDA 7.5 while nvidia-smi reports CUDA version 11.3:
Wed Apr 21 20:06:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 57% 71C P2 204W / 250W | 405MiB / 11175MiB | 96% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:05:00.0 Off | N/A |
| 74% 76C P2 111W / 120W | 344MiB / 5944MiB | 97% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1095 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 4146 G /usr/lib/xorg/Xorg 70MiB |
| 0 N/A N/A 4285 G /usr/bin/gnome-shell 63MiB |
| 0 N/A N/A 5820 G /usr/lib/firefox/firefox 2MiB |
| 0 N/A N/A 5886 C ...13/Core_22.fah/FahCore_22 223MiB |
| 1 N/A N/A 1095 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 4146 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 6044 C ...13/Core_22.fah/FahCore_22 331MiB |
+-----------------------------------------------------------------------------+
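The "CUDA Driver: 11030" line and nvidia-smi's "CUDA Version: 11.3" appear to agree. The CUDA driver API's cuDriverGetVersion() reports versions as 1000 * major + 10 * minor. Assuming the log uses the same encoding, 11030 decodes to 11.3. The separate "CUDA: 7.5" line is a different value. 7.5 also happens to be the compute capability of the TU116 (GTX 1660 Ti), so it may refer to that rather than to a CUDA version, although the log does not say so. The decoding, as a Python sketch:

```python
def decode_cuda_version(value):
    """Decode 1000 * major + 10 * minor (the cuDriverGetVersion encoding) to 'major.minor'."""
    major, rest = divmod(int(value), 1000)
    return f"{major}.{rest // 10}"
```

`decode_cuda_version(11030)` returns "11.3", and `decode_cuda_version(11020)` returns "11.2", matching the "Driver:11.2" seen earlier in this thread.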
Further down in the log, I see no "Platform 3: CUDA" that I used to see for both cards:
16:27:52:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
16:27:52:WU01:FS02:0x22:Version 0.0.13
16:27:52:WU01:FS02:0x22: Checkpoint write interval: 15000 steps (2%) [50 total]
16:27:52:WU01:FS02:0x22: JSON viewer frame write interval: 7500 steps (1%) [100 total]
16:27:52:WU01:FS02:0x22: XTC frame write interval: 250000 steps (33%) [3 total]
16:27:52:WU01:FS02:0x22: Global context and integrator variables write interval: disabled
16:27:52:WU01:FS02:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
16:27:52:WU01:FS02:0x22:Please consider upgrading your client version.
16:27:52:WU00:FS01:0x22: Checkpoint write interval: 15000 steps (2%) [50 total]
16:27:52:WU00:FS01:0x22: JSON viewer frame write interval: 7500 steps (1%) [100 total]
16:27:52:WU00:FS01:0x22: XTC frame write interval: 250000 steps (33%) [3 total]
16:27:52:WU00:FS01:0x22: Global context and integrator variables write interval: disabled
16:27:52:WU00:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
16:27:52:WU00:FS01:0x22:Please consider upgrading your client version.
16:27:53:WU01:FS02:0x22:There are 3 platforms available.
16:27:53:WU01:FS02:0x22:Platform 0: Reference
16:27:53:WU01:FS02:0x22:Platform 1: CPU
16:27:53:WU01:FS02:0x22:Platform 2: OpenCL
16:27:53:WU01:FS02:0x22: opencl-device 1 specified
16:27:53:WU00:FS01:0x22:There are 3 platforms available.
16:27:53:WU00:FS01:0x22:Platform 0: Reference
16:27:53:WU00:FS01:0x22:Platform 1: CPU
16:27:53:WU00:FS01:0x22:Platform 2: OpenCL
16:27:53:WU00:FS01:0x22: opencl-device 0 specified
16:28:12:WU01:FS02:0x22:Attempting to create OpenCL context:
16:28:12:WU01:FS02:0x22: Configuring platform OpenCL
16:28:13:WU00:FS01:0x22:Attempting to create OpenCL context:
16:28:13:WU00:FS01:0x22: Configuring platform OpenCL
16:28:35:WU01:FS02:0x22: Using OpenCL on platformId 0 and gpu 1
16:28:36:WU00:FS01:0x22: Using OpenCL on platformId 0 and gpu 0
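Core22 is built on OpenMM, and the "Platform N: ..." lines are the OpenMM platforms it found. The Python sketch below collects those names per folding slot from a log excerpt and lists the slots that never reported a CUDA platform. That makes it easy to compare logs from different installs. The regular expression assumes the `hh:mm:ss:WUnn:FSnn:0xNN:` prefix seen above.

```python
import re

# Matches e.g. "16:27:53:WU01:FS02:0x22:Platform 2: OpenCL"
PLATFORM_RE = re.compile(r":WU(\d+):FS(\d+):0x[0-9a-f]+:Platform \d+: (\w+)")


def platforms_by_slot(log_text):
    """Map each slot (e.g. 'FS01') to the platform names its core listed, in log order."""
    slots = {}
    for m in PLATFORM_RE.finditer(log_text):
        slots.setdefault("FS" + m.group(2), []).append(m.group(3))
    return slots


def slots_without_cuda(log_text):
    """Slots whose core never listed a CUDA platform in this log excerpt."""
    return sorted(s for s, names in platforms_by_slot(log_text).items() if "CUDA" not in names)
```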
Suspecting that I had messed up some path setting, I downloaded NVIDIA's CUDA code samples to see if they generated any errors. Nope, the multidevice.cu program compiles and runs just fine. (And with an added line, it reports that it found two CUDA devices.) Looks like CUDA is working.
clinfo also looks fine and shows that both devices have:
Device Version OpenCL 3.0 CUDA
Driver Version 465.19.01
Device OpenCL C Version OpenCL C 1.2
After all of this uninstalling, reinstalling, testing and verifying, it seems that FAH just doesn't want to recognize that the system has CUDA and insists on using OpenCL. (Which, oddly, seems to run nicely on the 1660 Ti but has real performance problems on the 1080 Ti.)
Following a suggestion earlier in this thread, I've confirmed that the latest version of ocl-icd-opencl-dev is installed.
Any ideas or debugging hints?
Re: CUDA: Not detected. Linux Debian Stable
Yes, it's a complicated issue. I'm not sure I can explain all of the details.
OpenCL drivers are required by FAH during initialization, and they may come in various versions from various sources. CUDA drivers are carefully assigned by NVidia and there are a multitude of options.
This is further complicated by the fact that FAHCore_22 installs a version of CUDA inside the FAHCore directory/folder, which supposedly will work with any nVidia GPU later than Fermi, whereas direct support for the various generations of GPUs uses specific versions of CUDA. I've seen reports saying that installing a specific version of the CUDA drivers may, in fact, conflict with the version provided by FAHCore_22.
If you figure out what's happening, let me know what you learned.
Re: CUDA: Not detected. Linux Debian Stable
When I folded on a multi-GPU Linux PC, I found that I needed the "server" variant of the NVidia drivers to support multiple GPUs, but according to other people this wasn't correct. You might try installing the "server" variant of the drivers, or perhaps downgrading the NVidia drivers to an earlier version. The problem here might be the order in which the drivers were installed, or FAH's method of detecting CUDA and OpenCL. Ideally, FAH should run even if you have multiple versions of CUDA, ROCm and AMDGPU-PRO drivers installed, and pick the "best one" of those automatically. That is a complicated task to program and test, though, so you're most likely to succeed by not having any other CUDA or OpenCL toolkits installed.
Have you tried both the Debian package and the .run-file installation methods? Does using the "server" variant of the driver instead of the regular version help?
Re: CUDA: Not detected. Linux Debian Stable
I'll try those suggestions, gunnarre.
Bruce, yes, this is apparently a complicated issue but it is not made any easier by the FAH code right now. I've been trying things and, along with what gunnarre suggested, I'm guessing. I'm sure I'm not telling you anything new by saying that guessing is not exactly the most optimal way of debugging a complex situation and at this point I'm getting frustrated. I don't want to just throw in the towel and say "oh well, FAH gets what FAH gets" since this Linux box, and both of its video cards, exist only for FAH so I want it to work as best as it can!
I've set verbosity to 5 and it gives me no additional clues. None of the debug variables appear to be applicable to this problem, so can we get more info? Since getting CUDA to work is so "touchy", maybe a debug variable could be created for this case. When set, it would take the client only up to the point of determining whether CUDA is usable, describe every check the code makes in that section and the answer it gets back, and then quit. (The reason for quitting is that my debugging has been slowed down by not wanting to dump WUs that were already started, so as not to hurt the progress of the science. Depending upon what I'm going to try next, sometimes I have to set the slots to "finishing" and wait for them both to finish before doing the next install, like going back to Ubuntu 18.04.) Hopefully, step-by-step output at this level would show more readily where the problem is and whether our changes are getting us closer to a working setup. Right now, it is a works-or-not black box with no hint of whether we are getting closer.
Thanks.
Re: CUDA: Not detected. Linux Debian Stable
Yes, verbosity is a useless setting.
What I would do is FINISH any work in process ... remove any traces of the toolkit except for the driver ... remove FAHCore_22 (or all FAHCores) ... and start a single slot to reinstall a fresh copy of the FAHCore for the failing GPU. After that GPU is folding, allow the other slot to be unpaused.
That's based on the assumption that something changed in FAHCore_22 or that something is still pointing to the toolkit.
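Finding the downloaded core before removing it can be sketched in Python as below. It assumes the client's data directory is /var/lib/fahclient (the CWD shown in the log earlier). It also assumes the core sits below that directory in a folder named Core_22.fah, as the truncated path in the nvidia-smi process list suggests. The function only lists candidates. Delete them only after the work units have finished and the client is stopped.

```python
import os


def find_core_dirs(root="/var/lib/fahclient", core="Core_22.fah"):
    """Return every directory named `core` found anywhere below `root`."""
    hits = []
    for dirpath, dirnames, _files in os.walk(root):
        if core in dirnames:
            hits.append(os.path.join(dirpath, core))
            dirnames.remove(core)  # don't descend into the core directory itself
    return sorted(hits)
```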
FAH does NOT use server drivers. Folding@HOME is designed to use the standard GeForce video drivers on a home machine (either Studio or Game Ready, as far as I know).
What I would do is FINISH any work in process ... remove any traces of the toolkit except for the driver ... remove FAHCore_22 (or all FAHCores) ... and start a single slot to reinstall a fresh copy of the FAHCore for the failing GPU. After that GPU is folding, allow the other slot to be unpaused.
Taht's based on the assumption that something changed in FAHCore_22 or something is still pointing to the toolkit.
FAH does NOT use server drivers. Folding@HOME is designed to use video drivers standard Geforce drivers on a home machine (either Studio or game-ready, as far as I know)
Posting FAH's log:
How to provide enough info to get helpful support.
Re: CUDA: Not detected. Linux Debian Stable
Perhaps I'm getting closer to the issue. Looking more closely at the logs, I'm seeing this:
21:19:30:WU00:FS01:0x22:Version 0.0.13
21:19:31:WU00:FS01:0x22: Checkpoint write interval: 15000 steps (2%) [50 total]
21:19:31:WU00:FS01:0x22: JSON viewer frame write interval: 7500 steps (1%) [100 total]
21:19:31:WU00:FS01:0x22: XTC frame write interval: 250000 steps (33%) [3 total]
21:19:31:WU00:FS01:0x22: Global context and integrator variables write interval: disabled
21:19:31:WU00:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
21:19:31:WU00:FS01:0x22:Please consider upgrading your client version.
21:19:31:WU01:FS00:0x22: Checkpoint write interval: 15000 steps (2%) [50 total]
21:19:31:WU01:FS00:0x22: JSON viewer frame write interval: 7500 steps (1%) [100 total]
21:19:31:WU01:FS00:0x22: XTC frame write interval: 250000 steps (33%) [3 total]
21:19:31:WU01:FS00:0x22: Global context and integrator variables write interval: disabled
21:19:31:WU01:FS00:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
21:19:31:WU01:FS00:0x22:Please consider upgrading your client version.
The reason that I started digging into this "No -opencl-device specified... ...Please consider upgrading your client version" message is that over the course of the last week, I've probably reinstalled Ubuntu (in various flavors), the nvidia drivers (in various flavors) and FAH somewhere between a half dozen and a dozen times, the latest just six hours ago. The thing is that on EVERY INSTALL I tell the installer to blitz the disk and install everything fresh. This means that if I have an old version of some FAH component that needs upgrading, I got it again today.
Looking further to see what is causing this message, I found that the command starting the core is this:
21:19:30:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 704 -lifeline 983 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
and yes, sure enough, it is not specifying -opencl-device N. Other recent postings that include the log file do have it; for example, viewtopic.php?f=80&t=36961 from a few weeks ago has this command:
18:11:37:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\PC\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 4020 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
Now, of course, that is a Windows box where things can be different, so here is the command from viewtopic.php?f=80&t=36169:
12:38:39:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 14522 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
which is also recent enough to use the 22-0.0.13 core. Note that it has both the opencl-device and cuda-device parameters.
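Comparing these command lines by eye is error-prone, so here is a small sketch that parses a "Running FahCore:" line into flag/value pairs and lists which device flags are absent. It assumes POSIX shell quoting (Python's shlex), so the Windows line with backslashes isn't handled, and the helper names are made up. The parsed values also make it easy to see that the two Linux lines differ in -version (704 versus 706), which may be worth checking against the installed FAHClient version.

```python
# Check which device flags FAHClient passed to a core: a sketch that
# assumes POSIX-style quoting in the "Running FahCore:" log line.
import shlex

# Flags seen on the working command lines quoted in this thread.
DEVICE_FLAGS = ("opencl-platform", "opencl-device", "cuda-device", "gpu")


def core_args(log_line):
    """Parse '-flag value' pairs after 'Running FahCore:' into a dict."""
    _, _, cmd = log_line.partition("Running FahCore:")
    tokens = shlex.split(cmd)
    args = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("-"):
                args[tok[1:]] = nxt  # flag with a value, e.g. -gpu 1
                i += 2
                continue
            args[tok[1:]] = True  # bare flag without a value
        i += 1  # also skips the wrapper and core paths
    return args


def missing_device_flags(log_line):
    """Return the device flags that do not appear on the core's command line."""
    args = core_args(log_line)
    return [flag for flag in DEVICE_FLAGS if flag not in args]
```

For the line from my machine above it reports opencl-platform, opencl-device and cuda-device as missing; for the line from viewtopic.php?f=80&t=36169 it reports none.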
Now, what befuddles me is that my two video cards are being used for folding, via OpenCL. If the only problem were that whatever generates the core's argument list (FAHClient, I presume) isn't happy with my system's CUDA, why would it also leave opencl-device off the parameter list? I doubt that opencl-device is a default, or else I shouldn't be seeing the messages squawking about the deprecated -gpu option and needing to upgrade my client.
Out of curiosity, I added opencl-device and cuda-device to my slot 0's definition (it's a GPU slot, as I don't bother with CPU folding), and the log file shows that those lines were entirely ignored when folding restarted after a system reboot.
And yes, the CUDA toolkit was removed well before these tests. Earlier today I told the Ubuntu installer to go out and get proprietary drivers, to avoid the hassle of getting Ubuntu to start without X in order to run the .run file. It installed the non-server nvidia-driver-460, which nvidia-smi shows as having brought down CUDA 11.2.
Re: CUDA: Not detected. Linux Debian Stable
You're not looking at the beginning of FAH's log. FAHClient detects the hardware only once, when it initializes. On this system it looks like this; note that it detects two configurations for my GPU (CUDA and OpenCL) based on the drivers found at that time. You'll only find the CWD statement once.
08:59:30: CWD: C:\Users\Bruce\AppData\Roaming\FAHClient
08:59:30: Win32 Service: false
08:59:30: OS: Windows 10 Home
08:59:30: OS Arch: AMD64
08:59:30: GPUs: 1
08:59:30: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
08:59:30: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.2
08:59:30:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:461.92
08:59:30:***********************************************************************
It does not re-detect the hardware later in the run; it simply reports what the FAHCore sees, based on the parameter strings.
Your configuration will certainly be different. Was OpenCL available at that point?
When FAHCore_22 runs, it considers the four available platforms and, in my case, chooses CUDA. If CUDA had failed to initialize, it would have gone on to create an OpenCL platform on opencl-device 0 as a second choice for -gpu 0.
20:42:10:WU01:FS01:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" C:\Users\Bruce\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 6268 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
...
03:52:06:WU02:FS01:0x22:Version 0.0.13
03:52:07:WU02:FS01:0x22: Checkpoint write interval: 15000 steps (2%) [50 total]
03:52:07:WU02:FS01:0x22: JSON viewer frame write interval: 7500 steps (1%) [100 total]
03:52:07:WU02:FS01:0x22: XTC frame write interval: 250000 steps (33%) [3 total]
03:52:07:WU02:FS01:0x22: Global context and integrator variables write interval: disabled
03:52:08:WU02:FS01:0x22:There are 4 platforms available.
03:52:08:WU02:FS01:0x22:Platform 0: Reference
03:52:08:WU02:FS01:0x22:Platform 1: CPU
03:52:08:WU02:FS01:0x22:Platform 2: OpenCL
03:52:08:WU02:FS01:0x22: opencl-device 0 specified
03:52:08:WU02:FS01:0x22:Platform 3: CUDA
03:52:08:WU02:FS01:0x22: cuda-device 0 specified
03:52:55:WU02:FS01:0x22:Attempting to create CUDA context:
03:52:55:WU02:FS01:0x22: Configuring platform CUDA
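To pull out just the relevant lines (the startup detection block and the "Platform N:" list printed by FAHCore_22) without scrolling through the whole log, a minimal sketch like this may help. The patterns only match the line formats quoted in this thread, the summarize() name is made up, and the log location is an assumption: pass whatever path your install writes log.txt to on the command line.

```python
# Summarise GPU detection lines from a FAHClient log: a sketch whose
# patterns only cover the line formats quoted in this thread.
import re
import sys

# Startup detection lines, printed once when FAHClient initializes.
STARTUP = re.compile(r"(GPU \d+:.*|CUDA Device \d+:.*|OpenCL Device \d+:.*|CUDA: Not detected:.*)")
# Platform list printed by FAHCore_22 when a WU starts.
PLATFORM = re.compile(r"Platform (\d+): (\w+)")


def summarize(lines):
    """Collect startup detection lines and the core's platform list."""
    report = {"startup": [], "platforms": {}}
    for line in lines:
        m = STARTUP.search(line)
        if m:
            report["startup"].append(m.group(1).strip())
            continue
        m = PLATFORM.search(line)
        if m:
            report["platforms"][int(m.group(1))] = m.group(2)
    return report


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            result = summarize(log)
        for entry in result["startup"]:
            print(entry)
        for number, name in sorted(result["platforms"].items()):
            print(f"Platform {number}: {name}")
```

If the startup section shows "CUDA: Not detected" but no "OpenCL Device" line either, the core has nothing to choose from, whatever the slot options say.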