AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict
Posted: Sat Dec 17, 2022 3:11 am
I've got a system with an AMD 6750XT in it running Ubuntu 22.04. An AMDGPU driver update was released this week and I wanted to install it to see if it would include the ROCm OpenCL perf improvements seen on Windows earlier this year.
Instead, GPU folding stopped working altogether.
fahclient would find the GPU OK, download a work unit, and start it only to choke immediately with BAD_WORK_UNIT. Repeat until the client auto-disabled the GPU slot.
This looked and felt like an OpenCL problem - OpenCL gave me trouble long ago when folding on a GPU on Linux. BUT - this time OpenCL appeared to be working fine elsewhere in the system - clinfo ran normally, fahbench could find and use it, and even the fahclient syslog reported the OpenCL device at startup (OpenCL Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:2.0 Driver:3513.0).
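The telltale for this problem was further down in the log, when core22 tries to actually start: it listed only two platforms (Reference and CPU) and then failed with "ERROR:126: Neither CUDA nor OpenCL is available."
It should have found three platforms, with OpenCL listed as Platform 2.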
I traced the problem to OpenMM, and eventually figured it out.
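Core22 ships from Folding@Home work servers with its own libstdc++.so.6, which only goes up to GLIBCXX_3.4.28, but libamdocl64.so (the AMD ROCm OpenCL implementation) requires GLIBCXX_3.4.29. That would explain why OpenCL works for every other program on the system but not inside the core.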
Fortunately the system libstdc++ (/lib/x86_64-linux-gnu/libstdc++.so.6) is GLIBCXX_3.4.30, so it can just be swapped in.
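To check the version mismatch on your own system, something like the following may help. It is only a sketch: max_glibcxx is a hypothetical helper name (not part of FAH or ROCm), it assumes GNU grep and GNU sort, and it simply prints the newest GLIBCXX_x.y.z tag found in a library file. For libstdc++.so.6 that is the newest version it provides; for a library that merely links against libstdc++, such as libamdocl64.so, it should be the newest version it needs.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not shipped with FAH or ROCm):
# print the newest GLIBCXX_x.y.z version tag embedded in a shared library.
max_glibcxx() {
    # -a reads the binary as text, -o prints only the matching tags.
    # sort -V orders the tags as version numbers (GNU coreutils assumed),
    # so the last line is the newest one.
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V | tail -n 1
}

# Usage, with the paths from this post (the location of libamdocl64.so
# depends on how ROCm was installed, so it is not filled in here):
#   max_glibcxx /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/libstdc++.so.6
#   max_glibcxx /lib/x86_64-linux-gnu/libstdc++.so.6
```

Going by the versions reported above, the bundled copy should print GLIBCXX_3.4.28 and the system copy GLIBCXX_3.4.30.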
Workaround
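Configure the GPU slot, and enable it.
Let it fail and disable.
In the Core22 directory (/var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/ on my system), replace the bundled libstdc++.so.6 with a symlink to the system copy.
Re-enable the GPU slot and it should work.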
This workaround will last until a new core version is released or something else clears your cores.foldingathome.org cache. Then you will need to apply it to the new directory.
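Since the workaround has to be reapplied whenever a new core directory shows up, it could be scripted. The following is a minimal sketch, not a tested tool: relink_libstdcxx is a hypothetical helper name, and it assumes the core directories are always named Core_22.fah and that the client does not mind an extra libstdc++.so.6.bundled backup file sitting next to the symlink. Instead of deleting the bundled library it renames it, then links the system copy in its place. Directories that were already patched are skipped, because find only matches regular files, not symlinks.

```shell
#!/bin/sh
# Hypothetical helper (assumption): relink_libstdcxx CORES_ROOT SYSTEM_LIB
# Replaces every bundled Core_22.fah/libstdc++.so.6 below CORES_ROOT with a
# symlink to SYSTEM_LIB, keeping the original as libstdc++.so.6.bundled.
relink_libstdcxx() {
    cores_root=$1
    system_lib=$2
    # -type f skips symlinks, so an already patched directory is left alone.
    find "$cores_root" -type f -path '*/Core_22.fah/libstdc++.so.6' |
    while IFS= read -r bundled; do
        mv "$bundled" "$bundled.bundled" &&
        ln -s "$system_lib" "$bundled" &&
        echo "relinked $bundled"
    done
}

# Usage (run as root, since the directory belongs to the client):
#   relink_libstdcxx /var/lib/fahclient/cores /usr/lib/x86_64-linux-gnu/libstdc++.so.6
```

As with the manual steps, the GPU slot still has to be re-enabled afterwards.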
Fix
The Folding@Home team needs to update the version of libstdc++ they are shipping with their workunits.
Oh, and does the new ROCm improve folding performance?
Estimated PPD 2110772
fahclient log at startup, with the OpenCL device detected:
04:27:21:******************************* System ********************************
04:27:21: CPU: AMD Ryzen 7 5800X 8-Core Processor
04:27:21: CPU ID: AuthenticAMD Family 25 Model 33 Stepping 2
04:27:21: CPUs: 16
04:27:21: Memory: 31.27GiB
04:27:21: Free Memory: 24.57GiB
04:27:21: Threads: POSIX_THREADS
04:27:21: OS Version: 5.15
04:27:21: Has Battery: false
04:27:21: On Battery: false
04:27:21: UTC Offset: -6
04:27:21: PID: 42255
04:27:21: CWD: /var/lib/fahclient
04:27:21: OS: Linux 5.15.0-56-generic x86_64
04:27:21: OS Arch: AMD64
04:27:21: GPUs: 1
04:27:21: GPU 0: Bus:9 Slot:0 Func:0 AMD:6 Navi 22 XT-XL [Radeon RX
04:27:21: 6700/6700XT/6800M]
04:27:21: CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
04:27:21: libcuda.so: cannot open shared object file: No such file or
04:27:21: directory
04:27:21:OpenCL Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:2.0 Driver:3513.0
Core22 log from the failing work unit:
04:28:03:WU00:FS01:0x22:Project: 18909 (Run 37, Clone 4, Gen 31)
04:28:03:WU00:FS01:0x22:Reading tar file core.xml
04:28:03:WU00:FS01:0x22:Reading tar file integrator.xml
04:28:03:WU00:FS01:0x22:Reading tar file state.xml
04:28:03:WU00:FS01:0x22:Reading tar file system.xml
04:28:03:WU00:FS01:0x22:Digital signatures verified
04:28:03:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:28:03:WU00:FS01:0x22:Version 0.0.20
04:28:03:WU00:FS01:0x22: Checkpoint write interval: 62500 steps (5%) [20 total]
04:28:03:WU00:FS01:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
04:28:03:WU00:FS01:0x22: XTC frame write interval: 25000 steps (2%) [50 total]
04:28:03:WU00:FS01:0x22: Global context and integrator variables write interval: disabled
04:28:03:WU00:FS01:0x22:There are 2 platforms available.
04:28:03:WU00:FS01:0x22:Platform 0: Reference
04:28:03:WU00:FS01:0x22:Platform 1: CPU
04:28:03:WU00:FS01:0x22:opencl-device was set but OpenCL platform could not be found.
04:28:03:WU00:FS01:0x22:ERROR:126: Neither CUDA nor OpenCL is available.
What a working Core22 start should list instead:
02:27:17:WU01:FS01:0x22:There are 3 platforms available.
02:27:17:WU01:FS01:0x22:Platform 0: Reference
02:27:17:WU01:FS01:0x22:Platform 1: CPU
02:27:17:WU01:FS01:0x22:Platform 2: OpenCL
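The two log excerpts above can be told apart mechanically. This is a rough sketch under my own assumptions: core22_opencl_status is a hypothetical helper name, it only knows the two message formats quoted in this post, and it judges a log by whichever of the two appears last, so an old failure followed by a later success counts as OK.

```shell
#!/bin/sh
# Hypothetical helper (assumption): core22_opencl_status LOGFILE
# Prints opencl-missing, opencl-ok, or unknown, based on the last relevant
# Core22 line in the log.
core22_opencl_status() {
    last=$(grep -E 'Neither CUDA nor OpenCL is available|Platform [0-9]+: OpenCL' "$1" | tail -n 1)
    case $last in
        *'Neither CUDA nor OpenCL'*) echo opencl-missing ;;
        *'Platform '*': OpenCL'*)    echo opencl-ok ;;
        *)                           echo unknown ;;
    esac
}

# Usage: pass the path to your fahclient log file, for example
#   core22_opencl_status /path/to/fahclient/log.txt
```

A result of opencl-missing while clinfo still works is the combination described in this post.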
Commands for the workaround (replace the bundled libstdc++.so.6 with a symlink to the system copy):
cd /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/
sudo rm libstdc++.so.6
sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6