AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

tchiers
Posts: 23
Joined: Tue Oct 23, 2018 4:23 am

Post by tchiers »

I've got a system with an AMD 6750XT in it running Ubuntu 22.04. An AMDGPU driver update was released this week and I wanted to install it to see if it would include the ROCm OpenCL perf improvements seen on Windows earlier this year.

Instead, GPU folding stopped working altogether. :evil:
fahclient would find the GPU OK, download a work unit, and start it, only to choke immediately with BAD_WORK_UNIT. This repeated until the client auto-disabled the GPU slot.

This looked and felt like an OpenCL problem; OpenCL gave me trouble long ago when folding on a GPU on Linux. But this time OpenCL appeared to be working fine elsewhere in the system: clinfo ran normally, fahbench could find and use the GPU, and even the fahclient syslog reported the OpenCL device:

Code:

04:27:21:******************************* System ********************************
04:27:21:            CPU: AMD Ryzen 7 5800X 8-Core Processor
04:27:21:         CPU ID: AuthenticAMD Family 25 Model 33 Stepping 2
04:27:21:           CPUs: 16
04:27:21:         Memory: 31.27GiB
04:27:21:    Free Memory: 24.57GiB
04:27:21:        Threads: POSIX_THREADS
04:27:21:     OS Version: 5.15
04:27:21:    Has Battery: false
04:27:21:     On Battery: false
04:27:21:     UTC Offset: -6
04:27:21:            PID: 42255
04:27:21:            CWD: /var/lib/fahclient
04:27:21:             OS: Linux 5.15.0-56-generic x86_64
04:27:21:        OS Arch: AMD64
04:27:21:           GPUs: 1
04:27:21:          GPU 0: Bus:9 Slot:0 Func:0 AMD:6 Navi 22 XT-XL [Radeon RX
04:27:21:                 6700/6700XT/6800M]
04:27:21:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
04:27:21:                 libcuda.so: cannot open shared object file: No such file or
04:27:21:                 directory
04:27:21:OpenCL Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:2.0 Driver:3513.0

The telltale sign was further down in the log, where Core22 actually tries to start:

Code:

04:28:03:WU00:FS01:0x22:Project: 18909 (Run 37, Clone 4, Gen 31)
04:28:03:WU00:FS01:0x22:Reading tar file core.xml
04:28:03:WU00:FS01:0x22:Reading tar file integrator.xml
04:28:03:WU00:FS01:0x22:Reading tar file state.xml
04:28:03:WU00:FS01:0x22:Reading tar file system.xml
04:28:03:WU00:FS01:0x22:Digital signatures verified
04:28:03:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:28:03:WU00:FS01:0x22:Version 0.0.20
04:28:03:WU00:FS01:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
04:28:03:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
04:28:03:WU00:FS01:0x22:  XTC frame write interval: 25000 steps (2%) [50 total]
04:28:03:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
04:28:03:WU00:FS01:0x22:There are 2 platforms available.
04:28:03:WU00:FS01:0x22:Platform 0: Reference
04:28:03:WU00:FS01:0x22:Platform 1: CPU
04:28:03:WU00:FS01:0x22:opencl-device was set but OpenCL platform could not be found.
04:28:03:WU00:FS01:0x22:ERROR:126: Neither CUDA nor OpenCL is available.

It should have found:

Code:

02:27:17:WU01:FS01:0x22:There are 3 platforms available.
02:27:17:WU01:FS01:0x22:Platform 0: Reference
02:27:17:WU01:FS01:0x22:Platform 1: CPU
02:27:17:WU01:FS01:0x22:Platform 2: OpenCL

I traced the problem to OpenMM, and eventually figured it out.
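
The two excerpts above differ in whether Core22 lists an OpenCL platform at startup. As a rough sketch, that check can be run against a saved log. `core_sees_opencl` is a name made up for this example, not something fahclient provides, and the pattern is based only on the log lines quoted in this post.

```shell
#!/usr/bin/env bash
# core_sees_opencl: read log text on stdin.
# Exit status 0 if some line shows Core22 listing an OpenCL platform
# (for example "Platform 2: OpenCL"), non-zero otherwise.
core_sees_opencl() {
    grep -Eq 'Platform [0-9]+: OpenCL'
}
```

With the log in `/var/lib/fahclient/log.txt` (an assumed location, based on the CWD shown in the log), `core_sees_opencl < /var/lib/fahclient/log.txt && echo "OpenCL platform seen"` would report whether any core start in that file found OpenCL. A log that spans several attempts can contain both outcomes, so it is worth checking the most recent core start by eye as well.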

Core22 ships from Folding@Home work servers with a libstdc++.so.6 that only provides GLIBCXX_3.4.28, but libamdocl64.so (the AMD ROCm OpenCL implementation) requires GLIBCXX_3.4.29, so the OpenCL library apparently cannot load against the bundled copy.

Fortunately the system libstdc++ (/usr/lib/x86_64-linux-gnu/libstdc++.so.6) provides GLIBCXX_3.4.30, so it can just be swapped in.
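
A mismatch like this can be checked by listing the GLIBCXX version tags embedded in each library and comparing the highest one. This is a minimal sketch, assuming GNU `strings` (binutils), `grep -o`, and `sort -V` are installed. `max_glibcxx` is a helper name invented for this example.

```shell
#!/usr/bin/env bash
# max_glibcxx: read text on stdin and print the highest GLIBCXX_x.y[.z]
# tag found in it. Prints nothing if no tag is present.
max_glibcxx() {
    # -o extracts just the tags; sort -V orders 3.4.9 before 3.4.28.
    grep -oE 'GLIBCXX_[0-9]+(\.[0-9]+)+' | sort -uV | tail -n 1
}
```

For example, `strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | max_glibcxx` shows what the system library provides. Running the same pipeline on the libstdc++.so.6 inside the Core_22.fah directory shows what the core bundles, and on libamdocl64.so (wherever the driver installed it) it shows the newest version that library refers to. If that last value is higher than what the bundled copy provides, you are likely hitting the conflict described here.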

Workaround
Configure the GPU slot and enable it.
Let it fail and disable itself. This leaves the downloaded core in place, which is the directory the commands below modify.

Code:

cd /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/
sudo rm libstdc++.so.6
sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6

Re-enable the GPU slot and it should work.

This workaround will last until a new core version is released or something else clears your cores.foldingathome.org cache. Then you will need to apply it again in the new directory.
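
Since the workaround has to be reapplied whenever a new core directory appears, the relinking can be wrapped in a small function and rerun as needed. This is only a sketch, assuming GNU find (for `-lname`) and the paths used in this post. `relink_libstdcxx` is a name invented here and is not part of fahclient.

```shell
#!/usr/bin/env bash
# relink_libstdcxx CORES_ROOT SYSTEM_LIB
# Point every bundled libstdc++.so.6 under CORES_ROOT at SYSTEM_LIB and print
# each path that was changed. Copies already linked to SYSTEM_LIB are skipped.
relink_libstdcxx() {
    local cores_root="$1"
    local system_lib="$2"

    # Refuse to create dangling links if the system library is missing.
    if [ ! -e "$system_lib" ]; then
        echo "system library not found: $system_lib" >&2
        return 1
    fi

    # -lname matches symlinks by their target (GNU find extension);
    # ln -sfn replaces the existing file or link in place.
    find "$cores_root" -name 'libstdc++.so.6' ! -lname "$system_lib" \
        -exec ln -sfn "$system_lib" {} \; -print
}
```

For the setup above, the call would be `relink_libstdcxx /var/lib/fahclient/cores /usr/lib/x86_64-linux-gnu/libstdc++.so.6`, run as root from a script that defines the function. Note that it relinks the bundled libstdc++.so.6 of every core under that directory, not only Core22, so narrow the first argument if that is not what you want.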

Fix
The Folding@Home team needs to update the version of libstdc++ they are shipping with their workunits.

Oh, and does the new ROCm improve folding performance?
Estimated PPD 2110772
8-)
Last edited by tchiers on Mon Dec 19, 2022 1:34 am, edited 1 time in total.
toTOW
Site Moderator
Posts: 6497
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Post by toTOW »

I forwarded your post to FahCore and OpenMM devs.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Post by JohnChodera »

Thanks for the heads up, @tchiers and @toTOW!
We'll look into a solution.

~ John Chodera // MSKCC
jonorok
Posts: 5
Joined: Thu Sep 17, 2020 4:01 pm

Post by jonorok »

Thank you!

This fixed it for me, too.

RX 6700XT, ROCm 5.4, on Pop!_OS 22.04.
hashibs
Posts: 3
Joined: Thu Oct 20, 2022 6:26 pm

Post by hashibs »

Your troubleshooting has saved me a lot of frustration!

I'd been waiting some months to upgrade one of my folding PCs to Ubuntu 22.04, until AMD drivers for the RX 6700 XT were available and had "stabilized" a bit. I finally decided to do that today, and afterward was unpleasantly surprised to find the GPU idle and this error in the log:

Code:

17:56:18:WU01:FS01:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.

The amdgpu-install script didn't throw any errors, the output of clinfo seemed correct, and so did the config files, permissions, etc. I searched around, somehow found this page, and gave your steps a shot. Works great. Thanks for making it easier to contribute!
muziqaz
Posts: 2129
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Post by muziqaz »

OK, this is very interesting. However, it does not explain why I can fold with fahcore_22 and the shipped libs (old GLIBCXX) when I run fahcore_22 on its own. I only get no OpenCL platform when it is run through fahclient. Or does fahclient require a newer GLIBCXX?
I will try this workaround and see if it helps, and then give suggestions to the devs. This needs to be handled automatically.
FAH Omega tester
muziqaz
Posts: 2129
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Post by muziqaz »

OK, this is actually working. Incredible, and quite alarming, as this might happen again in the future, and only several failed WUs would alert us to start acting :(
SovietReimu1917
Posts: 2
Joined: Thu Sep 29, 2022 11:17 pm

Post by SovietReimu1917 »

Why does Core22 contain libstdc++.so.6?
I think it should use the system one, since it is a basic library installed on most systems.
muziqaz
Posts: 2129
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Post by muziqaz »

SovietReimu1917 wrote: Sat Sep 16, 2023 12:36 pm Why does Core22 contain libstdc++.so.6?
I think it should use the system one, since it is a basic library installed on most systems.

Nearly all libs in the fahcore folder are there just in case, so that users do not need to hunt for libs that are missing from their system for some reason :)
If FAH devs were sure that 100% of systems contained all the necessary libs, they would not package them together with fahcore :)
toTOW
Site Moderator
Posts: 6497
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Post by toTOW »

We had so many compatibility issues with those libs that are sometimes never updated by the user or the distribution that it's safer to provide them with the core ... :roll:
DarkFoss
Posts: 129
Joined: Fri Apr 16, 2010 11:43 pm
Hardware configuration: AMD 5800X3D Asus ROG Strix X570-E Gaming WiFi II bios 5031 G-Skill TridentZ Neo 3600mhz Asrock Tachi RX 7900XTX Corsair rm850x psu Asus PG32UQXR EK Elite 360 D-rgb aio Win 11pro/Kubuntu 2404.2 LTS Kernel 6.11.x HWE LowLatency UPS BX1500G
Location: Galifrey

Post by DarkFoss »

Thank you, this fixed it for me as well. Kubuntu 22.04 LTS, ROCm 5.7.0, ASRock 7900XTX.
As an aside, I downloaded FAHBench and did something similar, but copied libstdc++.so.6.0.30 into the folder, rm'ed the old libstdc++.so.6 that was inside, and made a new libstdc++.so.6 symlink that points to the copied libstdc++.so.6.0.30. FAHBench now works fine.
I'm thinking that doing the same within the core might be a better workaround. I'll give it a try after these WUs finish up.
L0nerism
Posts: 3
Joined: Sat Nov 22, 2014 10:07 am

Post by L0nerism »

After fighting with getting a Vega 64 on Debian Sid (ROCm v5.7.3) to fold Core 22 work units, I'm making this post to thank you for this workaround. :D
braiam
Posts: 16
Joined: Mon Mar 23, 2020 2:56 pm

Post by braiam »

This still happens with core 23. Core 24 however seems to not have issues.
muziqaz
Posts: 2129
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Post by muziqaz »

braiam wrote: Tue Sep 10, 2024 4:41 am This still happens with core 23. Core 24 however seems to not have issues.

Screenshots or it didn't happen ;)
braiam
Posts: 16
Joined: Mon Mar 23, 2020 2:56 pm

Post by braiam »

muziqaz wrote: Tue Sep 10, 2024 5:36 am
braiam wrote: Tue Sep 10, 2024 4:41 am This still happens with core 23. Core 24 however seems to not have issues.
Screenshots or it didn't happen ;)

Is there any way to force a core 23 work unit? Or for me to run a dummy job? I haven't found anything on the forums that would achieve this.