Hardware: AMD Ryzen 9 7900X + AMD RX 7900 XTX
OS: Debian Trixie (testing)
Kernel: 6.17.13+deb14-amd64 (also tried Liquorix 6.18.7)
ROCm: 6.4.3 from Debian repos
F@H: v8.5.5
GPU is detected and shows "supported: true". CPU WUs work fine.
GPU WUs start but crash after a few seconds with:
From dmesg:
amdgpu: Failed to pin bo. ret -1
amdgpu: Failed to map wptr bo to GART
From F@H log:
Error initializing context: clCreateCommandQueue (-6)
ERROR:125: Failed to create a GPU-enabled OpenMM Context.
Already tried:
- libstdc++ symlink fix
- Different kernels (Debian 6.17, Liquorix 6.18)
- Disabling KDE compositor
- HSA_OVERRIDE_GFX_VERSION=10.3.0
@muziqaz - I believe I have read that you have a 7900 XTX working. What kernel/ROCm version do you use, and what tips do you suggest?
Any suggestion is welcome.
7900 XTX on Debian Trixie - "Failed to pin bo" error crashes GPU WUs
Re: 7900 XTX on Debian Trixie - "Failed to pin bo" error crashes GPU WUs
Thanks muziqaz, basically I wasn't fully following your guide before.
Extended:
Problem: GPU detected as "supported: true" but WUs crashed after several minutes with:
amdgpu: Failed to pin bo. ret -1
Error initializing context: clCreateCommandQueue (-6)
Core returned BAD_WORK_UNIT (114)
The root cause was twofold: a conflict between Debian's ROCm packages and AMD's ROCm packages, which created two OpenCL platforms, plus wrong GPU architecture detection (gfx1030 instead of gfx1100).
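A quick way to spot the "two OpenCL platforms" situation before changing anything is to look at which ICD files are registered. The sketch below is only an illustration: `list_icds` and `count_icds` are hypothetical helper names, and it assumes the ICD files live in /etc/OpenCL/vendors (the directory used in this thread) and that each registered .icd file typically ends up as one platform.

```shell
#!/bin/sh
# Hypothetical helper: print each registered ICD file and the library it points to.
# Assumption: ICD files live in /etc/OpenCL/vendors unless another directory is given.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for f in "$dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files
        [ -e "$f" ] || continue
        printf '%s -> %s\n' "${f##*/}" "$(head -n 1 "$f")"
    done
}

# Hypothetical helper: print how many ICD files are registered.
count_icds() {
    list_icds "$1" | wc -l | tr -d ' '
}
```

Running `list_icds` with no argument prints one line per registered ICD; if `count_icds` prints more than 1 on a machine with a single GPU vendor, a duplicate registration like the one described above is a likely suspect.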
---
Fix
1. Add AMD ROCm repo (Trixie uses Sequoia/sqv for GPG verification):
echo 'Apt::Key::gpgvcommand "/usr/bin/gpgv";' | sudo tee /etc/apt/apt.conf.d/99rocm-gpgv
wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/rocm.gpg
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.3 noble main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
2. Install AMD ROCm and remove Debian's conflicting packages:
sudo apt install rocm-opencl-runtime ocl-icd-opencl-dev
sudo apt remove rocm-opencl-icd
3. Remove duplicate ICD:
sudo rm /etc/OpenCL/vendors/amdocl64.icd
# Keep only amdocl64_60403_128.icd from AMD
4. Remove any HSA overrides (this is something I introduced myself, so it only applies to my case, but I am leaving it documented):
sudo rm /etc/systemd/system/fah-client.service.d/amd-fix.conf
# Or remove any HSA_OVERRIDE_GFX_VERSION lines
I had this in that file:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
5. Restart:
sudo systemctl daemon-reload
sudo systemctl restart fah-client
---
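If it is unclear whether an old HSA override is still lying around, the drop-in directory can be searched for it. This is a minimal sketch, not part of the original fix: `find_hsa_overrides` is a hypothetical helper name, and the default directory is simply the one from step 4 above.

```shell
#!/bin/sh
# Hypothetical helper: list systemd drop-in files that still set HSA_OVERRIDE_GFX_VERSION.
# Assumption: drop-ins for the client live in /etc/systemd/system/fah-client.service.d
# unless another directory is given as the first argument.
find_hsa_overrides() {
    dir="${1:-/etc/systemd/system/fah-client.service.d}"
    # Nothing to report when the drop-in directory does not exist
    [ -d "$dir" ] || return 0
    # -l prints only the names of matching files; errors from an empty glob are hidden
    grep -l 'HSA_OVERRIDE_GFX_VERSION' "$dir"/*.conf 2>/dev/null || true
}
```

An empty result from `find_hsa_overrides` means no drop-in in that directory sets the variable. The environment the service actually gets can also be inspected with `systemctl show -p Environment fah-client`.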
Now
clinfo | grep -E "Number of platforms|Device Name"
# Should show: 1 platform, gfx1100
After several years of contributing with CPU, I now have, for the first time, a GPU that can contribute a good number of Work Units.
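The clinfo check above can also be turned into a pass/fail test. The following is a rough sketch under some assumptions: `check_clinfo` is a hypothetical helper name, and it expects `clinfo` output where the platform count is the last field of the "Number of platforms" line and the architecture is the last field of a "Device Name" line, which is what the grep above relies on.

```shell
#!/bin/sh
# Hypothetical helper: read `clinfo` output on stdin, print "ok" and return 0 when
# exactly one platform is reported and a device with the wanted name is present.
# Assumption: the count and the device name are the last field of their lines.
check_clinfo() {
    want_arch="${1:-gfx1100}"
    awk -v arch="$want_arch" '
        /Number of platforms/ { platforms = $NF }
        /Device Name/         { if ($NF == arch) found = 1 }
        END {
            if (platforms == 1 && found) { print "ok"; exit 0 }
            printf "unexpected: platforms=%s, %s %s\n", platforms, arch, (found ? "found" : "not found")
            exit 1
        }'
}
```

Usage would be `clinfo | check_clinfo gfx1100`. A non-zero exit status means either more than one platform is still registered or the card is not being reported under the expected name.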