
7900 XTX on Debian Trixie - "Failed to pin bo" error crashes GPU WUs

Posted: Mon Jan 26, 2026 10:30 am
by Vejeta
Hardware: AMD Ryzen 9 7900X + AMD RX 7900 XTX
OS: Debian Trixie (testing)
Kernel: 6.17.13+deb14-amd64 (also tried Liquorix 6.18.7)
ROCm: 6.4.3 from Debian repos
F@H: v8.5.5

GPU is detected and shows "supported: true". CPU WUs work fine.
GPU WUs start but crash after a few seconds with:

From dmesg:
amdgpu: Failed to pin bo. ret -1
amdgpu: Failed to map wptr bo to GART

From F@H log:
Error initializing context: clCreateCommandQueue (-6)
ERROR:125: Failed to create a GPU-enabled OpenMM Context.

Already tried:
- libstdc++ symlink fix
- Different kernels (Debian 6.17, Liquorix 6.18)
- Disabling KDE compositor
- HSA_OVERRIDE_GFX_VERSION=10.3.0

@muziqaz - I believe I have read that you have 7900 XTX working. What kernel/ROCm version do you use and what tips do you suggest?

Any suggestions are welcome.

Re: 7900 XTX on Debian Trixie - "Failed to pin bo" error crashes GPU WUs

Posted: Mon Jan 26, 2026 4:57 pm
by muziqaz

Re: 7900 XTX on Debian Trixie - "Failed to pin bo" error crashes GPU WUs

Posted: Tue Jan 27, 2026 12:58 am
by Vejeta
Thanks muziqaz. Basically, I hadn't been fully following your guide before.

Extended:

Problem: GPU detected as "supported: true" but WUs crashed after several minutes with:
amdgpu: Failed to pin bo. ret -1
Error initializing context: clCreateCommandQueue (-6)
Core returned BAD_WORK_UNIT (114)
The root cause was twofold: Debian's ROCm packages conflicted with AMD's ROCm packages, which left two OpenCL platforms registered, and the GPU architecture was detected wrongly (gfx1030 instead of gfx1100).
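Both causes can be checked before changing anything. Below is a minimal sketch, assuming the ICD files live in /etc/OpenCL/vendors and that any override sits in a systemd drop-in directory for fah-client. The function names count_icds and find_hsa_overrides are made up for illustration and are not part of ROCm or F@H.

```shell
#!/bin/sh
# Hypothetical helpers; the directory is passed in so it can be adjusted.

# Print how many *.icd files a directory contains.
# Each ICD file normally registers one OpenCL platform.
count_icds() {
    find "$1" -maxdepth 1 -name '*.icd' 2>/dev/null | wc -l | tr -d ' '
}

# Print the files under a directory that mention HSA_OVERRIDE_GFX_VERSION.
find_hsa_overrides() {
    grep -rl 'HSA_OVERRIDE_GFX_VERSION' "$1" 2>/dev/null
}

# Example calls (paths as used in this post):
#   count_icds /etc/OpenCL/vendors
#   find_hsa_overrides /etc/systemd/system/fah-client.service.d
```

A count above 1 from the first call, or any file listed by the second, would match the two problems described here.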

---

Fix

1. Add AMD ROCm repo (Trixie uses Sequoia/sqv for GPG verification):
echo 'Apt::Key::gpgvcommand "/usr/bin/gpgv";' | sudo tee /etc/apt/apt.conf.d/99rocm-gpgv
wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/rocm.gpg
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.3 noble main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
2. Install AMD ROCm and remove Debian's conflicting packages:
sudo apt install rocm-opencl-runtime ocl-icd-opencl-dev
sudo apt remove rocm-opencl-icd
3. Remove duplicate ICD:
sudo rm /etc/OpenCL/vendors/amdocl64.icd
# Keep only amdocl64_60403_128.icd from AMD
4. Remove any HSA overrides (I had added this one myself, so it only applies to my case, but I'm documenting it anyway):
sudo rm /etc/systemd/system/fah-client.service.d/amd-fix.conf
# Or remove any HSA_OVERRIDE_GFX_VERSION lines
# My amd-fix.conf contained:
# [Service]
# Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"


5. Restart:
sudo systemctl daemon-reload
sudo systemctl restart fah-client
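If the drop-in from step 4 holds other settings worth keeping, the override line alone can be deleted instead of removing the whole file. This is a rough sketch using GNU sed; strip_hsa_override is a hypothetical helper name, and it needs to run as root on files under /etc.

```shell
#!/bin/sh
# Hypothetical helper: delete every line mentioning HSA_OVERRIDE_GFX_VERSION
# from the given file, keeping the original as <file>.bak.
strip_hsa_override() {
    sed -i.bak '/HSA_OVERRIDE_GFX_VERSION/d' "$1"
}

# Example call (as root), followed by the daemon-reload and restart from step 5:
#   strip_hsa_override /etc/systemd/system/fah-client.service.d/amd-fix.conf
```

systemd only reads drop-ins ending in .conf, so the .bak copy left next to the file is ignored.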
---

Now verify:
clinfo | grep -E "Number of platforms|Device Name"
# Should show: 1 platform, gfx1100
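The same check can be scripted so it fails loudly if the duplicate platform or the wrong architecture comes back after an update. This is a sketch that assumes clinfo prints "Number of platforms" and "Device Name" lines with the value as the last field; check_clinfo is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read clinfo output on stdin and succeed only if it
# reports exactly one platform and at least one device named gfx1100.
check_clinfo() {
    awk '
        /Number of platforms/ { platforms = $NF }
        /Device Name/ && $NF == "gfx1100" { found = 1 }
        END { exit !(platforms == 1 && found) }
    '
}

# Example call:
#   clinfo | check_clinfo && echo "OpenCL setup looks right"
```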
After several years of contributing with CPU only, I now have, for the first time, a GPU that can contribute a good number of Work Units.