Only 2 of 4 GPUs detected (multi PCIe domain issue)
Posted: Mon Mar 30, 2026 9:12 am
Hi,
I'm running FAH v8.5.5 on an HPE DL380a Gen12 server with 4x NVIDIA H200 NVL GPUs. The client only detects 2 of the 4 GPUs.
System Info:
OS: Ubuntu (kernel 6.17)
CPU: Intel Xeon 6760P (256 threads)
RAM: 2TB
GPU: 4x NVIDIA H200 NVL
Driver: 580.95
FAH Client: v8.5.5
GPU topology (nvidia-smi topo -m):
Code:
            GPU0    GPU1    GPU2    GPU3
    GPU0     X      NV18    SYS     SYS
    GPU1    NV18     X      SYS     SYS
    GPU2    SYS     SYS      X      NV18
    GPU3    SYS     SYS     NV18     X
PCIe bus IDs (nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv):
Code:
    0, NVIDIA H200 NVL, 00000000:AC:00.0
    1, NVIDIA H200 NVL, 00000000:D4:00.0
    2, NVIDIA H200 NVL, 00000001:AC:00.0
    3, NVIDIA H200 NVL, 00000001:D4:00.0
GPU0 and GPU1 are on PCIe domain 00000000, while GPU2 and GPU3 are on domain 00000001. Both domains use the same bus numbers (AC and D4).
FAH log shows only 2 GPUs detected:
Code:
    "gpu:172:00:00": {
      "description": "NVIDIA H200 NVL",
      "uuid": "a03316c8-9bba-37a6-5a7e-af0e1a66797b",  ← GPU2
      ...
    },
    "gpu:212:00:00": {
      "description": "NVIDIA H200 NVL",
      "uuid": "4b963b13-f43a-2b27-c8c6-de3c9a2446af",  ← GPU3
      ...
    }
It appears that the GPU ID format gpu:BUS:DEV:FN does not include the PCIe domain. Since both domains have the same bus numbers (hex AC = 172, hex D4 = 212), the IDs for the two domains appear to collide, so the client only picks up one set of GPUs (domain 1, i.e. GPU2 and GPU3) and misses the other two (domain 0).
Would it be possible to include the PCIe domain in the GPU identifier (e.g., gpu:DOMAIN:BUS:DEV:FN) so that all GPUs are properly enumerated on multi-domain systems?
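To illustrate what I suspect is happening, here is a minimal Python sketch. It is not the FAH client's code: the helper names (parse_bus_id, key_without_domain, key_with_domain, enumerate_gpus) are made up for this post, and it assumes the GPUs are stored in a map keyed by the ID string, so that a later GPU with the same key silently replaces an earlier one. The bus IDs are the ones from the nvidia-smi output above.

```python
# Illustration only: hypothetical helpers, NOT the FAH client's implementation.
# Bus IDs as reported by nvidia-smi, in index order (GPU0..GPU3).
BUS_IDS = [
    "00000000:AC:00.0",
    "00000000:D4:00.0",
    "00000001:AC:00.0",
    "00000001:D4:00.0",
]


def parse_bus_id(bus_id):
    """Split 'DOMAIN:BUS:DEV.FN' (hex fields) into four integers."""
    domain, bus, dev_fn = bus_id.split(":")
    dev, fn = dev_fn.split(".")
    return int(domain, 16), int(bus, 16), int(dev, 16), int(fn, 16)


def key_without_domain(bus_id):
    """Build an ID in the gpu:BUS:DEV:FN shape seen in the log."""
    _, bus, dev, fn = parse_bus_id(bus_id)
    return f"gpu:{bus}:{dev:02d}:{fn:02d}"


def key_with_domain(bus_id):
    """Build an ID in the proposed gpu:DOMAIN:BUS:DEV:FN shape."""
    domain, bus, dev, fn = parse_bus_id(bus_id)
    return f"gpu:{domain}:{bus}:{dev:02d}:{fn:02d}"


def enumerate_gpus(bus_ids, make_key):
    """Map each ID to its GPU index; a duplicate key overwrites the earlier entry."""
    gpus = {}
    for index, bus_id in enumerate(bus_ids):
        gpus[make_key(bus_id)] = index
    return gpus


without_domain = enumerate_gpus(BUS_IDS, key_without_domain)
with_domain = enumerate_gpus(BUS_IDS, key_with_domain)

print(without_domain)  # two entries: only GPU2 and GPU3 survive
print(with_domain)     # four entries: one per GPU
```

Under that assumption, the domain-less keys leave only gpu:172:00:00 and gpu:212:00:00, pointing at GPU2 and GPU3, which matches what I see in the log. With the domain included, all four GPUs get distinct IDs.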
Thanks for your time!