Only 2 of 4 GPUs detected (multi PCIe domain issue)

Moderators: Site Moderators, FAHC Science Team

Mowd
Posts: 1
Joined: Mon Mar 30, 2026 9:06 am

Post by Mowd »

Hi,
I'm running FAH v8.5.5 on an HPE DL380a Gen12 server with 4x NVIDIA H200 NVL GPUs. The client only detects 2 of the 4 GPUs.
System Info:

OS: Ubuntu (kernel 6.17)
CPU: Intel Xeon 6760P (256 threads)
RAM: 2TB
GPU: 4x NVIDIA H200 NVL
Driver: 580.95
FAH Client: v8.5.5

GPU topology (nvidia-smi topo -m):

        GPU0    GPU1    GPU2    GPU3
GPU0     X      NV18    SYS     SYS
GPU1    NV18     X      SYS     SYS
GPU2    SYS     SYS      X      NV18
GPU3    SYS     SYS     NV18     X

PCIe bus IDs (nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv):

0, NVIDIA H200 NVL, 00000000:AC:00.0
1, NVIDIA H200 NVL, 00000000:D4:00.0
2, NVIDIA H200 NVL, 00000001:AC:00.0
3, NVIDIA H200 NVL, 00000001:D4:00.0

GPU0 and GPU1 are on PCIe domain 00000000, while GPU2 and GPU3 are on domain 00000001. Both domains use the same bus numbers (AC and D4, in hexadecimal).
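
For reference, here is a minimal Python sketch of how the bus ID strings above break down into domain, bus, device, and function. The helper name parse_bus_id is made up for illustration and is not part of FAH or nvidia-smi; it only assumes the DOMAIN:BUS:DEVICE.FUNCTION hex layout shown in the output above.

```python
def parse_bus_id(bus_id: str) -> tuple:
    """Split 'DOMAIN:BUS:DEVICE.FUNCTION' (hex fields) into four integers.

    Hypothetical helper for illustration only.
    """
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


# GPU0 and GPU2 from the listing above: same bus, different domain.
for bus_id in ("00000000:AC:00.0", "00000001:AC:00.0"):
    print(bus_id, "->", parse_bus_id(bus_id))
```

The two GPUs differ only in the first field, so any identifier that drops the domain cannot tell them apart.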
FAH log shows only 2 GPUs detected:

"gpu:172:00:00": {
    "description": "NVIDIA H200 NVL",
    "uuid": "a03316c8-9bba-37a6-5a7e-af0e1a66797b",   ← GPU2
    ...
},
"gpu:212:00:00": {
    "description": "NVIDIA H200 NVL",
    "uuid": "4b963b13-f43a-2b27-c8c6-de3c9a2446af",   ← GPU3
    ...
}

It appears that the GPU ID format gpu:BUS:DEV:FN does not include the PCIe domain. Since both domains have the same bus numbers (AC=172, D4=212), GPUs in different domains end up with identical IDs, so the client only lists one pair (GPU2 and GPU3, domain 1) and misses the other pair (GPU0 and GPU1, domain 0).
Would it be possible to include the PCIe domain in the GPU identifier (e.g., gpu:DOMAIN:BUS:DEV:FN) so that all GPUs are properly enumerated on multi-domain systems?
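
To illustrate the suspected collision, here is a rough Python sketch. It is not the FAH client's code: the function names, the dictionary-based enumeration, and the zero-padding of the domain field are all assumptions made for the example. It only shows that keys built without the domain collapse four GPUs into two, while keys that include the domain keep all four.

```python
# Bus IDs copied from the nvidia-smi output in this post.
BUS_IDS = [
    "00000000:AC:00.0",
    "00000000:D4:00.0",
    "00000001:AC:00.0",
    "00000001:D4:00.0",
]


def split_bus_id(bus_id):
    """Return (domain, bus, device, function) as integers (fields are hex)."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def key_without_domain(bus_id):
    """Build an ID in the gpu:BUS:DEV:FN style seen in the FAH log."""
    _, bus, device, function = split_bus_id(bus_id)
    return f"gpu:{bus:02d}:{device:02d}:{function:02d}"


def key_with_domain(bus_id):
    """Hypothetical gpu:DOMAIN:BUS:DEV:FN ID; the padding is an assumption."""
    domain, bus, device, function = split_bus_id(bus_id)
    return f"gpu:{domain:04d}:{bus:02d}:{device:02d}:{function:02d}"


def enumerate_gpus(make_key):
    """Map each ID to a GPU index; a repeated ID overwrites the earlier entry."""
    gpus = {}
    for index, bus_id in enumerate(BUS_IDS):
        gpus[make_key(bus_id)] = index
    return gpus


print(enumerate_gpus(key_without_domain))
print(enumerate_gpus(key_with_domain))
```

In this sketch the domain-less keys leave only gpu:172:00:00 and gpu:212:00:00, pointing at GPU2 and GPU3, which matches what the log shows. Whether the client loses the domain 0 GPUs through this exact mechanism is a guess on my part.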
Thanks for your time!
toTOW
Site Moderator
Posts: 6545
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Only 2 of 4 GPUs detected (multi PCIe domain issue)

Post by toTOW »

I have the impression that FAH Client doesn't support this specific PCIe mapping.

I asked the dev to have a look.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.