Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Fri May 02, 2025 1:20 am
by homeshark
NVIDIA GA100 [A100 PCIe 80GB] 10DE 20b5 - 10DE 1533

***************************** Folding@home Client ******************************
    Version: 8.4.9
     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
        Org: foldingathome.org
  Copyright: 2023-2024, foldingathome.org
   Homepage: https://foldingathome.org/
    License: GPL-3.0-or-later
        URL: https://v8-4.foldingathome.org/
       Date: Nov 20 2024
       Time: 14:47:19
   Revision: 360fe71b1bd05bb89814bfb97b73a5bda84802d6
     Branch: master
   Compiler: GNU 8.3.0
    Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
             -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
   Platform: linux 4.19.0-27-cloud-amd64
       Bits: 64
       Mode: Release
       Args: --info
************************************ CBang *************************************
    Version: 1.7.2
     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
        Org: Cauldron Development
  Copyright: Cauldron Development, 2003-2024
   Homepage: https://cauldrondevelopment.com/
    License: LGPL-2.1-or-later
       Date: Nov 19 2024
       Time: 21:54:38
   Revision: 443c54e909eb8d8994405a18fb328b5b05a623a5
     Branch: master
   Compiler: GNU 8.3.0
    Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
             -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
             -fPIC
   Platform: linux 4.19.0-27-cloud-amd64
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD EPYC 7V13 64-Core Processor
     CPU ID: AuthenticAMD Family 25 Model 1 Stepping 1
       CPUs: 24
     Memory: 216.26GiB
Free Memory: 198.73GiB
 OS Version: 6.8
Has Battery: false
 On Battery: false
   Hostname: FaHwUS000000
 UTC Offset: 0
        PID: 29184
        CWD: /opt/fahclient
       Exec: /opt/fahclient/fah-client
********************************************************************************

NVIDIA H100 NVL 10DE 2321 - 10DE 1839

***************************** Folding@home Client ******************************
    Version: 8.4.9
     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
        Org: foldingathome.org
  Copyright: 2023-2024, foldingathome.org
   Homepage: https://foldingathome.org/
    License: GPL-3.0-or-later
        URL: https://v8-4.foldingathome.org/
       Date: Nov 20 2024
       Time: 14:47:19
   Revision: 360fe71b1bd05bb89814bfb97b73a5bda84802d6
     Branch: master
   Compiler: GNU 8.3.0
    Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
             -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
   Platform: linux 4.19.0-27-cloud-amd64
       Bits: 64
       Mode: Release
       Args: --info
************************************ CBang *************************************
    Version: 1.7.2
     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
        Org: Cauldron Development
  Copyright: Cauldron Development, 2003-2024
   Homepage: https://cauldrondevelopment.com/
    License: LGPL-2.1-or-later
       Date: Nov 19 2024
       Time: 21:54:38
   Revision: 443c54e909eb8d8994405a18fb328b5b05a623a5
     Branch: master
   Compiler: GNU 8.3.0
    Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
             -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
             -fPIC
   Platform: linux 4.19.0-27-cloud-amd64
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD EPYC 9V84 96-Core Processor
     CPU ID: AuthenticAMD Family 25 Model 17 Stepping 1
       CPUs: 80
     Memory: 629.86GiB
Free Memory: 610.22GiB
 OS Version: 6.8
Has Battery: false
 On Battery: false
   Hostname: FaHwUS000001
 UTC Offset: 0
        PID: 29967
        CWD: /opt/fahclient
       Exec: /opt/fahclient/fah-client
********************************************************************************

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Fri May 02, 2025 8:07 pm
by muziqaz
They both have been added

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Sun May 04, 2025 6:00 pm
by homeshark
muziqaz wrote: Fri May 02, 2025 8:07 pm They both have been added
I may have made a mistake(?): what the system detects is different from what FaH detects.
It looks like FaH detects the same IDs for both cards.

Card #1
NVIDIA Corporation GA103 [10de:2321] (rev a1)

FaH detects
NVIDIA H100 NVL |PCI Vendor ID 0x1414 PCI Device ID 0xb111

Card #2
NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)

FaH detects
NVIDIA A100 80GB PCIe |PCI Vendor ID 0x1414 PCI Device ID 0xb111

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Sun May 04, 2025 6:15 pm
by muziqaz
No mistake. You are running some sort of Docker container or VM instance. That container or instance presents its own PCI vendor ID, which cannot be added to the whitelist. The whitelist only accepts actual vendor IDs, such as 1002 for AMD, 10DE for NVIDIA and 8086 for Intel.
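
A rough sketch of the vendor check described in this post. It is only an illustration: the accepted vendor IDs are the three named above, and the function name `vendor_is_whitelistable` is hypothetical, not taken from FAH.

```python
# Vendor IDs named in the post above as acceptable to the whitelist.
ACCEPTED_VENDORS = {0x1002: "AMD", 0x10DE: "NVIDIA", 0x8086: "Intel"}


def vendor_is_whitelistable(vendor_id: int) -> bool:
    """Return True if vendor_id is one of the real GPU vendor IDs."""
    return vendor_id in ACCEPTED_VENDORS


print(vendor_is_whitelistable(0x10DE))  # vendor ID lspci shows for both cards
print(vendor_is_whitelistable(0x1414))  # vendor ID FaH reports inside the VM
```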

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Sun May 04, 2025 10:28 pm
by toTOW
Vendor ID 0x1414 (or 5140 in the stupid decimal notation that FAH v8 keeps using) is Microsoft Corporation, so something is wrong in your setup ... :(

The Device IDs you provided are correct for the cards. I also added a few other models that had Device IDs around these and that were missing.
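
For anyone comparing the hexadecimal IDs from lspci with the decimal values shown by FAH v8, here is a minimal conversion sketch. The helper names `to_decimal` and `to_hex` are made up for illustration.

```python
def to_decimal(hex_id: str) -> int:
    """Convert a hex PCI ID such as '0x1414' or '10de' to decimal."""
    return int(hex_id, 16)


def to_hex(decimal_id: int) -> str:
    """Convert a decimal PCI ID back to 4-digit hex notation."""
    return f"0x{decimal_id:04x}"


print(to_decimal("0x1414"))  # 5140
print(to_hex(5140))          # 0x1414
```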

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Sun May 04, 2025 11:02 pm
by muziqaz
It's a VM with GPU passthrough. Not the first time I've seen this ID.

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Sun May 04, 2025 11:32 pm
by homeshark
toTOW wrote: Sun May 04, 2025 10:28 pm Vendor ID 0x1414 (or 5140 in the stupid decimal notation that FAH v8 keeps using) is Microsoft Corporation, so something is wrong in your setup ... :(

The Device IDs you provided are correct for the cards. I also added a few other models that had Device IDs around these and that were missing.
These are Microsoft Azure VMs running Ubuntu 22.04 LTS. It's the same install code that I use for the VMs with the T4 cards, with no issue. But who knows, maybe the infrastructure is slightly different for the higher-end A100 and H100 VMs. Will investigate 😞

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Mon May 05, 2025 6:04 pm
by homeshark
toTOW wrote: Sun May 04, 2025 10:28 pm Vendor ID 0x1414 (or 5140 in the stupid decimal notation that FAH v8 keeps using) is Microsoft Corporation, so something is wrong in your setup ... :(

The Device IDs you provided are correct for the cards. I also added a few other models that had Device IDs around these and that were missing.
muziqaz wrote: Sun May 04, 2025 11:02 pm It's a VM with GPU passthrough. Not the first time I've seen this ID.
I'm trying to dig further into this... for all intents and purposes, access to the physical GPU is there (verified by nvidia-smi and lspci -vvv -s 0001:00:00.0).

Where does FaH get its information for the PCI Vendor ID / Device ID? I'd like to query the same information on the supported (NVIDIA T4) and unsupported (H100 / A100) systems so I can see if there is a difference.

Additionally, I see the Microsoft vendor ID on all systems with the command lspci -vvv -s 0001:00:00.0 - Capabilities: [250 v1] Vendor Specific Information: ID=1414 Rev=1 Len=040 <?>

I haven't found the Microsoft device ID 0xb111 anywhere just yet.
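
One way to dump the vendor and device IDs on both kinds of VM for a side-by-side comparison is to read the `vendor` and `device` files that Linux exposes under `/sys/bus/pci/devices`. This is a sketch only. It is not a claim about how FAHClient reads the IDs internally, and the function name `read_pci_ids` is made up for illustration.

```python
from pathlib import Path


def read_pci_ids(sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Map each PCI address to its (vendor, device) ID strings from sysfs."""
    ids = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return ids  # not Linux, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except OSError:
            continue  # skip entries without readable ID files
        ids[dev.name] = (vendor, device)
    return ids


# Print one line per PCI device, e.g. address, vendor ID, device ID.
for address, (vendor, device) in read_pci_ids().items():
    print(address, vendor, device)
```

Running this on a T4 VM and on an A100 or H100 VM and diffing the output would show whether the kernel itself reports 0x10de or 0x1414 as the vendor for the GPU.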

Re: Plz add NVIDIA GA100 [A100 PCIe 80GB] | NVIDIA H100 NVL

Posted: Mon May 05, 2025 6:16 pm
by muziqaz
FAHClient gets its info the same way you just got yours. Forget 0xb111. The biggest problem is the emulated MS PCI ID 1414.