issue with GPUs on multiple PCI buses
Posted: Sun Dec 19, 2021 5:30 pm
On Windows F@H 7.6.21, Core22 seems to launch with parameters that make it run only on the GPU on the internal PCIe bus.
The host is a Windows 10 PC with an Nvidia GPU in an internal PCIe slot and discrete AMD GPUs connected via Thunderbolt interfaces.
The PCIe buses that represent the Thunderbolt interfaces show as detected in the log, but the parameters generated for Core22 don't seem to use this information.
Here is the log for the case where the only running job is on an AMD card (Folding Slot ID 2, PCIe bus 124, device 0), yet Core22 is using the Nvidia card that is defined as Folding Slot 0, bus 225, device 0. I confirmed with HWiNFO that the Nvidia GPU is the one doing the work.
*********************** Log Started 2021-12-19T17:00:26Z ***********************
17:00:26:******************************* libFAH ********************************
17:00:26: Date: Oct 20 2020
17:00:26: Time: 13:36:55
17:00:26: Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
17:00:26: Branch: master
17:00:26: Compiler: Visual C++ 2015
17:00:26: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26: Platform: win32 10
17:00:26: Bits: 32
17:00:26: Mode: Release
17:00:26:****************************** FAHClient ******************************
17:00:26: Version: 7.6.21
17:00:26: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:00:26: Copyright: 2020 foldingathome.org
17:00:26: Homepage: https://foldingathome.org/
17:00:26: Date: Oct 20 2020
17:00:26: Time: 13:41:04
17:00:26: Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
17:00:26: Branch: master
17:00:26: Compiler: Visual C++ 2015
17:00:26: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26: Platform: win32 10
17:00:26: Bits: 32
17:00:26: Mode: Release
17:00:26: Args: --open-web-control
17:00:26: Config: C:\ProgramData\FAHClient\config.xml
17:00:26:******************************** CBang ********************************
17:00:26: Date: Oct 20 2020
17:00:26: Time: 11:36:18
17:00:26: Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
17:00:26: Branch: master
17:00:26: Compiler: Visual C++ 2015
17:00:26: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26: Platform: win32 10
17:00:26: Bits: 32
17:00:26: Mode: Release
17:00:26:******************************* System ********************************
17:00:26: CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
17:00:26: CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:00:26: CPUs: 24
17:00:26: Memory: 31.84GiB
17:00:26: Free Memory: 26.63GiB
17:00:26: Threads: WINDOWS_THREADS
17:00:26: OS Version: 6.2
17:00:26: Has Battery: false
17:00:26: On Battery: false
17:00:26: UTC Offset: -5
17:00:26: PID: 10640
17:00:26: CWD: C:\ProgramData\FAHClient
17:00:26: Win32 Service: false
17:00:26: OS: Windows 10 Enterprise
17:00:26: OS Arch: AMD64
17:00:26: GPUs: 3
17:00:26: GPU 0: Bus:72 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26: GPU 1: Bus:225 Slot:0 Func:0 NVIDIA:8 GA102 [GeForce RTX 3090]
17:00:26: GPU 2: Bus:124 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26: CUDA Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:8.6 Driver:11.5
17:00:26:OpenCL Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:3.0 Driver:497.9
17:00:26:OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:OpenCL Device 2: Platform:1 Device:1 Bus:72 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:***********************************************************************
17:00:26:<config>
17:00:26: <!-- Folding Slot Configuration -->
17:00:26: <cause v='COVID_19'/>
17:00:26:
17:00:26: <!-- Network -->
17:00:26: <proxy v=':8080'/>
17:00:26:
17:00:26: <!-- User Information -->
17:00:26: <passkey v='*****'/>
17:00:26: <team v='234771'/>
17:00:26: <user v='atlr'/>
17:00:26:
17:00:26: <!-- Folding Slots -->
17:00:26: <slot id='1' type='GPU'>
17:00:26: <paused v='true'/>
17:00:26: <pci-bus v='72'/>
17:00:26: <pci-slot v='0'/>
17:00:26: </slot>
17:00:26: <slot id='0' type='GPU'>
17:00:26: <paused v='true'/>
17:00:26: <pci-bus v='225'/>
17:00:26: <pci-slot v='0'/>
17:00:26: </slot>
17:00:26: <slot id='2' type='GPU'>
17:00:26: <paused v='true'/>
17:00:26: <pci-bus v='124'/>
17:00:26: <pci-slot v='0'/>
17:00:26: </slot>
17:00:26:</config>
17:00:26:Trying to access database...
17:00:26:Successfully acquired database lock
17:00:26:FS01:Initialized folding slot 01: gpu:72:0 Vega 20 [Radeon VII] 13,284
17:00:26:FS00:Initialized folding slot 00: gpu:225:0 GA102 [GeForce RTX 3090]
17:00:26:FS02:Initialized folding slot 02: gpu:124:0 Vega 20 [Radeon VII] 13,284
17:01:27:Removing old file 'configs/config-20211219-162242.xml'
17:01:27:Saving configuration to config.xml
17:01:27:<config>
17:01:27: <!-- Folding Slot Configuration -->
17:01:27: <cause v='COVID_19'/>
17:01:27:
17:01:27: <!-- Network -->
17:01:27: <proxy v=':8080'/>
17:01:27:
17:01:27: <!-- Slot Control -->
17:01:27: <power v='FULL'/>
17:01:27:
17:01:27: <!-- User Information -->
17:01:27: <passkey v='*****'/>
17:01:27: <team v='234771'/>
17:01:27: <user v='atlr'/>
17:01:27:
17:01:27: <!-- Folding Slots -->
17:01:27: <slot id='1' type='GPU'>
17:01:27: <paused v='true'/>
17:01:27: <pci-bus v='72'/>
17:01:27: <pci-slot v='0'/>
17:01:27: </slot>
17:01:27: <slot id='0' type='GPU'>
17:01:27: <paused v='true'/>
17:01:27: <pci-bus v='225'/>
17:01:27: <pci-slot v='0'/>
17:01:27: </slot>
17:01:27: <slot id='2' type='GPU'>
17:01:27: <paused v='true'/>
17:01:27: <pci-bus v='124'/>
17:01:27: <pci-slot v='0'/>
17:01:27: </slot>
17:01:27:</config>
17:01:38:FS02:Unpaused
17:01:38:WU03:FS02:Starting
17:01:38:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 706 -lifeline 10640 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
17:01:38:WU03:FS02:Started FahCore on PID 5704
17:01:38:WU03:FS02:Core PID:12340
17:01:38:WU03:FS02:FahCore 0x22 started
17:01:39:WU03:FS02:0x22:*********************** Log Started 2021-12-19T17:01:38Z ***********************
17:01:39:WU03:FS02:0x22:*************************** Core22 Folding@home Core ***************************
17:01:39:WU03:FS02:0x22: Core: Core22
17:01:39:WU03:FS02:0x22: Type: 0x22
17:01:39:WU03:FS02:0x22: Version: 0.0.18
17:01:39:WU03:FS02:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:01:39:WU03:FS02:0x22: Copyright: 2020 foldingathome.org
17:01:39:WU03:FS02:0x22: Homepage: https://foldingathome.org/
17:01:39:WU03:FS02:0x22: Date: Sep 28 2021
17:01:39:WU03:FS02:0x22: Time: 05:55:05
17:01:39:WU03:FS02:0x22: Revision: cfe3d7d990e8f456e371f8ce63b5fcc6daab2103
17:01:39:WU03:FS02:0x22: Branch: HEAD
17:01:39:WU03:FS02:0x22: Compiler: Visual C++
17:01:39:WU03:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22: -DOPENMM_VERSION="\"7.6.0\""
17:01:39:WU03:FS02:0x22: Platform: win32 10
17:01:39:WU03:FS02:0x22: Bits: 64
17:01:39:WU03:FS02:0x22: Mode: Release
17:01:39:WU03:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:01:39:WU03:FS02:0x22: <peastman@stanford.edu>
17:01:39:WU03:FS02:0x22: Args: -dir 03 -suffix 01 -version 706 -lifeline 5704 -checkpoint 15
17:01:39:WU03:FS02:0x22: -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0
17:01:39:WU03:FS02:0x22: -gpu-usage 100
17:01:39:WU03:FS02:0x22:************************************ libFAH ************************************
17:01:39:WU03:FS02:0x22: Date: Sep 28 2021
17:01:39:WU03:FS02:0x22: Time: 05:53:43
17:01:39:WU03:FS02:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:01:39:WU03:FS02:0x22: Branch: HEAD
17:01:39:WU03:FS02:0x22: Compiler: Visual C++
17:01:39:WU03:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22: Platform: win32 10
17:01:39:WU03:FS02:0x22: Bits: 64
17:01:39:WU03:FS02:0x22: Mode: Release
17:01:39:WU03:FS02:0x22:************************************ CBang *************************************
17:01:39:WU03:FS02:0x22: Date: Sep 28 2021
17:01:39:WU03:FS02:0x22: Time: 05:52:38
17:01:39:WU03:FS02:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:01:39:WU03:FS02:0x22: Branch: HEAD
17:01:39:WU03:FS02:0x22: Compiler: Visual C++
17:01:39:WU03:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22: Platform: win32 10
17:01:39:WU03:FS02:0x22: Bits: 64
17:01:39:WU03:FS02:0x22: Mode: Release
17:01:39:WU03:FS02:0x22:************************************ System ************************************
17:01:39:WU03:FS02:0x22: CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
17:01:39:WU03:FS02:0x22: CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:01:39:WU03:FS02:0x22: CPUs: 24
17:01:39:WU03:FS02:0x22: Memory: 31.84GiB
17:01:39:WU03:FS02:0x22:Free Memory: 26.56GiB
17:01:39:WU03:FS02:0x22: Threads: WINDOWS_THREADS
17:01:39:WU03:FS02:0x22: OS Version: 6.2
17:01:39:WU03:FS02:0x22:Has Battery: false
17:01:39:WU03:FS02:0x22: On Battery: false
17:01:39:WU03:FS02:0x22: UTC Offset: -5
17:01:39:WU03:FS02:0x22: PID: 12340
17:01:39:WU03:FS02:0x22: CWD: C:\ProgramData\FAHClient\work
17:01:39:WU03:FS02:0x22:************************************ OpenMM ************************************
17:01:39:WU03:FS02:0x22: Version: 7.6.0
17:01:39:WU03:FS02:0x22:********************************************************************************
17:01:39:WU03:FS02:0x22:Project: 18201 (Run 2755, Clone 0, Gen 31)
17:01:39:WU03:FS02:0x22:Unit: 0x00000000000000000000000000000000
17:01:39:WU03:FS02:0x22:Digital signatures verified
17:01:39:WU03:FS02:0x22:Folding@home GPU Core22 Folding@home Core
17:01:39:WU03:FS02:0x22:Version 0.0.18
17:01:39:WU03:FS02:0x22: Checkpoint write interval: 25000 steps (2%) [50 total]
17:01:39:WU03:FS02:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
17:01:39:WU03:FS02:0x22: XTC frame write interval: 20000 steps (1.6%) [62 total]
17:01:39:WU03:FS02:0x22: Global context and integrator variables write interval: disabled
17:01:39:WU03:FS02:0x22:There are 3 platforms available.
17:01:39:WU03:FS02:0x22:Platform 0: Reference
17:01:39:WU03:FS02:0x22:Platform 1: CPU
17:01:39:WU03:FS02:0x22:Platform 2: OpenCL
17:01:39:WU03:FS02:0x22: opencl-device 0 specified
17:01:56:WU03:FS02:0x22:Attempting to create OpenCL context:
17:01:56:WU03:FS02:0x22: Configuring platform OpenCL
17:01:59:WU03:FS02:0x22: Using OpenCL on platformId 1 and gpu 0
17:01:59:WU03:FS02:0x22:Completed 50000 out of 1250000 steps (4%)
17:02:28:Removing old file 'configs/config-20211219-162343.xml'
17:02:28:Saving configuration to config.xml
17:02:28:<config>
17:02:28: <!-- Folding Slot Configuration -->
17:02:28: <cause v='COVID_19'/>
17:02:28:
17:02:28: <!-- Network -->
17:02:28: <proxy v=':8080'/>
17:02:28:
17:02:28: <!-- Slot Control -->
17:02:28: <power v='FULL'/>
17:02:28:
17:02:28: <!-- User Information -->
17:02:28: <passkey v='*****'/>
17:02:28: <team v='234771'/>
17:02:28: <user v='atlr'/>
17:02:28:
17:02:28: <!-- Folding Slots -->
17:02:28: <slot id='1' type='GPU'>
17:02:28: <paused v='true'/>
17:02:28: <pci-bus v='72'/>
17:02:28: <pci-slot v='0'/>
17:02:28: </slot>
17:02:28: <slot id='0' type='GPU'>
17:02:28: <paused v='true'/>
17:02:28: <pci-bus v='225'/>
17:02:28: <pci-slot v='0'/>
17:02:28: </slot>
17:02:28: <slot id='2' type='GPU'>
17:02:28: <pci-bus v='124'/>
17:02:28: <pci-slot v='0'/>
17:02:28: </slot>
17:02:28:</config>
17:03:06:WU03:FS02:0x22:Completed 62500 out of 1250000 steps (5%)
17:04:12:WU03:FS02:0x22:Completed 75000 out of 1250000 steps (6%)
17:04:13:WU03:FS02:0x22:Checkpoint completed at step 75000
17:05:19:WU03:FS02:0x22:Completed 87500 out of 1250000 steps (7%)
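One way to sanity-check the client side of this is to cross-reference three things the log already prints: the OpenCL device table, the bus each folding slot was initialized with, and the `-opencl-platform` / `-opencl-device` arguments on the `Running FahCore` line. The Python sketch below does that with regular expressions written against the line formats seen in this particular log. The function name `check_log` and the patterns are illustrative assumptions, not part of FAHClient, and other client versions may format these lines differently.

```python
import re

# Patterns are assumptions based on the line formats in the log above.
OPENCL_RE = re.compile(r"OpenCL Device \d+: Platform:(\d+) Device:(\d+) Bus:(\d+)")
SLOT_RE = re.compile(r"FS(\d+):Initialized folding slot \d+: gpu:(\d+):(\d+)")
CORE_RE = re.compile(
    r"WU\d+:FS(\d+):Running FahCore:.*?-opencl-platform (\d+) -opencl-device (\d+)"
)


def check_log(text):
    """Return (slot_id, configured_bus, bus_implied_by_args) per FahCore launch.

    configured_bus comes from the 'Initialized folding slot' line;
    bus_implied_by_args is looked up in the client's OpenCL device table
    using the platform/device indices passed to the core.
    Either value is None if the log did not contain the needed line.
    """
    opencl_bus = {}  # (platform, device) -> PCI bus
    slot_bus = {}    # folding slot id -> PCI bus
    results = []
    for line in text.splitlines():
        m = OPENCL_RE.search(line)
        if m:
            opencl_bus[(int(m.group(1)), int(m.group(2)))] = int(m.group(3))
            continue
        m = SLOT_RE.search(line)
        if m:
            slot_bus[int(m.group(1))] = int(m.group(2))
            continue
        m = CORE_RE.search(line)
        if m:
            slot = int(m.group(1))
            key = (int(m.group(2)), int(m.group(3)))
            results.append((slot, slot_bus.get(slot), opencl_bus.get(key)))
    return results


# Lines copied from the log above (raw string keeps the Windows backslashes).
SAMPLE = r"""
17:00:26:OpenCL Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:3.0 Driver:497.9
17:00:26:OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:OpenCL Device 2: Platform:1 Device:1 Bus:72 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:FS01:Initialized folding slot 01: gpu:72:0 Vega 20 [Radeon VII] 13,284
17:00:26:FS00:Initialized folding slot 00: gpu:225:0 GA102 [GeForce RTX 3090]
17:00:26:FS02:Initialized folding slot 02: gpu:124:0 Vega 20 [Radeon VII] 13,284
17:01:38:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 706 -lifeline 10640 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
"""

if __name__ == "__main__":
    for slot, configured, implied in check_log(SAMPLE):
        verdict = "match" if configured == implied else "MISMATCH"
        print(f"slot {slot}: configured bus {configured}, "
              f"args resolve to bus {implied} -> {verdict}")
```

For the excerpt above, this reports that slot 2 (bus 124) was launched with arguments that the client's own OpenCL table also maps to bus 124. That only checks the client's bookkeeping. It cannot tell which physical GPU Core22 ends up using, so a hardware monitor such as HWiNFO is still needed for that.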