- running Windows 10 on an AMD Ryzen Threadripper 1950X (16C/32T, with 28 threads used for WUs)
- with 32 GB of memory and
- with 4x NVIDIA GeForce GTX 1080 Ti
- each with 11 GB of memory
- using the latest driver, 456.71
- all on hybrid cooling, so temperatures always stay between 41 °C and 58 °C,
- with FAH WUs usually keeping the GPUs between 60% and 90% busy, never really exceeding 90%. (In comparison, many BOINC projects I ran in the past kept them at 99-100% 24/7.)
- running the latest FAH client, 7.6.20
Thoughts?
If this is something that needs to be debugged, feel free to point me to instrumented EXEs/DLLs and/or instructions to give you what you need. This happens irregularly, yet fairly often, on this machine, so we should have detailed debug logs in short order.
Here is the log file:
******************************* Date: 2020-10-22 *******************************
16:01:45:WU01:FS04:Connecting to assign1.foldingathome.org:80
16:01:45:WU01:FS04:Assigned to work server 129.213.157.105
16:01:45:WU01:FS04:Requesting new work unit for slot 04: gpu:67:0 GP102 [GeForce GTX 1080 Ti] 11380 - RUNNING from 129.213.157.105
16:01:45:WU01:FS04:Connecting to 129.213.157.105:8080
16:01:49:WU01:FS04:Downloading 10.63MiB
16:01:55:WU01:FS04:Download 5.29%
16:02:01:WU01:FS04:Download 7.06%
16:02:07:WU01:FS04:Download 11.17%
16:02:14:WU01:FS04:Download 15.29%
16:02:20:WU01:FS04:Download 17.64%
16:02:26:WU01:FS04:Download 20.00%
16:02:32:WU01:FS04:Download 22.94%
16:02:38:WU01:FS04:Download 25.88%
16:02:44:WU01:FS04:Download 28.82%
16:02:50:WU01:FS04:Download 31.17%
16:02:56:WU01:FS04:Download 34.70%
16:03:03:WU01:FS04:Download 37.64%
16:03:10:WU01:FS04:Download 40.58%
16:03:17:WU01:FS04:Download 43.52%
16:03:23:WU01:FS04:Download 45.28%
16:03:30:WU01:FS04:Download 48.81%
16:03:36:WU01:FS04:Download 52.93%
16:03:42:WU01:FS04:Download 57.05%
16:03:48:WU01:FS04:Download 59.99%
16:03:54:WU01:FS04:Download 63.52%
16:04:01:WU01:FS04:Download 67.04%
16:04:08:WU01:FS04:Download 69.40%
16:04:15:WU01:FS04:Download 72.93%
16:04:21:WU01:FS04:Download 78.22%
16:04:27:WU01:FS04:Download 80.57%
16:04:34:WU01:FS04:Download 84.69%
16:04:40:WU01:FS04:Download 87.63%
16:04:47:WU01:FS04:Download 90.57%
16:04:53:WU01:FS04:Download 93.51%
16:04:59:WU01:FS04:Download 97.63%
16:05:02:WU01:FS04:Download complete
16:05:02:WU01:FS04:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14904 run:417 clone:0 gen:62 core:0x22 unit:0x0000005981d59d695f4ec9e340f2570a
16:05:02:WU01:FS04:Starting
16:05:02:WU01:FS04:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Master\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 13572 -checkpoint 5 -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu-vendor nvidia -gpu 3 -gpu-usage 100
16:05:02:WU01:FS04:Started FahCore on PID 14572
16:05:02:WU01:FS04:Core PID:11896
16:05:02:WU01:FS04:FahCore 0x22 started
16:05:02:WU01:FS04:0x22:*********************** Log Started 2020-10-22T16:05:02Z ***********************
16:05:02:WU01:FS04:0x22:*************************** Core22 Folding@home Core ***************************
16:05:03:WU01:FS04:0x22: Core: Core22
16:05:03:WU01:FS04:0x22: Type: 0x22
16:05:03:WU01:FS04:0x22: Version: 0.0.13
16:05:03:WU01:FS04:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:05:03:WU01:FS04:0x22: Copyright: 2020 foldingathome.org
16:05:03:WU01:FS04:0x22: Homepage: https://foldingathome.org/
16:05:03:WU01:FS04:0x22: Date: Sep 19 2020
16:05:03:WU01:FS04:0x22: Time: 02:35:58
16:05:03:WU01:FS04:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
16:05:03:WU01:FS04:0x22: Branch: core22-0.0.13
16:05:03:WU01:FS04:0x22: Compiler: Visual C++ 2015
16:05:03:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:05:03:WU01:FS04:0x22: -DOPENMM_GIT_HASH="\"189320d0\""
16:05:03:WU01:FS04:0x22: Platform: win32 10
16:05:03:WU01:FS04:0x22: Bits: 64
16:05:03:WU01:FS04:0x22: Mode: Release
16:05:03:WU01:FS04:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
16:05:03:WU01:FS04:0x22: <peastman@stanford.edu>
16:05:03:WU01:FS04:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 14572 -checkpoint 5
16:05:03:WU01:FS04:0x22: -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu-vendor
16:05:03:WU01:FS04:0x22: nvidia -gpu 3 -gpu-usage 100
16:05:03:WU01:FS04:0x22:************************************ libFAH ************************************
16:05:03:WU01:FS04:0x22: Date: Sep 7 2020
16:05:03:WU01:FS04:0x22: Time: 19:09:56
16:05:03:WU01:FS04:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
16:05:03:WU01:FS04:0x22: Branch: HEAD
16:05:03:WU01:FS04:0x22: Compiler: Visual C++ 2015
16:05:03:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:05:03:WU01:FS04:0x22: Platform: win32 10
16:05:03:WU01:FS04:0x22: Bits: 64
16:05:03:WU01:FS04:0x22: Mode: Release
16:05:03:WU01:FS04:0x22:************************************ CBang *************************************
16:05:03:WU01:FS04:0x22: Date: Sep 7 2020
16:05:03:WU01:FS04:0x22: Time: 19:08:30
16:05:03:WU01:FS04:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
16:05:03:WU01:FS04:0x22: Branch: HEAD
16:05:03:WU01:FS04:0x22: Compiler: Visual C++ 2015
16:05:03:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:05:03:WU01:FS04:0x22: Platform: win32 10
16:05:03:WU01:FS04:0x22: Bits: 64
16:05:03:WU01:FS04:0x22: Mode: Release
16:05:03:WU01:FS04:0x22:************************************ System ************************************
16:05:03:WU01:FS04:0x22: CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
16:05:03:WU01:FS04:0x22: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
16:05:03:WU01:FS04:0x22: CPUs: 32
16:05:03:WU01:FS04:0x22: Memory: 31.88GiB
16:05:03:WU01:FS04:0x22:Free Memory: 25.04GiB
16:05:03:WU01:FS04:0x22: Threads: WINDOWS_THREADS
16:05:03:WU01:FS04:0x22: OS Version: 6.2
16:05:03:WU01:FS04:0x22:Has Battery: false
16:05:03:WU01:FS04:0x22: On Battery: false
16:05:03:WU01:FS04:0x22: UTC Offset: -7
16:05:03:WU01:FS04:0x22: PID: 11896
16:05:03:WU01:FS04:0x22: CWD: C:\Users\Master\AppData\Roaming\FAHClient\work
16:05:03:WU01:FS04:0x22:************************************ OpenMM ************************************
16:05:03:WU01:FS04:0x22: Revision: 189320d0
16:05:03:WU01:FS04:0x22:********************************************************************************
16:05:03:WU01:FS04:0x22:Project: 14904 (Run 417, Clone 0, Gen 62)
16:05:03:WU01:FS04:0x22:Unit: 0x0000005981d59d695f4ec9e340f2570a
16:05:03:WU01:FS04:0x22:Reading tar file core.xml
16:05:03:WU01:FS04:0x22:Reading tar file integrator.xml
16:05:03:WU01:FS04:0x22:Reading tar file state.xml
16:05:03:WU01:FS04:0x22:Reading tar file system.xml
16:05:05:WU01:FS04:0x22:Digital signatures verified
16:05:05:WU01:FS04:0x22:Folding@home GPU Core22 Folding@home Core
16:05:05:WU01:FS04:0x22:Version 0.0.13
16:05:05:WU01:FS04:0x22: Checkpoint write interval: 100000 steps (5%) [20 total]
16:05:05:WU01:FS04:0x22: JSON viewer frame write interval: 20000 steps (1%) [100 total]
16:05:05:WU01:FS04:0x22: XTC frame write interval: 50000 steps (2.5%) [40 total]
16:05:05:WU01:FS04:0x22: Global context and integrator variables write interval: disabled
16:05:05:WU01:FS04:0x22:There are 4 platforms available.
16:05:05:WU01:FS04:0x22:Platform 0: Reference
16:05:05:WU01:FS04:0x22:Platform 1: CPU
16:05:05:WU01:FS04:0x22:Platform 2: OpenCL
16:05:05:WU01:FS04:0x22: opencl-device 3 specified
16:05:05:WU01:FS04:0x22:Platform 3: CUDA
16:05:05:WU01:FS04:0x22: cuda-device 3 specified
16:05:17:WU01:FS04:0x22:Attempting to create CUDA context:
16:05:17:WU01:FS04:0x22: Configuring platform CUDA
16:05:23:WU01:FS04:0x22: Using CUDA and gpu 3
16:05:23:WU01:FS04:0x22:Completed 0 out of 2000000 steps (0%)
16:05:24:WU01:FS04:0x22:Checkpoint completed at step 0
16:06:45:WU01:FS04:0x22:An exception occurred at step 18957: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)
16:06:45:WU01:FS04:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
16:06:45:WU01:FS04:0x22:Folding@home Core Shutdown: CORE_RESTART
17:33:52:WARNING:WU01:FS04:FahCore returned an unknown error code which probably indicates that it crashed
17:33:52:WARNING:WU01:FS04:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
17:33:53:WU01:FS04:Starting
17:33:53:WU01:FS04:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Master\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 13572 -checkpoint 5 -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu-vendor nvidia -gpu 3 -gpu-usage 100
17:33:53:WU01:FS04:Started FahCore on PID 4460
17:33:53:WU01:FS04:Core PID:4012
17:33:53:WU01:FS04:FahCore 0x22 started
17:33:53:WU01:FS04:0x22:*********************** Log Started 2020-10-22T17:33:53Z ***********************
17:33:53:WU01:FS04:0x22:*************************** Core22 Folding@home Core ***************************
17:33:53:WU01:FS04:0x22: Core: Core22
17:33:53:WU01:FS04:0x22: Type: 0x22
17:33:53:WU01:FS04:0x22: Version: 0.0.13
17:33:53:WU01:FS04:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:33:53:WU01:FS04:0x22: Copyright: 2020 foldingathome.org
17:33:53:WU01:FS04:0x22: Homepage: https://foldingathome.org/
17:33:53:WU01:FS04:0x22: Date: Sep 19 2020
17:33:53:WU01:FS04:0x22: Time: 02:35:58
17:33:53:WU01:FS04:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
17:33:53:WU01:FS04:0x22: Branch: core22-0.0.13
17:33:53:WU01:FS04:0x22: Compiler: Visual C++ 2015
17:33:53:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:53:WU01:FS04:0x22: -DOPENMM_GIT_HASH="\"189320d0\""
17:33:53:WU01:FS04:0x22: Platform: win32 10
17:33:53:WU01:FS04:0x22: Bits: 64
17:33:53:WU01:FS04:0x22: Mode: Release
17:33:53:WU01:FS04:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:33:53:WU01:FS04:0x22: <peastman@stanford.edu>
17:33:53:WU01:FS04:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 4460 -checkpoint 5
17:33:53:WU01:FS04:0x22: -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu-vendor
17:33:53:WU01:FS04:0x22: nvidia -gpu 3 -gpu-usage 100
17:33:53:WU01:FS04:0x22:************************************ libFAH ************************************
17:33:53:WU01:FS04:0x22: Date: Sep 7 2020
17:33:53:WU01:FS04:0x22: Time: 19:09:56
17:33:53:WU01:FS04:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:33:53:WU01:FS04:0x22: Branch: HEAD
17:33:53:WU01:FS04:0x22: Compiler: Visual C++ 2015
17:33:53:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:53:WU01:FS04:0x22: Platform: win32 10
17:33:53:WU01:FS04:0x22: Bits: 64
17:33:53:WU01:FS04:0x22: Mode: Release
17:33:53:WU01:FS04:0x22:************************************ CBang *************************************
17:33:53:WU01:FS04:0x22: Date: Sep 7 2020
17:33:53:WU01:FS04:0x22: Time: 19:08:30
17:33:53:WU01:FS04:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:33:53:WU01:FS04:0x22: Branch: HEAD
17:33:53:WU01:FS04:0x22: Compiler: Visual C++ 2015
17:33:53:WU01:FS04:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:53:WU01:FS04:0x22: Platform: win32 10
17:33:53:WU01:FS04:0x22: Bits: 64
17:33:53:WU01:FS04:0x22: Mode: Release
17:33:53:WU01:FS04:0x22:************************************ System ************************************
17:33:53:WU01:FS04:0x22: CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
17:33:53:WU01:FS04:0x22: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
17:33:53:WU01:FS04:0x22: CPUs: 32
17:33:53:WU01:FS04:0x22: Memory: 31.88GiB
17:33:53:WU01:FS04:0x22:Free Memory: 25.61GiB
17:33:53:WU01:FS04:0x22: Threads: WINDOWS_THREADS
17:33:53:WU01:FS04:0x22: OS Version: 6.2
17:33:53:WU01:FS04:0x22:Has Battery: false
17:33:53:WU01:FS04:0x22: On Battery: false
17:33:53:WU01:FS04:0x22: UTC Offset: -7
17:33:53:WU01:FS04:0x22: PID: 4012
17:33:53:WU01:FS04:0x22: CWD: C:\Users\Master\AppData\Roaming\FAHClient\work
17:33:53:WU01:FS04:0x22:************************************ OpenMM ************************************
17:33:53:WU01:FS04:0x22: Revision: 189320d0
17:33:53:WU01:FS04:0x22:********************************************************************************
17:33:54:WU01:FS04:0x22:Project: 14904 (Run 417, Clone 0, Gen 62)
17:33:54:WU01:FS04:0x22:Unit: 0x0000005981d59d695f4ec9e340f2570a
17:33:54:WU01:FS04:0x22:Digital signatures verified
17:33:54:WU01:FS04:0x22:Folding@home GPU Core22 Folding@home Core
17:33:54:WU01:FS04:0x22:Version 0.0.13
17:33:54:WU01:FS04:0x22: Checkpoint write interval: 100000 steps (5%) [20 total]
17:33:54:WU01:FS04:0x22: JSON viewer frame write interval: 20000 steps (1%) [100 total]
17:33:54:WU01:FS04:0x22: XTC frame write interval: 50000 steps (2.5%) [40 total]
17:33:54:WU01:FS04:0x22: Global context and integrator variables write interval: disabled
17:33:54:WU01:FS04:0x22:There are 4 platforms available.
17:33:54:WU01:FS04:0x22:Platform 0: Reference
17:33:54:WU01:FS04:0x22:Platform 1: CPU
17:33:54:WU01:FS04:0x22:Platform 2: OpenCL
17:33:54:WU01:FS04:0x22: opencl-device 3 specified
17:33:54:WU01:FS04:0x22:Platform 3: CUDA
17:33:54:WU01:FS04:0x22: cuda-device 3 specified
17:34:04:WU01:FS04:0x22:Attempting to create CUDA context:
17:34:04:WU01:FS04:0x22: Configuring platform CUDA
17:34:10:WU01:FS04:0x22: Using CUDA and gpu 3
17:34:10:WU01:FS04:0x22:Completed 0 out of 2000000 steps (0%)
17:35:25:WU01:FS04:0x22:Completed 20000 out of 2000000 steps (1%)
17:36:42:WU01:FS04:0x22:Completed 40000 out of 2000000 steps (2%)
17:37:52:WU01:FS04:0x22:Completed 60000 out of 2000000 steps (3%)
17:39:02:WU01:FS04:0x22:Completed 80000 out of 2000000 steps (4%)
17:40:11:WU01:FS04:0x22:Completed 100000 out of 2000000 steps (5%)
17:40:12:WU01:FS04:0x22:Checkpoint completed at step 100000
17:41:21:WU01:FS04:0x22:Completed 120000 out of 2000000 steps (6%)
17:42:30:WU01:FS04:0x22:Completed 140000 out of 2000000 steps (7%)
17:43:39:WU01:FS04:0x22:Completed 160000 out of 2000000 steps (8%)
17:44:53:WU01:FS04:0x22:Completed 180000 out of 2000000 steps (9%)
17:46:12:WU01:FS04:0x22:Completed 200000 out of 2000000 steps (10%)
17:46:13:WU01:FS04:0x22:Checkpoint completed at step 200000
17:47:33:WU01:FS04:0x22:Completed 220000 out of 2000000 steps (11%)
17:48:53:WU01:FS04:0x22:Completed 240000 out of 2000000 steps (12%)
17:50:12:WU01:FS04:0x22:Completed 260000 out of 2000000 steps (13%)
17:51:32:WU01:FS04:0x22:Completed 280000 out of 2000000 steps (14%)
17:52:51:WU01:FS04:0x22:Completed 300000 out of 2000000 steps (15%)
17:52:52:WU01:FS04:0x22:Checkpoint completed at step 300000
17:54:10:WU01:FS04:0x22:Completed 320000 out of 2000000 steps (16%)
17:55:28:WU01:FS04:0x22:Completed 340000 out of 2000000 steps (17%)
17:56:47:WU01:FS04:0x22:Completed 360000 out of 2000000 steps (18%)
17:58:06:WU01:FS04:0x22:Completed 380000 out of 2000000 steps (19%)
17:59:25:WU01:FS04:0x22:Completed 400000 out of 2000000 steps (20%)
17:59:26:WU01:FS04:0x22:Checkpoint completed at step 400000
18:00:35:WU01:FS04:0x22:Completed 420000 out of 2000000 steps (21%)
18:01:39:WU01:FS04:0x22:Completed 440000 out of 2000000 steps (22%)
18:02:39:WU01:FS04:0x22:Completed 460000 out of 2000000 steps (23%)
18:03:38:WU01:FS04:0x22:Completed 480000 out of 2000000 steps (24%)
18:04:38:WU01:FS04:0x22:Completed 500000 out of 2000000 steps (25%)
18:04:39:WU01:FS04:0x22:Checkpoint completed at step 500000
18:05:38:WU01:FS04:0x22:Completed 520000 out of 2000000 steps (26%)
18:06:37:WU01:FS04:0x22:Completed 540000 out of 2000000 steps (27%)
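For anyone who wants to pull the interesting events out of a log like the one above without reading it line by line, here is a minimal Python sketch. It is not part of FAHClient; the names `scan_log` and `to_unsigned32` are made up for illustration, the line format is assumed to be exactly what this log shows (`HH:MM:SS:` followed by the message), and the 10-minute stall threshold is an arbitrary choice. It collects the core exceptions, converts the signed return code to the unsigned hex form, and flags long silences between consecutive log lines.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the log above: "HH:MM:SS:<rest of message>".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")
# Matches e.g. "FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)".
RETURN_RE = re.compile(r"FahCore returned: \S+ \((-?\d+) = 0x[0-9a-fA-F]+\)")


def to_unsigned32(code: int) -> str:
    """Render a signed 32-bit process exit code as unsigned hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def scan_log(text: str, stall: timedelta = timedelta(minutes=10)):
    """Return (exceptions, crash_codes, stalls) for a single day's log text.

    stalls holds (previous_time, next_time, gap) for every pair of
    consecutive timestamped lines that are at least `stall` apart.
    Logs that cross midnight are not handled by this sketch.
    """
    exceptions, crash_codes, stalls = [], [], []
    previous = None
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # date banners and blank lines carry no timestamp
        stamp_text, message = match.groups()
        stamp = datetime.strptime(stamp_text, "%H:%M:%S")
        if previous is not None and stamp - previous >= stall:
            stalls.append((previous.strftime("%H:%M:%S"), stamp_text, stamp - previous))
        previous = stamp
        if "An exception occurred" in message:
            exceptions.append(message)
        returned = RETURN_RE.search(message)
        if returned:
            crash_codes.append(to_unsigned32(int(returned.group(1))))
    return exceptions, crash_codes, stalls
```

Run against the log above, `scan_log(open(path).read())` (with `path` pointing at wherever you saved the log) should report the `CUDA_ERROR_LAUNCH_FAILED` exception at step 18957, the return code `0xc0000409`, and one stall from 16:06:45 to 17:33:52. That is the roughly 87 minutes between the core announcing `CORE_RESTART` and the client noticing that it had exited.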