13434/13435 Stalled Work Units
Posted: Mon Jan 11, 2021 8:48 pm
I have received three WUs from projects 13434 and 13435 over the last couple of days. All of them eventually completed, but each one returned WU_STALLED two or three times during the run. I haven't seen this on any other work units over the last few months. All of these WUs were assigned to my RTX 2070 Super. The affected WUs, listed as project (run, clone, gen):
13434 (248, 1, 28)
13434 (104, 3, 45)
13435 (348, 1, 21)
Code:
18:48:28:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)
19:00:47:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
19:10:47:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
19:10:47:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
19:10:47:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
19:10:47:WU00:FS01:Starting
19:10:47:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\user\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 3568 -checkpoint 30 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
19:10:47:WU00:FS01:Started FahCore on PID 11696
19:10:47:WU00:FS01:Core PID:3988
19:10:47:WU00:FS01:FahCore 0x22 started
19:10:48:WU00:FS01:0x22:*********************** Log Started 2021-01-11T19:10:47Z ***********************
19:10:48:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:10:48:WU00:FS01:0x22: Core: Core22
19:10:48:WU00:FS01:0x22: Type: 0x22
19:10:48:WU00:FS01:0x22: Version: 0.0.13
19:10:48:WU00:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:10:48:WU00:FS01:0x22: Copyright: 2020 foldingathome.org
19:10:48:WU00:FS01:0x22: Homepage: https://foldingathome.org/
19:10:48:WU00:FS01:0x22: Date: Sep 19 2020
19:10:48:WU00:FS01:0x22: Time: 02:35:58
19:10:48:WU00:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
19:10:48:WU00:FS01:0x22: Branch: core22-0.0.13
19:10:48:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22: -DOPENMM_GIT_HASH="\"189320d0\""
19:10:48:WU00:FS01:0x22: Platform: win32 10
19:10:48:WU00:FS01:0x22: Bits: 64
19:10:48:WU00:FS01:0x22: Mode: Release
19:10:48:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:10:48:WU00:FS01:0x22: <peastman@stanford.edu>
19:10:48:WU00:FS01:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 11696 -checkpoint 30
19:10:48:WU00:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
19:10:48:WU00:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
19:10:48:WU00:FS01:0x22:************************************ libFAH ************************************
19:10:48:WU00:FS01:0x22: Date: Sep 7 2020
19:10:48:WU00:FS01:0x22: Time: 19:09:56
19:10:48:WU00:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
19:10:48:WU00:FS01:0x22: Branch: HEAD
19:10:48:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22: Platform: win32 10
19:10:48:WU00:FS01:0x22: Bits: 64
19:10:48:WU00:FS01:0x22: Mode: Release
19:10:48:WU00:FS01:0x22:************************************ CBang *************************************
19:10:48:WU00:FS01:0x22: Date: Sep 7 2020
19:10:48:WU00:FS01:0x22: Time: 19:08:30
19:10:48:WU00:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
19:10:48:WU00:FS01:0x22: Branch: HEAD
19:10:48:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22: Platform: win32 10
19:10:48:WU00:FS01:0x22: Bits: 64
19:10:48:WU00:FS01:0x22: Mode: Release
19:10:48:WU00:FS01:0x22:************************************ System ************************************
19:10:48:WU00:FS01:0x22: CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
19:10:48:WU00:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:10:48:WU00:FS01:0x22: CPUs: 12
19:10:48:WU00:FS01:0x22: Memory: 15.94GiB
19:10:48:WU00:FS01:0x22:Free Memory: 9.41GiB
19:10:48:WU00:FS01:0x22: Threads: WINDOWS_THREADS
19:10:48:WU00:FS01:0x22: OS Version: 6.2
19:10:48:WU00:FS01:0x22:Has Battery: false
19:10:48:WU00:FS01:0x22: On Battery: false
19:10:48:WU00:FS01:0x22: UTC Offset: -8
19:10:48:WU00:FS01:0x22: PID: 3988
19:10:48:WU00:FS01:0x22: CWD: C:\Users\user\AppData\Roaming\FAHClient\work
19:10:48:WU00:FS01:0x22:************************************ OpenMM ************************************
19:10:48:WU00:FS01:0x22: Revision: 189320d0
19:10:48:WU00:FS01:0x22:********************************************************************************
19:10:48:WU00:FS01:0x22:Project: 13434 (Run 104, Clone 3, Gen 45)
19:10:48:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
19:10:48:WU00:FS01:0x22:Digital signatures verified
19:10:48:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:10:48:WU00:FS01:0x22:Version 0.0.13
19:10:48:WU00:FS01:0x22: Checkpoint write interval: 250000 steps (5%) [20 total]
19:10:48:WU00:FS01:0x22: JSON viewer frame write interval: 50000 steps (1%) [100 total]
19:10:48:WU00:FS01:0x22: XTC frame write interval: 250000 steps (5%) [20 total]
19:10:48:WU00:FS01:0x22: Global context and integrator variables write interval: disabled
19:10:48:WU00:FS01:0x22:There are 4 platforms available.
19:10:48:WU00:FS01:0x22:Platform 0: Reference
19:10:48:WU00:FS01:0x22:Platform 1: CPU
19:10:48:WU00:FS01:0x22:Platform 2: OpenCL
19:10:48:WU00:FS01:0x22: opencl-device 0 specified
19:10:48:WU00:FS01:0x22:Platform 3: CUDA
19:10:48:WU00:FS01:0x22: cuda-device 0 specified
19:10:54:WU00:FS01:0x22:Attempting to create CUDA context:
19:10:54:WU00:FS01:0x22: Configuring platform CUDA
19:10:56:WU00:FS01:0x22: Using CUDA and gpu 0
19:10:56:WU00:FS01:0x22:Completed 3500000 out of 5000000 steps (70%)
19:13:35:WU00:FS01:0x22:Completed 3550000 out of 5000000 steps (71%)
19:16:15:WU00:FS01:0x22:Completed 3600000 out of 5000000 steps (72%)
19:18:54:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)
19:21:34:WU00:FS01:0x22:Completed 3700000 out of 5000000 steps (74%)
19:24:13:WU00:FS01:0x22:Completed 3750000 out of 5000000 steps (75%)
19:24:14:WU00:FS01:0x22:Checkpoint completed at step 3750000
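For anyone who wants to check their own client log for the same pattern, here is a minimal Python sketch. It assumes the log lines look like the ones pasted above; the regular expressions are based only on this excerpt, not on any documented log format, and `find_stalls` and `sample` are names I made up for illustration. It reports each WU_STALLED warning together with the last step reported before the stall and the step the core resumed from.

```python
import re

# Patterns inferred from the log excerpt in this post (an assumption,
# not a documented format).
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+:FS\d+):0x[0-9a-fA-F]+:Completed (\d+) out of \d+ steps"
)
STALLED = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:(WU\d+:FS\d+):FahCore returned: WU_STALLED"
)


def find_stalls(lines):
    """Return one record per WU_STALLED warning found in the given log lines."""
    last_step = {}  # slot -> last step count reported for that slot
    pending = {}    # slot -> stall record still waiting for a resume step
    stalls = []
    for line in lines:
        m = STALLED.match(line)
        if m:
            time, slot = m.groups()
            record = {
                "time": time,
                "slot": slot,
                "step_before": last_step.get(slot),
                "step_resumed": None,
                "steps_redone": None,
            }
            stalls.append(record)
            pending[slot] = record
            continue
        m = PROGRESS.match(line)
        if m:
            _, slot, step = m.groups()
            step = int(step)
            if slot in pending:
                # First progress line after the restart: the resume point.
                record = pending.pop(slot)
                record["step_resumed"] = step
                if record["step_before"] is not None:
                    record["steps_redone"] = record["step_before"] - step
            last_step[slot] = step
    return stalls


# Three lines copied from the log above.
sample = [
    "18:48:28:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)",
    "19:10:47:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)",
    "19:10:56:WU00:FS01:0x22:Completed 3500000 out of 5000000 steps (70%)",
]

for stall in find_stalls(sample):
    print(stall["time"], stall["slot"], "steps redone:", stall["steps_redone"])
```

On the three sample lines this reports one stall at 19:10:47, with the core going back from step 3650000 to step 3500000, so 150000 steps had to be redone.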
Code:
18:38:46:******************************* System ********************************
18:38:46: CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
18:38:46: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:38:46: CPUs: 12
18:38:46: Memory: 15.94GiB
18:38:46: Free Memory: 9.28GiB
18:38:46: Threads: WINDOWS_THREADS
18:38:46: OS Version: 6.2
18:38:46: Has Battery: false
18:38:46: On Battery: false
18:38:46: UTC Offset: -8
18:38:46: PID: 3568
18:38:46: CWD: C:\Users\user\AppData\Roaming\FAHClient
18:38:46: Win32 Service: false
18:38:46: OS: Windows 10 Enterprise
18:38:46: OS Arch: AMD64
18:38:46: GPUs: 2
18:38:46: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER]
18:38:46: 8218
18:38:46: GPU 1: Bus:2 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 970] 3494
18:38:46: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.2
18:38:46: CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:5.2 Driver:11.2
18:38:46:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:461.9
18:38:46:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:461.9