Same issue as reported on 18201 here: viewtopic.php?f=19&t=37275
This WU also processed normally, but on completion the upload to 128.252.203.11 failed, the client retried on 128.252.203.2, got a WORK_QUIT (404), and the work unit was dumped.
14:04:22:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:18202 run:9152 clone:0 gen:4 core:0x22 unit:0x00000000000000040000471a000023c0
14:04:22:WU02:FS01:Starting
14:04:22:WU02:FS01:Running FahCore: /snap/folding-at-home-fcole90/168/usr/bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 706 -lifeline 2235 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:04:22:WU02:FS01:Started FahCore on PID 1233704
14:04:22:WU02:FS01:Core PID:1233708
14:04:22:WU02:FS01:FahCore 0x22 started
14:04:22:WU02:FS01:0x22:*********************** Log Started 2021-06-24T14:04:22Z ***********************
14:04:22:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:04:22:WU02:FS01:0x22: Core: Core22
14:04:22:WU02:FS01:0x22: Type: 0x22
14:04:22:WU02:FS01:0x22: Version: 0.0.13
14:04:22:WU02:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:04:22:WU02:FS01:0x22: Copyright: 2020 foldingathome.org
14:04:22:WU02:FS01:0x22: Homepage: https://foldingathome.org/
14:04:22:WU02:FS01:0x22: Date: Sep 19 2020
14:04:22:WU02:FS01:0x22: Time: 01:10:35
14:04:22:WU02:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
14:04:22:WU02:FS01:0x22: Branch: core22-0.0.13
14:04:22:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:04:22:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:04:22:WU02:FS01:0x22: -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
14:04:22:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
14:04:22:WU02:FS01:0x22: Bits: 64
14:04:22:WU02:FS01:0x22: Mode: Release
14:04:22:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
14:04:22:WU02:FS01:0x22: <peastman@stanford.edu>
14:04:22:WU02:FS01:0x22: Args: -dir 02 -suffix 01 -version 706 -lifeline 1233704 -checkpoint 15
14:04:22:WU02:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
14:04:22:WU02:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
14:04:22:WU02:FS01:0x22:************************************ libFAH ************************************
14:04:22:WU02:FS01:0x22: Date: Sep 15 2020
14:04:22:WU02:FS01:0x22: Time: 05:14:43
14:04:22:WU02:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
14:04:22:WU02:FS01:0x22: Branch: HEAD
14:04:22:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:04:22:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:04:22:WU02:FS01:0x22: -funroll-loops
14:04:22:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
14:04:22:WU02:FS01:0x22: Bits: 64
14:04:22:WU02:FS01:0x22: Mode: Release
14:04:22:WU02:FS01:0x22:************************************ CBang *************************************
14:04:22:WU02:FS01:0x22: Date: Sep 15 2020
14:04:22:WU02:FS01:0x22: Time: 05:11:04
14:04:22:WU02:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
14:04:22:WU02:FS01:0x22: Branch: HEAD
14:04:22:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:04:22:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:04:22:WU02:FS01:0x22: -funroll-loops -fPIC
14:04:22:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
14:04:22:WU02:FS01:0x22: Bits: 64
14:04:22:WU02:FS01:0x22: Mode: Release
14:04:22:WU02:FS01:0x22:************************************ System ************************************
14:04:22:WU02:FS01:0x22: CPU: AMD Ryzen 7 3800X 8-Core Processor
14:04:22:WU02:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
14:04:22:WU02:FS01:0x22: CPUs: 8
14:04:22:WU02:FS01:0x22: Memory: 15.60GiB
14:04:22:WU02:FS01:0x22:Free Memory: 13.01GiB
14:04:22:WU02:FS01:0x22: Threads: POSIX_THREADS
14:04:22:WU02:FS01:0x22: OS Version: 5.11
14:04:22:WU02:FS01:0x22:Has Battery: false
14:04:22:WU02:FS01:0x22: On Battery: false
14:04:22:WU02:FS01:0x22: UTC Offset: -6
14:04:22:WU02:FS01:0x22: PID: 1233708
14:04:22:WU02:FS01:0x22: CWD: /home/<redacted>/snap/folding-at-home-fcole90/common/work
14:04:22:WU02:FS01:0x22:************************************ OpenMM ************************************
14:04:22:WU02:FS01:0x22: Revision: 189320d0
14:04:22:WU02:FS01:0x22:********************************************************************************
14:04:22:WU02:FS01:0x22:Project: 18202 (Run 9152, Clone 0, Gen 4)
14:04:22:WU02:FS01:0x22:Unit: 0x00000000000000000000000000000000
14:04:22:WU02:FS01:0x22:Reading tar file core.xml
14:04:22:WU02:FS01:0x22:Reading tar file integrator.xml
14:04:22:WU02:FS01:0x22:Reading tar file state.xml
14:04:23:WU02:FS01:0x22:Reading tar file system.xml
14:04:23:WU02:FS01:0x22:Digital signatures verified
14:04:23:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:04:23:WU02:FS01:0x22:Version 0.0.13
14:04:23:WU02:FS01:0x22: Checkpoint write interval: 25000 steps (2%) [50 total]
14:04:23:WU02:FS01:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
14:04:23:WU02:FS01:0x22: XTC frame write interval: 20000 steps (1.6%) [62 total]
14:04:23:WU02:FS01:0x22: Global context and integrator variables write interval: disabled
14:04:23:WU02:FS01:0x22:There are 4 platforms available.
14:04:23:WU02:FS01:0x22:Platform 0: Reference
14:04:23:WU02:FS01:0x22:Platform 1: CPU
14:04:23:WU02:FS01:0x22:Platform 2: OpenCL
14:04:23:WU02:FS01:0x22: opencl-device 0 specified
14:04:23:WU02:FS01:0x22:Platform 3: CUDA
14:04:23:WU02:FS01:0x22: cuda-device 0 specified
14:04:30:WU02:FS01:0x22:Attempting to create CUDA context:
14:04:30:WU02:FS01:0x22: Configuring platform CUDA
14:04:37:WU02:FS01:0x22: Using CUDA and gpu 0
14:04:37:WU02:FS01:0x22:Completed 0 out of 1250000 steps (0%)
14:04:37:WU02:FS01:0x22:Checkpoint completed at step 0
14:06:16:WU02:FS01:0x22:Completed 12500 out of 1250000 steps (1%)
14:07:56:WU02:FS01:0x22:Completed 25000 out of 1250000 steps (2%)
14:07:57:WU02:FS01:0x22:Checkpoint completed at step 25000
14:09:37:WU02:FS01:0x22:Completed 37500 out of 1250000 steps (3%)
14:11:16:WU02:FS01:0x22:Completed 50000 out of 1250000 steps (4%)
14:11:17:WU02:FS01:0x22:Checkpoint completed at step 50000
14:12:57:WU02:FS01:0x22:Completed 62500 out of 1250000 steps (5%)
14:14:37:WU02:FS01:0x22:Completed 75000 out of 1250000 steps (6%)
14:14:38:WU02:FS01:0x22:Checkpoint completed at step 75000
14:16:18:WU02:FS01:0x22:Completed 87500 out of 1250000 steps (7%)
14:17:58:WU02:FS01:0x22:Completed 100000 out of 1250000 steps (8%)
14:17:59:WU02:FS01:0x22:Checkpoint completed at step 100000
14:19:39:WU02:FS01:0x22:Completed 112500 out of 1250000 steps (9%)
14:21:18:WU02:FS01:0x22:Completed 125000 out of 1250000 steps (10%)
14:21:20:WU02:FS01:0x22:Checkpoint completed at step 125000
14:22:59:WU02:FS01:0x22:Completed 137500 out of 1250000 steps (11%)
14:24:39:WU02:FS01:0x22:Completed 150000 out of 1250000 steps (12%)
14:24:40:WU02:FS01:0x22:Checkpoint completed at step 150000
14:26:20:WU02:FS01:0x22:Completed 162500 out of 1250000 steps (13%)
14:28:00:WU02:FS01:0x22:Completed 175000 out of 1250000 steps (14%)
14:28:01:WU02:FS01:0x22:Checkpoint completed at step 175000
14:29:41:WU02:FS01:0x22:Completed 187500 out of 1250000 steps (15%)
14:31:21:WU02:FS01:0x22:Completed 200000 out of 1250000 steps (16%)
14:31:22:WU02:FS01:0x22:Checkpoint completed at step 200000
14:33:01:WU02:FS01:0x22:Completed 212500 out of 1250000 steps (17%)
14:34:41:WU02:FS01:0x22:Completed 225000 out of 1250000 steps (18%)
14:34:42:WU02:FS01:0x22:Checkpoint completed at step 225000
14:36:22:WU02:FS01:0x22:Completed 237500 out of 1250000 steps (19%)
14:38:02:WU02:FS01:0x22:Completed 250000 out of 1250000 steps (20%)
14:38:03:WU02:FS01:0x22:Checkpoint completed at step 250000
14:39:42:WU02:FS01:0x22:Completed 262500 out of 1250000 steps (21%)
14:41:22:WU02:FS01:0x22:Completed 275000 out of 1250000 steps (22%)
14:41:23:WU02:FS01:0x22:Checkpoint completed at step 275000
14:43:03:WU02:FS01:0x22:Completed 287500 out of 1250000 steps (23%)
14:44:43:WU02:FS01:0x22:Completed 300000 out of 1250000 steps (24%)
14:44:44:WU02:FS01:0x22:Checkpoint completed at step 300000
14:46:24:WU02:FS01:0x22:Completed 312500 out of 1250000 steps (25%)
14:48:03:WU02:FS01:0x22:Completed 325000 out of 1250000 steps (26%)
14:48:04:WU02:FS01:0x22:Checkpoint completed at step 325000
14:49:44:WU02:FS01:0x22:Completed 337500 out of 1250000 steps (27%)
14:51:24:WU02:FS01:0x22:Completed 350000 out of 1250000 steps (28%)
14:51:25:WU02:FS01:0x22:Checkpoint completed at step 350000
14:53:04:WU02:FS01:0x22:Completed 362500 out of 1250000 steps (29%)
14:54:44:WU02:FS01:0x22:Completed 375000 out of 1250000 steps (30%)
14:54:45:WU02:FS01:0x22:Checkpoint completed at step 375000
14:56:25:WU02:FS01:0x22:Completed 387500 out of 1250000 steps (31%)
14:58:05:WU02:FS01:0x22:Completed 400000 out of 1250000 steps (32%)
14:58:06:WU02:FS01:0x22:Checkpoint completed at step 400000
14:59:45:WU02:FS01:0x22:Completed 412500 out of 1250000 steps (33%)
15:01:25:WU02:FS01:0x22:Completed 425000 out of 1250000 steps (34%)
15:01:26:WU02:FS01:0x22:Checkpoint completed at step 425000
15:03:06:WU02:FS01:0x22:Completed 437500 out of 1250000 steps (35%)
15:04:46:WU02:FS01:0x22:Completed 450000 out of 1250000 steps (36%)
15:04:47:WU02:FS01:0x22:Checkpoint completed at step 450000
15:06:26:WU02:FS01:0x22:Completed 462500 out of 1250000 steps (37%)
15:08:06:WU02:FS01:0x22:Completed 475000 out of 1250000 steps (38%)
15:08:07:WU02:FS01:0x22:Checkpoint completed at step 475000
15:09:47:WU02:FS01:0x22:Completed 487500 out of 1250000 steps (39%)
15:11:27:WU02:FS01:0x22:Completed 500000 out of 1250000 steps (40%)
15:11:28:WU02:FS01:0x22:Checkpoint completed at step 500000
15:13:08:WU02:FS01:0x22:Completed 512500 out of 1250000 steps (41%)
15:14:48:WU02:FS01:0x22:Completed 525000 out of 1250000 steps (42%)
15:14:49:WU02:FS01:0x22:Checkpoint completed at step 525000
15:16:29:WU02:FS01:0x22:Completed 537500 out of 1250000 steps (43%)
15:18:08:WU02:FS01:0x22:Completed 550000 out of 1250000 steps (44%)
15:18:10:WU02:FS01:0x22:Checkpoint completed at step 550000
15:19:49:WU02:FS01:0x22:Completed 562500 out of 1250000 steps (45%)
15:21:29:WU02:FS01:0x22:Completed 575000 out of 1250000 steps (46%)
15:21:30:WU02:FS01:0x22:Checkpoint completed at step 575000
15:23:10:WU02:FS01:0x22:Completed 587500 out of 1250000 steps (47%)
15:24:50:WU02:FS01:0x22:Completed 600000 out of 1250000 steps (48%)
15:24:51:WU02:FS01:0x22:Checkpoint completed at step 600000
15:26:30:WU02:FS01:0x22:Completed 612500 out of 1250000 steps (49%)
15:28:10:WU02:FS01:0x22:Completed 625000 out of 1250000 steps (50%)
15:28:11:WU02:FS01:0x22:Checkpoint completed at step 625000
15:29:51:WU02:FS01:0x22:Completed 637500 out of 1250000 steps (51%)
15:31:30:WU02:FS01:0x22:Completed 650000 out of 1250000 steps (52%)
15:31:31:WU02:FS01:0x22:Checkpoint completed at step 650000
15:33:11:WU02:FS01:0x22:Completed 662500 out of 1250000 steps (53%)
15:34:51:WU02:FS01:0x22:Completed 675000 out of 1250000 steps (54%)
15:34:52:WU02:FS01:0x22:Checkpoint completed at step 675000
15:36:32:WU02:FS01:0x22:Completed 687500 out of 1250000 steps (55%)
15:38:12:WU02:FS01:0x22:Completed 700000 out of 1250000 steps (56%)
15:38:13:WU02:FS01:0x22:Checkpoint completed at step 700000
15:39:52:WU02:FS01:0x22:Completed 712500 out of 1250000 steps (57%)
15:41:32:WU02:FS01:0x22:Completed 725000 out of 1250000 steps (58%)
15:41:33:WU02:FS01:0x22:Checkpoint completed at step 725000
15:43:13:WU02:FS01:0x22:Completed 737500 out of 1250000 steps (59%)
15:44:53:WU02:FS01:0x22:Completed 750000 out of 1250000 steps (60%)
15:44:54:WU02:FS01:0x22:Checkpoint completed at step 750000
15:46:33:WU02:FS01:0x22:Completed 762500 out of 1250000 steps (61%)
15:48:13:WU02:FS01:0x22:Completed 775000 out of 1250000 steps (62%)
15:48:14:WU02:FS01:0x22:Checkpoint completed at step 775000
15:49:54:WU02:FS01:0x22:Completed 787500 out of 1250000 steps (63%)
15:51:34:WU02:FS01:0x22:Completed 800000 out of 1250000 steps (64%)
15:51:35:WU02:FS01:0x22:An exception occurred at step 800000: Force RMSE error of 5.28262 with threshold of 5
15:51:35:WU02:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
15:51:35:WU02:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
15:51:35:WARNING:WU02:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
15:51:35:WU02:FS01:Starting
15:51:35:WU02:FS01:Running FahCore: /snap/folding-at-home-fcole90/168/usr/bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 706 -lifeline 2235 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
15:51:35:WU02:FS01:Started FahCore on PID 1283985
15:51:35:WU02:FS01:Core PID:1283989
15:51:35:WU02:FS01:FahCore 0x22 started
15:51:36:WU02:FS01:0x22:*********************** Log Started 2021-06-24T15:51:35Z ***********************
15:51:36:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
15:51:36:WU02:FS01:0x22: Core: Core22
15:51:36:WU02:FS01:0x22: Type: 0x22
15:51:36:WU02:FS01:0x22: Version: 0.0.13
15:51:36:WU02:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:51:36:WU02:FS01:0x22: Copyright: 2020 foldingathome.org
15:51:36:WU02:FS01:0x22: Homepage: https://foldingathome.org/
15:51:36:WU02:FS01:0x22: Date: Sep 19 2020
15:51:36:WU02:FS01:0x22: Time: 01:10:35
15:51:36:WU02:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
15:51:36:WU02:FS01:0x22: Branch: core22-0.0.13
15:51:36:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
15:51:36:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
15:51:36:WU02:FS01:0x22: -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
15:51:36:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
15:51:36:WU02:FS01:0x22: Bits: 64
15:51:36:WU02:FS01:0x22: Mode: Release
15:51:36:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
15:51:36:WU02:FS01:0x22: <peastman@stanford.edu>
15:51:36:WU02:FS01:0x22: Args: -dir 02 -suffix 01 -version 706 -lifeline 1283985 -checkpoint 15
15:51:36:WU02:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
15:51:36:WU02:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
15:51:36:WU02:FS01:0x22:************************************ libFAH ************************************
15:51:36:WU02:FS01:0x22: Date: Sep 15 2020
15:51:36:WU02:FS01:0x22: Time: 05:14:43
15:51:36:WU02:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
15:51:36:WU02:FS01:0x22: Branch: HEAD
15:51:36:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
15:51:36:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
15:51:36:WU02:FS01:0x22: -funroll-loops
15:51:36:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
15:51:36:WU02:FS01:0x22: Bits: 64
15:51:36:WU02:FS01:0x22: Mode: Release
15:51:36:WU02:FS01:0x22:************************************ CBang *************************************
15:51:36:WU02:FS01:0x22: Date: Sep 15 2020
15:51:36:WU02:FS01:0x22: Time: 05:11:04
15:51:36:WU02:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
15:51:36:WU02:FS01:0x22: Branch: HEAD
15:51:36:WU02:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
15:51:36:WU02:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
15:51:36:WU02:FS01:0x22: -funroll-loops -fPIC
15:51:36:WU02:FS01:0x22: Platform: linux2 4.19.76-linuxkit
15:51:36:WU02:FS01:0x22: Bits: 64
15:51:36:WU02:FS01:0x22: Mode: Release
15:51:36:WU02:FS01:0x22:************************************ System ************************************
15:51:36:WU02:FS01:0x22: CPU: AMD Ryzen 7 3800X 8-Core Processor
15:51:36:WU02:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
15:51:36:WU02:FS01:0x22: CPUs: 8
15:51:36:WU02:FS01:0x22: Memory: 15.60GiB
15:51:36:WU02:FS01:0x22:Free Memory: 12.81GiB
15:51:36:WU02:FS01:0x22: Threads: POSIX_THREADS
15:51:36:WU02:FS01:0x22: OS Version: 5.11
15:51:36:WU02:FS01:0x22:Has Battery: false
15:51:36:WU02:FS01:0x22: On Battery: false
15:51:36:WU02:FS01:0x22: UTC Offset: -6
15:51:36:WU02:FS01:0x22: PID: 1283989
15:51:36:WU02:FS01:0x22: CWD: /home/<redacted>/snap/folding-at-home-fcole90/common/work
15:51:36:WU02:FS01:0x22:************************************ OpenMM ************************************
15:51:36:WU02:FS01:0x22: Revision: 189320d0
15:51:36:WU02:FS01:0x22:********************************************************************************
15:51:36:WU02:FS01:0x22:Project: 18202 (Run 9152, Clone 0, Gen 4)
15:51:36:WU02:FS01:0x22:Unit: 0x00000000000000000000000000000000
15:51:36:WU02:FS01:0x22:Digital signatures verified
15:51:36:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
15:51:36:WU02:FS01:0x22:Version 0.0.13
15:51:36:WU02:FS01:0x22: Checkpoint write interval: 25000 steps (2%) [50 total]
15:51:36:WU02:FS01:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
15:51:36:WU02:FS01:0x22: XTC frame write interval: 20000 steps (1.6%) [62 total]
15:51:36:WU02:FS01:0x22: Global context and integrator variables write interval: disabled
15:51:36:WU02:FS01:0x22:There are 4 platforms available.
15:51:36:WU02:FS01:0x22:Platform 0: Reference
15:51:36:WU02:FS01:0x22:Platform 1: CPU
15:51:36:WU02:FS01:0x22:Platform 2: OpenCL
15:51:36:WU02:FS01:0x22: opencl-device 0 specified
15:51:36:WU02:FS01:0x22:Platform 3: CUDA
15:51:36:WU02:FS01:0x22: cuda-device 0 specified
15:51:43:WU02:FS01:0x22:Attempting to create CUDA context:
15:51:43:WU02:FS01:0x22: Configuring platform CUDA
15:51:47:WU02:FS01:0x22: Using CUDA and gpu 0
15:51:47:WU02:FS01:0x22:Completed 775000 out of 1250000 steps (62%)
15:53:27:WU02:FS01:0x22:Completed 787500 out of 1250000 steps (63%)
15:55:07:WU02:FS01:0x22:Completed 800000 out of 1250000 steps (64%)
15:55:08:WU02:FS01:0x22:Checkpoint completed at step 800000
15:56:48:WU02:FS01:0x22:Completed 812500 out of 1250000 steps (65%)
15:58:27:WU02:FS01:0x22:Completed 825000 out of 1250000 steps (66%)
15:58:29:WU02:FS01:0x22:Checkpoint completed at step 825000
16:00:08:WU02:FS01:0x22:Completed 837500 out of 1250000 steps (67%)
16:01:48:WU02:FS01:0x22:Completed 850000 out of 1250000 steps (68%)
16:01:49:WU02:FS01:0x22:Checkpoint completed at step 850000
16:03:29:WU02:FS01:0x22:Completed 862500 out of 1250000 steps (69%)
16:05:09:WU02:FS01:0x22:Completed 875000 out of 1250000 steps (70%)
16:05:10:WU02:FS01:0x22:Checkpoint completed at step 875000
16:06:50:WU02:FS01:0x22:Completed 887500 out of 1250000 steps (71%)
16:08:30:WU02:FS01:0x22:Completed 900000 out of 1250000 steps (72%)
16:08:31:WU02:FS01:0x22:Checkpoint completed at step 900000
16:10:11:WU02:FS01:0x22:Completed 912500 out of 1250000 steps (73%)
16:11:51:WU02:FS01:0x22:Completed 925000 out of 1250000 steps (74%)
16:11:52:WU02:FS01:0x22:Checkpoint completed at step 925000
16:13:31:WU02:FS01:0x22:Completed 937500 out of 1250000 steps (75%)
16:15:11:WU02:FS01:0x22:Completed 950000 out of 1250000 steps (76%)
16:15:12:WU02:FS01:0x22:Checkpoint completed at step 950000
16:16:52:WU02:FS01:0x22:Completed 962500 out of 1250000 steps (77%)
16:18:31:WU02:FS01:0x22:Completed 975000 out of 1250000 steps (78%)
16:18:33:WU02:FS01:0x22:Checkpoint completed at step 975000
16:20:12:WU02:FS01:0x22:Completed 987500 out of 1250000 steps (79%)
16:21:52:WU02:FS01:0x22:Completed 1000000 out of 1250000 steps (80%)
16:21:53:WU02:FS01:0x22:Checkpoint completed at step 1000000
16:23:33:WU02:FS01:0x22:Completed 1012500 out of 1250000 steps (81%)
16:25:12:WU02:FS01:0x22:Completed 1025000 out of 1250000 steps (82%)
16:25:13:WU02:FS01:0x22:Checkpoint completed at step 1025000
16:26:53:WU02:FS01:0x22:Completed 1037500 out of 1250000 steps (83%)
16:28:33:WU02:FS01:0x22:Completed 1050000 out of 1250000 steps (84%)
16:28:34:WU02:FS01:0x22:Checkpoint completed at step 1050000
16:30:14:WU02:FS01:0x22:Completed 1062500 out of 1250000 steps (85%)
16:31:53:WU02:FS01:0x22:Completed 1075000 out of 1250000 steps (86%)
16:31:54:WU02:FS01:0x22:Checkpoint completed at step 1075000
16:33:34:WU02:FS01:0x22:Completed 1087500 out of 1250000 steps (87%)
16:35:14:WU02:FS01:0x22:Completed 1100000 out of 1250000 steps (88%)
16:35:15:WU02:FS01:0x22:Checkpoint completed at step 1100000
16:36:55:WU02:FS01:0x22:Completed 1112500 out of 1250000 steps (89%)
16:38:34:WU02:FS01:0x22:Completed 1125000 out of 1250000 steps (90%)
16:38:35:WU02:FS01:0x22:Checkpoint completed at step 1125000
16:40:15:WU02:FS01:0x22:Completed 1137500 out of 1250000 steps (91%)
16:41:55:WU02:FS01:0x22:Completed 1150000 out of 1250000 steps (92%)
16:41:56:WU02:FS01:0x22:Checkpoint completed at step 1150000
16:43:35:WU02:FS01:0x22:Completed 1162500 out of 1250000 steps (93%)
16:45:15:WU02:FS01:0x22:Completed 1175000 out of 1250000 steps (94%)
16:45:16:WU02:FS01:0x22:Checkpoint completed at step 1175000
16:46:56:WU02:FS01:0x22:Completed 1187500 out of 1250000 steps (95%)
16:48:35:WU02:FS01:0x22:Completed 1200000 out of 1250000 steps (96%)
16:48:36:WU02:FS01:0x22:Checkpoint completed at step 1200000
16:50:16:WU02:FS01:0x22:Completed 1212500 out of 1250000 steps (97%)
16:51:56:WU02:FS01:0x22:Completed 1225000 out of 1250000 steps (98%)
16:51:57:WU02:FS01:0x22:Checkpoint completed at step 1225000
16:53:36:WU02:FS01:0x22:Completed 1237500 out of 1250000 steps (99%)
16:55:16:WU02:FS01:0x22:Completed 1250000 out of 1250000 steps (100%)
16:55:16:WU02:FS01:0x22:Average performance: 17.3668 ns/day
16:55:17:WU02:FS01:0x22:Checkpoint completed at step 1250000
16:55:21:WU02:FS01:0x22:Saving result file ../logfile_01.txt
16:55:21:WU02:FS01:0x22:Saving result file checkpointIntegrator.xml
16:55:21:WU02:FS01:0x22:Saving result file checkpointState.xml
16:55:24:WU02:FS01:0x22:Saving result file positions.xtc
16:55:24:WU02:FS01:0x22:Saving result file science.log
16:55:24:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
16:55:25:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
16:55:25:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:18202 run:9152 clone:0 gen:4 core:0x22 unit:0x00000000000000040000471a000023c0
16:55:25:WU02:FS01:Uploading 27.51MiB to 128.252.203.11
16:55:25:WU02:FS01:Connecting to 128.252.203.11:8080
16:55:56:WARNING:WU02:FS01:Exception: Failed to send results to work server: Not connected
16:55:56:WU02:FS01:Trying to send results to collection server
16:55:56:WU02:FS01:Uploading 27.51MiB to 128.252.203.2
16:55:56:WU02:FS01:Connecting to 128.252.203.2:8080
16:56:02:WU02:FS01:Upload 41.35%
16:56:08:WU02:FS01:Upload 99.74%
16:56:08:WU02:FS01:Upload complete
16:56:08:WU02:FS01:Server responded WORK_QUIT (404)
16:56:08:WARNING:WU02:FS01:Server did not like results, dumping
16:56:08:WU02:FS01:Cleaning up
Did 18202 even go through Beta? It doesn't seem stable enough for full public.
I should note that, as far as I can tell, even my 18202 WUs that do show as uploaded have not been credited. So as far as I'm concerned, 18202 is a write-off and is just wasting electricity.
Yes, 18202 went through internal and beta testing and performed well in both. One issue that we seem to see (especially for bigger systems such as 18201/18202) is that differences in the GPU being used can severely change the stability of a project. Based on the current stats, it may be that we did not capture enough GPU diversity during beta testing. Early on, instabilities were very rare in this project, and they have only reached a higher percentage of late. As instabilities don't impact our end calculations, I had left the project running. We're reevaluating whether to keep running this project now, whether to restrict the project to a narrower range of GPUs (thus limiting the diversity and allowing us to optimize stability for a smaller range), or whether to tinker more with our initial system in order to make the project more stable for a diverse set of GPUs. Running larger projects stably on diverse GPU species is an active area of work for several of the labs involved in Folding@home.
Regarding dumping: to my knowledge, this is a new issue that has arisen since 128.252.203.11's load and points issues, which ran from the end of last week through the beginning of this week. We have not observed this error before and are actively working to understand what caused it and whether it is a ripple effect from the high-load incident. I have halted all assignments from 128.252.203.11 again until we can resolve this issue. Our intention is never to assign WUs that are meaningless work, and I'm similarly unhappy that we're seeing this.
Check the WS connection to the CS. The dumping occurs when the upload to the WS fails and the client sends the results to the CS instead. If the CS does not have the WU data it needs to recognize a valid result, the WU gets dumped.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
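The failure path described in this post (upload to the WS fails, the client falls back to the CS, the CS answers WORK_QUIT and the client dumps the unit) can be picked out of a client log mechanically. Here is a minimal Python sketch, assuming the log line wording shown in the excerpts in this thread. The function name `find_dumped_uploads` is made up for illustration and is not part of FAHClient, and the sketch assumes one WU is uploading at a time (lines from several slots are not separated).

```python
import re

# Patterns copied from the log excerpts in this thread; other client
# versions may word these lines differently (assumption).
SEND_RE = re.compile(
    r"Sending unit results:.*project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)
WS_FAIL_RE = re.compile(r"Failed to send results to work server")
CS_TRY_RE = re.compile(r"Trying to send results to collection server")
QUIT_RE = re.compile(r"Server responded WORK_QUIT")
DUMP_RE = re.compile(r"Server did not like results, dumping")


def find_dumped_uploads(lines):
    """Return (project, run, clone, gen) for units that failed on the WS,
    fell back to the CS, got WORK_QUIT, and were dumped."""
    dumped = []
    current = None  # state of the upload currently being tracked
    for line in lines:
        m = SEND_RE.search(line)
        if m:
            current = {
                "prcg": tuple(int(x) for x in m.groups()),
                "ws_failed": False,
                "cs_tried": False,
                "quit": False,
            }
            continue
        if current is None:
            continue
        if WS_FAIL_RE.search(line):
            current["ws_failed"] = True
        elif CS_TRY_RE.search(line):
            current["cs_tried"] = True
        elif QUIT_RE.search(line):
            current["quit"] = True
        elif DUMP_RE.search(line):
            # Only count the dump if the whole WS -> CS -> WORK_QUIT chain was seen.
            if current["ws_failed"] and current["cs_tried"] and current["quit"]:
                dumped.append(current["prcg"])
            current = None
    return dumped
```

Feeding it the lines of a saved log file (for example `find_dumped_uploads(open("log.txt"))`, where the file name is a placeholder) gives a list of affected PRCGs that can be pasted into a report.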
The configuration of 128.252.203.11 is really strange in that it functions as both a WS and a CS. One thing that we've noticed so far is that, in the WUs being dumped, our trajectory file hasn't been updated with the newest generations that have come back. For example, in R1245:C3:G9 the trajectory file (state.xml) was last written on Jun 15 at 04:07. Results8 (results from gen8) was written on Jun 17 at 12:28. state.xml should have been updated after gen8 returned, but this has not happened.
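The staleness check described here comes down to comparing two write times: if any returned results are newer than state.xml, the trajectory was not advanced. A small Python sketch of that comparison, using the two timestamps quoted above. The year 2021 is an assumption taken from the log dates in this thread, and `trajectory_is_stale` is a hypothetical helper, not a work server tool.

```python
from datetime import datetime


def trajectory_is_stale(state_written, results_written):
    """True if any results file was written after state.xml was last written."""
    return any(ts > state_written for ts in results_written)


# Timestamps quoted for R1245:C3:G9 (year assumed to be 2021).
state_xml = datetime(2021, 6, 15, 4, 7)
results8 = datetime(2021, 6, 17, 12, 28)

print(trajectory_is_stale(state_xml, [results8]))  # prints True
```

On a real server the timestamps would come from the file system rather than being typed in; the directory layout is not described in this thread, so it is left out of the sketch.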
19:07:59:WU02:FS01:0x22:Completed 1225000 out of 1250000 steps (98%)
19:08:00:WU02:FS01:0x22:Checkpoint completed at step 1225000
19:11:02:WU02:FS01:0x22:Completed 1237500 out of 1250000 steps (99%)
19:14:03:WU02:FS01:0x22:Completed 1250000 out of 1250000 steps (100%)
19:14:03:WU02:FS01:0x22:Average performance: 9.52066 ns/day
19:14:05:WU02:FS01:0x22:Checkpoint completed at step 1250000
19:14:12:WU02:FS01:0x22:Saving result file ../logfile_01.txt
19:14:12:WU02:FS01:0x22:Saving result file checkpointIntegrator.xml
19:14:12:WU02:FS01:0x22:Saving result file checkpointState.xml
19:14:18:WU02:FS01:0x22:Saving result file positions.xtc
19:14:18:WU02:FS01:0x22:Saving result file science.log
19:14:18:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
19:14:18:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:14:19:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:18202 run:15116 clone:2 gen:6 core:0x22 unit:0x00000002000000060000471a00003b0c
19:14:19:WU02:FS01:Uploading 27.50MiB to 128.252.203.11
19:14:19:WU02:FS01:Connecting to 128.252.203.11:8080
19:14:50:WARNING:WU02:FS01:Exception: Failed to send results to work server: Not connected
19:14:50:WU02:FS01:Trying to send results to collection server
19:14:51:WU02:FS01:Uploading 27.50MiB to 128.252.203.2
19:14:51:WU02:FS01:Connecting to 128.252.203.2:8080
19:14:57:WU02:FS01:Upload 12.50%
19:15:03:WU02:FS01:Upload 24.77%
19:15:09:WU02:FS01:Upload 37.05%
19:15:15:WU02:FS01:Upload 49.10%
19:15:21:WU02:FS01:Upload 61.37%
19:15:27:WU02:FS01:Upload 73.64%
19:15:33:WU02:FS01:Upload 85.69%
19:15:39:WU02:FS01:Upload 97.96%
19:15:40:WU02:FS01:Upload complete
19:15:40:WU02:FS01:Server responded WORK_QUIT (404)
19:15:40:WARNING:WU02:FS01:Server did not like results, dumping
19:15:40:WU02:FS01:Cleaning up
Work Quit generally means that you've already uploaded that WU and the server is refusing to give you credit for it again.
NOTE: This server is apparently accepting WUs (once) but the credit reports are not reaching the stats server so you can't tell what's actually happening.
I've had a similar result for one of my units today (project:18202 run:1927 clone:2 gen:5): it tried to upload to Work Server 128.252.203.11 and failed, then uploaded to Collection Server 128.252.203.2, which dumped the results.
I checked the Work unit status to find that it had been returned "Faulty 2" by 2 previous donors.
The last was recorded 10 days ago.
Considering the timeout is 2 days and many donors will be able to churn through these units in a few hours, I can't help but wonder whether the 2 recorded returns of my unit were actually faulty, and how many times the units have been assigned to donors and then had their results dumped. https://apps.foldingathome.org/wu#proje ... ne=2&gen=5
Hi Aetch, the log you posted is for Project 17804, Run 83, Clone 289, Gen 77. Would you be willing to post the log for project:18202 run:1927 clone:2 gen:5?
Some of the gap that you're seeing on the WU status app is because we actually took this project down for ~8 days to resolve some server issues we were having. During that time, no new WUs were being assigned. We turned this project back on yesterday around mid-afternoon CST, so you are likely the first one to get this specific WU again.
Several things can result in "faulty WUs". One of them is that the WU has to restart in the middle because the protein goes out of the "tolerated" forces expected on the system. While this isn't ultimately bad for the project (it happens more for larger systems such as project 18202), it does give the client some issues. With the server reset a few days ago we slightly stepped back the tolerances which should reduce the likelihood of these instances happening.
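The mid-run restarts mentioned here show up in the client log as a "Force RMSE error" line followed by CORE_RESTART, as in the first log in this thread. A short Python sketch for pulling those events out of a log so they can be counted per WU; the regular expression is based only on the one line format seen in this thread, and `force_rmse_events` is an illustrative name, not part of the core or client.

```python
import re

# Matches lines such as:
#   ...:An exception occurred at step 800000: Force RMSE error of 5.28262 with threshold of 5
RMSE_RE = re.compile(
    r"An exception occurred at step (\d+): "
    r"Force RMSE error of ([0-9.]+) with threshold of ([0-9.]+)"
)


def force_rmse_events(lines):
    """Yield (step, rmse, threshold) for each Force RMSE exception in the log."""
    for line in lines:
        m = RMSE_RE.search(line)
        if m:
            yield int(m.group(1)), float(m.group(2)), float(m.group(3))
```

For the line in the first log above this yields `(800000, 5.28262, 5.0)`, which is an overshoot of roughly 5.7% relative to the threshold.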
11:43:12:WU02:FS01:0x22:Checkpoint completed at step 1250000
11:43:18:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
11:43:18:WU02:FS01:0x22:Saving result file checkpointIntegrator.xml
11:43:18:WU02:FS01:0x22:Saving result file checkpointState.xml
11:43:22:WU02:FS01:0x22:Saving result file positions.xtc
11:43:22:WU02:FS01:0x22:Saving result file science.log
11:43:22:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
11:43:23:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
11:43:23:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:18202 run:1927 clone:2 gen:5 core:0x22 unit:0x00000002000000050000471a00000787
11:43:23:WU02:FS01:Uploading 27.50MiB to 128.252.203.11
11:43:23:WU02:FS01:Connecting to 128.252.203.11:8080
11:43:54:WU02:FS01:Upload 0.68%
11:43:54:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
11:43:54:WU02:FS01:Trying to send results to collection server
11:43:54:WU02:FS01:Uploading 27.50MiB to 128.252.203.2
11:43:54:WU02:FS01:Connecting to 128.252.203.2:8080
11:44:00:WU02:FS01:Upload 9.77%
11:44:06:WU02:FS01:Upload 20.91%
11:44:12:WU02:FS01:Upload 32.27%
11:44:18:WU02:FS01:Upload 43.18%
11:44:24:WU02:FS01:Upload 54.54%
11:44:30:WU02:FS01:Upload 65.90%
11:44:36:WU02:FS01:Upload 77.03%
11:44:42:WU02:FS01:Upload 88.40%
11:44:48:WU02:FS01:Upload 99.30%
11:44:48:WU02:FS01:Upload complete
11:44:48:WU02:FS01:Server responded WORK_QUIT (404)
11:44:48:WARNING:WU02:FS01:Server did not like results, dumping
11:44:48:WU02:FS01:Cleaning up
I'm cool with the project being withdrawn while the bugs are investigated and sorted; my concern had been that precious folding resources had been wasted.
Ah, I see- thanks for the clarification. I'll check in with our collection server again and will report back!
EDIT: Since we have many WUs coming back successfully when uploaded to the work server, and all of these reports seem to follow a consistent trend (WS busy -> upload to CS -> CS dumps), we've swapped our collection server out for a new one. It may take a little while for this change to permeate to all WUs on highland1 (the majority of WUs out are 18201, 18202, and 18206), but we're starting to see a lot of successful WUs returned to the new CS as opposed to our old one, which gives us hope that this fixes the problem. We're continuing to assess our new CS. Thanks again for raising this issue.