12:57:22:WU01:FS01:Connecting to assign1.foldingathome.org:80
12:57:22:WU01:FS01:Assigned to work server 18.188.125.154
12:57:22:WU01:FS01:Requesting new work unit for slot 01: gpu:1:0 TU104 [GeForce RTX 2070 SUPER] 8218 from 18.188.125.154
12:57:22:WU01:FS01:Connecting to 18.188.125.154:8080
12:57:23:WU01:FS01:Downloading 7.52MiB
12:57:25:WU01:FS01:Download complete
12:57:25:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13428 run:10164 clone:3 gen:1 core:0x22 unit:0x0000000112bc7d9a0000000027b40003
12:58:29:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
12:58:29:WU00:FS01:0x22:Average performance: 128.571 ns/day
12:58:29:WU00:FS01:0x22:Checkpoint completed at step 1000000
12:58:37:WU00:FS01:0x22:Saving result file ../logfile_01.txt
12:58:37:WU00:FS01:0x22:Saving result file checkpointIntegrator.xml.bz2
12:58:37:WU00:FS01:0x22:Saving result file checkpointState.xml.bz2
12:58:37:WU00:FS01:0x22:Saving result file globals.csv
12:58:37:WU00:FS01:0x22:Saving result file positions.xtc
12:58:37:WU00:FS01:0x22:Saving result file science.log
12:58:37:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
12:58:38:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:58:38:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13428 run:10626 clone:11 gen:0 core:0x22 unit:0x0000000012bc7d9a000000002982000b
12:58:38:WU00:FS01:Uploading 8.49MiB to 18.188.125.154
12:58:38:WU00:FS01:Connecting to 18.188.125.154:8080
12:58:38:WU01:FS01:Starting
12:58:38:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:58:38:WU01:FS01:Started FahCore on PID 7534
12:58:38:WU01:FS01:FahCore 0x22 started
12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:58:38:WU01:FS01:Starting
12:58:38:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:58:38:WU01:FS01:Started FahCore on PID 7536
12:58:38:WU01:FS01:FahCore 0x22 started
12:58:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:58:44:WU00:FS01:Upload 75.06%
12:58:46:WU00:FS01:Upload complete
12:58:47:WU00:FS01:Server responded WORK_ACK (400)
12:58:47:WU00:FS01:Final credit estimate, 237125.00 points
12:58:47:WU00:FS01:Cleaning up
12:59:39:WU01:FS01:Starting
12:59:39:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:59:39:WU01:FS01:Started FahCore on PID 7541
12:59:39:WU01:FS01:FahCore 0x22 started
12:59:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:59:43:FS01:Paused
12:59:52:FS01:Unpaused
13:00:39:WU01:FS01:Starting
13:00:39:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
13:00:39:WU01:FS01:Started FahCore on PID 7546
13:00:39:WU01:FS01:FahCore 0x22 started
13:00:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
13:00:46:FS01:Paused
*********************** Log Started 2020-11-20T13:05:49Z ***********************
13:05:49:FS01:Initialized folding slot 01: gpu:1:0 TU104 [GeForce RTX 2070 SUPER] 8218
13:06:01:FS01:Unpaused
13:06:01:WU01:FS01:Starting
13:06:01:WU01:FS01:Running FahCore: /snap/folding-at-home-fcole90/58/usr/bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 4377 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
13:06:01:WU01:FS01:Started FahCore on PID 5526
13:06:01:WU01:FS01:Core PID:5530
13:06:01:WU01:FS01:FahCore 0x22 started
13:06:03:WU01:FS01:0x22:*********************** Log Started 2020-11-20T13:06:03Z ***********************
13:06:03:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:06:03:WU01:FS01:0x22: Core: Core22
13:06:03:WU01:FS01:0x22: Type: 0x22
13:06:03:WU01:FS01:0x22: Version: 0.0.13
13:06:03:WU01:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:06:03:WU01:FS01:0x22: Copyright: 2020 foldingathome.org
13:06:03:WU01:FS01:0x22: Homepage: https://foldingathome.org/
13:06:03:WU01:FS01:0x22: Date: Sep 19 2020
13:06:03:WU01:FS01:0x22: Time: 01:10:35
13:06:03:WU01:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
13:06:03:WU01:FS01:0x22: Branch: core22-0.0.13
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
13:06:03:WU01:FS01:0x22: <peastman@stanford.edu>
13:06:03:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 5526 -checkpoint 15
13:06:03:WU01:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
13:06:03:WU01:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
13:06:03:WU01:FS01:0x22:************************************ libFAH ************************************
13:06:03:WU01:FS01:0x22: Date: Sep 15 2020
13:06:03:WU01:FS01:0x22: Time: 05:14:43
13:06:03:WU01:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
13:06:03:WU01:FS01:0x22: Branch: HEAD
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:************************************ CBang *************************************
13:06:03:WU01:FS01:0x22: Date: Sep 15 2020
13:06:03:WU01:FS01:0x22: Time: 05:11:04
13:06:03:WU01:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
13:06:03:WU01:FS01:0x22: Branch: HEAD
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops -fPIC
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:************************************ System ************************************
13:06:03:WU01:FS01:0x22: CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
13:06:03:WU01:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
13:06:03:WU01:FS01:0x22: CPUs: 4
13:06:03:WU01:FS01:0x22: Memory: 15.50GiB
13:06:03:WU01:FS01:0x22:Free Memory: 13.00GiB
13:06:03:WU01:FS01:0x22: Threads: POSIX_THREADS
13:06:03:WU01:FS01:0x22: OS Version: 5.8
13:06:03:WU01:FS01:0x22:Has Battery: false
13:06:03:WU01:FS01:0x22: On Battery: false
13:06:03:WU01:FS01:0x22: UTC Offset: -7
13:06:03:WU01:FS01:0x22: PID: 5530
13:06:03:WU01:FS01:0x22: CWD: /home/<redacted>/snap/folding-at-home-fcole90/common/work
13:06:03:WU01:FS01:0x22:************************************ OpenMM ************************************
13:06:03:WU01:FS01:0x22: Revision: 189320d0
13:06:03:WU01:FS01:0x22:********************************************************************************
13:06:03:WU01:FS01:0x22:Project: 13428 (Run 10164, Clone 3, Gen 1)
13:06:03:WU01:FS01:0x22:Unit: 0x0000000112bc7d9a0000000027b40003
13:06:03:WU01:FS01:0x22:Reading tar file core.xml
13:06:03:WU01:FS01:0x22:Reading tar file integrator.xml.bz2
13:06:03:WU01:FS01:0x22:Reading tar file state.xml.bz2
13:06:04:WU01:FS01:0x22:Reading tar file system.xml.bz2
13:06:04:WU01:FS01:0x22:Digital signatures verified
13:06:04:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:06:04:WU01:FS01:0x22:Version 0.0.13
13:06:04:WU01:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
13:06:04:WU01:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
13:06:04:WU01:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
13:06:04:WU01:FS01:0x22: Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
13:06:07:WU01:FS01:0x22:There are 4 platforms available.
13:06:07:WU01:FS01:0x22:Platform 0: Reference
13:06:07:WU01:FS01:0x22:Platform 1: CPU
13:06:07:WU01:FS01:0x22:Platform 2: OpenCL
13:06:07:WU01:FS01:0x22: opencl-device 0 specified
13:06:07:WU01:FS01:0x22:Platform 3: CUDA
13:06:07:WU01:FS01:0x22: cuda-device 0 specified
13:06:16:WU01:FS01:0x22:Attempting to create CUDA context:
13:06:16:WU01:FS01:0x22: Configuring platform CUDA
13:06:29:WU01:FS01:0x22: Using CUDA and gpu 0
13:06:29:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
13:06:29:WU01:FS01:0x22:Checkpoint completed at step 0
I'm not sure why this happens, but it does occasionally, and a reboot fixes it. As the log shows, the WU starts right up after a system reboot, so I don't know why it fails before that. I haven't seen this on any other project recently. The last time I saw similar issues was the last time the 1342x projects were put out to run, but even then it was about once a week. This is about once every 24 hours.
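For anyone wanting to check how often this happens in their own log, here is a minimal sketch that counts the "FahCore returned:" lines per work unit. It assumes the line layout shown in the log above (time, optional WARNING tag, WU and FS identifiers); the function name count_core_returns and the sample text are only illustrative, not part of the FAH client.

```python
import re
from collections import Counter

# Matches client lines such as:
#   12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
#   12:58:38:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
# The layout is assumed from the log pasted in this thread.
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):FahCore returned: (?P<status>\w+)"
)


def count_core_returns(log_text):
    """Count FahCore return statuses, keyed by (work unit, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RETURN_RE.match(line.strip())
        if match:
            counts[(match.group("wu"), match.group("status"))] += 1
    return counts


# Sample lines copied from the log above.
sample = """\
12:58:38:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:58:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:59:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
13:00:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
"""

counts = count_core_returns(sample)
for (wu, status), n in sorted(counts.items()):
    print(wu, status, n)
```

Run against a full log file, this would show at a glance whether the failures cluster on one WU or spread across several.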
Just wanted to add my story to the WU. With a newly minted 3090 I was folding 13428, and maybe 10 minutes later my system just shut down. It happened three times, but I only noted down the WU on the last instance. I was wondering if my 650W PSU was too weak, so I ran Furmark for 3 hours and then pushed it to the max with Intel Burn Test + Furmark. Both runs far surpassed FAH's draw from the wall (640W vs 440W for FAH), so I'm ruling out the PSU. That leaves me with WU 13428 and this thread.
The "recommended" PSU for a 3090 is 750 watts. FAH is a hard load to test for, as countless threads about overclocks have noted, so I wouldn't rule out something getting overdrawn just yet. What did the log say?
Kamicrit wrote:Hey there
Was wondering if my 650W PSU was too weak so I did 3 hours on furmark and then pushed it to the max with Intel Burn Test + Furmark. Both instances far surpassing FAH's draw from the wall (640W vs 440W). So ruling out the PSU. I find myself with WU 13428 and this thread.
So this is a completely different issue. Mine likely has to do with either how the WU interacts with the Nvidia driver or something odd in the WU itself. When I said rebooting, I didn't mean something automatic. If you look at the log, I pause folding, reboot, and then start it again, and everything is fine, which is odd.
It's worth noting that Furmark and burn tests don't test the power system appropriately, because they don't simulate the transients that FAH or gaming can produce. They pretty much only test sustained power and thermal solutions. FAH and gaming are far more spiky: I can literally hear the fans ramp down when the WU does a checkpoint, then spike again as computation ramps up once more.
During my furmark testing, I did try to simulate extreme swings in power draw with enabling/disabling the donut. Starting to wonder if this is a CUDA thing? My old 1070 never crashed, PSU in question is a Seasonic G-650. Projects 13428, 14905, 14904 have crashed for me.
No, your specific issue is a 30-series Nvidia issue, where the GPU can create power transients larger than the PSU's capacitance can absorb, which triggers over-current protection (OCP).
Separately:
FWIW, I did seem to solve the issue by power limiting my GPU back to just below the Nvidia stock power limit. I'm not sure why that works, but it seems to keep clocks at a level where the WU doesn't error.
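In case it helps anyone trying the same workaround, here is a minimal sketch of setting a power limit through nvidia-smi from Python. The `-i` (GPU index) and `-pl` (power limit in watts) options are standard nvidia-smi flags, and changing the limit normally needs root. The 200 W figure and the function names are placeholders for illustration only; check your own card's stock limit first, and note that the nvidia-smi index may not match the FAH slot's GPU number.

```python
import shutil
import subprocess


def build_power_limit_cmd(watts, gpu_index=0):
    """Build the nvidia-smi command that sets a power limit in watts."""
    if watts <= 0:
        raise ValueError("power limit must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(int(watts))]


def apply_power_limit(watts, gpu_index=0, dry_run=True):
    """Return the command; run it only when dry_run is False and nvidia-smi exists."""
    cmd = build_power_limit_cmd(watts, gpu_index)
    if dry_run or shutil.which("nvidia-smi") is None:
        return cmd
    # Usually requires root; raises CalledProcessError if nvidia-smi rejects the value.
    subprocess.run(cmd, check=True)
    return cmd


# 200 W is a hypothetical value, not a recommendation for any specific card.
cmd = apply_power_limit(200, gpu_index=0)
print(" ".join(cmd))
```

The dry-run default is deliberate: it lets you see the exact command before anything touches the driver. The limit set this way generally does not persist across reboots.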
Kamicrit wrote:...During my furmark testing, I did try to simulate extreme swings in power draw with enabling/disabling the donut....
Furmark and other common benchmarks tend to use the visualization/rendering elements of the GPU, whereas F@H uses the compute elements. These are two physically different pathways on the GPU, so you can't compare them directly.
Kamicrit wrote:...Starting to wonder if this is a CUDA thing?...
On supported Nvidia GPUs, CUDA is more optimized than OpenCL, which means it pushes the GPU even harder thanks to the extra optimizations.
Kamicrit wrote:...My old 1070 never crashed, PSU in question is a Seasonic G-650. Projects 13428, 14905, 14904 have crashed for me...