Project 13428 random FAILED_3, solvable with a reboot
Posted: Fri Nov 20, 2020 1:11 pm
by mgetz
Code:
12:57:22:WU01:FS01:Connecting to assign1.foldingathome.org:80
12:57:22:WU01:FS01:Assigned to work server 18.188.125.154
12:57:22:WU01:FS01:Requesting new work unit for slot 01: gpu:1:0 TU104 [GeForce RTX 2070 SUPER] 8218 from 18.188.125.154
12:57:22:WU01:FS01:Connecting to 18.188.125.154:8080
12:57:23:WU01:FS01:Downloading 7.52MiB
12:57:25:WU01:FS01:Download complete
12:57:25:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13428 run:10164 clone:3 gen:1 core:0x22 unit:0x0000000112bc7d9a0000000027b40003
12:58:29:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
12:58:29:WU00:FS01:0x22:Average performance: 128.571 ns/day
12:58:29:WU00:FS01:0x22:Checkpoint completed at step 1000000
12:58:37:WU00:FS01:0x22:Saving result file ../logfile_01.txt
12:58:37:WU00:FS01:0x22:Saving result file checkpointIntegrator.xml.bz2
12:58:37:WU00:FS01:0x22:Saving result file checkpointState.xml.bz2
12:58:37:WU00:FS01:0x22:Saving result file globals.csv
12:58:37:WU00:FS01:0x22:Saving result file positions.xtc
12:58:37:WU00:FS01:0x22:Saving result file science.log
12:58:37:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
12:58:38:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:58:38:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13428 run:10626 clone:11 gen:0 core:0x22 unit:0x0000000012bc7d9a000000002982000b
12:58:38:WU00:FS01:Uploading 8.49MiB to 18.188.125.154
12:58:38:WU00:FS01:Connecting to 18.188.125.154:8080
12:58:38:WU01:FS01:Starting
12:58:38:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:58:38:WU01:FS01:Started FahCore on PID 7534
12:58:38:WU01:FS01:FahCore 0x22 started
12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:58:38:WU01:FS01:Starting
12:58:38:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:58:38:WU01:FS01:Started FahCore on PID 7536
12:58:38:WU01:FS01:FahCore 0x22 started
12:58:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:58:44:WU00:FS01:Upload 75.06%
12:58:46:WU00:FS01:Upload complete
12:58:47:WU00:FS01:Server responded WORK_ACK (400)
12:58:47:WU00:FS01:Final credit estimate, 237125.00 points
12:58:47:WU00:FS01:Cleaning up
12:59:39:WU01:FS01:Starting
12:59:39:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
12:59:39:WU01:FS01:Started FahCore on PID 7541
12:59:39:WU01:FS01:FahCore 0x22 started
12:59:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
12:59:43:FS01:Paused
12:59:52:FS01:Unpaused
13:00:39:WU01:FS01:Starting
13:00:39:WU01:FS01:Running FahCore: /bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 3667 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
13:00:39:WU01:FS01:Started FahCore on PID 7546
13:00:39:WU01:FS01:FahCore 0x22 started
13:00:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
13:00:46:FS01:Paused
*********************** Log Started 2020-11-20T13:05:49Z ***********************
13:05:49:FS01:Initialized folding slot 01: gpu:1:0 TU104 [GeForce RTX 2070 SUPER] 8218
13:06:01:FS01:Unpaused
13:06:01:WU01:FS01:Starting
13:06:01:WU01:FS01:Running FahCore: /snap/folding-at-home-fcole90/58/usr/bin/FAHCoreWrapper /home/<redacted>/snap/folding-at-home-fcole90/common/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 4377 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
13:06:01:WU01:FS01:Started FahCore on PID 5526
13:06:01:WU01:FS01:Core PID:5530
13:06:01:WU01:FS01:FahCore 0x22 started
13:06:03:WU01:FS01:0x22:*********************** Log Started 2020-11-20T13:06:03Z ***********************
13:06:03:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:06:03:WU01:FS01:0x22: Core: Core22
13:06:03:WU01:FS01:0x22: Type: 0x22
13:06:03:WU01:FS01:0x22: Version: 0.0.13
13:06:03:WU01:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:06:03:WU01:FS01:0x22: Copyright: 2020 foldingathome.org
13:06:03:WU01:FS01:0x22: Homepage: https://foldingathome.org/
13:06:03:WU01:FS01:0x22: Date: Sep 19 2020
13:06:03:WU01:FS01:0x22: Time: 01:10:35
13:06:03:WU01:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
13:06:03:WU01:FS01:0x22: Branch: core22-0.0.13
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
13:06:03:WU01:FS01:0x22: <peastman@stanford.edu>
13:06:03:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 5526 -checkpoint 15
13:06:03:WU01:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
13:06:03:WU01:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
13:06:03:WU01:FS01:0x22:************************************ libFAH ************************************
13:06:03:WU01:FS01:0x22: Date: Sep 15 2020
13:06:03:WU01:FS01:0x22: Time: 05:14:43
13:06:03:WU01:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
13:06:03:WU01:FS01:0x22: Branch: HEAD
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:************************************ CBang *************************************
13:06:03:WU01:FS01:0x22: Date: Sep 15 2020
13:06:03:WU01:FS01:0x22: Time: 05:11:04
13:06:03:WU01:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
13:06:03:WU01:FS01:0x22: Branch: HEAD
13:06:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:06:03:WU01:FS01:0x22: Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:06:03:WU01:FS01:0x22: -funroll-loops -fPIC
13:06:03:WU01:FS01:0x22: Platform: linux2 4.19.76-linuxkit
13:06:03:WU01:FS01:0x22: Bits: 64
13:06:03:WU01:FS01:0x22: Mode: Release
13:06:03:WU01:FS01:0x22:************************************ System ************************************
13:06:03:WU01:FS01:0x22: CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
13:06:03:WU01:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
13:06:03:WU01:FS01:0x22: CPUs: 4
13:06:03:WU01:FS01:0x22: Memory: 15.50GiB
13:06:03:WU01:FS01:0x22:Free Memory: 13.00GiB
13:06:03:WU01:FS01:0x22: Threads: POSIX_THREADS
13:06:03:WU01:FS01:0x22: OS Version: 5.8
13:06:03:WU01:FS01:0x22:Has Battery: false
13:06:03:WU01:FS01:0x22: On Battery: false
13:06:03:WU01:FS01:0x22: UTC Offset: -7
13:06:03:WU01:FS01:0x22: PID: 5530
13:06:03:WU01:FS01:0x22: CWD: /home/<redacted>/snap/folding-at-home-fcole90/common/work
13:06:03:WU01:FS01:0x22:************************************ OpenMM ************************************
13:06:03:WU01:FS01:0x22: Revision: 189320d0
13:06:03:WU01:FS01:0x22:********************************************************************************
13:06:03:WU01:FS01:0x22:Project: 13428 (Run 10164, Clone 3, Gen 1)
13:06:03:WU01:FS01:0x22:Unit: 0x0000000112bc7d9a0000000027b40003
13:06:03:WU01:FS01:0x22:Reading tar file core.xml
13:06:03:WU01:FS01:0x22:Reading tar file integrator.xml.bz2
13:06:03:WU01:FS01:0x22:Reading tar file state.xml.bz2
13:06:04:WU01:FS01:0x22:Reading tar file system.xml.bz2
13:06:04:WU01:FS01:0x22:Digital signatures verified
13:06:04:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:06:04:WU01:FS01:0x22:Version 0.0.13
13:06:04:WU01:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
13:06:04:WU01:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
13:06:04:WU01:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
13:06:04:WU01:FS01:0x22: Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
13:06:07:WU01:FS01:0x22:There are 4 platforms available.
13:06:07:WU01:FS01:0x22:Platform 0: Reference
13:06:07:WU01:FS01:0x22:Platform 1: CPU
13:06:07:WU01:FS01:0x22:Platform 2: OpenCL
13:06:07:WU01:FS01:0x22: opencl-device 0 specified
13:06:07:WU01:FS01:0x22:Platform 3: CUDA
13:06:07:WU01:FS01:0x22: cuda-device 0 specified
13:06:16:WU01:FS01:0x22:Attempting to create CUDA context:
13:06:16:WU01:FS01:0x22: Configuring platform CUDA
13:06:29:WU01:FS01:0x22: Using CUDA and gpu 0
13:06:29:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
13:06:29:WU01:FS01:0x22:Checkpoint completed at step 0
Not sure why this happens, but it does occasionally and is easily solved with a reboot. As the log shows, the WU starts right up after a system reboot, so I'm not sure why it fails before that. I haven't seen this on any other project recently. The last time I saw similar issues was the last time the 1342x projects were put out to run, but even then it was about once a week. This is about once every 24 hours.
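To see how often a slot hits this, the client log can be scanned for the "FahCore returned" lines. The following is a minimal sketch, assuming the line format shown in the log above; the function name `count_core_returns` and the sample list are made up for illustration and are not part of FAHClient.

```python
import re
from collections import Counter

# Matches client lines such as:
# 12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):FahCore returned: (?P<status>\w+)"
)


def count_core_returns(lines):
    """Count FahCore return statuses per (work unit, slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            counts[(match["wu"], match["slot"], match["status"])] += 1
    return counts


# Lines copied from the log in this post.
sample = [
    "12:58:38:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)",
    "12:58:38:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)",
    "12:58:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)",
    "12:59:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)",
    "13:00:39:WARNING:WU01:FS01:FahCore returned: FAILED_3 (255 = 0xff)",
]

for key, n in sorted(count_core_returns(sample).items()):
    print(key, n)
```

Pass an open log file instead of `sample` to run it over a whole log. A run of FAILED_3 on one WU with no core output in between, like the four above, is the pattern described in this post.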
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Sun Nov 22, 2020 6:25 am
by Kamicrit
Hey there
Just wanted to add my story to the WU. With a newly minted 3090 I was folding 13428 and maybe 10 minutes later my system just shut down. It happened three times, but I only noted down the WU on the last instance. I was wondering if my 650W PSU was too weak, so I ran Furmark for 3 hours and then pushed it to the max with Intel Burn Test + Furmark. Both far surpassed FAH's draw from the wall (640W vs 440W), so I'm ruling out the PSU. I find myself with WU 13428 and this thread.
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Sun Nov 22, 2020 6:51 am
by Knish
The "recommended" PSU for a 3090 is 750 watts. FAH is hard to test for, as noted in countless threads regarding overclocks, so I wouldn't rule out something possibly getting overdrawn just yet. What did the log say?
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Sun Nov 22, 2020 2:01 pm
by mgetz
Kamicrit wrote:Hey there
I was wondering if my 650W PSU was too weak, so I ran Furmark for 3 hours and then pushed it to the max with Intel Burn Test + Furmark. Both far surpassed FAH's draw from the wall (640W vs 440W), so I'm ruling out the PSU. I find myself with WU 13428 and this thread.
So this is a completely different issue. Mine is likely to do with either how the WU interacts with the Nvidia driver or something odd in the WU itself. When I said rebooting, I'm not talking about something automatic. If you look at the log, I'm pausing folding, rebooting, and then starting it again, and everything is fine, which is odd.
It's worth noting that Furmark and burn tests don't actually test the power system appropriately because they don't simulate the transients that FAH or gaming can produce. They pretty much only test sustained power and thermal solutions, whereas FAH and gaming are far more spiky in how they do things. I can literally hear the fans ramp down when the WU does a checkpoint, then spike again as computation ramps up once more.
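A rough sketch of why a sustained-load test can miss this: two power traces can have the same average draw while one peaks far higher. The sample values below are invented purely for illustration and are not measurements from any GPU, PSU, or work unit.

```python
def summarize(trace_watts):
    """Return (mean, peak) for a list of power samples in watts."""
    mean = sum(trace_watts) / len(trace_watts)
    return mean, max(trace_watts)


# Hypothetical samples, not measurements.
sustained = [400, 400, 400, 400, 400, 400, 400, 400]  # steady burn-test style load
spiky = [250, 700, 250, 700, 250, 250, 700, 100]      # bursty compute style load

for name, trace in (("sustained", sustained), ("spiky", spiky)):
    mean, peak = summarize(trace)
    print(f"{name}: mean {mean:.0f} W, peak {peak} W")
```

Both made-up traces average 400 W, but only the spiky one reaches 700 W. Short peaks like that are what protection circuits react to, and a wall meter that reports averaged draw would not show them.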
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Sun Nov 22, 2020 7:54 pm
by Kamicrit
Hello again
During my furmark testing, I did try to simulate extreme swings in power draw with enabling/disabling the donut. Starting to wonder if this is a CUDA thing? My old 1070 never crashed, PSU in question is a Seasonic G-650. Projects 13428, 14905, 14904 have crashed for me.
Code:
*********************** Log Started 2020-11-22T19:57:02Z ***********************
19:57:02:******************************* libFAH ********************************
19:57:02: Date: Oct 20 2020
19:57:02: Time: 13:36:55
19:57:02: Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
19:57:02: Branch: master
19:57:02: Compiler: Visual C++ 2015
19:57:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:57:02: Platform: win32 10
19:57:02: Bits: 32
19:57:02: Mode: Release
19:57:02:****************************** FAHClient ******************************
19:57:02: Version: 7.6.21
19:57:02: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:57:02: Copyright: 2020 foldingathome.org
19:57:02: Homepage: https://foldingathome.org/
19:57:02: Date: Oct 20 2020
19:57:02: Time: 13:41:04
19:57:02: Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
19:57:02: Branch: master
19:57:02: Compiler: Visual C++ 2015
19:57:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:57:02: Platform: win32 10
19:57:02: Bits: 32
19:57:02: Mode: Release
19:57:02: Config: C:\Users\Edward\AppData\Roaming\FAHClient\config.xml
19:57:02:******************************** CBang ********************************
19:57:02: Date: Oct 20 2020
19:57:02: Time: 11:36:18
19:57:02: Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
19:57:02: Branch: master
19:57:02: Compiler: Visual C++ 2015
19:57:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:57:02: Platform: win32 10
19:57:02: Bits: 32
19:57:02: Mode: Release
19:57:02:******************************* System ********************************
19:57:02: CPU: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
19:57:02: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:57:02: CPUs: 6
19:57:02: Memory: 15.94GiB
19:57:02: Free Memory: 11.66GiB
19:57:02: Threads: WINDOWS_THREADS
19:57:02: OS Version: 6.2
19:57:02: Has Battery: false
19:57:02: On Battery: false
19:57:02: UTC Offset: -8
19:57:02: PID: 15684
19:57:02: CWD: C:\Users\Edward\AppData\Roaming\FAHClient
19:57:02: Win32 Service: false
19:57:02: OS: Windows 10 Home
19:57:02: OS Arch: AMD64
19:57:02: GPUs: 1
19:57:02: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GA102 [GeForce RTX 3090]
19:57:02: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:8.6 Driver:11.1
19:57:02:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:457.30
19:57:02:***********************************************************************
19:57:03:<config>
19:57:03: <!-- Folding Core -->
19:57:03: <checkpoint v='30'/>
19:57:03:
19:57:03: <!-- Network -->
19:57:03: <proxy v=':8080'/>
19:57:03:
19:57:03: <!-- User Information -->
19:57:03: <passkey v='*****'/>
19:57:03: <team v='230362'/>
19:57:03: <user v='KamiCrit'/>
19:57:03:
19:57:03: <!-- Folding Slots -->
19:57:03: <slot id='1' type='GPU'>
19:57:03: <pci-bus v='1'/>
19:57:03: <pci-slot v='0'/>
19:57:03: </slot>
19:57:03:</config>
19:57:03:Trying to access database...
19:57:03:Successfully acquired database lock
19:57:03:FS01:Initialized folding slot 01: gpu:1:0 GA102 [GeForce RTX 3090]
19:57:03:WU00:FS01:Starting
19:57:03:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Edward\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 15684 -checkpoint 30 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
19:57:03:WU00:FS01:Started FahCore on PID 16228
19:57:04:WU00:FS01:Core PID:15988
19:57:04:WU00:FS01:FahCore 0x22 started
19:57:06:WU00:FS01:0x22:*********************** Log Started 2020-11-22T19:57:05Z ***********************
19:57:10:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:57:10:WU00:FS01:0x22: Core: Core22
19:57:10:WU00:FS01:0x22: Type: 0x22
19:57:10:WU00:FS01:0x22: Version: 0.0.13
19:57:10:WU00:FS01:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:57:10:WU00:FS01:0x22: Copyright: 2020 foldingathome.org
19:57:10:WU00:FS01:0x22: Homepage: https://foldingathome.org/
19:57:11:WU00:FS01:0x22: Date: Sep 19 2020
19:57:12:WU00:FS01:0x22: Time: 02:35:58
19:57:12:WU00:FS01:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
19:57:14:WU00:FS01:0x22: Branch: core22-0.0.13
19:57:16:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:57:17:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:57:18:WU00:FS01:0x22: -DOPENMM_GIT_HASH="\"189320d0\""
19:57:18:WU00:FS01:0x22: Platform: win32 10
19:57:18:WU00:FS01:0x22: Bits: 64
19:57:18:WU00:FS01:0x22: Mode: Release
19:57:18:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:57:18:WU00:FS01:0x22: <peastman@stanford.edu>
19:57:18:WU00:FS01:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 16228 -checkpoint 30
19:57:18:WU00:FS01:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
19:57:18:WU00:FS01:0x22: nvidia -gpu 0 -gpu-usage 100
19:57:18:WU00:FS01:0x22:************************************ libFAH ************************************
19:57:18:WU00:FS01:0x22: Date: Sep 7 2020
19:57:18:WU00:FS01:0x22: Time: 19:09:56
19:57:18:WU00:FS01:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
19:57:18:WU00:FS01:0x22: Branch: HEAD
19:57:18:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:57:18:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:57:18:WU00:FS01:0x22: Platform: win32 10
19:57:18:WU00:FS01:0x22: Bits: 64
19:57:18:WU00:FS01:0x22: Mode: Release
19:57:18:WU00:FS01:0x22:************************************ CBang *************************************
19:57:18:WU00:FS01:0x22: Date: Sep 7 2020
19:57:18:WU00:FS01:0x22: Time: 19:08:30
19:57:18:WU00:FS01:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
19:57:18:WU00:FS01:0x22: Branch: HEAD
19:57:18:WU00:FS01:0x22: Compiler: Visual C++ 2015
19:57:18:WU00:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:57:18:WU00:FS01:0x22: Platform: win32 10
19:57:18:WU00:FS01:0x22: Bits: 64
19:57:18:WU00:FS01:0x22: Mode: Release
19:57:18:WU00:FS01:0x22:************************************ System ************************************
19:57:18:WU00:FS01:0x22: CPU: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
19:57:18:WU00:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:57:18:WU00:FS01:0x22: CPUs: 6
19:57:18:WU00:FS01:0x22: Memory: 15.94GiB
19:57:18:WU00:FS01:0x22:Free Memory: 11.44GiB
19:57:18:WU00:FS01:0x22: Threads: WINDOWS_THREADS
19:57:18:WU00:FS01:0x22: OS Version: 6.2
19:57:18:WU00:FS01:0x22:Has Battery: false
19:57:18:WU00:FS01:0x22: On Battery: false
19:57:18:WU00:FS01:0x22: UTC Offset: -8
19:57:18:WU00:FS01:0x22: PID: 15988
19:57:18:WU00:FS01:0x22: CWD: C:\Users\Edward\AppData\Roaming\FAHClient\work
19:57:18:WU00:FS01:0x22:************************************ OpenMM ************************************
19:57:18:WU00:FS01:0x22: Revision: 189320d0
19:57:18:WU00:FS01:0x22:********************************************************************************
19:57:18:WU00:FS01:0x22:Project: 14904 (Run 137, Clone 2, Gen 164)
19:57:18:WU00:FS01:0x22:Unit: 0x000000e481d59d695f4ec9e52e5b6f72
19:57:18:WU00:FS01:0x22:Digital signatures verified
19:57:18:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:57:18:WU00:FS01:0x22:Version 0.0.13
19:57:18:WU00:FS01:0x22: Checkpoint write interval: 100000 steps (5%) [20 total]
19:57:18:WU00:FS01:0x22: JSON viewer frame write interval: 20000 steps (1%) [100 total]
19:57:18:WU00:FS01:0x22: XTC frame write interval: 50000 steps (2.5%) [40 total]
19:57:18:WU00:FS01:0x22: Global context and integrator variables write interval: disabled
19:57:20:WU00:FS01:0x22:There are 4 platforms available.
19:57:20:WU00:FS01:0x22:Platform 0: Reference
19:57:20:WU00:FS01:0x22:Platform 1: CPU
19:57:20:WU00:FS01:0x22:Platform 2: OpenCL
19:57:20:WU00:FS01:0x22: opencl-device 0 specified
19:57:20:WU00:FS01:0x22:Platform 3: CUDA
19:57:20:WU00:FS01:0x22: cuda-device 0 specified
19:57:35:WU00:FS01:0x22:Attempting to create CUDA context:
19:57:35:WU00:FS01:0x22: Configuring platform CUDA
19:57:40:WU00:FS01:0x22: Using CUDA and gpu 0
19:57:40:WU00:FS01:0x22:Completed 900000 out of 2000000 steps (45%)
19:58:21:WU00:FS01:0x22:Completed 920000 out of 2000000 steps (46%)
19:59:00:WU00:FS01:0x22:Completed 940000 out of 2000000 steps (47%)
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Mon Nov 30, 2020 5:02 pm
by mgetz
Kamicrit wrote:Hello again
During my furmark testing, I did try to simulate extreme swings in power draw with enabling/disabling the donut. Starting to wonder if this is a CUDA thing? My old 1070 never crashed, PSU in question is a Seasonic G-650. Projects 13428, 14905, 14904 have crashed for me.
No, your specific issue is a 30-series NVidia issue, where the GPU can create transients larger than the PSU's capacitance can absorb and thus trigger over-current protection (OCP).
Separately:
FWIW I did seem to solve the issue by power limiting my GPU back to just below the NVidia stock power limit. Not sure why that works, but it seems to keep clocks at a place where the WU doesn't error.
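For anyone wanting to try the same workaround, here is a minimal sketch of driving `nvidia-smi` from Python. The `-i`, `-pl`, `--query-gpu` and `--format` options are standard `nvidia-smi` options. The helper names and the 200 W figure are placeholders of mine, not the value used in this post, so check the card's default limit first and pick a number just below it.

```python
import subprocess


def power_limit_command(watts, gpu_index=0):
    """Build the nvidia-smi command that sets a GPU power limit in watts."""
    if watts <= 0:
        raise ValueError("power limit must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def query_limits_command(gpu_index=0):
    """Build the nvidia-smi command that reports the current and default limits."""
    return [
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=power.limit,power.default_limit",
        "--format=csv,noheader,nounits",
    ]


def apply_power_limit(watts, gpu_index=0):
    """Run the command; this usually needs root/administrator rights."""
    subprocess.run(power_limit_command(watts, gpu_index), check=True)


# Only print the commands here; 200 is a placeholder value, not a recommendation.
print(" ".join(query_limits_command()))
print(" ".join(power_limit_command(200)))
```

The limit set this way typically does not survive a reboot, so it would need to be reapplied after the reboots described earlier in the thread.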
Re: Project 13428 random FAILED_3, solvable with a reboot
Posted: Wed Dec 16, 2020 6:49 am
by PantherX
Kamicrit wrote:...During my furmark testing, I did try to simulate extreme swings in power draw with enabling/disabling the donut....
Furmark or other common benchmarks tend to use the visualization/rendering elements of the GPU. However, F@H uses the compute elements of the GPU. Thus, they are two physically different pathways on the GPUs so you can't compare them directly.
Kamicrit wrote:...Starting to wonder if this is a CUDA thing?...
On supported Nvidia GPUs, CUDA would be more optimized than OpenCL, which would mean that it would push the GPU even harder due to the extra optimizations.
Kamicrit wrote:...My old 1070 never crashed, PSU in question is a Seasonic G-650. Projects 13428, 14905, 14904 have crashed for me...
The GTX 1070 has a recommended PSU of 500 watts, so your 650 met that requirement:
https://www.nvidia.com/en-in/geforce/pr ... -gtx-1070/
The RTX 3090 has a recommended PSU of 750 watts, so your 650 does not meet that requirement:
https://www.nvidia.com/en-us/geforce/gr ... /rtx-3090/
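The comparison above is simple arithmetic, sketched here with only the wattages quoted in this thread. The dictionary and function names are made up for the example.

```python
# Recommended PSU wattages as quoted in this thread.
RECOMMENDED_PSU_WATTS = {"GTX 1070": 500, "RTX 3090": 750}


def psu_headroom(psu_watts, gpu):
    """Return the PSU rating minus the recommended rating (negative means below)."""
    return psu_watts - RECOMMENDED_PSU_WATTS[gpu]


for gpu in RECOMMENDED_PSU_WATTS:
    margin = psu_headroom(650, gpu)
    verdict = "meets" if margin >= 0 else "is below"
    print(f"650 W {verdict} the {gpu} recommendation by {abs(margin)} W")
```

A 650 W unit sits 150 W above the GTX 1070 figure and 100 W below the RTX 3090 figure. This only compares label ratings and says nothing about how a particular PSU handles transients.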