
Crashes on FahCore_22

Posted: Fri Oct 23, 2020 12:19 pm
by Familyman_19
I seem to be getting some random crashes on the GPU side of things. It happens maybe every third day or so: I get a pop-up saying that FahCore_22 has crashed. Of course, it doesn't restart until I close the pop-up, which in the most recent case was after 12 hours. Here is where the log shows the crash. Each time it has been the same error. Is there anything I can do in these instances, or is it just something to live with?

23:46:25:WU00:FS01:0x22:An exception occurred at step 1053057: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
23:46:25:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
23:46:25:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
******************************* Date: 2020-10-23 *******************************
******************************* Date: 2020-10-23 *******************************
12:09:17:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
12:09:17:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
12:09:17:WU00:FS01:Starting
12:09:17:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Mike\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4304 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
12:09:17:WU00:FS01:Started FahCore on PID 2976
12:09:17:WU00:FS01:Core PID:8804
12:09:17:WU00:FS01:FahCore 0x22 started
12:09:17:WU00:FS01:0x22:*********************** Log Started 2020-10-23T12:09:17Z ***********************
12:09:17:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
12:09:17:WU00:FS01:0x22:       Core: Core22
12:09:17:WU00:FS01:0x22:       Type: 0x22
12:09:17:WU00:FS01:0x22:    Version: 0.0.13
12:09:17:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:09:17:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
12:09:17:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
12:09:17:WU00:FS01:0x22:       Date: Sep 19 2020
12:09:17:WU00:FS01:0x22:       Time: 02:35:58
12:09:17:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
12:09:17:WU00:FS01:0x22:     Branch: core22-0.0.13
12:09:17:WU00:FS01:0x22:   Compiler: Visual C++ 2015
12:09:17:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:09:17:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
12:09:17:WU00:FS01:0x22:   Platform: win32 10
12:09:17:WU00:FS01:0x22:       Bits: 64
12:09:17:WU00:FS01:0x22:       Mode: Release
12:09:17:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
12:09:17:WU00:FS01:0x22:             <peastman@stanford.edu>
12:09:17:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2976 -checkpoint 15
12:09:17:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
12:09:17:WU00:FS01:0x22:             0 -gpu 0
12:09:17:WU00:FS01:0x22:************************************ libFAH ************************************
12:09:17:WU00:FS01:0x22:       Date: Sep 7 2020
12:09:17:WU00:FS01:0x22:       Time: 19:09:56
12:09:17:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
12:09:17:WU00:FS01:0x22:     Branch: HEAD
12:09:17:WU00:FS01:0x22:   Compiler: Visual C++ 2015
12:09:17:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:09:17:WU00:FS01:0x22:   Platform: win32 10
12:09:17:WU00:FS01:0x22:       Bits: 64
12:09:17:WU00:FS01:0x22:       Mode: Release
12:09:17:WU00:FS01:0x22:************************************ CBang *************************************
12:09:17:WU00:FS01:0x22:       Date: Sep 7 2020
12:09:17:WU00:FS01:0x22:       Time: 19:08:30
12:09:17:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
12:09:17:WU00:FS01:0x22:     Branch: HEAD
12:09:17:WU00:FS01:0x22:   Compiler: Visual C++ 2015
12:09:17:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:09:17:WU00:FS01:0x22:   Platform: win32 10
12:09:17:WU00:FS01:0x22:       Bits: 64
12:09:17:WU00:FS01:0x22:       Mode: Release
12:09:17:WU00:FS01:0x22:************************************ System ************************************
12:09:17:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
12:09:17:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 26 Stepping 4
12:09:17:WU00:FS01:0x22:       CPUs: 8
12:09:17:WU00:FS01:0x22:     Memory: 23.99GiB
12:09:17:WU00:FS01:0x22:Free Memory: 19.41GiB
12:09:17:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
12:09:17:WU00:FS01:0x22: OS Version: 6.2
12:09:17:WU00:FS01:0x22:Has Battery: false
12:09:17:WU00:FS01:0x22: On Battery: false
12:09:17:WU00:FS01:0x22: UTC Offset: -4
12:09:17:WU00:FS01:0x22:        PID: 8804
12:09:17:WU00:FS01:0x22:        CWD: C:\Users\Mike\AppData\Roaming\FAHClient\work
12:09:17:WU00:FS01:0x22:************************************ OpenMM ************************************
12:09:17:WU00:FS01:0x22:   Revision: 189320d0
12:09:17:WU00:FS01:0x22:********************************************************************************
12:09:17:WU00:FS01:0x22:Project: 17309 (Run 0, Clone 6791, Gen 0)
12:09:17:WU00:FS01:0x22:Unit: 0x0000000012bc7d9a5f91cc5ca4ca0346
12:09:17:WU00:FS01:0x22:Digital signatures verified
12:09:17:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
12:09:17:WU00:FS01:0x22:Version 0.0.13
12:09:17:WU00:FS01:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
12:09:17:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
12:09:17:WU00:FS01:0x22:  XTC frame write interval: 125000 steps (10%) [10 total]
12:09:17:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
12:09:17:WU00:FS01:0x22:There are 4 platforms available.
12:09:17:WU00:FS01:0x22:Platform 0: Reference
12:09:17:WU00:FS01:0x22:Platform 1: CPU
12:09:17:WU00:FS01:0x22:Platform 2: OpenCL
12:09:17:WU00:FS01:0x22:  opencl-device 0 specified
12:09:17:WU00:FS01:0x22:Platform 3: CUDA
12:09:17:WU00:FS01:0x22:  cuda-device 0 specified
12:09:41:WU00:FS01:0x22:Attempting to create CUDA context:
12:09:41:WU00:FS01:0x22:  Configuring platform CUDA
12:09:48:WU00:FS01:0x22:  Using CUDA and gpu 0
12:09:48:WU00:FS01:0x22:Completed 1000000 out of 1250000 steps (80%)
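Two numbers in the log above can be checked by hand. The sketch below is only an illustration with hypothetical helper names, not part of FAHClient: it converts the signed return code into the unsigned hex form the log prints next to it, and it works out which checkpoint a crash at step 1053057 would fall back to, assuming checkpoints land on exact multiples of the 62500-step write interval.

```python
def ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def last_checkpoint(step: int, interval: int) -> int:
    """Largest multiple of `interval` that is not greater than `step`.

    Assumption: checkpoints are written at exact multiples of the interval.
    """
    return (step // interval) * interval


if __name__ == "__main__":
    # Values copied from the log above.
    print(ntstatus_hex(-1073740791))
    resume_step = last_checkpoint(1053057, 62500)
    print(resume_step, "steps kept,", 1053057 - resume_step, "steps redone")
```

Microsoft's NTSTATUS reference lists 0xC0000409 as STATUS_STACK_BUFFER_OVERRUN. The checkpoint figure agrees with the last log line, where the restarted core resumes at 1000000 of 1250000 steps.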

Re: Crashes on FahCore_22

Posted: Fri Oct 23, 2020 3:03 pm
by Joe_H
If you haven't cleaned dust out of your system recently, start with that. If your GPU is overclocked, and this includes factory overclocking, try reducing the clock by a bit or run at reference speeds for your card.

Re: Crashes on FahCore_22

Posted: Fri Oct 23, 2020 7:01 pm
by foldy
FahCore_22 recently switched to CUDA. So you can try to go back to OpenCL by adding extra core options: -disable-cuda

Re: Crashes on FahCore_22

Posted: Fri Oct 23, 2020 9:08 pm
by PantherX
Also, do you have sufficient VRAM and are any other applications using VRAM on your system?

Re: Crashes on FahCore_22

Posted: Mon Oct 26, 2020 12:38 pm
by Familyman_19
Joe_H wrote: If you haven't cleaned dust out of your system recently, start with that. If your GPU is overclocked, and this includes factory overclocking, try reducing the clock by a bit or run at reference speeds for your card.
I do have an overclock on the GPU. I dropped it 25MHz. We'll see if that helps. It didn't show any issues until the switch to CUDA, but maybe that was enough to push it over the edge.

Re: Crashes on FahCore_22

Posted: Mon Oct 26, 2020 12:47 pm
by Familyman_19
PantherX wrote:Also, do you have sufficient VRAM and are any other applications using VRAM on your system?
I have the 6GB 1060. During folding I am currently sitting around 10% utilization. It peaked at 30% in the last 15 hours, but I did some gaming last night. I typically pause FAH while I game. The crashes have typically occurred when I'm not using the PC for anything other than folding.
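For anyone who wants to log VRAM use over time rather than read it off a monitoring tool, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and that the figure of interest is memory used versus total. The function names are made up for illustration.

```python
import subprocess

# nvidia-smi prints one "used, total" line per GPU, in MiB, with these flags.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(line: str) -> float:
    """Turn one 'used, total' CSV line into a percentage of VRAM in use."""
    used, total = (float(field) for field in line.split(","))
    return 100.0 * used / total


def vram_percent(gpu_index: int = 0) -> float:
    """Run nvidia-smi and return VRAM use (percent) for the chosen GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_vram(result.stdout.splitlines()[gpu_index])
```

Calling `vram_percent()` every few minutes and appending the result to a file would show whether memory use climbs shortly before a crash.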

Re: Crashes on FahCore_22

Posted: Wed Oct 28, 2020 5:17 pm
by bruce
CUDA processes more work per unit time, so it can push an overclock "over the line." It should be noted that FAH tends to exceed the utilization rates that you get when you run conventional overclocking benchmarks. 100% means different things to different portions of your CPU, which is why FAH officially does not support overclocking. You're entirely on your own when new software comes out and we manage to increase the throughput.

Re: Crashes on FahCore_22

Posted: Fri Oct 30, 2020 7:09 pm
by AnClar
I've had one random FahCore_22 crash since the switchover to CUDA from OpenCL. So far, it hasn't recurred, but I'll keep an eye out for any more. My Folding system is a Gen1 Core i7, with an NVIDIA GTX 970 graphics card. I've been folding with the same kit for eight months now, error-free. This may have just been a random glitch related to a particular WU. We'll see.

Re: Crashes on FahCore_22

Posted: Sun Nov 01, 2020 6:43 am
by PantherX
Occasionally, you may get a bad WU and there's nothing that you can do about it: viewtopic.php?f=19&t=16526