
13405 - Two failed WUs in 24-hours

Posted: Tue May 12, 2020 4:56 pm
by SAK917
This is my first time posting an error, but since I have enabled <client-type v='advanced'/>, I believe this is the protocol. If not, please correct me...

System is not overclocked.

System Log:

*********************** Log Started 2020-05-11T18:09:06Z ***********************
18:09:06:Trying to access database...
18:09:06:Successfully acquired database lock
18:09:06:Downloading GPUs.txt from assign1.foldingathome.org:80
18:09:06:Connecting to assign1.foldingathome.org:80
18:09:06:Read GPUs.txt
18:09:07:Enabled folding slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448
18:09:07:Enabled folding slot 00: READY cpu:8
18:09:07:****************************** FAHClient ******************************
18:09:07:        Version: 7.6.13
18:09:07:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:09:07:      Copyright: 2020 foldingathome.org
18:09:07:       Homepage: https://foldingathome.org/
18:09:07:           Date: Apr 27 2020
18:09:07:           Time: 21:21:01
18:09:07:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:         Config: E:\FAHClient\config.xml
18:09:07:******************************** CBang ********************************
18:09:07:           Date: Apr 24 2020
18:09:07:           Time: 17:07:55
18:09:07:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:******************************* System ********************************
18:09:07:            CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
18:09:07:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
18:09:07:           CPUs: 16
18:09:07:         Memory: 31.86GiB
18:09:07:    Free Memory: 26.96GiB
18:09:07:        Threads: WINDOWS_THREADS
18:09:07:     OS Version: 6.2
18:09:07:    Has Battery: true
18:09:07:     On Battery: false
18:09:07:     UTC Offset: -7
18:09:07:            PID: 16772
18:09:07:            CWD: E:\FAHClient
18:09:07:  Win32 Service: false
18:09:07:             OS: Windows 10 Enterprise
18:09:07:        OS Arch: AMD64
18:09:07:           GPUs: 1
18:09:07:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev. A]
18:09:07:                 M 13448
18:09:07:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.2
18:09:07:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:26.20
18:09:07:OpenCL Device 2: Platform:1 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:442.92
18:09:07:******************************* libFAH ********************************
18:09:07:           Date: Apr 15 2020
18:09:07:           Time: 14:53:14
18:09:07:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:***********************************************************************
18:09:07:<config>
18:09:07:  <!-- Folding Core -->
18:09:07:  <checkpoint v='20'/>
18:09:07:
18:09:07:  <!-- Folding Slot Configuration -->
18:09:07:  <client-type v='advanced'/>
18:09:07:
18:09:07:  <!-- Network -->
18:09:07:  <proxy v=':8080'/>
18:09:07:
18:09:07:  <!-- Slot Control -->
18:09:07:  <power v='medium'/>
18:09:07:
18:09:07:  <!-- User Information -->
18:09:07:  <passkey v='*****'/>
18:09:07:  <team v='*********'/>
18:09:07:  <user v='**********'/>
18:09:07:
18:09:07:  <!-- Folding Slots -->
18:09:07:  <slot id='1' type='GPU'/>
18:09:07:  <slot id='0' type='CPU'>
18:09:07:    <cpus v='8'/>
18:09:07:  </slot>
18:09:07:</config>
Have had two 13405 WUs (both Gen 0) fail in the past 24 hours: Project: 13405 (Run 634, Clone 3, Gen 0) and Project: 13405 (Run 464, Clone 53, Gen 0)

In the past 24 hours have also successfully folded two Project 13405 WUs (Run 203, Clone 54, Gen 1 and Run 646, Clone 13, Gen 1), in addition to four Project 13404 WUs.

Project: 13405 (Run 634, Clone 3, Gen 0)

******************************* Date: 2020-05-12 *******************************
00:45:09:WU00:FS01:Connecting to assign1.foldingathome.org:80
00:45:09:WU00:FS01:Assigned to work server 18.188.125.154
00:45:09:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
00:45:09:WU00:FS01:Connecting to 18.188.125.154:8080
00:45:11:WU00:FS01:Downloading 6.36MiB
00:45:13:WU00:FS01:Download complete
00:45:13:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36
00:46:11:WU00:FS01:Starting
00:46:11:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
00:46:11:WU00:FS01:Started FahCore on PID 19056
00:46:11:WU00:FS01:Core PID:15680
00:46:11:WU00:FS01:FahCore 0x22 started
00:46:12:WU00:FS01:0x22:*********************** Log Started 2020-05-12T00:46:11Z ***********************
00:46:12:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:46:12:WU00:FS01:0x22:       Type: 0x22
00:46:12:WU00:FS01:0x22:       Core: Core22
00:46:12:WU00:FS01:0x22:    Website: https://foldingathome.org/
00:46:12:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
00:46:12:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
00:46:12:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
00:46:12:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 19056 -checkpoint 20
00:46:12:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
00:46:12:WU00:FS01:0x22:             0 -gpu 0
00:46:12:WU00:FS01:0x22:     Config: <none>
00:46:12:WU00:FS01:0x22:************************************ Build *************************************
00:46:12:WU00:FS01:0x22:    Version: 0.0.5
00:46:12:WU00:FS01:0x22:       Date: Apr 22 2020
00:46:12:WU00:FS01:0x22:       Time: 04:42:59
00:46:12:WU00:FS01:0x22: Repository: Git
00:46:12:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
00:46:12:WU00:FS01:0x22:     Branch: HEAD
00:46:12:WU00:FS01:0x22:   Compiler: Visual C++ 2008
00:46:12:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:46:12:WU00:FS01:0x22:   Platform: win32 10
00:46:12:WU00:FS01:0x22:       Bits: 64
00:46:12:WU00:FS01:0x22:       Mode: Release
00:46:12:WU00:FS01:0x22:************************************ System ************************************
00:46:12:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
00:46:12:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
00:46:12:WU00:FS01:0x22:       CPUs: 16
00:46:12:WU00:FS01:0x22:     Memory: 31.86GiB
00:46:12:WU00:FS01:0x22:Free Memory: 21.70GiB
00:46:12:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
00:46:12:WU00:FS01:0x22: OS Version: 6.2
00:46:12:WU00:FS01:0x22:Has Battery: true
00:46:12:WU00:FS01:0x22: On Battery: false
00:46:12:WU00:FS01:0x22: UTC Offset: -7
00:46:12:WU00:FS01:0x22:        PID: 15680
00:46:12:WU00:FS01:0x22:        CWD: E:\FAHClient\work
00:46:12:WU00:FS01:0x22:         OS: Windows 10 Pro
00:46:12:WU00:FS01:0x22:    OS Arch: AMD64
00:46:12:WU00:FS01:0x22:********************************************************************************
00:46:12:WU00:FS01:0x22:Project: 13405 (Run 634, Clone 3, Gen 0)
00:46:12:WU00:FS01:0x22:Unit: 0x0000000412bc7d9a5eb97d3bc5a97f36
00:46:12:WU00:FS01:0x22:Reading tar file core.xml
00:46:12:WU00:FS01:0x22:Reading tar file integrator.xml
00:46:12:WU00:FS01:0x22:Reading tar file state.xml
00:46:12:WU00:FS01:0x22:Reading tar file system.xml
00:46:12:WU00:FS01:0x22:Digital signatures verified
00:46:12:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:46:12:WU00:FS01:0x22:Version 0.0.5
00:46:20:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
00:46:20:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:47:15:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
00:47:26:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:26:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:36:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:36:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:47:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:47:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:47:WU00:FS01:0x22:ERROR:114: Max Retries Reached
00:47:47:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
00:47:47:WU00:FS01:0x22:Saving result file badstate-0.xml
00:47:47:WU00:FS01:0x22:Saving result file badstate-1.xml
00:47:47:WU00:FS01:0x22:Saving result file badstate-2.xml
00:47:47:WU00:FS01:0x22:Saving result file checkpt.crc
00:47:47:WU00:FS01:0x22:Saving result file globals.csv
00:47:47:WU00:FS01:0x22:Saving result file science.log
00:47:47:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:47:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:47:47:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36
00:47:47:WU00:FS01:Uploading 104.50KiB to 18.188.125.154
00:47:47:WU00:FS01:Connecting to 18.188.125.154:8080
00:47:48:WU00:FS01:Upload complete
00:47:48:WU00:FS01:Server responded WORK_ACK (400)
00:47:48:WU00:FS01:Cleaning up
Project: 13405 (Run 464, Clone 53, Gen 0)

******************************* Date: 2020-05-12 *******************************
09:06:08:WU00:FS01:Connecting to assign1.foldingathome.org:80
09:06:08:WARNING:WU00:FS01:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
09:06:08:WU00:FS01:Connecting to assign2.foldingathome.org:80
09:06:09:WU00:FS01:Assigned to work server 18.188.125.154
09:06:09:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
09:06:09:WU00:FS01:Connecting to 18.188.125.154:8080
09:06:10:WU00:FS01:Downloading 6.36MiB
09:06:12:WU00:FS01:Download complete
09:06:12:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13405 run:464 clone:53 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb5846c5e680eb0
09:07:10:WU00:FS01:Starting
09:07:10:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
09:07:10:WU00:FS01:Started FahCore on PID 9188
09:07:10:WU00:FS01:Core PID:19036
09:07:10:WU00:FS01:FahCore 0x22 started
09:07:11:WU00:FS01:0x22:*********************** Log Started 2020-05-12T09:07:10Z ***********************
09:07:11:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
09:07:11:WU00:FS01:0x22:       Type: 0x22
09:07:11:WU00:FS01:0x22:       Core: Core22
09:07:11:WU00:FS01:0x22:    Website: https://foldingathome.org/
09:07:11:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
09:07:11:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
09:07:11:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
09:07:11:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 9188 -checkpoint 20
09:07:11:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
09:07:11:WU00:FS01:0x22:             0 -gpu 0
09:07:11:WU00:FS01:0x22:     Config: <none>
09:07:11:WU00:FS01:0x22:************************************ Build *************************************
09:07:11:WU00:FS01:0x22:    Version: 0.0.5
09:07:11:WU00:FS01:0x22:       Date: Apr 22 2020
09:07:11:WU00:FS01:0x22:       Time: 04:42:59
09:07:11:WU00:FS01:0x22: Repository: Git
09:07:11:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
09:07:11:WU00:FS01:0x22:     Branch: HEAD
09:07:11:WU00:FS01:0x22:   Compiler: Visual C++ 2008
09:07:11:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:07:11:WU00:FS01:0x22:   Platform: win32 10
09:07:11:WU00:FS01:0x22:       Bits: 64
09:07:11:WU00:FS01:0x22:       Mode: Release
09:07:11:WU00:FS01:0x22:************************************ System ************************************
09:07:11:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
09:07:11:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
09:07:11:WU00:FS01:0x22:       CPUs: 16
09:07:11:WU00:FS01:0x22:     Memory: 31.86GiB
09:07:11:WU00:FS01:0x22:Free Memory: 20.99GiB
09:07:11:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
09:07:11:WU00:FS01:0x22: OS Version: 6.2
09:07:11:WU00:FS01:0x22:Has Battery: true
09:07:11:WU00:FS01:0x22: On Battery: false
09:07:11:WU00:FS01:0x22: UTC Offset: -7
09:07:11:WU00:FS01:0x22:        PID: 19036
09:07:11:WU00:FS01:0x22:        CWD: E:\FAHClient\work
09:07:11:WU00:FS01:0x22:         OS: Windows 10 Pro
09:07:11:WU00:FS01:0x22:    OS Arch: AMD64
09:07:11:WU00:FS01:0x22:********************************************************************************
09:07:11:WU00:FS01:0x22:Project: 13405 (Run 464, Clone 53, Gen 0)
09:07:11:WU00:FS01:0x22:Unit: 0x0000000212bc7d9a5eb5846c5e680eb0
09:07:11:WU00:FS01:0x22:Reading tar file core.xml
09:07:11:WU00:FS01:0x22:Reading tar file integrator.xml
09:07:11:WU00:FS01:0x22:Reading tar file state.xml
09:07:11:WU00:FS01:0x22:Reading tar file system.xml
09:07:11:WU00:FS01:0x22:Digital signatures verified
09:07:11:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:07:11:WU00:FS01:0x22:Version 0.0.5
09:07:18:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
09:07:18:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:08:12:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
09:09:07:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
09:10:01:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
09:10:55:WU00:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
09:11:50:WU00:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
09:12:44:WU00:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
09:13:38:WU00:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
09:14:32:WU00:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
09:15:27:WU00:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
09:16:21:WU00:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
09:17:15:WU00:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
09:18:09:WU00:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
09:19:04:WU00:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
09:19:58:WU00:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
09:20:52:WU00:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
09:21:47:WU00:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
09:22:41:WU00:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
09:23:35:WU00:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
09:24:29:WU00:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
09:25:24:WU00:FS01:0x22:Completed 200000 out of 1000000 steps (20%)
09:26:18:WU00:FS01:0x22:Completed 210000 out of 1000000 steps (21%)
09:27:12:WU00:FS01:0x22:Completed 220000 out of 1000000 steps (22%)
09:28:06:WU00:FS01:0x22:Completed 230000 out of 1000000 steps (23%)
09:29:00:WU00:FS01:0x22:Completed 240000 out of 1000000 steps (24%)
09:29:55:WU00:FS01:0x22:Completed 250000 out of 1000000 steps (25%)
09:30:56:WU00:FS01:0x22:Completed 260000 out of 1000000 steps (26%)
09:31:56:WU00:FS01:0x22:Completed 270000 out of 1000000 steps (27%)
09:32:55:WU00:FS01:0x22:Completed 280000 out of 1000000 steps (28%)
09:33:54:WU00:FS01:0x22:Completed 290000 out of 1000000 steps (29%)
09:34:54:WU00:FS01:0x22:Completed 300000 out of 1000000 steps (30%)
09:35:54:WU00:FS01:0x22:Completed 310000 out of 1000000 steps (31%)
09:36:53:WU00:FS01:0x22:Completed 320000 out of 1000000 steps (32%)
09:37:53:WU00:FS01:0x22:Completed 330000 out of 1000000 steps (33%)
09:38:52:WU00:FS01:0x22:Completed 340000 out of 1000000 steps (34%)
09:39:52:WU00:FS01:0x22:Completed 350000 out of 1000000 steps (35%)
09:40:51:WU00:FS01:0x22:Completed 360000 out of 1000000 steps (36%)
09:41:51:WU00:FS01:0x22:Completed 370000 out of 1000000 steps (37%)
09:42:50:WU00:FS01:0x22:Completed 380000 out of 1000000 steps (38%)
09:43:50:WU00:FS01:0x22:Completed 390000 out of 1000000 steps (39%)
09:44:49:WU00:FS01:0x22:Completed 400000 out of 1000000 steps (40%)
09:45:49:WU00:FS01:0x22:Completed 410000 out of 1000000 steps (41%)
09:46:48:WU00:FS01:0x22:Completed 420000 out of 1000000 steps (42%)
09:47:48:WU00:FS01:0x22:Completed 430000 out of 1000000 steps (43%)
09:48:47:WU00:FS01:0x22:Completed 440000 out of 1000000 steps (44%)
09:49:47:WU00:FS01:0x22:Completed 450000 out of 1000000 steps (45%)
09:50:46:WU00:FS01:0x22:Completed 460000 out of 1000000 steps (46%)
09:51:46:WU00:FS01:0x22:Completed 470000 out of 1000000 steps (47%)
09:52:45:WU00:FS01:0x22:Completed 480000 out of 1000000 steps (48%)
09:53:45:WU00:FS01:0x22:Completed 490000 out of 1000000 steps (49%)
09:54:45:WU00:FS01:0x22:Completed 500000 out of 1000000 steps (50%)
09:55:41:WU00:FS01:0x22:Completed 510000 out of 1000000 steps (51%)
09:56:35:WU00:FS01:0x22:Completed 520000 out of 1000000 steps (52%)
09:57:08:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:08:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:19:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:19:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:28:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:28:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:28:WU00:FS01:0x22:ERROR:114: Max Retries Reached
09:57:28:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
09:57:28:WU00:FS01:0x22:Saving result file badstate-0.xml
09:57:28:WU00:FS01:0x22:Saving result file badstate-1.xml
09:57:29:WU00:FS01:0x22:Saving result file badstate-2.xml
09:57:29:WU00:FS01:0x22:Saving result file checkpointState.xml
09:57:29:WU00:FS01:0x22:Saving result file checkpt.crc
09:57:29:WU00:FS01:0x22:Saving result file globals.csv
09:57:29:WU00:FS01:0x22:Saving result file positions.xtc
09:57:29:WU00:FS01:0x22:Saving result file science.log
09:57:29:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:57:30:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:57:30:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:464 clone:53 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb5846c5e680eb0
09:57:30:WU00:FS01:Uploading 4.98MiB to 18.188.125.154
09:57:30:WU00:FS01:Connecting to 18.188.125.154:8080
09:57:36:WU00:FS01:Upload 48.99%
09:57:40:WU00:FS01:Upload complete
09:57:41:WU00:FS01:Server responded WORK_ACK (400)
09:57:41:WU00:FS01:Cleaning up
Hope this helps, please let me know if there is anything I should do differently when reporting issues or if there is no need to do so.

Re: 13405 - Two failed WUs in 24-hours

Posted: Tue May 12, 2020 9:47 pm
by Joe_H
This looks good. There is a higher chance of "bad" WUs on these projects, as the researcher is doing some things differently than normal. He has responded to some of the other error reports on these projects.

Re: 13405 - Two failed WUs in 24-hours

Posted: Tue May 12, 2020 10:25 pm
by JohnChodera
Thanks for the report, @SAK917!

@Joe_H is right: we're testing out some new workloads that help us prioritize compounds for synthesis via the COVID Moonshot (https://covid.postera.ai/covid/submissions/compounds) and are continuing to refine our process to make everything more stable!
The next batch of projects should make significant improvements over the first batch.

Thanks so much for your patience!

~ John Chodera // MSKCC

Re: 13405 - Two failed WUs in 24-hours

Posted: Tue May 12, 2020 10:55 pm
by SAK917
No worries on my patience, I am more than happy to help and am not worried about points or failures if it helps move the science forward. I just wasn't sure if the client reported back this information or if I needed to report it here on this forum? In the future, should I continue to report failed WUs as I did above? And if so, did I include all the information you need?

Thanks for YOUR patience as I figure this all out...

Re: 13405 - Two failed WUs in 24-hours

Posted: Tue May 12, 2020 11:11 pm
by SAK917
And in case it helps, here is one more I missed the first time I scanned for failed WUs:

Project: 13405 (Run 624, Clone 16, Gen 0)

******************************* Date: 2020-05-12 *******************************
03:55:17:WU01:FS01:Connecting to assign1.foldingathome.org:80
03:55:17:WARNING:WU01:FS01:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
03:55:17:WU01:FS01:Connecting to assign2.foldingathome.org:80
03:55:18:WU01:FS01:Assigned to work server 18.188.125.154
03:55:18:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
03:55:18:WU01:FS01:Connecting to 18.188.125.154:8080
03:55:19:WU01:FS01:Downloading 6.36MiB
03:55:21:WU01:FS01:Download complete
03:55:21:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777
03:56:18:WU01:FS01:Starting
03:56:18:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
03:56:18:WU01:FS01:Started FahCore on PID 3992
03:56:18:WU01:FS01:Core PID:9600
03:56:18:WU01:FS01:FahCore 0x22 started
03:56:19:WU01:FS01:0x22:*********************** Log Started 2020-05-12T03:56:18Z ***********************
03:56:19:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
03:56:19:WU01:FS01:0x22:       Type: 0x22
03:56:19:WU01:FS01:0x22:       Core: Core22
03:56:19:WU01:FS01:0x22:    Website: https://foldingathome.org/
03:56:19:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
03:56:19:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
03:56:19:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
03:56:19:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3992 -checkpoint 20
03:56:19:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
03:56:19:WU01:FS01:0x22:             0 -gpu 0
03:56:19:WU01:FS01:0x22:     Config: <none>
03:56:19:WU01:FS01:0x22:************************************ Build *************************************
03:56:19:WU01:FS01:0x22:    Version: 0.0.5
03:56:19:WU01:FS01:0x22:       Date: Apr 22 2020
03:56:19:WU01:FS01:0x22:       Time: 04:42:59
03:56:19:WU01:FS01:0x22: Repository: Git
03:56:19:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
03:56:19:WU01:FS01:0x22:     Branch: HEAD
03:56:19:WU01:FS01:0x22:   Compiler: Visual C++ 2008
03:56:19:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:56:19:WU01:FS01:0x22:   Platform: win32 10
03:56:19:WU01:FS01:0x22:       Bits: 64
03:56:19:WU01:FS01:0x22:       Mode: Release
03:56:19:WU01:FS01:0x22:************************************ System ************************************
03:56:19:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
03:56:19:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
03:56:19:WU01:FS01:0x22:       CPUs: 16
03:56:19:WU01:FS01:0x22:     Memory: 31.86GiB
03:56:19:WU01:FS01:0x22:Free Memory: 21.37GiB
03:56:19:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
03:56:19:WU01:FS01:0x22: OS Version: 6.2
03:56:19:WU01:FS01:0x22:Has Battery: true
03:56:19:WU01:FS01:0x22: On Battery: false
03:56:19:WU01:FS01:0x22: UTC Offset: -7
03:56:19:WU01:FS01:0x22:        PID: 9600
03:56:19:WU01:FS01:0x22:        CWD: E:\FAHClient\work
03:56:19:WU01:FS01:0x22:         OS: Windows 10 Pro
03:56:19:WU01:FS01:0x22:    OS Arch: AMD64
03:56:19:WU01:FS01:0x22:********************************************************************************
03:56:19:WU01:FS01:0x22:Project: 13405 (Run 624, Clone 16, Gen 0)
03:56:19:WU01:FS01:0x22:Unit: 0x0000000212bc7d9a5eb97d3c5824f777
03:56:19:WU01:FS01:0x22:Reading tar file core.xml
03:56:19:WU01:FS01:0x22:Reading tar file integrator.xml
03:56:19:WU01:FS01:0x22:Reading tar file state.xml
03:56:19:WU01:FS01:0x22:Reading tar file system.xml
03:56:19:WU01:FS01:0x22:Digital signatures verified
03:56:19:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
03:56:19:WU01:FS01:0x22:Version 0.0.5
03:56:28:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
03:56:28:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:57:22:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
03:58:15:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
03:59:08:WU01:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
04:00:01:WU01:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
04:00:54:WU01:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
04:01:47:WU01:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
04:02:40:WU01:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
04:03:34:WU01:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
04:04:27:WU01:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
04:05:20:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
04:06:13:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
04:07:06:WU01:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
04:08:00:WU01:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
04:08:56:WU01:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
04:09:49:WU01:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
04:10:42:WU01:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
04:11:36:WU01:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
04:12:29:WU01:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
04:13:22:WU01:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
04:14:15:WU01:FS01:0x22:Completed 200000 out of 1000000 steps (20%)
04:15:09:WU01:FS01:0x22:Completed 210000 out of 1000000 steps (21%)
04:16:02:WU01:FS01:0x22:Completed 220000 out of 1000000 steps (22%)
04:16:55:WU01:FS01:0x22:Completed 230000 out of 1000000 steps (23%)
04:17:48:WU01:FS01:0x22:Completed 240000 out of 1000000 steps (24%)
04:18:41:WU01:FS01:0x22:Completed 250000 out of 1000000 steps (25%)
04:18:45:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:18:45:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:18:55:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:18:55:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:19:06:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:19:06:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:19:06:WU01:FS01:0x22:ERROR:114: Max Retries Reached
04:19:06:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
04:19:06:WU01:FS01:0x22:Saving result file badstate-0.xml
04:19:06:WU01:FS01:0x22:Saving result file badstate-1.xml
04:19:06:WU01:FS01:0x22:Saving result file badstate-2.xml
04:19:06:WU01:FS01:0x22:Saving result file checkpointState.xml
04:19:06:WU01:FS01:0x22:Saving result file checkpt.crc
04:19:06:WU01:FS01:0x22:Saving result file globals.csv
04:19:06:WU01:FS01:0x22:Saving result file positions.xtc
04:19:06:WU01:FS01:0x22:Saving result file science.log
04:19:06:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
04:19:07:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:19:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777
04:19:07:WU01:FS01:Uploading 4.92MiB to 18.188.125.154
04:19:07:WU01:FS01:Connecting to 18.188.125.154:8080
04:19:13:WU01:FS01:Upload 66.12%
04:19:16:WU01:FS01:Upload complete
04:19:16:WU01:FS01:Server responded WORK_ACK (400)
04:19:16:WU01:FS01:Cleaning up
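
Since one failed WU was missed on the first manual scan, a small script can make the search less error-prone. The following is a minimal sketch, not an official tool: it assumes the client log keeps the "Sending unit results" line format shown in the logs above, and the names RESULT_RE and find_results are made up for this example.

```python
import re

# Matches the client's "Sending unit results" line as it appears in the logs above.
# Assumption: other client versions keep the same field order.
RESULT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"Sending unit results: id:\d+ state:SEND error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def find_results(lines, only_failed=True):
    """Return one dict per uploaded WU result.

    With only_failed=True, results reported as error:NO_ERROR are skipped.
    """
    results = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m is None:
            continue  # not a result line
        if only_failed and m["error"] == "NO_ERROR":
            continue
        rec = m.groupdict()
        for key in ("project", "run", "clone", "gen"):
            rec[key] = int(rec[key])
        results.append(rec)
    return results


# Lines copied from the logs in this thread.
SAMPLE = [
    "00:47:47:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36",
    "04:19:16:WU01:FS01:Server responded WORK_ACK (400)",
    "04:19:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777",
]

for rec in find_results(SAMPLE):
    print("Project: {project} (Run {run}, Clone {clone}, Gen {gen}) -> {error} at {time}".format(**rec))
```

To scan a real log, pass an open file object instead of SAMPLE; the file name and location depend on the installation, so they are left out here.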

Re: 13405 - Two failed WUs in 24-hours

Posted: Wed May 13, 2020 4:47 am
by Nuitari
@JohnChodera Do reports of failed WUs help you with those projects? Over the last 48h:

GeForce GTX 1660 SUPER:
02:56:08:WU01:FS02:0x22:ERROR:exception: Error invoking kernel gridSpreadCharge: clEnqueueNDRangeKernel (-4)
02:56:09:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13405 run:295 clone:29 gen:3 core:0x22 unit:0x0000000b12bc7d9a5eb3a38d95a537c7

02:56:21:WU02:FS02:0x22:ERROR:exception: Error invoking kernel computeExclusionParameters: clEnqueueNDRangeKernel (-4)
02:56:21:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:13405 run:621 clone:15 gen:0 core:0x22 unit:0x0000000112bc7d9a5eb97d3c6300afc9

02:56:34:WU01:FS02:0x22:ERROR:exception: Error invoking kernel clearFourBuffers: clEnqueueNDRangeKernel (-4)
02:56:35:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13404 run:400 clone:11 gen:2 core:0x22 unit:0x0000000712bc7d9a5eb37aa655db589f

02:56:47:WU02:FS02:0x22:ERROR:exception: Error uploading array posq: clEnqueueWriteBuffer (-4)
02:56:48:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:13404 run:542 clone:7 gen:0 core:0x22 unit:0x0000000112bc7d9a5eb97d45508c54bb


18:14:04:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:04:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:17:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:17:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:30:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:30:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:30:WU01:FS02:0x22:ERROR:114: Max Retries Reached
18:14:32:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13404 run:459 clone:71 gen:0 core:0x22 unit:0x0000000312bc7d9a5eb584720e81e542

03:08:53:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:08:53:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:07:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:09:07:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:20:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:09:20:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:20:WU03:FS02:0x22:ERROR:114: Max Retries Reached
03:09:22:WU03:FS02:Sending unit results: id:03 state:SEND error:FAULTY project:13405 run:243 clone:90 gen:1 core:0x22 unit:0x0000000312bc7d9a5eb3a384d6883f24

From memory, clEnqueueWriteBuffer (-4) indicates a lack of memory, but it's a 5 GB card with a few GB available, and the times I saw this running it only took ~150 MB.

I did have some successes on that card:
project:13405 run:494 clone:8 gen:3
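
For reference, in the standard OpenCL headers the status code -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE, which matches the recollection above that it relates to memory allocation, although the code alone does not say why the allocation failed. The sketch below decodes such lines. It assumes the number in parentheses is the raw OpenCL status code; the names OPENCL_ERRORS, EXC_RE and decode are made up for this example.

```python
import re

# A subset of the standard OpenCL status codes from the Khronos cl.h header.
OPENCL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches core exception lines such as
# "...ERROR:exception: Error invoking kernel gridSpreadCharge: clEnqueueNDRangeKernel (-4)"
EXC_RE = re.compile(
    r"ERROR:exception: (?P<what>.+): (?P<call>cl\w+) \((?P<code>-?\d+)\)"
)


def decode(line):
    """Return (description, OpenCL call, code, symbolic name), or None if no match."""
    m = EXC_RE.search(line)
    if m is None:
        return None
    code = int(m["code"])
    return m["what"], m["call"], code, OPENCL_ERRORS.get(code, "UNKNOWN")


# Lines copied from the report above.
for line in (
    "02:56:08:WU01:FS02:0x22:ERROR:exception: Error invoking kernel gridSpreadCharge: clEnqueueNDRangeKernel (-4)",
    "02:56:47:WU02:FS02:0x22:ERROR:exception: Error uploading array posq: clEnqueueWriteBuffer (-4)",
):
    print(decode(line))
```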

Re: 13405 - Two failed WUs in 24-hours

Posted: Wed May 13, 2020 5:06 am
by bruce
Please note that projects 13404 and 13405 are highly experimental, doing some very different kinds of analysis than traditional projects. Yes, they're experiencing a high failure rate.

Thank you for your report. When science does new things, often learning what does NOT work is as important as learning what does.

Re: 13405 - Two failed WUs in 24-hours

Posted: Wed May 13, 2020 8:07 am
by PantherX
SAK917 wrote: No worries on my patience, I am more than happy to help and am not worried about points or failures if it helps move the science forward. I just wasn't sure if the client reported back this information or if I needed to report it here on this forum? In the future, should I continue to report failed WUs as I did above? And if so, did I include all the information you need?...
Welcome to the F@H Forum SAK917,

Yep, in future, please feel free to post WUs if you have issues. Apart from the log, there's also science.log which would be stored here for you:
E:\FAHClient\work\00\00\science.log

The 00\00 part will change depending on the WU ID, but if you can grab science.log when issues occur, it will provide additional scientific data and help the researchers out.
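
Because the subfolder changes with the WU, searching the whole work folder is easier than guessing the path. Here is a minimal sketch that assumes the work folder is E:\FAHClient\work, as in the log above; the helper name find_science_logs is made up for this example.

```python
from pathlib import Path


def find_science_logs(work_dir):
    """Return every science.log found under work_dir, newest first."""
    logs = [p for p in Path(work_dir).rglob("science.log") if p.is_file()]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumption: the work folder shown in the log above; adjust for your install.
    work = Path(r"E:\FAHClient\work")
    if work.is_dir():
        for log in find_science_logs(work):
            print(log)
    else:
        print(f"Work folder not found: {work}")
```

Searching recursively avoids hard-coding the 00\00 part, at the cost of also listing logs from any other WUs that are still in the work folder.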

Re: 13405 - Two failed WUs in 24-hours

Posted: Thu May 14, 2020 1:00 am
by jrweiss
My 1050ti seems to be running the 13405 OK:

06:19:07:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13405 run:614 clone:64 gen:0 core:0x22 unit:0x0000000012bc7d9a5eb97d3c6e820942
06:19:07:WU02:FS01:Uploading 4.99MiB to 18.188.125.154
06:19:07:WU00:FS01:Starting
06:19:07:WU02:FS01:Connecting to 18.188.125.154:8080
06:19:16:WU02:FS01:Upload complete
06:19:16:WU02:FS01:Server responded WORK_ACK (400)
06:19:16:WU02:FS01:Final credit estimate, 68977.00 points
06:19:16:WU02:FS01:Cleaning up
. . .

18:16:08:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
18:16:09:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
18:16:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13405 run:475 clone:75 gen:3 core:0x22 unit:0x0000000512bc7d9a5eb5846cf90408a9
18:16:09:WU00:FS01:Uploading 5.00MiB to 18.188.125.154
18:16:09:WU00:FS01:Connecting to 18.188.125.154:8080
18:16:18:WU00:FS01:Upload complete
18:16:18:WU00:FS01:Server responded WORK_ACK (400)
18:16:18:WU00:FS01:Final credit estimate, 73280.00 points

Re: 13405 - Two failed WUs in 24-hours

Posted: Thu May 14, 2020 3:28 am
by JohnChodera
> @JohnChodera Do reports of failed WUs help you with those projects?

The failed WUs make their way back to us, so we see all these statistics on our end. No need to report them unless they are causing undue difficulties.

~ John Chodera // MSKCC

Re: 13405 - Two failed WUs in 24-hours

Posted: Thu May 14, 2020 5:01 am
by SAK917
OK, good to know, thanks John. I won't report any more unless they are catastrophic on my end...

Thanks