13405 - Two failed WUs in 24-hours

Moderators: Site Moderators, FAHC Science Team

SAK917
Posts: 12
Joined: Tue May 12, 2020 4:17 pm

13405 - Two failed WUs in 24-hours

Post by SAK917 »

This is my first time reporting an error, but since I have enabled <client-type v='advanced'/>, I believe reporting it here is the protocol. If not, please correct me.

System is not overclocked.

System Log:

Code:

*********************** Log Started 2020-05-11T18:09:06Z ***********************
18:09:06:Trying to access database...
18:09:06:Successfully acquired database lock
18:09:06:Downloading GPUs.txt from assign1.foldingathome.org:80
18:09:06:Connecting to assign1.foldingathome.org:80
18:09:06:Read GPUs.txt
18:09:07:Enabled folding slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448
18:09:07:Enabled folding slot 00: READY cpu:8
18:09:07:****************************** FAHClient ******************************
18:09:07:        Version: 7.6.13
18:09:07:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:09:07:      Copyright: 2020 foldingathome.org
18:09:07:       Homepage: https://foldingathome.org/
18:09:07:           Date: Apr 27 2020
18:09:07:           Time: 21:21:01
18:09:07:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:         Config: E:\FAHClient\config.xml
18:09:07:******************************** CBang ********************************
18:09:07:           Date: Apr 24 2020
18:09:07:           Time: 17:07:55
18:09:07:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:******************************* System ********************************
18:09:07:            CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
18:09:07:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
18:09:07:           CPUs: 16
18:09:07:         Memory: 31.86GiB
18:09:07:    Free Memory: 26.96GiB
18:09:07:        Threads: WINDOWS_THREADS
18:09:07:     OS Version: 6.2
18:09:07:    Has Battery: true
18:09:07:     On Battery: false
18:09:07:     UTC Offset: -7
18:09:07:            PID: 16772
18:09:07:            CWD: E:\FAHClient
18:09:07:  Win32 Service: false
18:09:07:             OS: Windows 10 Enterprise
18:09:07:        OS Arch: AMD64
18:09:07:           GPUs: 1
18:09:07:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev. A]
18:09:07:                 M 13448
18:09:07:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.2
18:09:07:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:26.20
18:09:07:OpenCL Device 2: Platform:1 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:442.92
18:09:07:******************************* libFAH ********************************
18:09:07:           Date: Apr 15 2020
18:09:07:           Time: 14:53:14
18:09:07:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
18:09:07:         Branch: master
18:09:07:       Compiler: Visual C++ 2008
18:09:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:09:07:       Platform: win32 10
18:09:07:           Bits: 32
18:09:07:           Mode: Release
18:09:07:***********************************************************************
18:09:07:<config>
18:09:07:  <!-- Folding Core -->
18:09:07:  <checkpoint v='20'/>
18:09:07:
18:09:07:  <!-- Folding Slot Configuration -->
18:09:07:  <client-type v='advanced'/>
18:09:07:
18:09:07:  <!-- Network -->
18:09:07:  <proxy v=':8080'/>
18:09:07:
18:09:07:  <!-- Slot Control -->
18:09:07:  <power v='medium'/>
18:09:07:
18:09:07:  <!-- User Information -->
18:09:07:  <passkey v='*****'/>
18:09:07:  <team v='*********'/>
18:09:07:  <user v='**********'/>
18:09:07:
18:09:07:  <!-- Folding Slots -->
18:09:07:  <slot id='1' type='GPU'/>
18:09:07:  <slot id='0' type='CPU'>
18:09:07:    <cpus v='8'/>
18:09:07:  </slot>
18:09:07:</config>
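The `<config>` block that FAHClient echoes at startup can also be checked programmatically, for example to confirm that `client-type` really is `advanced` before filing a report. This is a minimal sketch, assuming the log format shown above (every line prefixed with an `HH:MM:SS:` timestamp). `config_from_log` and `summarize` are hypothetical helper names and are not part of FAHClient.

```python
import re
import xml.etree.ElementTree as ET

# Matches the "HH:MM:SS:" prefix FAHClient puts on every log line.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}:")


def config_from_log(lines):
    """Rebuild the <config> element echoed in a startup log (hypothetical helper)."""
    body = []
    inside = False
    for line in lines:
        text = TIMESTAMP.sub("", line, count=1)
        if text.strip() == "<config>":
            inside = True
        if inside:
            body.append(text)
            if text.strip() == "</config>":
                break
    if not body:
        raise ValueError("no <config> block found")
    # XML comments such as <!-- Folding Slots --> are dropped by the default parser.
    return ET.fromstring("\n".join(body))


def summarize(root):
    """Return the client-type value and a list of (slot id, type, cpus) tuples."""
    client_type = root.find("client-type")
    slots = []
    for slot in root.findall("slot"):
        cpus = slot.find("cpus")
        slots.append((slot.get("id"), slot.get("type"),
                      cpus.get("v") if cpus is not None else None))
    return (client_type.get("v") if client_type is not None else None), slots


log = """18:09:07:<config>
18:09:07:  <!-- Folding Slot Configuration -->
18:09:07:  <client-type v='advanced'/>
18:09:07:
18:09:07:  <slot id='1' type='GPU'/>
18:09:07:  <slot id='0' type='CPU'>
18:09:07:    <cpus v='8'/>
18:09:07:  </slot>
18:09:07:</config>""".splitlines()

print(summarize(config_from_log(log)))
# prints ('advanced', [('1', 'GPU', None), ('0', 'CPU', '8')])
```

Only the elements the sketch looks for are read; any other settings in the block (proxy, power, user information) are parsed but ignored.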
Two Project 13405 WUs (both Gen 0) have failed in the past 24 hours: Project: 13405 (Run 634, Clone 3, Gen 0) and Project: 13405 (Run 464, Clone 53, Gen 0).

In the same 24 hours I have also successfully folded two Project 13405 WUs (Run 203, Clone 54, Gen 1 and Run 646, Clone 13, Gen 1), in addition to four Project 13404 WUs.

Project: 13405 (Run 634, Clone 3, Gen 0)

Code:

******************************* Date: 2020-05-12 *******************************
00:45:09:WU00:FS01:Connecting to assign1.foldingathome.org:80
00:45:09:WU00:FS01:Assigned to work server 18.188.125.154
00:45:09:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
00:45:09:WU00:FS01:Connecting to 18.188.125.154:8080
00:45:11:WU00:FS01:Downloading 6.36MiB
00:45:13:WU00:FS01:Download complete
00:45:13:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36
00:46:11:WU00:FS01:Starting
00:46:11:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
00:46:11:WU00:FS01:Started FahCore on PID 19056
00:46:11:WU00:FS01:Core PID:15680
00:46:11:WU00:FS01:FahCore 0x22 started
00:46:12:WU00:FS01:0x22:*********************** Log Started 2020-05-12T00:46:11Z ***********************
00:46:12:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:46:12:WU00:FS01:0x22:       Type: 0x22
00:46:12:WU00:FS01:0x22:       Core: Core22
00:46:12:WU00:FS01:0x22:    Website: https://foldingathome.org/
00:46:12:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
00:46:12:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
00:46:12:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
00:46:12:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 19056 -checkpoint 20
00:46:12:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
00:46:12:WU00:FS01:0x22:             0 -gpu 0
00:46:12:WU00:FS01:0x22:     Config: <none>
00:46:12:WU00:FS01:0x22:************************************ Build *************************************
00:46:12:WU00:FS01:0x22:    Version: 0.0.5
00:46:12:WU00:FS01:0x22:       Date: Apr 22 2020
00:46:12:WU00:FS01:0x22:       Time: 04:42:59
00:46:12:WU00:FS01:0x22: Repository: Git
00:46:12:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
00:46:12:WU00:FS01:0x22:     Branch: HEAD
00:46:12:WU00:FS01:0x22:   Compiler: Visual C++ 2008
00:46:12:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:46:12:WU00:FS01:0x22:   Platform: win32 10
00:46:12:WU00:FS01:0x22:       Bits: 64
00:46:12:WU00:FS01:0x22:       Mode: Release
00:46:12:WU00:FS01:0x22:************************************ System ************************************
00:46:12:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
00:46:12:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
00:46:12:WU00:FS01:0x22:       CPUs: 16
00:46:12:WU00:FS01:0x22:     Memory: 31.86GiB
00:46:12:WU00:FS01:0x22:Free Memory: 21.70GiB
00:46:12:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
00:46:12:WU00:FS01:0x22: OS Version: 6.2
00:46:12:WU00:FS01:0x22:Has Battery: true
00:46:12:WU00:FS01:0x22: On Battery: false
00:46:12:WU00:FS01:0x22: UTC Offset: -7
00:46:12:WU00:FS01:0x22:        PID: 15680
00:46:12:WU00:FS01:0x22:        CWD: E:\FAHClient\work
00:46:12:WU00:FS01:0x22:         OS: Windows 10 Pro
00:46:12:WU00:FS01:0x22:    OS Arch: AMD64
00:46:12:WU00:FS01:0x22:********************************************************************************
00:46:12:WU00:FS01:0x22:Project: 13405 (Run 634, Clone 3, Gen 0)
00:46:12:WU00:FS01:0x22:Unit: 0x0000000412bc7d9a5eb97d3bc5a97f36
00:46:12:WU00:FS01:0x22:Reading tar file core.xml
00:46:12:WU00:FS01:0x22:Reading tar file integrator.xml
00:46:12:WU00:FS01:0x22:Reading tar file state.xml
00:46:12:WU00:FS01:0x22:Reading tar file system.xml
00:46:12:WU00:FS01:0x22:Digital signatures verified
00:46:12:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:46:12:WU00:FS01:0x22:Version 0.0.5
00:46:20:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
00:46:20:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:47:15:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
00:47:26:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:26:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:36:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:36:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:47:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
00:47:47:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:47:47:WU00:FS01:0x22:ERROR:114: Max Retries Reached
00:47:47:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
00:47:47:WU00:FS01:0x22:Saving result file badstate-0.xml
00:47:47:WU00:FS01:0x22:Saving result file badstate-1.xml
00:47:47:WU00:FS01:0x22:Saving result file badstate-2.xml
00:47:47:WU00:FS01:0x22:Saving result file checkpt.crc
00:47:47:WU00:FS01:0x22:Saving result file globals.csv
00:47:47:WU00:FS01:0x22:Saving result file science.log
00:47:47:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:47:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:47:47:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36
00:47:47:WU00:FS01:Uploading 104.50KiB to 18.188.125.154
00:47:47:WU00:FS01:Connecting to 18.188.125.154:8080
00:47:48:WU00:FS01:Upload complete
00:47:48:WU00:FS01:Server responded WORK_ACK (400)
00:47:48:WU00:FS01:Cleaning up
Project: 13405 (Run 464, Clone 53, Gen 0)

Code:

******************************* Date: 2020-05-12 *******************************
09:06:08:WU00:FS01:Connecting to assign1.foldingathome.org:80
09:06:08:WARNING:WU00:FS01:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
09:06:08:WU00:FS01:Connecting to assign2.foldingathome.org:80
09:06:09:WU00:FS01:Assigned to work server 18.188.125.154
09:06:09:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
09:06:09:WU00:FS01:Connecting to 18.188.125.154:8080
09:06:10:WU00:FS01:Downloading 6.36MiB
09:06:12:WU00:FS01:Download complete
09:06:12:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13405 run:464 clone:53 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb5846c5e680eb0
09:07:10:WU00:FS01:Starting
09:07:10:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
09:07:10:WU00:FS01:Started FahCore on PID 9188
09:07:10:WU00:FS01:Core PID:19036
09:07:10:WU00:FS01:FahCore 0x22 started
09:07:11:WU00:FS01:0x22:*********************** Log Started 2020-05-12T09:07:10Z ***********************
09:07:11:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
09:07:11:WU00:FS01:0x22:       Type: 0x22
09:07:11:WU00:FS01:0x22:       Core: Core22
09:07:11:WU00:FS01:0x22:    Website: https://foldingathome.org/
09:07:11:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
09:07:11:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
09:07:11:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
09:07:11:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 9188 -checkpoint 20
09:07:11:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
09:07:11:WU00:FS01:0x22:             0 -gpu 0
09:07:11:WU00:FS01:0x22:     Config: <none>
09:07:11:WU00:FS01:0x22:************************************ Build *************************************
09:07:11:WU00:FS01:0x22:    Version: 0.0.5
09:07:11:WU00:FS01:0x22:       Date: Apr 22 2020
09:07:11:WU00:FS01:0x22:       Time: 04:42:59
09:07:11:WU00:FS01:0x22: Repository: Git
09:07:11:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
09:07:11:WU00:FS01:0x22:     Branch: HEAD
09:07:11:WU00:FS01:0x22:   Compiler: Visual C++ 2008
09:07:11:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:07:11:WU00:FS01:0x22:   Platform: win32 10
09:07:11:WU00:FS01:0x22:       Bits: 64
09:07:11:WU00:FS01:0x22:       Mode: Release
09:07:11:WU00:FS01:0x22:************************************ System ************************************
09:07:11:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
09:07:11:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
09:07:11:WU00:FS01:0x22:       CPUs: 16
09:07:11:WU00:FS01:0x22:     Memory: 31.86GiB
09:07:11:WU00:FS01:0x22:Free Memory: 20.99GiB
09:07:11:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
09:07:11:WU00:FS01:0x22: OS Version: 6.2
09:07:11:WU00:FS01:0x22:Has Battery: true
09:07:11:WU00:FS01:0x22: On Battery: false
09:07:11:WU00:FS01:0x22: UTC Offset: -7
09:07:11:WU00:FS01:0x22:        PID: 19036
09:07:11:WU00:FS01:0x22:        CWD: E:\FAHClient\work
09:07:11:WU00:FS01:0x22:         OS: Windows 10 Pro
09:07:11:WU00:FS01:0x22:    OS Arch: AMD64
09:07:11:WU00:FS01:0x22:********************************************************************************
09:07:11:WU00:FS01:0x22:Project: 13405 (Run 464, Clone 53, Gen 0)
09:07:11:WU00:FS01:0x22:Unit: 0x0000000212bc7d9a5eb5846c5e680eb0
09:07:11:WU00:FS01:0x22:Reading tar file core.xml
09:07:11:WU00:FS01:0x22:Reading tar file integrator.xml
09:07:11:WU00:FS01:0x22:Reading tar file state.xml
09:07:11:WU00:FS01:0x22:Reading tar file system.xml
09:07:11:WU00:FS01:0x22:Digital signatures verified
09:07:11:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:07:11:WU00:FS01:0x22:Version 0.0.5
09:07:18:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
09:07:18:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:08:12:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
09:09:07:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
09:10:01:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
09:10:55:WU00:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
09:11:50:WU00:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
09:12:44:WU00:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
09:13:38:WU00:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
09:14:32:WU00:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
09:15:27:WU00:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
09:16:21:WU00:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
09:17:15:WU00:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
09:18:09:WU00:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
09:19:04:WU00:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
09:19:58:WU00:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
09:20:52:WU00:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
09:21:47:WU00:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
09:22:41:WU00:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
09:23:35:WU00:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
09:24:29:WU00:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
09:25:24:WU00:FS01:0x22:Completed 200000 out of 1000000 steps (20%)
09:26:18:WU00:FS01:0x22:Completed 210000 out of 1000000 steps (21%)
09:27:12:WU00:FS01:0x22:Completed 220000 out of 1000000 steps (22%)
09:28:06:WU00:FS01:0x22:Completed 230000 out of 1000000 steps (23%)
09:29:00:WU00:FS01:0x22:Completed 240000 out of 1000000 steps (24%)
09:29:55:WU00:FS01:0x22:Completed 250000 out of 1000000 steps (25%)
09:30:56:WU00:FS01:0x22:Completed 260000 out of 1000000 steps (26%)
09:31:56:WU00:FS01:0x22:Completed 270000 out of 1000000 steps (27%)
09:32:55:WU00:FS01:0x22:Completed 280000 out of 1000000 steps (28%)
09:33:54:WU00:FS01:0x22:Completed 290000 out of 1000000 steps (29%)
09:34:54:WU00:FS01:0x22:Completed 300000 out of 1000000 steps (30%)
09:35:54:WU00:FS01:0x22:Completed 310000 out of 1000000 steps (31%)
09:36:53:WU00:FS01:0x22:Completed 320000 out of 1000000 steps (32%)
09:37:53:WU00:FS01:0x22:Completed 330000 out of 1000000 steps (33%)
09:38:52:WU00:FS01:0x22:Completed 340000 out of 1000000 steps (34%)
09:39:52:WU00:FS01:0x22:Completed 350000 out of 1000000 steps (35%)
09:40:51:WU00:FS01:0x22:Completed 360000 out of 1000000 steps (36%)
09:41:51:WU00:FS01:0x22:Completed 370000 out of 1000000 steps (37%)
09:42:50:WU00:FS01:0x22:Completed 380000 out of 1000000 steps (38%)
09:43:50:WU00:FS01:0x22:Completed 390000 out of 1000000 steps (39%)
09:44:49:WU00:FS01:0x22:Completed 400000 out of 1000000 steps (40%)
09:45:49:WU00:FS01:0x22:Completed 410000 out of 1000000 steps (41%)
09:46:48:WU00:FS01:0x22:Completed 420000 out of 1000000 steps (42%)
09:47:48:WU00:FS01:0x22:Completed 430000 out of 1000000 steps (43%)
09:48:47:WU00:FS01:0x22:Completed 440000 out of 1000000 steps (44%)
09:49:47:WU00:FS01:0x22:Completed 450000 out of 1000000 steps (45%)
09:50:46:WU00:FS01:0x22:Completed 460000 out of 1000000 steps (46%)
09:51:46:WU00:FS01:0x22:Completed 470000 out of 1000000 steps (47%)
09:52:45:WU00:FS01:0x22:Completed 480000 out of 1000000 steps (48%)
09:53:45:WU00:FS01:0x22:Completed 490000 out of 1000000 steps (49%)
09:54:45:WU00:FS01:0x22:Completed 500000 out of 1000000 steps (50%)
09:55:41:WU00:FS01:0x22:Completed 510000 out of 1000000 steps (51%)
09:56:35:WU00:FS01:0x22:Completed 520000 out of 1000000 steps (52%)
09:57:08:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:08:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:19:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:19:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:28:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:57:28:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
09:57:28:WU00:FS01:0x22:ERROR:114: Max Retries Reached
09:57:28:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
09:57:28:WU00:FS01:0x22:Saving result file badstate-0.xml
09:57:28:WU00:FS01:0x22:Saving result file badstate-1.xml
09:57:29:WU00:FS01:0x22:Saving result file badstate-2.xml
09:57:29:WU00:FS01:0x22:Saving result file checkpointState.xml
09:57:29:WU00:FS01:0x22:Saving result file checkpt.crc
09:57:29:WU00:FS01:0x22:Saving result file globals.csv
09:57:29:WU00:FS01:0x22:Saving result file positions.xtc
09:57:29:WU00:FS01:0x22:Saving result file science.log
09:57:29:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:57:30:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:57:30:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:464 clone:53 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb5846c5e680eb0
09:57:30:WU00:FS01:Uploading 4.98MiB to 18.188.125.154
09:57:30:WU00:FS01:Connecting to 18.188.125.154:8080
09:57:36:WU00:FS01:Upload 48.99%
09:57:40:WU00:FS01:Upload complete
09:57:41:WU00:FS01:Server responded WORK_ACK (400)
09:57:41:WU00:FS01:Cleaning up
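The steady `Completed ... steps` lines make it easy to estimate how fast the GPU was progressing before a WU failed. This is a minimal sketch with a hypothetical `seconds_per_percent` helper. It assumes all timestamps fall on the same day (no midnight rollover) and simply averages the time between the first and last progress lines it is given.

```python
import re

# Timestamp plus the step counters from a core "Completed X out of Y steps" line.
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):.*Completed (\d+) out of (\d+) steps"
)


def seconds_per_percent(lines):
    """Average wall-clock seconds per 1% of the WU (hypothetical helper).

    Assumes every timestamp is on the same day, i.e. no midnight rollover.
    """
    samples = []
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            h, mi, s, done, total = map(int, m.groups())
            samples.append((h * 3600 + mi * 60 + s, done, total))
    if len(samples) < 2:
        raise ValueError("need at least two progress lines")
    (t0, d0, total), (t1, d1, _) = samples[0], samples[-1]
    percent_done = 100 * (d1 - d0) / total
    return (t1 - t0) / percent_done


sample = [
    "09:07:18:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)",
    "09:08:12:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)",
    "09:09:07:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)",
    "09:10:01:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)",
]
spp = seconds_per_percent(sample)
# Seconds per 1%, then projected minutes for the whole WU at that rate.
print(round(spp, 1), round(spp * 100 / 60, 1))  # prints 54.3 90.6
```

Applied to the full second log (0% at 09:07:18, 50% at 09:54:45), the same arithmetic gives 2847 s / 50 ≈ 57 s per 1%, or roughly 95 minutes for the whole WU.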
Hope this helps. Please let me know if I should do anything differently when reporting issues, or if there is no need to report them at all.
Joe_H
Site Admin
Posts: 7938
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 13405 - Two failed WUs in 24-hours

Post by Joe_H »

This report looks good. There is a higher chance of "bad" WUs on these projects because the researcher is doing some things differently than usual. He has responded to some of the other error reports on these projects.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: 13405 - Two failed WUs in 24-hours

Post by JohnChodera »

Thanks for the report, @SAK917!

@Joe_H is right: we're testing out some new workloads that help us prioritize compounds for synthesis via the COVID Moonshot (https://covid.postera.ai/covid/submissions/compounds), and we are continuing to refine our process to make everything more stable!
The next batch of projects should make significant improvements over the first batch.

Thanks so much for your patience!

~ John Chodera // MSKCC
SAK917
Posts: 12
Joined: Tue May 12, 2020 4:17 pm

Re: 13405 - Two failed WUs in 24-hours

Post by SAK917 »

No worries about my patience; I am more than happy to help and am not worried about points or failures if it helps move the science forward. I just wasn't sure whether the client reports this information back automatically or whether I needed to report it here on the forum. In the future, should I continue to report failed WUs as I did above? And if so, did I include all the information you need?

Thanks for YOUR patience as I figure this all out...
SAK917
Posts: 12
Joined: Tue May 12, 2020 4:17 pm

Re: 13405 - Two failed WUs in 24-hours

Post by SAK917 »

And in case it helps, here is one more I missed the first time I scanned for failed WUs:

Project: 13405 (Run 624, Clone 16, Gen 0)

Code:

******************************* Date: 2020-05-12 *******************************
03:55:17:WU01:FS01:Connecting to assign1.foldingathome.org:80
03:55:17:WARNING:WU01:FS01:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
03:55:17:WU01:FS01:Connecting to assign2.foldingathome.org:80
03:55:18:WU01:FS01:Assigned to work server 18.188.125.154
03:55:18:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
03:55:18:WU01:FS01:Connecting to 18.188.125.154:8080
03:55:19:WU01:FS01:Downloading 6.36MiB
03:55:21:WU01:FS01:Download complete
03:55:21:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777
03:56:18:WU01:FS01:Starting
03:56:18:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 16772 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu 0
03:56:18:WU01:FS01:Started FahCore on PID 3992
03:56:18:WU01:FS01:Core PID:9600
03:56:18:WU01:FS01:FahCore 0x22 started
03:56:19:WU01:FS01:0x22:*********************** Log Started 2020-05-12T03:56:18Z ***********************
03:56:19:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
03:56:19:WU01:FS01:0x22:       Type: 0x22
03:56:19:WU01:FS01:0x22:       Core: Core22
03:56:19:WU01:FS01:0x22:    Website: https://foldingathome.org/
03:56:19:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
03:56:19:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
03:56:19:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
03:56:19:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3992 -checkpoint 20
03:56:19:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 1 -opencl-device 0 -cuda-device
03:56:19:WU01:FS01:0x22:             0 -gpu 0
03:56:19:WU01:FS01:0x22:     Config: <none>
03:56:19:WU01:FS01:0x22:************************************ Build *************************************
03:56:19:WU01:FS01:0x22:    Version: 0.0.5
03:56:19:WU01:FS01:0x22:       Date: Apr 22 2020
03:56:19:WU01:FS01:0x22:       Time: 04:42:59
03:56:19:WU01:FS01:0x22: Repository: Git
03:56:19:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
03:56:19:WU01:FS01:0x22:     Branch: HEAD
03:56:19:WU01:FS01:0x22:   Compiler: Visual C++ 2008
03:56:19:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:56:19:WU01:FS01:0x22:   Platform: win32 10
03:56:19:WU01:FS01:0x22:       Bits: 64
03:56:19:WU01:FS01:0x22:       Mode: Release
03:56:19:WU01:FS01:0x22:************************************ System ************************************
03:56:19:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
03:56:19:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 13
03:56:19:WU01:FS01:0x22:       CPUs: 16
03:56:19:WU01:FS01:0x22:     Memory: 31.86GiB
03:56:19:WU01:FS01:0x22:Free Memory: 21.37GiB
03:56:19:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
03:56:19:WU01:FS01:0x22: OS Version: 6.2
03:56:19:WU01:FS01:0x22:Has Battery: true
03:56:19:WU01:FS01:0x22: On Battery: false
03:56:19:WU01:FS01:0x22: UTC Offset: -7
03:56:19:WU01:FS01:0x22:        PID: 9600
03:56:19:WU01:FS01:0x22:        CWD: E:\FAHClient\work
03:56:19:WU01:FS01:0x22:         OS: Windows 10 Pro
03:56:19:WU01:FS01:0x22:    OS Arch: AMD64
03:56:19:WU01:FS01:0x22:********************************************************************************
03:56:19:WU01:FS01:0x22:Project: 13405 (Run 624, Clone 16, Gen 0)
03:56:19:WU01:FS01:0x22:Unit: 0x0000000212bc7d9a5eb97d3c5824f777
03:56:19:WU01:FS01:0x22:Reading tar file core.xml
03:56:19:WU01:FS01:0x22:Reading tar file integrator.xml
03:56:19:WU01:FS01:0x22:Reading tar file state.xml
03:56:19:WU01:FS01:0x22:Reading tar file system.xml
03:56:19:WU01:FS01:0x22:Digital signatures verified
03:56:19:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
03:56:19:WU01:FS01:0x22:Version 0.0.5
03:56:28:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
03:56:28:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:57:22:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
03:58:15:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
03:59:08:WU01:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
04:00:01:WU01:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
04:00:54:WU01:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
04:01:47:WU01:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
04:02:40:WU01:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
04:03:34:WU01:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
04:04:27:WU01:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
04:05:20:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
04:06:13:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
04:07:06:WU01:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
04:08:00:WU01:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
04:08:56:WU01:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
04:09:49:WU01:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
04:10:42:WU01:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
04:11:36:WU01:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
04:12:29:WU01:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
04:13:22:WU01:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
04:14:15:WU01:FS01:0x22:Completed 200000 out of 1000000 steps (20%)
04:15:09:WU01:FS01:0x22:Completed 210000 out of 1000000 steps (21%)
04:16:02:WU01:FS01:0x22:Completed 220000 out of 1000000 steps (22%)
04:16:55:WU01:FS01:0x22:Completed 230000 out of 1000000 steps (23%)
04:17:48:WU01:FS01:0x22:Completed 240000 out of 1000000 steps (24%)
04:18:41:WU01:FS01:0x22:Completed 250000 out of 1000000 steps (25%)
04:18:45:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:18:45:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:18:55:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:18:55:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:19:06:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:19:06:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
04:19:06:WU01:FS01:0x22:ERROR:114: Max Retries Reached
04:19:06:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
04:19:06:WU01:FS01:0x22:Saving result file badstate-0.xml
04:19:06:WU01:FS01:0x22:Saving result file badstate-1.xml
04:19:06:WU01:FS01:0x22:Saving result file badstate-2.xml
04:19:06:WU01:FS01:0x22:Saving result file checkpointState.xml
04:19:06:WU01:FS01:0x22:Saving result file checkpt.crc
04:19:06:WU01:FS01:0x22:Saving result file globals.csv
04:19:06:WU01:FS01:0x22:Saving result file positions.xtc
04:19:06:WU01:FS01:0x22:Saving result file science.log
04:19:06:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
04:19:07:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:19:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777
04:19:07:WU01:FS01:Uploading 4.92MiB to 18.188.125.154
04:19:07:WU01:FS01:Connecting to 18.188.125.154:8080
04:19:13:WU01:FS01:Upload 66.12%
04:19:16:WU01:FS01:Upload complete
04:19:16:WU01:FS01:Server responded WORK_ACK (400)
04:19:16:WU01:FS01:Cleaning up
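Since this third failure was only spotted on a second pass, a small script can pick failed WUs out of a FAHClient log more reliably than scanning by eye. This is a minimal sketch that keys on the client's `Sending unit results: ... error:FAULTY` line exactly as it appears in the logs above; `find_faulty` and `WorkUnit` are hypothetical names, and other failure modes may be logged differently.

```python
import re
from typing import NamedTuple

# The client line that reports a failed WU back to the work server.
FAULTY = re.compile(
    r"Sending unit results: id:\d+ state:SEND error:FAULTY "
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


class WorkUnit(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int

    def __str__(self):
        # Same "Project: P (Run R, Clone C, Gen G)" form used in the reports.
        return (f"Project: {self.project} (Run {self.run}, "
                f"Clone {self.clone}, Gen {self.gen})")


def find_faulty(lines):
    """Return a WorkUnit for every result the client sent back as FAULTY."""
    hits = []
    for line in lines:
        m = FAULTY.search(line)
        if m:
            hits.append(WorkUnit(*map(int, m.groups())))
    return hits


lines = [
    "00:47:47:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY "
    "project:13405 run:634 clone:3 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3bc5a97f36",
    "00:47:48:WU00:FS01:Server responded WORK_ACK (400)",
    "04:19:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY "
    "project:13405 run:624 clone:16 gen:0 core:0x22 unit:0x0000000212bc7d9a5eb97d3c5824f777",
]
for wu in find_faulty(lines):
    print(wu)
```

To scan a whole log instead of a list, pass an open file object to `find_faulty`, since iterating over a file yields its lines.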
Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: 13405 - Two failed WUs in 24-hours

Post by Nuitari »

@JohnChodera Do reports of failed WUs help you on these projects? Here is what I have seen over the last 48 hours:

GeForce GTX 1660 SUPER:
02:56:08:WU01:FS02:0x22:ERROR:exception: Error invoking kernel gridSpreadCharge: clEnqueueNDRangeKernel (-4)
02:56:09:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13405 run:295 clone:29 gen:3 core:0x22 unit:0x0000000b12bc7d9a5eb3a38d95a537c7

02:56:21:WU02:FS02:0x22:ERROR:exception: Error invoking kernel computeExclusionParameters: clEnqueueNDRangeKernel (-4)
02:56:21:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:13405 run:621 clone:15 gen:0 core:0x22 unit:0x0000000112bc7d9a5eb97d3c6300afc9

02:56:34:WU01:FS02:0x22:ERROR:exception: Error invoking kernel clearFourBuffers: clEnqueueNDRangeKernel (-4)
02:56:35:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13404 run:400 clone:11 gen:2 core:0x22 unit:0x0000000712bc7d9a5eb37aa655db589f

02:56:47:WU02:FS02:0x22:ERROR:exception: Error uploading array posq: clEnqueueWriteBuffer (-4)
02:56:48:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:13404 run:542 clone:7 gen:0 core:0x22 unit:0x0000000112bc7d9a5eb97d45508c54bb


18:14:04:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:04:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:17:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:17:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:30:WU01:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
18:14:30:WU01:FS02:0x22:Following exception occured: Particle coordinate is nan
18:14:30:WU01:FS02:0x22:ERROR:114: Max Retries Reached
18:14:32:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13404 run:459 clone:71 gen:0 core:0x22 unit:0x0000000312bc7d9a5eb584720e81e542

03:08:53:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:08:53:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:07:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:09:07:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:20:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
03:09:20:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
03:09:20:WU03:FS02:0x22:ERROR:114: Max Retries Reached
03:09:22:WU03:FS02:Sending unit results: id:03 state:SEND error:FAULTY project:13405 run:243 clone:90 gen:1 core:0x22 unit:0x0000000312bc7d9a5eb3a384d6883f24

From memory, error -4 from clEnqueueWriteBuffer means a lack of memory, but it's a 5 GB card with a few GB free, and whenever I watched these WUs running they only used ~150 MB.
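For reference, -4 is the standard OpenCL status code CL_MEM_OBJECT_ALLOCATION_FAILURE (defined in the Khronos cl.h header). Every OpenCL failure above carries that same code, whether it comes from clEnqueueNDRangeKernel or clEnqueueWriteBuffer. Below is a minimal Python sketch that pulls the call name and code out of a log line and names the error. The helper `decode_opencl_error` and its regex are hypothetical, written against the line format quoted in this thread, and only a subset of the OpenCL codes is listed.

```python
import re

# Subset of the standard OpenCL runtime status codes (Khronos cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches an OpenCL call followed by its status, e.g. "clEnqueueWriteBuffer (-4)".
# Kernel names such as "clearFourBuffers:" do not match because they lack " (code)".
CALL_RE = re.compile(r"(cl\w+) \((-?\d+)\)")


def decode_opencl_error(line):
    """Return (api_call, error_name) for an OpenCL error in a log line, or None."""
    m = CALL_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2))
    return m.group(1), OPENCL_ERRORS.get(code, f"unknown ({code})")
```

Run against the clEnqueueWriteBuffer line above, it returns `("clEnqueueWriteBuffer", "CL_MEM_OBJECT_ALLOCATION_FAILURE")`.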

I did have some successes on that card:
project:13405 run:494 clone:8 gen:3
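To build a per-project summary like the one above, here is a rough Python sketch that tallies results from the client's "Sending unit results" lines. `tally_results` is a hypothetical helper, and the pattern assumes the v7.6 log line format shown in this thread.

```python
import re
from collections import Counter

# Captures the status and project number from a FAHClient "Sending unit results" line.
RESULT_RE = re.compile(r"Sending unit results: .*?error:(\w+) project:(\d+)")


def tally_results(log_lines):
    """Count (project, status) pairs, e.g. ('13405', 'FAULTY') or ('13405', 'NO_ERROR')."""
    counts = Counter()
    for line in log_lines:
        m = RESULT_RE.search(line)
        if m:
            status, project = m.groups()
            counts[(project, status)] += 1
    return counts
```

For example, `with open("log.txt") as f: print(tally_results(f))` over a saved copy of the client log prints a Counter of (project, status) pairs. Lines that are not result lines are simply skipped.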
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13405 - Two failed WUs in 24-hours

Post by bruce »

Please note that projects 13404 and 13405 are highly experimental, doing some very different kinds of analysis from traditional projects. Yes, they're experiencing a high failure rate.

Thank you for your report. When science does new things, often learning what does NOT work is as important as learning what does.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: 13405 - Two failed WUs in 24-hours

Post by PantherX »

SAK917 wrote:No worries on my patience, I am more than happy to help and am not worried about points or failures if it helps move the science forward. I just wasn't sure if the client reported back this information or if I needed to report it here on this forum? In the future, should I continue to report failed WUs as I did above? And if so, did I include all the information you need?...
Welcome to the F@H Forum SAK917,

Yep, in future, please feel free to post about WUs that give you issues. Apart from the client log, there is also a science.log, which would be stored here for you:
E:\FAHClient\work\00\00\science.log

The 00\00 part of the path will change depending on the WU ID, but if you can grab the science.log when issues occur, it will provide additional scientific data and help the researchers out.
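If you're not sure which numbered folder belongs to the WU in question, here is a small Python sketch that lists every science.log below the work folder, newest first. The `find_science_logs` helper is hypothetical, and it assumes only that science.log sits somewhere under the work folder, as in the path above.

```python
from pathlib import Path


def find_science_logs(work_dir):
    """Return every science.log under the client's work folder, newest first."""
    logs = list(Path(work_dir).rglob("science.log"))
    # Sort by modification time so the most recently written log comes first.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

With the install above, that would be `find_science_logs(r"E:\FAHClient\work")`.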
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.

[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5.
Location: @Home
Contact:

Re: 13405 - Two failed WUs in 24-hours

Post by jrweiss »

My 1050ti seems to be running the 13405 OK:

Code:

06:19:07:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13405 run:614 clone:64 gen:0 core:0x22 unit:0x0000000012bc7d9a5eb97d3c6e820942
06:19:07:WU02:FS01:Uploading 4.99MiB to 18.188.125.154
06:19:07:WU00:FS01:Starting
06:19:07:WU02:FS01:Connecting to 18.188.125.154:8080
06:19:16:WU02:FS01:Upload complete
06:19:16:WU02:FS01:Server responded WORK_ACK (400)
06:19:16:WU02:FS01:Final credit estimate, 68977.00 points
06:19:16:WU02:FS01:Cleaning up
. . .

18:16:08:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
18:16:09:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
18:16:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13405 run:475 clone:75 gen:3 core:0x22 unit:0x0000000512bc7d9a5eb5846cf90408a9
18:16:09:WU00:FS01:Uploading 5.00MiB to 18.188.125.154
18:16:09:WU00:FS01:Connecting to 18.188.125.154:8080
18:16:18:WU00:FS01:Upload complete
18:16:18:WU00:FS01:Server responded WORK_ACK (400)
18:16:18:WU00:FS01:Final credit estimate, 73280.00 points
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: 13405 - Two failed WUs in 24-hours

Post by JohnChodera »

> @JohnChodera, do reports of failed WUs help you with these projects?

The failed WUs make their way back to us, so we see all these statistics on our end. No need to report them unless they are causing undue difficulties.

~ John Chodera // MSKCC
SAK917
Posts: 12
Joined: Tue May 12, 2020 4:17 pm

Re: 13405 - Two failed WUs in 24-hours

Post by SAK917 »

OK, good to know, thanks John. I won't report any more unless they are catastrophic on my end...

Thanks