
FAULTY project: 11749

Posted: Thu Mar 26, 2020 4:51 pm
by Jertzuu
This weird GPU project bug keeps harassing me, any fix?

Code:

16:47:47:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11749 run:0 clone:7872 gen:19 core:0x22 unit:0x0000001c8ca304e75e6bb93a5175b6b5
16:47:47:WU02:FS01:Starting
16:47:47:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\jeret\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 705 -lifeline 12988 -checkpoint 10 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
16:47:47:WU02:FS01:Started FahCore on PID 10732
16:47:47:WU02:FS01:Core PID:14684
16:47:47:WU02:FS01:FahCore 0x22 started
16:47:48:WU02:FS01:0x22:*********************** Log Started 2020-03-26T16:47:47Z ***********************
16:47:48:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:47:48:WU02:FS01:0x22:       Type: 0x22
16:47:48:WU02:FS01:0x22:       Core: Core22
16:47:48:WU02:FS01:0x22:    Website: https://foldingathome.org/
16:47:48:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
16:47:48:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
16:47:48:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
16:47:48:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 10732 -checkpoint 10
16:47:48:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
16:47:48:WU02:FS01:0x22:             0 -gpu 0
16:47:48:WU02:FS01:0x22:     Config: <none>
16:47:48:WU02:FS01:0x22:************************************ Build *************************************
16:47:48:WU02:FS01:0x22:    Version: 0.0.2
16:47:48:WU02:FS01:0x22:       Date: Dec 6 2019
16:47:48:WU02:FS01:0x22:       Time: 21:30:31
16:47:48:WU02:FS01:0x22: Repository: Git
16:47:48:WU02:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
16:47:48:WU02:FS01:0x22:     Branch: HEAD
16:47:48:WU02:FS01:0x22:   Compiler: Visual C++ 2008
16:47:48:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
16:47:48:WU02:FS01:0x22:   Platform: win32 10
16:47:48:WU02:FS01:0x22:       Bits: 64
16:47:48:WU02:FS01:0x22:       Mode: Release
16:47:48:WU02:FS01:0x22:************************************ System ************************************
16:47:48:WU02:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
16:47:48:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
16:47:48:WU02:FS01:0x22:       CPUs: 12
16:47:48:WU02:FS01:0x22:     Memory: 31.92GiB
16:47:48:WU02:FS01:0x22:Free Memory: 26.97GiB
16:47:48:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
16:47:48:WU02:FS01:0x22: OS Version: 6.2
16:47:48:WU02:FS01:0x22:Has Battery: false
16:47:48:WU02:FS01:0x22: On Battery: false
16:47:48:WU02:FS01:0x22: UTC Offset: 2
16:47:48:WU02:FS01:0x22:        PID: 14684
16:47:48:WU02:FS01:0x22:        CWD: C:\Users\jeret\AppData\Roaming\FAHClient\work
16:47:48:WU02:FS01:0x22:         OS: Windows 10 Pro
16:47:48:WU02:FS01:0x22:    OS Arch: AMD64
16:47:48:WU02:FS01:0x22:********************************************************************************
16:47:48:WU02:FS01:0x22:Project: 11749 (Run 0, Clone 7872, Gen 19)
16:47:48:WU02:FS01:0x22:Unit: 0x0000001c8ca304e75e6bb93a5175b6b5
16:47:48:WU02:FS01:0x22:Reading tar file core.xml
16:47:48:WU02:FS01:0x22:Reading tar file integrator.xml
16:47:48:WU02:FS01:0x22:Reading tar file state.xml
16:47:48:WU02:FS01:0x22:Reading tar file system.xml
16:47:49:WU02:FS01:0x22:Digital signatures verified
16:47:49:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:47:49:WU02:FS01:0x22:Version 0.0.2
16:47:57:WU02:FS01:0x22:Completed 0 out of 2000000 steps (0%)
16:47:57:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
16:48:03:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
16:48:03:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
16:48:09:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
16:48:09:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
16:48:15:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
16:48:15:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
16:48:15:WU02:FS01:0x22:ERROR:114: Max Retries Reached
16:48:15:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
16:48:15:WU02:FS01:0x22:Saving result file badstate-0.xml
16:48:16:WU01:FS00:Connecting to 65.254.110.245:8080
16:48:16:WU02:FS01:0x22:Saving result file badstate-1.xml
16:48:17:WU02:FS01:0x22:Saving result file badstate-2.xml
16:48:19:WU02:FS01:0x22:Saving result file checkpt.crc
16:48:19:WU02:FS01:0x22:Saving result file science.log
16:48:19:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
16:48:19:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:48:19:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11749 run:0 clone:7872 gen:19 core:0x22 unit:0x0000001c8ca304e75e6bb93a5175b6b5
16:48:19:WU02:FS01:Uploading 32.88KiB to 140.163.4.231
16:49:59:WU02:FS01:Upload complete
16:49:59:WU02:FS01:Server responded WORK_ACK (400)
16:49:59:WU02:FS01:Cleaning up
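
For anyone skimming a log like the one above, the key facts (which PRCG failed, how many bad-state retries occurred, and what the core returned) can be pulled out with a few regular expressions. This is only a minimal sketch: the function name `summarize_log` and the dictionary keys are made up for illustration, and the patterns are based solely on the line formats visible in this post, not on any documented FAHClient log specification.

```python
import re

# Matches the core's "Project: P (Run R, Clone C, Gen G)" line as it appears in this thread.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Matches the client's summary of the core's exit status, e.g. "BAD_WORK_UNIT (114 = 0x72)".
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def summarize_log(lines):
    """Hypothetical helper: collect PRCG, bad-state count, and core return status."""
    summary = {"prcg": None, "bad_states": 0, "returned": None, "code": None}
    for line in lines:
        prcg = PRCG_RE.search(line)
        if prcg:
            summary["prcg"] = tuple(int(g) for g in prcg.groups())
        if "Bad State detected" in line:
            summary["bad_states"] += 1
        returned = RETURN_RE.search(line)
        if returned:
            summary["returned"] = returned.group(1)
            summary["code"] = int(returned.group(2))
    return summary


# Lines copied from the log in this post.
sample = [
    "16:47:48:WU02:FS01:0x22:Project: 11749 (Run 0, Clone 7872, Gen 19)",
    "16:48:03:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?",
    "16:48:09:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?",
    "16:48:15:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?",
    "16:48:15:WU02:FS01:0x22:ERROR:114: Max Retries Reached",
    "16:48:19:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

print(summarize_log(sample))
```

Including the PRCG from such a summary in a report makes it easier for others to check whether the same WU failed elsewhere.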

Re: FAULTY project: 11749

Posted: Thu Mar 26, 2020 6:44 pm
by toTOW
No other report for this WU yet.

Re: FAULTY project: 11749

Posted: Thu Mar 26, 2020 8:28 pm
by MrFrizzy
I would try lowering your GPU/Memory clocks, increasing your fan speeds, or if you have enough thermal headroom, increasing the voltage on your GPU. Even if you are on stock, out-of-the-box settings, I would still recommend tweaking things, as I have seen this message come up even on stock settings in the past.

Re: FAULTY project: 11749

Posted: Fri Mar 27, 2020 8:09 pm
by Jertzuu
MrFrizzy wrote: I would try lowering your GPU/Memory clocks, increasing your fan speeds, or if you have enough thermal headroom, increasing the voltage on your GPU. Even if you are on stock, out-of-the-box settings, I would still recommend tweaking things, as I have seen this message come up even on stock settings in the past.
Thanks for the tip. I lowered the clocks and am now just waiting on a project to try it out.

*Edit* Reinstalled the client and made sure all previous data was deleted from my PC. So far so good; it seems to be working fine.

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 7:40 am
by Manfred.Knick
+1: again

06:00:40:WU02:FS01:0x22:Project: 11749 (Run 0, Clone 6603, Gen 5) <-------------------------------- P R C G
...
07:12:11:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11749 run:0 clone:6603 gen:5 core:0x22 unit:0x000000128ca304e75e6bb7fd572b20ad
07:12:11:WU02:FS01:Uploading 12.57MiB to 140.163.4.231
07:12:11:WU02:FS01:Connecting to 140.163.4.231:8080
...
07:12:11:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11749 run:0 clone:6603 gen:5 core:0x22 unit:0x000000128ca304e75e6bb7fd572b20ad
...
07:16:11:WU02:FS01:Upload complete
07:16:11:WU02:FS01:Server responded WORK_QUIT (404) <----------------------------------------------- !
07:16:11:WARNING:WU02:FS01:Server did not like results, dumping <----------------------------------- !
07:16:11:WU02:FS01:Cleaning up
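
To compare how the server answered different uploads, the "Server responded" lines can be extracted from a log. The sketch below is an illustration only: `server_responses` is a hypothetical helper, and the pattern is inferred from the two response lines that appear in this thread (WORK_ACK (400) and WORK_QUIT (404)), not from official client documentation.

```python
import re

# Matches lines such as "Server responded WORK_QUIT (404)" as seen in this thread.
RESPONSE_RE = re.compile(r"Server responded (\w+) \((\d+)\)")


def server_responses(lines):
    """Hypothetical helper: return (status, code) pairs for each server response line."""
    responses = []
    for line in lines:
        match = RESPONSE_RE.search(line)
        if match:
            responses.append((match.group(1), int(match.group(2))))
    return responses


# The two response lines quoted in this thread (trailing annotations omitted).
sample = [
    "16:49:59:WU02:FS01:Server responded WORK_ACK (400)",
    "07:16:11:WU02:FS01:Server responded WORK_QUIT (404)",
    "07:16:11:WARNING:WU02:FS01:Server did not like results, dumping",
]

print(server_responses(sample))
```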

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:01 am
by anandhanju
The previous WU was successfully completed by someone else. I think you might need to revisit those tweaks.

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:11 am
by Manfred.Knick
anandhanju wrote: ... revisit those tweaks ...
Sorry, which tweaks?

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:23 am
by Neil-B
Believe mix up ... responder may have thought you were the original poster on this thread

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:30 am
by Manfred.Knick
Neil-B wrote:Believe mix up
Right, I see.
Question remains:
anandhanju wrote:The previous WU was successfully completed by someone else.
Why was this WU "double"-assigned?

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:59 am
by Neil-B
A WU will usually be reissued to another folder under certain circumstances: if it is not returned before the timeout, if it is returned faulty (iirc), or possibly if it has been returned to a CS but has not made its way back to the WS before the timeout. I also have a suspicion that during periods of high load, when assignments were overloaded, there may have been some extra scenarios. Without looking into a specific case it is hard to be more precise. Under normal running, a WU is given out once, and when it is returned (hopefully well within the timeout) it is then used to create the next gen of that WU. The way points are allocated in this circumstance is "logical"; I just don't recall what it is, sorry :(
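
The reissue conditions listed in this post can be written down as a small decision function. This is purely a sketch of the cases as described here, including the ones marked as uncertain; the class `WorkUnitStatus`, its fields, and `should_reissue` are invented for illustration and do not reflect the actual server code.

```python
from dataclasses import dataclass


@dataclass
class WorkUnitStatus:
    """Hypothetical view of a WU's state; field names are assumptions."""
    reached_work_server: bool  # result has made it back to the WS
    returned_faulty: bool      # result came back flagged as faulty
    timeout_passed: bool       # the WU's timeout has expired


def should_reissue(status):
    """Return True if, per the cases listed in this post, the WU would be reissued."""
    # Returned faulty (recalled from memory in the post, so treat as an assumption).
    if status.returned_faulty:
        return True
    # Not returned before the timeout, or still sitting on a CS when the timeout
    # passes (the post marks the CS case as only a possibility).
    if status.timeout_passed and not status.reached_work_server:
        return True
    # Normal running: the WU is given out once and its result creates the next gen.
    return False


print(should_reissue(WorkUnitStatus(False, False, True)))
print(should_reissue(WorkUnitStatus(True, False, False)))
```

The extra scenarios suspected during periods of high load are left out because the post does not say what they are.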

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 9:30 am
by Manfred.Knick
@ Neil: Thanks for your hints!

Re: FAULTY project: 11749

Posted: Fri Apr 03, 2020 8:13 pm
by treckin
I think it could be the server. It has been failing to upload WUs for me and for others, as you can see if you look at “issues with specific servers” and even another thread in this sub.

Re: FAULTY project: 11749

Posted: Fri Apr 10, 2020 1:18 pm
by tessa
This WU is also faulty for me:
viewtopic.php?f=19&t=32289