Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Posted: Thu Apr 09, 2020 5:42 am
by Nuitari
Project 11778 (Run 0 Clone 4443 Gen 23) was run on a stock GeForce GTX 1060 3GB: no overclocking, no underclocking, and no signs of overheating (GPU was around 74C).

Running on Linux, no AV.

Nvidia kernel module 440.59
Kernel 5.5.5-gentoo

00:53:57:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11778 run:0 clone:4443 gen:23 core:0x22 unit:0x00000027287234c95e73c404cd16edea
00:53:57:WU02:FS01:Starting
00:53:57:WU02:FS01:Running FahCore: /opt/foldingathome/FAHCoreWrapper /opt/foldingathome/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 705 -lifeline 7496 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
00:53:57:WU02:FS01:Started FahCore on PID 7994
00:53:57:WU02:FS01:Core PID:7998
00:53:57:WU02:FS01:FahCore 0x22 started
00:53:57:WU02:FS01:0x22:*********************** Log Started 2020-04-09T00:53:57Z ***********************
00:53:57:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:53:57:WU02:FS01:0x22:       Type: 0x22
00:53:57:WU02:FS01:0x22:       Core: Core22
00:53:57:WU02:FS01:0x22:    Website: https://foldingathome.org/
00:53:57:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
00:53:57:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
00:53:57:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
00:53:57:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 7994 -checkpoint 15
00:53:57:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
00:53:57:WU02:FS01:0x22:             1 -gpu 1
00:53:57:WU02:FS01:0x22:     Config: <none>
00:53:57:WU02:FS01:0x22:************************************ Build *************************************
00:53:57:WU02:FS01:0x22:    Version: 0.0.2
00:53:57:WU02:FS01:0x22:       Date: Dec 6 2019
00:53:57:WU02:FS01:0x22:       Time: 21:20:17
00:53:57:WU02:FS01:0x22: Repository: Git
00:53:57:WU02:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
00:53:57:WU02:FS01:0x22:     Branch: core22
00:53:57:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
00:53:57:WU02:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
00:53:57:WU02:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
00:53:57:WU02:FS01:0x22:       Bits: 64
00:53:57:WU02:FS01:0x22:       Mode: Release
00:53:57:WU02:FS01:0x22:************************************ System ************************************
00:53:57:WU02:FS01:0x22:        CPU: AMD Ryzen 7 3700X 8-Core Processor
00:53:57:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
00:53:57:WU02:FS01:0x22:       CPUs: 8
00:53:57:WU02:FS01:0x22:     Memory: 62.80GiB
00:53:57:WU02:FS01:0x22:Free Memory: 52.21GiB
00:53:57:WU02:FS01:0x22:    Threads: POSIX_THREADS
00:53:57:WU02:FS01:0x22: OS Version: 5.5
00:53:57:WU02:FS01:0x22:Has Battery: false
00:53:57:WU02:FS01:0x22: On Battery: false
00:53:57:WU02:FS01:0x22: UTC Offset: -4
00:53:57:WU02:FS01:0x22:        PID: 7998
00:53:57:WU02:FS01:0x22:        CWD: /opt/foldingathome/work
00:53:57:WU02:FS01:0x22:         OS: Linux 5.5.5-gentoo x86_64
00:53:57:WU02:FS01:0x22:    OS Arch: AMD64
00:53:57:WU02:FS01:0x22:********************************************************************************
00:53:57:WU02:FS01:0x22:Project: 11778 (Run 0, Clone 4443, Gen 23)
00:53:57:WU02:FS01:0x22:Unit: 0x00000027287234c95e73c404cd16edea
00:53:57:WU02:FS01:0x22:Reading tar file core.xml
00:53:57:WU02:FS01:0x22:Reading tar file integrator.xml
00:53:57:WU02:FS01:0x22:Reading tar file state.xml
00:53:57:WU02:FS01:0x22:Reading tar file system.xml
00:53:57:WU02:FS01:0x22:Digital signatures verified
00:53:57:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:53:57:WU02:FS01:0x22:Version 0.0.2
00:54:05:WU02:FS01:0x22:Completed 0 out of 2000000 steps (0%)
00:54:05:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
04:52:35:WU02:FS01:0x22:Completed 2000000 out of 2000000 steps (100%)
04:52:39:WU02:FS01:0x22:Saving result file ../logfile_01.txt
04:52:39:WU02:FS01:0x22:Saving result file checkpointState.xml
04:52:39:WU02:FS01:0x22:Saving result file checkpt.crc
04:52:39:WU02:FS01:0x22:Saving result file positions.xtc
04:52:39:WU02:FS01:0x22:Saving result file science.log
04:52:39:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
04:52:40:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:52:40:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11778 run:0 clone:4443 gen:23 core:0x22 unit:0x00000027287234c95e73c404cd16edea
04:52:40:WU02:FS01:Uploading 23.04MiB to 40.114.52.201
04:52:40:WU02:FS01:Connecting to 40.114.52.201:8080
04:52:47:WU02:FS01:Upload 0.27%
04:53:05:WU02:FS01:Upload 0.54%
04:53:11:WU02:FS01:Upload 4.61%
04:53:17:WU02:FS01:Upload 7.05%
04:53:23:WU02:FS01:Upload 9.22%
04:53:29:WU02:FS01:Upload 12.21%
04:53:36:WU02:FS01:Upload 14.92%
04:53:43:WU02:FS01:Upload 17.09%
04:53:49:WU02:FS01:Upload 19.26%
04:53:55:WU02:FS01:Upload 21.43%
04:54:01:WU02:FS01:Upload 23.06%
04:54:07:WU02:FS01:Upload 24.68%
04:54:14:WU02:FS01:Upload 26.85%
04:54:20:WU02:FS01:Upload 29.57%
04:54:26:WU02:FS01:Upload 31.19%
04:55:05:WU02:FS01:Upload 33.91%
04:55:11:WU02:FS01:Upload 37.16%
04:55:17:WU02:FS01:Upload 38.79%
04:55:24:WU02:FS01:Upload 40.96%
04:55:31:WU02:FS01:Upload 43.13%
04:55:37:WU02:FS01:Upload 45.30%
04:55:44:WU02:FS01:Upload 46.93%
04:55:51:WU02:FS01:Upload 49.37%
04:55:58:WU02:FS01:Upload 51.54%
04:56:04:WU02:FS01:Upload 53.71%
04:56:10:WU02:FS01:Upload 56.42%
04:56:16:WU02:FS01:Upload 58.59%
04:56:23:WU02:FS01:Upload 60.22%
04:56:29:WU02:FS01:Upload 61.84%
04:56:35:WU02:FS01:Upload 64.28%
04:56:42:WU02:FS01:Upload 66.45%
04:56:48:WU02:FS01:Upload 68.08%
04:56:55:WU02:FS01:Upload 70.25%
04:57:01:WU02:FS01:Upload 72.42%
04:57:07:WU02:FS01:Upload 74.05%
04:57:13:WU02:FS01:Upload 76.22%
04:57:19:WU02:FS01:Upload 77.85%
04:57:26:WU02:FS01:Upload 80.02%
04:57:33:WU02:FS01:Upload 82.19%
04:57:39:WU02:FS01:Upload 83.81%
04:57:46:WU02:FS01:Upload 85.98%
04:57:52:WU02:FS01:Upload 88.43%
04:57:58:WU02:FS01:Upload 90.60%
04:58:05:WU02:FS01:Upload 93.04%
04:58:11:WU02:FS01:Upload 95.21%
04:58:17:WU02:FS01:Upload 97.38%
04:58:32:WU02:FS01:Upload complete
04:58:32:WU02:FS01:Server responded WORK_QUIT (404)
04:58:32:WARNING:WU02:FS01:Server did not like results, dumping

Re: Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Posted: Thu Apr 09, 2020 8:31 am
by PantherX
If your system has folded many WUs and this is the first one with an issue, the likely cause is data corruption, either while the WU was being packed up on your system or while it was being transferred from your system to the server. From the system, I can see that the WU isn't a bad one: https://apps.foldingathome.org/wu#proje ... 443&gen=23

Re: Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Posted: Thu Apr 09, 2020 2:03 pm
by Nuitari
All I get from that page is "Error: error"?

Re: Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Posted: Thu Apr 09, 2020 2:25 pm
by Neil-B
Apps Site is down … viewtopic.php?f=18&t=34165

Re: Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Posted: Thu Apr 16, 2020 1:52 pm
by Nuitari
I had a few more of these errors at random (across 13 slots). There is no way all of that hardware was producing randomly corrupt units.
It turns out the network card for the Internet connection is flaking out. Hopefully this will be fixed once the new one arrives today.

Once in a while the server logs these errors:
[Thu Apr 16 04:23:26 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:28 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:30 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:31 2020] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[Thu Apr 16 04:23:35 2020] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

I managed to correlate quite a few of the 404 WORK_QUIT responses with times when this happens. There are also cases where it happens while no WU is being uploaded or downloaded.
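
For anyone who wants to do the same kind of correlation, here is a rough Python sketch. It is not the exact procedure I followed, just one way to do it. It assumes the kernel log was captured with human-readable timestamps like the lines above, that the client log only carries a time of day in UTC (so you have to supply the date and your UTC offset yourself), and that a 5-minute window is a reasonable definition of "at the same time". The function names are made up for this example.

```python
import re
from datetime import date, datetime, timedelta

# Kernel log lines such as:
# [Thu Apr 16 04:23:26 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
DMESG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+e1000e\b.*?"
    r"(?:Detected Hardware Unit Hang|Reset adapter unexpectedly)"
)

# Client log lines such as:
# 04:58:32:WU02:FS01:Server responded WORK_QUIT (404)
FAH_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Server responded WORK_QUIT"
)


def parse_nic_events(lines):
    """Return the local timestamps of NIC hang/reset lines."""
    events = []
    for line in lines:
        m = DMESG_RE.match(line)
        if m:
            events.append(datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return events


def parse_work_quits(lines, day: date, utc_offset_hours=0):
    """Return local timestamps of WORK_QUIT lines.

    The client log has no date, so `day` (the UTC date of the log) and the
    UTC offset are assumptions supplied by the caller.
    """
    quits = []
    for line in lines:
        m = FAH_RE.match(line)
        if m:
            t = datetime.strptime(m.group("ts"), "%H:%M:%S").time()
            quits.append(datetime.combine(day, t) + timedelta(hours=utc_offset_hours))
    return quits


def correlate(quits, nic_events, window=timedelta(minutes=5)):
    """Pair each WORK_QUIT with the NIC events that fall within `window` of it."""
    return [(q, [e for e in nic_events if abs(e - q) <= window]) for q in quits]
```

A WORK_QUIT with an empty list next to it had no NIC event nearby, so the network card is probably not the explanation for that one.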

It might be worthwhile to reconsider how the servers handle bad work units. It feels like the upload should be retried at least once if something goes wrong in transit.
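
To illustrate the idea, here is a minimal Python sketch of a retry with backoff. It says nothing about how the Folding@home client or servers actually work: `upload` is a hypothetical callable standing in for whatever sends the results, and `UploadRejected` is a made-up exception for a response like the 404 above.

```python
import time


class UploadRejected(Exception):
    """Hypothetical: raised when the server refuses the uploaded results."""


def upload_with_retry(upload, payload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload(payload), retrying on network errors or rejection.

    Waits base_delay, then 2*base_delay, and so on between attempts, and
    re-raises the last error once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return upload(payload)
        except (OSError, UploadRejected) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

With something like this, a single corrupted transfer would cost one more upload rather than almost four hours of GPU time.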