Project 11778 (Run 0 Clone 4443 Gen 23) WORK_QUIT

Moderators: Site Moderators, FAHC Science Team

Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Post by Nuitari »

Project 11778 (Run 0 Clone 4443 Gen 23) was completed on a stock GeForce GTX 1060 3GB: no overclocking, no underclocking, and no signs of overheating (the GPU was at around 74 °C).

Running on Linux, no AV.

Nvidia kernel module 440.59
Kernel 5.5.5-gentoo

00:53:57:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11778 run:0 clone:4443 gen:23 core:0x22 unit:0x00000027287234c95e73c404cd16edea
00:53:57:WU02:FS01:Starting
00:53:57:WU02:FS01:Running FahCore: /opt/foldingathome/FAHCoreWrapper /opt/foldingathome/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 705 -lifeline 7496 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
00:53:57:WU02:FS01:Started FahCore on PID 7994
00:53:57:WU02:FS01:Core PID:7998
00:53:57:WU02:FS01:FahCore 0x22 started
00:53:57:WU02:FS01:0x22:*********************** Log Started 2020-04-09T00:53:57Z ***********************
00:53:57:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:53:57:WU02:FS01:0x22:       Type: 0x22
00:53:57:WU02:FS01:0x22:       Core: Core22
00:53:57:WU02:FS01:0x22:    Website: https://foldingathome.org/
00:53:57:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
00:53:57:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
00:53:57:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
00:53:57:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 7994 -checkpoint 15
00:53:57:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
00:53:57:WU02:FS01:0x22:             1 -gpu 1
00:53:57:WU02:FS01:0x22:     Config: <none>
00:53:57:WU02:FS01:0x22:************************************ Build *************************************
00:53:57:WU02:FS01:0x22:    Version: 0.0.2
00:53:57:WU02:FS01:0x22:       Date: Dec 6 2019
00:53:57:WU02:FS01:0x22:       Time: 21:20:17
00:53:57:WU02:FS01:0x22: Repository: Git
00:53:57:WU02:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
00:53:57:WU02:FS01:0x22:     Branch: core22
00:53:57:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
00:53:57:WU02:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
00:53:57:WU02:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
00:53:57:WU02:FS01:0x22:       Bits: 64
00:53:57:WU02:FS01:0x22:       Mode: Release
00:53:57:WU02:FS01:0x22:************************************ System ************************************
00:53:57:WU02:FS01:0x22:        CPU: AMD Ryzen 7 3700X 8-Core Processor
00:53:57:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
00:53:57:WU02:FS01:0x22:       CPUs: 8
00:53:57:WU02:FS01:0x22:     Memory: 62.80GiB
00:53:57:WU02:FS01:0x22:Free Memory: 52.21GiB
00:53:57:WU02:FS01:0x22:    Threads: POSIX_THREADS
00:53:57:WU02:FS01:0x22: OS Version: 5.5
00:53:57:WU02:FS01:0x22:Has Battery: false
00:53:57:WU02:FS01:0x22: On Battery: false
00:53:57:WU02:FS01:0x22: UTC Offset: -4
00:53:57:WU02:FS01:0x22:        PID: 7998
00:53:57:WU02:FS01:0x22:        CWD: /opt/foldingathome/work
00:53:57:WU02:FS01:0x22:         OS: Linux 5.5.5-gentoo x86_64
00:53:57:WU02:FS01:0x22:    OS Arch: AMD64
00:53:57:WU02:FS01:0x22:********************************************************************************
00:53:57:WU02:FS01:0x22:Project: 11778 (Run 0, Clone 4443, Gen 23)
00:53:57:WU02:FS01:0x22:Unit: 0x00000027287234c95e73c404cd16edea
00:53:57:WU02:FS01:0x22:Reading tar file core.xml
00:53:57:WU02:FS01:0x22:Reading tar file integrator.xml
00:53:57:WU02:FS01:0x22:Reading tar file state.xml
00:53:57:WU02:FS01:0x22:Reading tar file system.xml
00:53:57:WU02:FS01:0x22:Digital signatures verified
00:53:57:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:53:57:WU02:FS01:0x22:Version 0.0.2
00:54:05:WU02:FS01:0x22:Completed 0 out of 2000000 steps (0%)
00:54:05:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
04:52:35:WU02:FS01:0x22:Completed 2000000 out of 2000000 steps (100%)
04:52:39:WU02:FS01:0x22:Saving result file ../logfile_01.txt
04:52:39:WU02:FS01:0x22:Saving result file checkpointState.xml
04:52:39:WU02:FS01:0x22:Saving result file checkpt.crc
04:52:39:WU02:FS01:0x22:Saving result file positions.xtc
04:52:39:WU02:FS01:0x22:Saving result file science.log
04:52:39:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
04:52:40:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:52:40:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11778 run:0 clone:4443 gen:23 core:0x22 unit:0x00000027287234c95e73c404cd16edea
04:52:40:WU02:FS01:Uploading 23.04MiB to 40.114.52.201
04:52:40:WU02:FS01:Connecting to 40.114.52.201:8080
04:52:47:WU02:FS01:Upload 0.27%
04:53:05:WU02:FS01:Upload 0.54%
04:53:11:WU02:FS01:Upload 4.61%
04:53:17:WU02:FS01:Upload 7.05%
04:53:23:WU02:FS01:Upload 9.22%
04:53:29:WU02:FS01:Upload 12.21%
04:53:36:WU02:FS01:Upload 14.92%
04:53:43:WU02:FS01:Upload 17.09%
04:53:49:WU02:FS01:Upload 19.26%
04:53:55:WU02:FS01:Upload 21.43%
04:54:01:WU02:FS01:Upload 23.06%
04:54:07:WU02:FS01:Upload 24.68%
04:54:14:WU02:FS01:Upload 26.85%
04:54:20:WU02:FS01:Upload 29.57%
04:54:26:WU02:FS01:Upload 31.19%
04:55:05:WU02:FS01:Upload 33.91%
04:55:11:WU02:FS01:Upload 37.16%
04:55:17:WU02:FS01:Upload 38.79%
04:55:24:WU02:FS01:Upload 40.96%
04:55:31:WU02:FS01:Upload 43.13%
04:55:37:WU02:FS01:Upload 45.30%
04:55:44:WU02:FS01:Upload 46.93%
04:55:51:WU02:FS01:Upload 49.37%
04:55:58:WU02:FS01:Upload 51.54%
04:56:04:WU02:FS01:Upload 53.71%
04:56:10:WU02:FS01:Upload 56.42%
04:56:16:WU02:FS01:Upload 58.59%
04:56:23:WU02:FS01:Upload 60.22%
04:56:29:WU02:FS01:Upload 61.84%
04:56:35:WU02:FS01:Upload 64.28%
04:56:42:WU02:FS01:Upload 66.45%
04:56:48:WU02:FS01:Upload 68.08%
04:56:55:WU02:FS01:Upload 70.25%
04:57:01:WU02:FS01:Upload 72.42%
04:57:07:WU02:FS01:Upload 74.05%
04:57:13:WU02:FS01:Upload 76.22%
04:57:19:WU02:FS01:Upload 77.85%
04:57:26:WU02:FS01:Upload 80.02%
04:57:33:WU02:FS01:Upload 82.19%
04:57:39:WU02:FS01:Upload 83.81%
04:57:46:WU02:FS01:Upload 85.98%
04:57:52:WU02:FS01:Upload 88.43%
04:57:58:WU02:FS01:Upload 90.60%
04:58:05:WU02:FS01:Upload 93.04%
04:58:11:WU02:FS01:Upload 95.21%
04:58:17:WU02:FS01:Upload 97.38%
04:58:32:WU02:FS01:Upload complete
04:58:32:WU02:FS01:Server responded WORK_QUIT (404)
04:58:32:WARNING:WU02:FS01:Server did not like results, dumping
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Post by PantherX »

If your system has folded many WUs and this is the first one with an issue, the most likely cause is data corruption, either while the WU was being packed up on your system or while it was being transferred from your system to the server. I can see from the system that the WU isn't a bad one: https://apps.foldingathome.org/wu#proje ... 443&gen=23

Post by Nuitari »

All I get from that page is Error: error ?
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

Apps Site is down … viewtopic.php?f=18&t=34165

Post by Nuitari »

I had a few more of these errors at random (across 13 slots). There is no way all of that hardware was producing corrupt units at random.
It turns out the network card for the Internet connection is flaking out. Hopefully this will be fixed once I get the new one today.

Once in a while the server logs these errors:
[Thu Apr 16 04:23:26 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:28 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:30 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
[Thu Apr 16 04:23:31 2020] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[Thu Apr 16 04:23:35 2020] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

I managed to correlate quite a few of the 404 WORK_QUIT responses with cases where this happens. There are also cases where it happens without a WU being uploaded or downloaded.
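
For anyone who wants to repeat this kind of check, here is a minimal Python sketch of one way to line up the two logs. It is not the script used above. It assumes the kernel lines come from something like `dmesg -T` (local time, with a date), that the client log lines are in UTC and carry only a time of day, and that a 10-minute window before the WORK_QUIT is a reasonable match. The function names, the `log_date` argument and the window are all hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Kernel lines in the format shown above, e.g.
# [Thu Apr 16 04:23:26 2020] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
NIC_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+e1000e\b.*"
    r"(Detected Hardware Unit Hang|Reset adapter unexpectedly)"
)

# Client log lines, e.g.
# 04:58:32:WU02:FS01:Server responded WORK_QUIT (404)
QUIT_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Server responded WORK_QUIT"
)


def nic_events(lines):
    """Return local-time datetimes of NIC hang/reset lines."""
    events = []
    for line in lines:
        m = NIC_RE.match(line)
        if m:
            events.append(datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return events


def quit_events(lines, log_date: date, utc_offset_hours: int):
    """Return (local datetime, WU, slot) for each WORK_QUIT line.

    Assumption: the caller knows the UTC date the lines belong to,
    because the client log lines only carry HH:MM:SS.
    """
    events = []
    for line in lines:
        m = QUIT_RE.match(line)
        if m:
            t = datetime.strptime(m.group("ts"), "%H:%M:%S").time()
            local = datetime.combine(log_date, t) + timedelta(hours=utc_offset_hours)
            events.append((local, m.group("wu"), m.group("slot")))
    return events


def correlate(quits, hangs, window=timedelta(minutes=10)):
    """Keep WORK_QUIT events that follow a NIC event by at most `window`."""
    matched = []
    for when, wu, slot in quits:
        if any(timedelta(0) <= when - h <= window for h in hangs):
            matched.append((when, wu, slot))
    return matched
```

Feeding the kernel lines to `nic_events`, the client log to `quit_events` (with the log's date and a UTC offset of -4, as reported in the core log above), and both results to `correlate` gives the list of WORK_QUIT responses that landed shortly after a NIC hang.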

It might be worthwhile to reconsider how the servers handle bad work units. If something goes wrong in transit, it feels like the upload should be retried at least once before the results are dumped.
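
To make the suggestion concrete, here is a rough Python sketch of the retry idea. It is not how the Folding@home client or servers actually work. `send` is a hypothetical transport callable that returns True only when the receiving side confirms that the payload matches the SHA-256 digest sent with it, and the attempt count and delay are arbitrary.

```python
import hashlib
import time


def upload_with_retry(payload: bytes, send, max_attempts: int = 3, delay_s: float = 0.0):
    """Try to deliver `payload`, retrying when the transfer fails.

    `send(payload, digest)` is a hypothetical callable: it should return
    True when the receiver confirms the digest, False when it rejects the
    data, and may raise OSError on a network failure.
    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    digest = hashlib.sha256(payload).hexdigest()
    for attempt in range(1, max_attempts + 1):
        try:
            if send(payload, digest):
                return attempt
        except OSError:
            # Network-level failure (for example a reset connection): retry.
            pass
        if attempt < max_attempts:
            time.sleep(delay_s)
    return None
```

With something like this, a result would only be dumped after every attempt had been rejected, so a single transfer damaged by a flaky network card would not cost hours of completed work.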