Can't upload 2 FAULTY WUs

Posted: Mon Mar 16, 2020 7:40 pm
by rusty
Hello,

I'm new to Folding@home, so I apologize if the solution here is obvious/trivial.

For roughly the past 15 hours, I have had 2 FAULTY WUs that are failing to upload.

All NO_ERROR WUs continue to upload just fine (with the occasional hiccup, of course, due to the unusually high volume). I have uploaded roughly 20 NO_ERROR WUs today, so only these 2 faulty WUs seem to be affected.

I would like to be able to somehow drop these FAULTY WUs (with or without credit) so that somebody else can start working on them. Right now, I appear to just be holding them (until they time out, I suppose).

FAULTY WUs that won't upload:
11758 (0, 3765, 0)
11759 (0, 10513, 1)

I have checked the WU Status for each of the units listed above, and it looks like nothing has made it back to the collection servers.

A Project search for 11758 says Project Unspecified.

A Project search for 11759 leads to a valid covid project.


In most instances upload fails immediately:

Code:

18:37:33:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11758 run:0 clone:3765 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771ae2d300da
18:37:33:WU00:FS00:Uploading 53.95MiB to 155.247.164.213
18:37:33:WU00:FS00:Connecting to 155.247.164.213:8080
18:37:33:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
18:37:33:WU00:FS00:Trying to send results to collection server
18:37:33:WU00:FS00:Uploading 53.95MiB to 155.247.164.214
18:37:33:WU00:FS00:Connecting to 155.247.164.214:8080
18:37:33:ERROR:WU00:FS00:Exception: Transfer failed

In other instances, upload will make it to maybe 0.20% and then fail:

Code:

18:17:11:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: Connection timed out
18:17:11:WU02:FS01:Trying to send results to collection server
18:17:11:WU02:FS01:Uploading 154.15MiB to 155.247.166.220
18:17:11:WU02:FS01:Connecting to 155.247.166.220:8080
18:17:13:ERROR:WU02:FS01:Exception: Transfer failed
18:17:13:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11759 run:0 clone:10513 gen:1 core:0x22 unit:0x0000000180fccb0a5e6eb0329c88c5ba
18:17:13:WU02:FS01:Uploading 154.15MiB to 128.252.203.10
18:17:13:WU02:FS01:Connecting to 128.252.203.10:8080
18:17:52:WU02:FS01:Upload 0.04%
18:18:34:WU02:FS01:Upload 0.20%
18:18:35:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
18:18:35:WU02:FS01:Trying to send results to collection server
18:18:35:WU02:FS01:Uploading 154.15MiB to 155.247.166.220
18:18:35:WU02:FS01:Connecting to 155.247.166.220:8080
18:18:35:ERROR:WU02:FS01:Exception: Transfer failed

I saw the qfix instructions posted in the Troubleshooting "Bad WUs" sticky, but it appears that qfix (last update Jul 5, 2012) now only applies to v6. This particular node is running v7 (i.e. no queue.dat). I'm guessing the v7 client picked up this functionality natively since v6.

I have paused/restarted folding and restarted the FAHClient service on this node a few times, but to no avail.
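
As an aside for anyone reading the logs above: the project:, run:, clone: and gen: fields on each "Sending unit results" line are the same numbers written in the short form 11758 (0, 3765, 0). Below is a minimal Python sketch that converts one into the other. The name describe_unit is made up for illustration and is not part of FAHClient, and the pattern assumes the field order shown in these logs.

```python
import re

# Field order assumed from the "Sending unit results" lines in this thread.
PRCG_RE = re.compile(
    r"error:(\w+) project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def describe_unit(log_line):
    """Return 'project (run, clone, gen) ERROR_STATE', or None if the
    line does not carry the WU identity fields."""
    m = PRCG_RE.search(log_line)
    if m is None:
        return None
    error, project, run, clone, gen = m.groups()
    return f"{project} ({run}, {clone}, {gen}) {error}"


if __name__ == "__main__":
    line = (
        "18:37:33:WU00:FS00:Sending unit results: id:00 state:SEND "
        "error:FAULTY project:11758 run:0 clone:3765 gen:0 core:0x22 "
        "unit:0x000000009bf7a4d55e6d771ae2d300da"
    )
    print(describe_unit(line))
```

Lines without those fields (for example the "Uploading ... to ..." lines) simply return None, so the function can be mapped over a whole log file.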

Any advice would be greatly appreciated. Is the best course of action to just hold these failed WUs until they time out (or hopefully upload)?

Thanks in advance!

---------------------------------------

Additional logs regarding actual computation failure follow, in case anyone is interested:

Failure Log for 11758 (0, 3765, 0)

Code:

Project: 11758 (Run 0, Clone 3765, Gen 0)
Unit: 0x000000009bf7a4d55e6d771ae2d300da
Digital signatures verified
Folding@home GPU Core22 Folding@home Core
Version 0.0.2
  Found a checkpoint file
Completed 650000 out of 1000000 steps (65%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Completed 660000 out of 1000000 steps (66%)
Completed 670000 out of 1000000 steps (67%)
Completed 680000 out of 1000000 steps (68%)
Completed 690000 out of 1000000 steps (69%)
Completed 700000 out of 1000000 steps (70%)
Completed 710000 out of 1000000 steps (71%)
Completed 720000 out of 1000000 steps (72%)
Completed 730000 out of 1000000 steps (73%)
Completed 740000 out of 1000000 steps (74%)
Completed 750000 out of 1000000 steps (75%)
Completed 760000 out of 1000000 steps (76%)
Completed 770000 out of 1000000 steps (77%)
Completed 780000 out of 1000000 steps (78%)
Completed 790000 out of 1000000 steps (79%)
Completed 800000 out of 1000000 steps (80%)
Completed 810000 out of 1000000 steps (81%)
Completed 820000 out of 1000000 steps (82%)
Completed 830000 out of 1000000 steps (83%)
Completed 840000 out of 1000000 steps (84%)
Completed 850000 out of 1000000 steps (85%)
Completed 860000 out of 1000000 steps (86%)
Completed 870000 out of 1000000 steps (87%)
Completed 880000 out of 1000000 steps (88%)
Completed 890000 out of 1000000 steps (89%)
Completed 900000 out of 1000000 steps (90%)
Completed 910000 out of 1000000 steps (91%)
Completed 920000 out of 1000000 steps (92%)
Completed 930000 out of 1000000 steps (93%)
Completed 940000 out of 1000000 steps (94%)
Completed 950000 out of 1000000 steps (95%)
Caught signal SIGABRT(6) on PID 13851
WARNING:Unexpected exit from science code
Saving result file ../logfile_01.txt
Saving result file checkpointState.xml
Saving result file checkpt.crc
Saving result file positions.xtc
Saving result file science.log
Folding@home Core Shutdown: BAD_WORK_UNIT

Failure Log for 11759 (0, 10513, 1)

Code:

Unit: 0x0000000180fccb0a5e6eb0329c88c5ba
Reading tar file core.xml
Reading tar file integrator.xml
Reading tar file state.xml
Reading tar file system.xml
Digital signatures verified
Folding@home GPU Core22 Folding@home Core
Version 0.0.2
Completed 0 out of 1000000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Completed 10000 out of 1000000 steps (1%)
Completed 20000 out of 1000000 steps (2%)
Completed 30000 out of 1000000 steps (3%)
Completed 40000 out of 1000000 steps (4%)
Completed 50000 out of 1000000 steps (5%)
Completed 60000 out of 1000000 steps (6%)
Completed 70000 out of 1000000 steps (7%)
Completed 80000 out of 1000000 steps (8%)
Completed 90000 out of 1000000 steps (9%)
Completed 100000 out of 1000000 steps (10%)
Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
Following exception occured: Force RMSE error of 5.23885 with threshold of 5
Completed 60000 out of 1000000 steps (6%)
Completed 70000 out of 1000000 steps (7%)
Completed 80000 out of 1000000 steps (8%)
Completed 90000 out of 1000000 steps (9%)
Completed 100000 out of 1000000 steps (10%)
Completed 110000 out of 1000000 steps (11%)
Completed 120000 out of 1000000 steps (12%)
Completed 130000 out of 1000000 steps (13%)
Completed 140000 out of 1000000 steps (14%)
Completed 150000 out of 1000000 steps (15%)
Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
Following exception occured: Force RMSE error of 5.29797 with threshold of 5
Completed 110000 out of 1000000 steps (11%)
Completed 120000 out of 1000000 steps (12%)
Completed 130000 out of 1000000 steps (13%)
Completed 140000 out of 1000000 steps (14%)
Completed 150000 out of 1000000 steps (15%)
Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
Following exception occured: Force RMSE error of 5.31428 with threshold of 5
ERROR:114: Max Retries Reached
Saving result file ../logfile_01.txt
Saving result file badstate-0.xml
Saving result file badstate-1.xml
Saving result file badstate-2.xml
Saving result file checkpointState.xml
Saving result file checkpt.crc
Saving result file positions.xtc
Saving result file science.log
Folding@home Core Shutdown: BAD_WORK_UNIT
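
The 11759 log above shows the core reporting a "Force RMSE error" three times, each slightly over the threshold of 5, before stopping with "Max Retries Reached". A small Python sketch for pulling those values out of a saved core log is below. The name bad_states is a hypothetical helper, and the pattern assumes the message wording stays exactly as it appears in this log.

```python
import re

# Wording assumed from the core log above.
RMSE_RE = re.compile(
    r"Force RMSE error of ([\d.]+) with threshold of ([\d.]+)"
)


def bad_states(lines):
    """Return a list of (error, threshold) float pairs, one per
    'Force RMSE error' line, in the order they appear."""
    pairs = []
    for line in lines:
        m = RMSE_RE.search(line)
        if m:
            pairs.append((float(m.group(1)), float(m.group(2))))
    return pairs


if __name__ == "__main__":
    sample = [
        "Following exception occured: Force RMSE error of 5.23885 with threshold of 5",
        "Completed 60000 out of 1000000 steps (6%)",
        "Following exception occured: Force RMSE error of 5.29797 with threshold of 5",
        "Following exception occured: Force RMSE error of 5.31428 with threshold of 5",
    ]
    for error, threshold in bad_states(sample):
        print(f"error {error} exceeds threshold {threshold} by {error - threshold:.5f}")
```

For the log above this yields three pairs, all with a threshold of 5.0, which makes it easy to see how narrowly each check failed.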

Re: Can't upload 2 FAULTY WUs

Posted: Mon Mar 16, 2020 7:57 pm
by Nathan_P
The servers are struggling to accept all the work units that are being sent in. I would just leave it to keep trying to upload; you do get partial credit for a failed WU, so it's still worth sending it back.

Re: Can't upload 2 FAULTY WUs

Posted: Tue Mar 17, 2020 8:44 pm
by vnicolici
Nathan_P wrote: The servers are struggling to accept all the work units that are being sent in. I would just leave it to keep trying to upload; you do get partial credit for a failed WU, so it's still worth sending it back.

That doesn't seem to be the case for project 11758; it looks more like some kind of configuration issue than an overload issue.

I have 2 units that are stuck like that, not being able to send the results for almost 48 hours in one case, and almost 24 hours for the other. Both from project 11758, both on servers 155.247.164.213/155.247.164.214:

Code:

19:28:12:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:3900 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771ad0cf1564
19:28:12:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
19:28:12:WU02:FS01:Connecting to 155.247.164.213:8080
19:28:12:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:5213 gen:0 core:0x22 unit:0x000000009bf7a4d55e6e8abff76953ed
19:28:12:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
19:28:12:WU00:FS01:Connecting to 155.247.164.213:8080
19:28:16:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:28:16:WU00:FS01:Trying to send results to collection server
19:28:16:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
19:28:16:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
19:28:16:WU02:FS01:Trying to send results to collection server
19:28:16:WU00:FS01:Connecting to 155.247.164.214:8080
19:28:16:WU02:FS01:Uploading 55.24MiB to 155.247.164.214
19:28:16:WU02:FS01:Connecting to 155.247.164.214:8080
19:28:16:ERROR:WU02:FS01:Exception: Transfer failed
19:28:16:ERROR:WU00:FS01:Exception: Transfer failed

Looking at other threads discussing this issue, it seems only one or two projects from those servers are affected and the rest are working fine, suggesting it's not an overload issue.
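
To check the "only certain projects on certain servers" pattern against a local log rather than by eye, something like the following Python sketch could group failed transfers by server address and project. It assumes the log line formats shown in this thread. The name failures_by_server is hypothetical, and the sketch tracks state per queue slot (WU00, WU02, ...) because lines from different WUs are interleaved.

```python
import re
from collections import defaultdict

# Patterns assumed from the FAHClient v7 log lines quoted in this thread.
SEND_RE = re.compile(r":(WU\d+):FS\d+:Sending unit results: .*? project:(\d+) ")
UPLOAD_RE = re.compile(r":(WU\d+):FS\d+:Uploading \S+ to (\d+\.\d+\.\d+\.\d+)$")
FAIL_RE = re.compile(
    r":(?:WARNING|ERROR):(WU\d+):FS\d+:Exception: .*Transfer failed$"
)


def failures_by_server(lines):
    """Map server IP -> set of project numbers whose upload to that
    server ended in 'Transfer failed'."""
    project = {}  # queue slot -> project last announced on it
    server = {}   # queue slot -> server it is currently uploading to
    failed = defaultdict(set)
    for raw in lines:
        line = raw.rstrip()
        if m := SEND_RE.search(line):
            project[m.group(1)] = int(m.group(2))
        elif m := UPLOAD_RE.search(line):
            server[m.group(1)] = m.group(2)
        elif m := FAIL_RE.search(line):
            slot = m.group(1)
            if slot in project and slot in server:
                failed[server[slot]].add(project[slot])
    return dict(failed)


if __name__ == "__main__":
    import sys

    # Usage (path is whatever your client log is called): python script.py log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ip, projects in sorted(failures_by_server(fh).items()):
            print(ip, sorted(projects))
```

If the same server also shows successful uploads for other projects in the same log, that would support the configuration-issue reading over the overload one.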

Re: Can't upload 2 FAULTY WUs

Posted: Tue Mar 17, 2020 11:09 pm
by Thehead
I ended up uninstalling and reinstalling after getting stuck on 11758 for 24hrs. Hoping I don't get it again until it gets fixed. Until then, it seems to be working "fine" now.

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 3:29 am
by rusty
Interesting. Good to know I'm not the only one having an issue with Project 11758.

It has been 32 hours since my original post, and my client is still attempting (unsuccessfully) to send these two "problem child" failed WUs:
11758 (0, 3765, 0)
11759 (0, 10513, 1)

It doesn't seem to be jamming up the works though.

I am still receiving and running new WUs no problem. Sending completed NO_ERROR WUs is also still not an issue. Submitted ~53 WUs so far today, yet still these 2 failed units refuse to upload :shock:

Looking at the Project Summary page, it seems like the deadline on these jobs is 8.2 days.

I still hate the thought that my client continuing to hold the lock on these WUs will delay the computation of the associated frames by 8 days. :-(

Wish there was a way to release these so somebody else could work on them (hopefully without error).

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 4:02 am
by metha
Also having problems with the same project. Refuses to upload.
Trying to follow the project link gives you an error, saying there is no project with that number.

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 7:21 am
by schertt
Had an 11758 WU fail for me as well.

Code:

07:17:24:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)

Code:

*********************** Log Started 2020-03-18T07:14:19Z ***********************
07:14:19:************************* Folding@home Client *************************
07:14:19:        Website: https://foldingathome.org/
07:14:19:      Copyright: (c) 2009-2018 foldingathome.org
07:14:19:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:14:19:           Args: --open-web-control
07:14:19:         Config: C:\Users\Blackbox2\AppData\Roaming\FAHClient\config.xml
07:14:19:******************************** Build ********************************
07:14:19:        Version: 7.5.1
07:14:19:           Date: May 11 2018
07:14:19:           Time: 13:06:32
07:14:19:     Repository: Git
07:14:19:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:14:19:         Branch: master
07:14:19:       Compiler: Visual C++ 2008
07:14:19:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:14:19:       Platform: win32 10
07:14:19:           Bits: 32
07:14:19:           Mode: Release
07:14:19:******************************* System ********************************
07:14:19:            CPU: AMD Ryzen 5 2600 Six-Core Processor
07:14:19:         CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
07:14:19:           CPUs: 12
07:14:19:         Memory: 31.93GiB
07:14:19:    Free Memory: 24.12GiB
07:14:19:        Threads: WINDOWS_THREADS
07:14:19:     OS Version: 6.2
07:14:19:    Has Battery: false
07:14:19:     On Battery: false
07:14:19:     UTC Offset: -5
07:14:19:            PID: 13344
07:14:19:            CWD: C:\Users\Blackbox2\AppData\Roaming\FAHClient
07:14:19:             OS: Windows 10 Enterprise
07:14:19:        OS Arch: AMD64
07:14:19:           GPUs: 1
07:14:19:          GPU 0: Bus:37 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
07:14:19:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
07:14:19:                 specified module could not be found.
07:14:19:
07:14:19:OpenCL Device 0: Platform:0 Device:0 Bus:37 Slot:0 Compute:1.2 Driver:3004.8
07:14:19:  Win32 Service: false
07:14:19:***********************************************************************
07:14:19:<config>
07:14:19:  <!-- Network -->
07:14:19:  <proxy v=':8080'/>
07:14:19:
07:14:19:  <!-- Slot Control -->
07:14:19:  <power v='full'/>
07:14:19:
07:14:19:  <!-- User Information -->
07:14:19:  <passkey v='********************************'/>
07:14:19:  <team v='150'/>
07:14:19:  <user v='schertt'/>
07:14:19:
07:14:19:  <!-- Folding Slots -->
07:14:19:  <slot id='1' type='GPU'/>
07:14:19:</config>
07:14:20:Trying to access database...
07:14:20:Successfully acquired database lock
07:14:20:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
07:14:20:WU00:FS01:Connecting to 65.254.110.245:8080
07:14:20:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:14:20:WU00:FS01:Connecting to 18.218.241.186:80
07:14:21:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
07:14:21:ERROR:WU00:FS01:Exception: Could not get an assignment
07:14:21:WU00:FS01:Connecting to 65.254.110.245:8080
07:14:21:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:14:21:WU00:FS01:Connecting to 18.218.241.186:80
07:14:21:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
07:14:21:ERROR:WU00:FS01:Exception: Could not get an assignment
07:14:22:10:127.0.0.1:New Web connection
07:15:21:WU00:FS01:Connecting to 65.254.110.245:8080
07:15:21:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:15:21:WU00:FS01:Connecting to 18.218.241.186:80
07:15:21:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
07:15:21:ERROR:WU00:FS01:Exception: Could not get an assignment
07:16:58:WU00:FS01:Connecting to 65.254.110.245:8080
07:16:58:WU00:FS01:Assigned to work server 155.247.164.213
07:16:58:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 155.247.164.213
07:16:58:WU00:FS01:Connecting to 155.247.164.213:8080
07:16:59:WU00:FS01:Downloading 86.40MiB
07:17:04:WU00:FS01:Download complete
07:17:04:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11758 run:0 clone:825 gen:0 core:0x22 unit:0x000000039bf7a4d55e6d77112a436f60
07:17:04:WU00:FS01:Starting
07:17:04:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Blackbox2\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 13344 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
07:17:05:WU00:FS01:Started FahCore on PID 13280
07:17:05:WU00:FS01:Core PID:8776
07:17:05:WU00:FS01:FahCore 0x22 started
07:17:05:WU00:FS01:0x22:*********************** Log Started 2020-03-18T07:17:05Z ***********************
07:17:05:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:17:05:WU00:FS01:0x22:       Type: 0x22
07:17:05:WU00:FS01:0x22:       Core: Core22
07:17:05:WU00:FS01:0x22:    Website: https://foldingathome.org/
07:17:05:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
07:17:05:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
07:17:05:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
07:17:05:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 13280 -checkpoint 15
07:17:05:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
07:17:05:WU00:FS01:0x22:     Config: <none>
07:17:05:WU00:FS01:0x22:************************************ Build *************************************
07:17:05:WU00:FS01:0x22:    Version: 0.0.2
07:17:05:WU00:FS01:0x22:       Date: Dec 6 2019
07:17:05:WU00:FS01:0x22:       Time: 21:30:31
07:17:05:WU00:FS01:0x22: Repository: Git
07:17:05:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
07:17:05:WU00:FS01:0x22:     Branch: HEAD
07:17:05:WU00:FS01:0x22:   Compiler: Visual C++ 2008
07:17:05:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:17:05:WU00:FS01:0x22:   Platform: win32 10
07:17:05:WU00:FS01:0x22:       Bits: 64
07:17:05:WU00:FS01:0x22:       Mode: Release
07:17:05:WU00:FS01:0x22:************************************ System ************************************
07:17:05:WU00:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
07:17:05:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
07:17:05:WU00:FS01:0x22:       CPUs: 12
07:17:05:WU00:FS01:0x22:     Memory: 31.93GiB
07:17:05:WU00:FS01:0x22:Free Memory: 24.01GiB
07:17:05:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
07:17:05:WU00:FS01:0x22: OS Version: 6.2
07:17:05:WU00:FS01:0x22:Has Battery: false
07:17:05:WU00:FS01:0x22: On Battery: false
07:17:05:WU00:FS01:0x22: UTC Offset: -5
07:17:05:WU00:FS01:0x22:        PID: 8776
07:17:05:WU00:FS01:0x22:        CWD: C:\Users\Blackbox2\AppData\Roaming\FAHClient\work
07:17:05:WU00:FS01:0x22:         OS: Windows 10 Pro
07:17:05:WU00:FS01:0x22:    OS Arch: AMD64
07:17:05:WU00:FS01:0x22:********************************************************************************
07:17:05:WU00:FS01:0x22:Project: 11758 (Run 0, Clone 825, Gen 0)
07:17:05:WU00:FS01:0x22:Unit: 0x000000039bf7a4d55e6d77112a436f60
07:17:05:WU00:FS01:0x22:Reading tar file core.xml
07:17:05:WU00:FS01:0x22:Reading tar file integrator.xml
07:17:05:WU00:FS01:0x22:Reading tar file state.xml
07:17:06:WU00:FS01:0x22:Reading tar file system.xml
07:17:07:WU00:FS01:0x22:Digital signatures verified
07:17:07:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:17:07:WU00:FS01:0x22:Version 0.0.2
07:17:24:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
07:17:24:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
07:17:24:WU00:FS01:0x22:Saving result file science.log
07:17:24:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
07:17:24:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:17:24:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11758 run:0 clone:825 gen:0 core:0x22 unit:0x000000039bf7a4d55e6d77112a436f60
07:17:24:WU00:FS01:Uploading 8.00KiB to 155.247.164.213
07:17:24:WU00:FS01:Connecting to 155.247.164.213:8080
07:17:24:WU00:FS01:Upload complete
07:17:24:WU00:FS01:Server responded WORK_ACK (400)
07:17:24:WU00:FS01:Cleaning up

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 7:53 am
by bruce
The fact that 11758 (0, 3765, 0) cannot be found on the server probably means it has never been uploaded.

Regarding 11759 (0, 10513, 1), the WU has now been successfully completed. I don't know if either of these reports happens to be yours, but the completion suggests that the problem might be on your end. (Marginally stable overclock or something else :?: )

Code:

SamGamgee 	Faulty 2
Mrukson 	Ok

Unfortunately, I don't know the difference between Faulty 2 and any other type of FAULTY.

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 8:15 am
by bruce
One possibility: "transfer failed" generally means that the result file was somehow corrupted. Do you happen to be running an antivirus scanner?

Some folks have had scans pick up a string of random numbers that happens to match the pattern of some malware found on other computers. This false positive is probably followed by the quarantine of the offending sequence, leaving incomplete data to be uploaded. The fix is to configure your scanner to avoid looking at FAH's work files.

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 3:54 pm
by vnicolici
Still can't upload my 2 GOOD WUs from 11758, getting close to 72 hours since I finished the first one. Before the last attempt I even disabled Windows Defender, but the problem persisted:

Code:

15:45:41:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:3900 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771ad0cf1564
15:45:41:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
15:45:41:WU02:FS01:Connecting to 155.247.164.213:8080
15:45:41:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:5213 gen:0 core:0x22 unit:0x000000009bf7a4d55e6e8abff76953ed
15:45:41:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
15:45:41:WU00:FS01:Connecting to 155.247.164.213:8080
15:45:51:WU00:FS01:Upload 0.34%
15:45:51:WU02:FS01:Upload 0.34%
15:45:51:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
15:45:51:WU00:FS01:Trying to send results to collection server
15:45:51:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
15:45:51:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
15:45:51:WU02:FS01:Trying to send results to collection server
15:45:51:WU00:FS01:Connecting to 155.247.164.214:8080
15:45:51:WU02:FS01:Uploading 55.24MiB to 155.247.164.214
15:45:51:WU02:FS01:Connecting to 155.247.164.214:8080
15:45:51:ERROR:WU02:FS01:Exception: Transfer failed
15:45:52:ERROR:WU00:FS01:Exception: Transfer failed

It would be nice if the servers could report more helpful information to the clients when such errors occur, so that it's easier for everyone to troubleshoot the problem.

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 4:01 pm
by bruce
Actually, I think the error messages are a bit cryptic for two reasons.

1) There's probably no more information available. "Transfer failed" (or whatever) probably doesn't carry any more information than the fact that there was a communications error.

2) FAH doesn't want to aid hackers. Most other distributed computing platforms have been plagued with cases of people trying to earn more points by uploading falsified data. FAH has had relatively few cases like that, but it has happened. FAH SHOULD NOT be telling those people how it detected that the data was falsified, because that would help them construct better-crafted false data.
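
One thing that can be checked from the client side is whether the server accepts a TCP connection at all, which separates the "Connection timed out" case from the "Transfer failed" case seen in the logs earlier in this thread. A minimal Python sketch follows, on the assumption that plain TCP reachability is a useful signal here. The name can_connect is a hypothetical helper, and the address in the example is just the work server from the logs above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within
    `timeout` seconds, False on refusal, timeout or other socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


if __name__ == "__main__":
    # Work server address taken from the logs in this thread.
    print(can_connect("155.247.164.213", 8080))
```

A True result only says the port is open; it says nothing about why the server drops the upload afterwards, so it narrows the problem down rather than explaining it.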

Re: Can't upload 2 FAULTY WUs

Posted: Wed Mar 18, 2020 8:05 pm
by vnicolici
OK. As long as it's not a systemic issue, 2 stuck units here and there probably won't make much difference in the grand scheme of things.

So maybe it's not worth the effort to investigate this further.