Problem on sending results [Project 14283]

sf8kkn
Posts: 10
Joined: Sat Oct 19, 2019 7:06 pm

Problem on sending results [Project 14283]

Post by sf8kkn »

Hi guys,

I'm currently getting this error on several of my rigs.
The problem seems to affect project 14283 only.

01:22:48:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:48:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:48:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:48:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
...
01:22:50:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Trying to send results to collection server
01:22:50:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:50:WU01:FS01:Trying to send results to collection server
01:22:50:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:50:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:50:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:51:ERROR:WU00:FS01:Exception: Transfer failed
01:22:51:ERROR:WU01:FS01:Exception: Transfer failed
01:22:52:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:52:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:52:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:52:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:52:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
01:22:52:WU01:FS01:Connecting to 128.252.203.10:8080
01:22:53:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Trying to send results to collection server
01:22:53:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:53:WU01:FS01:Trying to send results to collection server
01:22:53:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:53:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:53:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:53:ERROR:WU00:FS01:Exception: Transfer failed
01:22:54:ERROR:WU01:FS01:Exception: Transfer failed
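For anyone who wants to check a long log for the same pattern, here is a minimal Python sketch that groups the send attempts above by WU slot. It assumes the FAHClient log line format quoted in this thread ("Sending unit results", "Uploading N MiB to host", and the WARNING/ERROR "Exception" lines). The function name summarize is made up for illustration and is not part of any FAH tool. Slot names like WU00 may be reused by the client, so on a long log one slot's entry can mix several WUs.

```python
import re
from collections import defaultdict

# Regexes assume the FAHClient log line format quoted in this thread.
SEND_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d\d):FS\d\d:Sending unit results: .*?"
    r"error:(\w+) project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)
UPLOAD_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d\d):FS\d\d:Uploading ([\d.]+)MiB to (\S+)")
FAIL_RE = re.compile(r"^\d\d:\d\d:\d\d:(?:WARNING|ERROR):(WU\d\d):FS\d\d:Exception: ")


def new_slot():
    """Empty per-slot record."""
    return {"attempts": 0, "error": None, "prcg": None,
            "sizes_mib": set(), "hosts": set(), "failures": 0}


def summarize(lines):
    """Group send attempts, upload sizes, target servers and failures by WU slot."""
    summary = defaultdict(new_slot)
    for raw in lines:
        line = raw.rstrip("\n")
        if m := SEND_RE.match(line):
            slot = summary[m.group(1)]
            slot["attempts"] += 1
            slot["error"] = m.group(2)
            # (project, run, clone, gen) identifies the work unit
            slot["prcg"] = tuple(int(g) for g in m.group(3, 4, 5, 6))
        elif m := UPLOAD_RE.match(line):
            slot = summary[m.group(1)]
            slot["sizes_mib"].add(float(m.group(2)))
            slot["hosts"].add(m.group(3))
        elif m := FAIL_RE.match(line):
            # Counts both the work-server WARNING and the collection-server ERROR
            summary[m.group(1)]["failures"] += 1
    return dict(summary)
```

Fed the log above (for example an open log file; any file name you use is your own), it would show each slot retrying the same upload (160.21 MiB for WU00, 193.10 MiB for WU01) against both 128.252.203.10 and 155.247.166.219, with every attempt failing.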
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results

Post by bruce »

See my explanation here

You've paused those WUs many times while they were processing (most likely you processed them "on idle"). The upload packets are all larger than 100 MiB, which is much too big to be valid results from project:14283.
sf8kkn
Posts: 10
Joined: Sat Oct 19, 2019 7:06 pm

Re: Problem on sending results

Post by sf8kkn »

I'm not sure I understand: the rigs are dedicated to folding, and one WU takes about 4 hours to compute.
I can see that my monitoring tool detected a problem and relaunched the WU several times. Is that enough to lose the WU?
So we can download WUs of various sizes (all my rigs are configured for WUs of up to 200 MB), but uploads are limited to 100 MB? Well, that's a lot of time lost...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results

Post by bruce »

As I said in the linked explanation, every time the WU enters or leaves the paused state, extra garbage is added to the upload. If the WU never pauses, the bug in FAHCore_a7 for Windows is not triggered and the results upload stays correct (and concise). The new version of FAHCore_a7 fixes this problem, and the results will be uploadable.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Problem on sending results

Post by Joe_H »

How many times did your tool relaunch the WU? Post that log; perhaps it will show where the problem was. Generally, a few restarts are not enough to blow the WU upload size up to 193 MB. If it took that many restarts, either the WU itself was bad or your system is not stable for GPU folding.

In this case Bruce missed that the WUs involved were running the GPU Core_21, so his comments about the Core_A7 issue are not completely relevant. Someone who has processed a Project 14283 WU will have to weigh in with the normal upload size for a WU from that project.

I have looked up both WUs. So far each has one report of a return where the WU failed to be processed successfully. Additional reports would be needed to determine whether the WUs are bad; someone may process them successfully when they are reassigned.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results [Project 14283]

Post by bruce »

Oops. It looks like size has nothing to do with it. (So much for spending a week on "vacation".)

In fact, your client flagged the WUs as FAULTY, so there's probably more useful information in an earlier part of the log. Scroll back to where those WUs were downloaded.
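Scrolling back by hand gets tedious when several slots are interleaved in the same log. Below is a minimal Python sketch that keeps only the lines for one slot since that slot was last cleaned up, which should cover the download and processing history of the WU currently in it. It assumes, as in the successful upload quoted later in this thread, that the client logs a "Cleaning up" line for a slot when it is done with a WU; current_wu_history is a hypothetical helper name, not part of FAHClient.

```python
def current_wu_history(lines, slot):
    """Return the lines logged for `slot` (e.g. "WU00") since its last "Cleaning up" line.

    Assumption: the client logs "<time>:<slot>:FSxx:Cleaning up" when it is finished
    with a WU, so everything after that line belongs to the WU now in the slot.
    """
    tag = f":{slot}:"
    history = []
    for raw in lines:
        line = raw.rstrip("\n")
        if tag not in line:
            continue  # line belongs to another slot or to the client itself
        if line.endswith(":Cleaning up"):
            history = []  # previous WU in this slot is finished; start over
        else:
            history.append(line)
    return history
```

If the client has rotated its log since the WU was downloaded, the older log files would need to be fed in first, in chronological order.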
sf8kkn
Posts: 10
Joined: Sat Oct 19, 2019 7:06 pm

Re: Problem on sending results [Project 14283]

Post by sf8kkn »

I won't be able to find how many times the WU was relaunched; I don't have that level of detail in my logs :(
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Problem on sending results [Project 14283]

Post by Joe_H »

The logs kept by the client would show it. Even if your tool completely relaunches the client, the client keeps the last 16 logs by default.

Perhaps you need to rethink how your monitoring tool is handling problems.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Problem on sending results [Project 14283]

Post by toTOW »

Additional data added at each failure (bad state) on an already big WU might exceed the maximum upload size of the server ... and p14283 is already big when everything is fine (more than 100MB to upload).

Feel free to dump these WUs; they will never be returned (and wouldn't be very useful anyway, since they failed).

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
artoar_11
Posts: 652
Joined: Sun Nov 22, 2009 8:42 pm
Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: Problem on sending results [Project 14283]

Post by artoar_11 »

I don't know if it's fair to compare that way. My WU upload from this project (2019-10-13T12:44:41Z):

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results [Project 14283]

Post by bruce »

P14283 is a GPU project. The bug in the CPU core_a7 which adds extra data to the upload has nothing to do with P14283. That bug has been causing congestion on 155.247.166.2xx and 14283 is on a server at a different site: 128.252.203.10.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Problem on sending results [Project 14283]

Post by toTOW »

artoar_11 wrote: I don't know if it's fair to compare that way. My WU upload from this project/2019-10-13T12:44:41Z:

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up
This is the normal upload size for this project for a WU completed without Bad States ...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results [Project 14283]

Post by bruce »

Unknown answer....

P14283 is a project that runs on the GPU. The recent change to the FAHCore was for CPU WUs so your question isn't applicable. Also, you can't really assume that one project returns a similar amount of data as some other project.