Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 11:34 am
by superpan
I currently seem to be unable to upload to 128.252.203.1 & .9 with retries looping

11:25:50:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:25:50:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:25:50:WU00:FS00:Connecting to 128.252.203.9:8080
11:25:56:WU00:FS00:Upload 27.68%
11:26:02:WU00:FS00:Upload 58.43%
11:26:08:WU00:FS00:Upload 89.80%
11:26:19:WU01:FS00:0xa7:Completed 35000 out of 500000 steps (7%)
11:26:39:WU00:FS00:Upload 94.10%
11:26:39:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
11:26:39:WU00:FS00:Trying to send results to collection server
11:26:39:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:26:39:WU00:FS00:Connecting to 128.252.203.1:8080
11:26:45:WU00:FS00:Upload 27.68%
11:26:51:WU00:FS00:Upload 58.43%
11:26:57:WU00:FS00:Upload 91.03%
11:27:18:WU01:FS00:0xa7:Completed 40000 out of 500000 steps (8%)
11:27:28:WU00:FS00:Upload 94.10%
11:27:29:ERROR:WU00:FS00:Exception: Transfer failed

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 7:26 pm
by bruce
superpan wrote:I currently seem to be unable to upload to 128.252.203.1 & .9 with retries looping

... project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
... Transfer failed
... Transfer failed
I'm stumped. The servers reportedly have plenty of space and the WU reports "not found"

Is anybody else seeing either successes or failures on these servers?

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 7:47 pm
by psaam0001
Bruce,

I don't know if I have any units that will be going back to those servers (as I'm just letting my systems do their thing), but I will take a quick peek just to see if any of them are having the same hiccups. Results will be posted as a reply.

Paul

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 8:04 pm
by comixgoddess
bruce wrote:
superpan wrote:I currently seem to be unable to upload to 128.252.203.1 & .9 with retries looping

... project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
... Transfer failed
... Transfer failed
I'm stumped. The servers reportedly have plenty of space and the WU reports "not found"

Is anybody else seeing either successes or failures on these servers?
I have one WU (project:17422 run:0 clone:3159 gen:28) that successfully uploaded to 128.252.203.9 at 12:44:13 today.

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 8:06 pm
by psaam0001
I have a total of 10 units where either these servers, or 128.252.203.10 are listed as where they may be returned to.

Let me keep peeking at these units as best as I can, and I'll report if they get 'stuck in traffic'.

Paul

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 8:55 pm
by superpan
Just to confirm I am still having the errors:
19:25:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
19:27:52:ERROR:WU00:FS00:Exception: Transfer failed
20:27:01:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
20:27:50:ERROR:WU00:FS00:Exception: Transfer failed

Grep the log file:
11:19:16:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:20:06:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:20:57:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:21:47:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:22:38:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:25:01:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:25:50:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:26:39:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:28:27:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:29:19:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:32:42:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:33:31:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:39:33:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:40:23:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
11:50:39:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
11:51:28:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
12:08:35:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
12:09:23:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
12:37:38:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
12:38:28:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
13:02:28:WU01:FS00:Uploading 6.30MiB to 129.32.209.201
13:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
13:25:26:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
13:30:12:WU02:FS00:Uploading 2.56MiB to 128.252.203.10
13:49:39:WU01:FS00:Uploading 1.57MiB to 128.252.203.11
14:17:32:WU02:FS00:Uploading 2.56MiB to 128.252.203.10
14:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
14:25:25:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
14:49:01:WU01:FS00:Uploading 2.82MiB to 128.252.203.10
15:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
15:25:25:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
16:07:48:WU02:FS00:Uploading 8.24MiB to 130.237.11.145
16:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
16:25:27:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
16:39:13:WU01:FS00:Uploading 2.83MiB to 128.252.203.10
17:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
17:27:00:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
18:19:07:WU02:FS00:Uploading 6.30MiB to 129.32.209.201
18:24:37:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
18:25:26:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
19:22:39:WU01:FS00:Uploading 11.25MiB to 128.252.203.9
19:24:38:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
19:25:29:WU00:FS00:Uploading 10.16MiB to 128.252.203.1
20:24:38:WU00:FS00:Uploading 10.16MiB to 128.252.203.9
20:27:01:WU00:FS00:Uploading 10.16MiB to 128.252.203.1

The failing file is 10.16MiB
this entry 19:22:39:WU01:FS00:Uploading 11.25MiB to 128.252.203.9 was a successful upload.

Could it be a bad file that is being uploaded rather than a server error ?
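
The same grep can be sketched in Python, which also makes it easy to count attempts per server. This is only a minimal sketch: it assumes the log lines look exactly like the "Uploading ... to ..." lines quoted above, and the name `count_uploads` and the sample list are made up for illustration; they are not part of the FAH client.

```python
import re
from collections import Counter

# Matches lines such as "11:19:16:WU00:FS00:Uploading 10.16MiB to 128.252.203.9".
# The format is assumed from the log excerpts in this thread.
UPLOAD_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):FS\d+:Uploading ([\d.]+[KMG]iB) to (\S+)$"
)


def count_uploads(lines):
    """Count upload attempts per (work unit slot, size, server)."""
    counts = Counter()
    for line in lines:
        match = UPLOAD_RE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the grep output above.
sample = [
    "11:19:16:WU00:FS00:Uploading 10.16MiB to 128.252.203.9",
    "11:20:06:WU00:FS00:Uploading 10.16MiB to 128.252.203.1",
    "11:20:57:WU00:FS00:Uploading 10.16MiB to 128.252.203.9",
    "13:02:28:WU01:FS00:Uploading 6.30MiB to 129.32.209.201",
    "19:22:39:WU01:FS00:Uploading 11.25MiB to 128.252.203.9",
]

for (wu, size, server), attempts in sorted(count_uploads(sample).items()):
    print(f"{wu} {size} -> {server}: {attempts} attempt(s)")
```

A slot that keeps showing the same size against the same pair of servers, like WU00 here, is a repeated retry of one result rather than a series of new uploads.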

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 9:15 pm
by bruce
Yes, it could be, but I don't have any way to know that. Various WUs return various sizes of uploads.

If you grep for the project number, are they all the same size?

In the case of project:17422 run:0 clone:217 gen:189, I did confirm that gen:188 had been returned and that 189 and 190 have not been returned yet, but that's exactly what I would expect until your WU passes the deadline, is declared "lost" and is assigned to someone else.

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 10:07 pm
by superpan
07:56:23:WU00:FS00:0xa7:Project: 17423 (Run 0, Clone 2033, Gen 30)
08:52:27:WU01:FS00:0xa7:Project: 16927 (Run 9, Clone 648, Gen 29)
10:30:51:WU00:FS00:0xa7:Project: 17422 (Run 0, Clone 217, Gen 189) <+++ I think this is the failing one
11:19:16:WU01:FS00:0xa7:Project: 16927 (Run 28, Clone 681, Gen 23)
13:02:29:WU02:FS00:0xa7:Project: 17217 (Run 799, Clone 4, Gen 22)
13:30:12:WU01:FS00:0xa7:Project: 16461 (Run 49, Clone 3, Gen 66)
13:49:48:WU02:FS00:0xa7:Project: 17217 (Run 3093, Clone 1, Gen 85)
14:17:36:WU01:FS00:0xa7:Project: 17214 (Run 2365, Clone 4, Gen 13)
14:49:02:WU02:FS00:0xa8:Project: 16814 (Run 4, Clone 270, Gen 95)
16:07:48:WU01:FS00:0xa7:Project: 17214 (Run 1663, Clone 3, Gen 51)
16:39:13:WU02:FS00:0xa7:Project: 16927 (Run 18, Clone 357, Gen 20)
18:19:18:WU01:FS00:0xa7:Project: 13822 (Run 691, Clone 5, Gen 83)
19:22:39:WU02:FS00:0xa7:Project: 16927 (Run 3, Clone 56, Gen 57)
21:01:11:WU01:FS00:0xa7:Project: 17217 (Run 4021, Clone 3, Gen 69)
21:30:34:WU02:FS00:0xa7:Project: 13823 (Run 766, Clone 5, Gen 146)

I hope this is what you requested. I'm still a novice at this !

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Tue Dec 29, 2020 10:11 pm
by superpan
or... did you mean grep 17422

10:30:51:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
10:30:51:WU00:FS00:0xa7:Project: 17422 (Run 0, Clone 217, Gen 189)
11:19:16:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:20:57:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:22:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:25:50:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:28:27:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:32:42:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:39:33:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
11:50:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
12:08:35:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
12:37:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
13:24:36:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
14:24:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
15:24:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
16:24:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
17:24:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
18:24:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
19:24:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
20:24:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
21:24:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17422 run:0 clone:217 gen:189 core:0xa7 unit:0x000000d9000000bd0000440e00000000
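
The timestamps in the grep above suggest the retries get spaced further apart and then settle at roughly one per hour. A small sketch to compute those gaps, assuming all timestamps fall on the same day (the log lines carry no date); `retry_gaps` is a made-up helper name, not anything from the client:

```python
from datetime import datetime


def retry_gaps(timestamps):
    """Return the gaps in seconds between consecutive HH:MM:SS timestamps.

    Assumes every timestamp is from the same day, so a wrap past
    midnight would give a negative gap.
    """
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# First few "Sending unit results" timestamps from the grep above.
stamps = ["11:19:16", "11:20:57", "11:22:38", "11:25:50", "11:28:27", "11:32:42"]
print(retry_gaps(stamps))  # [101, 101, 192, 157, 255]
```

Running it over the later entries (13:24:36, 14:24:37, 15:24:37, ...) gives gaps of about 3600 seconds.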

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Thu Mar 04, 2021 12:42 am
by jtmedic
No WU found for the following on 128.252.203.9 and have not received any credit in 2 days.

12:29:42:WU00:FS00:0xa7:Project: 16927 (Run 20, Clone 894, Gen 125)
22:26:54:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17800 run:20 clone:116 gen:125 core:0x22 unit:0x000000740000007d0000458800000014
23:34:45:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:16927 run:10 clone:1485 gen:64 core:0xa7 unit:0x000005cd000000400000421f0000000a
23:41:42:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:16927 run:20 clone:894 gen:125 core:0xa7 unit:0x0000037e0000007d0000421f00000014

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Thu Mar 04, 2021 5:33 pm
by bruce
Yes, the servers at *.wustl.edu are not connecting to the stats subsystem. As has been stated repeatedly, the stats records are delayed. We expect that there will be a sudden spike in points from those servers when the problems are fixed.

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Wed Mar 31, 2021 9:49 am
by Eagle
@Bruce: About 4 weeks later, the problem _still_ persists as I've got over 40 WU not being uploaded for days if not weeks.
Why is FAH always struggling with this? Almost a year ago, due to the massively increased user base, I understood the reasons. Not so much nowadays, hence my request for a more lasting solution that keeps everything running even if 3 backup alternatives might fail to catch the connection load.

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Wed Mar 31, 2021 2:38 pm
by Joe_H
Eagle wrote:@Bruce: About 4 weeks later, the problem _still_ persists as I've got over 40 WU not being uploaded for days if not weeks.
Why is FAH always struggling with this? Almost a year ago, due to the massively increased user base, I understood the reasons. Not so much nowadays, hence my request for a more lasting solution that keeps everything running even if 3 backup alternatives might fail to catch the connection load.
The problem has been fixed for almost everyone. You may be having problems uploading to these servers, but most others are not. For example, just checking my recent log files there are several dozen uploads to these servers. None had any problems.

So a few specific examples of problem uploads would be needed to start figuring out what is happening. As for backup alternatives, the current software only allows for a single backup CS and upload to the original WS for each WU processed. Anything different would require a major rewrite of both the client and the server code.

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Wed Mar 31, 2021 3:30 pm
by Eagle
Joe_H wrote:The problem has been fixed for almost everyone. You may be having problems uploading to these servers, but most others are not.
I don't think so as any culprit on my end (OS, firewall, router, etc.) was already tested by myself extensively - all work perfectly, it's always the IP & port of the upload target that is rejecting, sending wrong replies, etc.
Joe_H wrote:For example, just checking my recent log files there are several dozen uploads to these servers. None had any problems.

So a few specific examples of problem uploads would be needed to start figuring out what is happening.
Sure, there you go (grouped & sorted by WU # for easier readability):

15:10:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17231 run:493 clone:3 gen:51 core:0xa7 unit:0x00000003000000330000434f000001ed
15:10:38:WU00:FS00:Uploading 18.68MiB to 206.223.170.146
15:10:38:WU00:FS00:Connecting to 206.223.170.146:8080
15:11:09:WARNING:WU00:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:11:09:WU00:FS00:Trying to send results to collection server
15:11:09:WU00:FS00:Uploading 18.68MiB to 128.252.203.10
15:11:09:WU00:FS00:Connecting to 128.252.203.10:8080
15:11:39:ERROR:WU00:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:10:26:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:17231 run:117 clone:2 gen:43 core:0xa7 unit:0x000000020000002b0000434f00000075
15:10:26:WU02:FS00:Uploading 18.69MiB to 206.223.170.146
15:10:26:WU02:FS00:Connecting to 206.223.170.146:8080
15:10:56:WARNING:WU02:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:10:56:WU02:FS00:Trying to send results to collection server
15:10:56:WU02:FS00:Uploading 18.69MiB to 128.252.203.10
15:10:56:WU02:FS00:Connecting to 128.252.203.10:8080
15:11:26:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:52:WU04:FS00:Sending unit results: id:04 state:SEND error:NO_ERROR project:16948 run:74 clone:3 gen:99 core:0xa8 unit:0x0000000300000063000042340000004a
15:03:52:WU04:FS00:Uploading 21.70MiB to 129.32.209.203
15:03:52:WU04:FS00:Connecting to 129.32.209.203:8080
15:04:22:WARNING:WU04:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:04:22:WU04:FS00:Trying to send results to collection server
15:04:22:WU04:FS00:Uploading 21.70MiB to 129.32.209.201
15:04:22:WU04:FS00:Connecting to 129.32.209.201:8080
15:04:52:ERROR:WU04:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

14:54:42:WU06:FS00:Sending unit results: id:06 state:SEND error:NO_ERROR project:17231 run:647 clone:1 gen:49 core:0xa7 unit:0x00000001000000310000434f00000287
14:54:42:WU06:FS00:Uploading 18.69MiB to 206.223.170.146
14:54:42:WU06:FS00:Connecting to 206.223.170.146:8080
14:55:13:WARNING:WU06:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
14:55:13:WU06:FS00:Trying to send results to collection server
14:55:13:WU06:FS00:Uploading 18.69MiB to 128.252.203.10
14:55:13:WU06:FS00:Connecting to 128.252.203.10:8080
14:55:43:ERROR:WU06:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:10:40:WU09:FS00:Sending unit results: id:09 state:SEND error:NO_ERROR project:17231 run:4447 clone:2 gen:30 core:0xa7 unit:0x000000020000001e0000434f0000115f
15:10:40:WU09:FS00:Uploading 18.70MiB to 206.223.170.146
15:10:40:WU09:FS00:Connecting to 206.223.170.146:8080
15:11:10:WARNING:WU09:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:11:10:WU09:FS00:Trying to send results to collection server
15:11:10:WU09:FS00:Uploading 18.70MiB to 128.252.203.10
15:11:10:WU09:FS00:Connecting to 128.252.203.10:8080
15:11:41:ERROR:WU09:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:10:38:WU10:FS00:Sending unit results: id:10 state:SEND error:NO_ERROR project:17231 run:6093 clone:2 gen:25 core:0xa7 unit:0x00000002000000190000434f000017cd
15:10:38:WU10:FS00:Uploading 18.71MiB to 206.223.170.146
15:10:38:WU10:FS00:Connecting to 206.223.170.146:8080
15:11:09:WARNING:WU10:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:11:09:WU10:FS00:Trying to send results to collection server
15:11:09:WU10:FS00:Uploading 18.71MiB to 128.252.203.10
15:11:09:WU10:FS00:Connecting to 128.252.203.10:8080
15:11:39:ERROR:WU10:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:10:45:WU12:FS00:Sending unit results: id:12 state:SEND error:NO_ERROR project:17231 run:1227 clone:4 gen:32 core:0xa7 unit:0x00000004000000200000434f000004cb
15:10:45:WU12:FS00:Uploading 18.70MiB to 206.223.170.146
15:10:45:WU12:FS00:Connecting to 206.223.170.146:8080
15:11:16:WARNING:WU12:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:11:16:WU12:FS00:Trying to send results to collection server
15:11:16:WU12:FS00:Uploading 18.70MiB to 128.252.203.10
15:11:16:WU12:FS00:Connecting to 128.252.203.10:8080
15:11:46:ERROR:WU12:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:34:WU13:FS00:Sending unit results: id:13 state:SEND error:NO_ERROR project:17231 run:258 clone:3 gen:46 core:0xa7 unit:0x000000030000002e0000434f00000102
15:03:34:WU13:FS00:Uploading 18.70MiB to 206.223.170.146
15:03:34:WU13:FS00:Connecting to 206.223.170.146:8080
15:04:04:WARNING:WU13:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:04:04:WU13:FS00:Trying to send results to collection server
15:04:04:WU13:FS00:Uploading 18.70MiB to 128.252.203.10
15:04:04:WU13:FS00:Connecting to 128.252.203.10:8080
15:04:35:ERROR:WU13:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:34:WU15:FS00:Sending unit results: id:15 state:SEND error:NO_ERROR project:17231 run:3975 clone:4 gen:37 core:0xa7 unit:0x00000004000000250000434f00000f87
15:03:34:WU15:FS00:Uploading 18.69MiB to 206.223.170.146
15:03:34:WU15:FS00:Connecting to 206.223.170.146:8080
15:04:05:WARNING:WU15:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:04:05:WU15:FS00:Trying to send results to collection server
15:04:05:WU15:FS00:Uploading 18.69MiB to 128.252.203.10
15:04:05:WU15:FS00:Connecting to 128.252.203.10:8080
15:04:35:ERROR:WU15:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:51:WU16:FS00:Sending unit results: id:16 state:SEND error:NO_ERROR project:17231 run:7531 clone:3 gen:16 core:0xa7 unit:0x00000003000000100000434f00001d6b
15:03:51:WU16:FS00:Uploading 18.71MiB to 206.223.170.146
15:03:51:WU16:FS00:Connecting to 206.223.170.146:8080
15:04:22:WARNING:WU16:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:04:22:WU16:FS00:Trying to send results to collection server
15:04:22:WU16:FS00:Uploading 18.71MiB to 128.252.203.10
15:04:22:WU16:FS00:Connecting to 128.252.203.10:8080
15:04:52:ERROR:WU16:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:35:WU17:FS00:Sending unit results: id:17 state:SEND error:NO_ERROR project:17231 run:3656 clone:3 gen:47 core:0xa7 unit:0x000000030000002f0000434f00000e48
15:03:35:WU17:FS00:Uploading 18.69MiB to 206.223.170.146
15:03:35:WU17:FS00:Connecting to 206.223.170.146:8080
15:04:05:WARNING:WU17:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:04:05:WU17:FS00:Trying to send results to collection server
15:04:05:WU17:FS00:Uploading 18.69MiB to 128.252.203.10
15:04:05:WU17:FS00:Connecting to 128.252.203.10:8080
15:04:36:ERROR:WU17:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:05:53:WU18:FS00:Sending unit results: id:18 state:SEND error:NO_ERROR project:17413 run:0 clone:998 gen:661 core:0xa7 unit:0x000003e6000002950000440500000000
15:05:53:WU18:FS00:Uploading 9.94MiB to 66.170.111.50
15:05:53:WU18:FS00:Connecting to 66.170.111.50:8080
15:06:23:WARNING:WU18:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0

15:04:52:WU19:FS00:Sending unit results: id:19 state:SEND error:NO_ERROR project:17231 run:1 clone:2 gen:42 core:0xa7 unit:0x000000020000002a0000434f00000001
15:04:52:WU19:FS00:Uploading 18.70MiB to 206.223.170.146
15:04:52:WU19:FS00:Connecting to 206.223.170.146:8080
15:05:23:WARNING:WU19:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:05:23:WU19:FS00:Trying to send results to collection server
15:05:23:WU19:FS00:Uploading 18.70MiB to 128.252.203.10
15:05:23:WU19:FS00:Connecting to 128.252.203.10:8080
15:05:53:ERROR:WU19:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:06:05:WU23:FS00:Sending unit results: id:23 state:SEND error:NO_ERROR project:17231 run:4298 clone:4 gen:42 core:0xa7 unit:0x000000040000002a0000434f000010ca
15:06:05:WU23:FS00:Uploading 18.70MiB to 206.223.170.146
15:06:05:WU23:FS00:Connecting to 206.223.170.146:8080
15:06:35:WARNING:WU23:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:06:35:WU23:FS00:Trying to send results to collection server
15:06:35:WU23:FS00:Uploading 18.70MiB to 128.252.203.10
15:06:35:WU23:FS00:Connecting to 128.252.203.10:8080
15:07:06:ERROR:WU23:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:06:23:WU24:FS00:Sending unit results: id:24 state:SEND error:NO_ERROR project:17231 run:878 clone:0 gen:34 core:0xa7 unit:0x00000000000000220000434f0000036e
15:06:23:WU24:FS00:Uploading 18.70MiB to 206.223.170.146
15:06:23:WU24:FS00:Connecting to 206.223.170.146:8080
15:06:53:WARNING:WU24:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:06:53:WU24:FS00:Trying to send results to collection server
15:06:53:WU24:FS00:Uploading 18.70MiB to 128.252.203.10
15:06:53:WU24:FS00:Connecting to 128.252.203.10:8080
15:07:24:ERROR:WU24:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:06:53:WU25:FS00:Sending unit results: id:25 state:SEND error:NO_ERROR project:17231 run:7313 clone:3 gen:28 core:0xa7 unit:0x000000030000001c0000434f00001c91
15:06:53:WU25:FS00:Uploading 18.70MiB to 206.223.170.146
15:06:53:WU25:FS00:Connecting to 206.223.170.146:8080
15:07:24:WARNING:WU25:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:07:24:WU25:FS00:Trying to send results to collection server
15:07:24:WU25:FS00:Uploading 18.70MiB to 128.252.203.10
15:07:24:WU25:FS00:Connecting to 128.252.203.10:8080
15:07:54:ERROR:WU25:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:07:36:WU32:FS00:Sending unit results: id:32 state:SEND error:NO_ERROR project:17231 run:4183 clone:1 gen:54 core:0xa7 unit:0x00000001000000360000434f00001057
15:07:36:WU32:FS00:Uploading 18.71MiB to 206.223.170.146
15:07:36:WU32:FS00:Connecting to 206.223.170.146:8080
15:08:07:WARNING:WU32:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
15:08:07:WU32:FS00:Trying to send results to collection server
15:08:07:WU32:FS00:Uploading 18.71MiB to 128.252.203.10
15:08:07:WU32:FS00:Connecting to 128.252.203.10:8080
15:08:37:ERROR:WU32:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

15:03:07:WU42:FS00:Sending unit results: id:42 state:SEND error:NO_ERROR project:17415 run:0 clone:603 gen:639 core:0xa7 unit:0x0000025b0000027f0000440700000000
15:03:08:WU42:FS00:Uploading 9.97MiB to 66.170.111.50
15:03:08:WU42:FS00:Connecting to 66.170.111.50:8080
15:03:38:WARNING:WU42:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0

15:15:34:WU49:FS00:Connecting to assign1.foldingathome.org:80
15:15:35:WU49:FS00:Assigned to work server 129.32.209.203
15:15:35:WU49:FS00:Requesting new work unit for slot 00: cpu:31 from 129.32.209.203
15:15:35:WU49:FS00:Connecting to 129.32.209.203:8080
15:15:35:WU49:FS00:Downloading 759.00KiB
15:15:36:WU49:FS00:Download complete
15:15:36:WU49:FS00:Received Unit: id:49 state:DOWNLOAD error:NO_ERROR project:16951 run:61 clone:1 gen:84 core:0xa8 unit:0x0000000100000054000042370000003d
Joe_H wrote:As for backup alternatives, the current software only allows for a single backup CS and upload to the original WS for each WU processed. Anything different would require a major rewrite of both the client and the server code.
I was thinking of more load balancers and servers behind them rather than more CS with even more IPs, which in turn wouldn't need a single character of modified client code, and probably none on the actual FAH servers either (besides the load balancers, etc.).
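
For a long paste like the one above, it can help to tally failures per target server before reporting them. The following is a rough sketch only: it assumes the "Connecting to" and "Exception:" line formats seen in this thread, attributes each exception to the server that the same WU slot connected to most recently, and uses the made-up name `failures_by_server`.

```python
import re
from collections import Counter

# Line formats are assumed from the log excerpts in this thread.
CONNECT_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):FS\d+:Connecting to ([^:\s]+):\d+$"
)
FAIL_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:WARNING|ERROR):(WU\d+):FS\d+:Exception: "
)


def failures_by_server(lines):
    """Count exception lines per server, keyed by each slot's last connection."""
    last_server = {}      # WU slot -> server it most recently connected to
    failures = Counter()  # server -> number of exceptions attributed to it
    for raw in lines:
        line = raw.strip()
        match = CONNECT_RE.match(line)
        if match:
            last_server[match.group(1)] = match.group(2)
            continue
        match = FAIL_RE.match(line)
        if match and match.group(1) in last_server:
            failures[last_server[match.group(1)]] += 1
    return failures


# The WU00 block from the paste above.
sample = [
    "15:10:38:WU00:FS00:Connecting to 206.223.170.146:8080",
    "15:11:09:WARNING:WU00:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0",
    "15:11:09:WU00:FS00:Trying to send results to collection server",
    "15:11:09:WU00:FS00:Connecting to 128.252.203.10:8080",
    "15:11:39:ERROR:WU00:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0",
]

for server, count in failures_by_server(sample).most_common():
    print(server, count)
```

If every server in the tally fails in the same way, that points at something common to all the uploads rather than at one server, though the tally alone cannot say which side is at fault.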

Re: Issues with uploading to 128.252.203.1 & .9

Posted: Wed Mar 31, 2021 5:24 pm
by bruce
The single WS plus a single CS philosophy obviously will manage to find whatever server flaws might exist. Servers do run into problems that need to be fixed manually by the relatively small staff who are able to manage the specific server involved. It's certainly not an ideal solution.

Typically, we get enough information from folks like you to identify which server is having a problem and escalate the issue to the designated person(s). If FAH was a large company with a detailed business plan, we could count on the lost revenue paying for more staff, but there's a limit to what happens when we depend on a limited number of volunteers who can identify and fix whatever can go wrong with a server and who have the necessary security credentials for that server.

It was a lot simpler 20 years ago in the days when all the servers were at Stanford and a single university IT team managed all of them (except on weekends ;) ). In recent times, it seems that most of the unsolved problems are related to shared cloud servers.