Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 10:16 am
by Klutz
As a new user, where can I read about the status of the server? A link would be very helpful. I have found https://apps.foldingathome.org/serverstats , but cannot find any status there (or am unable to understand it).
You're on the right page. Find the entry for 155.247.164.214, then hover your cursor over the column "Has CS" (where it says "Yes" in green). A small popup will appear to display the errors that plague this server. Being unable to connect to 155.247.164.213 is one of them.

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 11:55 am
by roffvald
Same issue here with .213 and .214

11:53:17:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1380 gen:0 core:0x22 unit:0x000000059bf7a4d55e6d771272f2b9e7
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
11:53:17:WU00:FS01:Connecting to 155.247.164.213:8080
11:53:17:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
11:53:17:WU00:FS01:Trying to send results to collection server
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
11:53:17:WU00:FS01:Connecting to 155.247.164.214:8080
11:53:19:ERROR:WU00:FS01:Exception: Transfer failed

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 12:04 pm
by Darth_Peter_dualxeon
At https://apps.foldingathome.org/serverstats , what does the "Has CS" acronym mean? Why do the "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90TiB of space (is that the free space on the hard disks where the results get uploaded?).
And I can ping them. So why can't I upload results?

Sometimes I saw that there were jobs on the server and yet I did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 3:21 pm
by Qwarkman
I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them. Letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect results.
I'm considering shutting down my GPU folding because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 4:29 pm
by vangli
I did see that they tried to restart the 214 server twice, with no effect. Not knowing the F@H software: do the servers communicate through an encrypted channel with certificates, and has this changed? It seems that some more servers have failures connecting to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It's a waste of time and computing power.

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 4:50 pm
by Jesse_V
Darth_Peter_dualxeon wrote:At https://apps.foldingathome.org/serverstats , what does the "Has CS" acronym mean? Why do the "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90TiB of space (is that the free space on the hard disks where the results get uploaded?).
And I can ping them. So why can't I upload results?

Sometimes I saw that there were jobs on the server and yet I did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?
CS means Collection Server. It's a fallback destination for completed workunits: as the logs in this thread show, the client tries it when the upload to the work server fails. New projects start in "beta" because they might be unstable or cause errors; the people who opt into beta have stable hardware and watch the log a little more closely. That way the teams can fix any issues before pushing the projects to the larger "public" group.

I'm not sure why you can't upload the results, but they are currently working on the servers to keep up with the overwhelming demand, so hopefully that gets fixed soon.
Qwarkman wrote:I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them. Letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect results.
I'm considering shutting down my GPU folding because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.
I get it. The WUs should upload in time once they fix some of the errors with the server. There's just so much demand at the moment. I'd recommend just keeping everything running as I expect it to clear up soon.
vangli wrote:I did see that they tried to restart the 214 server twice, with no effect. Not knowing the F@H software: do the servers communicate through an encrypted channel with certificates, and has this changed? It seems that some more servers have failures connecting to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It's a waste of time and computing power.
The clients talk to the server over HTTP. There is a hash signature and sanity checks to ensure that the workunits are validated. Technically, it's an HTTP request to a raw IP address. The servers just may be overwhelmed at the moment. In 10 years I've never seen this many people jumping into the network at once and it's causing substantial load on the servers.

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 6:37 pm
by Joe_H
Klutz wrote:
Moderators here have been in contact with the people running the server, more than once. They are aware and working on it among the other issues.
It would be helpful to get an update from someone who actually has oversight over these two servers. The server status page has clearly stated the exact nature of this error for a number of days, yet we've seen no indication that any attempt has been made to rectify it. This is a waste of valuable resources and goodwill.
Some of the folding team supporting the project servers are posting on Twitter and Facebook.

As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that? About the only thing I can see is that they have greatly reduced the rate of assignments from those two servers while they try to sort this out.

Re: Send Errors - 155.247.164.213 & .214

Posted: Fri Mar 20, 2020 7:40 pm
by Klutz
As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that?
I already answered that above: viewtopic.php?f=18&t=32492&start=60#p315744

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 3:18 am
by TitanXp
Still down? These 2 servers are giving me the same error as OP.
Project 11777

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 5:56 am
by sixty4bitdiablo
Same issues here.

05:50:39:WU01:FS01:Trying to send results to collection server
05:50:39:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
05:50:39:WU01:FS01:Connecting to 155.247.164.214:8080
05:50:39:ERROR:WU01:FS01:Exception: Transfer failed

Any ideas? It's been like this for a day or so now.

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 7:30 am
by vangli
In fact, several days. I have 3 WUs waiting for upload, the first one with 80, yes eighty, retries. I accept that technical problems occur. However, the total lack of information from the administrators, and the fact that new WUs which cannot be collected are still being sent, is bad.

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 8:08 am
by jima13
At this point I'd just be piling on, but what the heck:

Code:

 *********************** Log Started 2020-03-21T02:17:20Z ***********************
02:17:22:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:17:23:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:17:23:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:50:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:50:WU03:FS02:Trying to send results to collection server
02:19:50:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:50:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:51:ERROR:WU03:FS02:Exception: Transfer failed
02:19:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:19:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:19:52:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:53:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:53:WU03:FS02:Trying to send results to collection server
02:19:53:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:53:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:55:ERROR:WU03:FS02:Exception: Transfer failed
02:20:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:20:51:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:20:51:WU03:FS02:Connecting to 155.247.164.213:8080
02:20:52:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:20:52:WU03:FS02:Trying to send results to collection server
02:20:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:20:52:WU03:FS02:Connecting to 155.247.164.214:8080
02:20:53:ERROR:WU03:FS02:Exception: Transfer failed
02:22:28:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:22:29:WU03:FS02:Connecting to 155.247.164.213:8080
02:22:29:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:22:29:WU03:FS02:Trying to send results to collection server
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:22:29:WU03:FS02:Connecting to 155.247.164.214:8080
02:22:29:ERROR:WU03:FS02:Exception: Transfer failed
02:25:06:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:25:06:WU03:FS02:Connecting to 155.247.164.213:8080
02:25:06:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:25:06:WU03:FS02:Trying to send results to collection server
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:25:06:WU03:FS02:Connecting to 155.247.164.214:8080
02:25:06:ERROR:WU03:FS02:Exception: Transfer failed
02:29:20:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:29:20:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:29:20:WU03:FS02:Connecting to 155.247.164.213:8080
02:29:21:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:29:21:WU03:FS02:Trying to send results to collection server
02:29:21:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:29:21:WU03:FS02:Connecting to 155.247.164.214:8080
02:29:21:ERROR:WU03:FS02:Exception: Transfer failed
02:36:11:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:36:11:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:36:11:WU03:FS02:Connecting to 155.247.164.213:8080
02:36:12:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:36:12:WU03:FS02:Trying to send results to collection server
02:36:12:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:36:12:WU03:FS02:Connecting to 155.247.164.214:8080
02:36:12:ERROR:WU03:FS02:Exception: Transfer failed
02:47:17:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:47:17:WU03:FS02:Connecting to 155.247.164.213:8080
02:47:17:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:47:17:WU03:FS02:Trying to send results to collection server
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:47:17:WU03:FS02:Connecting to 155.247.164.214:8080
02:47:18:ERROR:WU03:FS02:Exception: Transfer failed
03:05:14:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:05:14:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:05:14:WU03:FS02:Connecting to 155.247.164.213:8080
03:05:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:05:15:WU03:FS02:Trying to send results to collection server
03:05:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:05:15:WU03:FS02:Connecting to 155.247.164.214:8080
03:05:15:ERROR:WU03:FS02:Exception: Transfer failed
03:34:16:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:34:16:WU03:FS02:Connecting to 155.247.164.213:8080
03:34:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:34:16:WU03:FS02:Trying to send results to collection server
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:34:16:WU03:FS02:Connecting to 155.247.164.214:8080
03:34:17:ERROR:WU03:FS02:Exception: Transfer failed
04:21:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
04:21:15:WU03:FS02:Connecting to 155.247.164.213:8080
04:21:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
04:21:15:WU03:FS02:Trying to send results to collection server
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
04:21:15:WU03:FS02:Connecting to 155.247.164.214:8080
04:21:16:ERROR:WU03:FS02:Exception: Transfer failed
05:37:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
05:37:16:WU03:FS02:Connecting to 155.247.164.213:8080
05:37:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
05:37:16:WU03:FS02:Trying to send results to collection server
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
05:37:16:WU03:FS02:Connecting to 155.247.164.214:8080
05:37:16:ERROR:WU03:FS02:Exception: Transfer failed
07:40:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
07:40:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
07:40:15:WU03:FS02:Connecting to 155.247.164.213:8080
07:40:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
07:40:16:WU03:FS02:Trying to send results to collection server
07:40:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
07:40:16:WU03:FS02:Connecting to 155.247.164.214:8080
07:40:17:ERROR:WU03:FS02:Exception: Transfer failed
What really bugs me is the time between tries keeps expanding, so after 12 tries the next try is in 3 hours. Is there a reason for this, or can it be coded down to an hour or less?
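
For reference, the growth in this log can be measured directly. The sketch below is illustrative only: it copies the timestamps of the "Sending unit results" lines from the log above (skipping the first attempt, which spent about 2.5 minutes timing out) and computes the gap between consecutive attempts and how fast that gap grows. The function names are made up for this example and are not part of the F@H client.

```python
from datetime import datetime

# Start times of the "Sending unit results" lines in the log above,
# beginning with the second attempt.
SEND_TIMES = [
    "02:19:51", "02:20:51", "02:22:28", "02:25:06", "02:29:20",
    "02:36:11", "02:47:17", "03:05:14", "03:34:16", "04:21:15",
    "05:37:15", "07:40:15",
]


def gaps_in_seconds(stamps):
    """Seconds between each pair of consecutive HH:MM:SS timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def growth_ratios(gaps):
    """How many times longer each gap is than the one before it."""
    return [later / earlier for earlier, later in zip(gaps, gaps[1:])]


gaps = gaps_in_seconds(SEND_TIMES)
ratios = growth_ratios(gaps)

for gap, ratio in zip(gaps[1:], ratios):
    print(f"waited {gap:5d} s  ({ratio:.2f}x the previous wait)")
```

Run against this log, the waits start at 60 seconds and each one is roughly 1.6 times the previous, which would put the wait after the 07:40 attempt at a little over 3 hours.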

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 8:09 am
by Whittle
What happens for me is it tries to upload to 155.247.164.213:

Code:

07:59:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:41 gen:0 core:0x22 unit:0x000000079bf7a4d55e6d770eee5f6026
07:59:09:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:09:WU01:FS01:Connecting to 155.247.164.213:8080
07:59:11:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
In a Wireshark capture I see the following response from 155.247.164.213, before it closes the connection with a TCP reset:

Code:

HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE
When I follow the TCP stream for 155.247.164.213 in Wireshark it looks to be sending the results and then the server cuts it off part way through:

Code:

	   [...]
		<Position x="1.8908989287272897" y="-2.547739371460885" z="5.77243400391751"/>
		<Position x="1.8994014240536208" y="-2.618247308183257" z="5.674706601713439"/>
		<Position x="2.0814601488350846" y="-2.46088035887HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE
Content-Type: text/html
Connection: close

<html><head><title>413 HTTP_REQUEST_ENTITY_TOO_LARGE</title></head><body><h1>413 HTTP_REQUEST_ENTITY_TOO_LARGE</h1></body></html>
It then immediately retries the upload, this time to 155.247.164.214, but I just get a TCP reset in Wireshark, which shows as "Transfer failed" in the logs:

Code:

07:59:11:WU01:FS01:Trying to send results to collection server
07:59:11:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:11:WU01:FS01:Connecting to 155.247.164.214:8080
07:59:12:ERROR:WU01:FS01:Exception: Transfer failed
Sometimes I also get that "HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE" error in Wireshark for 155.247.164.214 too.

Hope that's of some help.
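
One thing the capture makes clear is that a 413 is different from an unreachable server: the server accepted the connection and answered. A minimal sketch of reading that status line follows. The function names are hypothetical, and the status line is the one from the capture above.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, numeric code, reason text)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def describe(code):
    """Rough classification of a status code by its class (4xx vs 5xx)."""
    if 400 <= code < 500:
        return "server reached, request rejected"
    if 500 <= code < 600:
        return "server reached, server-side failure"
    return "other"


version, code, reason = parse_status_line(
    "HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE"
)
print(version, code, reason, "->", describe(code))
```

A 4xx answer like this one points at the request itself (here, its size) rather than at network reachability, which fits with the servers still answering pings.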

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 8:15 am
by jonault
jima13 wrote:What really bugs me is the time between tries keeps expanding, so after 12 tries the next try is in 3 hours. Is there a reason for this, or can it be coded down to an hour or less?
If the reason the client can't get a connection is because too many clients are trying to talk to the server, then having the failed clients slow down their attempts to connect eases the burden on the server so they can start getting through. If they didn't slow down, they could wind up effectively DDOSing the server. I doubt that they'll want to change that behavior.
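
To put a rough number on that, the sketch below counts how many retries one client would make in 24 hours with a fixed wait versus a growing one. The 60-second first wait and the growth factor of about 1.618 are assumptions read off jima13's log earlier in the thread, not documented client settings, and the function name is made up for this example.

```python
def attempts_within(window_s, first_delay_s, factor):
    """Count retries that fit in window_s when each wait is `factor` times the previous one."""
    elapsed = 0.0
    delay = float(first_delay_s)
    attempts = 0
    while elapsed + delay <= window_s:
        elapsed += delay
        delay *= factor
        attempts += 1
    return attempts


DAY = 24 * 60 * 60

# factor 1.0 means "retry every 60 seconds forever"
fixed = attempts_within(DAY, 60, 1.0)
# factor ~1.618 is the growth inferred from the log, an assumption
backed_off = attempts_within(DAY, 60, 1.618)

print(f"fixed 60 s wait:  {fixed} attempts per client per day")
print(f"growing wait:     {backed_off} attempts per client per day")
```

With 55 MiB uploads and a large number of clients stuck on the same two servers, the difference between about a thousand attempts and a dozen or so per client per day is the load reduction described above.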

Re: Send Errors - 155.247.164.213 & .214

Posted: Sat Mar 21, 2020 4:04 pm
by vangli
I found just the same as Whittle: analyzing with Wireshark, the transmission ends with HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE. One possibility seems to be that the collector isn't able to receive a chunk of 55 MByte, which is not uncommon in a web server setup. Could reconfiguring it to allow chunks of 64 MByte or so help? The Wireshark analysis shows that a connection is established but then dropped partway through the transmission:

Code:

4769	1213.944078814	192.168.123.21	155.247.164.214	TCP	66	44488 → 8080 [ACK] Seq=15948 Ack=220 Win=30336 Len=0 TSval=102210099 TSecr=3240794470
4770	1213.944189552	155.247.164.214	192.168.123.21	HTTP	66	HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE  (text/html)
Edited:

I have done some further analysis. After connecting to the server, the following packet is sent to the collecting server:

Code:

119	36.152390113	192.168.123.21	155.247.164.213	TCP	173	57632 → 8080 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=107 TSval=1078346373 TSecr=3159245236 [TCP segment of a reassembled PDU]
The payload of this packet is:

Code:

POST http://155.247.164.213/ HTTP/1.0
Content-Length: 57925632
Content-Type: application/octet-stream
As you can see, it declares the size of the transmission: 57925632 bytes. The connection then collapses after a few more packets are exchanged, ending with HTTP_REQUEST_ENTITY_TOO_LARGE. The real transmission never seems to start. I would be happy if this helps.
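
As a quick sanity check on the numbers in this thread, the Content-Length from the capture can be converted to MiB and compared with the size the client logs. This is only a sketch: the header text is copied from the capture above, the helper name is hypothetical, and the 64 MiB figure is just the value floated earlier in this post, not a known server setting.

```python
CAPTURED_REQUEST = """POST http://155.247.164.213/ HTTP/1.0
Content-Length: 57925632
Content-Type: application/octet-stream
"""


def content_length(raw_request):
    """Return the Content-Length value (in bytes) from raw request headers."""
    # Skip the request line, then look for the header by name.
    for line in raw_request.splitlines()[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    raise ValueError("no Content-Length header found")


size_bytes = content_length(CAPTURED_REQUEST)
size_mib = size_bytes / (1024 * 1024)
print(f"{size_bytes} bytes = {size_mib:.2f} MiB")

# Hypothetical limit taken from the suggestion in the post, not from the server.
HYPOTHETICAL_LIMIT_MIB = 64
print("fits under hypothetical limit:", size_mib <= HYPOTHETICAL_LIMIT_MIB)
```

The result, 55.24 MiB, matches the "Uploading 55.24MiB" lines in the client logs, so the header describes the whole work unit result.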