Send Errors - 155.247.164.213 & .214

Moderators: Site Moderators, FAHC Science Team

Klutz
Posts: 7
Joined: Tue Mar 17, 2020 10:35 am

Re: Send Errors - 155.247.164.213 & .214

Post by Klutz »

As a new user, where do I read about the status of the server? A link would be very helpful. I have found https://apps.foldingathome.org/serverstats , but I cannot find any status there (or am unable to understand it).
You're on the right page. Find the entry for 155.247.164.214, then hover your cursor over the column "Has CS" (where it says "Yes" in green). A small popup will appear to display the errors that plague this server. Being unable to connect to 155.247.164.213 is one of them.
roffvald
Posts: 3
Joined: Mon Mar 16, 2020 10:44 am

Re: Send Errors - 155.247.164.213 & .214

Post by roffvald »

Same issue here with .213 and .214

11:53:17:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1380 gen:0 core:0x22 unit:0x000000059bf7a4d55e6d771272f2b9e7
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
11:53:17:WU00:FS01:Connecting to 155.247.164.213:8080
11:53:17:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
11:53:17:WU00:FS01:Trying to send results to collection server
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
11:53:17:WU00:FS01:Connecting to 155.247.164.214:8080
11:53:19:ERROR:WU00:FS01:Exception: Transfer failed
Darth_Peter_dualxeon
Posts: 46
Joined: Fri Mar 20, 2020 3:13 am
Hardware configuration: EVGA SR-2 motherboard
2x Xeon x5670 CPU
64 GB ECC DDR3
Nvidia RTX 2070

Re: Send Errors - 155.247.164.213 & .214

Post by Darth_Peter_dualxeon »

At https://apps.foldingathome.org/serverstats , what does the "CS" in "Has CS" stand for? Why do "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90 TiB of space (is that the free space on the disks where the results get uploaded?)
And I can ping them. So why can't I upload my results?

Sometimes I saw that there were jobs on the server but then did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?
Qwarkman
Posts: 3
Joined: Wed Mar 18, 2020 12:54 pm
Hardware configuration: System 1: AMD Ryzen 5 3600, 32GB DDR4 3600, 2x nVidia GTX 980Ti 6GB
System 2: Intel i7 2600k, 16GB DDR3, nVidia GTX 970

Re: Send Errors - 155.247.164.213 & .214

Post by Qwarkman »

I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them; letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect results.
I'm considering shutting down my GPU folding, because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.
vangli
Posts: 12
Joined: Thu Mar 19, 2020 10:35 am

Re: Send Errors - 155.247.164.213 & .214

Post by vangli »

I saw that they tried to restart the 214 server twice, with no effect. Not knowing the F@h software: does the server communicate through an encrypted channel with certificates, and has this changed? It seems that some more servers are failing to connect to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It's a waste of time and computing power.
Regards
Bent Vangli, Oslo, Norway
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Send Errors - 155.247.164.213 & .214

Post by Jesse_V »

Darth_Peter_dualxeon wrote:At https://apps.foldingathome.org/serverstats , what does the "CS" in "Has CS" stand for? Why do "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90 TiB of space (is that the free space on the disks where the results get uploaded?)
And I can ping them. So why can't I upload my results?

Sometimes I saw that there were jobs on the server but then did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?
CS means Collection Server. It's where the workunits go after they are completed. New projects go to "beta" first because they might be unstable or cause errors; the people opting into beta have stable hardware and watch their logs a little more closely. That way the team can fix any issues before pushing projects to the larger "public" group.

I'm not sure why you can't upload the results, but they are currently working on the servers to keep up with the overwhelming demand, so hopefully that gets fixed soon.
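The log excerpts earlier in this thread show the order the client follows: it tries the work server (.213) first, and only after "Failed to send results to work server" does it log "Trying to send results to collection server" and try .214. A minimal sketch of that fallback, assuming a hypothetical `upload(server)` callable that returns True on success; this illustrates the order seen in the logs and is not the client's actual code.

```python
from typing import Callable, List, Optional


def send_results(upload: Callable[[str], bool],
                 work_server: str,
                 collection_servers: List[str]) -> Optional[str]:
    """Try the work server first, then each collection server in turn.

    Returns the address that accepted the upload, or None if every attempt
    failed (the real client would then schedule a later retry).
    """
    for server in [work_server, *collection_servers]:
        if upload(server):
            return server
    return None


# Hypothetical stand-in for the real upload: every server refuses,
# mimicking the "Transfer failed" lines in the posted logs.
def always_fails(server: str) -> bool:
    print(f"Connecting to {server}: Transfer failed")
    return False


print(send_results(always_fails, "155.247.164.213:8080", ["155.247.164.214:8080"]))
```

When both servers refuse, the function returns None, which corresponds to the WU staying in the queue in SEND state until the next retry.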
Qwarkman wrote:I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them; letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect results.
I'm considering shutting down my GPU folding, because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.
I get it. The WUs should upload in time once they fix some of the errors with the server. There's just so much demand at the moment. I'd recommend just keeping everything running as I expect it to clear up soon.
vangli wrote:I saw that they tried to restart the 214 server twice, with no effect. Not knowing the F@h software: does the server communicate through an encrypted channel with certificates, and has this changed? It seems that some more servers are failing to connect to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It's a waste of time and computing power.
The clients talk to the server over HTTP. There are hash signatures and sanity checks to ensure that the workunits are validated. Technically, it's an HTTP request to a raw IP address. The servers may just be overwhelmed at the moment. In 10 years I've never seen this many people join the network at once, and it's putting substantial load on the servers.
F@h is now the top computing platform on the planet, and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
Joe_H
Site Admin
Posts: 7927
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Send Errors - 155.247.164.213 & .214

Post by Joe_H »

Klutz wrote:
Moderators here have been in contact with the people running the server, more than once. They are aware and working on it among the other issues.
It would be helpful to get an update from someone who actually has oversight over these two servers. The server status page has clearly stated the exact nature of this error for a number of days, yet we've seen no indication that any attempt has been made to rectify it. This is a waste of valuable resources and goodwill.
Some of the folding team supporting the project servers are posting on Twitter and Facebook.

As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that? About the only thing I can see is that they have greatly reduced the rate of assignments from those two servers while they try to sort this out.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Klutz
Posts: 7
Joined: Tue Mar 17, 2020 10:35 am

Re: Send Errors - 155.247.164.213 & .214

Post by Klutz »

As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that?
I already answered that above: viewtopic.php?f=18&t=32492&start=60#p315744
TitanXp
Posts: 15
Joined: Tue Apr 11, 2017 11:44 pm

Re: Send Errors - 155.247.164.213 & .214

Post by TitanXp »

Still down? These 2 servers are giving me the same error as OP.
Project 11777
sixty4bitdiablo
Posts: 1
Joined: Sat Mar 21, 2020 5:55 am

Re: Send Errors - 155.247.164.213 & .214

Post by sixty4bitdiablo »

Same issues here.

05:50:39:WU01:FS01:Trying to send results to collection server
05:50:39:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
05:50:39:WU01:FS01:Connecting to 155.247.164.214:8080
05:50:39:ERROR:WU01:FS01:Exception: Transfer failed

Any ideas? It's been like this for a day or so now.
vangli
Posts: 12
Joined: Thu Mar 19, 2020 10:35 am

Re: Send Errors - 155.247.164.213 & .214

Post by vangli »

In fact, several days. I have 3 WUs waiting to upload, the first one with 80 (yes, eighty) retries. I accept that technical problems occur. However, the total lack of information from the administrators, and the fact that new WUs that cannot be collected are still being sent, is bad.
Regards
Bent Vangli, Oslo, Norway
jima13
Posts: 29
Joined: Fri Dec 07, 2007 5:27 am
Location: La Grande, OR

Re: Send Errors - 155.247.164.213 & .214

Post by jima13 »

At this point I'd just be piling on, but what the heck:

 *********************** Log Started 2020-03-21T02:17:20Z ***********************
02:17:22:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:17:23:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:17:23:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:50:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:50:WU03:FS02:Trying to send results to collection server
02:19:50:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:50:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:51:ERROR:WU03:FS02:Exception: Transfer failed
02:19:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:19:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:19:52:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:53:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:53:WU03:FS02:Trying to send results to collection server
02:19:53:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:53:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:55:ERROR:WU03:FS02:Exception: Transfer failed
02:20:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:20:51:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:20:51:WU03:FS02:Connecting to 155.247.164.213:8080
02:20:52:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:20:52:WU03:FS02:Trying to send results to collection server
02:20:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:20:52:WU03:FS02:Connecting to 155.247.164.214:8080
02:20:53:ERROR:WU03:FS02:Exception: Transfer failed
02:22:28:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:22:29:WU03:FS02:Connecting to 155.247.164.213:8080
02:22:29:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:22:29:WU03:FS02:Trying to send results to collection server
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:22:29:WU03:FS02:Connecting to 155.247.164.214:8080
02:22:29:ERROR:WU03:FS02:Exception: Transfer failed
02:25:06:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:25:06:WU03:FS02:Connecting to 155.247.164.213:8080
02:25:06:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:25:06:WU03:FS02:Trying to send results to collection server
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:25:06:WU03:FS02:Connecting to 155.247.164.214:8080
02:25:06:ERROR:WU03:FS02:Exception: Transfer failed
02:29:20:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:29:20:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:29:20:WU03:FS02:Connecting to 155.247.164.213:8080
02:29:21:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:29:21:WU03:FS02:Trying to send results to collection server
02:29:21:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:29:21:WU03:FS02:Connecting to 155.247.164.214:8080
02:29:21:ERROR:WU03:FS02:Exception: Transfer failed
02:36:11:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:36:11:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:36:11:WU03:FS02:Connecting to 155.247.164.213:8080
02:36:12:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:36:12:WU03:FS02:Trying to send results to collection server
02:36:12:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:36:12:WU03:FS02:Connecting to 155.247.164.214:8080
02:36:12:ERROR:WU03:FS02:Exception: Transfer failed
02:47:17:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:47:17:WU03:FS02:Connecting to 155.247.164.213:8080
02:47:17:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:47:17:WU03:FS02:Trying to send results to collection server
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:47:17:WU03:FS02:Connecting to 155.247.164.214:8080
02:47:18:ERROR:WU03:FS02:Exception: Transfer failed
03:05:14:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:05:14:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:05:14:WU03:FS02:Connecting to 155.247.164.213:8080
03:05:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:05:15:WU03:FS02:Trying to send results to collection server
03:05:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:05:15:WU03:FS02:Connecting to 155.247.164.214:8080
03:05:15:ERROR:WU03:FS02:Exception: Transfer failed
03:34:16:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:34:16:WU03:FS02:Connecting to 155.247.164.213:8080
03:34:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:34:16:WU03:FS02:Trying to send results to collection server
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:34:16:WU03:FS02:Connecting to 155.247.164.214:8080
03:34:17:ERROR:WU03:FS02:Exception: Transfer failed
04:21:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
04:21:15:WU03:FS02:Connecting to 155.247.164.213:8080
04:21:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
04:21:15:WU03:FS02:Trying to send results to collection server
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
04:21:15:WU03:FS02:Connecting to 155.247.164.214:8080
04:21:16:ERROR:WU03:FS02:Exception: Transfer failed
05:37:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
05:37:16:WU03:FS02:Connecting to 155.247.164.213:8080
05:37:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
05:37:16:WU03:FS02:Trying to send results to collection server
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
05:37:16:WU03:FS02:Connecting to 155.247.164.214:8080
05:37:16:ERROR:WU03:FS02:Exception: Transfer failed
07:40:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
07:40:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
07:40:15:WU03:FS02:Connecting to 155.247.164.213:8080
07:40:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
07:40:16:WU03:FS02:Trying to send results to collection server
07:40:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
07:40:16:WU03:FS02:Connecting to 155.247.164.214:8080
07:40:17:ERROR:WU03:FS02:Exception: Transfer failed
What really bugs me is that the time between tries keeps growing, so after 12 tries the next try is 3 hours away. Is there a reason for this, or can it be coded down to an hour or less?
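The growth jima13 describes can be read straight off the log above. This small sketch copies the start time of each "Sending unit results" attempt from 02:19:51 onward, then computes the gap between consecutive attempts and the ratio between consecutive gaps. The timestamps are the only input; nothing here comes from the client's code.

```python
from datetime import datetime

# Start time of each "Sending unit results" attempt, copied from the log above.
attempts = ["02:19:51", "02:20:51", "02:22:28", "02:25:06", "02:29:20", "02:36:11",
            "02:47:17", "03:05:14", "03:34:16", "04:21:15", "05:37:15", "07:40:15"]

times = [datetime.strptime(t, "%H:%M:%S") for t in attempts]

# Seconds between consecutive attempts (all on the same day, so no wraparound).
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# How much each gap grows relative to the previous one.
ratios = [b / a for a, b in zip(gaps, gaps[1:])]

print(gaps)
print([round(r, 2) for r in ratios])
```

The gaps run from 60 s up to 7380 s (a little over two hours), and each one is roughly 1.6 times the one before, i.e. the retry interval grows geometrically rather than by a fixed step.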
Whittle
Posts: 5
Joined: Sun Mar 15, 2020 3:33 pm
Hardware configuration: Intel i7-4790K @ stock 4.00 GHz, 24GB RAM, NVIDIA GeForce GTX 1080

Re: Send Errors - 155.247.164.213 & .214

Post by Whittle »

What happens for me is it tries to upload to 155.247.164.213:

07:59:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:41 gen:0 core:0x22 unit:0x000000079bf7a4d55e6d770eee5f6026
07:59:09:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:09:WU01:FS01:Connecting to 155.247.164.213:8080
07:59:11:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
In a Wireshark capture I see the following response from 155.247.164.213, before it closes the connection with a TCP reset:

HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE
When I follow the TCP stream for 155.247.164.213 in Wireshark, it appears to be sending the results, and then the server cuts it off partway through:

	   [...]
		<Position x="1.8908989287272897" y="-2.547739371460885" z="5.77243400391751"/>
		<Position x="1.8994014240536208" y="-2.618247308183257" z="5.674706601713439"/>
		<Position x="2.0814601488350846" y="-2.46088035887HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE
Content-Type: text/html
Connection: close

<html><head><title>413 HTTP_REQUEST_ENTITY_TOO_LARGE</title></head><body><h1>413 HTTP_REQUEST_ENTITY_TOO_LARGE</h1></body></html>
It then immediately retries the upload, this time to 155.247.164.214, but I just get a TCP reset in Wireshark, which shows up as "Transfer failed" in the logs:

07:59:11:WU01:FS01:Trying to send results to collection server
07:59:11:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:11:WU01:FS01:Connecting to 155.247.164.214:8080
07:59:12:ERROR:WU01:FS01:Exception: Transfer failed
Sometimes I see that "HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE" error in Wireshark for 155.247.164.214 as well.

Hope that's of some help.
Last edited by Whittle on Sat Mar 21, 2020 8:24 am, edited 1 time in total.
jonault
Posts: 215
Joined: Fri Dec 14, 2007 9:53 pm

Re: Send Errors - 155.247.164.213 & .214

Post by jonault »

jima13 wrote:What really bugs me is that the time between tries keeps growing, so after 12 tries the next try is 3 hours away. Is there a reason for this, or can it be coded down to an hour or less?
If the reason the client can't get a connection is that too many clients are trying to talk to the server, then having the failed clients slow down their connection attempts eases the burden on the server so they can start getting through. If they didn't slow down, they could wind up effectively DDoSing the server. I doubt that they'll want to change that behavior.
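jonault's point can be made concrete with a minimal exponential-backoff sketch. The 60 s base delay and the 1.618 growth factor are assumptions fitted to the gaps between attempts in jima13's log (60 s, 97 s, 158 s, ... 7380 s), not values read from the client's source; the optional `cap` stands in for the hypothetical "an hour or less" setting jima13 asked about.

```python
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 60.0, factor: float = 1.618,
                   cap: Optional[float] = None) -> List[float]:
    """Delays (seconds) before each retry under exponential backoff.

    base and factor are assumptions fitted to jima13's log, not the client's
    real parameters; cap, if given, is a ceiling on any single delay.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay if cap is None else min(delay, cap))
        delay *= factor  # each failure multiplies the wait by the same factor
    return delays


# Uncapped: the 11th delay is about two hours, close to the gap jima13 saw.
print([round(d) for d in backoff_delays(11)])
# Capped at one hour: delays stop growing once they reach 3600 s.
print([round(d) for d in backoff_delays(11, cap=3600)])
```

Capping the delay bounds the worst-case wait, but every stuck client then retries more often, which is exactly the extra load on an overloaded server that jonault describes.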
vangli
Posts: 12
Joined: Thu Mar 19, 2020 10:35 am

Re: Send Errors - 155.247.164.213 & .214

Post by vangli »

I found the same as Whittle: analyzing with Wireshark, the transmission ends with HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE. One possibility is that the collector isn't able to receive a 55 MB chunk, which is not uncommon in a web server setup. Could reconfiguring it to allow chunks of 64 MB or so help? The Wireshark analysis shows that a connection is established but then dropped partway through the transmission:

4769	1213.944078814	192.168.123.21	155.247.164.214	TCP	66	44488 → 8080 [ACK] Seq=15948 Ack=220 Win=30336 Len=0 TSval=102210099 TSecr=3240794470
4770	1213.944189552	155.247.164.214	192.168.123.21	HTTP	66	HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE  (text/html)
Edited:

I have done some further analysis. After connecting to the server, the following packet is sent to the collecting server:

119	36.152390113	192.168.123.21	155.247.164.213	TCP	173	57632 → 8080 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=107 TSval=1078346373 TSecr=3159245236 [TCP segment of a reassembled PDU]
The payload of this packet is:

POST http://155.247.164.213/ HTTP/1.0
Content-Length: 57925632
Content-Type: application/octet-stream
As you can see, it states the size of the transmission: 57925632 bytes. After this, the connection collapses after a few more packets are exchanged, ending with HTTP_REQUEST_ENTITY_TOO_LARGE. The real transmission never seems to start. I'd be happy if this helps.
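The numbers in vangli's capture line up with the client log: 57925632 bytes / 1048576 ≈ 55.24, the "Uploading 55.24MiB" the client reports. The sketch below reads Content-Length from the captured header and compares it with a body-size limit. The 50 MiB and 64 MiB limits are hypothetical values for illustration; the servers' real limit is not known from this thread, and CRLF line endings in the header are assumed.

```python
MIB = 1024 * 1024

# Request header as captured by vangli (CRLF line endings assumed).
RAW_HEADER = (
    "POST http://155.247.164.213/ HTTP/1.0\r\n"
    "Content-Length: 57925632\r\n"
    "Content-Type: application/octet-stream\r\n"
)


def content_length(raw_header: str) -> int:
    """Return the Content-Length value from a raw HTTP request header."""
    for line in raw_header.splitlines()[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    raise ValueError("no Content-Length header")


def fits_limit(n_bytes: int, limit_mib: int) -> bool:
    """True if a request body of n_bytes is within a limit of limit_mib MiB."""
    return n_bytes <= limit_mib * MIB


size = content_length(RAW_HEADER)
print(size, f"{size / MIB:.2f} MiB")  # 57925632 55.24 MiB
print(fits_limit(size, 50), fits_limit(size, 64))  # False True
```

Under these assumptions, a server capped at 50 MiB would refuse this upload with a 413 regardless of how often the client retries, while a 64 MiB limit, as vangli suggests, would leave headroom for it.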
Last edited by vangli on Sat Mar 21, 2020 7:30 pm, edited 1 time in total.
Regards
Bent Vangli, Oslo, Norway