Work Server 155.247.166.219 is down
Posted: Thu Dec 06, 2018 1:39 am
by stickman
According to the server stats page, 155.247.166.219, vav3.ocis.temple.edu, has been down since December 1st.
I can't reach it via the browser or ping it from several networks. The client reports the errors below:
01:40:46:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
01:41:07:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
01:41:28:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
I looked at the troubleshooting page, but I didn't see a way to force my client to upload to a different server. Am I missing something?
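For anyone who wants to repeat the reachability check described above without a browser or ping, here is a minimal sketch. It only tests whether a TCP connection opens on the ports the client log mentions (8080 and 80); it says nothing about whether the server would accept an upload. The function names `can_connect` and `check_all` are made up for this example and are not part of the FAH client.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, and unreachable hosts.
        return False


def check_all(targets, timeout: float = 5.0):
    """Return a dict mapping each (host, port) pair to True/False reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Calling `check_all([("155.247.166.219", 8080), ("155.247.166.219", 80)])` would probe the two ports the client tried in the log above; a `False` for both would be consistent with the timeouts the client reports.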
Re: Work Server 155.247.166.219 is down
Posted: Thu Dec 06, 2018 11:01 am
by bollix47
Thank you for your report.
When a WS is down and cannot accept returns, the system should switch to its CS, which in this case is 155.247.166.220. Did the WU not return to that server, or is it still attempting to return the WU to the WS (.219)? If it did send it to the CS, the system is working as it should.
I have reported the outage, but one of the people @ Temple had already let some of us know a few days ago that the server was down, so perhaps, because of your report, we'll get an update.
Re: Work Server 155.247.166.219 is down
Posted: Thu Dec 06, 2018 2:24 pm
by stickman
I turned up my log verbosity and let the system run again overnight. It looks like it is attempting to connect to the collection server at 155.247.166.220, but is failing there as well.
05:23:57:WU00:FS00:Connecting to 155.247.166.219:80
05:24:18:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
05:52:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14123 run:37 clone:21 gen:11 core:0xa7 unit:0x0000000d0002894c5bc4bfdf229c40b7
05:52:38:WU00:FS00:Uploading 129.30MiB to 155.247.166.220
05:52:38:WU00:FS00:Connecting to 155.247.166.220:8080
05:52:38:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
05:52:38:WU00:FS00:Trying to send results to collection server
05:52:38:WU00:FS00:Uploading 129.30MiB to 155.247.166.219
05:52:38:WU00:FS00:Connecting to 155.247.166.219:8080
05:52:59:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
05:52:59:WU00:FS00:Connecting to 155.247.166.219:80
05:53:20:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
06:39:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14123 run:37 clone:21 gen:11 core:0xa7 unit:0x0000000d0002894c5bc4bfdf229c40b7
06:39:37:WU00:FS00:Uploading 129.30MiB to 155.247.166.220
06:39:37:WU00:FS00:Connecting to 155.247.166.220:8080
06:39:37:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
06:39:37:WU00:FS00:Trying to send results to collection server
06:39:37:WU00:FS00:Uploading 129.30MiB to 155.247.166.219
06:39:37:WU00:FS00:Connecting to 155.247.166.219:8080
06:39:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:39:58:WU00:FS00:Connecting to 155.247.166.219:80
06:40:19:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2018-12-06 *******************************
07:55:38:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14123 run:37 clone:21 gen:11 core:0xa7 unit:0x0000000d0002894c5bc4bfdf229c40b7
07:55:38:WU00:FS00:Uploading 129.30MiB to 155.247.166.220
07:55:38:WU00:FS00:Connecting to 155.247.166.220:8080
07:55:38:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
07:55:38:WU00:FS00:Trying to send results to collection server
07:55:38:WU00:FS00:Uploading 129.30MiB to 155.247.166.219
07:55:38:WU00:FS00:Connecting to 155.247.166.219:8080
07:55:59:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:55:59:WU00:FS00:Connecting to 155.247.166.219:80
07:56:20:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
09:58:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14123 run:37 clone:21 gen:11 core:0xa7 unit:0x0000000d0002894c5bc4bfdf229c40b7
09:58:37:WU00:FS00:Uploading 129.30MiB to 155.247.166.220
09:58:37:WU00:FS00:Connecting to 155.247.166.220:8080
09:58:37:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
09:58:37:WU00:FS00:Trying to send results to collection server
09:58:37:WU00:FS00:Uploading 129.30MiB to 155.247.166.219
09:58:37:WU00:FS00:Connecting to 155.247.166.219:8080
09:58:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
09:58:58:WU00:FS00:Connecting to 155.247.166.219:80
09:59:20:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
13:17:37:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14123 run:37 clone:21 gen:11 core:0xa7 unit:0x0000000d0002894c5bc4bfdf229c40b7
13:17:37:WU00:FS00:Uploading 129.30MiB to 155.247.166.220
13:17:37:WU00:FS00:Connecting to 155.247.166.220:8080
13:17:38:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
13:17:38:WU00:FS00:Trying to send results to collection server
13:17:38:WU00:FS00:Uploading 129.30MiB to 155.247.166.219
13:17:38:WU00:FS00:Connecting to 155.247.166.219:8080
13:17:59:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:17:59:WU00:FS00:Connecting to 155.247.166.219:80
13:18:20:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
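A log this repetitive is easier to read as a tally. The sketch below counts connection attempts and hard connection failures per address and port. The line layout (`HH:MM:SS:[LEVEL:]WUnn:FSnn:message`) is inferred only from the lines pasted in this thread, so treat the patterns as assumptions rather than a specification of the client's log format; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Line layout inferred from the log excerpts in this thread (an assumption):
#   HH:MM:SS:[WARNING:|ERROR:]WUnn:FSnn:message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"WU(?P<wu>\d+):FS(?P<fs>\d+):"
    r"(?P<msg>.*)$"
)
CONNECT_RE = re.compile(r"Connecting to (?P<host>[\d.]+):(?P<port>\d+)")
FAILED_RE = re.compile(r"Failed to connect to (?P<host>[\d.]+):(?P<port>\d+)")


def summarize(lines):
    """Return (attempts, failures) Counters keyed by (host, port)."""
    attempts = Counter()
    failures = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips non-log lines such as the date separator
        msg = m.group("msg")
        c = CONNECT_RE.search(msg)
        if c:
            attempts[(c["host"], int(c["port"]))] += 1
        f = FAILED_RE.search(msg)
        if f and m.group("level") == "ERROR":
            failures[(f["host"], int(f["port"]))] += 1
    return attempts, failures
```

Run over the excerpt above, this would show attempts against both .220:8080 and .219, while the "Failed to connect" errors all name .219:80. Note that the immediate "Transfer failed" warning after connecting to .220 is not counted as a connection failure here, since the connection itself appears to open.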
Re: Work Server 155.247.166.219 is down
Posted: Thu Dec 06, 2018 3:46 pm
by bollix47
Okay, the problem is that you were unable to return your results to .220 which in your case is the WS. That WS uses .219 as a CS. The real question is "Why can't your WU be returned to .220?" I can open 155.247.166.220 in my browser and it does display the correct info for that server ... could you try to connect via your browser to see if you get the logo page?
Re: Work Server 155.247.166.219 is down
Posted: Thu Dec 06, 2018 7:39 pm
by stickman
I am also able to get to the .220 address via the browser. I ran a packet capture, and I am getting a 400 Bad Request response from the .220 server. Prior to that, the capture looks normal.
1215 32.602211 155.247.166.220 192.168.1.201 HTTP 258 HTTP/1.1 400 Bad Request (text/html)
Re: Work Server 155.247.166.219 is down
Posted: Fri Dec 07, 2018 8:29 pm
by vvoelz
Here's an update: Our FAH work server (vav3, 155.247.166.219) is down with a bad RAID controller, and will likely stay down for the next few weeks while we order a new one and replace it. As noted by others, this server acted as a collection server for our other server (vav4, 155.247.166.220). We will get another server to collect for vav4, so this error can be avoided in the future!
Vince
Re: Work Server 155.247.166.219 is down
Posted: Wed Dec 12, 2018 5:10 am
by stickman
Thank you for the update. I will check back in a few weeks if I'm still unable to upload the WU.