171.67.108.24 Down

Posted: Mon May 18, 2009 2:27 pm
by Flathead74
171.67.108.24
Status - Down

[13:40:00] + Attempting to send results [May 18 13:40:00 UTC]
[13:40:00] - Reading file work/wuresults_00.dat from core
[13:40:00]   (Read 52710315 bytes from disk)
[13:40:00] Connecting to http://171.67.108.24:8080/
[13:43:09] - Couldn't send HTTP request to server
[13:43:09] + Could not connect to Work Server (results)
[13:43:09]     (171.67.108.24:8080)
[13:43:09] + Retrying using alternative port
[13:43:09] Connecting to http://171.67.108.24:80/
[13:46:18] - Couldn't send HTTP request to server
[13:46:18] + Could not connect to Work Server (results)
[13:46:18]     (171.67.108.24:80)
[13:46:18] - Error: Could not transmit unit 00 (completed May 18) to work server.
Thank you.

Re: 171.67.108.24 Down

Posted: Mon May 18, 2009 2:36 pm
by bruce
It has only been down a couple of hours, and it's still early in the morning at Stanford. Normally I'd give them a bit longer than that to take care of it.

Re: 171.67.108.24 Down

Posted: Mon May 18, 2009 3:26 pm
by kasson
The entire machine seems to be down. I've emailed our networking people, but they're working on another project this morning (UPS maintenance). We'll get it up as soon as we can...

Re: 171.67.108.24 Down

Posted: Mon May 18, 2009 4:51 pm
by Hazzard
Yes, it's down. I can't get new work units.

Re: 171.67.108.24 Down

Posted: Mon May 18, 2009 7:50 pm
by bruce
Hazzard wrote: "Yes, it's down. I can't get new work units."
The portion of FAHlog that you posted does not show any attempts to get new WUs, it only shows that you were unable to upload the results immediately. The client is designed to deal with the condition where you are unable to upload by holding that WU in queue for later upload and downloading a new WU from a different server.

Re: 171.67.108.24 Not Assigning WUs - Problem or P.M.?

Posted: Tue May 26, 2009 5:02 am
by 314159
Rather than starting a new topic on this server, I decided to use an existing one.

Question:

Should I be concerned that this server has not assigned new WUs to 7 fast Linux quads that completed between 4:02 PM and 8:57 PM Stanford time? (Make that 8, with the last at 9:50 PM PDT.)

It is pingable and IS accepting and acknowledging completed WUs properly.

It just isn't assigning work, so the quads are being served by xx.56. That is OK for now, but it may become a problem for me over the next 3 1/2 hours, when 12 more quads complete, if xx.56 goes down or starts returning 503s.

The only thing that sticks out on the serverstats page is that darned yellow zero in the "S" column, and I now believe I know what that means. :ewink:

Thanks!

Re: 171.67.108.24 Down

Posted: Tue May 26, 2009 5:56 am
by 314159
As someone used to say on the old SNL show:

"Never Mind" :ewink:

It's assigning again, and I'm sorry to have bothered you folks needlessly! :!:

Re: 171.67.108.24 Down

Posted: Tue May 26, 2009 6:11 am
by kasson
No worries--thanks for the heads-up anyway.

Re: 171.67.108.24 Down

Posted: Tue Jun 30, 2009 6:07 am
by 314159
This server is in REJECT mode again.

Could the binary be restarted once again, please?

Thank you!

Question: I had one WU sent to the CS (collection server) yesterday and another just a few minutes ago (others are queued because the CS can't handle the load).
Are we certain that those handled by the CS (xx.25) are being properly credited? I seem to be missing some production.

Re: 171.67.108.24 Down

Posted: Tue Jun 30, 2009 8:04 am
by kasson
The server should be back up and running now. Sorry for the downtime. Units that are sent to the CS aren't credited until they are received and processed by the work server, so one wouldn't expect them to register while the server is down. Drop me a PM if they don't show up today.

Re: 171.67.108.24 Down

Posted: Tue Jun 30, 2009 8:48 am
by 314159
Thank you very much!

I have to congratulate you on the general stability of this server and xx.56 (and the current WEIGHT parameters). :ewink:
I have no complaints since my machines continue to receive the class of WUs that "utilize them properly".

The recent stats run indicates that the three sent to the CS between about 11 PM and 11:30 PM (your time) have been credited.
Given that, I expect that the remaining ones sitting on the CS will show up on the next stats update given their times of submission.
I have never had problems with queued WUs and expect those to "arrive at your home" via autosend while you are asleep.

You do sleep, don't you? :ewink:

Since there was only one in question yesterday (Monday), I don't plan on bothering you with it, but I appreciate your gracious offer.

BTW, "DL"=0 on this server so enjoy moving some data today. 8-)

Thanks again!

Re: 171.67.108.24 Down

Posted: Tue Jun 30, 2009 9:12 am
by kasson
Yes, the DL=0 was the cause of the downtime. I started shifting some data around, hopped on a plane to Europe, and got off this morning to find out that the transfer filled up one of the RAID arrays. Everything looks stable now, at least for the time being.

Re: 171.67.108.24 Down

Posted: Mon Jul 27, 2009 12:45 am
by 314159
SERVER IS DOWN... 5:45 PM PDT, 26 July.

Re: 171.67.108.24 Down

Posted: Mon Jul 27, 2009 1:41 am
by kasson
Yes, we've put in a request to our support staff (this isn't a machine I have physical access to).

Re: 171.67.108.24 Down

Posted: Mon Jul 27, 2009 5:52 am
by susato
Thanks - much appreciated.