171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:29 pm
by DrSpalding
I have three classic clients on two different networks that can neither upload finished WUs nor connect to the server to pick up new ones. Server status says that it is up, but these three clients have been in limbo for about 90-120 minutes as of 11:30 PDT.

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:44 pm
by Hyperlife
Same here. Could someone take a look at the server?

18:41:15:Unit 02: Uploading 438.49KiB
18:41:15:Connecting to 171.64.65.62:8080
18:41:15:Sending unit results: id:01 state:SEND project:6508 run:1 clone:215 gen:93 core:0x78 unit:0x0356bdba4d8916a1005d00d70001196c
18:41:15:WARNING: Exception: Failed to send results to work server: Upload failed

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:46 pm
by 7im
Could someone take a look at the Server Status page? ;)

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:50 pm
by Hyperlife
7im wrote: Could someone take a look at the Server Status page? ;)
I guess you didn't read DrSpalding's post carefully enough:
DrSpalding wrote: Server status says that it is up
:roll:

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:54 pm
by 7im
Look again. CPU load is at 4+. ;)
CPULOAD is the average number of processes running or waiting to run over the past 1, 5, and 15 minutes. When this number gets above 2-3, the server is probably heavily loaded.
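For readers wondering what that load number actually measures, here is a minimal Python sketch of a Unix-style load average, assuming Linux-like behavior: every 5 seconds the count of runnable processes is folded into three exponentially damped moving averages with 1-, 5-, and 15-minute time constants (the kernel does this in fixed-point arithmetic; floating point is used here for readability). The `heavily_loaded` helper, its threshold of 3, and the per-core division are hypothetical, added only to show why the same raw number can mean different things on servers with different core counts.

```python
import math
import sys

SAMPLE_INTERVAL_S = 5          # assumed Linux-style sampling period (seconds)
WINDOWS_S = (60, 300, 900)     # 1-, 5- and 15-minute time constants

# Per-sample decay factor for each window: exp(-5/60), exp(-5/300), exp(-5/900)
DECAY = tuple(math.exp(-SAMPLE_INTERVAL_S / w) for w in WINDOWS_S)


def update_load(loads, runnable):
    """Fold one sample (number of runnable processes) into the three averages."""
    return tuple(load * d + runnable * (1.0 - d) for load, d in zip(loads, DECAY))


def heavily_loaded(load, cores=1, threshold=3.0):
    """Hypothetical rule of thumb: flag a per-core load above `threshold`."""
    return load / cores > threshold


if __name__ == "__main__":
    loads = (0.0, 0.0, 0.0)
    # Simulate ten minutes (120 samples) with 4 processes constantly runnable.
    for _ in range(120):
        loads = update_load(loads, 4)
    print("1/5/15-min load: %.2f %.2f %.2f" % loads)
    print("heavily loaded on 1 core: ", heavily_loaded(loads[0], cores=1))
    print("heavily loaded on 4 cores:", heavily_loaded(loads[0], cores=4))
```

Because the 15-minute average reacts slowly, a server can show a modest long-term figure while its 1-minute figure is already well above the threshold.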

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 6:56 pm
by Hyperlife
7im wrote: Look again. CPU load is at 4+. ;)
So? That's not excessive. For example, 171.67.108.20 has a CPU load of 4.59 right now and is working fine -- I've been able to upload and download WUs to it with no hiccups.

I suspect that 2-3 load warning is rather ancient. I've never had problems getting/sending WUs from servers in that load range. It's not even colored yellow!

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 7:00 pm
by 7im
See the whole picture.

The assignment percentage on 171.67.108.20 is negligible; on 171.64.65.62 it's 41%. 171.64.65.62 is getting almost half of all CPU clients routed to it. Downloads also affect server availability.
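To make the assignment-percentage point concrete, here is a hypothetical Python model of an assignment server that routes each client to a work server with probability proportional to its AS%. The real Folding@home assignment logic is not described in this thread; the server names and weights below are illustrative, and only the 41% figure comes from the post above.

```python
import random
from collections import Counter


def assign(weights, rng):
    """Pick one work server with probability proportional to its AS% weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


def simulate(weights, clients, seed=0):
    """Route `clients` assignment requests and count how many land on each server."""
    rng = random.Random(seed)
    return Counter(assign(weights, rng) for _ in range(clients))


if __name__ == "__main__":
    # Hypothetical AS% table: one server at 41%, the rest split among others.
    as_percent = {"171.64.65.62": 41, "server-b": 30, "server-c": 29}
    counts = simulate(as_percent, clients=10_000)
    for server, n in counts.most_common():
        print(f"{server}: {n / 10_000:.1%} of new clients")
```

In this model every client routed to a server also downloads a WU from it, so a high AS% adds connections that compete with clients trying to return finished work.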

However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 7:20 pm
by Hyperlife
7im wrote: However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.
Glad you agree that something should be done on the PG end. That wasn't so hard to admit, now was it?

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 7:23 pm
by DrSpalding
The server is now (well, as of 11:35 PDT) receiving 60% of WU assignments. It is fairly obvious that it is either not handing out or accepting WUs at all, or doing so at a rate far too low for the AS to keep assigning clients to it.

Edit: 12:40 PDT. It looks like many other classic WU servers are down, which may be why this server is trying to field so many requests now. Its NetLoad is high at 96; the only higher one belongs to a PS3 WU server. In addition, one of the AS machines (vsp10v-vz00) is heavily overloaded, with a NetLoad of 150.
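The effect described in this edit, where servers going offline pushes more traffic onto the ones still up, can be illustrated with a hypothetical Python sketch. It assumes the assignment server simply excludes unreachable servers and renormalizes the remaining weights; the server names and weights are made up for illustration and are not taken from the Server Status page.

```python
def effective_share(weights, down):
    """Share of new assignments each reachable server gets once `down` servers are excluded."""
    live = {s: w for s, w in weights.items() if s not in down}
    total = sum(live.values())
    if total == 0:
        raise ValueError("no reachable work servers")
    return {s: w / total for s, w in live.items()}


if __name__ == "__main__":
    weights = {"vspg10c": 41, "classic-b": 30, "classic-c": 29}  # hypothetical
    print(effective_share(weights, down=set()))
    print(effective_share(weights, down={"classic-c"}))
```

Under this assumption a server's share can rise well above its configured AS% without anyone changing its weight.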

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 8:37 pm
by VijayPande
Thanks for the feedback. We're looking into it right now.

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 8:54 pm
by John_Weatherman
I've now managed to upload and download on two clients, so thanks to whoever kicked the server :)

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 9:42 pm
by DrSpalding
Thanks Dr. Pande. Two out of three of my clients that were stuck have uploaded and retrieved new WUs successfully. I'm waiting on the third to wake up and try again on its own schedule.

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 11:14 pm
by VijayPande
That's good to hear. The server is looking loaded but well behaved right now and the load has been working its way down over the last few hours. It's now below the level where we would expect there to be any major problems, although it is a heavy load.

Re: 171.64.65.62 (vspg10c) down?

Posted: Sun Apr 03, 2011 11:17 pm
by 7im
Hyperlife wrote:
7im wrote: However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.
Glad you agree that something should be done on the PG end. That wasn't so hard to admit, now was it?
Well, since you didn't include a smilie, I have to assume you were serious about what you said. So here is the serious response.

Agree something "should" be done? On a normal day, that statement is a sarcastic way of saying I've already called in the cavalry to take a look at the problem. I actually DO stuff around here. :roll: But in this case, PG noticed the problem without my help.

Re: 171.64.65.62 (vspg10c) down?

Posted: Mon Apr 04, 2011 1:18 am
by bruce
I'm currently tracking a problem that appears to affect only V7: a WU that ends with certain types of EUE (early unit end) is uploaded for partial credit by V6 but fails to upload at all in V7. Does that describe what you're seeing?

Please find where processing of the WU in question ended and the FIRST attempt to upload it, and post that part of the log here. I don't have enough data yet to draw a conclusion.
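For anyone unsure which part of the log is being asked for, here is a hypothetical Python helper that finds the first upload attempt in a log file and returns it with some surrounding context. It keys only on the "Uploading" text visible in the log excerpt posted earlier in this thread; the messages that mark the end of processing are not shown in the thread, so the default `before` and `after` context sizes are arbitrary and may need adjusting to capture them.

```python
import sys


def first_upload_excerpt(lines, before=20, after=5):
    """Return the lines around the first upload attempt, or [] if there is none.

    Assumes upload attempts contain the text "Uploading", as in
    "18:41:15:Unit 02: Uploading 438.49KiB".
    """
    for i, line in enumerate(lines):
        if "Uploading" in line:
            start = max(0, i - before)
            return lines[start:i + after + 1]
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python first_upload.py log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log = f.read().splitlines()
    print("\n".join(first_upload_excerpt(log)))
```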

I hadn't yet found a problem with getting new WUs, but there may or may not be a connection.