Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Thu Apr 17, 2014 4:52 pm
by billford
7im wrote:No one wants the research done more quickly than the researcher, but they have real world expectations. They don't sit in front of the server all day long.
I wonder how many cpu-hours of computation were lost while clients waited to time out on a non-responding server… all for the sake of a few minutes taken to change assignment priorities.
In such rhetorical questions lies the essence of resource management.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Thu Apr 17, 2014 5:32 pm
by 7im
billford wrote:7im wrote:No one wants the research done more quickly than the researcher, but they have real world expectations. They don't sit in front of the server all day long.
I wonder how many cpu-hours of computation were lost while clients waited to time out on a non-responding server… all for the sake of a few minutes taken to change assignment priorities.
In such rhetorical questions lies the essence of resource management.
Not as many as if they had made a mistake while changing the Assignment Server logic two times. Some things you just don't mess with unless you have to. Better a few hours than all of them. This has been explained several times by PG over the years. It bears repeating for the newer donors.
FAH runs a big ship with a very small crew, and the ship doesn't turn on a dime. Small course corrections are always better than big turns. They don't want to break all the dishes in the dining galley. And for what, the equivalent of a whale fart? Steady as she goes...
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Thu Apr 17, 2014 6:59 pm
by pdbuzz
It looks like they finally went, and the estimated credit makes me sad!
Approx 620 - 650 points per box. Oof! I sure hope that isn't all. Oh well, at least they're in.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Thu Apr 17, 2014 8:33 pm
by 7im
Sorry to say, but the fine print on the Points FAQ bonus section clearly states that bonuses are never guaranteed. PG makes every effort to make sure bonuses are tracked and awarded, by having collection servers and other redundancies, but sometimes you get what you get.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 5:07 am
by HendricksSA
Thanks for getting the CS to accept the completed work units. I have not seen any update recently for the WS but it was still not handing out work as recently as 2100Z. Server stat page shows it is up but CPU load is 99%. I could not reach it via browser. I know PG is aware and I'm just posting FYI.
Code:
21:00:06:WU00:FS00:Assigned to work server 155.247.166.219
21:00:06:WU00:FS00:Requesting new work unit for slot 00: READY cpu:6 from 155.247.166.219
21:00:06:WU00:FS00:Connecting to 155.247.166.219:8080
21:02:14:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
21:02:14:WU00:FS00:Connecting to 155.247.166.219:80
21:02:15:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: Connection refused
21:18:02:WU00:FS00:Connecting to 171.67.108.200:8080
21:18:03:WU00:FS00:Assigned to work server 155.247.166.219
21:18:03:WU00:FS00:Requesting new work unit for slot 00: READY cpu:6 from 155.247.166.219
21:18:03:WU00:FS00:Connecting to 155.247.166.219:8080
21:22:21:ERROR:WU00:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0
21:47:10:WU00:FS00:Connecting to 171.67.108.200:8080
21:47:11:WU00:FS00:Assigned to work server 155.247.166.219
21:47:11:WU00:FS00:Requesting new work unit for slot 00: READY cpu:6 from 155.247.166.219
21:47:11:WU00:FS00:Connecting to 155.247.166.219:8080
21:51:26:ERROR:WU00:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 8:58 am
by billford
Same here, from ~0920Z (and counting).
I suppose we're stuck with this until Easter is over
edit- it finally picked up a WU about 5 minutes after this post.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 12:41 pm
by Penfold
So is it merely a case now of a backlog of completed WUs being picked up by CS 155.247.166.220 ? Is there an orderly queueing arrangement?
I couldn't care less about points, and there do seem to be eight days before 'Expiration: 2014-04-26T04:20:38Z', so I'm just curious to know.
As for Easter, Bill, you know what the traffic's like at Easter.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 3:40 pm
by davidcoton
Penfold wrote:
As for Easter, Bill, you know what the traffic's like at Easter.
Do you have traffic in Scotland??
David
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 6:19 pm
by Penfold
davidcoton wrote:Do you have traffic in Scotland??
David
Hah! Yes - English tourists this weekend!
But seriously, and as I previously asked,
Penfold wrote:So is it merely a case now of a backlog of completed WUs being picked up by CS 155.247.166.220 ? Is there an orderly queueing arrangement?
I couldn't care less about points, and there do seem to be eight days before 'Expiration: 2014-04-26T04:20:38Z', so I'm just curious to know.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 6:36 pm
by billford
Penfold wrote:So is it merely a case now of a backlog of completed WUs being picked up by CS 155.247.166.220 ? Is there an orderly queueing arrangement?
I couldn't care less about points, and there do seem to be eight days before 'Expiration: 2014-04-26T04:20:38Z', so I'm just curious to know.
There's no "orderly queue" as such: when a client can't upload a WU, it waits for a while and then tries again. If that fails, it waits for a longer while and tries again. It keeps this up, with increasing delays between attempts, until the upload is successful or the expiration time passes, at which point it deletes the WU.
The idea is that a brief problem won't impact QRB very much, whereas after a longer outage a large number of clients would otherwise all attempt to upload more or less simultaneously when the server came back online, overloading it. The increasing delay spreads the load.
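For anyone who prefers to see the retry idea as code, here is a rough Python sketch of "wait, retry, wait longer, retry, give up at expiration". It is not the FAH client's actual code: the names retry_delays and upload_with_backoff are made up for illustration, and the 60 second starting delay, the doubling factor and the 6 hour cap are assumptions, since the real schedule isn't stated in this thread. It uses a simulated clock rather than actually sleeping.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional, Tuple


def retry_delays(base: float = 60.0, factor: float = 2.0,
                 cap: float = 6 * 3600.0) -> Iterator[float]:
    """Yield increasing waits in seconds. All three defaults are assumed values."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)  # grow the wait, but never beyond the cap


def upload_with_backoff(
    try_upload: Callable[[datetime], bool],
    now: datetime,
    expiration: datetime,
    delays: Optional[Iterator[float]] = None,
) -> Tuple[str, int, datetime]:
    """Retry on a simulated clock until the upload works or the WU expires.

    try_upload(now) is a hypothetical callback returning True on success.
    Returns (outcome, attempts, simulated time when the loop ended).
    """
    if delays is None:
        delays = retry_delays()
    attempts = 0
    while now < expiration:
        attempts += 1
        if try_upload(now):
            return "uploaded", attempts, now
        # Failed: advance the simulated clock by the next (longer) delay.
        now += timedelta(seconds=next(delays))
    # Expiration passed; the client would delete the WU at this point.
    return "expired", attempts, now


if __name__ == "__main__":
    start = datetime(2014, 4, 18, 12, 0, tzinfo=timezone.utc)
    expiration = datetime(2014, 4, 26, 4, 20, 38, tzinfo=timezone.utc)
    server_back = start + timedelta(minutes=10)  # assumed outage length
    print(upload_with_backoff(lambda t: t >= server_back, start, expiration))
```

With these assumed numbers, a server that comes back ten minutes into the outage is reached on the fifth attempt, fifteen minutes in. Real clients would also not all start at the same moment, which is part of why the retries end up spread out.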
In FAHControl you can see what it's up to on the Status tab: highlight the Work Queue item showing 100%, then under the progress bar and other information you'll see "Waiting On", "Attempts" and "Next Attempt". The first two are obvious; the last tells you how long before the next attempt.
The same happens when it can't get a WU, you just highlight the appropriate Work Queue item depending on the problem.
I'd have thought that the backlog would have been cleared by now (all mine are long gone), but I can't be sure.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 6:49 pm
by Penfold
OK, thanks Bill.
And … just as I was going to take a butcher's at the info you described … it went!
What did you do?
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 6:58 pm
by billford
Penfold wrote:OK, thanks Bill.
And … just as I was going to take a butcher's at the info you described … it went!
What did you do?
I'd love to take the credit, but even my conscience won't let me.
Sheer luck or Sod's Law, take your pick.
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 10:15 pm
by vvoelz
Hi everybody -- just thought I'd give an update on the situation. The good news is that the collection server (155.247.166.220) continues to be working fine, so hopefully you at least got your WUs accepted. The bad news is that we found out the work server (155.247.166.219) had a bad RAID controller, so we had to shut it down until we can order a replacement. To all the donors, I want to thank you all for your continued efforts -- we really could not do this work without your contributions. Thanks again --Vince
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 10:27 pm
by billford
Thanks for the update Vince- not the best of news
Re: Rejecting from WS 155.247.166.219 [also CS .166.220]
Posted: Fri Apr 18, 2014 11:21 pm
by folding_hoomer
But - better to get bad news than none . . .
Thanks, Vince.