171.67.108.25 / 171.67.108.11
Posted: Sat Jul 11, 2009 2:21 pm
by ArVee
For a few hours now I've had a completed GPU WU alternately attempting to transmit to the above servers and failing, showing a 503 on the work server and a generic fail to connect message on the collection server. The work server shows as Down on the Server Status page, but the Collection Server shows as ok. A client restart at this end hasn't helped. Work continues on new units in the interim, but is there anything I can do in the meantime other than hope?
Re: 171.67.108.25 / 171.67.108.11
Posted: Sat Jul 11, 2009 7:38 pm
by ArVee
It finally matched up with the Collection Server (108.25) and sent.
Re: 171.67.108.25 / 171.67.108.11
Posted: Thu Jul 16, 2009 3:42 pm
by Nathan_P
Looks like there may be a problem here - 108.11 is in reject mode and 108.25 is taking ages to collect WUs. I am getting work OK, so it's not holding up the science from that end, but I am getting worried about the backlog on my GPUs waiting to transmit their WUs.
I've tried the usual of restarting the clients and I've even completely shut down the firewall, but can someone give the GPU collection servers a nudge so we can get our work back before deadlines?
Cheers
Nathan
Re: 171.67.108.25 / 171.67.108.11
Posted: Thu Jul 16, 2009 8:35 pm
by bruce
ArVee wrote:For a few hours now I've had a completed GPU WU alternately attempting to transmit to the above servers and failing, showing a 503 on the work server and a generic fail to connect message on the collection server. The work server shows as Down on the Server Status page, but the Collection Server shows as ok. A client restart at this end hasn't helped. Work continues on new units in the interim, but is there anything I can do in the meantime other than hope?
Well, technically speaking, the FAH system is working exactly as it is designed and you shouldn't waste your time trying to solve a non-problem.
Yes, the servers are congested. There's always a chance that congestion will clear itself naturally or that somebody from the Pande Group will notice the problem and fix something, so that part of the system can benefit if you report it, but it's not serious unless the problem persists for a long time. (In the past congestion problems did persist for a long time, and the real fix was to get some new servers, so once a problem is known, additional reports are not useful.)
The client, itself, is specifically designed to deal with congestion by trying both the work server and the collection server on both ports and if all of those fail, to hold the WU in the local queue and retry later. Like I said, this is not a problem, but rather the way the system is supposed to work when there is a problem with the servers. Restarting sometimes works, but that's strictly from the perspective of a single client. From the server perspective, you are INCREASING congestion just when the servers need congestion to be reduced. You have to ask yourself the question: Just how important is it for me to upload this particular WU right now? Why can't I wait a little longer and let the system take care of it whenever it can?
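As a rough illustration of the retry behaviour described above, here is a minimal sketch. It is not the actual FAH client code: the port numbers (8080 and 80), the order in which the servers are tried, and the function names `try_upload`, `process_queue` and `send` are all assumptions made for the example. Only the two server addresses come from this thread.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Addresses taken from this thread; the ports are an assumption,
# since the post only says "both ports".
WORK_SERVER = "171.67.108.11"
COLLECTION_SERVER = "171.67.108.25"
PORTS = (8080, 80)

Endpoint = Tuple[str, int]
# Hypothetical transport callback: returns True if the upload succeeded.
Sender = Callable[[str, str, int], bool]


def upload_targets() -> List[Endpoint]:
    """Work server on both ports, then collection server on both ports (assumed order)."""
    return [(host, port)
            for host in (WORK_SERVER, COLLECTION_SERVER)
            for port in PORTS]


def try_upload(wu: str, send: Sender) -> bool:
    """Try every server/port combination once; stop at the first success."""
    for host, port in upload_targets():
        if send(wu, host, port):
            return True
    return False


def process_queue(queue: Deque[str], send: Sender) -> List[str]:
    """Make one pass over the local queue.

    WUs that upload are returned; WUs that fail on every target are put
    back in the queue, in their original order, for a later retry.
    """
    uploaded = []
    for _ in range(len(queue)):
        wu = queue.popleft()
        if try_upload(wu, send):
            uploaded.append(wu)
        else:
            queue.append(wu)  # hold locally and retry later
    return uploaded
```

In this sketch a failed WU simply stays queued until the next pass, so a manual restart adds nothing except extra connection attempts against servers that are already congested.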
Re: 171.67.108.25 / 171.67.108.11
Posted: Thu Jul 16, 2009 8:53 pm
by Nathan_P
Bruce
Thanks for the clarification - the reason I reflagged it was that my clients had been trying to transmit on and off for 24 hours with limited success. I think it's still worth us letting you know if there are problems, as it's easier to spend 30 seconds looking when we report there may be a fault than it is for us just to say it's congestion and then you find out later it's serious. Just my .02
Re: 171.67.108.25 / 171.67.108.11
Posted: Fri Jul 17, 2009 12:19 am
by bruce
Nathan_P wrote:Looks like there may be a problem here - 108.11 is in reject mode and 108.25 is taking ages to collect WUs. I am getting work OK, so it's not holding up the science from that end, but I am getting worried about the backlog on my GPUs waiting to transmit their WUs.
Oh yes, I agree: Reports like the one above are often useful.
Some people spend a lot of effort trying to fix something on their client -- (often much more than you did) -- but taking extra steps like you did (below) is rarely fruitful if you have been able to upload normally recently and nothing else has changed. I've seen cases where people completely uninstalled / reinstalled, and that's a lot of wasted effort, particularly when the information on the server status page shows something might be amiss.
I've tried the usual of restarting the clients and I've even completely shut down the firewall, but can someone give the GPU collection servers a nudge so we can get our work back before deadlines?
Have those WUs uploaded yet?
Re: 171.67.108.25 / 171.67.108.11
Posted: Sat Jul 18, 2009 4:22 pm
by Nathan_P
Hi Bruce
Yes, they went in late on Thursday - turns out I've been having router problems at my end as well, which might not have helped either. Looks like it's time for a new one.
