Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:07 pm
by *hondo*
Same problem for at least 3 team members from team 51078. I'd normally complete between 5 and 10 WUs per day. My PC was Folding as normal yesterday; at 8:30 GMT I checked the number of completed WUs on the stats page, and I know for a fact that I saw 2 WUs go. But now, approx. 23 F@H hours later, nothing has been added to my stats score total. Also, for the last 12 F@H hours my PC hasn't been able to download a single WU.
Come on, Stanford, tell us: WHAT IS GOING ON?
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:16 pm
by noorman
toTOW wrote: I've just sent an email to Vijay ... I hope we'll get more information soon ...
.
I already sent him a PM about 171.67.108.21 almost 2 hours before you did ...
Not even read yet!
.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:17 pm
by Russ_64
Today (15th) is a holiday in the USA, so don't expect a quick answer or solution.
I shut down my clients yesterday and tried again earlier today; both my GPUs and SMP received new WUs today.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:21 pm
by Tobit
I've had work for the past few hours but I'm still having the original "Server has already received unit" problem.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:25 pm
by noorman
.
Just did another shutdown and restart of F@H and got a WU from server 171.64.65.71 instead of 171.67.108.21, to which all earlier requests for work were directed ...
It's a new P10105 jobby (first one for me).
.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:27 pm
by VijayPande
Thanks for the posts. It's early AM in California (that's why this went unfixed for several hours), but I think we've got everything going again. I've contacted Joe regarding this issue: there was a WS bug.
I've also balanced the weights so the other NV WSs can get into the mix better and improve the redundancy.
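To illustrate what rebalancing weights can mean, here is a minimal sketch. It assumes the assignment server picks a work server at random, in proportion to a per-server weight. The thread does not describe the real assignment logic, so the weight values and the names `pick_work_server` and `share_of_assignments` are hypothetical.

```python
import random

# Hypothetical weights: the real values are not given in this thread.
WEIGHTS_BEFORE = {"171.67.108.21": 90, "171.64.65.71": 5, "171.67.108.26": 5}
WEIGHTS_AFTER = {"171.67.108.21": 34, "171.64.65.71": 33, "171.67.108.26": 33}


def pick_work_server(weights, rng=random):
    """Pick one server with probability proportional to its weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


def share_of_assignments(weights, n=10000, seed=0):
    """Simulate n assignments and return the fraction sent to each server."""
    rng = random.Random(seed)
    counts = {s: 0 for s in weights}
    for _ in range(n):
        counts[pick_work_server(weights, rng)] += 1
    return {s: c / n for s, c in counts.items()}


if __name__ == "__main__":
    print("before:", share_of_assignments(WEIGHTS_BEFORE))
    print("after: ", share_of_assignments(WEIGHTS_AFTER))
```

With the lopsided weights, nearly every request lands on one server, so a fault there stalls most clients. With even weights, the same fault affects only about a third of requests.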
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:40 pm
by chriskwarren
Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:40 pm
by ikerekes
Before I went to sleep last night, all of my GPUs were working (on units from 171.64.65.71). I woke up this morning to find all my GPUs down: the assignment server had reassigned all of them to 108.21, and it was dead in the water.
As of 9:31 PST I restarted all the GPUs and they are all loaded: 3 from 65.71, 3 from 108.11 and one from 108.21.
Hurray!!! Apparently the assignment server needed the biggest kick. (Valentine's Day is over, so whoever did the kicking doesn't have to feel bad.)
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 3:53 pm
by tonic
I just got WUs on all 5 of my clients...so perhaps the problem is fixed.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 4:32 pm
by Pette Broad
Early days yet, but I've just uploaded 3 units (and got some more)
Pete
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 4:38 pm
by noorman
.
The redistribution is the main thing that helped the NetLoad come down drastically on 171.67.108.21 (as I've seen).
It was extremely high compared to 'normal' figures; it's come down to the usual levels already!
I got a WU the minute I restarted my GPU client, after hearing from another member (living in the U.K.) that he'd got a WU just before.
I had thought that the problem for me was the high network delays I saw when pinging the server.
Since that U.K. member is across the Pond from the U.S. too, I tried my luck again and got a P10105 straight away.
A fellow Folder (from the west of the U.S.) got some WUs, 1 about every 2 hrs ...
I knew the server wasn't completely down because it had responded with 'OK' to a check by web browser, and it was pingable the whole time.
Because of that, I sent a PM to Vijay. I just forgot to account for the time difference, and I also didn't know about the public holiday.
By the way, uploading was no problem for my GPU client; that had already been done. It just couldn't get new work.
.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:11 pm
by Nathan_P
chriskwarren wrote: Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:25 pm
by Tobit
Nathan_P wrote: Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
Unfortunately, there is nothing left to force. When the client receives the message that the server has already received the work unit, the slot in queue.dat the work was assigned to is "emptied". Some of us still have some wuresults.dat files. However, this problem had gone on for so long that many of mine were overwritten several times with newer work. The clients have only so many slots, and once the slot is cleared, there is no way to send any lingering work files back to Stanford.
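The slot behaviour described above can be sketched as a toy model. This is not the client's actual code or the real queue.dat layout: the class `SlotQueue`, its method names, the reply strings and the slot count are all hypothetical, chosen only to mirror the description in this post.

```python
NUM_SLOTS = 10  # assumption: the post only says "so many slots"


class SlotQueue:
    """Toy model of a fixed ring of work slots with leftover result files."""

    def __init__(self, num_slots=NUM_SLOTS):
        self.slots = [None] * num_slots         # which WU each slot is tracking
        self.result_files = [None] * num_slots  # leftover result file per slot
        self.next_slot = 0

    def assign(self, wu_name):
        """Put new work in the next slot, wrapping around at the end."""
        idx = self.next_slot
        self.slots[idx] = wu_name
        self.next_slot = (idx + 1) % len(self.slots)
        return idx

    def finish(self, idx):
        """Write results, overwriting any older file left in this slot."""
        self.result_files[idx] = f"results for {self.slots[idx]}"

    def handle_upload_reply(self, idx, reply):
        """Empty the slot on either reply; the result file may linger."""
        if reply in ("accepted", "already received"):
            self.slots[idx] = None

    def can_resend(self, idx):
        """Work can only be re-sent while its slot still tracks it."""
        return self.slots[idx] is not None


if __name__ == "__main__":
    q = SlotQueue(num_slots=2)
    a = q.assign("WU-A")
    q.finish(a)
    q.handle_upload_reply(a, "already received")
    print(q.can_resend(a), q.result_files[a])
```

In this model the result file outlives the emptied slot, but nothing points to it any more, and the next WU finished in that slot overwrites it.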
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:30 pm
by Nathan_P
Tobit wrote: Nathan_P wrote: Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
Unfortunately, there is nothing left to force. When the client receives the message that the server has already received the work unit, the slot in queue.dat the work was assigned to is "emptied". Some of us still have some wuresults.dat files. However, this problem had gone on for so long that many of mine were overwritten several times with newer work. The clients have only so many slots, and once the slot is cleared, there is no way to send any lingering work files back to Stanford.
That's a shame, as I still have the files in my work directories. At least I'm folding again, and seeing my GTX 275 beaten by my GTS 250 is indeed a sight to behold.
I'd post a FahMon shot but I can't grab a screenie.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:31 pm
by VijayPande
Nathan_P wrote: chriskwarren wrote: Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
It depends on the nature of the WS bug that's causing this, but I'm worried that these won't go back. I've escalated this bug to the highest level on our bug tracker and Joe's on it. I'll post more when we know more.
Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.