Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 4:34 pm
by bollix47
Server has had a net load of 200 or more for hours now and I have 3 (soon to be 4) WUs that can't upload.

Please apply the usual "kick". :roll:

Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 5:21 pm
by sortofageek
As susato said above:
susato wrote:Netload's back at 200. Every time Dr. K. bumps it, it heads right back to 200 connections and trouble. Good to hear about the upcoming equipment upgrade.
I have one waiting to return to it also, but I do understand there isn't much more to be done than is already being done until that new server is up and functional. Dr. Kasson is definitely aware of the issue and is doing all he can in the circumstance.

I have been folding since December 2001 ... there have definitely been bumps along the road. We're helping to accomplish great things and there must be some difficulties along the way given the effort to continually do more and do it better. At this point I look back over it all with no regrets. Frustrations and all, I still have the hope we are helping to improve the future for those who will come here after us and I'm glad to be a part of it.

Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 8:04 pm
by bollix47
Thank you ... all 3 uploaded. :wink:

Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 8:15 pm
by sortofageek
Mine did, too, and net load is already back up to 62. Guess that's just how it is in SMP land right now.

I don't tend to notice until I see a thread about it, though. I tend to let it all happen as it happens in cases like this because I know the Project Manager is aware and doing his best to keep things going.

Maybe time for taking down a box at a time, cleaning out the dust bunnies, checking to make sure the cooling system is running, troubleshooting any little glitches one might have noted, like that? Get 'em ready for the new server? :)

Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 9:18 pm
by shdbcamping
sortofageek wrote:Mine did, too, and net load is already back up to 62. Guess that's just how it is in SMP land right now.

I don't tend to notice until I see a thread about it, though. I tend to let it all happen as it happens in cases like this because I know the Project Manager is aware and doing his best to keep things going.

Maybe time for taking down a box at a time, cleaning out the dust bunnies, checking to make sure the cooling system is running, troubleshooting any little glitches one might have noted, like that? Get 'em ready for the new server? :)
Great advice. A good cleanout of the Donor HW (I am religious about the 'once a month') is important to optimal HW performance. Servers and Server issues are part of our Folding contributions. Pande Group is on a very tight budget and I hope you don't disagree that Science input is more important than Server/PPD credit :wink: . It all works out in the wash..... Your WU may be the 'breakthrough'. If that were the case, would you really care about the points being allocated correctly?
Sean

Re: 171.64.65.56 ???

Posted: Thu Oct 07, 2010 9:56 pm
by bruce
shdbcamping wrote:Pande Group is on a very tight budget and I hope you don't disagree that Science input is more important than Server/PPD credit :wink: .
This forum is intentionally team neutral because all of the teams benefit FAH and discussing them is not part of our charter as a help-forum.

This forum is intentionally hardware agnostic because FAH benefits from both NVidia and ATI GPUs as well as Intel/AMD/etc. CPUs and arguing about which is best is not part of our charter as a help-forum.

This forum is intentionally motivationally neutral because FAH benefits from people who see science as primary as well as people who see points as primary. Please don't assume that everyone believes that science is more important than points, just because that's your personal bias. The scientists see it that way, but FAH accepts donations from anyone, no matter what team they're on, no matter what hardware they have, and no matter what their motivation is. This forum supports them all, too.

Re: 171.64.65.56 ???

Posted: Fri Oct 08, 2010 11:23 pm
by 314159
Well said, old friend. :!:

Now is it possible to alert the good Doctor that this and his only other active SMP server could use a "bit" of resetting (plus additional WUs in one case)? :)

I fold primarily in Memory of my Dad and Aunt who passed away in recent years. I believe that at the time of his death, Dad "may" have been the oldest active participant in this project at age 87. Mom, now age 89, continues to run his Mac mini (pure console mode) and "may" now have this honor. :?:

Re: 171.64.65.56 ???

Posted: Fri Oct 08, 2010 11:36 pm
by sortofageek
Someone will take a look. Please realize, however, that the netload number doesn't give the whole story. Sometimes it can be up there but WUs are passing and the high number for netload just means there is a heavy load.

Re: 171.64.65.56 ???

Posted: Sat Oct 09, 2010 12:11 am
by 314159
Thanks for the reply "friendly and most appreciated Moderator". :)

When one has a relatively large farm as I do (34 active computers, and NOT a corporate folder, i.e. all on my dime), the 503's hitting almost every completed WU tends to contradict your premise. This server "burps" virtually every time the NET LOAD hits 200. :(
Also, this has been the case for quite a few days.

Doctor Kasson apparently DOES have a way of bringing things back to normal since he has done this on quite a few occasions. (thank you!)

Being stuck with 5 computers without work or unable to submit completed WUs makes my "farm" extremely difficult to administer (I am an old, old, guy, dating back to the GAH days). :wink:

I expect the good Doctor to show up soon. I believe that he follows this thread religiously.

Otherwise, I shall just take two aspirin and see what the situation is in the morning. :D

Re: 171.64.65.56 ???

Posted: Sat Oct 09, 2010 12:18 am
by 314159
P.S. - Now fixed, at least for 20 or so hours. :D

(check NET LOAD "friendly and appreciated Moderator") :wink:

Thank you! :!:

Re: 171.64.65.56 ???

Posted: Sun Oct 10, 2010 4:02 am
by 314159
Once again. NET LOAD problem.

Is there not a viable solution to this hopefully temporary problem (other than waiting for the new SMP server to come on line)?

Note that my prediction above was a "bit" off, but not by that much. :wink:

Please fix. Thanks. :!:

Re: 171.64.65.56 ???

Posted: Sun Oct 10, 2010 5:02 am
by sortofageek
I'll raise a flag, but the solution will be more than just temporary once we have a new server.

Re: 171.64.65.56 ???

Posted: Sun Oct 10, 2010 3:55 pm
by VijayPande
We're working on this on several fronts: new SMP WUs to go onto other servers to spread the load, and improved WS software to handle the load better. Both are moving forward to resolve this issue, but are taking longer than we expected. I hope that several PG members will be beta testing new SMP WUs shortly (hopefully later this week). Once they're through the QA process, the load will get *a lot* better on this server. Also, the new WUs will likely run off of the new WS software, which should help too (the new WS has code to specifically handle the situation when the # of connections saturates).
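
For readers wondering what "handling saturation" can look like in general, here is a minimal sketch. It is not the actual WS code, which isn't shown anywhere in this thread. The names `ConnectionLimiter` and `handle_request` and the cap values are made up for illustration. The idea is just that a server at its connection cap can turn new requests away right away with a 503 instead of letting them pile up:

```python
import threading


class ConnectionLimiter:
    """Hypothetical cap on concurrent connections (illustration only)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        # Take a slot if one is free; never block when saturated.
        with self._lock:
            if self._active >= self.max_connections:
                return False
            self._active += 1
            return True

    def release(self):
        # Give the slot back once the connection is finished.
        with self._lock:
            if self._active > 0:
                self._active -= 1

    @property
    def active(self):
        with self._lock:
            return self._active


def handle_request(limiter, work):
    """Run work() if a slot is free, otherwise answer 503 immediately."""
    if not limiter.try_acquire():
        return 503, None
    try:
        return 200, work()
    finally:
        limiter.release()
```

With a cap of 200, the 201st simultaneous request would get `(503, None)` straight away, and a client could simply retry later. Whether the real server behaves this way is an assumption on my part.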

Re: 171.64.65.56 ???

Posted: Sun Oct 10, 2010 7:33 pm
by jimerickson
thank you Dr.Pande. appreciate all the work!

Re: 171.64.65.56 ???

Posted: Tue Oct 12, 2010 5:47 pm
by susato
.56 is experiencing delays again, having hit 200 connections 10 hours ago. Now the net load is over 250 - apparently the saturation point has been raised over at Stanford. Good idea! My machines are still waiting between WU on this machine, but the wait duration is on the order of 15 minutes rather than an hour or more.