
2, actually 3, questions

Posted: Fri Apr 09, 2010 9:34 pm
by djohnston
I am not complaining and I mean no disrespect. I simply would like an answer to a couple of questions.

First, the points I earn or accrue per day seem to fluctuate wildly. The spread currently varies between a peak of 1700+ and a low of 500+ per day. I am donating 5 CPUs, each running a 32-bit Linux client. The load on the client systems is usually no more than web browsing, viewing Flash video, or editing in OpenOffice.org Writer. The remainder of the CPUs' idle time ALWAYS goes to folding@home. I consider the project to be THAT WORTHWHILE.

Yes, yes, I know I can "earn more" by running a 64-bit client, when and if the work units are available. I'm not there yet. Not until next week. I would simply like to know how the point spread can vary so much if each CPU is running 24/7.

Second, there are recurring server problems at the Stanford labs and/or wherever else the project is hosted. During the last outage, through no fault of my own, none of my folding clients could connect for a period of time. That period of time, for one of the five clients, exceeded four days. Four days of waiting to send results and get more work.

The staff is probably overworked when server outages occur. Most likely, some of the fileservers are running at peak capacity during certain periods. And most likely, the project could do a lot better with more rackspace. I understand all those constraints. So, this is a 2-part question.

(A) Is there any failover designed into the existing network?

(B) Would it be possible to include a switch in a future FAH executable that allows the client to seek an alternate IP address if the target isn't responding? Does the client have to connect to a certain IP address, or could it be a range of addresses?

OK, that was actually three questions.

Re: 2, actually 3, questions

Posted: Fri Apr 09, 2010 10:04 pm
by John Naylor
As regards points: the units are all benchmarked on one of two machines. Older unit series are benchmarked on a P4 Northwood @ 2.8GHz with SSE2 disabled, and should all earn 110PPD on a comparable machine (give or take a few % either way). Units which require uploads or downloads of more than 5MB are given a 100% bonus, so the benchmark for such units is 220PPD. Newer units introduced this year will be benchmarked on the new benchmark machine, a Core i5 @ 2.6GHz. I do not know what the benchmark PPD is for the new machine, but it has been set up so that a Pentium 4 @ 2.8GHz should still earn around 110PPD (or 220PPD for big units) under the new system. However, due to the vast architectural differences between processor families, units can produce wildly different PPD on different machines. For example, on the slowest units a single core of my 3.2GHz Q9450 gets about 250PPD; on the fastest units it can be as much as 1100PPD. This is simply down to the differences between the Core 2 and Pentium 4 Northwood architectures.
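The arithmetic behind this can be sketched as follows. This is a minimal sketch, assuming that each work unit is worth a fixed number of points chosen so that the benchmark machine earns 110 PPD on it (220 PPD for units with uploads or downloads over 5MB); the function names and the example timings are hypothetical, not taken from the real points system. It shows why one core can earn very different PPD: PPD scales with how much faster that core finishes a given unit than the benchmark machine did, and that speedup differs from unit to unit.

```python
# Sketch of benchmark-based points arithmetic (hypothetical names).
# Assumption: a unit's point value is fixed so that the benchmark machine
# (P4 Northwood @ 2.8GHz, SSE2 disabled) earns 110 PPD on it, or 220 PPD
# if the unit's upload or download exceeds 5MB.

BASE_PPD = 110.0
BIG_UNIT_MB = 5.0
BIG_UNIT_MULTIPLIER = 2.0  # the 100% bonus for big units


def benchmark_ppd(upload_mb: float, download_mb: float) -> float:
    """PPD the benchmark machine is expected to earn on a unit."""
    if upload_mb > BIG_UNIT_MB or download_mb > BIG_UNIT_MB:
        return BASE_PPD * BIG_UNIT_MULTIPLIER
    return BASE_PPD


def unit_points(benchmark_hours: float, upload_mb: float = 0.0,
                download_mb: float = 0.0) -> float:
    """Hypothetical point value: benchmark PPD times benchmark days."""
    return benchmark_ppd(upload_mb, download_mb) * benchmark_hours / 24.0


def ppd(points: float, your_hours: float) -> float:
    """Points per day earned by a core that finishes the unit in your_hours."""
    return points * 24.0 / your_hours


if __name__ == "__main__":
    # Two hypothetical units, each taking the benchmark machine 96 hours,
    # so each is worth 440 points.
    pts = unit_points(96.0)
    # On one unit the newer core is only ~2.3x faster than the P4;
    # on the other its architecture makes it 10x faster.
    print(round(ppd(pts, 42.24)))  # prints 250
    print(round(ppd(pts, 9.6)))    # prints 1100
```

Both hypothetical units are worth the same 440 points; only the completion time differs, so the same core reports roughly 250 PPD on one and 1100 PPD on the other.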

As regards the servers:
A) Yes. There are collection servers (CSs) which can collect WUs whose work servers have gone down. Unfortunately, these collection servers can only collect WUs they know about: if your WU was assigned after the last update the work server sent to the collection server, the collection server will be unaware of your unit and will reject it. Also, there are only three CSs for about 70 work servers at the moment, which means they can be extremely heavily loaded. The 70 work servers provide redundancy through their sheer number... your clients may only be able to use 10 of those servers, depending on your settings and client type, but the assignment server should have redirected you to another server which has work.
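The collection-server limitation can be modeled in a few lines. This is a minimal sketch, not the project's server code: the class and method names (`WorkServer`, `CollectionServer`, `sync_from`, `accept`) are hypothetical. A collection server only learns about work units when a work server sends it an update, so a result for a unit assigned after the last update is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class WorkServer:
    """Hypothetical work server that hands out work unit IDs."""
    name: str
    assigned: set = field(default_factory=set)
    next_id: int = 0

    def assign(self) -> str:
        wu_id = f"{self.name}-{self.next_id}"
        self.next_id += 1
        self.assigned.add(wu_id)
        return wu_id


@dataclass
class CollectionServer:
    """Hypothetical collection server that only knows synced units."""
    known: set = field(default_factory=set)

    def sync_from(self, ws: WorkServer) -> None:
        # Periodic update: copy the work server's current assignments.
        self.known |= ws.assigned

    def accept(self, wu_id: str) -> bool:
        # Results for units it has never heard of are rejected.
        return wu_id in self.known


if __name__ == "__main__":
    ws = WorkServer("ws1")
    cs = CollectionServer()
    early = ws.assign()
    cs.sync_from(ws)         # last update before the work server goes down
    late = ws.assign()       # assigned after that update
    print(cs.accept(early))  # prints True
    print(cs.accept(late))   # prints False
```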

B) This feature should not be necessary if the Assignment Servers do their job properly ;)

Re: 2, actually 3, questions

Posted: Sat Apr 10, 2010 12:04 am
by 7im
Agreed with YPMV. I forget the CPU cache size on the P4 2.8, but it may only be 512KB, whereas newer C2Ds have considerably more. Some WUs benefit from larger cache sizes and some do not, but because the cache on the benchmark P4 is small, none of the WUs get the cache boost there. And because you get different work units from different projects on your various machines, I'm not surprised by the variances.

A) Yes, some. There are 2 or 3 failovers on the high-volume servers, but maybe none at all for those serving test work units or serving older clients with fewer users.

B) When A works, you don't need B. When A doesn't work, even a new switch won't help, because there are no other work servers matching your configuration.
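The reasoning about A and B can be illustrated with a toy assignment function. This is a hypothetical sketch: the `Server` record, its `client_types` field, and the `pick_server` function are illustrative assumptions, not the real assignment server's logic or configuration. It filters servers by the client's configuration and by whether they are up and have work; once that filtered list is empty, retrying other addresses cannot help.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical work server record as an assignment server might see it."""
    address: str
    client_types: frozenset  # client configurations this server supports
    online: bool
    has_work: bool


def pick_server(servers: List[Server], client_type: str) -> Optional[Server]:
    """Return the first online server with work for this client type.

    Returns None when no matching server is available; trying other
    addresses would not help then, because every other server is down,
    out of work, or set up for a different client.
    """
    for s in servers:
        if client_type in s.client_types and s.online and s.has_work:
            return s
    return None


if __name__ == "__main__":
    servers = [
        Server("ws-a", frozenset({"classic"}), online=False, has_work=True),
        Server("ws-b", frozenset({"classic"}), online=True, has_work=True),
        Server("ws-c", frozenset({"smp"}), online=True, has_work=True),
    ]
    print(pick_server(servers, "classic").address)  # prints ws-b (ws-a is down)
    servers[1].online = False
    print(pick_server(servers, "classic"))          # prints None
```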

BTW, switching to a 64-bit client does not increase points production. The 32-bit and 64-bit fahcores run at the same speed.

Re: 2, actually 3, questions

Posted: Sat Apr 10, 2010 3:44 am
by bruce
djohnston wrote: The staff is probably overworked when server outages occur. Most likely, some of the fileservers are running at peak capacity during certain periods. And most likely, the project could do a lot better with more rackspace. I understand all those constraints. So, this is a 2-part question.
. . . .
OK, that was actually three questions.
And question 4 wasn't even a question ;) . . . you made it as an observation/guess.

Yes, the staff is overworked when server outages occur. Unfortunately they've had a couple of errors on their RAIDs. I don't know enough about the actual hardware to make an intelligent guess, but their servers are huge and when a drive has to be replaced, it takes a long, long time to do an fsck. The real work is before and after that point, though. First someone has to figure out if there's a simple problem. Then several more complex issues are examined. If the decision leads to replacing hardware, then there's the long wait for fsck followed by other data validation steps before the server is trusted on-line again. Having a server off-line isn't something that ANYBODY wants, but when the only choice is to take it off-line, there's plenty of work that has to be done before it comes on-line again.

Re: 2, actually 3, questions

Posted: Sat Apr 10, 2010 6:09 am
by djohnston
Thanks, everyone for the quick responses. I understand a bit more of the nature of the beast in the points scheme.
7im wrote: BTW, switching to a 64 bit client does not increase points production. The 32 bit and 64 bit fahcores run at the same speed.
Thanks. I was misinformed, with large, sharp, pointy teeth.

Re: 2, actually 3, questions

Posted: Sat Apr 10, 2010 6:37 am
by AZBrandon
The only 64-bit thing that was faster was the A2 SMP WUs, which you could only run on 64-bit Linux, versus the slower A1 WUs for 32-bit Linux and for Windows. It sounds like you're running the classic client, which runs at pretty much the same speed no matter what operating system you use.