I am not complaining and I mean no disrespect. I simply would like an answer to a couple of questions.
First, the points I earn or accrue per day seem to fluctuate wildly. The spread currently varies between a peak of 1700+ and a low of 500+ per day. I am donating 5 CPUs. Each is running a 32-bit Linux client. The load on the client systems is usually no more than web browsing sessions, viewing flash video, etc. or editing in OoWriter. The remainder of the CPUs idle time ALWAYS goes to folding@home. I consider the project to be THAT WORTHWHILE.
Yes, yes, I know I can "earn more" by running a 64-bit client, when and if the work units are available. I'm not there yet. Not until next week. I would simply like to know how, if each CPU is running 24/7, the point spread can vary so much?
Second, there are recurring server problems at the Stanford labs and/or wherever else the project is hosted. During the last outage, through no fault of my own, none of my folding clients could connect for a period of time. That period of time, for one of the five clients, exceeded four days. Four days of waiting to send results and get more work.
The staff is probably overworked when server outages occur. Most likely, some of the fileservers are running at peak capacity during certain periods. And most likely, the project could do a lot better with more rackspace. I understand all those constraints. So, this is a 2-part question.
(A) Is there any failover designed into the existing network?
(B) Would it be possible to contain a switch within a future fah executable to allow the client to seek an alternate IP address if the target isn't responding? Does the client have to connect to a certain IP address, or could it be a range of addresses?
OK, that was actually three questions.
Re: 2, actually 3, questions
As regards points: the units are all benchmarked on one of two machines. Older unit series are benchmarked on a P4 Northwood @ 2.8GHz with SSE2 disabled, and should all earn 110 PPD on a comparable machine (give or take a few percent either way). Units which require uploads or downloads of more than 5MB are given a 100% bonus, so the benchmark for such units is 220 PPD. Newer units introduced this year will be benchmarked on the new benchmark machine, a Core i5 @ 2.6GHz. I do not know what the benchmark PPD is for the new machine, but it has been set up so that a Pentium 4 @ 2.8GHz should still earn around 110 PPD (or 220 for big units) under the new system.
However, due to the vast differences in architecture between different families of processors, the same units can produce wildly different PPD on different machines. For example, on the slowest units a single core of my 3.2GHz Q9450 gets about 250 PPD; on the fastest units it can be as much as 1100 PPD. That is simply down to the differences between the Core 2 and Pentium 4 Northwood architectures.
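To make the arithmetic concrete, here is a minimal sketch of how a fixed per-unit point value turns into very different PPD on different hardware. All numbers and function names below are illustrative only (chosen so the output matches the 250 to 1100 PPD spread above); they are not taken from any real project.

```python
# Illustrative sketch only: how a fixed per-WU point value becomes
# different PPD on different hardware. Numbers are made up for the example.

BENCHMARK_PPD = 110      # target PPD on the P4 2.8GHz benchmark machine
BIG_UNIT_BONUS = 2.0     # 100% bonus for units with >5MB uploads/downloads

def unit_points(benchmark_days, big_unit=False):
    """Point value of a WU: the benchmark PPD times the number of days
    the benchmark machine needs to finish it (doubled for 'big' units)."""
    return BENCHMARK_PPD * benchmark_days * (BIG_UNIT_BONUS if big_unit else 1.0)

def my_ppd(points, my_days):
    """PPD a given machine actually sees: the unit's fixed point value
    divided by how long that machine takes to complete it."""
    return points / my_days

# Suppose the benchmark machine needs 4 days for a normal unit -> 440 points.
points = unit_points(benchmark_days=4)

# A core whose architecture suits this unit finishes it in 0.4 days:
print(my_ppd(points, my_days=0.4))    # 1100.0 PPD
# A core that handles this unit poorly needs 1.76 days for the same points:
print(my_ppd(points, my_days=1.76))   # 250.0 PPD
```

The point value of a unit is fixed; only the completion time changes from machine to machine, so a box folding 24/7 can still see large day-to-day swings purely from the mix of projects it happens to draw.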
As regards the servers:
A) Yes. There are collection servers which can collect WUs whose work servers have gone down. Unfortunately, a collection server can only collect WUs it knows about: if your WU was assigned after the last update the work server sent to the collection server, the CS will be unaware of your unit and will reject it. There are also only three CSs for about 70 work servers at the moment, so they can be extremely heavily loaded. The 70 work servers themselves provide redundancy through sheer numbers... your clients may only be able to use ten of them, depending on your settings and client type, but the assignment server should have redirected you to another server which has work (a rough sketch of this upload path follows below).
B) This feature should not be necessary if the Assignment Servers do their job properly.
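As a rough illustration of the upload path described in (A), here is a hypothetical sketch. The class, function, and server names are invented for clarity and are not the real client's internals.

```python
# Hypothetical sketch of the upload fallback described above.
# Names and structure are illustrative, not the actual FAH client code.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    up: bool = True
    known_wus: set = field(default_factory=set)  # WU ids this server has records for

    def is_reachable(self):
        return self.up

    def knows_about(self, wu_id):
        return wu_id in self.known_wus

    def accept(self, wu_id):
        return f"{self.name} accepted {wu_id}"

def upload_result(wu_id, work_server, collection_servers):
    """Return a finished WU to its work server, falling back to a
    collection server only if that CS already knows about the WU."""
    if work_server.is_reachable():
        return work_server.accept(wu_id)
    for cs in collection_servers:
        # A CS only holds the WU records its work server pushed to it before
        # going down; a unit assigned after that last update gets rejected.
        if cs.is_reachable() and cs.knows_about(wu_id):
            return cs.accept(wu_id)
    # No server can take the result yet: keep it locally and retry later.
    return None

# Example: the work server is down and only one CS has heard of this WU.
ws = Server("work-server-1", up=False)
cs_a = Server("collection-server-A")                      # never told about the WU
cs_b = Server("collection-server-B", known_wus={"wu-42"})
print(upload_result("wu-42", ws, [cs_a, cs_b]))           # collection-server-B accepted wu-42
```

The download side works the other way around: the assignment server is supposed to steer your client to whichever work server currently has suitable work, which is why a client-side list of alternate addresses (the idea in B) shouldn't normally be needed.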
Re: 2, actually 3, questions
Agreed, and YPMV (your points may vary). I forget the CPU cache size on the P4 2.8, but it may only be 512 KB, whereas newer C2Ds have double that. Some WUs benefit from larger cache sizes and some do not, but because the cache on the P4 is small, none of the WUs get a cache boost on the benchmark machine. And because your various machines receive different work units from different projects, I'm not surprised by the variances.
A) Yes, some. There are two or three failovers for the high-volume servers, and maybe none at all for those serving up test work units or serving older clients with fewer users.
B) When A works, you don't need B. When A doesn't work, even a new switch won't help, because there are no other work servers matching your configuration.
BTW, switching to a 64 bit client does not increase points production. The 32 bit and 64 bit fahcores run at the same speed.
Re: 2, actually 3, questions
djohnston wrote: The staff is probably overworked when server outages occur. Most likely, some of the fileservers are running at peak capacity during certain periods. And most likely, the project could do a lot better with more rackspace. I understand all those constraints. So, this is a 2-part question.
. . . .
OK, that was actually three questions.
And question 4 wasn't even a question . . . you made it as an observation/guess.
Yes, the staff is overworked when server outages occur. Unfortunately they've had a couple of errors on their RAIDs. I don't know enough about the actual hardware to make an intelligent guess, but their servers are huge and when a drive has to be replaced, it takes a long, long time to do an fsck. The real work is before and after that point, though. First someone has to figure out if there's a simple problem. Then several more complex issues are examined. If the decision leads to replacing hardware, then there's the long wait for fsck followed by other data validation steps before the server is trusted on-line again. Having a server off-line isn't something that ANYBODY wants, but when the only choice is to take it off-line, there's plenty of work that has to be done before it comes on-line again.
Re: 2, actually 3, questions
Thanks, everyone, for the quick responses. I understand a bit more of the nature of the beast in the points scheme.
7im wrote: BTW, switching to a 64 bit client does not increase points production. The 32 bit and 64 bit fahcores run at the same speed.
Thanks. I was misinformed, with large, sharp, pointy teeth.
Re: 2, actually 3, questions
The only 64-bit thing that was faster was the A2 SMP WUs, which you could only run on 64-bit Linux, versus the slower A1 WUs for 32-bit Linux and for Windows. It sounds like you're running the classic client, which runs at pretty much the same speed no matter what operating system you use.