Stats update in progress
External access to the Folding@home stats server is currently not available in order to expedite the stats input process. This was started on the hour and we expect this to take about 45 minutes. The stats updates run in 2 hour cycles.
In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.
Third party stats systems work independently of this database and can be queried at this time. See this FAH Wiki page for a list of third party stats systems.
For more information
For more information, please see http://folding.typepad.com/ for news and more details or check out our forum at http://foldingforum.org.
new stats system?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 174
- Joined: Sun Nov 16, 2008 6:41 am
new stats system?
did i read this as youre going to have 20 gigs of ram running for stats? im really interested in hearing in layman version how this works on a technical level
Re: new stats system?
I'm not sure what your question is.
The individual work servers create lists of the WUs that have been returned along with the UserName, TeamNo, etc. The master database contains data for all of the historic data. When the lists are used to update the master database, each item must update the index associated with each UserName and with each TeamNo (and other things). This means there are lots of accesses to random parts of the master database. If it doesn't all fit in RAM, it takes hours to do the update because of the time required for disk accesses.
If on-line queries are allowed simultaneously, additional accesses would be required to pull data from other portions of the database. Also there are issues related to the required record locking. Simultaneous query access could easily extend the ~one hour update to ~2 hours plus. Since the next update is scheduled to start in two hours, the updating process would never finish.
The only real solution to this problem is money for new or expanded hardware.
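To make the access pattern described above concrete, here is a minimal sketch of one update pass. It is not the actual stats code: the record fields (`user_name`, `team_no`, `credit`) and the function `apply_batch` are made up for illustration, and plain dictionaries stand in for the per-user and per-team indexes held in RAM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnedWU:
    """One entry in a work server's list of returned WUs (fields are assumed)."""
    user_name: str
    team_no: int
    credit: int


def apply_batch(batch, user_totals, team_totals):
    """Fold one batch of returned WUs into the in-memory master totals."""
    for wu in batch:
        # Each item touches two unrelated index entries, so a large batch
        # ends up reading and writing scattered parts of the master data.
        user_totals[wu.user_name] += wu.credit
        team_totals[wu.team_no] += wu.credit


user_totals = defaultdict(int)
team_totals = defaultdict(int)
batch = [
    ReturnedWU("alice", 1, 100),
    ReturnedWU("bob", 2, 250),
    ReturnedWU("alice", 1, 50),
]
apply_batch(batch, user_totals, team_totals)
print(dict(user_totals))
print(dict(team_totals))
```

With everything in memory each lookup is a cheap hash access. If the same indexes live on disk, every one of those scattered lookups can turn into a seek, which is where the hours go.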
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
- Hardware configuration: Machine #1:
Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).
Machine #2:
Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.
Machine 3:
Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32
I am currently folding just on the 5x GTX 460's for approx. 70K PPD
- Location: Salem, OR USA
Re: new stats system?
What it sounds like you need is a couple of the new Intel SSD's.
-
- Posts: 174
- Joined: Sun Nov 16, 2008 6:41 am
Re: new stats system?
i was wondering more about this portion of the message
In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.
Re: new stats system?
bruce wrote:...The only real solution to this problem is money for new or expanded hardware.
This is not intended as a "you guys are stupid" or "I'm smarter than you guys" type of comment.
As someone who works with databases on a full time basis, I would suggest that the real solution lies not in hardware but in a better approach to handling how the stats are updated.
There are several techniques which could provide significant benefit, eliminating the downtime for the public stats pages.
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: new stats system?
ParrLeyne wrote:bruce wrote:...The only real solution to this problem is money for new or expanded hardware.
This is not intended as a "you guys are stupid" or "I'm smarter than you guys" type of comment.
As someone who works with databases on a full time basis, I would suggest that the real solution lies not in hardware but in a better approach to handling how the stats are updated.
There are several techniques which could provide significant benefit, eliminating the downtime for the public stats pages.
Thanks for your feedback. Over the years, we've had db experts look over the stats system (and actually we deal with databases on a full time basis ourselves, since data mining is critical to our science). They have given some tips and optimizations here and there and we have employed them in the current system. The main problem is that we track *a lot* of data and the experts have consistently said that we should keep less data (smaller db's) or get beefier hardware. There are lots of WUs to update (routinely 25,000 per hour of running) and each update must be checked for several issues (e.g. to avoid cheating); also, these aren't simple INSERT's, but require various math operations to be done in various db's to update the data. These checks, etc. in particular slow down the stats update if multiple db's cannot reside simultaneously in memory.
Right now, the stats system is running on pretty old hardware -- circa 2003 -- and the box we are using wasn't even all that amazing at the time, so we can fairly simply get a pretty big boost by upgrading to even fairly modest modern hardware.
However, with that said, if you have some specific suggestions, I'd be curious to hear them.
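As a rough back-of-the-envelope check on the numbers quoted in this thread, the sketch below combines the 25,000 WUs per hour figure with the 2 hour cycle and the roughly 45 minute update window from the status notice. Treating those three numbers as belonging to the same update run is an assumption made here for illustration, not something the thread states.

```python
WUS_PER_HOUR = 25_000      # "routinely 25,000 per hour of running"
CYCLE_HOURS = 2            # stats updates run in 2 hour cycles
UPDATE_WINDOW_MIN = 45     # expected duration of one update

# WUs that accumulate between two consecutive updates.
wus_per_cycle = WUS_PER_HOUR * CYCLE_HOURS

# Time budget per WU if the whole backlog must clear inside the window.
window_seconds = UPDATE_WINDOW_MIN * 60
wus_per_second = wus_per_cycle / window_seconds
ms_per_wu = 1000 / wus_per_second

print(f"{wus_per_cycle} WUs per cycle")
print(f"{wus_per_second:.1f} WUs/s, about {ms_per_wu:.0f} ms per WU")
```

Under those assumptions each WU gets on the order of 54 ms for all of its checks and index updates. That budget is comfortable when the data is in memory and tight once several disk accesses per WU are involved.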
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
-
- Posts: 174
- Joined: Sun Nov 16, 2008 6:41 am
Re: new stats system?
guys?
spazzychalk wrote:i was wondering more about this portion of the message
In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.
Re: new stats system?
spazzychalk wrote:guys?
spazzychalk wrote:i was wondering more about this portion of the message
In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.
What were you wondering? (I think I already answered it -- "money".)
-
- Posts: 174
- Joined: Sun Nov 16, 2008 6:41 am
Re: new stats system?
i guess im just really interested in how it will work. it sounds really neat. and 20 gigs of ram is crazy (to someone who doesnt know anything about networking). so you guys have the money then and its being done? what will the stats be like then? when do you think itll happen? will this be new or expanded? is it going to be state of the art or midline like what prof pande mentioned you have now? how long will the hardware be good for before needing to be upgraded again considering the snowballing increase in wu coming back to you? there are worse problems to have i suppose
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: new stats system?
Dr. Pande:
Has your group looked into distributed/parallel processing for your DB?
Do you have a particular budget (limitation or requirement) in mind?
I'm not entirely sure what it would take, but I would think that something along the lines of a much much much much much smaller version of Google's MPP and distributed computing system would be in order using the latest blade servers.
However, those tend to be rather expensive.
Like P5-133XL was saying, I was doing some reading about Sun's new storage servers and how they use CF SSDs to augment and cache frequently accessed files. So that might be a way to go about that. (And you can always add more CF SSD capacity or tie them together into a virtualized storage pool, with RAID and redundancy built into it).
But if you want to run everything in RAM, I suppose it's doable, but then you'd be looking at quad socket quad-core processors and a miniature implementation of an MPP DB.
*edit*
If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).
-
- Posts: 1024
- Joined: Sun Dec 02, 2007 12:43 pm
Re: new stats system?
alpha754293 wrote:Have your group looked into distributed/parallel processing for your DB?
Each update requires locking a portion of the DB, doing the update, then unlocking it. MPs lock greater portions simultaneously, and if the locks overlap, parallel updates often become serial updates anyway.
If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).
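The point about overlapping locks can be illustrated with a small deterministic sketch. It is not how any particular database schedules work: `schedule_rounds` is a made-up helper that greedily packs updates into "rounds", where updates in the same round hold disjoint locks and could therefore run in parallel. The lock names (`user:...`, `team:...`) are likewise assumptions.

```python
def schedule_rounds(lock_sets):
    """Greedily pack updates into rounds whose lock sets do not overlap."""
    rounds = []  # each entry: (list of update indexes, set of keys held)
    for i, keys in enumerate(lock_sets):
        for members, held in rounds:
            if held.isdisjoint(keys):
                # No conflict with anything already in this round.
                members.append(i)
                held.update(keys)
                break
        else:
            # Conflicts with every existing round: needs a round of its own.
            rounds.append(([i], set(keys)))
    return [members for members, _ in rounds]


# Three updates that each lock only their own user row.
disjoint = [{"user:alice"}, {"user:bob"}, {"user:carol"}]

# The same updates when every user is on the same team, so each one
# also has to lock the shared team row.
overlapping = [
    {"user:alice", "team:1"},
    {"user:bob", "team:1"},
    {"user:carol", "team:1"},
]

print(schedule_rounds(disjoint))
print(schedule_rounds(overlapping))
```

In the first case all three updates fit in one round. In the second, the shared team row forces one update per round, so adding processors buys nothing for that batch.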
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: new stats system?
codysluder wrote:alpha754293 wrote:Have your group looked into distributed/parallel processing for your DB?
Each update requires locking a portion of the DB, doing the update, then unlocking it. MPs lock greater portions simultaneously, and if the locks overlap, parallel updates often become serial updates anyway.
If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).
I don't know.
I just remember watching a lecture from I think one of the VPs of Engineering from Google and they talked about their parallel/distributed processing platform and how their search engine (on a very very very broad scale) works.
Mind you, they have like a really really BIG MPP system going on, but I figure that even in much smaller implementations, perhaps it's something to think about.
That way, if the stats server is really one that just coordinates the slave nodes, perhaps it'd be easier to update a large number of really tiny pieces of the puzzle than a small number of really large chunks.
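The "many tiny pieces" idea could look something like the sketch below, which splits records by user name so that each piece can be totalled on its own and merged afterwards. This is only an illustration of hash partitioning in general: the record layout, the function names and the choice of `zlib.crc32` as a stable hash are all assumptions, and nothing here reflects how the real stats server is organised.

```python
import zlib
from collections import Counter


def shard_for(user_name, n_shards):
    """Map a user name to a shard with a hash that is stable across runs."""
    return zlib.crc32(user_name.encode("utf-8")) % n_shards


def partition(records, n_shards):
    """Split (user_name, credit) records so each user lands in one shard."""
    shards = [[] for _ in range(n_shards)]
    for user_name, credit in records:
        shards[shard_for(user_name, n_shards)].append((user_name, credit))
    return shards


def aggregate(shard):
    """Total the credit per user inside one shard; needs no other shard."""
    totals = Counter()
    for user_name, credit in shard:
        totals[user_name] += credit
    return totals


records = [("alice", 100), ("bob", 250), ("alice", 50), ("carol", 75)]
shards = partition(records, n_shards=3)

# Each shard could be handed to a different node; here they run in turn.
merged = Counter()
for shard in shards:
    merged.update(aggregate(shard))
print(dict(merged))
```

Per-user totals split cleanly this way because a user never spans two shards. Per-team totals do not, since one team's members are spread over many shards, which is the same overlap problem raised earlier in the thread.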
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: new stats system?
There again, consider what Vijay said above. Fah data is not simple data. It isn't just numbers on a really big spreadsheet like what I assume most of the Google data looks like (storing lots of search terms, and URLs, advert links, etc.). Work units tie together in strings to form larger units. Strings of data tie together to form vectors. And each update triggers several procedures to be run on the data. And Fah doesn't have the hardware budget that Google does either. I'd also guess there may be a process or legacy issue or two that even a hardware upgrade won't help to improve.
And with any upgrade project, one has to not only weigh the time and cost of the upgrade, but also the cost of a potential interruption in the project if the upgrade doesn't go as planned. Then weigh all that against the perceived benefit of the upgrade. Also consider that several papers have been or are about to be published, and this may not be the time to risk or take on the extra work of an upgrade. Things are never as simple as they seem.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: new stats system?
Apparently, the Google engine is a parallel/distributed search engine and there are two parts to it. One is like the whole internet "indexing" thing. And the other part is where it actually also searches within the documents themselves (the doc server I think).
Well, I'm not suggesting that F@H go out and buy like a multi-million dollar server farm. But if you extract the framework topology and then apply it, even on a very small scale (proportional to the size of the database if at all possible), then, while I realize that they're not exactly the same, I would think that there'd be some kind of parallelization.
I don't know. All I'm saying is that IF it can be done, it might not be a bad model to follow.
-
- Posts: 174
- Joined: Sun Nov 16, 2008 6:41 am
Re: new stats system?
google has even talked about launching their servers into space to protect privacy. government jurisdiction and all