new stats system?

spazzychalk · Post by **spazzychalk** » Sun Jan 04, 2009 4:34 pm

did i read this as youre going to have 20 gigs of ram running for stats? im really interested in hearing in layman version how this works on a technical level

Stats update in progress
External access to the Folding@home stats server is currently not available in order to expedite the stats input process. This was started on the hour and we expect this to take about 45 minutes. The stats updates run in 2 hour cycles.

In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.

Third party stats systems work independently of this database and can be queried at this time. See this FAH Wiki page for a list of third party stats systems.

For more information
For more information, please see http://folding.typepad.com/ for news and more details or check out our forum at http://foldingforum.org.

Post by **bruce** » Mon Jan 05, 2009 6:43 am

I'm not sure what your question is.

The individual work servers create lists of the WUs that have been returned along with the UserName, TeamNo, etc. The master database contains all of the historic data. When the lists are used to update the master database, each item must update the index associated with each UserName and with each TeamNo (and other things). This means there are lots of accesses to random parts of the master database. If it doesn't all fit in RAM, it takes hours to do the update because of the time required for disk accesses.
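To get a feel for why random disk accesses dominate, here is a rough back-of-envelope sketch. Every number in it is an illustrative assumption (batch size, accesses per WU, and latencies are not measured values from the FAH stats server); only the shape of the arithmetic matters.

```python
def update_time_seconds(work_units, accesses_per_wu, seconds_per_access):
    """Total time if every index/record access pays the full access latency."""
    return work_units * accesses_per_wu * seconds_per_access

# All values below are assumptions chosen for illustration only.
WORK_UNITS = 50_000        # WUs in one update batch (assumed)
ACCESSES_PER_WU = 20       # random reads/writes per WU across indexes (assumed)
DISK_SEEK_S = 0.008        # ~8 ms per random access on a spinning disk (assumed)
RAM_ACCESS_S = 0.0000001   # ~100 ns per random access in RAM (assumed)

disk_seconds = update_time_seconds(WORK_UNITS, ACCESSES_PER_WU, DISK_SEEK_S)
ram_seconds = update_time_seconds(WORK_UNITS, ACCESSES_PER_WU, RAM_ACCESS_S)

print(f"disk-bound: {disk_seconds / 3600:.1f} hours")
print(f"RAM-bound:  {ram_seconds:.2f} seconds")
```

Under these assumed figures the disk-bound case lands in the range of hours while the in-RAM case takes well under a second, which is the gap the post describes. Real systems cache part of the data, so the true disk-bound figure would fall somewhere in between.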

If on-line queries are allowed simultaneously, additional accesses would be required to pull data from other portions of the database. Also there are issues related to the required record locking. Simultaneous query access could easily extend the ~one hour update to ~2 hours plus. Since the next update is scheduled to start in two hours, the updating process would never finish.
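The "never finish" point is simple arithmetic, sketched below. The function name and the 130-minute figure are hypothetical; the 2-hour cycle and the roughly one-hour update come from the posts above.

```python
def backlog_after(cycles, cycle_minutes, update_minutes):
    """Minutes of unfinished update work left over after a number of cycles."""
    backlog = 0.0
    for _ in range(cycles):
        # Each cycle adds one update's worth of work and offers one cycle of time.
        backlog = max(0.0, backlog + update_minutes - cycle_minutes)
    return backlog

# Update finishes inside the 2-hour cycle: nothing piles up.
print(backlog_after(cycles=12, cycle_minutes=120, update_minutes=60))
# Update slowed (hypothetically) to 130 minutes: the backlog grows every cycle.
print(backlog_after(cycles=12, cycle_minutes=120, update_minutes=130))
```

As soon as one update takes longer than the cycle length, each cycle leaves a little more work behind than the one before it.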

The only real solution to this problem is money for new or expanded hardware.

P5-133XL · Post by **P5-133XL** » Mon Jan 05, 2009 7:33 am

What it sounds like you need is a couple of the new Intel SSD's.

spazzychalk · Post by **spazzychalk** » Mon Jan 05, 2009 5:19 pm

i was wondering more about this portion of the message

In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.

ParrLeyne · Post by **ParrLeyne** » Mon Jan 05, 2009 6:36 pm

bruce wrote:...The only real solution to this problem is money for new or expanded hardware.

This is not intended as a "you guys are stupid" or "I'm smarter than you guys" type of comment.

As someone who works with databases on a full time basis, I would suggest that the real solution lies not in hardware but in a better approach to handling how the stats are updated.

There are several techniques which could provide significant benefit, eliminating the downtime for the public stats pages.

Post by **VijayPande** » Mon Jan 05, 2009 6:45 pm

ParrLeyne wrote:
bruce wrote:...The only real solution to this problem is money for new or expanded hardware.
This is not intended as a "you guys are stupid" or "I'm smarter than you guys" type of comment.

As someone who works with databases on a full time basis, I would suggest that the real solution lies not in hardware but in a better approach to handling how the stats are updated.

There are several techniques which could provide significant benefit, eliminating the downtime for the public stats pages.

Thanks for your feedback. Over the years, we've had db experts look over the stats system (and actually we deal with databases on a full time basis ourselves, since data mining is critical to our science). They have given some tips and optimizations here and there and we have employed them in the current system. The main problem is that we track *a lot* of data and the experts have consistently said that we should keep less data (smaller db's) or get beefier hardware. There are lots of WUs to update (routinely 25,000 per hour of running) and each update must be checked for several issues (e.g., to avoid cheating); also, these aren't simple INSERTs, but require various math operations to be done in various db's to update the data. These checks, etc. in particular slow down the stats update if multiple db's cannot reside simultaneously in memory.
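As a minimal sketch of why one returned WU is more than a single INSERT, the example below uses Python's built-in `sqlite3` with an in-memory database. The table names, column names, and the duplicate-WU check are all hypothetical stand-ins; the actual FAH schema and anti-cheating checks are not described in this thread.

```python
import sqlite3

def make_db():
    """Create a toy in-memory stats database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE credited_wus (wu_id TEXT PRIMARY KEY);
        CREATE TABLE user_totals (user_name TEXT PRIMARY KEY, points REAL, wus INTEGER);
        CREATE TABLE team_totals (team_no INTEGER PRIMARY KEY, points REAL, wus INTEGER);
    """)
    return db

def credit_wu(db, wu_id, user_name, team_no, points):
    """Apply one returned WU. Returns False if it was already credited."""
    with db:  # one transaction: commits on success, rolls back on error
        # Check first (stand-in for the real validity checks): reject duplicates.
        seen = db.execute(
            "SELECT 1 FROM credited_wus WHERE wu_id = ?", (wu_id,)
        ).fetchone()
        if seen:
            return False
        db.execute("INSERT INTO credited_wus VALUES (?)", (wu_id,))
        # Then touch every aggregate that depends on this WU.
        for table, key_col, key in (
            ("user_totals", "user_name", user_name),
            ("team_totals", "team_no", team_no),
        ):
            cur = db.execute(
                f"UPDATE {table} SET points = points + ?, wus = wus + 1 "
                f"WHERE {key_col} = ?",
                (points, key),
            )
            if cur.rowcount == 0:  # first WU for this user/team
                db.execute(f"INSERT INTO {table} VALUES (?, ?, 1)", (key, points))
    return True

db = make_db()
credit_wu(db, "wu-1", "alice", 1, 100.0)
credit_wu(db, "wu-1", "alice", 1, 100.0)  # duplicate, rejected
credit_wu(db, "wu-2", "bob", 1, 50.0)
print(db.execute("SELECT * FROM team_totals").fetchall())
```

Even in this toy version, each WU causes a lookup plus reads and writes in three separate tables, each keyed on a different value, so the accesses land all over the data rather than at the end of one table.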

Right now, the stats system is running on pretty old hardware -- circa 2003 -- and the box we are using wasn't even all that amazing at the time, so we can get a pretty big boost simply by upgrading to even something modest.

However, with that said, if you have some specific suggestions, I'd be curious to hear them.

spazzychalk · Post by **spazzychalk** » Mon Jan 05, 2009 11:39 pm

guys?

spazzychalk wrote:i was wondering more about this portion of the message

In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.

Post by **bruce** » Tue Jan 06, 2009 5:15 am

spazzychalk wrote:guys?

spazzychalk wrote:i was wondering more about this portion of the message

In 2009, we will be upgrading our stats server such that the whole database will fit in RAM (this used to be the case, but FAH has gotten quite big over the last few years and the related databases require over 20GB). We expect that this new stats server should greatly speed up stats updates.

What were you wondering? (I think I already answered it -- "money".)

spazzychalk · Post by **spazzychalk** » Tue Jan 06, 2009 5:36 am

i guess im just really interested in how it will work. it sounds really neat. and 20 gigs of ram is crazy (to someone who doesnt know anything about networking). so you guys have the money then and its being done? what will the stats be like then? when do you think itll happen? will this be new or expanded? is it going to be state of the art or midline like what prof pande mentioned you have now? how long will the hardware be good for before needing to be upgraded again considering the snowballing increase in wu coming back to you? there are worse problems to have i suppose

alpha754293 · Post by **alpha754293** » Fri Jan 30, 2009 4:41 pm

Dr. Pande:

Has your group looked into distributed/parallel processing for your DB?

Do you have a particular budget (limitation or requirement) in mind?

I'm not entirely sure what it would take, but I would think that something along the lines of a much much much much much smaller version of Google's MPP and distributed computing system would be in order using the latest blade servers.

However, those tend to be rather expensive.

Like P5-133XL was saying, I was doing some reading about Sun's new storage servers and how they use CF SSDs to augment and cache frequently accessed files. So that might be a way to go about that. (And you can always add more CF SSD capacity or tie them together into a virtualized storage pool, with RAID and redundancy built into it).

But if you want to run everything in RAM, I suppose it's doable, but then you'd be looking at quad-socket quad-core processors and a miniature implementation of an MPP DB.

*edit*
If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).

codysluder · Post by **codysluder** » Tue Feb 03, 2009 8:03 pm

alpha754293 wrote:Has your group looked into distributed/parallel processing for your DB?

If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).

Each update requires locking a portion of the DB, doing the update, then unlocking it. MPs lock greater portions simultaneously, and if the locks overlap, parallel updates often become serial updates anyway.
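The locking point can be sketched with a toy scheduler. It assumes, purely for illustration, that each update must hold a lock on its user record and its team record, and it greedily packs updates into "rounds" in which no two updates need the same lock. The function and the sample data are hypothetical, not how the FAH stats server works.

```python
def schedule_rounds(updates):
    """Greedily group (user, team) updates into rounds with disjoint locks."""
    rounds = []  # each entry: (set of locked keys, list of updates in the round)
    for update in updates:
        user, team = update
        keys = {("user", user), ("team", team)}
        for locked, members in rounds:
            if locked.isdisjoint(keys):  # no lock conflict: can run in parallel
                locked |= keys
                members.append(update)
                break
        else:
            rounds.append((set(keys), [update]))
    return [members for _, members in rounds]

# Three users on three different teams: everything fits in one parallel round.
print(len(schedule_rounds([("a", 1), ("b", 2), ("c", 3)])))
# Three users on the same team: the shared team lock forces three serial rounds.
print(len(schedule_rounds([("a", 1), ("b", 1), ("c", 1)])))
```

When many updates touch the same large team, adding processors does little, because those updates queue up behind the same lock no matter how many workers are available.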

alpha754293 · Post by **alpha754293** » Tue Feb 03, 2009 8:55 pm

codysluder wrote:
alpha754293 wrote:Has your group looked into distributed/parallel processing for your DB?

If money is a real issue (as in there's no money for new hardware at all), I suppose that you might be able to scrounge around Stanford for old hardware and cluster the systems together to do the stats update. It certainly won't be fast or efficient, but it'd be cheap. (at least in theory anyways).
Each update requires locking a portion of the DB, doing the update, then unlocking it. MPs lock greater portions simultaneously, and if the locks overlap, parallel updates often become serial updates anyway.

I don't know.

I just remember watching a lecture from I think one of the VPs of Engineering from Google and they talked about their parallel/distributed processing platform and how their search engine (on a very very very broad scale) works.

Mind you, they have like a really really BIG MPP system going on, but I figure that even in much smaller implementations, perhaps it's something to think about.

That way, if the stats server is really one that just coordinates the slave nodes, perhaps it'd be easier to update a large number of really tiny pieces of the puzzle than a small number of really large chunks.

7im · Post by **7im** » Wed Feb 04, 2009 1:13 am

There again, consider what Vijay said above. Fah data is not simple data. It isn't just numbers on a really big spreadsheet like what I assume most of the Google data looks like (storing lots of search terms, and URLs, advert links, etc.). Work units tie together in strings to form larger units. Strings of data tie together to form vectors. And each update triggers several procedures to be run on the data. And Fah doesn't have the hardware budget that Google does either. I'd also guess there may be a process or legacy issue or two that even a hardware upgrade won't help to improve.

And with any upgrade project, one has to not only weigh the time and cost of the upgrade, but also the cost of a potential interruption in the project if the upgrade doesn't go as planned. Then weigh all that against the perceived benefit of the upgrade. Also consider that several papers have been or are about to be published, and this may not be the time to risk or take on the extra work of an upgrade. Things are never as simple as they seem.

alpha754293 · Post by **alpha754293** » Wed Feb 04, 2009 4:09 am

Apparently, the Google engine is a parallel/distributed search engine and there are two parts to it. One is like the whole internet "indexing" thing. And the other part is where it actually also searches within the documents themselves (the doc server I think).

Well, I'm not suggesting that F@H go out and buy like a multi-million dollar server farm. But if you extract the framework topology and then apply it, even on a very small scale (proportional to the size of the database if at all possible), I would think that there'd be some kind of parallelization, though I realize that they're not exactly the same.

I don't know. All I'm saying is that IF it can be done, it might not be a bad model to follow.

spazzychalk · Post by **spazzychalk** » Wed Feb 04, 2009 3:19 pm

google has even talked about launching their servers into space to protect privacy. government jurisdiction and all
