
ideas on speeding up stats

Posted: Mon May 12, 2008 2:46 pm
by VijayPande
We have been looking at how we can speed up our stats updates. Looking at our db, we see that there's a primary culprit behind slow stats updates: the "Contributions by team and project" pages. These are the pages like

http://fah-web.stanford.edu/cgi-bin/mai ... range=4000

We think we should stop updating these pages in order to keep the stats updates running much more quickly. We have some ideas for how to get similar functionality from other tables we have. The replacement would not list the number of WU's by project as the pages do now, but would allow donors to query a project number to find out how many WU's were completed.
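As a rough illustration of the on-demand lookup described above, here is a minimal sketch. It assumes a hypothetical `results` table with `donor`, `team`, and `project` columns; the real FAH stats schema is not described in this thread, and the donor names and project numbers are made up.

```python
import sqlite3

def wus_for_project(conn, donor, project):
    """Count completed WUs for one donor and one project number, on demand."""
    row = conn.execute(
        "SELECT COUNT(*) FROM results WHERE donor = ? AND project = ?",
        (donor, project),
    ).fetchone()
    return row[0]

# Hypothetical sample data, held in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (donor TEXT, team INTEGER, project INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("alice", 1, 2653), ("alice", 1, 2653), ("alice", 1, 3062), ("bob", 1, 2653)],
)

print(wus_for_project(conn, "alice", 2653))  # prints 2
```

The point of the sketch is that the count is computed only when a donor asks for it, so nothing per-project has to be regenerated on every stats update.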

Slow stats updates are going to be a problem unless we deal with this now, so we'd like to get moving on this. Please give us your opinion over the next week. Thanks!

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 3:39 pm
by Ren02
I seldom use these pages. Mainly for looking up projects I folded a while ago, so I probably won't remember the exact project numbers unless I see them. These pages definitely add to the FAH experience though.

Can I suggest another option?
Would it help your servers if these stats are updated less frequently, like once per week for instance?

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 5:14 pm
by toTOW
I use these pages ... I'd like to be able to keep using this functionality, but I don't care if they're updated less frequently than the other stats ...

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 5:42 pm
by Foxery
Does the code process everyone in the database every time, or is it able to only spend CPU cycles on donors who have submitted new results, and skip over inactive ones? I know you don't like to edit the database, but perhaps you could somehow separate donors and teams who have not been heard from in a very long time, such that very old results are archived, but not part of the processing. These unchanging data points would also not be packaged with the Lists that 3rd party sites download every day.
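A minimal sketch of the "only touch donors with new results" idea above. The data layout and every name here are hypothetical; nothing in this thread says how the actual stats code or database is organized.

```python
def apply_new_results(totals, new_results):
    """Update running totals using only the donors that appear in new_results.

    totals: dict mapping donor name -> {"points": int, "wus": int} (hypothetical layout)
    new_results: iterable of (donor, points) pairs returned since the last update
    Returns the set of donors whose totals changed.
    """
    changed = set()
    for donor, points in new_results:
        # Create an entry for first-time donors; existing entries are reused.
        entry = totals.setdefault(donor, {"points": 0, "wus": 0})
        entry["points"] += points
        entry["wus"] += 1
        changed.add(donor)
    return changed

totals = {
    "active_donor": {"points": 1000, "wus": 10},
    "inactive_donor": {"points": 500, "wus": 5},
}
changed = apply_new_results(totals, [("active_donor", 150), ("new_donor", 80)])
print(sorted(changed))
```

The work done is proportional to the number of new results, not the number of donors ever seen, and the returned set says which rows or pages would need regenerating. Inactive donors are never visited.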

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 6:20 pm
by v00d00
Ideas:

1. Update the main stats page once per week as suggested (Sunday morning 3am??).
2. Update main stats once per day, and make it possible for the FAH client to download the relevant team/user data when it picks up a workunit, ready to be displayed in the MyFolding.html file. Maybe as an option in the cfg.
3. Make only short stats. The one only open to top 2k teams.
4. Do away with the stats pages and add more detail to the daily_user/team files, then create an offline reader to parse them so those that want to read their stats can.

Personally I use EOC more than I use Stanford.

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 8:47 pm
by uncle fuzzy
I do use those pages, but you can dump them if you provide that alternate look-up system.

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 9:16 pm
by patermann
I regularly use the (team member) summary page to see the number of points and work units that I have contributed to the team (and to print off certificates for major milestones!) but I do not use the breakdown by project at all. Hope this helps!

Note: Obviously I can get the statistics from third-party sites but they tend to lag behind Stanford and I cannot get the certificates from anywhere else (as far as I know)! ;)

Re: ideas on speeding up stats

Posted: Mon May 12, 2008 10:16 pm
by P5-133XL
The only reason I access the Stanford stats is to check the number of CPU's. What I'd like is the ability to search the database to see the result of what was turned in so I could compare logs vs points/results: sometimes what FAHMon says is my PPD is not even close to what the stats say I'm getting and I have too many machines folding to keep track or even feel comfortable in asking someone to look them all up.

Re: ideas on speeding up stats

Posted: Tue May 13, 2008 3:33 am
by Tarx
I use the team page list with all the team members (not the one with just the first 1000 members) - that is very important, but for my purposes, updating it once a day would be all I need.

As for the project numbers & WU completed - I've very rarely checked what projects I've done (haven't for over a year except for yesterday when I checked on what the GPU2 client had worked on) so once a week would be fine, but entering the project number to see how much was done would be ok too.

Re: ideas on speeding up stats

Posted: Tue May 13, 2008 4:04 pm
by smartcat99s
I like looking at them every once in a while. Would it be possible to only update those pages once a day/every other day?

Re: ideas on speeding up stats

Posted: Tue May 13, 2008 5:25 pm
by Mactin
Please don't get rid of them.
I keep an extensive personal stats record, and doing so counts a lot towards my folding pleasure. It adds colour to my folding.

The speed issue is no big deal; having stats every 3 hours instead of 2 would be OK.

The most important thing with stats is that they reflect the folding done. After 13 months of folding, 19 WUs (1.6%) are uncredited; more reliability should be aimed for.

Re: ideas on speeding up stats

Posted: Wed May 14, 2008 4:53 am
by BillR
First, I question why this poll exists. If anything the stats site should include more information and it should update more often.

As for speed I find it hard to believe that a program such as folding supported by Stanford University can’t afford a proper server. I also find it even harder to believe that with all the computer courses Stanford has a proper programmer can’t be found to maintain not only the servers but help with the code problems as well.

Just today I read this:
kasson wrote:We try to be as transparent as possible with released projects, code, etc. We don't like publicizing things that are still under development precisely because of the fluid nature of that development--plans change, projects get delayed, we find bugs, etc. But requests for more communication are understandable and appreciated.

One note about the quad-core issue: the performance of A1 work units on quad-core machines is something that took us by surprise. We expected much more efficient utilization. We've been working very hard to improve this, and we anticipate releasing an update to the A2 core in the near future that has very close to full utilization of all four (or more) cores. [One of our rare pre-release announcements.]

One other response: we understand that many folders use points yield as a way of assessing the scientific impact of their contributions. We try to keep things as consistent as we can, but there are challenges both of inter-machine variation and of balancing points/effort and points/science.
How on earth can a staff member be surprised by something that was supposed to be tested?

If anything, we the folders need more accountability, not less.

Re: ideas on speeding up stats

Posted: Wed May 14, 2008 5:21 am
by alancabler
BillR wrote:As for speed I find it hard to believe that a program such as folding supported by Stanford University can’t afford a proper server. I also find it even harder to believe that with all the computer courses Stanford has a proper programmer can’t be found to maintain not only the servers but help with the code problems as well.
Do you have any idea of the costs associated with this program? Perhaps your endeavors aren't limited by finances, or by trying to figure out how to do something that's never been done before. If so, then why don't you write a check...
BillR wrote:How on earth can a staff member be surprised by something that was supposed to be tested?
I don't think you understand the issue... Stanford was surprised to find (during testing, since this had never been done before) how the Quad processors under-performed. There weren't Quad processors before, either. Who knew? This is all cutting-edge stuff. "Surprises" abound.

Ps In case you hadn't noticed, some of the most (if not The Most) illustrious names in modern programming are hard at work on various FAH clients and cores and myriad other FAH issues.
If you know of any Stanford (or other) students/coders who are competent in Molecular Dynamics and SMP (or GPU or just DC) programming, just let Pande Group know... no wait, they're there already.

Pps Sheesh

Re: ideas on speeding up stats

Posted: Wed May 14, 2008 3:32 pm
by Foxery
BillR wrote:First, I question why this poll exists. If anything the stats site should include more information and it should update more often.

As for speed I find it hard to believe that a program such as folding supported by Stanford University can’t afford a proper server. I also find it even harder to believe that with all the computer courses Stanford has a proper programmer can’t be found to maintain not only the servers but help with the code problems as well.
Stats have nothing to do with folding proteins. Pande Group also isn't made of money.

They're trying to improve performance by dropping an unpopular feature. Did you sign up here just to crap on everything? Go away.

Re: ideas on speeding up stats

Posted: Wed May 14, 2008 4:19 pm
by John Naylor
BillR wrote:As for speed I find it hard to believe that a program such as folding supported by Stanford University can’t afford a proper server.
http://fah-web.stanford.edu/serverstat.html

The Pande Group's main interest is in making sure that the whole network has enough units to keep processing consistently. The money they have goes on new servers to feed the clients with units (see the link above to get an idea of how many servers that requires), rather than stats. For some people the stats are important, but without the work they would be irrelevant because nothing would happen. The Pande Group is more bothered (quite rightly IMO) about furthering the science than about spending endless hours tracking down tiny little bugs. On the whole, this works well, and only recently have there been any genuinely major problems. At the same time, there have been unit shortages so their efforts have gone back into creating new units to keep 270,000+ machines happy and folding. Give them a break.

As for the cores not working as expected, they are still in testing. Ever wondered why the clients have "beta" in the version names? The SMP client is in beta, and always has been. It wasn't like they released the A1 core as a full, finished product and were surprised by the performance then - the methods are still being refined. Properly multithreaded code is still in its infancy, and this is bleeding-edge programming. If anything the Pande Group should be congratulated for even attempting this, never mind getting near-perfect CPU usage at only the second attempt (the new A2 core), as the beta team are reporting.