Bigadv reform
Posted: Wed May 30, 2012 5:53 am
by TonyStewart14
While looking up hardware minimums for bigadv WUs, I
noticed a post pointing out that a machine with 16 physical cores could have a TPF of 45 minutes when a TPF of 34:34 was needed to make the deadline. So the machine, despite meeting the core requirement, was not even close to meeting the deadline. I realized the 16-core minimum is probably misleading, since it suggests that those with machines that have at least 16 cores, especially all physical cores, can meet the requirement, when that is not at all the case.
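To put numbers on that gap, here is a minimal back-of-the-envelope sketch (Python, assuming the usual 100 frames per bigadv WU) that turns a TPF into a projected completion time and compares it with the deadline implied by the 34:34 figure:

Code: Select all

# Rough TPF-vs-deadline check (assumes the standard 100 frames per bigadv WU).
def projected_hours(tpf_minutes, frames=100):
    """Projected wall-clock time to finish a WU at a given time per frame."""
    return tpf_minutes * frames / 60.0

required_tpf = 34 + 34 / 60.0   # 34:34 in minutes
actual_tpf = 45.0               # the 16-core machine in question

deadline_hours = projected_hours(required_tpf)   # ~57.6 h implied deadline
machine_hours = projected_hours(actual_tpf)      # ~75.0 h projected finish

print(f"deadline ~{deadline_hours:.1f} h, projected ~{machine_hours:.1f} h")
print("meets deadline" if machine_hours <= deadline_hours
      else f"misses by ~{machine_hours / deadline_hours - 1:.0%}")

That machine would overshoot the implied deadline by roughly 30%, which is what prompted this thread.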
My initial thought was to suggest increasing the minimum to 24 cores, as this would make it clearer how much computing power one would need to have a reasonable expectation of meeting bigadv deadlines. That way people aren't duped into buying a 16-core machine only to realize it doesn't come close to being enough for what they hoped to do with it.
I also thought about what many have suggested already, which is to eliminate core requirements altogether. That way, as long as one can meet the deadlines, the cores are irrelevant. This also addresses the issue of the core minimum being a misleading indicator of what suffices as a bigadv machine, since one would then have to look up what hardware actually qualifies (at this point, dual 6172s or some dual-hex Xeons) rather than assume based on a core count. As the people who implemented the core minimum admit that the number of cores is a very rough way of telling how fast a system is, this approach is likely best. A 24-core minimum, however, would give prospective bigadv users a better idea of what would be needed to enter the bigadv realm, and it seems to me that pretty much all systems that are bigadv-capable have at least 24 cores.
Since there are multiple solutions, I decided to make a poll where you can choose whether you'd prefer to keep the minimum the same (at 16 cores), increase it to 24, remove the core minimum entirely, or offer a solution of your own (if the latter, please post it so we can take it into consideration as well). Although bigadv has been under constant scrutiny for some time, I feel that this poll will enable a more productive discussion and ultimately a better chance at reforming a system so important to the most dedicated folders.
Re: Bigadv reform
Posted: Wed May 30, 2012 6:44 am
by Zagen30
I'll preface this by saying that I haven't folded bigadv in at least a year and a half, and that was on my i7-930, so I'm not as up-to-date on it as others are. I do hope in the future to build an MP bigadv rig, but I don't think it'll be any time soon.
The main issue I see with removing the core count completely is that you could seriously slow down the work if a number of people with really sub-par machines try to fold bigadv without realizing they're incapable of finishing the WUs on time. I know I've seen several relatively new people posting here who have set the bigadv flag on their Phenoms, i5s, and the like, presumably because they heard that bigadv gets you more points and are not aware of how demanding bigadv WUs are. Under the current core-count system that doesn't cause any problems, since those machines aren't given any bigadv work, but if you removed that requirement, how long would those rigs keep failing bigadv work before the owner realized what was happening? I don't know how widespread a problem that would be; my guess is that it's a somewhat self-selecting group of people who are interested enough to dig that deep into the client's options and come here to post about issues they have, so the issue may be overrepresented here compared to the entirety of the FAH community. Still, how many people doing that would it take to significantly hold up bigadv progress?
Are there that many people building 16-core machines with the intent to use them for bigadv? My impression is the vast majority of bigadv folders belong to a serious forum of some sort, and are easily able to find out what is and isn't powerful enough to run it, but since I'm not a member of any major folding community I could be wrong.
Ideally the work servers would be able to cut off machines that were consistently missing the bigadv deadlines, but I don't know how feasible that is, and that still leaves the problem of needing to fail, say, 3 or 5 consecutive bigadv WUs before they decide you're really not powerful enough. I voted for the core minimum increase, but mainly because of my concerns about removing the core min completely.
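For what it's worth, that cutoff could be little more than a per-machine counter on the server side. This is just a sketch of the idea, not anything the work servers actually do, and the threshold of 3 is only a placeholder:

Code: Select all

# Hypothetical server-side rule: stop assigning bigadv after N straight misses.
MAX_CONSECUTIVE_MISSES = 3  # placeholder threshold (the "3 or 5" above)

class MachineRecord:
    def __init__(self):
        self.consecutive_misses = 0
        self.bigadv_eligible = True

    def report_result(self, met_deadline):
        """Update bigadv eligibility from the latest returned result."""
        if met_deadline:
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
            if self.consecutive_misses >= MAX_CONSECUTIVE_MISSES:
                self.bigadv_eligible = False  # fall back to regular SMP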
Re: Bigadv reform
Posted: Wed May 30, 2012 12:44 pm
by iceman1992
Zagen30 wrote:The main issue I see with removing the core count completely is that you could seriously slow down the work if a number of people with really sub-par machines try to fold them without realizing they're incapable of finishing them on time.
How about creating a benchmark people can easily run on their machines, using a copy of a real work unit, for example? You know, something you can just download, unzip, and run for a few moments, and the program will show the results in an easy-to-read format. Has anyone recommended this?
This will create a benchmarking standard so people will find out for sure if their machines are capable. (Or has this actually been done?)
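Something along these lines, maybe: time a small reference slice and extrapolate. This is only a sketch of the idea in Python; the frame counts, deadline, and the stand-in workload are assumptions for illustration, not anything from a real Folding@home tool:

Code: Select all

# Hypothetical standalone benchmark: time a reference slice, extrapolate to a full WU.
import time

REFERENCE_FRAMES = 2     # frames the bundled sample would run (assumed)
FULL_WU_FRAMES = 100     # frames in a full bigadv WU
DEADLINE_HOURS = 57.6    # illustrative preferred deadline from earlier in the thread

def run_reference_slice():
    # Stand-in workload; a real benchmark would crunch part of a sample WU here.
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total

start = time.time()
run_reference_slice()
elapsed_hours = (time.time() - start) / 3600.0

projected = elapsed_hours * FULL_WU_FRAMES / REFERENCE_FRAMES
print(f"Projected completion: {projected:.1f} h "
      f"({'looks bigadv-capable' if projected <= DEADLINE_HOURS else 'too slow'})")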
Re: Bigadv reform
Posted: Wed May 30, 2012 12:58 pm
by jimerickson
@iceman1992: it has been brought up before. see this topic viewtopic.php?f=66&t=21538&p=216497&hilit=benchmark+unit#p216497
Re: Bigadv reform
Posted: Wed May 30, 2012 1:08 pm
by iceman1992
jimerickson wrote:@iceman1992: it has been brought up before. see this topic viewtopic.php?f=66&t=21538&p=216497&hilit=benchmark+unit#p216497
Yes that's about it. But instead of only PPD, the program should also determine whether or not the machine should run a particular unit (uni/SMP/GPU/bigadv/etc.) to make it easy for newcomers.
Re: Bigadv reform
Posted: Wed May 30, 2012 1:35 pm
by 7im
The V7 client already figures out whether to install SMP vs. CPU and/or GPU slots. But bigadv is not intended for "newcomers." Someone with that kind of hardware is, or should be, willing to read the fine print on the bigadv program requirements. Like how 16 cores is a ballpark estimate of the minimum requirements. And how making the deadlines is ALSO a requirement.
And for those of you just joining the party, it was the same when BA was 8 cores. Core count has never been a guarantee of making the deadlines. Some slower 8-core machines couldn't fold BA-8 WUs either. The same is true for SMP with its 2-core minimum.
Changing this is not a priority, and has already been suggested before. Big boys with big toys can figure it out easily enough.
edit for spelling.
Re: Bigadv reform
Posted: Wed May 30, 2012 5:11 pm
by iceman1992
I thought the main reason for the 16-core minimum was to trim the number of bigadv folders? I read a post (can't remember where) saying that too many people were doing bigadv, and PG wanted to do some resource balancing (or something like that) and force the lower-specced bigadv folders to step down to normal SMP.
Bumping it up to 24 so soon after 16 is not so good IMO
Re: Bigadv reform
Posted: Wed May 30, 2012 6:02 pm
by bruce
The problem here is that the Assignment Server detects how many cores you have, not how fast your hardware is. The FAH server has two choices: Assign WUs to machines with 16 cores or NOT.
The real requirement is the deadline, and deadlines can't be adjusted for every machine. Some 16-core machines can meet the deadlines and others cannot. Unfortunately the AS is unable to distinguish between the two, so the choice is between giving WUs to some people who cannot complete them or refusing to assign WUs to some people who could complete them successfully.
Neither is right but Stanford has to choose one or the other.
Re: Bigadv reform
Posted: Wed May 30, 2012 6:29 pm
by Nathan_P
Neither, especially since we are actually talking about threads, not cores. No 16-thread machine can make it unless overclocked to ~3.6 GHz. Some 24-thread machines can make it, some can't, and a lot of true 24-"core" machines can't make it, even though they are the latest "server class" CPUs available. This is how it currently sits:
16-thread Intel Xeon based on Nehalem-EP - only if you can get the clock speed up to 3.6 GHz
24-thread Intel Xeon based on Westmere-EP - only at a clock speed of 2.66 GHz or better; 16-thread Westmere, see above
16-thread AMD 61xx Magny-Cours - only if you can get the clock speed up to 3 GHz
24-thread AMD 61xx Magny-Cours - only if you are running 6168s or faster
16-thread AMD Interlagos - only if you can get the clock speed up to 3 GHz
24-thread AMD Interlagos - 2.2 GHz or better
I haven't found data yet, but I suspect an E5 Xeon would need a clock speed of 2 GHz on octo-core or 2.53 GHz on hex-core parts to make it.
How are any of the CPUs listed above old, obsolete, or slow? The sooner we can move to a benchmark system for WU assignment, the better.
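If assignment ever does move toward measured capability, the figures above could just as easily live in a lookup table that a benchmark or the AS consults. Purely illustrative, using the rough numbers from this post (the 1.9 GHz entry is the stock clock of a 6168):

Code: Select all

# Illustrative lookup of rough minimum clocks (GHz) by platform and thread count.
MIN_CLOCK_GHZ = {
    ("Nehalem-EP Xeon", 16): 3.6,
    ("Westmere-EP Xeon", 16): 3.6,
    ("Westmere-EP Xeon", 24): 2.66,
    ("Magny-Cours Opteron", 16): 3.0,
    ("Magny-Cours Opteron", 24): 1.9,   # 6168 or faster
    ("Interlagos Opteron", 16): 3.0,
    ("Interlagos Opteron", 24): 2.2,
}

def likely_meets_deadline(platform, threads, clock_ghz):
    """Rough yes/no from the table; actual capability still depends on the WU."""
    minimum = MIN_CLOCK_GHZ.get((platform, threads))
    return minimum is not None and clock_ghz >= minimum

print(likely_meets_deadline("Interlagos Opteron", 24, 2.3))  # True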
Edit: updated Opteron and E5 numbers to reflect reported results.
Re: Bigadv reform
Posted: Wed May 30, 2012 7:09 pm
by bruce
A benchmark that runs on your machine and can predict the processing time needed to complete standard WUs would be a big step forward. Then the AS would know more than just the number of threads and could come close to predicting which WUs can be completed and which can't. I presume everybody would want to know with certainty whether they're going to get a bonus (beat the timeout) rather than just whether they'll get any points at all (beat the deadline). I'm not sure how perfectly such a benchmark can be made to work; there would still be an uncertainty zone where specific hardware would just meet the timeout of some WUs and almost meet it for others. Certainly any kind of benchmark would be better than the existing system of just counting threads, and it would still be a matter of using your own judgement about how much margin you want to leave for the borderline cases.
Judgement: using your numbers, let's say the AS criterion for a 24-thread AMD Interlagos is defined as "2.3 GHz or better," and I have one at 2.29 GHz that can meet the deadline, or perhaps one at 2.31 GHz that can't.
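That margin judgement could be expressed as nothing more than a safety factor applied to the benchmark's prediction. A minimal sketch, assuming a benchmark that returns a predicted completion time; the 10% margin and the example deadlines are invented for illustration:

Code: Select all

# Hedged sketch: classify a machine from a predicted completion time.
def benchmark_verdict(predicted_hours, timeout_hours, deadline_hours, margin=0.10):
    """Leave room for the 'uncertainty zone' around the timeout."""
    if predicted_hours <= timeout_hours * (1 - margin):
        return "comfortably earns the bonus"
    if predicted_hours <= timeout_hours:
        return "borderline for the bonus (uncertainty zone)"
    if predicted_hours <= deadline_hours:
        return "base points only"
    return "should not take bigadv work"

# e.g. a machine predicted to finish in 56 h against a 57.6 h timeout:
print(benchmark_verdict(56.0, timeout_hours=57.6, deadline_hours=144.0))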
Re: Bigadv reform
Posted: Wed May 30, 2012 7:44 pm
by Nathan_P
It's never going to be perfect, but it is a big step in the right direction. It would also allow for a bigger variety of WUs. All my machines can complete BA8 and BA12 no problem, but only one can do BA16, and it won't do BA24 when they appear. There must be some projects that would require something bigger than SMP but not quite the power of a 4P rig; that way points won't get so out of hand either. As we have seen, a BA12 WU on 48 cores at up to 3 GHz just gives silly PPD.
Re: Bigadv reform
Posted: Wed May 30, 2012 7:48 pm
by Jesse_V
I've never had the fortune of owning a monster machine capable of running bigadv, so my pathos is kind of weak here. Still, I would like to point out the lengthy thread in response to the blog announcement about the 16-core minimum. viewtopic.php?f=16&t=20036&start=0 You may notice that it was originally titled "Really" which in one word really exemplified the feelings of some (not all) of the donors. I understand the reasoning behind the change and agree with it, and as Dr. Pande pointed out in his post
http://folding.typepad.com/news/2011/11 ... -2012.html, it's clear that such changes are "disruptive" and should be minimized. His follow-up post is here:
http://folding.typepad.com/news/2012/02 ... llout.html. So even though it doesn't apply to me, I still think changing it to a 24-core minimum would be a bad idea.
A benchmark would be nice. Probably one of the easiest solutions is to make a small program that the user can download which, using a similar instruction set and memory load, will crunch numbers for a few minutes and then tell you whether you qualify for bigadv. Maybe it could go something like this: 1) the user indicates they'd like to use the bigadv flag, 2) the V7 client provides a download link, or better yet downloads the latest benchmark work and runs it itself, 3) if the machine passes the benchmark, V7 applies the requested bigadv flag, 4) V7 downloads a bigadv WU from the AS and carries on from there.
This benchmark implementation wouldn't require any server-side changes (which, as I understand it, are difficult to implement because they require server downtime and whatnot). Clearly the core-count method is the simplest, but I cast my vote for only a deadline, because AFAIK that's the only thing that matters in the end. If users want to enable hyperthreading and do other tricky stuff with their CPU to pass the benchmark, then they'll be able to meet the WU deadlines and everything will work out. The benchmark could be updated once in a while as time goes on, and people could be notified of the update so they can retest their systems. This would be mostly client-side, so the only drawback is that you'd have to convince the PG, and particularly jcoffland, that the effort is worth it.
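In rough pseudocode, that flow might hang together something like this. None of the names below are real V7 client APIs; they're stand-ins to show the sequence of steps:

Code: Select all

# Hypothetical client-side flow for the benchmark-gated bigadv flag.
class BenchmarkResult:
    def __init__(self, projected_hours, deadline_hours):
        self.projected_hours = projected_hours
        self.deadline_hours = deadline_hours

    def passes(self):
        return self.projected_hours <= self.deadline_hours

def maybe_enable_bigadv(download_benchmark, run_benchmark, set_flag, notify):
    """Steps 2-4 from the post: fetch the benchmark, run it, then flag or decline."""
    benchmark = download_benchmark()      # step 2: get the latest benchmark WU
    result = run_benchmark(benchmark)     # crunch numbers for a few minutes
    if result.passes():                   # step 3: the machine qualifies
        set_flag("bigadv")                # step 4: client requests bigadv work
        return True
    notify("Machine does not meet the bigadv benchmark; staying on regular SMP.")
    return False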
Re: Bigadv reform
Posted: Wed May 30, 2012 7:58 pm
by 7im
No convincing needed, but V7 is the current priority.
From that blog post...
VijayPande wrote:As a side note, we recognize that the number of cores is a somewhat crude measure for system performance. Long-term, we have some ideas on how we'd like to improve this and use better metrics. But in the near term, we are using this admittedly imperfect metric.
Not sure if their ideas include a client side benchmark, but they do plan to make improvements in this area.
Re: Bigadv reform
Posted: Thu May 31, 2012 4:59 pm
by bruce
The V6 client is no longer being revised, so no new configuration options will be added to that client. Parameters like -bigadv12 or -bigadv16 (or however you choose to spell them) are not going to be possible. There are still some development plans for the V7 client, so changes to it MIGHT happen. The statement quoted by 7im on the previous page indicates that the Pande Group has some plans which might or might not involve new choices by donors or automatic choices from some kind of improved benchmarking software.
Re: Bigadv reform
Posted: Fri Jun 01, 2012 12:24 am
by P5-133XL
Personally, I think a reasonable solution is to kill the core minimums and redesign the points for bigadv so that it produces fewer points than normal SMP unless the machine can easily beat the first deadline. Something like a base of half the value of an SMP WU and a relatively short deadline, but with a larger k value, so that only those with hardware that can easily beat the first deadline are handsomely rewarded for running bigadv. Then leave the bigadv choice up to the judgement of the user as to whether it is beneficial to run. With the knowledge that if they can't easily beat the deadlines they will lose points compared to SMP, I would argue that the vast majority of those with borderline hardware, or those who are not full-time folders, would then choose not to run them. I believe the user's judgement is likely better than any benchmarking process that Stanford would create. Since this new formula makes the exponential value of the bonus even more extreme, I think there would also need to be a cap on the points, so that truly massive hardware doesn't make a normal folding contribution look like nothing.
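To show the shape of that incentive, here is the familiar square-root quick-return bonus (points = base x sqrt(k x deadline / elapsed) when the deadline is beaten) with the tweaks above bolted on: half the SMP base, a larger k, and a hard cap. The specific numbers are made up purely for illustration, not a concrete proposal:

Code: Select all

# Illustrative only: the usual QRB shape with the tweaks proposed above.
import math

def proposed_bigadv_points(smp_base, elapsed_days, deadline_days, k=50.0, cap=500000):
    """Half-SMP base, short deadline, larger k, and a cap on the total."""
    base = 0.5 * smp_base                       # proposed: half the SMP base value
    if elapsed_days > deadline_days:
        return 0                                # missed the (short) deadline: no credit
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return min(base * bonus, cap)

# A machine that barely beats the short deadline vs. one that beats it easily:
print(proposed_bigadv_points(smp_base=2000, elapsed_days=5.9, deadline_days=6.0))
print(proposed_bigadv_points(smp_base=2000, elapsed_days=1.5, deadline_days=6.0))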
I recognize that this solution may not work, depending on Stanford's needs. It all depends on how much the science is hurt by people who aren't making this choice based on points and who can't make the deadlines, or who are only barely making them. There will always be a few who make dumb decisions...