Well, right, and the other interesting thing is that while points are 'fun', I don't really care about them. The points cliff at 2.43 days vs 2.4 days is not at all indicative of the value of the calculations to the project - obviously the scientific value of calcs finished in 2.39 vs 2.43 days differs by the simple percent difference, i.e., something like 1% less valuable because they come in around 1% later. I'm not a point fanatic at all; I just want to help the overall process along.
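Just to put a rough number on that percent-difference point (the day figures are the ones above; the "value scales with how quickly the result comes back" model is only my own framing, not any official formula):

```python
# Back-of-the-envelope: how much less valuable is a result that
# arrives in 2.43 days instead of 2.39 days, if value simply
# scales with how quickly the science comes back?
on_time, late = 2.39, 2.43  # days to complete the work unit

relative_delay = (late - on_time) / on_time
print(f"Result arrives {relative_delay:.1%} later")              # ~1.7% later
print(f"So roughly {relative_delay:.1%} less valuable - no cliff")
```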
That said, I do look at points to see what the group wants from me. So as soon as I stop getting 6903s I'll turn off bigadv because what I'm 'hearing' from the points system is that they don't want a single 2687W processor running 8101s.
As to your question, and the question for the poll, I would say yeah, if a server sees a machine miss a deadline X times (where X is maybe 3? Once isn't enough - maybe the person paused folding for some reason, or something else happened), it shouldn't resend a project of the same magnitude to that computer. But then I would also suggest that the Control app have a button that roughly means 'reset what you think of me', in case someone swaps out a processor or RAM or whatever and wants to get re-graded.
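A minimal sketch of what I mean on the server side (purely hypothetical logic, not how the actual assignment servers work; the names and the threshold are made up):

```python
# Hypothetical per-machine bookkeeping on the assignment server.
# After MAX_MISSES missed deadlines, stop handing out big work units
# until the donor explicitly asks to be re-graded.
MAX_MISSES = 3  # the "X" above - a guess, not a known value

class DonorRecord:
    def __init__(self):
        self.missed_deadlines = 0

    def report_result(self, made_deadline: bool):
        # Design choice: a made deadline could also clear the count.
        if not made_deadline:
            self.missed_deadlines += 1

    def eligible_for_bigadv(self) -> bool:
        return self.missed_deadlines < MAX_MISSES

    def reset(self):
        """The 'reset what you think of me' button after a hardware swap."""
        self.missed_deadlines = 0
```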
So, I do things along these lines for work, and really this is a simple situation. You want the right hardware for the right job. Supercomputer facilities typically have several supercomputers, each with different advantages and disadvantages that appeal to slightly different problems, and the researchers need to match problems to hardware as much as possible. Here the issue is how coarse that matching is. We have: 1) single-thread problems, 2) SMP, 3) bigadv (I'm ignoring beta to keep it simple). If you bin everything into 3 cases, then that is all the resolution you will get in terms of efficiency - modelling a sine wave with only 3 bins is too coarse. Maybe as time goes on there will be 10 gradations, just called 1, 2, 3, up to 10? How about 100? Maybe when someone starts folding there will be an app that runs for 3-5 minutes and decides what your number is? People at the borders of categories will always be annoyed if they look into the detail and discover they are at the border.

Forget Folding@home for a minute: just think about how the largest distributed computing system in the world should work, and then slowly plan a course to get there. I would think it should be able to make efficient use of every computer between some minimum config and the largest systems that could conceivably be used.
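To make the resolution point concrete, here is a toy sketch (entirely made up - the benchmark score, thresholds, and bin counts are placeholders) of how coarse 3-way binning compares to a finer 1-100 scale:

```python
def coarse_class(score: float) -> str:
    """Today's ~3-way split: uniprocessor / SMP / bigadv (toy thresholds)."""
    if score < 30:
        return "uniprocessor"
    elif score < 70:
        return "smp"
    return "bigadv"

def fine_grade(score: float, levels: int = 100) -> int:
    """A finer 1..levels grade from the same 0-100 benchmark score."""
    return max(1, min(levels, round(score / 100 * levels)))

# Two machines near a border look very different to the coarse scheme
# but are nearly identical on the fine grade.
for score in (69.0, 71.0):
    print(score, coarse_class(score), fine_grade(score))
```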
I think the specific question you mention above is no good - too simple - but in the short term we still have to live, eat, and fold. So perhaps we have to assume for now that points are aligned with scientific value? So, for example, my week of running 8101s at 0.03 days past the deadline is of extremely low value - almost totally worthless. So, short term, yes, you should have the server block these from people who don't make the deadline.
However, that isn't interesting to me. Long term we have to have a more fluid system. Frankly, my 3930K 6-core system at 4.5 GHz beats the 2687W at 8 cores, but the 6-core can't even try an 8101 (from what testing I've done comparing the processors, it would come in under the deadline), and *all* of this core-counting stuff and the other measures are very basic, inexact ways of classifying a system, as has been discussed to death on these forums.
I think the longer-term solution is a simple 5-minute GROMACS run and a grade from 1-10 or 1-100 or something. Then, as the science guys submit jobs, they choose the minimum number required to run that job at a speed that gets everything done reasonably. Also, you might internally say, ooh, this particular thing is super-important (new bird flu or something ravaging the world?), so let's run it only on machines rated 95+. You would be able to really throttle priority up or down within the work queues you have - lots of control there over what is going on.
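Something like this, roughly (grading against a reference machine and a per-job minimum grade are just my own sketch of the idea, not anything the project actually does; the numbers are invented):

```python
# Toy version of "run a short benchmark, get a 1-100 grade,
# and only hand a job to machines at or above its minimum grade".
REFERENCE_NS_PER_DAY = 50.0   # made-up throughput of a top-end reference box

def grade_machine(benchmark_ns_per_day: float) -> int:
    """Scale a short benchmark result to a 1-100 grade."""
    grade = round(100 * benchmark_ns_per_day / REFERENCE_NS_PER_DAY)
    return max(1, min(100, grade))

def can_assign(job_min_grade: int, machine_grade: int) -> bool:
    return machine_grade >= job_min_grade

# A super-important project could simply be submitted with minimum grade 95.
print(grade_machine(48.0))                   # -> 96
print(can_assign(95, grade_machine(48.0)))   # -> True
```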
If you have that granularity, you can also run things much harder than 8101. And if you had a flexible interface where you could internally assign problems to minimum configs, then if some particular job got slowed down you could just flip a switch, give it to the 95+ machines, and catch it back up. You could use the 90+ machines either for the super-hard problems or for the ones where the queue is getting too long. It would be a bit like a cloud interface adding on-demand elasticity to the system. I think we will have a NUMA-like HP DL980 in here in a few months with 8 2690s and tons of RAM. As machines like that come on board, you look at the range of available computers in the upper tiers, and that tells you how hard the problems can be - perhaps much harder than 8101. Having a rating on machines from 1-10 or 1-100 should be a very powerful tool for all kinds of reasons.
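And the 'flip a switch' part could be as dumb as bumping a job's minimum grade when its queue falls behind (again, just a sketch of the idea, with made-up names):

```python
# If a job's queue is falling behind schedule, raise its minimum grade
# so only the fastest tier of machines picks it up until it catches up.
def escalate_if_behind(job_min_grade: int, days_behind: float,
                       fast_tier: int = 95) -> int:
    if days_behind > 0:
        return max(job_min_grade, fast_tier)
    return job_min_grade

print(escalate_if_behind(70, days_behind=2.0))  # -> 95
print(escalate_if_behind(70, days_behind=0.0))  # -> 70
```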
And the best part is that threads like the ones on EVGA or wherever it was (haven't seen them in a while) about spoofing cores and VMware and all that will magically go away, and everyone will stop talking about them.