benchmarking F@H
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
Next update in approx. 16 hours.
fah1 -- p2671 r1 c21 g57 FAILED. Reported viewtopic.php?f=19&t=8092
fah2 -- started with the "-smp 8" flag, picked up an a1 core, so it's only running on 4 cores. (boohoo)
fah3 -- created since fah2 is only running on 4 cores. Currently started with the "-smp 4" flag; scheduled to be restarted with the "-smp 8" flag, since apparently it picks the cores at random and/or near random.
It's going to have a slight impact on the overall benchmarking process, but hopefully with one using 4 cores and the other switching to 8 cores, it'll reduce the system idle %.
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: benchmarking F@H
Check your bugs list again. Switching from -smp 4 to -smp 8 might cause data corruption.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
7im wrote:Two suggestions to improve upon the process. Disable the deadlines in the client setup; then it doesn't matter what the system date is, and you no longer need to worry about that, change it, etc. Also consider running with the "Prompt before Connect" setting. The client will finish the WU and prompt you to connect. Cancel out so you don't waste time uploading a duplicate, and don't waste the bandwidth on the internet or on the Stanford servers. The WU has to completely upload to Stanford before it gets kicked as a dupe. Just don't upload, and save the bandwidth for the rest of us uploading work that is not yet done.
P5-133XL wrote:Download a project; make a copy of the folding folder, paying attention to the date; let the project complete and send results; then for all benchmarks on that machine or others, simply change the date on the machine back to the download date, copy the copy, and fold from there. There is no interference to the project, because the data has been returned and Stanford can continue forward with no delay. If the copy is ever accidentally returned, it will be rejected as a duplicate. The folding can be repeated over and over because the computer date has been changed, so folding will not think the deadline has passed and quit.
Good tips, thanks mate. I'll definitely keep that in mind.
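The save/restore half of that workflow is just a folder snapshot. Here is a minimal sketch (function names and paths are mine, not from the client; the date rollback itself is a manual OS-level step, noted only as a comment):

```python
import shutil

def snapshot(work_dir: str, backup_dir: str) -> None:
    """Copy the freshly downloaded folding folder aside, before the WU runs."""
    shutil.copytree(work_dir, backup_dir)

def restore(backup_dir: str, work_dir: str) -> None:
    """Put the pristine copy back for another benchmark pass.

    Before restarting the client, set the system clock back to the download
    date (or disable deadlines) so the client doesn't treat the WU as expired.
    """
    shutil.rmtree(work_dir, ignore_errors=True)
    shutil.copytree(backup_dir, work_dir)
```

Each benchmark pass is then: `restore(...)`, roll back the clock, run the client, record the times, repeat.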
Thanks.
(I was going to point you to the previous thread with tips on benchmarking using the same WU repeatedly, but it was in the previous iteration of this forum. Nevermind.)
Side question though, since I've already started (and presumably finished by now) my original benchmark WU: does that mean I would have to reconfigure the client so that it picks up a new, deadline-less WU?
I thought it said they don't allow that anymore? (I forget where I read that, but someone else pointed it out to me.)
http://folding.stanford.edu/English/FAQ-main#ntoc27 via viewtopic.php?f=16&t=7999
Last edited by alpha754293 on Fri Jan 23, 2009 10:52 pm, edited 1 time in total.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
7im wrote:Check your bugs list again. Switching from -smp 4 to -smp 8 might cause data corruption.
The other core that's running (originally with "-smp 4" and switching to "-smp 8") is an a2 core. That's why I figured I should be safe to do so. (Actually, by the time you posted, I had already switched it; I was just waiting for the checkpoint to be written to disk before I did that.)
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
more sidenotes:
original fah a2 -smp 8 load average: 7.20
current fah a1 -smp 4 & fah a2 -smp 8 load average: 9.40
Interestingly enough, the system is reporting about 20% idle (vs. 10% before), but it's definitely doing more work.
Re: benchmarking F@H
alpha754293 wrote:apparently it picks the cores at random and/or near random.
For a specific client on specific hardware, there is a limited number of projects that can be assigned. Within that set, work is randomly assigned (though there is also a priority factor, and the conditions of each server change over time). Each project uses a specific core, so the randomness comes from the project selection process.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
alpha754293 wrote:apparently it picks the cores at random and/or near random.
bruce wrote:For a specific client on specific hardware, there is a limited number of projects that can be assigned. Within that set, work is randomly assigned (though there is a priority setting and the conditions of the servers change over time). Each project uses a specific core.
Well... I ask/say that because it seemed to ignore the "8" part of the -smp flag (and picked up the a1 core).
I really wish it would read whether it's an 8 or a 4 and select the appropriate core. Oh well, c'est la vie, I suppose.
Re: benchmarking F@H
The WU selection process knows how many cores your OS reports, not how many are selected in the client. As long as there is a mix of work for A1 and A2 cores, you'll get a mix of assignments. Eventually all a1 projects will be completed, but that may not be soon since the Windows clients can only run FahCore_a1 and they still need some help from Linux/MacOS, particularly if the servers happen to be short of A2 projects.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
bruce wrote:The WU selection process knows how many cores your OS reports, not how many are selected in the client. As long as there is a mix of work for A1 and A2 cores, you'll get a mix of assignments. Eventually all a1 projects will be completed, but that may not be soon since the Windows clients can only run FahCore_a1 and they still need some help from Linux/MacOS, particularly if the servers happen to be short of A2 projects.
*shrug* Dunno. I guess it depends on the luck of the draw and what's in the queue. I was hoping to bench with just A2 cores, but I can't control that, so oh well. I'll just let it run for now.
Slight update though:
Looks like I won't be able to test CNL, because apparently it's for Cray. Darn!
Next up: CentOS 5.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
more preliminary results:
Following the failed p2675 WU, I've gone ahead and started up a 4-core A1 and an 8-core A2.
load averages are 8.86, 9.12, 9.21
(Albeit I was just doing a bit of reading into load averages, and they don't seem to describe the full picture; I can do more checking to see whether these numbers make sense and whether they're a good or bad thing.)
In any case, here's what I've been able to find out so far.
For p2671 r20 c22 g69, it took 8 hours to run.
While I agree that I can't really compare different projects, here's what I think is happening (based on the PPD calculations).
The initial PPD estimate with p2671 r20 c22 g69 was between 5760-5800. That makes sense given that the WU was credited as 1920 points: at approximately 8 hours per WU (i.e. three WUs per day), 5760 is a perfect match (3*1920 = 5760).
However, the A2 core is now running p2669 r16 c136 g56 (credit: 1920 points) but reporting a PPD of 3645.89. Based on that number, I'd only be able to complete approximately 1.9 p2669 WUs per day.
The A1 core is running p5101 r0 c157 g55 (credit: 2165 points) but reporting a PPD of 1028.35. By that calculation, it looks like it would take approximately 2.1 days per p5101 WU.
Total est. PPD running it this way: 4674.24
Ergo, the current preliminary results point to this being a bad idea. (Yes, yes, I know: some people here are probably going to say "duh?" BUT here are the numbers that may be able to prove it.) However, this is only with one 4-core + 8-core pairing so far (hence why it's preliminary). We'll need to wait and see whether the pattern continues.
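The arithmetic behind those estimates is just WU credit scaled by completion rate. A quick sketch, using the numbers from this post (the helper function is mine, not part of any F@H tool):

```python
def ppd(credit: float, hours_per_wu: float) -> float:
    """Points per day = WU credit * completions per day (24 h / hours per WU)."""
    return credit * 24.0 / hours_per_wu

# p2671: 1920 points at ~8 h/WU -> 3 WUs/day -> 5760 PPD, matching the estimate.
p2671_ppd = ppd(1920, 8.0)

# Going the other way, reported PPD / credit gives completions per day:
a2_wus_per_day = 3645.89 / 1920   # ~1.9 p2669 WUs/day on the A2 core
a1_days_per_wu = 2165 / 1028.35   # ~2.1 days per p5101 WU on the A1 core

total_ppd = 3645.89 + 1028.35     # 4674.24 combined, well under 5760
```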
NOTE: As many of you have also said, the A1 core is known to be slower in PPD. So perhaps the real test will be when my system is running two A2 cores, to see how that fares.
It is still possible that HTT will be bad... even on all current Q9-series and Core i7s. (WU slowdown; possibly minimal impact on total overall PPD.)
Testing still continues. Will report back with more info once I get it.
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
Ha ha... I'm an idiot. I installed the wrong OS.
The original tests (all the results listed above) were on SuSE Linux Enterprise Desktop (not Server, which is what I had intended to test. Oops. Oh well.) I only found out because I couldn't telnet into the system, some of the prerequisites for installation weren't met, and I didn't really want to spend a lot of time chasing down the dependencies and such.
NOW I'm reinstalling with SLES (but I did save the old F@H WUs).
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
Okay...here's what I've been able to find out/ascertain so far.
Running more than one client at a time: bad. Presumably even for HTT (although I can't verify that).
Even with 4xA2 + 8xA2 running together, it's getting only about 96% of the PPD I'd get from a single 8xA2 instance.
(Tested using the same p2669 r7 c24 g62 WU -- I accidentally didn't save the previous p2669 WU properly, but two instances of the same one work just fine.)
-
- Posts: 266
- Joined: Sun Dec 02, 2007 6:08 pm
- Location: Central New York
- Contact:
Re: benchmarking F@H
For benchmarking accuracy, one should not only use the same WU, but also test the same steps.
Different steps within any given WU can take different amounts of time to process.
Sometimes "folding events" can take place in one or more steps which can throw comparisons off.
http://fahwiki.net/index.php/Folding/Aggregation_Event
When I am benchmarking, for comparison I usually use the same three or four steps, from the same WU.
Of course, ymmv.
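That same-steps comparison can be sketched in a few lines. Here, assuming you've logged a per-step (frame) time for each run, you average only a fixed window of steps so every run is measured on identical work (the function, data layout, and numbers are all hypothetical):

```python
def tpf_over_steps(step_times: dict[int, float], steps: list[int]) -> float:
    """Average time-per-frame (seconds) over a chosen set of step indices,
    so every run is compared on exactly the same slice of the WU."""
    return sum(step_times[s] for s in steps) / len(steps)

# Hypothetical logs: seconds per step for two runs of the same WU.
run_a = {10: 612.0, 11: 608.0, 12: 615.0, 13: 900.0}  # step 13 hit a folding event
run_b = {10: 598.0, 11: 601.0, 12: 603.0, 13: 599.0}

# Compare the same three steps and skip the event-skewed step 13,
# which would otherwise make run_a look far slower than it is.
ratio = tpf_over_steps(run_a, [10, 11, 12]) / tpf_over_steps(run_b, [10, 11, 12])
```

With the event excluded, the two runs differ by under 2%; including step 13 would overstate the gap considerably.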
-
- Posts: 383
- Joined: Sun Jan 18, 2009 1:13 am
Re: benchmarking F@H
Flathead74 wrote:For benchmarking accuracy, one should not only use the same WU, but also test the same steps. Different steps within any given WU can take different amounts of time to process. Sometimes "folding events" can take place in one or more steps which can throw comparisons off. http://fahwiki.net/index.php/Folding/Aggregation_Event When I am benchmarking, for comparison I usually use the same three or four steps, from the same WU. Of course, ymmv.
Yeah... I ended up doing that (give or take a few steps), because when I saved the checkpoint onto my other server, I thought I had saved two WUs from the same project, but I goofed it.
So now I am only using that one instead.
So I am trying to isolate variables and keep as many things constant and under control as possible.
Most of the results so far are still preliminary, only because I want to return the WUs back to F@H first, and THEN do the actual bulk testing by rolling back the clock.
That should truly ensure that the testing is fair.
But on the issue of "to HTT or not to HTT": from my emulated tests, HTT would seem to be a bad idea. Some people might think they're getting a WU speed and/or PPD bonus, but it turns out they might not be. (Again, I don't have an actual HTT system to test on for certain.) Because my system naturally had idle CPU time, I started a separate 4-core client to try to take up that idle time. While the load averages went up, in the end the WU speed went down, and so did the total PPD across both clients. (Which, according to the functional theory of HTT, is essentially what it does; therefore, I think it's a pretty good approximation of HTT without an actual HTT-capable machine.)
In another test -- and I'm not entirely sure/convinced that it's working -- I think I might have managed to get the Linux cores to work on the Windows WUs (using the Linux version of FahCore_a1). I'm not sure if that's a valid idea, but I wanted to see if I could do it, mostly because I had outstanding Windows SMP WUs left.
Hopefully it works, but I can't really tell for sure yet.
It looks like it might be working for one of the two WUs. *shrug* Oh well. It was at least worth a shot.
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: benchmarking F@H
I recommend that you don't mix work units using a1 cores with a2 cores. They fold at different speeds and complicate the results. Use one or the other, or both separately.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.