Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 8:34 pm
by alpha754293
7im wrote:I think you missed the main idea of his post. Their recommendation is to run one fahcore per physical cpu core. Intel's HT's are not physical cores, they are virtual.

So if you need to run 2 clients with -smp 4 to use up 8 cores, then that's the way to do it consistently.
Actually, no I didn't miss that.
VijayPande wrote: We need to research i7 hyperthreading more to be sure
alpha754293 wrote: I agree with Dr. Pande that new testing is required for the Core i7's implementation of HTT.
VijayPande wrote: ...but I bet it will be similar.
I think so too.
VijayPande wrote: But for now, our preference is one physical processor core per FAH core.
alpha754293 wrote: *somewhere else in the forum*
HTT should be disabled.
(I'm pretty sure that I said that, or at least something to that effect, with the added caveat that more testing is required (actual testing) and not just my simulated/emulated HTT testing.)

And with two "-smp 4" clients, I am running one FahCore per physical core on my system.

So....what did I miss?

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 9:56 pm
by 7im
alpha754293 wrote:So....what did I miss?
Vijay's post applies to your question below as well. More cores does not change the standard recommendation (1:1)...
alpha754293 wrote:On another note though, he's yet to chime in on native 8-core (or greater) systems.

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:02 pm
by alpha754293
7im wrote:
alpha754293 wrote:So....what did I miss?
Vijay's post applies to your question below as well. More cores does not change the standard recommendation (1:1)...
alpha754293 wrote:On another note though, he's yet to chime in on native 8-core (or greater) systems.
Well...not really.

Because, as I've mentioned, you can't guarantee that the SMP client is going to always pick up a WU that uses the a2 core that spawns 8 processes.

Therefore, would he rather have two clients, each running with "-smp 4", or would he rather guarantee (which he probably can't do) that the single "-smp 8" client will ALWAYS pick up a WU that can actually use all 8 cores, 100% of the time?

Both are in compliance with his recommendation.

In the first instance, the two "-smp 4" can be running both a1s, both a2s, or one of each. It's still in compliance.

But, running an a2 with only 4 cores would be roughly half the WU speed.

But running ONLY a single client means that it is very possible to process only 25% of the possible WUs within a given time period.

Dr. Pande's "rule" (using the term loosely now), doesn't address any of the specifics.

Or, if I had run in strict accordance with all of the "rules", then in the past three days I would have completed only ONE WU instead of FOUR.

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:17 pm
by 7im
Yes, really.

Use some common sense. If you can't guarantee 1 client will always get a2 WUs so you can run with -smp 8 and use all 8 cpu cores all the time, AND you CAN guarantee that 2 clients with -smp 4 will always use all 8 cores all the time, then the only logical choice to make is running 2 clients with -smp 4, regardless of the WU mix.

When the a1 core finally goes away, and we only have a2 cores, then the answer will change.
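
A rough way to put numbers on that argument is sketched below. It is only an illustration: the share of a2 WUs (p_a2) is a made-up input, and the model assumes an a1 WU keeps just 4 processes busy while an a2 WU can fill all 8 cores. Neither figure is an official one.

```python
def expected_utilization(p_a2, total_cores=8, a1_procs=4):
    """Expected fraction of cores kept busy for each client layout.

    p_a2 is a hypothetical probability that an assigned WU uses the a2 core.
    a1_procs is an assumption: the number of processes an a1 WU keeps busy.
    """
    if not 0.0 <= p_a2 <= 1.0:
        raise ValueError("p_a2 must be between 0 and 1")
    # One "-smp 8" client: an a2 WU fills every core, an a1 WU fills only a1_procs.
    single = p_a2 * 1.0 + (1.0 - p_a2) * (a1_procs / total_cores)
    # Two "-smp 4" clients: each keeps 4 cores busy whichever core type it gets.
    dual = 1.0
    return {"1x -smp 8": single, "2x -smp 4": dual}


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        print(share, expected_utilization(share))
```

With p_a2 at 1.0 (no a1 WUs left) both layouts keep every core busy, which is the point where the answer changes. The sketch only counts busy cores; it says nothing about how fast each individual WU finishes.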

I don't know why I even try... run 8 CPU clients for all I care. That is in "compliance" with the rule too. :roll:

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:32 pm
by alpha754293
But see, Bruce had mentioned to me (and your referenced post by Dr. Pande says the same) that WU speed is important as well.

Two "-smp 4" clients running a2 run at approximately half the WU speed.

But I can't spend forever babysitting the system, constantly swapping between 2x"-smp 4" and 1x"-smp 8" depending on the WU that gets picked up.

So on one hand, it counters Dr. Pande's "rule". On the other, it is in compliance.

(BTW...I tried to explain that to Bruce last night/this morning and I dunno, we had a LONG discussion about it, and he didn't really seem to get it.)

I think that from now on, I'm just going to cite you instead. Any gripes he has about that, I'll just refer them to you. :D

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:40 pm
by 7im
I'm flattered. But no comment about running 8 CPU clients? ;)

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:42 pm
by alpha754293
7im wrote:I'm flattered. But no comment about running 8 CPU clients? ;)
not PPD effective.

If Dr. Pande's ok with a 1/4 to 1/8th slowdown on the WUs....sure! I'll set it to run "-smp 1". :D Heck, I'll set it to run as many "-smp 1" clients as you want.

If there's uniprocessor WUs, then they're DEFINITELY not PPD effective at all. :D

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 10:59 pm
by 7im
Stop changing the subject. PPD has nothing to do with "compliance." Vijay didn't even mention PPD. If PPD were such a big factor, why didn't you bring it up before now?

And since you did bring it up, which earns a higher PPD, and by how much? 1 x -smp 8 on a2, or 2 x -smp 4 on a2?

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:18 pm
by alpha754293
7im wrote:Stop changing the subject. PPD has nothing to do with "compliance." Vijay didn't even mention PPD. If PPD were such a big factor, why didn't you bring it up before now?

And since you did bring it up, which earns a higher PPD, and by how much? 1 x -smp 8 on a2, or 2 x -smp 4 on a2?
Because they're conflicting interests.

I think that sometimes people presume/assume that higher PPD is better for science. But that isn't always necessarily the case.

HTT disabled complies with Dr. Pande's one core per physical core rule.

1x8*a2 complies with Dr. Pande's one core per physical core rule AND WU speed.

Corollary:
Cannot be guaranteed a2 WU.

2x4*a1 complies with Dr. Pande's one core per physical core rule AND WU speed. (If a2 cannot be guaranteed.)

2x4*a2 complies with Dr. Pande's one core per physical core rule, BUT not WU speed.

1x8*a2 - 5945.81 PPD
2x4*a2 - 6355.95 PPD (3165.8 + 3190.15)

2x4*a2 earns about 7% more PPD, but that is not the fastest that each WU can run at.
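
For anyone who wants to check the arithmetic, here is a quick sketch using only the PPD figures quoted above. The variable and function names are just for illustration. Note that it compares total PPD, not how long an individual WU takes.

```python
# PPD figures as reported in this post.
ppd_single = 5945.81           # 1x8*a2
ppd_dual = [3165.8, 3190.15]   # 2x4*a2, one entry per client


def pct_gain(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline * 100.0


total_dual = sum(ppd_dual)
gain = pct_gain(ppd_single, total_dual)

if __name__ == "__main__":
    print(f"2x4*a2 total: {total_dual:.2f} PPD ({gain:.1f}% above 1x8*a2)")
```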

8x1*a2 is probably the slowest WU speed you can get. No data on speed or PPD.

See how WU speed (or science), total utilization/efficiency, and PPD form this great ugly mess of a triangle?

You COULD always adjust the weighting functions in order to favor two of the 3.

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:28 pm
by 7im
Well stated!

So in your estimation, what is the "optimal" SMP client configuration on an 8 core i7 chip given the above information?

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:35 pm
by alpha754293
7im wrote:Well stated!

So in your estimation, what is the "optimal" SMP client configuration on an 8 core i7 chip given the above information?
Well...first off...i7's AREN'T 8-core processors.

They're 4. HTT doesn't count as "cores" because the threads are logical and do not possess any of their own (discrete) FPUs.

Per Dr. Pande's "rule", HTT should be disabled.

(However, testing still remains to be conducted in order to verify that HTT is detrimental to the progress of F@H.)

Therefore, until HTT can be tested at large, in a proper, controlled setting, it should stay off for the purposes of F@H.

At which point, the default "-smp 4" should be used.

However, SHOULD the test results from HTT demonstrate a sizable performance increase, then it should be running with "-smp 8" (such that if there wasn't an a2 WU available, it would be running at the fastest possible speed with an a1 core).

IF the results come back and indicate that HTT has negligible or marginal benefits or losses, then HTT should be disabled, and "-smp 4" takes precedence.
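
The decision rule above can be written down as a small sketch. The cutoff for a "sizable" increase is not defined anywhere in this thread, so sizable_gain_pct below is a purely hypothetical placeholder, as are the function and parameter names.

```python
def recommend_smp(htt_gain_pct, physical_cores=4, sizable_gain_pct=5.0):
    """Return (htt_enabled, smp_flag) following the rule described above.

    htt_gain_pct: measured WU speed-up with HTT on, in percent, or None if
                  no proper testing has been done yet.
    sizable_gain_pct: hypothetical threshold for a "sizable" increase.
    """
    if htt_gain_pct is None:
        # Untested: keep HTT off and use one FahCore per physical core.
        return False, f"-smp {physical_cores}"
    if htt_gain_pct >= sizable_gain_pct:
        # Sizable benefit: enable HTT and use every logical core.
        return True, f"-smp {2 * physical_cores}"
    # Negligible or marginal benefit (or a loss): HTT stays off.
    return False, f"-smp {physical_cores}"


if __name__ == "__main__":
    for result in (None, 2.0, 20.0, -4.0):
        print(result, recommend_smp(result))
```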

I have no official recommendations for multi-socket Xeon X5570 platforms as of yet. (Not until the results from HTT testing come back.)

*edit*
My current estimate is likely to put the performance increase for WUs at between 5% and 30%. The base WU FPU work hasn't really changed; it's the instruction cycling and ordering that is really expected to improve with HTT enabled on the Core i7 and X5570.

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:45 pm
by 7im
alpha754293 wrote:Well...first off...i7's AREN'T 8-core processors.
Sorry, I always forget which of the Intel processors have been released to the public, and which haven't. I must have been thinking about the 8 core chip Intel will be showing off next week at the International Solid-State Circuits Conference.

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:45 pm
by alpha754293
One alternative configuration that is pending for testing (after the current WU completes) is to actually start two "-smp 8" clients.

The theory is that if there are two a2 WUs, they'd both be competing for computational resources such that it'll effectively run at about the same speed as 2x4*a2.

If there's a1+a2, then it'd be able to run the a2 at I think 66-75% of full speed, but at least 50% faster than "-smp 4". (I think, if my calculations are correct.)

If there are two a1s, they're going to run at the fastest speed they possibly can anyway, as 2x4*a1.

Testing is scheduled to commence at around Jan 30 0230 UTC 2009

Re: my recommended configuration for >= 8-cores

Posted: Thu Jan 29, 2009 11:48 pm
by alpha754293
7im wrote:
alpha754293 wrote:Well...first off...i7's AREN'T 8-core processors.
Sorry, I always forget what Intel has released, and what isn't. I must have been thinking about the 8 core chip Intel will be showing off next week at the International Solid-State Circuits Conference.
Well...8-core processors aren't new.

Commodity, or COTS 8-cores are.

But considering that they're not scheduled to release dual-channel versions of the Core i7 until 4Q09, I wouldn't be surprised if 8-cores is at least a year off.

If it's a native 8-core processor, then the recommendations for my native 8-core system still apply.

Re: my recommended configuration for >= 8-cores

Posted: Fri Jan 30, 2009 2:18 am
by codysluder
alpha754293 wrote:I don't have an official statement with regards to HTT only because I do not have a system to test HTT with.

(I wish I did, but the closest thing I've got is a Q9550, and there might be some substantial differences in architecture between that and the Core i7).

However, in my simulated results (I have a server with 8 native cores, and it is NOT HTT capable (AMD)), I found a 4% performance penalty when I ran with simulated HTT.
Well, since HT takes a ~45% performance hit straight out of the box due to saturation of the FPU, I don't believe your "simulated HTT" is meaningful.
alpha754293 wrote:
7im wrote:Well stated!

So in your estimation, what is the "optimal" SMP client configuration on an 8 core i7 chip given the above information?
Well...first off...i7's AREN'T 8-core processors.

They're 4. HTT doesn't count as "cores" because the threads are logical and do not possess any of their own (discrete) FPUs.

Per Dr. Pande's "rule", HTT should be disabled.
The problem here is that your "simulated HTT" is wrong.

With an i7, you have four physical cores. If you're assigned an A1 WU, you want to have four FahCores running in four physical cores, and there will be no wasted processing power from the other four logical cores. If you happen to be assigned an A2 WU, then the differences between -smp 4 and -smp 8 are quite small because of the 45% HT loss mentioned above. You might as well use -smp 8 and gain the ~5% benefit of running with HT.

The real problem is getting the A1 WU to assign itself to the proper 4 out of 8 logical processors that represent one per physical processor. That can be done with Affinity Changer unless you know a better way.
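
To illustrate what "one logical processor per physical processor" means, here is a minimal sketch. The logical-to-physical numbering is an assumption made up for the example; the real layout depends on the system, and the selection would still have to be applied with a tool such as Affinity Changer.

```python
def one_logical_per_core(topology):
    """Pick one logical CPU for each physical core.

    topology maps logical CPU number -> physical core id (hypothetical input).
    Returns the lowest-numbered logical CPU found on each physical core.
    """
    chosen = {}
    for logical in sorted(topology):
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(topology[logical], logical)
    return sorted(chosen.values())


# Assumed layout: logical CPUs 0/1 share core 0, 2/3 share core 1, and so on.
example_topology = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

if __name__ == "__main__":
    print(one_logical_per_core(example_topology))  # [0, 2, 4, 6]
```

The four FahCores of an A1 WU would then be restricted to the returned set, leaving the sibling logical processors idle.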

By the way, HTT is HyperTransport™ Technology. What kind of credentials do you have to tell us otherwise?