7im wrote:
shatteredsilicon wrote:
WangFeiHong wrote:
I thought we had established that 1x4GHz processors were at least faster than 2x2GHz processors due to inter-core bottlenecks.
That's what I thought I said - 1x4GHz is better than 2x2GHz.
Did I miss the post where you show SMP folding numbers to back up this statement?
I don't have a directly equivalent setup to demonstrate with, but the fact that running 4 clients on a quad, each with its affinity bound to a single core, yields massively more PPD than running 1 client across all four cores (1 FahCore per CPU core) is pretty strong evidence of it. If the scaling were well balanced under real-world conditions, performance would favour the setup with the fewest total processes: running 4 clients means more process switching (which adds overhead and slows things down), 4x the amount of data being processed at once (which significantly reduces cache effectiveness), and roughly a 4-fold increase in memory bandwidth contention. The fact that, despite the extra process switching overhead and the added cache and memory bandwidth contention of running multiple folding processes each bound to one CPU core, this setup still yields much higher throughput (at the expense of higher per-WU latency) means that the MPI FAH scaling is actually pretty dire.
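For anyone wanting to reproduce the per-core binding, `taskset -c N <command>` does it from the shell on Linux. Below is a minimal sketch of the same thing as a wrapper program - my own illustration, not part of the FAH client - that pins itself to one core via sched_setaffinity() and then exec's whatever client you pass it; the exec'd process keeps the affinity mask:

[code]
/* pinrun.c - hedged sketch of a one-core launcher, not FAH code.
 * Usage: ./pinrun <core> <command> [args...]
 * e.g.:  ./pinrun 0 ./fah6 -local   (command name is illustrative) */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s <core> <command> [args...]\n", argv[0]);
        return 1;
    }

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(atoi(argv[1]), &set);   /* restrict to the one requested core */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    execvp(argv[2], &argv[2]);      /* exec'd client inherits the mask */
    perror("execvp");
    return 1;
}
[/code]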
The problem is exactly as Halo describes it - the performance of the whole operation is limited by the slowest core, which means the effect of any other process competing for CPU time is magnified 4-fold relative to the CPU time it actually consumes. Another process using up 10% of one core only takes 2.5% of the machine's total CPU time, but because the folding speed is limited by the slowest thread, it effectively slows all four threads, and therefore the whole run, down by 10%.
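To make that arithmetic concrete, here's a toy model (my own illustration, not FAH code): treat each simulation step as finishing only when the slowest of the four MPI ranks finishes, then steal 10% of one core:

[code]
/* Toy model of slowest-rank-limited throughput; numbers are illustrative.
 * Each step ends only when all four ranks are done, so per-step wall
 * time is the MAX of the per-rank times. */
#include <stdio.h>

int main(void) {
    double rank_ms[4] = {100.0, 100.0, 100.0, 100.0}; /* ideal: equal cores */

    /* Another process steals 10% of core 2's cycles, so that rank's
     * work now takes 100 / 0.9 ~= 111.1 ms instead of 100 ms. */
    rank_ms[2] /= 0.90;

    double step = 0.0;
    for (int i = 0; i < 4; i++)
        if (rank_ms[i] > step)
            step = rank_ms[i];

    /* Throughput scales as 1/step: the whole run loses 10%, even though
     * only 2.5% of the machine's total CPU time was taken away. */
    printf("ideal: 100.0 ms/step, actual: %.1f ms/step -> %.1f%% PPD loss\n",
           step, (1.0 - 100.0 / step) * 100.0);
    return 0;
}
[/code]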
1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers