Re: Dual X5650 giving me 20k PPD, why so slow?

Posted: Mon Nov 09, 2020 11:48 am
by Darth_Peter_dualxeon
Just another result:
CPU:12 alone ~ 75k PPD (work unit was 4.5% complete)
CPU:12 while GPU slot running: ~64k PPD only (work unit 14% complete)

So I realized that the FPU (floating-point unit), which does the science math (a big part of every work unit), is not doubled by Hyper-Threading: in these Xeons there is practically only one per physical core. That means it's possible to get most of the PPD from only one thread per physical CPU core. Somehow the GPU thread still manages to slow things down a little. (Does that thread use the FPU too?)

CPU:18 slot alone: 78k PPD (work unit 13% complete now), while I'm seeing much higher CPU usage.
CPU:18 slot while the GPU slot is working too: ~62k PPD (work unit is at 18% now).
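The numbers above can be turned into relative figures with a little arithmetic. This quick Python check uses only the PPD readings posted here, which are rough snapshots taken at different points in each work unit:

```python
# PPD readings reported above, in thousands of PPD (rough snapshots).
results = {
    "CPU:12": {"alone": 75, "with_gpu": 64},
    "CPU:18": {"alone": 78, "with_gpu": 62},
}

def pct_drop(before: float, after: float) -> float:
    """Percentage of PPD lost going from `before` to `after`."""
    return 100.0 * (before - after) / before

for slot, ppd in results.items():
    drop = pct_drop(ppd["alone"], ppd["with_gpu"])
    print(f"{slot}: {drop:.1f}% lower with the GPU slot running")

# 50% more threads (18 vs 12) when the CPU slot runs alone:
gain = 100.0 * (results["CPU:18"]["alone"] / results["CPU:12"]["alone"] - 1)
print(f"CPU:18 vs CPU:12, alone: +{gain:.1f}% PPD")
```

CPU:12 loses roughly 15% and CPU:18 roughly 20% when the GPU slot is active, while going from 12 to 18 threads adds only about 4%. That is consistent with the one-FPU-per-core explanation, though snapshots this rough can't prove it.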

Posted: Mon Nov 09, 2020 6:03 pm
by bruce
The "free" GPU thread does use the FPU a small percentage of the time. When the FAH code reaches a checkpoint, it compares the free energy from the primary calculation (OpenMM, in this case) with an alternate energy calculation on the "reference platform", which turns out to run on a separate CPU thread or two. I'm not sure whether that's the same thread that handles data transfers or an additional thread.

That calculation is brief, but it will probably finish faster if there are extra "idle" CPU threads. It's small enough that most people don't notice it. If Intel iGPUs are ever supported by FAHCore_a8(+), there may be additional competition for that last CPU thread. There's also potential competition from the OS (e.g. rasterizing, etc.).

----------------------------------------------------

Without intimate knowledge of how your dual CPUs are managed by the motherboard and how they are connected to memory, it's possible that two threads will be more efficient than one. The scheduling logic in the OS can also change this, if the effect turns out to be real on your hardware. It's quite likely that the memory cache works faster when all of the threads of one FAH slot are confined to a single CPU and the other slot runs on the other CPU, but whether that happens auto-magically or whether you have to manage it manually is unclear. It's equally difficult to predict whether the effect is moderate or anywhere from small to zero. Nevertheless, it's worth a try when you're testing your system.
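If you want to try confining each slot to one CPU by hand on Linux, one possible approach is to group logical CPUs by socket using the kernel's sysfs topology files and then pin the slot's FahCore process to one group. A minimal sketch, assuming Linux and that you already know the core process's PID; the function names are made up for illustration. Linux affinity applies per thread, so every task under /proc/<pid>/task is pinned, not just the main thread:

```python
import glob
import os
from collections import defaultdict

def read_package_ids(sysfs="/sys/devices/system/cpu"):
    """Yield (logical_cpu, socket_id) pairs from the Linux sysfs topology files."""
    pattern = os.path.join(sysfs, "cpu[0-9]*", "topology", "physical_package_id")
    for path in glob.glob(pattern):
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))  # e.g. "cpu12"
        with open(path) as f:
            yield int(cpu_dir[3:]), int(f.read())

def group_by_socket(pairs):
    """Map each socket ID to the sorted list of logical CPUs on that socket."""
    sockets = defaultdict(list)
    for cpu, socket in pairs:
        sockets[socket].append(cpu)
    return {s: sorted(cpus) for s, cpus in sorted(sockets.items())}

def pin_to_socket(pid, socket):
    """Restrict every thread of process `pid` to one socket's logical CPUs."""
    cpus = group_by_socket(read_package_ids())[socket]
    for tid in os.listdir(f"/proc/{pid}/task"):  # affinity is per thread on Linux
        os.sched_setaffinity(int(tid), cpus)

if __name__ == "__main__" and os.path.isdir("/sys/devices/system/cpu"):
    print(group_by_socket(read_package_ids()))
    # pin_to_socket(12345, 0)  # 12345 is a placeholder for one slot's FahCore PID
```

Running one slot pinned to socket 0 and the other to socket 1, then comparing PPD against the unpinned setup, would be one way to test whether the effect is real on a given board.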

Posted: Tue Nov 17, 2020 4:41 am
by kb9skw
Have you verified the clock speed is actually at 3.06GHz? I have a pair of X5680s in a Dell T610 and they would not clock up to speed; instead they stayed around 1.5-2.4GHz. I finally got them up to speed using a program called ThrottleStop and disabling the BD PROCHOT flag, and now they clock up to a 26x multiplier, or 3.45GHz. At this clock speed I have seen upwards of 60,000 PPD.
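On Linux, one quick way to see what the cores are actually running at (rather than the nominal or turbo rating) is to read the "cpu MHz" lines from /proc/cpuinfo while FAH is folding. A small sketch, assuming Linux; on Windows, ThrottleStop itself displays the live multiplier and clock:

```python
import os

def parse_cpu_mhz(cpuinfo_text: str) -> list[float]:
    """Return the 'cpu MHz' value of every logical CPU in /proc/cpuinfo text."""
    mhz = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            mhz.append(float(value))
    return mhz

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        speeds = parse_cpu_mhz(f.read())
    if speeds:
        print(f"{len(speeds)} logical CPUs, "
              f"min {min(speeds):.0f} MHz, max {max(speeds):.0f} MHz")
```

Run it while the CPU slot is busy; readings taken at idle will mostly show power-saving clocks.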



Posted: Wed Nov 18, 2020 7:31 pm
by Hopfgeist
kb9skw wrote:Have you verified the clock speed is actually at 3.06GHz?
Who are you talking to? Of the CPUs people have talked about, only the X5675 runs at 3.06GHz (or 3.07, sometimes rounded to 3.1, depending on which document you look at). I have two of those and I get around 100k PPD running 12 threads, one per physical core, and bound to the first (Hyper)thread of each core to avoid two threads running on the same physical core. Windows NT 4.0 could set CPU affinity, but I have no idea how to do it on Windows 10, let alone how to do it automatically every time the FahCore is started.
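For the "automatically every time the FahCore is started" part, one option is a small watcher that periodically finds FahCore processes and re-applies the affinity. This is a minimal sketch for Linux only, not for Windows 10. It assumes the core processes appear with names starting with "FahCore" in /proc/<pid>/comm and that the script is allowed to change their affinity. The CPU set 0-11 follows the first-thread-per-core numbering shown later in this thread and is an assumption for any other machine:

```python
import os
import time

# Assumption: logical CPUs 0-11 are the first thread of each physical core,
# as in the Linux listing later in this thread. Check your own numbering.
FIRST_THREADS = set(range(12))

def fahcore_pids(proc="/proc"):
    """PIDs whose process name (/proc/<pid>/comm) starts with 'FahCore'."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:  # process exited, or comm is unreadable
            continue
        if name.startswith("FahCore"):
            pids.append(int(entry))
    return sorted(pids)

def pin_all_threads(pid, cpus):
    """Linux affinity is per thread, so pin every task of the process."""
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)

def watch(cpus=FIRST_THREADS, interval=30.0):
    """Re-apply the affinity every `interval` seconds, catching newly started cores."""
    while True:
        for pid in fahcore_pids():
            try:
                pin_all_threads(pid, cpus)
            except OSError:  # core exited mid-loop, or no permission
                pass
        time.sleep(interval)

if __name__ == "__main__" and os.path.isdir("/proc"):
    print("FahCore PIDs:", fahcore_pids())
    # watch()  # uncomment to keep pinning FahCore processes in a loop
```

Polling is crude, but it avoids depending on how FAHClient launches its cores.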
kb9skw wrote: I have a pair of X5680s in a Dell T610 and they would not clock up to speed; instead they stayed around 1.5-2.4GHz. I finally got them up to speed using a program called ThrottleStop and disabling the BD PROCHOT flag, and now they clock up to a 26x multiplier, or 3.45GHz. At this clock speed I have seen upwards of 60,000 PPD.
3.45 GHz seems to be a strange clock speed for the X5680. It nominally runs at 3.33 GHz, with Turbo Boost up to 3.6 GHz if not all cores are active. Keep a close eye on your core temperatures when disabling CPU features.

Given that the X5680 should be roughly as fast at 3.33 GHz as the X5675 at 3.06 GHz (or so Passmark says), 60 kPPD seems quite low for a dual-CPU system.
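Normalizing both reports to PPD per physical core and per GHz makes the gap easier to see. This uses only the figures quoted in this thread: about 100k PPD on 2 × 6-core X5675 at 3.06 GHz, and about 60k PPD on 2 × 6-core X5680 at 3.45 GHz. PPD also depends on the project being folded, and FAH's quick-return bonus makes PPD grow faster than linearly with speed, so treat this as a rough comparison only:

```python
def ppd_per_core_ghz(ppd: float, cores: int, ghz: float) -> float:
    """PPD normalized by physical core count and clock speed."""
    return ppd / (cores * ghz)

x5675 = ppd_per_core_ghz(100_000, cores=12, ghz=3.06)  # Hopfgeist: 2 x X5675
x5680 = ppd_per_core_ghz(60_000, cores=12, ghz=3.45)   # kb9skw: 2 x X5680 at 26x

print(f"X5675 box: {x5675:.0f} PPD per core-GHz")
print(f"X5680 box: {x5680:.0f} PPD per core-GHz "
      f"({x5680 / x5675:.0%} of the X5675 box)")
```

On these numbers the X5680 system delivers only about half the PPD per core-GHz of the X5675 system.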


Cheers,
HG.

Posted: Fri Nov 20, 2020 6:27 am
by kb9skw
Sorry, that was meant for the OP, Nilem.


The X5650 has a Turbo Boost speed of 3.06GHz, which is why I suggested checking the actual clock speed the system is running at.

My Dell likes to put the CPUs at a 26x multiplier when under load; that's where the 3.45GHz comes from.
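On Westmere-EP Xeons like the X5650, X5675 and X5680, the core clock is the multiplier times a base clock (BCLK) of nominally 133.33 MHz, so the speeds mentioned in this thread can be checked directly. The multipliers below are derived from Intel's published base and maximum turbo speeds; the BCLK on a real board may differ slightly from nominal:

```python
BCLK_MHZ = 400 / 3  # nominal 133.33 MHz base clock on X5600-series Xeons

def core_clock_ghz(multiplier: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core clock in GHz for a given multiplier and base clock."""
    return multiplier * bclk_mhz / 1000

for label, mult in [("X5650 base", 20), ("X5650 max turbo", 23),
                    ("X5680 base", 25), ("T610 under load", 26),
                    ("X5680 max turbo", 27)]:
    print(f"{label:16s} {mult}x -> {core_clock_ghz(mult):.2f} GHz")
```

At nominal BCLK, 26x works out to about 3.47 GHz, which sits between the X5680's 25x base and 27x maximum turbo multipliers. The small difference from the 3.45 GHz reported could come from the board's actual BCLK being slightly below nominal or from rounding in the monitoring tool; which applies here isn't known.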

Regarding the affinity: I assume node 0 and node 1 are the two CPUs, so if I select 0, 2, 4, 6, 8, ... it will avoid two threads running on the same physical core.
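Windows expresses affinity as a bitmask in which bit n stands for logical processor n, and some tools (for example `start /affinity` in cmd.exe) take that mask as a hexadecimal number. A small sketch converting a list of logical CPUs into such a mask. Which numbers are the first threads depends on how the OS enumerates them, so both lists below are assumptions for a 2 × 6-core machine with Hyper-Threading (24 logical CPUs):

```python
def affinity_mask(cpus) -> int:
    """Bitmask with bit n set for each logical CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

interleaved = range(0, 24, 2)  # if sibling threads are numbered 0/1, 2/3, ...
grouped = range(0, 12)         # if all first threads are listed before the second threads

print(f"interleaved: {affinity_mask(interleaved):X}")
print(f"grouped:     {affinity_mask(grouped):X}")
```

The two enumerations give very different masks, which is why the numbering has to be checked before picking CPUs.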

Posted: Fri Nov 20, 2020 7:29 am
by Hopfgeist
kb9skw wrote: Sorry, that was meant for the OP, Nilem.

The X5650 has a Turbo Boost speed of 3.06GHz, which is why I suggested checking the actual clock speed the system is running at.
I see. But the turbo boost is only used when just a small number of cores is active, and that is not a typical FAH workload.
kb9skw wrote: My Dell likes to put the CPUs at a 26x multiplier when under load; that's where the 3.45GHz comes from.
Is that a setting in TechPowerUp's "ThrottleStop" tool? I guess that may work if cooling is adequate and the clock stays within the range between the nominal frequency and the maximum Turbo Boost frequency.
kb9skw wrote: Regarding the affinity: I assume node 0 and node 1 are the two CPUs, so if I select 0, 2, 4, 6, 8, ... it will avoid two threads running on the same physical core.
That may depend on how the operating system enumerates the processor threads. My operating system first lists the first thread of every physical core and then the second thread of every core, so I set affinity to CPUs 0,1,2,3,4,5,6,7,8,9,10,11.

On my system:

cpu00: Socket 0, Core 0, Thread 0
cpu01: Socket 0, Core 1, Thread 0
cpu02: Socket 0, Core 2, Thread 0
cpu03: Socket 0, Core 3, Thread 0
cpu04: Socket 0, Core 4, Thread 0
cpu05: Socket 0, Core 5, Thread 0
cpu06: Socket 1, Core 0, Thread 0
cpu07: Socket 1, Core 1, Thread 0
cpu08: Socket 1, Core 2, Thread 0
cpu09: Socket 1, Core 3, Thread 0
cpu10: Socket 1, Core 4, Thread 0
cpu11: Socket 1, Core 5, Thread 0
cpu12: Socket 0, Core 0, Thread 1
cpu13: Socket 0, Core 1, Thread 1
cpu14: Socket 0, Core 2, Thread 1
cpu15: Socket 0, Core 3, Thread 1
cpu16: Socket 0, Core 4, Thread 1
cpu17: Socket 0, Core 5, Thread 1
cpu18: Socket 1, Core 0, Thread 1
cpu19: Socket 1, Core 1, Thread 1
cpu20: Socket 1, Core 2, Thread 1
cpu21: Socket 1, Core 3, Thread 1
cpu22: Socket 1, Core 4, Thread 1
cpu23: Socket 1, Core 5, Thread 1
I don't know nearly enough about Windows to guess how it is done there, or whether the order is defined by Intel.
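Given a listing like the one above, choosing one logical CPU per physical core comes down to filtering on the thread index, regardless of the order in which the OS enumerates the threads. A sketch that parses lines in exactly the format shown (the regex is written for that format; the function name is made up for illustration):

```python
import re

LINE = re.compile(r"cpu(\d+):\s*Socket (\d+), Core (\d+), Thread (\d+)")

def first_threads(listing: str) -> list[int]:
    """Logical CPU numbers that are thread 0 of their physical core."""
    cpus = []
    for match in LINE.finditer(listing):
        cpu, _socket, _core, thread = map(int, match.groups())
        if thread == 0:
            cpus.append(cpu)
    return sorted(cpus)

example = """\
cpu00: Socket 0, Core 0, Thread 0
cpu01: Socket 0, Core 1, Thread 0
cpu12: Socket 0, Core 0, Thread 1
cpu13: Socket 0, Core 1, Thread 1
"""
print(first_threads(example))
```

Fed the full 24-line listing above, this selects CPUs 0 through 11, matching the affinity set there. With an interleaved enumeration it would instead pick the even numbers.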

Cheers,
HG.