Multi-GPU PPD drop

gordonbb
Posts: 511
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Re: Multi-GPU PPD drop

Post by gordonbb »

HaloJones wrote:@Nathan_P, another machine (i5-2400, P67 mobo, 1070 at the same clock, Windows 10) is getting 700K with these units, so I'm wondering whether the relative slowness of this dual-1070 rig could be down to the alleged latency built into Z77 boards that use a PLX chip to achieve two slots at x16. I'd never read about this inherent latency until today. I had a simpler Z77 board with these two 1070s, but it died when the PSU blew. I replaced it with this Gigabyte Sniper G1 board that promised 2x x16, but now it may be that this comes at the cost of some latency on the PCIe channels. Hopefully when some Core_22 units come back I can get a better picture of how much this improves the output.
I would be surprised if the PCIe switch chips add any latency, as they act, in this case, as a simple static switch rather than a multiplexer. On "SLI-capable" boards based on X470 or Z370/Z390, the PCIe switches just take the upper 8 lanes from the top PCIe 3.0 x16 slot and switch them to the lower 8 lanes of the middle slot when the motherboard detects a card inserted there.
On higher-end boards, the PLX switches actually multiplex the PCIe lanes to multiple slots and take care of arbitration etc., so these would add some latency.
It is confusing, as both are referred to as "switches," whereas one is a simple switch and the other a multiplexer.
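To check which kind of switch sits in front of the GPUs without opening the case, `nvidia-smi topo -m` prints the link type between each pair of devices: PIX means the link traverses a single PCIe bridge (such as a PLX switch), while PHB means the traffic goes through the CPU's host bridge. A minimal sketch, assuming an NVIDIA driver with nvidia-smi on the PATH:

```python
import subprocess

# Print the GPU/PCIe topology matrix. A "PIX" entry between two GPUs
# means their link traverses a single PCIe bridge (e.g. a PLX switch);
# "PHB" means the traffic goes through the CPU's host bridge instead.
result = subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```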
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slot)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Multi-GPU PPD drop

Post by foldy »

You could try removing one 1070 to see if that gives a speedup. If it does, you have a PCIe limit. The PLX is a multiplexer, which means each GPU can use x16 speed, but not when both GPUs want x16 at the same time. So full x16 speed is only available when the GPUs' PCIe traffic alternates.
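Before pulling a card, you can also read the negotiated link settings straight from the driver. A short sketch using nvidia-smi's query interface (these field names are listed by `nvidia-smi --help-query-gpu`); a card stuck at a low generation or width while under load would point at a PCIe limit:

```python
import subprocess

# Report each GPU's negotiated PCIe generation/width and current load.
fields = "index,name,pcie.link.gen.current,pcie.link.width.current,utilization.gpu"
out = subprocess.check_output(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv"],
    text=True,
)
print(out)
```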
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Multi-GPU PPD drop

Post by MeeLee »

I no longer run Windows, but the 1070 can easily get higher results.
What are your GPU temps?
Try to keep them below 70°C, preferably below 65°C, if the noise from the cooling solution isn't too loud for you.

If your GPU temps are above 75°C, I'd recommend opening up your PC and figuring out ways to optimize airflow.

Do you overclock?
Overclocking too high can lower results as well: upon an error, FAH will redo the WU from the last checkpoint, losing valuable points in the process.

Also, download GPU-Z to verify whether the slot is running at the full x16.
I presume not, because most Intel systems will run one GPU at x8 and the second at x4.
PCIe 2.0 x4 speed is good enough for an RTX 2060 to get 1M PPD, so it should be enough for a GTX 1070.
My recommendation is to try installing Linux and see if that makes things better.
Latency is a big thing for FAH: it sends several packets per minute, and if every packet is delayed by a fraction of a second, your overall score will drop a lot.
However, I think your issue is hardware (or driver) related.
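If you'd rather not install GPU-Z, the same numbers (temperature, clocks, link width) can be polled from nvidia-smi on both Windows and Linux. A small sketch; the 5-second interval is an arbitrary choice:

```python
import subprocess
import time

# Poll temperature, SM clock, and negotiated PCIe width every 5 seconds.
# Stop with Ctrl+C. Temperatures creeping past ~70C under load suggest
# revisiting case airflow, per the advice above.
FIELDS = "index,temperature.gpu,clocks.sm,pcie.link.width.current"
while True:
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
        text=True,
    )
    print(out.strip())
    time.sleep(5)
```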
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Multi-GPU PPD drop

Post by HaloJones »

@MeeLee, the cards are custom water-cooled MSI Sea Hawks with load temperatures below 50°C. The cards run around 2 GHz. GPU-Z says they're both running at PCIe Gen 2.0 x16. If I pause one of them, the times per percentage on the other card improve. I have Linux on other machines that only fold; I'd prefer to keep this one on Windows for a variety of reasons, although it is an option.

@gordonbb, the PLX implementation in early use (the motherboard is a Z77, so the 2012/13 period) did a clever job of taking a chipset designed for 32 lanes up to 48. The CPU is a Sandy Bridge 2600K, so it doesn't support PCIe Gen 3.
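For context on what being stuck at Gen 2 means in raw numbers, here is a rough per-direction bandwidth calculation. The per-lane figures account for encoding overhead (8b/10b for PCIe 1.x/2.0, 128b/130b for 3.0) and are approximate:

```python
# Approximate usable PCIe bandwidth per direction, in GB/s per lane,
# after encoding overhead.
GBPS_PER_LANE = {1: 0.25, 2: 0.5, 3: 0.985}

def pcie_bandwidth(gen: int, lanes: int) -> float:
    """Usable one-way bandwidth in GB/s for a given generation and width."""
    return GBPS_PER_LANE[gen] * lanes

# A Sandy Bridge 2600K on Z77 tops out at Gen 2:
print(pcie_bandwidth(2, 16))  # 8.0  GB/s - Gen 2 x16 (each 1070 here)
print(pcie_bandwidth(3, 8))   # ~7.9 GB/s - Gen 3 x8, nearly the same
print(pcie_bandwidth(2, 4))   # 2.0  GB/s - Gen 2 x4
```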
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multi-GPU PPD drop

Post by bruce »

Is it possible that you are actually running two WUs on the same card? If your FAH is somehow misconfigured, both WUs will appear to be running, but each will only get half the resources because it's competing with the other WU. That should never happen, of course.

Stop both WUs; GPU-Z should show the GPU as idle. Start one WU; the GPU should become active. Stop that WU and start the other: either the GPU activity will be about the same as with the first WU (meaning both WUs land on the same card), or the GPU will stay idle (meaning the second WU runs on the other card).
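A quicker way to rule this out, assuming nvidia-smi is available: list the compute processes attached to each GPU. Two FahCore processes sharing one GPU UUID would mean two WUs are competing for the same card. A sketch:

```python
import subprocess
from collections import Counter

# List compute processes per GPU and flag any GPU running more than one.
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=gpu_uuid,pid,process_name",
     "--format=csv,noheader"],
    text=True,
)
lines = [ln for ln in out.strip().splitlines() if ln]
for ln in lines:
    print(ln)
per_gpu = Counter(ln.split(", ")[0] for ln in lines)
for uuid, count in per_gpu.items():
    if count > 1:
        print(f"WARNING: {count} compute processes on GPU {uuid}")
```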
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Multi-GPU PPD drop

Post by MeeLee »

HaloJones wrote: @gordonbb, the PLX implementation in early use (the motherboard is a Z77, so the 2012/13 period) did a clever job of taking a chipset designed for 32 lanes up to 48. The CPU is a Sandy Bridge 2600K, so it doesn't support PCIe Gen 3.
If what you're describing works like one of those PCIe x1-to-x4 splitters, then that may be your bottleneck.
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Multi-GPU PPD drop

Post by toTOW »

HaloJones wrote:To resurrect an old thread, I have just built a dual-1070 rig running Windows, and each card is struggling to do 600K, when in its previous iteration a single 1070 was getting 850K. However, that very high PPD was with project 11733 and the beta Core_22. Those units appear scarce right now, so this new rig is running various 14xxx units. Is Windows with two cards always this bad, or is it Windows + Core_21 that is this bad?

(Extra info: the CPU is a 2600K with no CPU client; the cards are water-cooled and <45°C; the cards run at 2 GHz, the CPU at 4 GHz; both GPU slots report PCIe Gen 2 x16.)
620k PPD is the target PPD for my Windows 1070 @ 2 GHz ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Multi-GPU PPD drop

Post by HaloJones »

So the problem does not appear to be the PLX chip. I swapped hard drives between the twin-1070 rig and another rig with a single 1070. The twin-1070 rig is now running Linux Mint, and the two cards are running at 800K+ depending on the exact 14xxx unit. The Windows drive is now in the single-1070 rig, similarly clocked, and is struggling to hit 600K.

So, to draw a line under this: the PLX chip may have an effect, but Linux vs. Windows on these 14xxx units is the overriding difference, and the sooner Core_22 is released to the general population, the better.
Theodore
Posts: 117
Joined: Sun Feb 10, 2019 2:07 pm

Re: Multi-GPU PPD drop

Post by Theodore »

Yes, PCIe 2.0 x4 is at the lower end for GTX 1070s on Windows.

Looking at Linux, I see a 5% drop using PCIe 3.0 x1 slots, as well as PCIe 2.0 x4 slots, versus running the cards at x8 or x16.
While theoretically PCIe 3.0 x1 bandwidth equals PCIe 2.0 x2, for some reason when folding I get the same results at 2.0 x4 on Linux as at 3.0 x1.
That means 2.0 x2 would bottleneck a 2060, as well as a 1070, on Linux.

And it appears that in your case the drop at 2.0 x4 is 20% on Windows, versus <5% on Linux.

Most likely, a single card runs at either the full x16 or at x8; in both cases no real bottleneck shows up on Windows.
The drop you see on Windows versus Linux is just the difference in how Windows handles the NVIDIA drivers compared to Linux.
Windows has a lot more PCIe overhead.
However, running a single card on Windows at the full x16, or even x8, shouldn't bottleneck the card at all!
Once you plug in a second card, at least one of them runs at 2.0 x4 speed and becomes bottlenecked, hence the drop in PPD.

I guess you've confirmed that PCIe 2.0 x4 under Windows bottlenecks the card, which suggests 2.0 x2 will do the same on Linux.

It would be interesting to know which cards can run at PCIe 2.0 x4 on Windows without seeing a 20% bottleneck.
I believe my 1060 had a 10% PPD slowdown on Windows, while my 1050 ran with essentially no bottleneck (~5% slowdown).

If more people could confirm for Windows: based on my experience, to avoid a bottleneck of 10% or greater on Windows, your PCIe slot speed needs to be:
PCIe 2.0 x16 or PCIe 3.0 x8 (or greater) for a 2080 Ti class card (2M+ PPD).
PCIe 2.0 x8 or PCIe 3.0 x4 (or greater) for most Pascal and Turing cards (500k–1.5M PPD range).
PCIe 2.0 x4 for cards in the 100–350k PPD range.
PCIe 2.0 x1 for cards slower than Pascal (below 100k PPD).


For Linux, I'd say that (both sets of rules are encoded in the sketch below):
PCIe 3.0 x4 or 2.0 x8 for an RTX 2080 Ti (2M+ PPD).
PCIe 3.0 x1 or 2.0 x4 for Turing cards (1M+ PPD).
PCIe 2.0 x2 (or greater) for Pascal cards (100–500k PPD range).
PCIe 2.0 x1 or 1.0 x4 (or greater) for cards below the 100k PPD range.
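A minimal lookup encoding the two rule-of-thumb lists above (the tiers are rough estimates from this thread, not measured limits):

```python
# Rule-of-thumb minimum slot per PPD tier, by OS, from the lists above.
# Each tier is a (ppd_ceiling, minimum_slot) pair, checked in order.
MIN_SLOT = {
    "windows": [
        (100_000, "PCIe 2.0 x1"),
        (350_000, "PCIe 2.0 x4"),
        (1_500_000, "PCIe 2.0 x8 / 3.0 x4"),
        (float("inf"), "PCIe 2.0 x16 / 3.0 x8"),
    ],
    "linux": [
        (100_000, "PCIe 2.0 x1 / 1.0 x4"),
        (500_000, "PCIe 2.0 x2"),
        (1_000_000, "PCIe 3.0 x1 / 2.0 x4"),
        (float("inf"), "PCIe 3.0 x4 / 2.0 x8"),
    ],
}

def min_slot(ppd: int, os: str) -> str:
    """Smallest slot expected to keep the PPD loss under ~10%."""
    for ceiling, slot in MIN_SLOT[os]:
        if ppd <= ceiling:
            return slot
    raise AssertionError("unreachable")

print(min_slot(850_000, "windows"))  # GTX 1070 -> PCIe 2.0 x8 / 3.0 x4
print(min_slot(850_000, "linux"))    # GTX 1070 -> PCIe 3.0 x1 / 2.0 x4
```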