
Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 6:44 pm
by rwh202
Aurum wrote:rwh202, so if you moved a single 1080 between slots, does the second x16 slot (PCI_E4) run at x4 when used alone? I see from the photo that MSI labels the first x16 slot (PCI_E2) as PCI-E3.0, but there's no label on PCI_E4, just a different lock style.
Yeah, the top slot is a fixed x16 connected to the CPU, and the lower x16 is actually x4 electrical, connected to the PCH. On more sophisticated motherboards, both slots are often wired to the 16 lanes from the CPU, with switches that automatically reconfigure to an x8/x8 config when 2 cards are fitted (or even x0/x16 when just the lower slot is used).
In Linux, the driver helpfully reports the PCIe link info for each connected card: width (x1, x4, etc.) and speed (which changes dynamically to save power, but sat at 5 GT/s for Gen2 and 8 GT/s for Gen3 during the tests), so I've been able to confirm the link configuration while running the tests.
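Something along these lines can confirm the negotiated link for each card from a terminal - a minimal sketch assuming a standard Linux sysfs layout (nvidia-smi -q -d PCIE reports the same information for NVIDIA cards); since the link downclocks at idle, check it while a WU is running:

Code: Select all

# Minimal sketch: print the negotiated PCIe link width/speed of every
# display-class device from sysfs (Linux only). The link speed drops at
# idle to save power, so run this while the GPU is under load.
from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    pci_class = (dev / "class").read_text().strip()
    if not pci_class.startswith("0x03"):      # 0x03xxxx = display controller
        continue
    try:
        width = (dev / "current_link_width").read_text().strip()
        speed = (dev / "current_link_speed").read_text().strip()
    except FileNotFoundError:                 # e.g. some integrated graphics
        continue
    print(f"{dev.name}: x{width} @ {speed}")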
foldy wrote:So on Linux with a GTX 1080, Gen3 x16 vs Gen3 x1 you lose only 4%, and another 4% when going down to Gen2 x1.
But on your particular mainboard you lose another 10% when using the motherboard (PCH) lanes instead of the CPU connection.
Yep, that about sums it up! This experimentation has been informative for me, because in future I'll look for motherboards that allow both GPUs to be connected to the CPU (in an x8/x8 config).

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 08, 2017 6:58 am
by bruce
That's really good work, guys.
rwh202 wrote:... but I just moved the card between slots and used a 1x riser to drop to 1x.
You can also get a 4x riser.

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 08, 2017 10:09 am
by rwh202
bruce wrote:You can also get a 4x riser.
Thanks - powered 4x risers were a little harder to find, but I've just ordered one to try - it's shipping from Hong Kong, so it'll be a little while...

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 08, 2017 3:53 pm
by yalexey
Recent data in this thread suggest that performance is influenced much more strongly by latency than by bus width.
That is why I want to repeat the question: is it possible to eliminate this bottleneck by reducing the latency requirements? For example, by having the CPU build a queue of GPU jobs ahead of time, or by running multiple jobs on one GPU that are processed virtually in parallel, so that while one is waiting for data from the CPU, the other keeps the GPU's compute units loaded.

This work would only need to be done once and could save thousands of dollars for people around the world, since it would increase the performance of all systems where the GPU is connected through the northbridge/chipset lanes.

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 08, 2017 10:02 pm
by Aurum
What is the fastest Nvidia card that can run with minimal performance loss on an x1 slot?

Re: PCI-e bandwidth/capacity limitations

Posted: Mon Jan 09, 2017 10:25 am
by foldy
There is a report from the AnandTech forum where an R9-280X (HD7970) only loses 9% performance going from PCIe 2.0 x16 to x1 on Windows.

Code: Select all

I quickly downloaded FAHBench, ran the GUI, and did 180-second tests, one at PCIe 2.0 x16 and the other at x1 using this contraption (link).
My test bed is a BIOSTAR H81S2 with a 3.0 GHz dual-core Pentium (G3220, 22nm). The GPU is a lowly Radeon R9-280X (HD7970).
At x1, FAHBench reported these results on Tahiti, OpenCL, accuracy check enabled, 180 seconds, dhfr task, single precision, NaN check disabled: 38.5408, 23558 atoms.
At x16 / PCIe 2.0: 42.3566, 23558 atoms.
x1 was only 91% of x16, losing 9% of its potential.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 14, 2017 3:49 pm
by rwh202
Some more numbers, same system as before, but now with an x4 riser to play with:
Linux Mint 17.3 using drivers 367.44.
GPU: EVGA GTX 1080 FTW
MB: MSI Z87-G41
CPU: Pentium G3258 @ 3.2 GHz

Code: Select all

x16 Gen3 (CPU - no riser) 1% bus usage. Score: 150.476 (100%)
x4  Gen3 (CPU - w. riser) 4% bus usage. Score: 148.624 (98.8%)
x4  Gen2 (CPU - w. riser) 7% bus usage. Score: 146.809 (97.6%)
x4  Gen2 (MB  - w. riser) 5% bus usage. Score: 135.345 (89.9%) (only ran this test to check whether the cheap riser itself was degrading performance beyond the drop in PCIe speed - very similar result to the previous run without the riser, so I assume not)
Previous results for comparison:

Code: Select all

x16 Gen3 (CPU)  1% bus usage. Score: 149.455 (100%)
x16 Gen2 (CPU)  2% bus usage. Score: 148.494 (99.4%)
x4  Gen2 (MB)   5% bus usage. Score: 135.417 (90.6%)
x1  Gen3 (CPU) 13% bus usage. Score: 143.917 (96.3%)
x1  Gen2 (CPU) 23% bus usage. Score: 137.669 (92.1%)
x1  Gen2 (MB)  17% bus usage. Score: 123.570 (82.6%)
So, x4 Gen3 only loses about 1%, and dropping to Gen2 loses another 1% - provided you're connected to the CPU, not the PCH.
This is broadly in line with the 4% loss going to x1 Gen3 and the further 4% loss dropping to Gen2 x1 identified earlier.
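For clarity, the percentages above are just each FAHBench score relative to the x16 Gen3 baseline, e.g.:

Code: Select all

# How the relative figures above are derived: each score divided by the
# x16 Gen3 baseline (numbers taken from the riser table above).
baseline = 150.476                        # x16 Gen3, CPU lanes, no riser
scores = {
    "x4 Gen3 (CPU, riser)": 148.624,
    "x4 Gen2 (CPU, riser)": 146.809,
    "x4 Gen2 (MB,  riser)": 135.345,
}
for config, score in scores.items():
    print(f"{config}: {score / baseline:.1%}")    # 98.8%, 97.6%, 89.9%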

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 14, 2017 4:43 pm
by foldy
So we can say that PCIe x4 has only a small performance loss of about 2% compared to x16, except on your mainboard's PCH lanes, which lose an additional 8%.

As you use Linux, even PCIe x1 shows only a small performance loss: 4% going from PCIe Gen3 x16 to Gen3 x1, and another 4% going to Gen2 x1, except on your mainboard's PCH lanes, which lose an additional 10%. This is in contrast to the Windows results, where PCIe x1 cut the performance roughly in half.

We have only seen results for fast Nvidia GPUs; I guess fast AMD GPUs like the R9 Fury behave similarly, but that was not tested.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 14, 2017 9:32 pm
by bruce
That also suggests that a slow GPU will see very little degradation, no matter how it's connected.

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 22, 2017 10:39 am
by foldy
Posting a chat with Nathan_P about mainboards with a PLX PCIe switch chip:

Do you have numbers for a mainboard with a PLX switch chip that supports x8/x8/x8/x8 with only a 16-lane CPU? I guess it should work great for folding because the PCIe bus is not used permanently by folding work units, only every x milliseconds, so 4 GPUs can alternate on the CPU lanes. But it could also be that the GPUs disturb each other and the effective speed drops to x4 because all 16 CPU lanes are permanently busy?
Nathan_P: Those numbers I quoted are from a PLX-equipped motherboard, the Asus Z87-WS: 2 slots run at x16, 3 at x16/x8/x8, and 4 at quad x8, but you can switch the speeds in the BIOS between PCIe 1.0/2.0/3.0, giving PCIe 3.0 x16, x8 & x4, PCIe 2.0 x16 & x8, and PCIe 1.0 x16.

Bear in mind that PCIe 3.0 x8 has about the same bandwidth as 2.0 x16, 2.0 x8 is the same as 1.0 x16, and 1.0 x16 is about the same as 3.0 x4.
Ah I see, so with PLX quad x8 there is no performance loss?
Nathan_P: Very small, around 1% going from PCIe 3.0 x16 to PCIe 3.0 x8 or PCIe 2.0 x16. Some WUs gave a greater drop and others less, so the average is around 1%.
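As a rough sanity check on those equivalences, the raw per-lane rates (standard PCIe line rates and encoding overheads, nothing board-specific) work out like this:

Code: Select all

# Rough PCIe throughput: line rate (GT/s) x encoding efficiency / 8 bits per byte.
# Gen 1.0/2.0 use 8b/10b encoding, Gen 3.0 uses 128b/130b.
generations = {"1.0": (2.5, 8 / 10), "2.0": (5.0, 8 / 10), "3.0": (8.0, 128 / 130)}

for gen, (gt_per_s, efficiency) in generations.items():
    per_lane = gt_per_s * efficiency / 8          # GB/s per lane
    for lanes in (4, 8, 16):
        print(f"PCIe {gen} x{lanes:2}: {per_lane * lanes:5.2f} GB/s")

# Matches the rules of thumb: 3.0 x8 (~7.9 GB/s) ~ 2.0 x16 (8.0 GB/s),
# 2.0 x8 (4.0 GB/s) = 1.0 x16 (4.0 GB/s), 3.0 x4 (~3.9 GB/s) ~ 1.0 x16.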

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 22, 2017 3:55 pm
by Aurum
I had a dream that Gigabyte made a motherboard just for folding. It did not have any bells and whistles: no PCI slots, no audio, no serial or parallel ports, no legacy USB. It did have 8-pin CPU power, auxiliary molex and SATA power connectors, an M.2 SSD socket, and PCIe slots on both sides of the CPU; all PCIe slots were 3.0 x16 and spaced to fit double-wide graphics cards.

Then I woke up and it was snowing hard.

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 22, 2017 5:37 pm
by boristsybin
Can anyone tell me whether PCIe 3.0 x8 is enough to feed a Titan X Pascal with folding work at maximum output?

Re: PCI-e bandwidth/capacity limitations

Posted: Sun Jan 22, 2017 5:45 pm
by foldy
Yes that's fine.

Re: PCI-e bandwidth/capacity limitations

Posted: Wed Jan 25, 2017 12:13 am
by PS3EdOlkkola
@boristsybin I've got a sample size of 12 Titan X Pascal GPUs, with about half on PCIe 3.0 x8 and the other half on PCIe 3.0 x16 interfaces. There is no discernible difference in frame time or PPD between the two interfaces. All of them are in LGA 2011-v3 motherboards with 40-lane PCIe CPUs.

Re: PCI-e bandwidth/capacity limitations

Posted: Wed Jan 25, 2017 2:20 am
by hiigaran
This thread should be more than enough motivation for the devs to come up with an F@H equivalent of BOINC's WUProp!