PCI-e bandwidth/capacity limitations
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
Re: PCI-e splitter?
That would be greatly appreciated. I really want to get the most processing power out of the $10k I plan to drop on this.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
This site shows how to test different PCIe speeds using the BIOS and some sticky-note paper.
Maybe someone can run a test like this for folding, e.g. using FahBench with a demanding work unit?
(The article's conclusion that PCIe speed does not matter only holds for gaming, not for GPGPU.)
https://www.pugetsystems.com/labs/artic ... mance-518/
I measured my GTX 970 at PCIe 2.0 x8 with FahBench 2.2.5 and a real work unit, and MSI Afterburner showed 55% bus usage.
PCIe 2.0 x8 has 4 GB/s of bandwidth, so F@H uses about 4 GB/s * 0.55 = 2.2 GB/s.
So PCIe 2.0 x4 or PCIe 3.0 x2, each having 2 GB/s of bandwidth, will be the lower limit, maybe losing 10% PPD.
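For anyone who wants to redo the arithmetic, here is a rough Python sketch of that estimate. It assumes the work unit always needs the same ~2.2 GB/s measured above on PCIe 2.0 x8 and that PPD scales linearly with the fraction of that bandwidth a slot can actually deliver - both simplifications, not measurements.
Code:
# Crude estimate: required bandwidth from the measured bus usage,
# then the fraction of it each smaller slot could still deliver.
MEASURED_SLOT_GBPS = 4.0      # PCIe 2.0 x8
MEASURED_BUS_USAGE = 0.55     # 55% reported by MSI Afterburner
needed_gbps = MEASURED_SLOT_GBPS * MEASURED_BUS_USAGE   # ~2.2 GB/s

slots = {                     # approximate one-way bandwidth in GB/s
    "PCIe 2.0 x4": 2.0,
    "PCIe 3.0 x2": 2.0,
    "PCIe 3.0 x1": 1.0,
}
for name, gbps in slots.items():
    fraction = min(1.0, gbps / needed_gbps)
    print(f"{name}: ~{(1 - fraction) * 100:.0f}% estimated PPD loss")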
Re: PCI-e splitter?
So you're saying that PCIe v2.0 x4 would place some limitations on a GTX 960, but it would probably be fine for a slower GPU running FahBench and some undefined production WU. Scaling that down to the x1 PCIe splitter being discussed, many of the Kepler GPUs could manage using the adapter through an x1 v2 slot, but most of the Maxwell GPUs could not.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: PCI-e splitter?
I'm unfamiliar with the measurement process. How do you guys obtain that data?
Now basing this off foldy's data, if 3.0 x2 would see a 10% loss, then 3.0 x1, being only 1GB/s would see a...45% loss? Does that sound right? Or is my maths wrong here (I did fail maths in high school, so I wouldn't be surprised)?
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
Hi hiigaran, I obtain that data from my GTX 970, using the Windows tool MSI Afterburner, which shows the PCIe bandwidth used.
If the math is right then yes, a 55% loss of PPD, but we need somebody to prove that in reality.
Does anyone have a GTX 970 or similar running on PCIe slower than PCIe 3.0 x2 or PCIe 2.0 x4, or want to try the paper limit trick, or can limit it in the BIOS?
@Nathan_P: Is your rig running again?
@Bruce: I don't know if a slower GPU would use lower PCIe bandwidth today on Core_21.
But tests years ago with older GPUs and FahCores showed a much lower PCIe bandwidth limit.
The undefined production WU is just one copied from the Folding@home work unit folder.
I used it to test if FahBench and F@H use the same PCIe bandwidth on the same work unit - they do.
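For reference, the arithmetic behind that 55% figure, under the same simple assumption that PPD scales with the fraction of the required ~2.2 GB/s the slot can supply: PCIe 3.0 x1 delivers about 1 GB/s, and 1 / 2.2 ≈ 0.45, so roughly 45% of full speed remains, i.e. about a 55% loss.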
Last edited by foldy on Sat Jun 04, 2016 1:38 pm, edited 1 time in total.
-
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only) - Location: Jersey, Channel islands
Re: PCI-e splitter?
My tests have been unsuccessful: every time I move a GPU in my Z9PE it starts a different WU, so I can't do an apples-to-apples comparison. I have a new mobo on order for a different project, so I'll see if I can get any testing done on that. Ideally you would need a mobo that allows you to set the PCIe slot speed in the BIOS.
Edit: Well, it looks like my new mobo gives me the option to set the PCIe generation a slot runs at but not the link width, so in theory I should still be able to simulate x4 and x8 speeds.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
Use FahBench 2.2.5 and select the WU "nav", which is a real work unit. It creates real PCIe bandwidth use, which I compared between FahBench and F@H using MSI Afterburner's GPU bus usage monitoring. I think all Core_21 work units have similar PCIe bandwidth usage on a given GPU, so an apples-to-apples comparison is OK.
This test should only be an estimation, so even with different work units you will see a trend as long as they are all Core_21.
What I expect the test to show is that when the available bandwidth is half the needed bandwidth, PPD or the FahBench score also drops to nearly half.
For a GTX 970 we need speeds lower than PCIe 3.0 x2 or PCIe 2.0 x4 to probably see the expected performance drop to half - or to prove it wrong.
I tried using tape on my GTX 970 pins to force PCIe x4 and x1 mode, but then the mainboard no longer recognizes the card.
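One way to check what link a card has actually negotiated (after the tape trick, a BIOS setting, or a riser) is to ask the driver. Below is a small Python sketch around nvidia-smi's PCIe link query; it assumes nvidia-smi is on the PATH, and note that NVIDIA cards drop to a lower link generation when idle, so run it while the GPU is folding.
Code:
import subprocess

# Query the currently negotiated PCIe generation and lane width,
# plus the maximum the card supports.
fields = "pcie.link.gen.current,pcie.link.width.current,pcie.link.gen.max,pcie.link.width.max"
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
for line in out.splitlines():          # one line per GPU
    gen_cur, width_cur, gen_max, width_max = [v.strip() for v in line.split(",")]
    print(f"Running at PCIe gen {gen_cur} x{width_cur} (card maximum: gen {gen_max} x{width_max})")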
Last edited by foldy on Sat Jun 04, 2016 6:36 pm, edited 1 time in total.
Re: PCI-e splitter?
Pick an arbitrary goal for maximum PCIe bandwidth and a suitable test WU.

foldy wrote:
I don't know if a slower GPU would use lower PCIe bandwidth today on Core_21.
But tests years ago with older GPUs and FahCores showed a much lower PCIe bandwidth limit.
The undefined production WU is just one copied from the Folding@home work unit folder.
I used it to test if FahBench and F@H use the same PCIe bandwidth on the same work unit - they do.
Run a slow GPU. The calculations on each block of data transferred will take a certain amount of time, and the data transfers will overlap with most of them. The GPU shaders will be the limiting factor.
Run the same test with a fast GPU. The calculations will take less time, so the shaders will spend more time waiting for data to be transferred. The processing percentage will go down and the PCIe transfers will become the limiting factor.
There's also a factor somewhere for the size of VRAM, which may allow more data blocks to be queued for future processing, but in the early days of GPU folding we also concluded that the size of VRAM was unimportant. Nobody has studied Core_21 to see if that matters today.
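As a toy illustration of that overlap argument (the numbers are invented, only the relationship matters): with compute and transfer overlapped, each block takes as long as the slower of the two stages, so whichever stage is slower is the limiting factor.
Code:
# Toy model: per block of work, compute and PCIe transfer run overlapped,
# so the block time is set by whichever stage is slower.
def block_time_ms(compute_ms, transfer_ms):
    return max(compute_ms, transfer_ms)

transfer_ms = 2.0   # same data per block over the same slot
for name, compute_ms in [("slow GPU", 8.0), ("fast GPU", 1.0)]:
    limiter = "shaders" if compute_ms >= transfer_ms else "PCIe transfers"
    print(f"{name}: {block_time_ms(compute_ms, transfer_ms):.0f} ms per block, limited by {limiter}")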
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
I currently have a Core_18 and GPU bus usage is 60% on PCIe 2.0 x8. VRAM usage is 221 MB (122 MB with FAH paused), so I guess there is some room left to queue a little work, since most GPUs have at least 1 GB. So VRAM is not a limit today.
It may be worth studying whether a reduction in transfer bandwidth is possible and whether queuing more work on the GPU could speed up folding even on PCIe 3.0 x16. On the other hand, the future PCIe 4.0 standard will double bandwidth again.
But first we should prove what the bandwidth limit is.
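For reference, the per-lane numbers behind the bandwidth figures in this thread (usable one-way bandwidth after encoding overhead), as a small Python helper:
Code:
# Approximate usable bandwidth per lane, one direction, in GB/s.
GBPS_PER_LANE = {"1.0": 0.25, "2.0": 0.5, "3.0": 0.985, "4.0": 1.969}

def slot_gbps(gen, lanes):
    return GBPS_PER_LANE[gen] * lanes

print(slot_gbps("3.0", 16))   # ~15.8 GB/s
print(slot_gbps("2.0", 8))    # ~4 GB/s
print(slot_gbps("3.0", 1))    # ~1 GB/s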
Re: PCI-e splitter?
Out of curiosity, is the data running through the slot compressed? Not that I have any understanding of how data flows and interacts with different hardware and their components here, but I'd assume that reducing lane bottlenecks through compression at the cost of a little processing power would be beneficial.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
These are developer questions we could raise as an issue at OpenMM, but only if we can reproduce the problem here and prove it.
Re: PCI-e splitter?
I'm guessing here, but most of the data being transferred is probably binary numbers (which don't compress).
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e splitter?
Nvidia GTX 1080 on PCIe 3.0 x16: bus usage 40%
viewtopic.php?f=38&t=28784&start=75
If the numbers are correct, a PCIe 2.0 x8 bus like mine would throttle this card because the bus would be about 60% too slow.
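If the 40% figure holds, the arithmetic behind "60% too slow" is: 40% of PCIe 3.0 x16 (about 15.8 GB/s) is roughly 6.3 GB/s, while PCIe 2.0 x8 supplies about 4 GB/s, so the card would want about 1.6 times the bandwidth the slot can deliver.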
Re: PCI-e splitter?
Reverting to the original question, that seems to imply that a majority of projects would be bandwidth limited when running through an x1 slot extender, even with a rather slow GPU.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: PCI-e splitter?
My GeForce 970 uses between 17-25% of 3.0 x16. That means one should not go below 3.0 x4, 2.0 x8, or 1.0 x16, if I am correct?
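A quick check of that arithmetic: 17-25% of PCIe 3.0 x16 (about 15.8 GB/s) is roughly 2.7-4 GB/s, and PCIe 3.0 x4, 2.0 x8, and 1.0 x16 each deliver about 4 GB/s, so by this estimate those would indeed be about the floor for that card.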