PCIe bandwidth requirements (RTX 50xx edition)

CaptainHalon
Posts: 114
Joined: Mon Apr 13, 2020 11:47 am

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by CaptainHalon »

Since it wasn't mentioned, I'm also curious how PCH vs. direct CPU lanes might affect folding. Pretty much every modern motherboard routes a secondary/tertiary slot listed as x4 through the PCH. It's hard to do an apples-to-apples comparison unless you've got a board that can trifurcate the CPU lanes to x8/x4/x4. And with SLI being dead, it's getting pretty rare to see any board that can even do x8/x8.
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

Not by much.
There is not much traffic happening. However, the minimum requirement is x2, ideally x4 mode.
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Although I haven't done extensive testing, direct CPU lanes seem to reduce latency enough that blocking sync becomes feasible. That lets the CPU thread feeding the GPU run at much lower than 100% utilization, because it doesn't have to sit in a spin-wait loop. That said, I think all of the PCIe lanes go through the PCH.
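
A rough illustration of the difference (plain Python, not FAH or OpenMM code; the stub, timing, and names are all made up just to show why spin-waiting pins a core and blocking doesn't):

Code: Select all

import threading, time

done = threading.Event()

def gpu_work_stub():
    # Stand-in for a kernel that completes after ~5 ms
    time.sleep(0.005)
    done.set()

# Spin-wait: the feeding thread burns a full core polling for completion
threading.Thread(target=gpu_work_stub).start()
while not done.is_set():
    pass  # ~100% CPU on this core until the "GPU" finishes

# Blocking sync: the thread sleeps until woken, near-zero CPU,
# but it is only viable if the wakeup latency is low enough
done.clear()
threading.Thread(target=gpu_work_stub).start()
done.wait()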

I have a board that can do bifurcation. I'll be away on vacation for a bit, but when I'm back I can check whether it supports trifurcation as well.
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

arisu wrote: Thu Aug 14, 2025 7:44 pm There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Although I haven't done extensive testing, direct CPU lanes seem to reduce latency enough that blocking sync becomes feasible. That lets the CPU thread feeding the GPU run at much lower than 100% utilization, because it doesn't have to sit in a spin-wait loop. That said, I think all of the PCIe lanes go through the PCH.

I have a board that can do bifurcation. I'll be away on vacation for a bit, but when I'm back I can check whether it supports trifurcation as well.
Maybe it is a recent development with more modern GPUs?
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

I don't think so. I get the same thing with Maxwell. From my reading of the OpenMM source code and the CUDA kernels themselves, there really shouldn't be that much PCIe usage. The only frequent traffic should be the command buffers for kernel launches, plus the atom positions being sent to the host every 250 steps to get re-ordered and sent back. That shouldn't add up to the 2+ GB/s of traffic that I regularly see.
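
A quick back-of-envelope on what that reorder traffic should cost, with assumed numbers for system size and simulation speed (both are guesses, just to show the scale):

Code: Select all

# Positions go host<->device every 250 steps for re-ordering.
atoms = 100_000                      # assumed system size
bytes_per_atom = 3 * 4               # x, y, z as 32-bit floats
steps_per_sec = 500                  # assumed simulation speed
per_transfer = atoms * bytes_per_atom             # ~1.2 MB each way
per_sec = per_transfer * 2 * steps_per_sec / 250  # round trip per 250 steps
print(f"{per_sec / 1e6:.1f} MB/s")   # ~4.8 MB/s, nowhere near 2+ GB/s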

It could be a bug in NVML's reporting of PCIe traffic, or it could be something that FAH is doing that vanilla OpenMM isn't (without the source code I can't check that).
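
If anyone wants to cross-check the NVML counters themselves, something like this should work (pynvml; NVML samples throughput over a ~20 ms window and reports it in KB/s):

Code: Select all

# pip install nvidia-ml-py
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetPcieThroughput,
                    NVML_PCIE_UTIL_TX_BYTES, NVML_PCIE_UTIL_RX_BYTES)

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)  # first GPU
tx = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_TX_BYTES)  # KB/s
rx = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_RX_BYTES)  # KB/s
print(f"TX {tx / 1e6:.2f} GB/s, RX {rx / 1e6:.2f} GB/s")
nvmlShutdown()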
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

So why don't we see any difference in performance until we go down to PCIe x1 speeds? (I mean a considerable performance difference.) Even back in the days of PCIe v3, x2 vs. x4 throughput made no difference.
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

The FAH core uses hundreds of times more PCIe bandwidth than it seems like it should, and way more than most other CUDA applications (ones that aren't doing heavy reduction or LLM stuff, that is), but even a single PCIe 4 lane (or two PCIe 3 lanes) can handle about 2 GB/s. I think the reason is simply that PCIe is very fast and has lots of bandwidth to spare.
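
For reference, approximate usable per-lane bandwidth by generation (per direction, after encoding overhead), which shows how much headroom even a narrow link has over the ~2 GB/s above:

Code: Select all

# Approximate usable bandwidth per lane, per direction (GB/s),
# after 128b/130b encoding overhead (PCIe 3.0 and newer).
per_lane = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}
for gen, bw in per_lane.items():
    for lanes in (1, 2, 4):
        print(f"PCIe {gen} x{lanes}: {bw * lanes:.1f} GB/s")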