PCIe bandwidth requirements (RTX 50xx edition)

CaptainHalon
Posts: 116
Joined: Mon Apr 13, 2020 11:47 am

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by CaptainHalon »

Since it wasn't mentioned, I'm also curious how PCH vs. direct CPU lanes might affect folding. Pretty much every modern motherboard with a secondary/tertiary slot listed as x4 routes that slot through the PCH. It's hard to do an apples-to-apples comparison unless you've got a board that can trifurcate the CPU lanes to x8/x4/x4. And it's getting pretty rare to see any board that can even do x8/x8, with SLI being dead.
muziqaz
Posts: 1940
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Post by muziqaz »

Not by much.
There is not much traffic happening. However, the minimum requirement is x2, ideally x4.
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Post by arisu »

There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Although I haven't done extensive testing, direct CPU lanes seem to me to reduce latency enough that non-blocking sync is feasible. This allows the CPU thread feeding the GPU to run at much lower than 100% utilization because it doesn't have to sit in a spin-wait loop. That said, I think that all PCIe lanes go through the PCH.

I have a board that can do bifurcation. I'll be away for a bit on vacation but when I'm back I can see if it supports trifurcation as well.
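
The CPU-usage trade-off between a spin-wait loop and a wait that sleeps until signalled can be illustrated on the host side alone. The sketch below is only an analogy in plain Python threads, not CUDA or OpenMM code: a timer thread stands in for the GPU finishing its work, and every function name is made up for the illustration.

```python
import threading
import time


def cpu_seconds_while_waiting(wait, duration=0.2):
    """CPU time this process burns while waiting `duration` seconds for a worker."""
    done = threading.Event()
    # The timer stands in for the GPU signalling that a kernel has finished.
    worker = threading.Timer(duration, done.set)
    start = time.process_time()
    worker.start()
    wait(done)
    worker.join()
    return time.process_time() - start


def spin_wait(done):
    # Poll continuously: lowest latency, but keeps one CPU thread busy.
    while not done.is_set():
        pass


def blocking_wait(done):
    # Sleep until signalled: almost no CPU time, at the cost of wake-up latency.
    done.wait()


spin_cpu = cpu_seconds_while_waiting(spin_wait)
block_cpu = cpu_seconds_while_waiting(blocking_wait)
print(f"spin-wait: {spin_cpu:.3f} s CPU, blocking wait: {block_cpu:.3f} s CPU")
```

The spin-wait version should report CPU time close to the full wait duration, while the blocking version should report close to zero. Whether the extra wake-up latency is affordable is the part that would depend on the PCIe path.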

Post by muziqaz »

arisu wrote: Thu Aug 14, 2025 7:44 pm There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Maybe this is a recent development with more modern GPUs?

Post by arisu »

I don't think so. I get the same thing with Maxwell. From my reading of the OpenMM source code and the CUDA kernels themselves, there really shouldn't be that much PCIe usage. The only frequent traffic should be command buffers containing kernels, and the atom positions being sent to the host every 250 steps to get re-ordered and sent back. That shouldn't give the 2+ GB/s traffic that I regularly see.

It could be a bug in NVML's reporting of PCIe traffic, or it could be something that FAH is doing that vanilla OpenMM isn't (without the source code I can't check that).
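
To put the expected re-ordering traffic next to the observed 2+ GB/s, here is a rough back-of-the-envelope sketch. Only the 250-step interval comes from the post above. The atom count, the step rate, and the 16 bytes per atom (one four-component single-precision position) are hypothetical values chosen for illustration, and command-buffer traffic is ignored.

```python
def reorder_traffic_mb_s(n_atoms, steps_per_second,
                         reorder_interval=250, bytes_per_atom=16):
    """Estimated PCIe traffic (MB/s) from copying positions out and back.

    reorder_interval: steps between re-orderings (250, per the post above).
    bytes_per_atom:   ASSUMPTION, one float4 position per atom.
    """
    round_trips_per_second = steps_per_second / reorder_interval
    # One device-to-host copy plus one host-to-device copy per re-ordering.
    bytes_per_round_trip = 2 * n_atoms * bytes_per_atom
    return round_trips_per_second * bytes_per_round_trip / 1e6


# Hypothetical workload: 100,000 atoms simulated at 500 steps per second.
estimate = reorder_traffic_mb_s(n_atoms=100_000, steps_per_second=500)
observed = 2000.0  # MB/s, the "2+ GB/s" figure from the post
print(f"estimated: {estimate:.1f} MB/s, observed/estimated: {observed / estimate:.0f}x")
```

With these made-up inputs the estimate comes out at a few MB/s, which is the kind of gap that makes the observed figure look hundreds of times too large. Different atom counts or step rates shift the ratio but not the overall picture.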

Post by muziqaz »

So why don't we see any considerable difference in performance until we drop to PCIe x1 speeds? Even back in the days of PCIe 3.0, x2 vs. x4 throughput made no difference.

Post by arisu »

The FAH core uses hundreds of times more PCIe bandwidth than it seems like it should, and way more than most other CUDA applications (at least those that aren't doing heavy reductions or LLM work), but even a single PCIe 4.0 lane (or two PCIe 3.0 lanes) can handle roughly 2 GB/s. I think the reason is simply that PCIe is very fast and has lots of bandwidth to spare.
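
The per-lane figures behind that statement can be worked out from the published PCIe line rates (8, 16 and 32 GT/s for generations 3, 4 and 5, all with 128b/130b encoding). This is a minimal sketch of the arithmetic. It gives raw one-direction bandwidth and does not model packet or protocol overhead, so real-world throughput is somewhat lower. The function name is made up for the example.

```python
# Line rate per lane in GT/s for each PCIe generation.
LINE_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}
# 128b/130b encoding: 128 payload bits for every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen, lanes):
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    return LINE_RATE_GT_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


for gen, lanes in [(3, 1), (3, 2), (4, 1), (4, 4), (5, 16)]:
    print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

By this arithmetic a PCIe 4.0 x1 link (or PCIe 3.0 x2) tops out at about 1.97 GB/s, so 2 GB/s of traffic in one direction would sit right at its limit, while x4 and wider links leave plenty of headroom.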

Post by CaptainHalon »

Well, I officially tried running a 5070 Ti in a PCIe 4.0 x4 slot, and it lost about 4-5M PPD. I suspect it has more to do with going through the PCH than with the number of PCIe lanes, though.

Post by muziqaz »

Performance is not measured by PPD.
Check the time per frame (TPF) on the same project. I bet the difference will be a matter of seconds.
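
One reason PPD exaggerates small slowdowns is the quick-return bonus. The sketch below assumes the usual bonus form, where credit per work unit scales with the square root of (deadline / elapsed time) once the bonus applies, and it assumes elapsed time is proportional to TPF (upload and download time ignored). Under those assumptions PPD scales as TPF to the power of -1.5. The function names and the TPF values are hypothetical.

```python
def ppd_ratio(tpf_before, tpf_after):
    """PPD_after / PPD_before on one project while the bonus applies.

    ASSUMPTION: credit per WU ~ 1/sqrt(time) and WUs per day ~ 1/time,
    so PPD ~ time ** -1.5, with WU time proportional to TPF.
    """
    return (tpf_before / tpf_after) ** 1.5


def tpf_after_for_ppd_ratio(tpf_before, ratio):
    """Invert the relation: the TPF that would produce a given PPD ratio."""
    return tpf_before * ratio ** (-2.0 / 3.0)


# Hypothetical example: TPF goes from 60 s to 63 s on the same project.
ratio = ppd_ratio(60.0, 63.0)
print(f"TPF +5% -> PPD changes by {100 * (ratio - 1):.1f}%")
```

In this made-up case a 3-second (5%) increase in TPF costs about 7% of PPD, so a multi-million PPD drop on a fast card can correspond to only a few seconds per frame. Comparing TPF on the same project removes both the bonus amplification and project-to-project variation.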
enroscado
Posts: 54
Joined: Mon Aug 11, 2008 2:33 pm

Post by enroscado »

I am getting much smaller numbers using 2 x 5080.

In Ubuntu, using nvidia-smi as below:

nvidia-smi dmon -i 0 -s u -d 1 -o DT > gpu0_pcie.log
nvidia-smi dmon -i 1 -s u -d 1 -o DT > gpu1_pcie.log
Polled every second for over 3 hours while folding. Then I plotted the values in a chart only drawing the highest value recorded within every minute:

https://prnt.sc/PLzATy0WGyQo
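
The per-minute-maximum reduction described above could be sketched as follows. The parser picks a column by name from the dmon header row, which matters because in `nvidia-smi dmon` the `-s u` selector logs utilization columns while `-s t` logs PCIe Rx/Tx throughput in MB/s. The log layout shown in the sample (a `#Date Time gpu ...` header row followed by a units row) is an assumption about the `-o DT` output, and the sample values are made up.

```python
from collections import defaultdict


def per_minute_max(lines, column):
    """Return {(date, 'HH:MM'): highest value} for one named dmon column."""
    header = None
    maxima = defaultdict(float)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if line.lstrip().startswith("#"):
            # The first '#' row names the columns; later '#' rows are
            # units or repeated headers and are skipped.
            if header is None:
                header = [f.lstrip("#") for f in fields]
            continue
        if header is None:
            raise ValueError("no header row seen before data")
        row = dict(zip(header, fields))
        value = row.get(column, "-")
        if value == "-":  # placeholder for a metric that is not reported
            continue
        key = (row["Date"], row["Time"][:5])  # truncate HH:MM:SS to HH:MM
        maxima[key] = max(maxima[key], float(value))
    return dict(maxima)


# Made-up sample in the assumed layout of `nvidia-smi dmon -s t -o DT`.
sample = """#Date       Time        gpu   rxpci   txpci
#YYYYMMDD   HH:MM:SS    Idx    MB/s    MB/s
 20250820   10:00:01      0     120      35
 20250820   10:00:02      0     310      40
 20250820   10:01:00      0      90      20
""".splitlines()

print(per_minute_max(sample, "rxpci"))
```

For a real log, the same function can be fed `open("gpu0_pcie.log")` and the resulting dictionary plotted with any charting tool.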