PCIe bandwidth requirements (RTX 50xx edition)
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
-
- Posts: 114
- Joined: Mon Apr 13, 2020 11:47 am
Re: PCIe bandwidth requirements (RTX 50xx edition)
Since it wasn't mentioned, I'm also curious how PCH vs. direct CPU lanes might affect folding. Pretty much every modern motherboard with a secondary/tertiary slot listed as x4 routes it through the PCH. It's hard to do an apples-to-apples comparison unless you've got a board that can trifurcate the CPU lanes to x8/x4/x4. And it's getting pretty rare to see any board that can even do x8/x8, with SLI being dead.
-
- Posts: 1916
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: PCIe bandwidth requirements (RTX 50xx edition)
Not by much
There is not much traffic happening. However, the minimum requirement is x2 mode, ideally x4.
Re: PCIe bandwidth requirements (RTX 50xx edition)
There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023
Although I haven't done extensive testing, direct CPU lanes seem to reduce latency enough that non-blocking sync becomes feasible. That lets the CPU thread feeding the GPU run at much lower than 100% utilization, because it doesn't have to sit in a spin-wait loop. That said, I think all of my PCIe lanes go through the PCH.
I have a board that can do bifurcation. I'll be away for a bit on vacation but when I'm back I can see if it supports trifurcation as well.
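The spin-wait vs. blocking-sync trade-off mentioned above can be illustrated with plain threads; this is a generic Python sketch of the two feeder styles, not FAH's or OpenMM's actual code:

```python
import threading
import time

done_flag = False
done_event = threading.Event()

def gpu_worker():
    """Stand-in for a GPU kernel: 'finishes' after a short delay."""
    global done_flag
    time.sleep(0.05)
    done_flag = True      # a spin-waiting feeder polls this
    done_event.set()      # a blocking feeder sleeps on this

# Feeder style 1: spin-wait (burns ~100% of one core while waiting).
t = threading.Thread(target=gpu_worker)
t.start()
spins = 0
while not done_flag:
    spins += 1
t.join()

# Feeder style 2: blocking sync (the thread sleeps until the event
# fires, so CPU use while waiting is near zero).
done_event.clear()
t = threading.Thread(target=gpu_worker)
t.start()
done_event.wait()
t.join()

print(f"spin-wait looped {spins} times before the work completed")
```

Both styles see the work complete; the difference is only how much CPU the waiting thread consumes, which is what lower-latency direct CPU lanes make it practical to avoid.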
-
- Posts: 1916
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: PCIe bandwidth requirements (RTX 50xx edition)
arisu wrote: ↑Thu Aug 14, 2025 7:44 pm There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023
Maybe it is a recent development with more modern GPUs?
Re: PCIe bandwidth requirements (RTX 50xx edition)
I don't think so. I get the same thing with Maxwell. From my reading of the OpenMM source code and the CUDA kernels themselves, there really shouldn't be that much PCIe usage. The only frequent traffic should be command buffers containing kernels, and the atom positions being sent to the host every 250 steps to get re-ordered and sent back. That shouldn't give the 2+ GB/s traffic that I regularly see.
It could be a bug in NVML's reporting of PCIe traffic, or it could be something that FAH is doing that vanilla OpenMM isn't (without the source code I can't check that).
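For scale, the reorder round-trip described above shouldn't come anywhere near 2 GB/s. A back-of-envelope sketch (the atom count and step rate are illustrative assumptions, not measured values):

```python
# Rough estimate (illustrative numbers, not measured FAH values) of the
# PCIe traffic the periodic reorder round-trip *should* generate.
atoms = 100_000            # assumed system size
bytes_per_atom = 3 * 4     # x, y, z as 32-bit floats
steps_per_reorder = 250    # positions go host-side every 250 steps
steps_per_second = 1_000   # assumed simulation speed

round_trip = 2 * atoms * bytes_per_atom          # device->host + host->device
transfers_per_second = steps_per_second / steps_per_reorder
mb_per_second = round_trip * transfers_per_second / 1e6

observed = 2_000           # ~2 GB/s reported by NVML, in MB/s
print(f"expected ~{mb_per_second:.1f} MB/s, observed ~{observed} MB/s "
      f"({observed / mb_per_second:.0f}x more)")
```

Even with generous assumptions the round-trip lands in the single-digit MB/s range, a couple of hundred times below the ~2 GB/s that NVML reports.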
-
- Posts: 1916
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: PCIe bandwidth requirements (RTX 50xx edition)
So why don't we see any considerable difference in performance until we go down to PCIe x1 speeds? Even back in the days of PCIe v3, x2 and x4 throughput made no difference.
Re: PCIe bandwidth requirements (RTX 50xx edition)
The FAH core uses hundreds of times more PCIe bandwidth than it seems like it should, and far more than most other CUDA applications (ones that aren't doing heavy reductions or LLM work, that is). But even a single PCIe 4.0 lane (or two PCIe 3.0 lanes) can handle about 2 GB/s, so I think the answer is simply that PCIe is very fast and has plenty of bandwidth to spare.
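As a sanity check on those lane figures, per-lane PCIe bandwidth is just the transfer rate times the line-coding efficiency (rates from the PCIe specs; the script below is only arithmetic):

```python
# Per-lane PCIe bandwidth by generation: transfer rate (GT/s) times the
# line-coding efficiency, divided by 8 bits/byte, gives usable GB/s.
gens = {
    # generation: (GT/s per lane, encoding efficiency)
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}

for gen, (gts, eff) in gens.items():
    gbps = gts * eff / 8        # one transfer carries one bit per lane
    print(f"PCIe {gen}.0 x1: {gbps:.2f} GB/s  | x4: {4 * gbps:.2f} GB/s")
```

A Gen 4 x1 link works out to about 1.97 GB/s and a Gen 3 x2 link to the same, which is why ~2 GB/s of traffic only starts to hurt once you drop to Gen 3 x1.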