PCIe bandwidth requirements (RTX 50xx edition)

CaptainHalon
Posts: 114
Joined: Mon Apr 13, 2020 11:47 am

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by CaptainHalon »

Since it wasn't mentioned, I'm also curious how PCH vs. direct CPU lanes might affect folding. Pretty much every modern motherboard routes a secondary/tertiary slot listed as x4 through the PCH. It's hard to do an apples-to-apples comparison unless you've got a board that can trifurcate the CPU lanes to x8/x4/x4. And with SLI being dead, it's getting pretty rare to see any board that can even do x8/x8.
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

Not by much.
There is not much traffic happening. However, the minimum requirement is x2, ideally x4 mode.
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Although I haven't done extensive testing, direct CPU lanes seem to reduce latency enough that blocking sync becomes feasible. That lets the CPU thread feeding the GPU run at much lower than 100% utilization, because it doesn't have to sit in a spin-wait loop. That said, I think all of the PCIe lanes go through the PCH.
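
A rough illustration of the difference (plain Python, not FAH or OpenMM code; the stub, timing, and names are all made up just to show why spin-waiting pins a core and blocking doesn't):

Code: Select all

import threading, time

done = threading.Event()

def gpu_work_stub():
    # Stand-in for a kernel that completes after ~5 ms
    time.sleep(0.005)
    done.set()

# Spin-wait: the feeding thread burns a full core polling for completion
threading.Thread(target=gpu_work_stub).start()
while not done.is_set():
    pass  # ~100% CPU on this core until the "GPU" finishes

# Blocking sync: the thread sleeps until woken, near-zero CPU,
# but it is only viable if the wakeup latency is low enough
done.clear()
threading.Thread(target=gpu_work_stub).start()
done.wait()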

I have a board that can do bifurcation. I'll be away on vacation for a bit, but when I'm back I can check whether it supports trifurcation as well.
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

arisu wrote: Thu Aug 14, 2025 7:44 pm There's a surprising amount of PCIe traffic happening, and I'm not sure why: https://github.com/openmm/openmm/issues/5023

Although I haven't done extensive testing, direct CPU lanes seem to reduce latency enough that blocking sync becomes feasible. That lets the CPU thread feeding the GPU run at much lower than 100% utilization, because it doesn't have to sit in a spin-wait loop. That said, I think all of the PCIe lanes go through the PCH.

I have a board that can do bifurcation. I'll be away on vacation for a bit, but when I'm back I can check whether it supports trifurcation as well.
Maybe it is a recent development with more modern GPUs?
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

I don't think so. I get the same thing with Maxwell. From my reading of the OpenMM source code and the CUDA kernels themselves, there really shouldn't be that much PCIe usage. The only frequent traffic should be the command buffers for kernel launches, plus the atom positions being sent to the host every 250 steps to get re-ordered and sent back. That shouldn't add up to the 2+ GB/s of traffic that I regularly see.
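
A quick back-of-envelope on what that reorder traffic should cost, with assumed numbers for system size and simulation speed (both are guesses, just to show the scale):

Code: Select all

# Positions go host<->device every 250 steps for re-ordering.
atoms = 100_000                      # assumed system size
bytes_per_atom = 3 * 4               # x, y, z as 32-bit floats
steps_per_sec = 500                  # assumed simulation speed
per_transfer = atoms * bytes_per_atom             # ~1.2 MB each way
per_sec = per_transfer * 2 * steps_per_sec / 250  # round trip per 250 steps
print(f"{per_sec / 1e6:.1f} MB/s")   # ~4.8 MB/s, nowhere near 2+ GB/s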

It could be a bug in NVML's reporting of PCIe traffic, or it could be something that FAH is doing that vanilla OpenMM isn't (without the source code I can't check that).
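
If anyone wants to cross-check the NVML counters themselves, something like this should work (pynvml; NVML samples throughput over a ~20 ms window and reports it in KB/s):

Code: Select all

# pip install nvidia-ml-py
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetPcieThroughput,
                    NVML_PCIE_UTIL_TX_BYTES, NVML_PCIE_UTIL_RX_BYTES)

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)  # first GPU
tx = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_TX_BYTES)  # KB/s
rx = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_RX_BYTES)  # KB/s
print(f"TX {tx / 1e6:.2f} GB/s, RX {rx / 1e6:.2f} GB/s")
nvmlShutdown()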
muziqaz
Posts: 1916
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

So why don't we see any difference in performance until we go down to PCIe x1 speeds? (I mean a considerable performance difference.) Even back in the days of PCIe v3, x2 vs. x4 throughput made no difference.
FAH Omega tester
arisu
Posts: 579
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

The FAH core uses hundreds of times more PCIe bandwidth than it seems like it should, and way more than most other CUDA applications (ones that aren't doing heavy reduction or LLM stuff, that is), but even a single PCIe 4 lane (or two PCIe 3 lanes) can handle about 2 GB/s. I think the reason is simply that PCIe is very fast and has lots of bandwidth to spare.
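
For reference, approximate usable per-lane bandwidth by generation (per direction, after encoding overhead), which shows how much headroom even a narrow link has over the ~2 GB/s above:

Code: Select all

# Approximate usable bandwidth per lane, per direction (GB/s),
# after 128b/130b encoding overhead (PCIe 3.0 and newer).
per_lane = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}
for gen, bw in per_lane.items():
    for lanes in (1, 2, 4):
        print(f"PCIe {gen} x{lanes}: {bw * lanes:.1f} GB/s")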