PCIe bandwidth requirements (RTX 50xx edition)

arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

I am preparing to build a new system with four RTX 5080 GPUs (with the plan to upgrade to eight when the 5080 prices drop in a year or two). That's too much heat to put in one box without some seriously loud cooling so it will all be open air. I'm considering a single board computer and using ribbon risers (and NOT those awful 1x USB risers used for mining) to make space for the GPUs.

What are the PCIe bandwidth requirements for folding on something like a 5080? 8x and PCIe 4.0 (or perhaps 4x on PCIe 5.0)?

As ribbon length goes up, signal quality decreases. Does this increase latency in a way that impacts folding (like if it means more retransmissions that leave CUDA cores idle waiting for work) or is it an all-or-nothing thing where eventually the ribbon is too long and reliability plummets?
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

X4 is the minimum
X1 definitely does not work
FAH Omega tester
Retvari Zoltan
Posts: 7
Joined: Fri Mar 21, 2025 6:40 pm
Location: Budapest, Hungary

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by Retvari Zoltan »

A motherboard and CPU that can drive 8 cards with at least x4 bandwidth each is quite expensive, and risers further increase the cost. Personally, for builds dedicated to GPU crunching, I prefer cheaper motherboards and CPUs (i3) with a single GPU. Crunching on the CPU and an (Nvidia) GPU simultaneously can significantly reduce GPU crunching speed, and the same is true to some extent for multiple GPUs in the same system. You should consider this, or something in between, for example dual (water-cooled) GPU systems without risers or expensive motherboards.
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

muziqaz wrote: Wed Mar 26, 2025 11:04 am X4 is the minimum
X1 definitely does not work
The minimum for what PCIe gen? x4 on PCIe 5.0 has about the same bandwidth as x8 on PCIe 4.0.

When you say minimum is there any performance loss for using x4? I don't want to find out that "minimum" just means that it can barely finish a WU before it expires. But I also don't want to spend a lot of money on a high-end motherboard with extra PCIe lanes if the bus doesn't get even close to saturated.
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
FAH Omega tester
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

muziqaz wrote: Thu Mar 27, 2025 5:44 am PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
That's enough to keep a 5080 fully occupied? That would mean that even a single x1 PCIe 5.0 lane is enough (3.0@x4 = 4.0@x2 = 5.0@x1 ≈ 3.9 GB/s per direction).
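For reference, here's the rough per-direction arithmetic I'm using (assuming the 128b/130b encoding that PCIe 3.0 and later use):

Code: Select all

# per-direction bandwidth (GB/s) = GT/s per lane * lanes * 128/130 encoding / 8 bits per byte
$ awk 'BEGIN {
    printf "PCIe 3.0 x4: %.1f GB/s\n",  8*4*(128/130)/8;
    printf "PCIe 4.0 x2: %.1f GB/s\n", 16*2*(128/130)/8;
    printf "PCIe 5.0 x1: %.1f GB/s\n", 32*1*(128/130)/8;
}'
PCIe 3.0 x4: 3.9 GB/s
PCIe 4.0 x2: 3.9 GB/s
PCIe 5.0 x1: 3.9 GB/s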
Joe_H
Site Admin
Posts: 8090
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by Joe_H »

I think the testing that showed an x4 connection was enough on PCIe 3 dates back to the era of the 3090 cards, and even then it showed some loss of processing speed compared to x8 or wider connections. I don't recall any newer testing mentioned here; possibly on the Discord, but I don't follow that.
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

To be safe, then, 5.0@x4 or 4.0@x8 would be a good bet for a 5080. In that case I could use M.2-to-PCIe adapters and risers, since M.2 is x4, which should be enough if it's PCIe 5.0, right?
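If I go the adapter/riser route, I can at least sanity-check that each card actually negotiated the link it's supposed to have; nvidia-smi can query the current generation and width:

Code: Select all

# current (negotiated) PCIe generation and lane width, per GPU
# (the reported gen can drop when the GPU is idle due to power management)
$ nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.width.current --format=csv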

I might just build two machines, both with four 5080s, instead of trying to squeeze out bandwidth from a consumer motherboard.
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

arisu wrote: Thu Mar 27, 2025 5:58 am
muziqaz wrote: Thu Mar 27, 2025 5:44 am PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
That's enough to keep a 5080 fully occupied? That would mean that even a single x1 PCIe 5.0 lane is enough (3.0@x4 = 4.0@x2 = 5.0@x1 ≈ 3.9 GB/s per direction).
Are you really asking about testing a just-released GPU against decade-old standards and speeds? :D
With every new generation, users are on their own to find out what does and does not work for them.
We can only tell you what definitely does not work.
FAH Omega tester
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

Well that's why this thread is the RTX 50xx edition! :D

Wouldn't it be enough for someone with a 5080 to see what their PCIe bandwidth usage is like? If they're using under about 8 GB/s per direction, then even x2 is fine if it's PCIe 5.0 (the newest generation in common use). I don't want to make a very expensive mistake.

Since 4.0 is more common today, especially with cheaper motherboards, I am curious whether a single x16 could be used for four 5080s with something like this:

[image of an x16-to-four-slot PCIe splitter adapter]

If roughly 8 GB/s per direction is enough (5.0@x2, 4.0@x4, 3.0@x8, etc.), then that would work well. But without knowing the actual bandwidth requirements, I can only guess. I just assumed someone might know what the requirements are based on the bandwidth usage on their own system (doesn't nvidia-smi report it?).
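The per-card numbers for that kind of split would look roughly like this, assuming the slot can actually bifurcate four ways (or the adapter has a switch chip) so each card ends up on an x4 link:

Code: Select all

# each card's share of an x16 slot split into four x4 links, per direction
$ awk 'BEGIN {
    printf "PCIe 3.0 x16 -> 4 cards at x4: %.1f GB/s each\n",  8*4*(128/130)/8;
    printf "PCIe 4.0 x16 -> 4 cards at x4: %.1f GB/s each\n", 16*4*(128/130)/8;
}'
PCIe 3.0 x16 -> 4 cards at x4: 3.9 GB/s each
PCIe 4.0 x16 -> 4 cards at x4: 7.9 GB/s each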
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

First you need to find someone who can afford one, or even find one to buy, let alone fold on it and pay the electricity bill.
As I said, with brand-new hardware you are on your own here.
Buy one 5080 and test it, then buy the rest of the hardware ;)
FAH Omega tester
calxalot
Site Moderator
Posts: 1440
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by calxalot »

I think some people on Discord have one.
You should ask there about the folding bandwidth used.
FaaR
Posts: 67
Joined: Tue Aug 19, 2008 1:32 am

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by FaaR »

What software can report how much PCIe bandwidth is used by a GPU? Genuinely curious here! :D Windows task manager will do it for storage drives, but not GPUs...
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

FaaR wrote: Sat Mar 29, 2025 5:55 pm What software can report how much PCIe bandwidth is used by a GPU? Genuinely curious here! :D Windows task manager will do it for storage drives, but not GPUs...
On Linux for Nvidia GPUs there's nvidia-smi. On Windows with an Intel processor, I guess this thing might work? https://github.com/intel/pcm
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

Here is some data on PCIe bandwidth utilization for a 4090 Mobile (8x PCIe 4.0 connection). Not quite a 5080, but in the same league compared to anything from the PCIe 3.0 era.

Code: Select all

$ nvidia-smi dmon -i 0 -s tcu -c 40
# gpu  rxpci  txpci   mclk   pclk     sm    mem    enc    dec    jpg    ofa 
# Idx   MB/s   MB/s    MHz    MHz      %      %      %      %      %      % 
    0      7    656   6000   2145     95     17      0      0      0      0 
    0      9      1   6000   1995     98     16      0      0      0      0 
    0      9      1   6000   1950     98     17      0      0      0      0 
    0      9      1   6000   1980     96     16      0      0      0      0 
    0     10      2   6000   1995     85     14      0      0      0      0 
    0      7      1   6000   2205     98     16      0      0      0      0 
    0      8      1   6000   2235     97     16      0      0      0      0 
    0     10      1   6000   1995     94     16      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9   1310   6000   2025     98     17      0      0      0      0 
    0    682      1   6000   2040     81     14      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9      1   6000   1980     98     17      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      2      0   6000   2310     98     17      0      0      0      0 
    0      9      1   6000   1950     98     17      0      0      0      0 
    0      9      1   6000   2010     98     17      0      0      0      0 
    0      9      1   6000   1980     86     14      0      0      0      0 
    0      5    656   6000   1950     96     16      0      0      0      0 
    0      9    656   6000   1980     98     17      0      0      0      0 
    0      9      1   6000   2025     98     17      0      0      0      0 
    0      8   1739   6000   2085     98     17      0      0      0      0 
    0      6    939   6000   2205     96     16      0      0      0      0 
    0      7   1076   6000   2025     96     16      0      0      0      0 
    0    680    656   6000   2310     82     13      0      0      0      0 
    0      9      1   6000   1905     98     17      0      0      0      0 
    0      9      1   6000   2025     97     16      0      0      0      0 
    0      6      1   6000   2070     93     15      0      0      0      0 
    0      8      1   6000   1980     96     16      0      0      0      0 
    0      9      1   6000   2010     98     16      0      0      0      0 
    0      9      1   6000   1995     93     14      0      0      0      0 
    0      0     42   6000   2010     97     15      0      0      0      0 
    0      8   1310   6000   2100     96     15      0      0      0      0 
    0      9      1   6000   1965     98     16      0      0      0      0 
    0   1182     22   6000   2310     83     13      0      0      0      0 
    0      5    656   6000   2010     98     16      0      0      0      0 
    0      7      1   6000   1980     98     16      0      0      0      0 
    0     10      1   6000   1995     81     13      0      0      0      0 
    0      7    656   6000   1950     97     16      0      0      0      0
This is 40 seconds of folding at one sample per second. The rxpci and txpci fields are PCIe throughput in MB/s. The GPU receives from the CPU at a constant trickle of about 9 MB/s, with occasional bursts of roughly 0.7-1.2 GB/s, and sends to the CPU at about 1 MB/s, punctuated by bursts of up to about 1.7 GB/s. So even though average throughput is very low, often well under 1% of even an x8 PCIe 4.0 link, the link still has to absorb fast bursts of data in both directions.

The other fields are less important. mclk and pclk are the memory and processor clock rates, sm is the percentage of time that at least one CUDA core was in use, and mem is the percentage of time VRAM was being read or written (a bit higher for me than usual because I reduce the memory clock from 9000 MHz to 6000 MHz, which lets the CUDA cores boost higher while staying within the 150 W TDP). The rest, enc, dec, jpg, and ofa, are other accelerators in the GPU that FAH doesn't use.

P.S. I've gotten significant power savings with less than 2% PPD loss by locking the parent thread to a single core and limiting that core's clock rate to 1.8 GHz (otherwise it tries to boost to over 4 GHz all the time), and locking the child threads to the remaining cores (they need fast cores that can boost so they can get through the checkpoint self-tests without too much delay). This keeps the affected CPU core at 75°C most of the time instead of 95°C. There's no reason Nvidia's spin-wait loop should be keeping the CPU that hot, but turning down the clock rate for that one core helps.
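Roughly, the way I do it on Linux looks something like this (the core numbers, the 1.8 GHz cap, and the FahCore process name are specific to my setup):

Code: Select all

# pin the FahCore main thread to core 0, its worker threads to the rest,
# then cap core 0's max clock; adjust the core list for your CPU
PID=$(pgrep -f FahCore | head -n1)
for TID in /proc/"$PID"/task/*; do
    TID=${TID##*/}
    if [ "$TID" = "$PID" ]; then
        taskset -cp 0 "$TID"       # parent/main thread (the spin-wait loop)
    else
        taskset -cp 1-15 "$TID"    # child/worker threads
    fi
done
sudo cpupower -c 0 frequency-set -u 1.8GHz   # limit core 0 to 1.8 GHz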