PCIe bandwidth requirements (RTX 50xx edition)
I am preparing to build a new system with four RTX 5080 GPUs (with the plan to upgrade to eight when the 5080 prices drop in a year or two). That's too much heat to put in one box without some seriously loud cooling so it will all be open air. I'm considering a single board computer and using ribbon risers (and NOT those awful 1x USB risers used for mining) to make space for the GPUs.
What are the PCIe bandwidth requirements for folding on something like a 5080? 8x and PCIe 4.0 (or perhaps 4x on PCIe 5.0)?
As ribbon length goes up, signal quality decreases. Does this increase latency in a way that impacts folding (like if it means more retransmissions that leave CUDA cores idle waiting for work) or is it an all-or-nothing thing where eventually the ribbon is too long and reliability plummets?
- Posts: 1538
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D, 7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: PCIe bandwidth requirements (RTX 50xx edition)
X4 is the minimum
X1 definitely does not work
-
- Posts: 7
- Joined: Fri Mar 21, 2025 6:40 pm
- Location: Budapest, Hungary
Re: PCIe bandwidth requirements (RTX 50xx edition)
A motherboard and CPU that can drive 8 cards with at least x4 bandwidth each is quite expensive, and risers further increase the cost. Personally, for builds dedicated to GPU crunching, I prefer cheaper motherboards / CPUs (i3) and single GPUs. Crunching on CPUs and (Nvidia) GPUs simultaneously can significantly reduce the GPU crunching speed, and the same is true to some extent for multiple GPUs in the same system. You should consider this, or something in between, for example dual (water-cooled) GPU systems without risers and expensive motherboards.
Re: PCIe bandwidth requirements (RTX 50xx edition)
The minimum for what PCIe gen? x4 on PCIe 5.0 has about the same bandwidth as x8 on PCIe 4.0.
When you say minimum, is there any performance loss from using x4? I don't want to find out that "minimum" just means it can barely finish a WU before it expires. But I also don't want to spend a lot of money on a high-end motherboard with extra PCIe lanes if the bus doesn't come close to saturation.
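For reference, the equivalence mentioned above can be checked with a quick back-of-envelope calculation. Per-lane rates and the 128b/130b line encoding are from the PCIe specs; real throughput is a bit lower still after packet/protocol overhead, so treat these as upper bounds:

```python
# Back-of-envelope PCIe link bandwidth: per-lane transfer rates (GT/s)
# times the 128b/130b line encoding used by Gen3 and later, divided by
# 8 bits per byte, times lane count. Ignores protocol overhead.

GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}   # GT/s per lane, by PCIe gen
ENCODING = 128 / 130                        # 128b/130b line code (Gen3+)

def gbps(gen: int, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s for a PCIe link."""
    return GT_PER_LANE[gen] * ENCODING / 8 * lanes

print(f"5.0 x4: {gbps(5, 4):.1f} GB/s")   # ~15.8 GB/s
print(f"4.0 x8: {gbps(4, 8):.1f} GB/s")   # ~15.8 GB/s, same as 5.0 x4
print(f"3.0 x4: {gbps(3, 4):.1f} GB/s")   # ~3.9 GB/s
```

So x4 on PCIe 5.0 and x8 on PCIe 4.0 are indeed interchangeable bandwidth-wise, and both are four times an x4 PCIe 3.0 link.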
Re: PCIe bandwidth requirements (RTX 50xx edition)
PCIe 3, since I really don't think anyone would still be running PCIe 2 mobos in this day and age.
- Site Admin
- Posts: 8090
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6, Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: PCIe bandwidth requirements (RTX 50xx edition)
I think the testing that showed an x4 lane connection was enough on PCIe 3 dates back to the era of the 3090 cards, and even then showed some loss of processing speed compared to x8 or wider connections. I don't recall any newer testing mentioned here, possibly on the discord but I don't follow that.
Re: PCIe bandwidth requirements (RTX 50xx edition)
To be safe, then, 5.0@x4 or 4.0@x8 would be a good bet for a 5080. I could use M.2-to-PCIe adapters and risers, since M.2 is x4, which should be enough if it's PCIe 5.0, right?
I might just build two machines, both with four 5080s, instead of trying to squeeze out bandwidth from a consumer motherboard.
Re: PCIe bandwidth requirements (RTX 50xx edition)
Are you really asking about testing a just-released GPU against decade-old standards and speeds?

With every new generation, users are on their own to find out what does and does not work for them.
We can only tell you what is definitely not working.
Re: PCIe bandwidth requirements (RTX 50xx edition)
Well that's why this thread is the RTX 50xx edition! 
Wouldn't it be enough for someone with a 5080 to see what their PCIe bandwidth usage is like? If they're using under 64 GT/s then even x2 is fine if it's PCIe 5.0 (which is the newest that is in common usage). I don't want to make a very expensive mistake.
Since 4.0 is more common today, especially on cheaper motherboards, I am curious whether a single x16 could be used for four 5080s with something like:

If 64 GT/s is enough (5.0@x2, 4.0@x4, 3.0@x8 etc) then that would work well. But without knowing the actual bandwidth requirements in GT/s, I can only guess. I just assumed someone might know what the requirements are based on the bandwidth usage on their system (doesn't nvidia-smi tell it?).

Re: PCIe bandwidth requirements (RTX 50xx edition)
First you need to find someone who can even find one to buy, let alone afford it, fold on it, and pay the electricity bill.
As I said you are on your own in this with brand new hardware.
Buy one 5080, and test it, then buy the rest of the hardware

- Site Moderator
- Posts: 1440
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: PCIe bandwidth requirements (RTX 50xx edition)
I think some people on discord have it.
You should ask there about folding bandwidth used.
Re: PCIe bandwidth requirements (RTX 50xx edition)
What software can report how much PCIe bandwidth is used by a GPU? Genuinely curious here!
Windows task manager will do it for storage drives, but not GPUs...

Re: PCIe bandwidth requirements (RTX 50xx edition)
On Linux for Nvidia GPUs there's nvidia-smi. On Windows with an Intel processor, I guess this thing might work? https://github.com/intel/pcm
Re: PCIe bandwidth requirements (RTX 50xx edition)
Here is some data on PCIe bandwidth utilization for a 4090 Mobile (8x PCIe 4.0 connection). Not quite a 5080, but in the same league compared to anything from the PCIe 3.0 era.
This is 40 seconds of folding at one sample per second. The fields rxpci and txpci are PCIe throughput in MB/s. It shows that the GPU receives from the CPU at about 9 MB/s constantly with occasional bursts into multiple GB/s. And the GPU sends data to the CPU at a constant rate of about 1 MB/s, punctuated by bursts of hundreds of MB/s. So even though throughput at any given time is very low and often well under 1% of even an 8x PCIe 4.0 link, it has to handle very fast bursts of data in both directions.
The other fields are not as important. mclk and pclk are memory and processor clock rates, sm is the percentage of the time that at least one CUDA core was being used, mem is the percentage of the time VRAM is being read or written to (a bit higher for me than usual because I reduce the memory clock from 9000 to 6000 which lets the CUDA cores boost higher while staying within the 150W TDP). The rest, enc, dec, jpg, and ofa, are for other accelerators in the GPU that FAH doesn't use.
P.S. I've got significant power savings at less than 2% PPD loss by locking the parent thread to a single core and limiting that core's clock rate to 1.8 GHz (otherwise it tries to boost to over 4 GHz all the time), and locking the child threads to the remaining cores (they need fast cores that can boost so that they can get through the checkpoint self-tests without too much delay). This keeps the affected CPU core at 75C most of the time instead of 95C. There's no reason Nvidia's spin wait loop should be keeping the CPU that hot, but turning down the clock rate for that one core helps.
Code:
$ nvidia-smi dmon -i 0 -s tcu -c 40
# gpu rxpci txpci mclk pclk sm mem enc dec jpg ofa
# Idx MB/s MB/s MHz MHz % % % % % %
0 7 656 6000 2145 95 17 0 0 0 0
0 9 1 6000 1995 98 16 0 0 0 0
0 9 1 6000 1950 98 17 0 0 0 0
0 9 1 6000 1980 96 16 0 0 0 0
0 10 2 6000 1995 85 14 0 0 0 0
0 7 1 6000 2205 98 16 0 0 0 0
0 8 1 6000 2235 97 16 0 0 0 0
0 10 1 6000 1995 94 16 0 0 0 0
0 9 1 6000 1995 98 17 0 0 0 0
0 9 1 6000 1995 98 17 0 0 0 0
0 9 1310 6000 2025 98 17 0 0 0 0
0 682 1 6000 2040 81 14 0 0 0 0
0 9 1 6000 1995 98 17 0 0 0 0
0 9 1 6000 1980 98 17 0 0 0 0
0 9 1 6000 1995 98 17 0 0 0 0
0 2 0 6000 2310 98 17 0 0 0 0
0 9 1 6000 1950 98 17 0 0 0 0
0 9 1 6000 2010 98 17 0 0 0 0
0 9 1 6000 1980 86 14 0 0 0 0
0 5 656 6000 1950 96 16 0 0 0 0
0 9 656 6000 1980 98 17 0 0 0 0
0 9 1 6000 2025 98 17 0 0 0 0
0 8 1739 6000 2085 98 17 0 0 0 0
0 6 939 6000 2205 96 16 0 0 0 0
0 7 1076 6000 2025 96 16 0 0 0 0
0 680 656 6000 2310 82 13 0 0 0 0
0 9 1 6000 1905 98 17 0 0 0 0
0 9 1 6000 2025 97 16 0 0 0 0
0 6 1 6000 2070 93 15 0 0 0 0
0 8 1 6000 1980 96 16 0 0 0 0
0 9 1 6000 2010 98 16 0 0 0 0
0 9 1 6000 1995 93 14 0 0 0 0
0 0 42 6000 2010 97 15 0 0 0 0
0 8 1310 6000 2100 96 15 0 0 0 0
0 9 1 6000 1965 98 16 0 0 0 0
0 1182 22 6000 2310 83 13 0 0 0 0
0 5 656 6000 2010 98 16 0 0 0 0
0 7 1 6000 1980 98 16 0 0 0 0
0 10 1 6000 1995 81 13 0 0 0 0
0 7 656 6000 1950 97 16 0 0 0 0
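A quick way to summarize a dmon log like the one above is to average and find the peaks of the rxpci/txpci columns. This sketch hard-codes a few sample rows and assumes the column order shown in the header above (gpu, rxpci, txpci, ...); adjust the indices if your driver version prints different fields:

```python
# Summarize PCIe throughput from `nvidia-smi dmon` style output.
# Column layout assumed: gpu rxpci txpci mclk pclk sm mem enc dec jpg ofa

sample = """\
0     7   656  6000  2145  95  17  0  0  0  0
0     9     1  6000  1995  98  16  0  0  0  0
0   682     1  6000  2040  81  14  0  0  0  0
"""

rx, tx = [], []
for line in sample.splitlines():
    if line.startswith("#"):        # skip the header lines dmon prints
        continue
    cols = line.split()
    rx.append(int(cols[1]))         # rxpci, MB/s (CPU -> GPU)
    tx.append(int(cols[2]))         # txpci, MB/s (GPU -> CPU)

print(f"rx mean {sum(rx)/len(rx):.0f} MB/s, peak {max(rx)} MB/s")
print(f"tx mean {sum(tx)/len(tx):.0f} MB/s, peak {max(tx)} MB/s")
```

On the full 40-sample log, this makes the burstiness obvious: the means stay tiny while the peaks reach into the GB/s range.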
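For anyone wanting to replicate the pinning half of that P.S., here is a minimal sketch using Python's Linux-only os.sched_setaffinity (the original poster may well have used taskset instead; the clock cap itself is a separate step via cpupower or sysfs, not shown here):

```python
# Minimal sketch (Linux only): restrict a process to chosen CPU cores,
# e.g. parent thread on one core, child threads on the rest.
# Core numbers below are illustrative; pid 0 means "the calling process".
import os

def pin_to_cores(pid: int, cores: set[int]) -> None:
    """Limit `pid` to the given CPU cores via sched_setaffinity."""
    os.sched_setaffinity(pid, cores)

# Demonstrate on the current process: pin to core 0, verify, restore.
original = os.sched_getaffinity(0)
pin_to_cores(0, {0})
assert os.sched_getaffinity(0) == {0}
pin_to_cores(0, original)
```

To pin another process (say, the FAH core's parent thread) you would pass its PID instead of 0, after finding it with something like ps or psutil.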