PCIe bandwidth requirements (RTX 50xx edition)

arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

I am preparing to build a new system with four RTX 5080 GPUs (with the plan to upgrade to eight when the 5080 prices drop in a year or two). That's too much heat to put in one box without some seriously loud cooling so it will all be open air. I'm considering a single board computer and using ribbon risers (and NOT those awful 1x USB risers used for mining) to make space for the GPUs.

What are the PCIe bandwidth requirements for folding on something like a 5080? 8x and PCIe 4.0 (or perhaps 4x on PCIe 5.0)?

As ribbon length goes up, signal quality decreases. Does this increase latency in a way that impacts folding (like if it means more retransmissions that leave CUDA cores idle waiting for work) or is it an all-or-nothing thing where eventually the ribbon is too long and reliability plummets?
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

X4 is the minimum
X1 definitely does not work
FAH Omega tester
Retvari Zoltan
Posts: 7
Joined: Fri Mar 21, 2025 6:40 pm
Location: Budapest, Hungary

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by Retvari Zoltan »

A motherboard and CPU that can drive 8 cards with at least x4 bandwidth each is quite expensive, and risers further increase the cost. Personally, for builds dedicated to GPU crunching, I prefer cheaper motherboards and CPUs (i3) with a single GPU. Crunching on the CPU and an (Nvidia) GPU simultaneously can significantly reduce GPU crunching speed, and the same is true to some extent for multiple GPUs in the same system. You should consider this, or something in between, for example dual (water-cooled) GPU systems without risers or expensive motherboards.
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

muziqaz wrote: Wed Mar 26, 2025 11:04 am X4 is the minimum
X1 definitely does not work
The minimum for what PCIe gen? x4 on PCIe 5.0 has about the same bandwidth as x8 on PCIe 4.0.

When you say minimum is there any performance loss for using x4? I don't want to find out that "minimum" just means that it can barely finish a WU before it expires. But I also don't want to spend a lot of money on a high-end motherboard with extra PCIe lanes if the bus doesn't get even close to saturated.
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
FAH Omega tester
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

muziqaz wrote: Thu Mar 27, 2025 5:44 am PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
That's enough to keep a 5080 fully occupied? That would mean that even a single x1 PCIe 5.0 lane is enough (3.0@x4 = 4.0@x2 = 5.0@x1 ≈ 3.9 GB/s per direction).
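For reference, here's the rough per-direction arithmetic I'm using (assuming the 128b/130b encoding that PCIe 3.0 and later use):

Code: Select all

# per-direction bandwidth (GB/s) = GT/s per lane * lanes * 128/130 encoding / 8 bits per byte
$ awk 'BEGIN {
    printf "PCIe 3.0 x4: %.1f GB/s\n",  8*4*(128/130)/8;
    printf "PCIe 4.0 x2: %.1f GB/s\n", 16*2*(128/130)/8;
    printf "PCIe 5.0 x1: %.1f GB/s\n", 32*1*(128/130)/8;
}'
PCIe 3.0 x4: 3.9 GB/s
PCIe 4.0 x2: 3.9 GB/s
PCIe 5.0 x1: 3.9 GB/s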
Joe_H
Site Admin
Posts: 8090
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by Joe_H »

I think the testing that showed an x4 connection was enough on PCIe 3 dates back to the era of the 3090 cards, and even then it showed some loss of processing speed compared to x8 or wider connections. I don't recall any newer testing mentioned here; possibly on the Discord, but I don't follow that.
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

To be safe, then, 5.0@x4 or 4.0@x8 would be a good bet for a 5080. In that case I could use M.2-to-PCIe adapters and risers, since M.2 is x4, which should be enough if it's PCIe 5.0, right?
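If I go the adapter/riser route, I can at least sanity-check that each card actually negotiated the link it's supposed to have; nvidia-smi can query the current generation and width:

Code: Select all

# current (negotiated) PCIe generation and lane width, per GPU
# (the reported gen can drop when the GPU is idle due to power management)
$ nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.width.current --format=csv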

I might just build two machines, both with four 5080s, instead of trying to squeeze out bandwidth from a consumer motherboard.
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

arisu wrote: Thu Mar 27, 2025 5:58 am
muziqaz wrote: Thu Mar 27, 2025 5:44 am PCIe 3, since I really don't think anyone would still run PCIe 2 mobos in this day and age.
That's enough to keep a 5080 fully occupied? That would mean that even a single x1 PCIe 5.0 lane is enough (3.0@x4 = 4.0@x2 = 5.0@x1 ≈ 3.9 GB/s per direction).
Are you really asking about testing a just-released GPU against decade-old standards and speeds? :D
With every new generation, users are on their own to find out what does and does not work for them.
We can only tell you what definitely does not work.
FAH Omega tester
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

Well that's why this thread is the RTX 50xx edition! :D

Wouldn't it be enough for someone with a 5080 to see what their PCIe bandwidth usage is like? If they're using under about 8 GB/s per direction, then even x2 is fine if it's PCIe 5.0 (the newest generation in common use). I don't want to make a very expensive mistake.

Since 4.0 is more common today, especially with cheaper motherboards, I am curious whether a single x16 could be used for four 5080s with something like this:

[image of an x16-to-four-slot PCIe splitter adapter]

If roughly 8 GB/s per direction is enough (5.0@x2, 4.0@x4, 3.0@x8, etc.), then that would work well. But without knowing the actual bandwidth requirements, I can only guess. I just assumed someone might know what the requirements are based on the bandwidth usage on their own system (doesn't nvidia-smi report it?).
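The per-card numbers for that kind of split would look roughly like this, assuming the slot can actually bifurcate four ways (or the adapter has a switch chip) so each card ends up on an x4 link:

Code: Select all

# each card's share of an x16 slot split into four x4 links, per direction
$ awk 'BEGIN {
    printf "PCIe 3.0 x16 -> 4 cards at x4: %.1f GB/s each\n",  8*4*(128/130)/8;
    printf "PCIe 4.0 x16 -> 4 cards at x4: %.1f GB/s each\n", 16*4*(128/130)/8;
}'
PCIe 3.0 x16 -> 4 cards at x4: 3.9 GB/s each
PCIe 4.0 x16 -> 4 cards at x4: 7.9 GB/s each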
muziqaz
Posts: 1538
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by muziqaz »

First you need to find someone who can afford one, or even find one to buy, let alone fold on it and pay the electricity bill.
As I said, with brand-new hardware you are on your own here.
Buy one 5080 and test it, then buy the rest of the hardware ;)
FAH Omega tester
calxalot
Site Moderator
Posts: 1440
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA
Contact:

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by calxalot »

I think some people on Discord have one.
You should ask there about the folding bandwidth used.
FaaR
Posts: 67
Joined: Tue Aug 19, 2008 1:32 am

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by FaaR »

What software can report how much PCIe bandwidth is used by a GPU? Genuinely curious here! :D Windows task manager will do it for storage drives, but not GPUs...
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

FaaR wrote: Sat Mar 29, 2025 5:55 pm What software can report how much PCIe bandwidth is used by a GPU? Genuinely curious here! :D Windows task manager will do it for storage drives, but not GPUs...
On Linux for Nvidia GPUs there's nvidia-smi. On Windows with an Intel processor, I guess this thing might work? https://github.com/intel/pcm
arisu
Posts: 262
Joined: Mon Feb 24, 2025 11:11 pm

Re: PCIe bandwidth requirements (RTX 50xx edition)

Post by arisu »

Here is some data on PCIe bandwidth utilization for a 4090 Mobile (8x PCIe 4.0 connection). Not quite a 5080, but in the same league compared to anything from the PCIe 3.0 era.

Code: Select all

$ nvidia-smi dmon -i 0 -s tcu -c 40
# gpu  rxpci  txpci   mclk   pclk     sm    mem    enc    dec    jpg    ofa 
# Idx   MB/s   MB/s    MHz    MHz      %      %      %      %      %      % 
    0      7    656   6000   2145     95     17      0      0      0      0 
    0      9      1   6000   1995     98     16      0      0      0      0 
    0      9      1   6000   1950     98     17      0      0      0      0 
    0      9      1   6000   1980     96     16      0      0      0      0 
    0     10      2   6000   1995     85     14      0      0      0      0 
    0      7      1   6000   2205     98     16      0      0      0      0 
    0      8      1   6000   2235     97     16      0      0      0      0 
    0     10      1   6000   1995     94     16      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9   1310   6000   2025     98     17      0      0      0      0 
    0    682      1   6000   2040     81     14      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      9      1   6000   1980     98     17      0      0      0      0 
    0      9      1   6000   1995     98     17      0      0      0      0 
    0      2      0   6000   2310     98     17      0      0      0      0 
    0      9      1   6000   1950     98     17      0      0      0      0 
    0      9      1   6000   2010     98     17      0      0      0      0 
    0      9      1   6000   1980     86     14      0      0      0      0 
    0      5    656   6000   1950     96     16      0      0      0      0 
    0      9    656   6000   1980     98     17      0      0      0      0 
    0      9      1   6000   2025     98     17      0      0      0      0 
    0      8   1739   6000   2085     98     17      0      0      0      0 
    0      6    939   6000   2205     96     16      0      0      0      0 
    0      7   1076   6000   2025     96     16      0      0      0      0 
    0    680    656   6000   2310     82     13      0      0      0      0 
    0      9      1   6000   1905     98     17      0      0      0      0 
    0      9      1   6000   2025     97     16      0      0      0      0 
    0      6      1   6000   2070     93     15      0      0      0      0 
    0      8      1   6000   1980     96     16      0      0      0      0 
    0      9      1   6000   2010     98     16      0      0      0      0 
    0      9      1   6000   1995     93     14      0      0      0      0 
    0      0     42   6000   2010     97     15      0      0      0      0 
    0      8   1310   6000   2100     96     15      0      0      0      0 
    0      9      1   6000   1965     98     16      0      0      0      0 
    0   1182     22   6000   2310     83     13      0      0      0      0 
    0      5    656   6000   2010     98     16      0      0      0      0 
    0      7      1   6000   1980     98     16      0      0      0      0 
    0     10      1   6000   1995     81     13      0      0      0      0 
    0      7    656   6000   1950     97     16      0      0      0      0
This is 40 seconds of folding at one sample per second. The rxpci and txpci fields are PCIe throughput in MB/s. The GPU receives from the CPU at a constant trickle of about 9 MB/s, with occasional bursts of roughly 0.7-1.2 GB/s, and sends to the CPU at about 1 MB/s, punctuated by bursts of up to about 1.7 GB/s. So even though average throughput is very low, often well under 1% of even an x8 PCIe 4.0 link, the link still has to absorb fast bursts of data in both directions.

The other fields are less important. mclk and pclk are the memory and processor clock rates, sm is the percentage of time that at least one CUDA core was in use, and mem is the percentage of time VRAM was being read or written (a bit higher for me than usual because I reduce the memory clock from 9000 MHz to 6000 MHz, which lets the CUDA cores boost higher while staying within the 150 W TDP). The rest, enc, dec, jpg, and ofa, are other accelerators in the GPU that FAH doesn't use.

P.S. I've gotten significant power savings with less than 2% PPD loss by locking the parent thread to a single core and limiting that core's clock rate to 1.8 GHz (otherwise it tries to boost to over 4 GHz all the time), and locking the child threads to the remaining cores (they need fast cores that can boost so they can get through the checkpoint self-tests without too much delay). This keeps the affected CPU core at 75°C most of the time instead of 95°C. There's no reason Nvidia's spin-wait loop should be keeping the CPU that hot, but turning down the clock rate for that one core helps.
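Roughly, the way I do it on Linux looks something like this (the core numbers, the 1.8 GHz cap, and the FahCore process name are specific to my setup):

Code: Select all

# pin the FahCore main thread to core 0, its worker threads to the rest,
# then cap core 0's max clock; adjust the core list for your CPU
PID=$(pgrep -f FahCore | head -n1)
for TID in /proc/"$PID"/task/*; do
    TID=${TID##*/}
    if [ "$TID" = "$PID" ]; then
        taskset -cp 0 "$TID"       # parent/main thread (the spin-wait loop)
    else
        taskset -cp 1-15 "$TID"    # child/worker threads
    fi
done
sudo cpupower -c 0 frequency-set -u 1.8GHz   # limit core 0 to 1.8 GHz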