PCI-e bandwidth/capacity limitations
Re: PCI-e bandwidth/capacity limitations
Is it possible to organize some kind of queue in OpenCL and preload data, perhaps by running two or more threads on a single GPU? The cost of a multi-GPU system depends heavily on the required bus bandwidth.
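For illustration of what such preloading could look like, here is a minimal sketch that uses two OpenCL command queues on one GPU so the next chunk of data is uploaded while the current one is being processed. It assumes pyopencl and numpy are installed; the kernel, buffer sizes, and chunking are placeholders and not anything from FAH's actual cores.
Code: Select all
# Illustrative double buffering with two OpenCL command queues on one GPU.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
q_compute = cl.CommandQueue(ctx)   # runs the kernels
q_copy = cl.CommandQueue(ctx)      # preloads the next chunk of data

prog = cl.Program(ctx, """
__kernel void scale(__global float *x) { int i = get_global_id(0); x[i] *= 2.0f; }
""").build()

n = 1 << 20
chunks = [np.random.rand(n).astype(np.float32) for _ in range(4)]
bufs = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=chunks[0].nbytes) for _ in range(2)]

# While chunk i is being processed on q_compute, chunk i+1 is already
# being uploaded into the other buffer on q_copy.
upload = cl.enqueue_copy(q_copy, bufs[0], chunks[0], is_blocking=False)
for i, chunk in enumerate(chunks):
    cur = bufs[i % 2]
    nxt = bufs[(i + 1) % 2]
    upload.wait()                                   # current chunk is now on the GPU
    if i + 1 < len(chunks):
        upload = cl.enqueue_copy(q_copy, nxt, chunks[i + 1], is_blocking=False)
    prog.scale(q_compute, (n,), None, cur)          # compute overlaps the next upload
    cl.enqueue_copy(q_compute, chunk, cur)          # blocking read-back frees cur for reuse
q_compute.finish()
q_copy.finish()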
The other day I saw a description and photo of a 12-Radeon-GPU system on a single Supermicro board. It really does work for mining, but it is not suitable for these calculations because of the issues discussed in this thread.
https://i.gyazo.com/cc8ca224dd86317f4fc ... b89e36.jpg
EDIT by Mod:
Replaced a large image of that system by a link to that image.
(Images are prohibited to save bandwidth.)
Re: PCI-e bandwidth/capacity limitations
I understand that the bus-usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.
12 GPUs is a bit much, but 8 should be possible. If each GPU needs at least PCIe 3.0 x4, then with 8 GPUs you need 8 x 4 = 32 lanes. And for Nvidia GPUs you need one CPU core each to feed them. An Intel Core i7-6900K has 8 real cores and 40 PCIe lanes, so that matches, but the CPU and mainboard are expensive.
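As a rough sanity check of that lane and core budget, here is a small illustrative helper; the i7-6900K figures (8 cores, 40 lanes) come from the post above, and the function itself is just the arithmetic, not a statement about any real board.
Code: Select all
# Illustrative check of GPUs vs. available PCIe lanes and CPU cores.
def fits(gpus, lanes_per_gpu, cpu_lanes, cpu_cores, cores_per_gpu=1):
    """Return True if the CPU can feed `gpus` GPUs at the given link width."""
    lanes_needed = gpus * lanes_per_gpu
    cores_needed = gpus * cores_per_gpu
    return lanes_needed <= cpu_lanes and cores_needed <= cpu_cores

print(fits(gpus=8, lanes_per_gpu=4, cpu_lanes=40, cpu_cores=8))   # True: 32 lanes, 8 cores
print(fits(gpus=12, lanes_per_gpu=4, cpu_lanes=40, cpu_cores=8))  # False: 48 lanes, 12 cores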
Another alternative may be a mainboard with a PEX switch chip, where the PCIe lanes are allocated dynamically. This mainboard even offers 7 PCIe 3.0 x8 slots:
https://www.asus.com/de/Motherboards/X9 ... fications/
I don't know whether 16 GPUs at PCIe 3.0 x2 would be possible with this board using splitters.
Most users find it cheaper and easier to just build several dual-GPU systems.
Re: PCI-e bandwidth/capacity limitations
This MSI Z87-G45 GAMING motherboard (https://us.msi.com/Motherboard/Z87-G45- ... cification) has 3xPCIe 3.0 x16 slots with operating modes: x16x0x0, x8x8x0, or x8x4x4. So with only two cards it's running x8x8. I may be able to add an RX 480 in the third slot and test it.
Simultaneous FAHbench January 6, 2017
CPU, Card, GPU, GDDR, Brand, GPU Clock, Memory Clock, Shaders, Compute, Precision, WU, Accuracy Check, NaN Check, Run Length, Score, Scaled Score, Atoms
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, ASUS, 1250, 1650, 2048, openCL, single, dhfr, enabled, disabled, 1000 s, 66.7551, 66.7551, 23558, in tandem
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, MSI, 1230, 1650, 2048, openCL, single, dhfr, enabled, disabled, 1000 s, 66.5858, 66.5858, 23558, in tandem
Individual ASUS RX470 FAHbench January 6, 2017
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, ASUS, 1250, 1650, 2048, openCL, single, dhfr, enabled, disabled, 120 s, 66.1621, 66.1621, 23558, alone
Individual MSI RX470 FAHbench January 6, 2017
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, MSI, 1230, 1650, 2048, openCL, single, dhfr, enabled, disabled, 120 s, 65.2190, 65.2190, 23558, alone
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
foldy wrote: I understand that the bus-usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.
Understand this how, please?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: PCI-e bandwidth/capacity limitations
foldy wrote: Another alternative may be a mainboard with a PEX switch chip, where the PCIe lanes are allocated dynamically. This mainboard even offers 7 PCIe 3.0 x8 slots:
https://www.asus.com/de/Motherboards/X9 ... fications/
I love the board, and look, it's only $510.
Amazon just told me they cancelled the MB they sold me because they don't actually have it, so I'm looking for another, hopefully under $200.
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
7im wrote:
foldy wrote: I understand that the bus-usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.
Understand this how, please?
I mean that this is what I read in the forum threads.
Re: PCI-e bandwidth/capacity limitations
Card, Score, note
GTX 1070 , 87.1, alone 16x slot
GTX 1070 , 54.9, alone 1x slot
GTX 980 Ti, 81.7, alone 16x slot
GTX 980 Ti, 41.1, alone 1x slot
GTX 1070 , 89.1, in tandem 16x slot
GTX 1070 , 49.0, in tandem 1x slot
GTX 980 Ti, 79.1, in tandem 16x slot
GTX 980 Ti, 39.7, in tandem 1x slot
single Precision, dhfr WU (23,558 atoms), Accuracy Check enabled, NaN Check 10 steps, Run Length 60 s alone or 120 s in tandem
Intel Core i3-4130T @ 2.9 GHz, Windows 7 64-bit, 8 GB RAM, 250 GB SATA-III SSD, Corsair AX1200
Nvidia ForceWare 376.48, FAH 7.4.15, FAHbench 2.2.5
EVGA GTX 1070, GP104, 8 GB, GPU Clock 1595 MHz, Memory 2002 MHz, 1920 shaders
EVGA GTX 980 Ti, GM200, 6 GB, GPU Clock 1102 MHz, Memory 1753 MHz, 2816 shaders
ASRock H81 Pro BTC: 1xPCIe 2.0 x16 + 5xPCIe 2.0 x1
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
To summarize the in-tandem scores:
GTX 1070: ~90 on x16, ~50 on x1
GTX 980 Ti: ~80 on x16, ~40 on x1
That is up to a 50% performance loss for fast GPUs on x1.
Did you measure this on Windows or Linux?
Can you also measure x4? Both in tandem, so only one measurement is needed.
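As a quick check of the "up to 50%" figure, the percentage loss can be computed directly from the in-tandem scores posted above:
Code: Select all
# Percentage score loss going from the x16 slot to the x1 slot (in-tandem runs above).
scores = {
    "GTX 1070":   {"x16": 89.1, "x1": 49.0},
    "GTX 980 Ti": {"x16": 79.1, "x1": 39.7},
}
for card, s in scores.items():
    loss = (1.0 - s["x1"] / s["x16"]) * 100.0
    print(f"{card}: {loss:.0f}% lower score in the x1 slot")   # ~45% and ~50%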
Re: PCI-e bandwidth/capacity limitations
foldy wrote: Did you measure this on Windows or Linux?
I revised the post (see above) to make it easier to read and more complete. Win7-64.
foldy wrote: Can you also measure x4?
Not on this cheap MB, which was on my bench getting a frame to mount 1x HD 5830 + 5x HD 5970s. I will on another MB.
So the Score is some timed event in nanoseconds??? I have yet to see the documentation that explains what FAHbench is doing. E.g., the final score seems to be the last recorded value and not an average of all measurements. This is a problem when running a tandem test because one GPU always finishes first, and the second-place GPU takes a jump up at the end.
What's the difference between DHFR (23,558 atoms) and DHFR-implicit (2,489 atoms)??? Single versus double precision???
NAV is small so I wonder if it's useful to run.
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
Some numbers for FAHbench on Linux Mint 17.3 using driver 367.44.
GPU: EVGA GTX 1080 FTW
MB: MSI Z87-G41
CPU: Pentium G3258 @ 3.2 GHz
So, the PCIe bus does have an effect (see the results below), but whether the slot connects via the MB chipset or directly to the CPU seems to matter more than the nominal link speed.
However, on Linux the performance drop-off appears to be smaller than what is being reported on Windows.
Code: Select all
x16 Gen3 (CPU) 1% bus usage. Score: 149.455 (100%)
x16 Gen2 (CPU) 2% bus usage. Score: 148.494 (99.4%)
x4 Gen2 (MB) 5% bus usage. Score: 135.417 (90.6%)
x1 Gen3 (CPU) 13% bus usage. Score: 143.917 (96.3%)
x1 Gen2 (CPU) 23% bus usage. Score: 137.669 (92.1%)
x1 Gen2 (MB) 17% bus usage. Score: 123.570 (82.6%)
Re: PCI-e bandwidth/capacity limitations
rwh202, how do you control whether a slot routes via the MB chipset or directly to the CPU??? And how do you monitor bus usage; is that a Linux feature???
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
Aurum wrote: rwh202, how do you control whether a slot routes via the MB chipset or directly to the CPU??? And how do you monitor bus usage; is that a Linux feature???
The slots on my motherboard are hard-wired to either the PCH or the CPU, so I don't have any control over it. Some MBs have more BIOS configuration for how the lanes are allocated and shared between slots, but I just moved the card between slots and used a 1x riser to drop to x1.
Bus usage is reported by the driver to the NVIDIA X Server Settings app in Linux and to the nvidia-smi interface; I think it's the same number reported by GPU-Z and other utilities on Windows.
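For anyone who wants to read those counters programmatically rather than from the GUI, here is a minimal sketch using the NVML Python bindings; it assumes pynvml (nvidia-ml-py) is installed, and it reports link generation, width, and PCIe throughput in KB/s rather than the percentage figure GPU-Z shows, but it tracks the same bus activity.
Code: Select all
# Minimal sketch: read PCIe link state and throughput from the NVIDIA driver via NVML,
# the same source nvidia-smi uses. GPU index 0 is just an example.
import pynvml

pynvml.nvmlInit()
h = pynvml.nvmlDeviceGetHandleByIndex(0)

gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)      # e.g. 3 for PCIe 3.0
width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)         # e.g. 16 for x16
rx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES)  # KB/s
tx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES)  # KB/s

print(f"PCIe gen {gen} x{width}, RX {rx / 1024:.1f} MB/s, TX {tx / 1024:.1f} MB/s")
pynvml.nvmlShutdown()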
Re: PCI-e bandwidth/capacity limitations
rwh202, so if you moved a single 1080 between slots, does the second x16 slot (PCI_E4) run at x4 even when used alone??? I see from the photo that MSI labels the first x16 slot (PCI_E2) as PCI-E 3.0, but there is no label on PCI_E4, just a different lock style.
The MSI web page spec for your MB says:
• 1 x PCIe 3.0 x16 slot
• 1 x PCIe 2.0 x16 slot
- PCI_E4 supports up to PCIe 2.0 x4 speed
• 2 x PCIe 2.0 x1 slots
TIA, just trying to learn this stuff as I've never thought about it before and would like to get the most out of my multi-GPU rigs.
Thanks for the GPU-Z tip; I see the Sensors tab has some interesting monitors. While folding, my x1 slot with the 980 Ti shows a Bus Interface Load of 74% and my PCIe 2.0 x16 slot with the 1070 shows 52%. It even tells me why GPU performance is capped.
In Science We Trust
Re: PCI-e bandwidth/capacity limitations
You could also load a real work unit into FahBench. Just copy one from the FAHClient work folder to FahBench\share\fahbench\workunits and rename it accordingly.
On my GTX 970 (Windows 7 64-bit, PCIe 2.0 x8), the default FahBench dhfr WU shows 38% bus usage, while a real work unit in FahBench uses 60% bus usage, just like in FAHClient.
I always run FahBench with default settings, except for using a real work unit for the bus-usage test.
http://www.filedropper.com/real_2
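A minimal sketch of that copy step is below. The FAHClient work path and the FAHBench install path are assumptions you will need to adjust for your setup, and the exact file names FAHbench expects inside a work-unit folder can vary, so compare against the bundled dhfr folder before renaming anything.
Code: Select all
# Illustrative only: copy a current FAHClient work-unit folder into FAHBench's
# workunits directory. Both paths are assumed; adjust to your installs.
import shutil
from pathlib import Path

fahclient_work = Path(r"C:\ProgramData\FAHClient\work\01")            # a current WU folder (assumed path)
fahbench_wus = Path(r"C:\Program Files\FAHBench\share\fahbench\workunits")

dest = fahbench_wus / "real_wu"                                       # name it however you like
shutil.copytree(fahclient_work, dest)                                 # copy the whole WU folder
print("Copied", fahclient_work, "->", dest)
# Then rename/arrange the files inside dest to match what FAHBench expects
# for its bundled workunits (compare with the existing dhfr folder).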
Re: PCI-e bandwidth/capacity limitations
So on Linux with a GTX 1080, going from Gen3 x16 to Gen3 x1 you lose only about 4%, and another 4% when going down to Gen2 x1.
But on your particular mainboard you lose another 10% when using the MB chipset connection instead of the CPU connection.