PCI-e bandwidth/capacity limitations

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

Joe_H
Site Admin
Posts: 7929
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: PCI-e splitter?

Post by Joe_H »

Take a look at the Project Summary pages for info on the different projects. One important metric is the number of atoms in the simulation. So for example using the two different projects mentioned in Bruce's post:

Project 11414 - 77003 atoms (Core_21)

Project 9160 - 46000 atoms (Core_18)

A quick scan of the nearly 200 projects listed for cores 17, 18 and 21 shows atom counts for current GPU projects ranging from under 10,000 to over 270,000. My assumption is that project size in atoms will correlate to some degree with bandwidth requirements, since larger WUs need more data moved to and from the GPU.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
hiigaran
Posts: 134
Joined: Thu Nov 17, 2011 6:01 pm

Re: PCI-e splitter?

Post by hiigaran »

Interesting. Right, so we have another variable in play. Now, when users select whether they want small, normal, or big work units, would this option request projects based on atom count, or would it just alter the length of the simulation to be calculated within the WU?

Would it also be safe to assume that computation time and atom counts are exponentially proportional to each other? I'm guessing it wouldn't be linear, since more atoms = more variables, but I have absolutely no idea how the WU is processed, let alone what is actually within it.

What I'm leading to here is this: If WU size determines project atom count, and the count is exponentially proportional to computation time, could bandwidth saturation be avoided by choosing to download small WUs exclusively? And if so, would the rate of progress from F@H be altered, if we completely forget about points?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: PCI-e splitter?

Post by bruce »

hiigaran wrote:Right, so it's a riser. This is where my confusion was stemming from.
That's perfectly reasonable confusion.

Your moderators have not enforced "off topic" policies on this wide-ranging topic, and the topic title still says "splitter".
I'll edit my post.
hiigaran
Posts: 134
Joined: Thu Nov 17, 2011 6:01 pm

Re: PCI-e splitter?

Post by hiigaran »

Actually, good idea. Thread title edited.

Though before being aware of the high bandwidth consumption, I was actually originally referring to hardware that can split any PCI-e slot into multiple ones, rather than risers.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: PCI-e splitter?

Post by bruce »

hiigaran wrote:Interesting. Right, so we have another variable in play. Now, when users select whether they want small, normal, or big work units, would this option request projects based on atom count, or would it just alter the length of the simulation to be calculated within the WU?
The size of the download packet increases as the number of atoms increases, but the data is compressed so it's no longer proportional to anything.
The size of the upload packet increases with the number of checkpoints being reported to the server plus the length of some log files. Packet-size is supposed to consider both, but the scientists are only moderately consistent about setting that number for their project.

You're welcome to monitor messages such as
Downloading 1.36MiB
Uploading 6.05MiB to xx.xx.xx.xx

and report any discrepancies you see.
hiigaran wrote:Would it also be safe to assume that computation time and atom counts are exponentially proportional to each other? I'm guessing it wouldn't be linear, since more atoms = more variables, but I have absolutely no idea how the WU is processed, let alone what is actually within it.
Portions of the calculation are proportional to the number of atoms N. Other portions scale somewhat less than N**2, so you can estimate the time as being somewhere between N and N**2. (That's why projects have to be benchmarked.)
hiigaran wrote:What I'm leading to here is this: If WU size determines project atom count, and the count is exponentially proportional to computation time, could bandwidth saturation be avoided by choosing to download small WUs exclusively? And if so, would the rate of progress from F@H be altered, if we completely forget about points?
Neither. The other factor that needs to be taken into account is the number of steps, which is not the same as the number of checkpoints. Double the number of steps and anything will run twice as long.
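
To put rough numbers on the "somewhere between N and N**2" estimate, here is a minimal sketch that bounds the relative run time of two projects from their atom counts and step counts. The function name and the equal-steps default are assumptions for illustration only; real projects still have to be benchmarked.

```python
def relative_time_bounds(atoms_a, atoms_b, steps_a=1, steps_b=1):
    """Return (low, high) bounds on time_b / time_a.

    Assumes run time scales somewhere between N and N**2 in the atom
    count N, and linearly in the number of steps.
    """
    n_ratio = atoms_b / atoms_a
    step_ratio = steps_b / steps_a
    # sorted() keeps low <= high even when project B is the smaller one.
    low, high = sorted((n_ratio, n_ratio ** 2))
    return low * step_ratio, high * step_ratio


# Atom counts quoted earlier in the thread: Project 9160 vs Project 11414.
low, high = relative_time_bounds(46000, 77003)
print(f"Project 11414 vs 9160, equal steps: {low:.2f}x to {high:.2f}x")
```

With equal step counts this only brackets the answer; doubling `steps_b` doubles both bounds.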

Stanford's limitations are probably not bandwidth related, but rather based on the frequency of new connection requests, although work has been done on an unreleased streaming core that uploads semi-continuously at a rather slow average rate.
hiigaran
Posts: 134
Joined: Thu Nov 17, 2011 6:01 pm

Re: PCI-e bandwidth/capacity limitations

Post by hiigaran »

Hmm. Guess we probably can't do anything with saturation and high-end GPUs, so let's change things a bit then. If high end will suffer from x1 saturation, what's the fastest card that could run properly from x1? You mentioned on the last page that a GT740 only used 17% of x1 bandwidth, so that's a start. Perhaps mid-range cards might work?

Also, if I split away from PCI-e for a bit, I always see people running server hardware for folding as well. As far as I'm aware, GPU folding is more cost effective, but with people going out of their way to get their hands on server CPUs and dual or quad socket mobos, are these CPUs actually better?
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: PCI-e bandwidth/capacity limitations

Post by Nathan_P »

The dual/quad CPU buying phase is now well and truly over for F@H. Back when they ran the bigadv projects, people needed server boards with 2+ CPUs to get the WU done in time. PPD was up to 1 million per machine if you had the right CPUs, and 500k+ was not uncommon. Back then GPUs were only scoring around 100k at most, so as always people went where the points were. Now the focus is on GPUs and the bigadv projects are finished. Anyone using server hardware today either still has leftover kit from the bigadv days, like myself, or uses the server boards to get maximum PCIe lanes: a dual-socket 2011 board gets you 80 PCIe lanes and up to 7 x16 slots, with 4 running at PCIe 3.0 x16 and the other 3 at PCIe 3.0 x8. You are not going to find that on a consumer board.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

There is a test of a GTX 1080 on PCIe 2.0 x1 showing only 7% performance loss compared to PCIe 2.0 x16 on Linux:
viewtopic.php?f=38&t=28897&p=287927#p287927
Last edited by foldy on Sun Oct 23, 2016 4:01 pm, edited 1 time in total.
b4441b29
Posts: 2
Joined: Thu Sep 15, 2016 4:06 pm

Re: PCI-e bandwidth/capacity limitations

Post by b4441b29 »

Here is another data point: Ubuntu 16.06 and Nvidia driver 370.28. I ran FAHBench 2.2.5 tests with a GTX 1070 on an ASRock (Intel) Z170 Extreme3 motherboard, running at PCIe 3.0 speed. In an x16 slot it scored 142.223 on Single Precision and 9.31145 on Double Precision. In an x4 slot that is shared with Ethernet, storage, etc. it scored 140.841 on Single Precision and 9.28171 on Double Precision. So there is less than 1% difference between PCIe x16 and x4. I'm completing actual work units now, and so far it looks like the difference in PPD is similar to the benchmark. I'll keep you updated.
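
The "less than 1%" figure can be checked directly from the four scores quoted in this post. A small sketch follows; the helper name `percent_loss` is made up for illustration.

```python
def percent_loss(baseline, measured):
    """Percentage drop of `measured` relative to `baseline`."""
    return (baseline - measured) / baseline * 100.0


# FAHBench scores from the post: (x16 slot, x4 slot)
scores = {
    "single": (142.223, 140.841),
    "double": (9.31145, 9.28171),
}

losses = {name: percent_loss(x16, x4) for name, (x16, x4) in scores.items()}
for name, loss in losses.items():
    print(f"{name} precision: {loss:.2f}% slower in the x4 slot")
```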
Duce H_K_
Posts: 110
Joined: Mon Nov 09, 2015 3:52 pm
Hardware configuration: MoBo◘Gigabye X99 UD4-CF F24s
CPU◘2680V4 🔥Rosetta/SIDock
RAM◘64GB Hynix 2400 CL15
HDD◘ST1000DM003 Sata3 NCQ
GFX◘Zotac X-Gaming RTX3070 🔥Folding
VALID◘5nan6w
Location: Russia
Contact:

Re: PCI-e splitter?

Post by Duce H_K_ »

bruce wrote:In this particular case, the 1x riser keeps the GPU at 100%. at least during the routine analysis portions of the WU. It might or might not show different results during startup/finish/checkpointing/etc. but that's a small portion of the run.

Environment tested (as above except):
GPU: GTX 980 (Maxwell)
FAHCore_18 (Project 9160)

Results:
x16 utilization: 1%
GPU utilization: 99-100%
TPF: 2:26
PPD: 138877
Across all 19 of my p9160 WUs, my GTX970OC averaged 293840.27 PPD :shock: Could a riser cause such a performance loss? All of those WUs ran at PCI-E 2.0 x16.
   510 290 819 pts earned in Folding@home project
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: PCI-e bandwidth/capacity limitations

Post by Nathan_P »

Possibly. Did you leave a CPU core free to feed the GPU? My testing never went down as far as x1, but others did see a dip, though not as severe as yours.
hiigaran
Posts: 134
Joined: Thu Nov 17, 2011 6:01 pm

Re: PCI-e bandwidth/capacity limitations

Post by hiigaran »

Keep this data coming! I've got less than three days before I start buying the parts!

Also, that data was tested for Linux. While I plan to have one system running a distro, I plan to have another Windows system for DC projects that are more optimised for Windows. Have similar results been verified for Windows?
yalexey
Posts: 14
Joined: Sun Oct 30, 2016 5:10 pm

Re: PCI-e bandwidth/capacity limitations

Post by yalexey »

FAHBench shows a significant performance loss when I connect a GTX 1070 card to a PCIe 3.0 x1 slot: performance is 9%-47% lower depending on the test pattern, with 75-85% bus controller load on the card.
Something similar occurs when processing real jobs with core 21.

Win 10, ASRock B150 motherboard.
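
For context on what an x1 link can carry, here is a rough sketch of theoretical per-direction PCIe bandwidth, computed from the published transfer rates and line encodings for PCIe 2.0 and 3.0. It ignores packet and protocol overhead, so real throughput is lower. Reading the 75-85% bus load as a fraction of link bandwidth is an assumption, not something the monitoring tool is confirmed to report.

```python
# (transfer rate in GT/s per lane, line-encoding efficiency)
PCIE_GEN = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_mb_s(gen, lanes=1):
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GEN[gen]
    bits_per_second = rate_gt_s * 1e9 * efficiency
    return bits_per_second / 8 / 1e6 * lanes


for gen in ("2.0", "3.0"):
    for lanes in (1, 4, 16):
        print(f"PCIe {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):8.1f} MB/s")

# Assumption: bus load is a fraction of the x1 link's theoretical bandwidth.
x1 = link_bandwidth_mb_s("3.0", 1)
print(f"75-85% of PCIe 3.0 x1: {0.75 * x1:.0f} to {0.85 * x1:.0f} MB/s")
```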
b4441b29
Posts: 2
Joined: Thu Sep 15, 2016 4:06 pm

Re: PCI-e bandwidth/capacity limitations

Post by b4441b29 »

Here are some real PPD numbers for the GTX 1070 on the Ubuntu 16.06 system I posted FAHBench tests on earlier.

In the 16x lane slot with Nvidia driver version 367.44 it averaged 659240 PPD one week and 629492 PPD the next.

In the 4x lane slot with Nvidia driver version 370.28 it averaged 662106 PPD over a week.

The driver upgrade sped up the folding more than switching to the 4x slot slowed it. That corresponds to what I saw in the benchmarks. I'm running the 370.28 driver in the 16x lane slot now. It hasn't been running a full week yet.
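
A quick sketch comparing the weekly averages quoted above. Note that the two slots were run with different driver versions, so the slot effect and the driver effect cannot be separated from these three numbers alone; the variable names are illustrative.

```python
# Weekly average PPD from the post.
x16_weeks = [659240, 629492]  # x16 slot, driver 367.44
x4_week = 662106              # x4 slot, driver 370.28

x16_mean = sum(x16_weeks) / len(x16_weeks)

# Change going from the x16 runs to the x4 run (positive means faster).
change_pct = (x4_week - x16_mean) / x16_mean * 100.0

# Week-to-week spread within the x16 runs, relative to their mean.
spread_pct = (max(x16_weeks) - min(x16_weeks)) / x16_mean * 100.0

print(f"x16 mean: {x16_mean:.0f} PPD")
print(f"x4 vs x16 mean: {change_pct:+.2f}%")
print(f"x16 week-to-week spread: {spread_pct:.2f}%")
```

The week-to-week spread in the x16 slot is larger than the gap between the slots, which suggests treating the slot difference as within normal variation.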
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

x4 slot should be fine but x1 is too slow.