PCI-e bandwidth/capacity limitations
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
- Site Admin
- Posts: 7927
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: PCI-e splitter?
Take a look at the Project Summary pages for info on the different projects. One important metric is the number of atoms in the simulation. So for example using the two different projects mentioned in Bruce's post:
Project 11414 - 77003 atoms (Core_21)
Project 9160 - 46000 atoms (Core_18)
Atom counts listed for current GPU projects range from under 10,000 to over 270,000 doing just a quick scan of the nearly 200 listed for cores 17, 18 and 21. My assumption is that there will be some correlation between project size in atoms and bandwidth requirements as more data is going to need to be moved to and from the GPU for larger WU's.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: PCI-e splitter?
Interesting. Right, so we have another variable in play. Now, when users select whether they want small, normal, or big work units, would this option request projects based on atom count, or would it just alter the length of the simulation to be calculated within the WU?
Would it also be safe to assume that computation time and atom counts are exponentially proportional to each other? I'm guessing it wouldn't be linear, since more atoms = more variables, but I have absolutely no idea how the WU is processed, let alone what is actually within it.
What I'm leading to here is this: If WU size determines project atom count, and the count is exponentially proportional to computation time, could bandwidth saturation be avoided by choosing to download small WUs exclusively? And if so, would the rate of progress from F@H be altered, if we completely forget about points?
Re: PCI-e splitter?
hiigaran wrote: Right, so it's a riser. This is where my confusion was stemming from.
That's perfectly reasonable confusion.
Your moderators have not enforced "off topic" policies on this wide-ranging topic and the topic title still says "splitter"
I'll edit my post.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: PCI-e splitter?
Actually, good idea. Thread title edited.
Though before being aware of the high bandwidth consumption, I was actually originally referring to hardware that can split any PCI-e slot into multiple ones, rather than risers.
Re: PCI-e splitter?
hiigaran wrote: Interesting. Right, so we have another variable in play. Now, when users select whether they want small, normal, or big work units, would this option request projects based on atom count, or would it just alter the length of the simulation to be calculated within the WU?
The size of the download packet increases as the number of atoms increases, but the data is compressed so it's no longer proportional to anything.
The size of the upload packet increases with the number of checkpoints being reported to the server plus the length of some log files. Packet-size is supposed to consider both, but the scientists are only moderately consistent about setting that number for their project.
You're welcome to monitor messages such as
Downloading 1.36MiB
Uploading 6.05MiB to xx.xx.xx.xx
and report any discrepancies you see.
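If you'd rather collect those numbers than read them off by eye, a small script can pull them out of the log. This is only a sketch: it assumes the lines look like the two samples above (a verb, a size in MiB, optionally followed by "to <address>"), and the name `transfer_sizes` is made up for this example. Real logs may carry timestamps or other units.

```python
import re

# Pattern based only on the two sample messages above, e.g.
# "Downloading 1.36MiB" or "Uploading 6.05MiB to xx.xx.xx.xx".
# This is an assumption about the log format, not a documented spec.
TRANSFER_RE = re.compile(r"(Downloading|Uploading)\s+([0-9]+(?:\.[0-9]+)?)MiB")


def transfer_sizes(log_lines):
    """Return a list of (direction, size_in_MiB) tuples found in log_lines."""
    sizes = []
    for line in log_lines:
        match = TRANSFER_RE.search(line)
        if match:
            direction = "download" if match.group(1) == "Downloading" else "upload"
            sizes.append((direction, float(match.group(2))))
    return sizes


if __name__ == "__main__":
    sample = [
        "Downloading 1.36MiB",
        "Uploading 6.05MiB to xx.xx.xx.xx",
    ]
    for direction, mib in transfer_sizes(sample):
        print(f"{direction}: {mib} MiB")
```

Comparing the collected sizes per project against what the project advertises would show the kind of discrepancy mentioned above.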
hiigaran wrote: Would it also be safe to assume that computation time and atom counts are exponentially proportional to each other? I'm guessing it wouldn't be linear, since more atoms = more variables, but I have absolutely no idea how the WU is processed, let alone what is actually within it.
Portions of the calculation are proportional to the number of atoms N. Other portions scale somewhat less steeply than N**2, so you can estimate the time as being somewhere between N and N**2. (That's why projects have to be benchmarked.)
hiigaran wrote: What I'm leading to here is this: If WU size determines project atom count, and the count is exponentially proportional to computation time, could bandwidth saturation be avoided by choosing to download small WUs exclusively? And if so, would the rate of progress from F@H be altered, if we completely forget about points?
Neither. The other factor that needs to be taken into account is the number of steps, which is not the same as the number of checkpoints. Double the number of steps and anything will run twice as long.
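To put rough numbers on that, here is a sketch using the two atom counts quoted earlier in the thread (46000 for p9160 and 77003 for p11414). It only brackets the per-step cost between N and N**2 scaling and multiplies by the step count. `relative_time_bounds` is a name invented for this example. Those two projects also run on different cores, so treat the output as an illustration of the scaling range rather than a prediction.

```python
def relative_time_bounds(atoms_a, atoms_b, steps_a=1, steps_b=1):
    """Bracket how much longer project B runs than project A.

    The lower bound assumes per-step cost grows like N, the upper bound
    like N**2. Total time is taken as per-step cost times number of steps.
    """
    atom_ratio = atoms_b / atoms_a
    step_ratio = steps_b / steps_a
    return atom_ratio * step_ratio, atom_ratio ** 2 * step_ratio


if __name__ == "__main__":
    # Atom counts from the Project Summary figures quoted in this thread.
    low, high = relative_time_bounds(46000, 77003)
    print(f"p11414 vs p9160, same step count: {low:.2f}x to {high:.2f}x")
```

Doubling `steps_b` doubles both bounds, which matches the point about steps above.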
Stanford's limitations are probably not bandwidth related, but rather based on the frequency of new connection requests, although work has been done on an unreleased streaming core that uploads semi-continuously at a rather slow average rate.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: PCI-e bandwidth/capacity limitations
Hmm. Guess we probably can't do anything with saturation and high-end GPUs, so let's change things a bit then. If high end will suffer from x1 saturation, what's the fastest card that could run properly from x1? You mentioned on the last page that a GT740 only used 17% of x1 bandwidth, so that's a start. Perhaps mid-range cards might work?
Also, if I split away from PCI-e for a bit, I always see people running server hardware for folding as well. As far as I'm aware, GPU folding is more cost effective, but with people going out of their way to get their hands on server CPUs and dual or quad socket mobos, are these CPUs actually better?
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
- Location: Jersey, Channel islands
Re: PCI-e bandwidth/capacity limitations
The dual/quad CPU buying phase is now well and truly over for F@H. Back when they ran the bigadv projects, people needed server boards with 2+ CPUs to get the WU done in time. PPD was up to 1 million per machine if you had the right CPUs, and 500k+ was not uncommon. Back then GPUs were only scoring around 100k at most, so as always people went where the points were. Now the focus is on GPUs and the bigadv projects are finished. Anyone using server hardware today either still has leftover kit from the bigadv days, like myself, or uses the server boards to get maximum PCIe lanes: a dual socket 2011 board gets you 80 PCIe lanes and up to 7 x16 slots, with 4 running at PCIe 3.0 x16 and the other 3 at PCIe 3.0 x8. You are not going to find that on a consumer board.
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e bandwidth/capacity limitations
There is a test of a GTX 1080 on PCIe 2.0 x1 showing only a 7% performance loss compared to PCIe 2.0 x16 on Linux:
viewtopic.php?f=38&t=28897&p=287927#p287927
Last edited by foldy on Sun Oct 23, 2016 4:01 pm, edited 1 time in total.
Re: PCI-e bandwidth/capacity limitations
Here is another data point. Ubuntu 16.06 and Nvidia driver 370.28. I ran FAHBench 2.2.5 tests with a GTX 1070 on an ASRock (Intel) Z170 Extreme3 motherboard, running at PCIe 3.0 speed. In a 16x lane slot it scored 142.223 on Single Precision and 9.31145 on Double Precision. In a 4x lane slot that is shared with ethernet, storage, etc. it scored 140.841 on Single Precision and 9.28171 on Double Precision. So less than 1% difference between PCIe 16x and 4x. I'm completing actual work units now, and so far it looks like the difference in PPD is similar to the benchmark. I'll keep you updated.
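For anyone who wants to check the "less than 1%" figure or plug in their own FAHBench scores, the arithmetic is just a relative difference. A minimal sketch follows; the helper name `percent_drop` is invented here, and the scores are the ones reported in this post.

```python
def percent_drop(baseline, value):
    """Percentage by which value falls below baseline."""
    return (baseline - value) / baseline * 100.0


if __name__ == "__main__":
    # FAHBench scores from the post above: (x16 slot, x4 slot)
    scores = {
        "single precision": (142.223, 140.841),
        "double precision": (9.31145, 9.28171),
    }
    for name, (x16, x4) in scores.items():
        print(f"{name}: {percent_drop(x16, x4):.2f}% lower in the x4 slot")
```

Both values come out under 1%, which is small enough that run-to-run noise could account for part of it.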
- Posts: 110
- Joined: Mon Nov 09, 2015 3:52 pm
- Hardware configuration: MoBo◘Gigabye X99 UD4-CF F24s
CPU◘2680V4 🔥Rosetta/SIDock
RAM◘64GB Hynix 2400 CL15
HDD◘ST1000DM003 Sata3 NCQ
GFX◘Zotac X-Gaming RTX3070 🔥Folding
VALID◘5nan6w
- Location: Russia
Re: PCI-e splitter?
bruce wrote: In this particular case, the 1x riser keeps the GPU at 100%, at least during the routine analysis portions of the WU. It might or might not show different results during startup/finish/checkpointing/etc. but that's a small portion of the run.
Environment tested (as above except):
GPU: GTX 980 (Maxwell)
FAHCore_18 (Project 9160)
Results:
x16 utilization: 1%
GPU utilization: 99-100%
TPF: 2:26
PPD: 138877

Across all 19 p9160 WUs, my GTX970OC had an average PPD of 293840.27. Could a riser cause such a performance loss? All the WUs ran at PCI-E x16 2.0.
510 290 819 pts earned in Folding@home project
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
- Location: Jersey, Channel islands
Re: PCI-e bandwidth/capacity limitations
Possible. Did you leave a CPU core free to feed the GPU? My testing never went down as far as x1, but others did see a dip, though not as severe as yours.
Re: PCI-e bandwidth/capacity limitations
Keep this data coming! I've got less than three days before I start buying the parts!
Also, that data was tested for Linux. While I plan to have one system running a distro, I plan to have another Windows system for DC projects that are more optimised for Windows. Have similar results been verified for Windows?
Re: PCI-e bandwidth/capacity limitations
FAHBench shows a significant performance loss when I connect a GTX 1070 card to a PCIe 3.0 x1 slot: depending on the test pattern, performance is 9%-47% lower, with 75-85% bus controller load on the card.
Something similar occurs when processing real jobs with core 21.
Win 10. ASRock B150 motherboard.
Re: PCI-e bandwidth/capacity limitations
Here are some real PPD numbers for the GTX 1070 on the Ubuntu 16.06 system I posted FAHBench tests on earlier.
In the 16x lane slot with Nvidia driver version 367.44 it averaged 659240 PPD one week and 629492 PPD the next.
In the 4x lane slot with Nvidia driver version 370.28 it averaged 662106 PPD over a week.
The driver upgrade sped up the folding more than switching to the 4x slot slowed it. That corresponds to what I saw in the benchmarks. I'm running the 370.28 driver in the 16x lane slot now. It hasn't been running a full week yet.
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: PCI-e bandwidth/capacity limitations
x4 slot should be fine but x1 is too slow.
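For a sense of scale, the theoretical one-direction bandwidth of a PCIe link can be worked out from the per-lane transfer rate and line encoding. The sketch below uses the published raw rates for PCIe 2.0 (5 GT/s, 8b/10b encoding) and PCIe 3.0 (8 GT/s, 128b/130b encoding). The function names are made up for this example. Real throughput is lower because of protocol overhead, and how much of it a given WU actually needs is not something this calculation can tell you.

```python
# Per generation: (transfer rate in GT/s, payload bits, total bits per symbol group)
GENERATIONS = {
    "2.0": (5.0, 8, 10),      # 8b/10b encoding
    "3.0": (8.0, 128, 130),   # 128b/130b encoding
}


def lane_bandwidth_mb_s(gen):
    """Theoretical per-lane, per-direction bandwidth in MB/s (10**6 bytes)."""
    rate_gt_s, payload_bits, total_bits = GENERATIONS[gen]
    bits_per_second = rate_gt_s * 1e9 * payload_bits / total_bits
    return bits_per_second / 8 / 1e6


def link_bandwidth_mb_s(gen, lanes):
    """Theoretical per-direction bandwidth of a link with the given lane count."""
    return lane_bandwidth_mb_s(gen) * lanes


if __name__ == "__main__":
    for gen in ("2.0", "3.0"):
        for lanes in (1, 4, 8, 16):
            mb_s = link_bandwidth_mb_s(gen, lanes)
            print(f"PCIe {gen} x{lanes}: about {mb_s:.0f} MB/s per direction")
```

An x1 link has one sixteenth of the headroom of an x16 link of the same generation, which fits with the x1 results reported in this thread being the ones that show a loss.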