1 core per GPU still holds true?

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

Moderator: Site Moderators

Forum rules
Please read the forum rules before posting.
ProDigit
Posts: 234
Joined: Sun Dec 09, 2018 10:23 pm

Re: 1 core per GPU still holds true?

Post by ProDigit »

"I think it is better to have small number of fast GPUs than high number of slow GPUs."

@foldy, Yes, you are right on that.
I guess it depends on what your initial capital is.
If I had an additional $1500, I might have gone for a used pair of TITAN Vs or RTX 2080 Tis from the get-go.
But as things stand, a GTX 1060 from China costs me $50-70, while an RTX 2070 costs me $500+.
My GTX 1060 could fold for 5 years before covering the cost difference with the RTX in electricity, and by then there will be better and cheaper cards out there.

If all goes well, I'll be running the RTX 2070 plus 3x GTX 1060, a 1050, and a 1030 in my second server soon, because that's what I can currently afford.
Thankfully my Xeon board has 3x PCIe x1 slots and 2x PCIe x16 slots, and supports up to 48 PCIe lanes, so I won't have to worry about that, nor about having enough CPU cores (12 cores, 24 threads on my 'new' server).

Just one thing I don't like about eBay is that I don't get my orders within 3-5 days like on Amazon.
Holdolin
Posts: 15
Joined: Thu Oct 10, 2013 3:38 am

Re: 1 core per GPU still holds true?

Post by Holdolin »

I agree, ProDigit. Up-front cost is the thing here. If you have the up-front capital, then I'd highly recommend 2080 Tis all day long. They are the most efficient cards out there in terms of PPD/Watt according to https://docs.google.com/spreadsheets/d/ ... edit#gid=0 . The problem most of us have is that we just don't have that kind of capital to build systems out of those monsters.
ProDigit
Posts: 234
Joined: Sun Dec 09, 2018 10:23 pm

Re: 1 core per GPU still holds true?

Post by ProDigit »

I just ran 2x GTX 1060s on my server.
The GTX 1060 in the PCIe x16 slot runs at 333k PPD.
The one in the PCIe x1 slot runs at 105k PPD, just 20-25k PPD faster than my 1050.
I think we've pretty much figured out the max settings for Windows.
I will soon switch over to Linux.

As for the RTX cards, your hardware must be compatible.
The RTX 2070 I purchased was $500; versus a GTX 1080 or 1070 Ti, it beats both of them in terms of PPD/$.
But it doesn't run on (at least some) older hardware.
I went to their forums and saw dozens of complaints about BIOS incompatibility, starting from the GTX 1070 and up.
And it's true: the GTX cards I own (1060 and below) all work fine, but the RTX card wasn't compatible with my server, my second server, or my laptop (using an eGPU docking station).
kbeeveer46
Posts: 14
Joined: Thu Jan 24, 2019 1:14 am

Re: 1 core per GPU still holds true?

Post by kbeeveer46 »

Holdolin wrote: I agree, ProDigit. Up-front cost is the thing here. If you have the up-front capital, then I'd highly recommend 2080 Tis all day long. They are the most efficient cards out there in terms of PPD/Watt according to https://docs.google.com/spreadsheets/d/ ... edit#gid=0 . The problem most of us have is that we just don't have that kind of capital to build systems out of those monsters.
The prices in your spreadsheet are outdated. In the past week I took that spreadsheet, looked up all the prices, and created my own. Here's my version below. All the bright green cells are the updated prices I found within the past week using PCPartPicker. I also added a new PPD column with values from Overclock.net (column C), which seemed more accurate, so I changed the PPD/price column to use those values instead. I ended up buying the RTX 2070 because it had the best PPD/price. Even looking at it now a week later, some of the prices I found have already changed. Anyone who comes across that spreadsheet like you and I did needs to look up the current prices before trusting anything in it.

My spreadsheet: https://docs.google.com/spreadsheets/d/ ... vd/pubhtml

Overclock.net PPD values: https://www.overclock.net/forum/55-over ... abase.html
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 1 core per GPU still holds true?

Post by foldy »

Even if prices change every week, the spreadsheet still shows the relative difference between the GPUs. The PPD should be an average value; PPD depends on the work unit and differs between Windows and Linux. But again, you can use it as a relative value to compare the GPUs. I put a disclaimer on top:
https://docs.google.com/spreadsheets/d/ ... Ek/pubhtml
kbeeveer46
Posts: 14
Joined: Thu Jan 24, 2019 1:14 am

Re: 1 core per GPU still holds true?

Post by kbeeveer46 »

foldy wrote: The PPD should be an average value; PPD depends on the work unit and differs between Windows and Linux. But again, you can use it as a relative value to compare the GPUs.
That's why I used the PPD list from Overclock.net. It looks to me like it's updated in real time from people submitting their personal results. I took the latest PPD values from it a week ago when I was looking to buy a GPU. But, I admit, I'm new to all of this and I could have made some wrong assumptions. https://www.overclock.net/forum/55-over ... abase.html
Last edited by kbeeveer46 on Sun Jan 27, 2019 4:26 pm, edited 1 time in total.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 1 core per GPU still holds true?

Post by foldy »

The last column of the overclock.net spreadsheet shows the sample size. The GTX 1080 has 1366 samples, so its 876k PPD seems valid. The RTX 2070 has only 8 samples, so this is definitely not real time, but users can submit their experience. And the RTX 2070's value of 1284k PPD is almost equal to the RTX 2080's value of 1395k PPD. Knowing that the RTX 2080 has 25% more shaders than the RTX 2070, it doesn't make sense that it only has 10% more PPD. I believe the RTX 2080 gets 1400k PPD, but the RTX 2070 should average around 1100k PPD, depending on whether you run Windows or Linux or have other hardware limits.

So in general you can estimate the performance difference of GPUs within the same generation using shader count: e.g. take the GTX 1080 with 876k PPD (verified with a high sample size), which has 2560 shaders, and compare it with the GTX 1080 Ti, which has 3584 shaders => the GTX 1080 Ti should get about 1226k PPD.
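In Python, that rule of thumb looks roughly like this (a minimal sketch; the helper name is mine, and the shader counts and the 876k baseline are the figures from above):

```python
# Rule-of-thumb PPD estimate for GPUs of the same generation, scaled by
# shader count from a verified baseline (GTX 1080: 2560 shaders, 876k PPD).

def estimate_ppd(shaders: int, base_shaders: int = 2560, base_ppd: float = 876_000) -> float:
    """Scale the baseline PPD linearly by shader count (same generation only)."""
    return base_ppd * shaders / base_shaders

for name, shaders in [("GTX 1080", 2560), ("GTX 1080 Ti", 3584)]:
    print(f"{name}: ~{estimate_ppd(shaders) / 1000:.0f}k PPD")
# GTX 1080 Ti: 876k * 3584 / 2560 = ~1226k PPD, matching the estimate above.
```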
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 1 core per GPU still holds true?

Post by bruce »

foldy wrote:...Knowing that the RTX 2080 has 25% more shaders than the RTX 2070, it doesn't make sense that it only has 10% more PPD....
True, but it has been observed. The GPU utilization of the 2080 is significantly lower than the 2070's on the same project. No explanation has been offered; many possibilities have been suggested and are being considered, but so far nobody has succeeded in identifying the problem.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 1 core per GPU still holds true?

Post by foldy »

I didn't hear about that. Who has problems with an RTX 2080 not getting 1400k PPD?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 1 core per GPU still holds true?

Post by bruce »

It depends on the project. Small proteins are worse than large proteins, but that's not an absolute.
ProDigit
Posts: 234
Joined: Sun Dec 09, 2018 10:23 pm

Re: 1 core per GPU still holds true?

Post by ProDigit »

kbeeveer46 wrote: I ended up buying the RTX 2070 because it had the best PPD/price. Even looking at it now a week later, some of the prices I found have already changed.....
Yeah, which is why I ended up buying the RTX 2070, which sadly wasn't compatible with my hardware.
If your hardware supports it, the RTX 2070 is definitely the best PPD/$ right now!

Another option for those who don't have $500 to upgrade right now:
I have found that on eBay you can buy second-hand GTX 1060s, usually for around $100 apiece.
Buying 3 of them gets you about the same PPD as 1x RTX 2070, with $200 to spare for the extra electricity those cards use (provided your motherboard can host 3 graphics cards); rough numbers in the sketch below.

After 1 year you could sell one or two of the GTX cards, add the $100 you saved last year, and buy an RTX 2070, which by then will probably cost a lot less than $500 on the second-hand market.
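The back-of-envelope math behind that (the card power draws and the $0.12/kWh electricity rate are assumptions on my part, not measurements):

```python
# Back-of-envelope comparison: 3x used GTX 1060 vs 1x RTX 2070.
# Card power draws and the $0.12/kWh rate are assumptions, not measurements.

HOURS_PER_YEAR = 24 * 365
RATE = 0.12  # USD per kWh (assumed)

GTX1060 = {"price": 100, "watts": 120}  # used, eBay
RTX2070 = {"price": 500, "watts": 175}

upfront_saved = RTX2070["price"] - 3 * GTX1060["price"]       # $200
extra_watts = 3 * GTX1060["watts"] - RTX2070["watts"]         # 185 W
extra_cost_year = extra_watts / 1000 * HOURS_PER_YEAR * RATE  # ~$194/year

print(f"Upfront saved: ${upfront_saved}")
print(f"Extra electricity: ~${extra_cost_year:.0f}/year")
# The $200 saved up front covers roughly one year of the extra electricity.
```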
gordonbb
Posts: 511
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Re: 1 core per GPU still holds true?

Post by gordonbb »

bruce wrote:
foldy wrote: My theory is the CPU loads data through PCIe to the GPU, and then the GPU processes the data without the CPU until it has a result. Then the CPU gets the result from the GPU through PCIe and sends the next data to the GPU, depending on the result. So the CPU-to-GPU communication occurs in bursts, with CPU spin-waiting time in between.
If the PCIe bus speed is too slow, or the driver handles it badly like NVIDIA's OpenCL on Windows, then the CPU-to-GPU communication takes too long, which leaves the GPU idle for short periods and slows down FAH. The same occurs if the CPU is too slow. So we have a ratio of CPU/PCIe time to GPU time: CPU/PCIe 1 to GPU 99 is great, CPU/PCIe 10 to GPU 90 is a light bottleneck, and CPU/PCIe 50 to GPU 50 is bad performance.
That's my theory, too. Certainly the PCIe data transfers occur in bursts, and the CPU spin-waits so that, for the sake of increased game frame rates, it is always ready to transfer data without the added delay of interrupt processing and task-switching overhead. (That's probably insignificant for FAHBench.)

When the GPU finishes the processing that has been assigned to it, it will pause until a new block of data has been transferred and is ready to be processed. It doesn't matter whether the PCIe bus is slow, the CPU is slow, or the CPU is busy with some other task; the result is the same: nothing for the GPU to process, yielding lower average performance. It's sort of like your browser opening a document over a slow internet connection. As soon as the first page of data is displayed, you start reading, and it doesn't bother you whether pages 2, 3, ... have been loaded into memory or not; they'll be ready to read before you're ready to look at them.

All of the atoms are divided into multiple pages. As soon as the positions of all nearby atoms are available for the current time step, the GPU can begin summing the forces on each atom. As this process advances, I suspect that this data must be returned to main RAM. Once the last page is available to the CPU, the updated positions of all atoms for the next time step can be computed, and sending them to the GPU begins again.

What I do not know is how the software figures out how to allocate VRAM. (What constitutes a "page"?) Clearly, once the first page is transferred, the GPU can start running while the second page is transferring. If multiple pages fit in VRAM, other shaders can process them in parallel, so the GPU doesn't need to wait again until the last page has been computed.
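To put foldy's ratio into concrete numbers, here's a toy model in Python (the 1.4M PPD ceiling is an illustrative number, not a measurement):

```python
# Toy model of the CPU/PCIe-to-GPU time ratio: the GPU sits idle whenever the
# CPU/PCIe side is preparing or transferring data, so PPD scales with GPU share.

PEAK_PPD = 1_400_000  # hypothetical PPD with zero CPU/PCIe overhead

def effective_ppd(cpu_pcie_share: float) -> float:
    """cpu_pcie_share: fraction of wall time spent on CPU work and PCIe transfers."""
    return PEAK_PPD * (1.0 - cpu_pcie_share)

for share in (0.01, 0.10, 0.50):
    print(f"CPU/PCIe {share:.0%} -> ~{effective_ppd(share) / 1000:.0f}k PPD")
# 1% -> 1386k, 10% -> 1260k, 50% -> 700k: the 50/50 case halves performance.
```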
Just to add to this: there are definitely differences in PCIe bus utilization while folding under Windows versus Linux.

Windows:
SuperMicro X10SL7-F, Xeon E3-1231v3, 32GB Crucial DDR3 ECC, EVGA GTX 1060 6GB in a PCIe3 x16 slot (running at x8) (a server-class mobo with an older card)
[screenshot: PCIe bus utilization log while folding, Windows system]
Linux:
Acer nForce ITX mobo, AMD Athlon II X2 2.9GHz, 3GB DDR2, EVGA RTX 2070 XC Ultra in PCIe2 x16 (an ancient relic with a new card)
[screenshot: PCIe bus utilization log while folding, Linux system]

Granted, the Windows 10 system is my daily driver and was also driving three 24" 1080p displays, but I've seen similar behavior on a headless system. It seems that Windows has much more contention for the PCIe bus than Linux.

Granted also that these numbers come from polling nvidia-smi once a minute. There's an experimental feature in nvidia-smi that allows logging to a file at a much higher rate, which might show more detail, but sampling at a higher frequency might also impact the traffic being measured; a sketch of that follows below.
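Something like this would capture samples at a higher rate from Python (an untested sketch; the 200 ms interval and the file name are arbitrary choices). Note that utilization.memory is memory-controller load, not PCIe traffic; for PCIe Rx/Tx throughput, nvidia-smi dmon -s t is the tool, though I believe only down to 1-second intervals:

```python
# Sketch: higher-frequency sampling via nvidia-smi's millisecond loop mode.
# Appends one CSV row per sample to gpu_util.csv until interrupted (Ctrl+C).

import subprocess

CMD = [
    "nvidia-smi",
    "--query-gpu=timestamp,utilization.gpu,utilization.memory",
    "--format=csv,noheader",
    "-lms", "200",  # repeat the query every 200 ms
]

with open("gpu_util.csv", "w") as log, \
     subprocess.Popen(CMD, stdout=subprocess.PIPE, text=True) as proc:
    try:
        for line in proc.stdout:  # one CSV row per sample
            log.write(line)
    except KeyboardInterrupt:
        proc.terminate()  # stop sampling cleanly
```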

Also, to be fair, the numbers reported are from the point of view of the GPU and not the CPU and so are dependent on the underlying driver architecture.
Mstenholm
Posts: 84
Joined: Fri Oct 22, 2010 10:17 pm
Hardware configuration: 4 x GTX 970. Win 7.

Re: 1 core per GPU still holds true?

Post by Mstenholm »

You've got plenty of CPU threads. I know that CPU folding is still needed, but what I would do (sorry, I only speed-read the posts) would be to download HFM and let it run for a week with HT on and a week with HT off. I use my spare CPU cycles for WCG (two threads left for other things, incl. folding) and have other experiments going on under Linux, but I did try it once with a weaker GPU (a 970) and saw a small improvement on a 4 GHz i7 (x16 PCIe 2.0).