CPU Cores/Threads vs GPU

Post by **Sparkly** » Thu Jun 04, 2020 10:41 am

Lots of discussion going on about this, so I’ll add my experience with Core 22 in a multi GPU system.

I took a couple of my ETH Mining rigs and put Windows 10 on them for easier comparison.

Initial Setup:
ASRock H110 Pro BTC+
9 x AMD RX580 8GB on 1x Riser
Intel Pentium G4400 (2 Cores 2 Threads)
4GB RAM
SSD
FAH 7.6.13

Getting all 9 cards to run at the same time seems to be a no go, since the CPU just can’t handle it and basically “idles” at 100%, which makes the FAH Processes fail in one way or another.

Sweet spot in this setup with a G4400 CPU seems to be 5 running AMD GPUs, which keeps the CPU in the 80% area when all 5 cards are active at the same time, with the 100% peaks occurring when a new FAH process is started.

Trying to get more cards running I switched 1 of the CPUs to an i5-6600 (4 Cores 4 Threads) in the hopes of getting all cards running, but that is still a no go in terms of stability, since the sweet spot now only increased to 7 cards, which keeps the CPU in the 90% area when all 7 cards are running at the same time.

So:
G4400 runs 5 x AMD RX580 on 1x Risers
i5-6600 runs 7 x AMD RX580 on 1x Risers

Can’t really tell if the 1x risers make any PPD impact at all, since the PPD numbers I get seem to be the same as for other people reporting their numbers to the GPU performance statistics available, but not using risers.

Project:
P11763 – 360K PPD
P13408 – 370K PPD

My guess is that the 1x Risers only prevent the CPU from frequently swinging up and down for each FAH Process, since the transfer is more constant and not as bursty as it might be on 16x.

As a comment to the people making the software I would suggest adding a CPU throttling option for the FAH Process so it can be prevented from grabbing 100% of the available CPU resources when it starts, since this seems to create some issues with the response timers for other FAH processes running, typically after a reboot, or if multiple slots start jobs at the same time.

Post by **HugoNotte** » Thu Jun 04, 2020 12:45 pm

In general FAH needs 1 CPU thread per GPU. Since AMD works with interrupts, unlike NVidia, the CPU doesn't really need a full thread and you get away with a bit more than 1.5 GPUs per CPU thread.
Are you running Windows? In that case, you might have better luck with Linux?
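
As a rough illustration of that rule of thumb, here is a minimal Python sketch. The ratios (1 GPU per CPU thread for NVidia, about 1.5 for AMD) are taken from the post above; the function names are made up for the example, and none of this is an official FAH sizing formula.

```python
import math

# GPUs that one CPU thread can feed, per the rule of thumb in the post above.
# These are rough forum estimates, not official FAH figures.
GPUS_PER_THREAD = {"nvidia": 1.0, "amd": 1.5}


def max_gpus(cpu_threads: int, vendor: str) -> int:
    """Estimate how many GPUs a CPU with the given thread count can feed."""
    return math.floor(cpu_threads * GPUS_PER_THREAD[vendor])


def threads_needed(gpu_count: int, vendor: str) -> int:
    """Estimate how many CPU threads a given number of GPUs needs."""
    return math.ceil(gpu_count / GPUS_PER_THREAD[vendor])


if __name__ == "__main__":
    print(max_gpus(2, "amd"))        # Pentium G4400 (2 threads)
    print(max_gpus(4, "amd"))        # i5-6600 (4 threads)
    print(threads_needed(9, "amd"))  # all 9 RX580 cards
```

By this estimate the 2-thread G4400 would feed 3 AMD GPUs and the 4-thread i5-6600 would feed 6, so the 5 and 7 cards reported earlier in the thread suggest the rule is on the conservative side for this particular setup.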

Post by **Sparkly** » Thu Jun 04, 2020 3:01 pm

The setup uses Windows 10, as it says in the post, but I do suspect that running Linux would be slightly better regarding the CPU load for multiple GPUs at the same time, since there would be less waste for the CPU on OS overhead.

It is not that I can’t run all 9 AMD GPUs with the i5-6600, but then the stability really depends on which Projects I would get, since some Projects seem to use less CPU and more RAM, where other Projects use more CPU but less RAM, so if I would only get the more CPU less RAM Projects, then the CPU will struggle and the different running FAH Processes will have issues.

Stuff goes something like this in Task Manager:

Core 21 Projects have 10-15% CPU load per FAH Process
Core 22 Projects with Low RAM use have 15-20% CPU load per FAH Process
Core 22 Projects with High RAM use have 5-10% CPU load per FAH Process
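
To show why the project mix matters so much, here is a small Python sketch that sums the per-process ranges above for a given set of running GPUs. It assumes the Task Manager percentages are shares of the whole CPU and that they simply add up, which is a simplification; the dictionary keys and function names are only illustrative.

```python
# Per-process CPU load ranges (percent of total CPU) as listed above.
# Assumption: loads add linearly and 100% is the whole CPU budget.
LOAD_RANGES = {
    "core21": (10, 15),
    "core22_low_ram": (15, 20),
    "core22_high_ram": (5, 10),
}


def total_load(project_types):
    """Return the (low, high) summed CPU load in percent for the running processes."""
    low = sum(LOAD_RANGES[p][0] for p in project_types)
    high = sum(LOAD_RANGES[p][1] for p in project_types)
    return low, high


def fits(project_types, budget=100):
    """True if even the worst-case summed load stays within the CPU budget."""
    return total_load(project_types)[1] <= budget


if __name__ == "__main__":
    print(total_load(["core22_low_ram"] * 9))
    print(total_load(["core22_high_ram"] * 9))
```

Nine low-RAM Core 22 processes sum to 135-180%, which is over budget, while nine high-RAM ones sum to 45-90%, which matches the observation that stability with all 9 cards depends on which Projects get assigned.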

I don’t know how GPU Projects are created, but it would seem that there are configuration options regarding how each Project utilizes the hardware available, so I am guessing the people creating the projects could optimise this and use more RAM instead of CPU when making GPU Projects.

Post by **PantherX** » Thu Jun 04, 2020 7:34 pm

Sparkly wrote:...I don’t know how GPU Projects are created, but it would seem that there are configuration options regarding how each Project utilizes the hardware available, so I am guessing the people creating the projects could optimise this and use more RAM instead of CPU when making GPU Projects.

Welcome to the F@H Forum Sparkly,

Generally speaking, the amount of RAM used corresponds to the number of atoms being simulated. The more atoms in a simulation, the more RAM is used. For CPU, it boils down to how often the checkpoints are written, as there are validation checks performed on the CPU to verify the results of the GPU. Checkpoints need to be written at specific intervals, which vary from simulation to simulation. Hence, there's no consistency between Projects, just a range based on the type of simulation.

Post by **bruce** » Fri Jun 05, 2020 1:32 am

I'm not a former miner so my GPUs have never been installed in 1x risers. That leads me to what is a "stupid question" that you can think about.

A GPU running FAH is choked by the 1x restriction, but running the maximum number of GPUs does have advantages, too, so your report makes sense to me. Also, 4GB of RAM is going to spend a lot of time paging with that many independent tasks.

1) I didn't look up your CPU, but it can probably support quite a few 2x or even 4x GPU connections. Is there a reasonable way you can reallocate those connections to get higher speed to some or all of the GPUs you're running?

2) I'd consider adding more RAM. Take a look at the Processes tab of Task Manager and add whatever columns help you understand your RAM constraints. (There are other performance monitoring tools.) What you see will also depend on what projects have been assigned.

Post by **Sparkly** » Fri Jun 05, 2020 9:33 am

PantherX wrote:Generally speaking, the amount of RAM used corresponds to the number of atoms being simulated. The more atoms in a simulation, the more RAM is used. For CPU, it boils down to how often the checkpoints are written, as there are validation checks performed on the CPU to verify the results of the GPU. Checkpoints need to be written at specific intervals, which vary from simulation to simulation. Hence, there's no consistency between Projects, just a range based on the type of simulation.

That makes sense when it comes to what I see on the CPU process load, since a higher atom count would spend more time on the GPU and less time with the CPU feeding the GPU, thus matching with the percentage load numbers I see for different projects, where higher RAM use is less CPU intensive.

bruce wrote:1) I didn't look up your CPU, but it can probably support quite a few 2x or even 4x GPU connections. Is there a reasonable way you can reallocate those connections to get higher speed to some or all of the GPUs you're running?

The motherboard is made for mining, so all the hardware slots are 1x on the board, meaning I can’t really put any card physically down on it.

bruce wrote:2) I'd consider adding more RAM. Take a look at the Processes tab of Task Manager and add whatever columns help you understand your RAM constraints. (There are other performance monitoring tools.) What you see will also depend on what projects have been assigned.

To be honest I don’t see the RAM being a problem, since each FAH process generally uses around 100MB of memory, and the high atom count projects use around 400MB, so I normally don’t see my RAM usage run more than 70%. Any accidental page swapping would also go to the onboard SSD, so that isn’t really a big constraint compared to RAM.

bruce wrote:A GPU running FAH is choked by the 1x restriction, but running the maximum number of GPUs does have advantages, too, so your report makes sense to me.

In general I don’t think the RX580 cards are fast enough to benefit significantly from a much higher x factor, so I would guess, based on the numbers I see from the process CPU load, that going to 2x or more would maybe increase the PPD by 5-10% at best, depending on atom count in the project.

Comparing the numbers I see on this system with the GPU statistics:
https://docs.google.com/spreadsheets/d/ ... utput=html
they seem rather similar to what the RX580 is listed at, and I would assume those numbers are not based on 1x reports from people.

I would assume that faster cards like the 1080 etc. would benefit significantly more from a higher x factor, since they would return work faster, and by the looks of it the lower atom count projects seem to communicate a lot more with the GPU than should be needed in my opinion.

The software people could probably improve on the amount of communication needed with the GPU by sending bigger chunks for computation, especially regarding the low atom count projects, to minimize all the tiny back and forth communication that seems to be going on between the CPU and GPU, something that would free up the CPU while the GPU did the work.

Post by **PantherX** » Fri Jun 05, 2020 11:00 am

Sparkly wrote:...To be honest I don’t see the RAM being a problem, since each FAH process generally uses around 100MB of memory, and the high atom count projects use around 400MB, so I normally don’t see my RAM usage run more than 70%. Any accidental page swapping would also go to the onboard SSD, so that isn’t really a big constraint compared to RAM...

I have seen FahCore_22 use 2.5 GBs of RAM while folding. I round that up to 3 GBs per GPU just to be on the safe side and future proof for around 5 years as I personally upgrade a GPU twice in 5 years before moving onto a completely new build.
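
To put the two sets of figures side by side, here is a minimal Python sketch of a RAM budget for the 9-GPU rig. The per-process numbers are the ones quoted in this thread (about 100MB typical and 400MB for high atom counts, versus the 3 GB per GPU safety margin); the function name is illustrative, 1 GB is treated as 1024 MB, and OS memory use is ignored.

```python
# Per-process RAM figures quoted in this thread, in MB.
TYPICAL_MB = 100             # typical FAH GPU process
HIGH_ATOM_MB = 400           # high atom count projects
SAFETY_MARGIN_MB = 3 * 1024  # 3 GB per GPU rule of thumb

INSTALLED_MB = 4 * 1024      # the rig has 4GB of RAM


def ram_needed_mb(gpu_count: int, per_process_mb: int) -> int:
    """Total RAM for the FAH GPU processes alone (OS overhead not included)."""
    return gpu_count * per_process_mb


if __name__ == "__main__":
    for label, per_process in [
        ("typical", TYPICAL_MB),
        ("high atom count", HIGH_ATOM_MB),
        ("3 GB safety margin", SAFETY_MARGIN_MB),
    ]:
        needed = ram_needed_mb(9, per_process)
        print(f"{label}: {needed} MB needed, fits in 4GB: {needed <= INSTALLED_MB}")
```

Under these assumptions the observed usage fits in 4GB even if all nine processes are high atom count ones (3600 MB before OS overhead), while the 3 GB per GPU margin would call for about 27 GB.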

Sparkly wrote:...The software people could probably improve on the amount of communication needed with the GPU by sending bigger chunks for computation, especially regarding the low atom count projects, to minimize all the tiny back and forth communication that seems to be going on between the CPU and GPU, something that would free up the CPU while the GPU did the work.

Given that GPUs don't have any scheduling/priority system, unlike the CPU, it may not be an ideal option, as slow GPUs would encounter screen lag, which can be a problem for some donors. Plus, not all Projects behave the same way, and with varying amounts of atoms and simulations, it can get tricky. I do believe that the new FahCore_22 version 0.0.6 might have additional optimizations, so once it's released, I will have to test it out and see how it behaves.

Post by **MeeLee** » Fri Jun 05, 2020 5:04 pm

I would presume that FAH won't assign large WUs on those GPUs in the first place.
And that it looks at whatever RAM is available.
Upgrading the RAM would be counterproductive, as you'll probably get access to larger WUs, which will run a lot longer on those 'slow' GPUs.
An RX580 is about as fast as a GTX 1060. PCIE 3.0 x2 is about what it needs, but it'll run on an x1 slot without too much drop in PPD.
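
For a sense of scale between x1, x2 and x16, here is a short Python sketch of theoretical PCIe link bandwidth. The transfer rates and line encodings are the standard PCIe figures; which PCIe generation the risers in this rig actually negotiate is not stated in the thread, so the generation argument is an assumption, and real throughput is lower once protocol overhead is included.

```python
# Raw transfer rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    gt_s, efficiency = PCIE_GEN[gen]
    bytes_per_second_per_lane = gt_s * 1e9 * efficiency / 8
    return bytes_per_second_per_lane * lanes / 1e6


if __name__ == "__main__":
    for lanes in (1, 2, 16):
        print(f"PCIe 3.0 x{lanes}: {pcie_bandwidth_mb_s(3, lanes):.0f} MB/s")
```

A PCIe 3.0 x1 link comes out just under 1 GB/s per direction, so x2 doubles that and x16 is sixteen times as much. Whether FAH can make use of the difference depends on how much data a given Project moves between CPU and GPU.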

You can always pause the WU that corresponds to your primary GPU (where the display is connected to), if you want to have a more fluid desktop/Youtube experience, just as long as you don't forget to unpause it after you finish.

Post by **PantherX** » Sat Jun 06, 2020 4:00 am

The Servers don't differentiate between the speeds of GPUs, just the architecture, so they don't know which Pascal GPU is a low-end or a high-end one. However, they do get the RAM available, so some researchers might use that if their Project uses a significant amount of RAM. However, like you said, if you have heaps of RAM and a low-end GPU, you will be assigned those WUs.

Regarding screen lag, you can disable hardware acceleration on some programs that are frequently used. I have used it in the past and it worked pretty well for me.

Post by **foldy** » Sat Jun 06, 2020 8:12 am

Sparkly wrote: Getting all 9 cards to run at the same time seems to be a no go, since the CPU just can’t handle it and basically “idles” at 100%, which makes the FAH Processes fail in one way or another.

FAH processes are not expected to fail at 100% CPU load while feeding the GPUs. What failure do you see?

Sparkly wrote: Can’t really tell if the 1x risers make any PPD impact at all, since the PPD numbers I get seem to be the same as for other people reporting their numbers to the GPU performance statistics available, but not using risers.
Project:
P11763 – 360K PPD
P13408 – 370K PPD

I can tell that Windows bottlenecks on x1 risers much more than Linux. On Linux for RX580 you would get 450k PPD.

Post by **Sparkly** » Sat Jun 06, 2020 8:38 am

foldy wrote:FAH processes are not expected to fail at 100% CPU load while feeding the GPUs. What failure do you see?

What seems to happen at constant 100% CPU load is that one or more FAH processes time out and the jobs flush before they are done, so I suspect that there is some kind of “I am alive” timer somewhere, since the slots flushing jobs just pick up another job when the CPU load drops after the flush.
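
To make that guess concrete, here is a purely hypothetical Python sketch of how an “I am alive” timer of this kind usually works. It is not taken from FAH; the class name, the slot names, and the 30-second timeout are all invented for illustration, and whether FAH has such a timer at all is only a suspicion.

```python
class HeartbeatWatchdog:
    """Generic heartbeat watchdog (hypothetical, not FAH's actual mechanism)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}  # worker name -> time of last heartbeat

    def beat(self, worker: str, now: float) -> None:
        """Record that a worker reported in at time `now` (seconds)."""
        self.last_seen[worker] = now

    def timed_out(self, now: float) -> list:
        """Return the workers whose last heartbeat is older than the timeout."""
        return sorted(
            worker
            for worker, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog(timeout_s=30)
    watchdog.beat("slot0", now=0)
    watchdog.beat("slot1", now=0)
    watchdog.beat("slot0", now=25)  # slot1 is starved of CPU and never reports
    print(watchdog.timed_out(now=40))
```

In a scheme like this, a process that cannot get CPU time to report in would be treated as dead even though its GPU work is fine, which would look like the flushes described above.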

Post by **Sparkly** » Sat Jun 06, 2020 9:00 am

MeeLee wrote:You can always pause the WU that corresponds to your primary GPU (where the display is connected to), if you want to have a more fluid desktop/Youtube experience, just as long as you don't forget to unpause it after you finish.

PantherX wrote:Regarding screen lag, you can disable hardware acceleration on some programs that are frequently used. I have used it in the past and it worked pretty well for me.

Since this is a mining rig setup there is no issue with screen lag or whatnot from all the working GPUs running at max, since the display is connected to the onboard Intel chipset and not a working GPU.

Post by **Sparkly** » Sat Jun 06, 2020 9:10 am

foldy wrote:I can tell that Windows bottlenecks on x1 risers much more than Linux. On Linux for RX580 you would get 450k PPD.

The PPD is really all over the place, since I also get jobs where I have 490K PPD, although they are not as frequent as the 350K ones. I do suspect that Linux would do better than Windows 10 in this regard.

Post by **Sparkly** » Sat Jun 06, 2020 10:05 am

PantherX wrote:The Servers don't differentiate between the speeds of GPUs, just the architecture, so they don't know which Pascal GPU is a low-end or a high-end one. However, they do get the RAM available, so some researchers might use that if their Project uses a significant amount of RAM. However, like you said, if you have heaps of RAM and a low-end GPU, you will be assigned those WUs.

Actually, getting bigger WUs that use more RAM would probably be better, since it would ease the CPU load and take longer on the GPU, as can already be seen from the numbers I get, where higher RAM usage has a lower CPU load and generates more PPD. Are there any statistics available showing WU size distribution based on the amount of RAM available?

What I do find rather pointless is that there is no real usage of the GPU's memory to any extent, since the VRAM usage I see is minimal at best, so by looking at the numbers for the GPUs it doesn’t seem to matter if the graphics card has 100MB of memory or 8GB, since it doesn’t seem to be used much.

I would assume that more of the WU could be put directly in GPU memory in one go, instead of constantly feeding it tiny pieces, especially since I find it hard to believe that after the Core 22 with OpenCL 1.2 requirement you will find a graphics card with less than 512MB available memory.

Post by **foldy** » Sat Jun 06, 2020 10:07 am

Sparkly wrote:
foldy wrote:FAH processes are not expected to fail at 100% CPU load while feeding the GPUs. What failure do you see?
What seems to happen at constant 100% CPU load is that one or more FAH processes time out and the jobs flush before they are done, so I suspect that there is some kind of “I am alive” timer somewhere, since the slots flushing jobs just pick up another job when the CPU load drops after the flush.

And what does it look like in the FAH log file when a job gets flushed?

Folding Forum
