CPU Cores/Threads vs GPU

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

HugoNotte
Posts: 66
Joined: Tue Apr 07, 2020 7:09 pm

Re: CPU Cores/Threads vs GPU

Post by HugoNotte »

Sparkly wrote:
Joe_H wrote:The assignment system does not work the way, nor have the features you appear to think it does.
I don't get what features you think I am talking about that are missing, since what I describe is already happening and being used: projects are assigned/made to run as CPU, GPU or both (via cloning or whatever), something decided by the person or persons setting up the projects for distribution, the same as is being done for the P148xx projects, which are CPU only. So I am not talking about subdividing GPUs into different GPU classes, I am talking about running on CPU or GPU.
You don't seem to get that the decision whether to assign a project to GPU or CPU doesn't only depend on the atom count. A CPU can do DIFFERENT types of processing/calculations which a GPU can't do, and a GPU is in turn better at other types of calculations than a CPU. Those factors play a role in the creation of WUs, too.

When you say a small GPU WU puts more load on your CPU than a larger one, which in turn slows down your processing of WUs, do you have 1 CPU core/thread dedicated per GPU? Otherwise, how can a GPU WU put such a load on the CPU that 1 thread can't cope with it?
If a multi-GPU system is configured in such a way that each GPU has less than 1 dedicated thread, it doesn't really comply with the FAH system requirements, which might not be a problem when running certain large WUs, but in general the system is not working optimally. The fault in that case is with whoever set up that system, not with FAH.
So who should FAH cater to? The majority of volunteers who have a pretty standard computer with a multicore CPU and 1 GPU, maybe even 2 GPUs, and who then allocate 1 thread to each GPU, or those few who try to run 4+ GPUs on some dual core without HT and push their system beyond FAH specs?
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

HugoNotte wrote: You don't seem to get that the decision whether to assign a project to GPU or CPU doesn't only depend on the atom count. A CPU can do DIFFERENT types of processing/calculations which a GPU can't do, and a GPU is in turn better at other types of calculations than a CPU. Those factors play a role in the creation of WUs, too.
Seriously, have you ever looked at a CPU PPD chart and seen the numbers a CPU produces compared to a GPU? Maybe something like this: viewtopic.php?f=16&t=34230

Most people will generally have something like an i7-8700 or less, which averages less than 100k PPD, which means that most GPUs used these days, after the OpenCL 1.2 requirement came along, will do more, as can be seen here: viewtopic.php?f=38&t=34240
HugoNotte wrote: When you say a small GPU WU puts more load on your CPU than a larger one, which in turn slows down your processing of WUs, do you have 1 CPU core/thread dedicated per GPU? Otherwise, how can a GPU WU put such a load on the CPU that 1 thread can't cope with it?
You generally can't assign a CPU core to a specific GPU; that is just not how CPU cores work. The OS handles the load balancing between the available cores on its own, for the plethora of threads running at any given time in an OS like Windows, most of which are idle most of the time.

Running 1 card or 10 cards in a system really makes no difference to this, since the limiting factor will still be the load balancing and the speed of the CPU vs the GPUs and RAM, so a lower end 4 core/thread CPU may handle 4 cards fine while a higher end 4 core/thread CPU can handle 10 cards fine. It also depends on what else the CPU is doing on the system besides folding, so if Windows Defender is active you might only be able to run 2 cards on a 4 core/thread CPU.

The thing we are talking about here is the VERY small atom count projects, typically those below 10k atoms, and at this time there are 279 active projects, 16 of which are in the VERY low atom count category:
https://apps.foldingathome.org/psummary?visibility=ALL

So it is not a big thing to assign those few directly to CPU to begin with and keep them away from GPU, unless there is spare capacity in the GPU part of the network, or some other completion time reason for a project.
HugoNotte wrote: So who should FAH cater to? The majority of volunteers who have a pretty standard computer with a multicore CPU and 1 GPU, maybe even 2 GPUs, and who then allocate 1 thread to each GPU, or those few who try to run 4+ GPUs on some dual core without HT and push their system beyond FAH specs?
Well, looking at the numbers on OS distribution: https://stats.foldingathome.org/os
I would argue that the majority of current contributors are in the categories Dedicated, Nerds and Gamers.

The fact that FAH has lost more than 40% of its computing power over the last month or so would indicate that the “Man in the street” category has diminished significantly.

So, FAH should cater to getting the most WUs completed as fast as possible with the resources available in the network, regardless of where the resources come from.
HugoNotte
Posts: 66
Joined: Tue Apr 07, 2020 7:09 pm

Re: CPU Cores/Threads vs GPU

Post by HugoNotte »

Sparkly, I am trying to make you understand that CPU processing abilities and GPU processing abilities differ, vastly. The CPU can do things a GPU won't be able to do; some things a GPU can do a lot better than a CPU. Hence atom count isn't the only deciding factor for whether a project issues CPU or GPU WUs.
Sparkly wrote: Seriously, have you ever looked at a CPU PPD chart and seen the numbers a CPU produces compared to a GPU? Maybe something like this: viewtopic.php?f=16&t=34230

Most people will generally have something like an i7-8700 or less, which averages less than 100k PPD, which means that most GPUs used these days, after the OpenCL 1.2 requirement came along, will do more, as can be seen here: viewtopic.php?f=38&t=34240
Relevance? Nobody disputes that a decent GPU can produce more PPD than most CPUs.
Sparkly wrote: You generally can't assign a CPU core to a specific GPU; that is just not how CPU cores work. The OS handles the load balancing between the available cores on its own, for the plethora of threads running at any given time in an OS like Windows, most of which are idle most of the time.

Running 1 card or 10 cards in a system really makes no difference to this, since the limiting factor will still be the load balancing and the speed of the CPU vs the GPUs and RAM, so a lower end 4 core/thread CPU may handle 4 cards fine while a higher end 4 core/thread CPU can handle 10 cards fine. It also depends on what else the CPU is doing on the system besides folding, so if Windows Defender is active you might only be able to run 2 cards on a 4 core/thread CPU.
Well, I have never said that you can assign a physical CPU core to a GPU WU. FAH requires 1 thread per GPU. If you allow for less, because certain WUs let you get away with it, then don't blame small atom count WUs if they all of a sudden require that 1 thread for themselves.

Sparkly wrote: The thing we are talking about here is the VERY small atom count projects, typically those below 10k atoms, and at this time there are 279 active projects, 16 of which are in the VERY low atom count category:
https://apps.foldingathome.org/psummary?visibility=ALL
I have counted 9 GPU projects out of way over 100 with an atom count of less than 10k. Big deal, about 7 or 8% of the GPU projects issue low atom count WUs.
Sparkly wrote: So it is not a big thing to assign those few directly to CPU to begin with and keep them away from GPU, unless there is spare capacity in the GPU part of the network, or some other completion time reason for a project.
Says who? The average mid to low range GPU will probably still crunch it faster than the average CPU, unless one tries to squeeze more than 1 GPU per available CPU thread. Besides, see above.
Sparkly wrote: Well, looking at the numbers on OS distribution: https://stats.foldingathome.org/os
I would argue that the majority of current contributors are in the categories Dedicated, Nerds and Gamers.
How does that list of OS distribution lead you to that? Windows, the go-to OS for the average dude with low or mid range hardware, far outweighs Linux, THE nerd OS. If you even want to draw conclusions from a totally unrelated list, you have proven yourself wrong there.
Sparkly wrote: So, FAH should cater to getting the most WUs completed as fast as possible with the resources available in the network, regardless of where the resources come from.
Isn't that what they do? Churn out WUs to be crunched instead of putting massive effort into fiddling with a running system, introducing a lot more possible weak points.

I don't quite see how your idea would benefit the majority of volunteers here, or FAH overall. It would benefit a minority of people who try to squeeze more than 1 GPU per CPU thread. It would benefit a few people who are more interested in gaining PPD (what for?) than simply donating resources to FAH, to science, in order to get work done. You make it sound as if FAH is hugely inefficient, which it is not. Work gets done. You lose out on a few PPD. Big deal.
Set up your system so that every GPU has one CPU thread available, reserve 1 thread for your Windows OS, and then see whether small atom count GPU WUs still slow down your productivity.
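For illustration only, on a hypothetical 6 core/12 thread machine with two GPUs, that kind of allocation would roughly correspond to a FAHClient config.xml along these lines (slot ids, thread counts and extra options depend entirely on the individual setup, so treat this as a sketch rather than a recipe):

Code:

<config>
  <!-- CPU slot capped at 9 threads: 12 threads total, minus 1 per GPU slot, minus 1 kept free for the OS -->
  <slot id='0' type='CPU'>
    <cpus v='9'/>
  </slot>
  <!-- one slot per GPU; each GPU slot needs roughly one free CPU thread to feed its card -->
  <slot id='1' type='GPU'/>
  <slot id='2' type='GPU'/>
</config>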
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

HugoNotte wrote:Sparkly, I am trying to...
Your assumptions and bla bla aren't even worth spending the time to comment on in more detail, since you clearly have no clue what you are talking about, and you have no useful real-life numbers regarding any of this to contribute, so maybe you should take a trip over to GitHub and see what is actually going on.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: CPU Cores/Threads vs GPU

Post by Neil-B »

… and with respect, Sparkly, a slight attitude adjustment on your part towards someone trying to help/engage from a perspective other than yours might be advised? … Disagreeing is fine, choosing to ignore is fine, but comments bordering on belittling and insulting don't sit well within the forums and are unnecessary … You may get more traction for your thoughts and ideas if you aren't rude?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

Neil-B wrote:… and with respect, Sparkly, a slight attitude adjustment on your part towards someone trying to help/engage from a perspective other than yours might be advised? ...
There is no problem with engaging with a completely different view/perspective, or with totally disagreeing for any reason whatsoever, which is why I responded in detail to the first comment. I also responded in detail to the second comment, which by the way contained plenty of the belittling attitude towards me that you describe as not sitting well within the forums. So when the third comment arrives, which is basically a troll post, there is no point in wasting my time writing yet another detailed response when the commenter hasn't even bothered to read what the thread is about.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU Cores/Threads vs GPU

Post by bruce »

Let's suppose that there are too many small projects and not enough large projects. What should FAH do? First, obviously they will ensure that a (small) project is assigned to each CPU and the large projects are distributed to GPUs.

OK. So now there are many GPUs that don't have any assignments and we have a lot of small projects queued up with nobody to work on them.

<sarcasm_on> Suggestions anyone? <end_sarcasm> How long should we let this condition persist?
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

bruce wrote:Let's suppose that there are too many small projects and not enough large projects. What should FAH do? ...
In general we are mostly talking about not sending the sub 10k atom projects to GPU, of which there are very few in the first place, since my earlier count, which included the low atom count test projects, had 16 of the 279 active projects below 10k atoms - https://apps.foldingathome.org/psummary?visibility=ALL

I don't know how many additional sub 10k projects this COVID Moonshot thing will create, since the test projects for it have ranged from 4k to 130k atoms, but it would be beneficial for the total utilization and computing power available to distribute the projects in a way that puts the lower atom count projects on CPU and the higher atom count projects on GPU.

Current test projects indicate that you will easily lose more than 60% of the total available GPU computational power for the active GPU projects if you send sub 10k atom projects to GPU, and we are not talking about the Moonshot projects alone; we are talking about the impact for everyone running any GPU project in the same timeframe.

Since there are several projects with 50k or more atoms currently running only on CPU, the logical thing would be to reassign those projects to GPU if you somehow end up in a situation where you happen to have spare GPU capacity.

And if you actually happen to have spare capacity in either the CPU or GPU part of the network, that you can't fill, then you basically have a luxury problem.

The best distribution would be fairly easy to calculate/estimate if the number of incoming projects and their atom counts were known, and if https://stats.foldingathome.org/os could be upgraded to also include the CPU/GPU distribution within each column.
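As a rough sketch of what that estimate could look like (all atom counts below are invented; the real inputs would be the incoming project list and the capacity numbers from the stats page), the split itself is trivial once those inputs exist:

Code:

# Hypothetical sketch: split an incoming project list between the CPU and GPU
# pools by atom count. The atom counts below are made up for illustration.
ATOM_THRESHOLD = 10_000                      # the "very low atom count" cut-off discussed here

incoming = {                                 # project id -> atom count (invented values)
    13415: 4_500,
    16904: 22_000,
    14800: 65_000,
    14253: 265_000,
}

cpu_pool = {p: n for p, n in incoming.items() if n < ATOM_THRESHOLD}
gpu_pool = {p: n for p, n in incoming.items() if n >= ATOM_THRESHOLD}

print("assign to CPU:", sorted(cpu_pool))
print("assign to GPU:", sorted(gpu_pool))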
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU Cores/Threads vs GPU

Post by bruce »

OK, are you willing to deal with folks who complain about a 2060 being assigned a 22K atom project because their GPU is only 75% busy? They've got a point.
Subject: WU 16904 VERY low PPD
ajm
Posts: 750
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: CPU Cores/Threads vs GPU

Post by ajm »

A thought: it is quite probable that if FAH creates a precise hardware/WU matching system, many users won't get a constant flow of WUs tailored to their hardware. People would then demand an "anything goes" option in order to keep their kit(s) occupied, even if not at 100% capacity, rather than waiting on the perfect WU. It is very hard to please everyone. If FAH is to try that route, it must be understood and accepted from the start.
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: CPU Cores/Threads vs GPU

Post by BobWilliams757 »

ajm wrote:A thought: it is quite probable that if FAH creates a precise hardware/WU matching system, many users won't get a constant flow of WUs tailored to their hardware. ...
I gave this some thought as well. We already saw this with the preferences for COVID, when they were short on COVID-related WUs and people were assigned others. To perfect a system would take a great deal of time. And no matter what, someone is still going to be upset.



And to some extent, we would need a group of beta testers with a great deal of varied hardware to even figure out which projects run better on slow cards and/or suffer on faster cards. As it is now, I think most beta testers have at least mid to upper range GPUs. I considered volunteering, but when I started looking I realized that the current crew can run 10-12 of these in a day, post results, adjustments are made, and they are on to advanced testing. If I take a day to process a WU, by the time I have results to report I've missed the bus. And slowing down the process to adjust for hardware really just slows things down, as a great number of people will fold them regardless of PPD returns. But if it helped on the exclude side, to not issue them to systems with errors, that could be a plus I guess.


I also considered that a PPD database by project might help the researchers view trends in what different hardware is doing. This would take a great deal of time, but there might be a possibility that it could be somewhat automated over time and help them select the prime candidates for inclusion/exclusion on certain projects.
Fold them if you get them!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU Cores/Threads vs GPU

Post by bruce »

We may already have the data the beta testers would gather. One of the active GPU projects has been gathering that information already. I have no idea how complete it is, so beta testers may also be appropriate.

Any server-based recommendation will always be a preference, like the COVID cases that you reference. If a project is recommended for your GPU and there don't happen to be any WUs available, the server is going to give you something else (if possible).
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

Just to show what can happen on dedicated multi-GPU systems when very low Atom Count projects arrive:

[image: client statistics screenshot showing the PPD of the WUs running on this multi-GPU system]

In it you can see several of the P13415 (very low Atom Count) WUs, which would normally run at 130k+ PPD on this system when only one of them is present, now running at 5878 PPD, 2392 PPD and 575 PPD, based on the order they arrived in.

The impact on the rest of the running projects is also obvious, since the P14253 (high Atom Count) WU, currently running at 153k PPD, normally runs at 550k+ PPD alongside other projects on this system until a P13415 arrives.

Even though this is more on the extreme side of things, it still shows what is generally happening to the overall computational speed in the GPU part of the network when very low Atom Count projects are assigned to it.
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: CPU Cores/Threads vs GPU

Post by MeeLee »

Sparkly wrote:Just to show what can happen on dedicated multi-GPU systems when very low Atom Count projects arrive: ...
If you're running an Nvidia GPU of the Pascal (GTX) generation or newer, you can adjust the idle frequency of your GPU.
What happens with a lot of these GPUs is that they enter one of 3 states (~1440-something MHz, ~1350 MHz, and lower).
When the load keeps only a fraction of the GPU's cores busy, it will drop to a lower frequency.
So not only are fewer CUDA cores in use, you also suffer a lower GPU frequency.
If this is the case, in Linux there's a command that lets you set the minimum frequency (e.g. 1935 MHz) and the maximum frequency (e.g. 2050 MHz).
This maximum was very important for early RTX GPUs, which had no upper threshold and could, with an overclock, shoot way past 2100 MHz and crash and die, or permanently lock in at 1350 MHz (their default limp mode).
In Linux the command would be:

Code:

sudo nvidia-smi -i 1 -lgc 1935,2010
Where '1' is the second GPU (0 is the first), '1935' is the new idle/minimum clock, and '2010' is the maximum clock the GPU will be allowed to run at.
If you ever get a large atom WU, the frequency can still drop below 1935 MHz, as the drivers try to keep the clock within the tolerances given to them, but can't do the impossible.
By omitting '-i 1' in the above example, the setting will apply to ALL installed Nvidia GPUs supported by that driver.

I'm sure there are options like this for Windows as well..
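For what it's worth, the same nvidia-smi tool ships with the Nvidia driver on Windows too, so roughly the same commands should work from an administrator prompt on supported GPUs. Checking the current clocks and undoing the lock would look something like this (a sketch, assuming a driver recent enough to support clock locking):

Code:

nvidia-smi -q -d CLOCK        # show the current and maximum supported clocks per GPU
nvidia-smi -i 1 -rgc          # reset GPU 1 back to its default clock management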
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: CPU Cores/Threads vs GPU

Post by Sparkly »

MeeLee wrote:I'm sure there are options like this for Windows as well..
What I am showing in the running statistics picture has nothing to do with GPU frequencies. In my case all the GPUs are hard-coded in firmware to always run at max, on top of having better memory timings than regular RX 580 cards, so they can't run slower than max even if the OS or any idle settings wanted them to.

The added slowdown effect is purely caused by poorly programmed WU-to-GPU communication and nothing else. The programming doesn't take into account that certain hardware operations carry a huge overhead every time you use them, and thus treats every WU the same when communicating with the GPU, with the result that a low atom count WU accesses those hardware resources significantly more often in the same fixed time period than a high atom count WU would.

A typical source of huge overhead in this kind of programming is accessing the PCIe bus, so the more often you force access to it with tiny bits of information, the slower everything gets, and I mean everything, since it impacts your CPU load and the entire system in a huge way, as can be seen here:

[image: CPU load graph of the dual-core system while it feeds a single P13415 WU to the GPU]

which is the constant CPU load of a single P13415 WU running on a dual-core CPU doing nothing other than feeding the GPU, which means the low atom count WU is basically spending more CPU resources on getting access to the PCIe bus to communicate with the GPU than the CPU would have spent just calculating everything on its own.
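The bus overhead effect being described can be shown in isolation with a small timing sketch (hypothetical, and only an analogy: it assumes an Nvidia GPU with a CUDA build of PyTorch, whereas the RX 580s above run OpenCL). Moving the same total amount of data in thousands of tiny copies takes far longer than moving it in one piece, and the extra time is CPU/driver overhead rather than useful work:

Code:

# Sketch: same total data volume over PCIe, one large copy vs many tiny copies.
# Assumes an Nvidia GPU and a CUDA-enabled PyTorch install.
import time
import torch

total = 1 << 22                          # ~4M float32 values, ~16 MB in total
chunks = 4096                            # number of tiny transfers

big = torch.randn(total)                 # one large host-side buffer
small = torch.randn(total // chunks)     # one tiny host-side buffer

torch.cuda.synchronize()
t0 = time.time()
big.cuda()                               # a single large host-to-GPU copy
torch.cuda.synchronize()
print(f"1 large copy: {(time.time() - t0) * 1e3:.1f} ms")

t0 = time.time()
for _ in range(chunks):                  # same data volume, 4096 separate copies
    small.cuda()
torch.cuda.synchronize()
print(f"{chunks} small copies: {(time.time() - t0) * 1e3:.1f} ms")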