Fast GPU, not enough CPU power to keep up?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Fast GPU, not enough CPU power to keep up?
I'm foreseeing a problem: modern CPUs are heading towards more cores rather than more speed per core.
From rough calculations, with a fast GPU like a 3080, even a Ryzen 9 will only just keep up using one of its cores.
Therefore we need a way (as many do with Boinc) to have more than 1 CPU core allocated to a GPU.
Does, or will in the near future, FAH run a task on a GPU and more than 1 CPU core? If not, is there a way of telling it to run two tasks per GPU? If not can we somehow force two FAH clients to run at once, and they both use the same GPU (but obviously a different CPU core)?
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Fast GPU, not enough CPU power to keep up?
v8 does support more than one CPU core on a GPU job, although I'm not sure if this is how/why it's going to be used, and it's not implemented in any cores yet AFAIK.
-
- Site Admin
- Posts: 8224
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4 - Location: W. MA
Re: Fast GPU, not enough CPU power to keep up?
Alex_Atkin wrote: ↑Mon Apr 17, 2023 2:48 am
v8 does support more than one CPU core on a GPU job, although I'm not sure if this is how/why it's going to be used, and it's not implemented in any cores yet AFAIK.
Some of the features in v8 are there to support future folding core options. Both the OpenMM and GROMACS code the cores are based on have support in recent versions for using both a GPU and CPU together to work on a WU. However, testing is needed to see how best to optimize using both resources, and it may depend on the specific parameters connected to various projects. From what I have heard, it has been doable on a standalone system where specific hardware can be targeted, but it will be harder to implement for systems where the hardware is not known ahead of time.
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
Even something like what Boinc does would be good. If I have a fast GPU, and the task on it requires a lot of CPU time to assist (more than 1 core) but will only use 1 core, I can run multiple tasks on the GPU, so each gets a core. It's not in the actual GUI, but editing a config file (or, with some projects, the website settings for my account) lets me say how many tasks run at once on the GPU. So if I see a GPU at low percentage load in MSI Afterburner, I set it to do two at once, then even three or four if needed.
-
- Posts: 2121
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: Fast GPU, not enough CPU power to keep up?
You cannot run multiple tasks on any desktop GPU. nVidia and AMD do not support it for desktop users.
High CPU core usage while the GPU is folding is due to driver overhead. This can be shown by running the same project on an nVidia card and an AMD card. AMD cards rarely use much of the CPU, while nVidia cards quite often push a single CPU core to its limit.
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
Of course you can run multiple tasks on a GPU, I do it all the time with Boinc. Up to 4 usually. This means each task can use its own CPU core, so the GPU is kept busier. Folding lacks this ability.
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
This is what I'm getting on my main computer right now. It has a 24 core Ryzen 9 3900XT CPU, and a Radeon R9 Nano GPU. With Boinc I run them both flat out, with Folding I'm not giving as much research as I could be. All I can do with Folding is turn the CPU usage up and down. But turning it up makes the GPU slow down.


-
- Posts: 2121
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: Fast GPU, not enough CPU power to keep up?
Whatever you are doing in BOINC, I assure you, you are not running multiple tasks on a single GPU at the same time. Your GPU driver does not support that. So folding does not lack that ability, GPUs themselves lack it. Only server/enterprise grade GPUs have support for this type of endeavour.
What you are describing in your second comment is not what you think is happening.
Yes, busy CPU does slow down GPU work by a little bit. Some projects suffer more, some are not influenced by it.
The reason you do not see 100% CPU load in your screenshot is that FAHClient automatically reserves one thread to feed the GPU. You can confirm this on the FAHControl main tab, where the CPU slot will show CPU:23; the 24th thread is given to the GPU. I would suggest setting that CPU slot to CPU:21: you will not lose any CPU performance, but you might gain a bit of GPU performance.
Now, the reason you are not seeing your GPU used 100% has absolutely nothing to do with your CPU.
The reason is simple:
AMD's OpenCL implementation is catastrophically bad.
There are plans to improve this drastically. Even then you might see drops in utilisation due to the nature of the simulations, but performance should more than double (if early tests are to be believed).
So to summarise: any decently modern CPU is enough to feed any modern GPU, let alone an old GPU like your Nano.
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
Sorry, you're wrong. I can run Einstein and Milkyway on one GPU for example. Two completely different programs. Just like with a CPU, they get some of the cores each, they share nicely. I can even play a game at the same time.
If you don't believe me, it's easy to find discussions over in Boinc projects about making the most of a GPU. This for example is part of a config file I use in Milkyway:
<gpu_versions>
  <gpu_usage>0.250000</gpu_usage>
  <cpu_usage>0.375000</cpu_usage>
</gpu_versions>
It tells it to run four tasks at once on the GPU, and each will require 3/8ths of a CPU core to help it out.
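As a rough illustration of the arithmetic in that fragment, here is a minimal Python sketch. The function names are made up for this example, and it only reproduces the sums described in the post (tasks per GPU and the CPU share they reserve); it does not model how the Boinc scheduler actually makes its decisions.

```python
from fractions import Fraction


def concurrent_tasks(gpu_usage: Fraction) -> int:
    """Tasks that fit on one GPU when each task claims `gpu_usage` of it."""
    return int(1 / gpu_usage)


def cpu_cores_reserved(gpu_usage: Fraction, cpu_usage: Fraction) -> Fraction:
    """Total CPU share budgeted for all tasks sharing that one GPU."""
    return concurrent_tasks(gpu_usage) * cpu_usage


# Values copied from the config fragment in the post.
gpu_usage = Fraction("0.250000")
cpu_usage = Fraction("0.375000")

print(concurrent_tasks(gpu_usage))               # 4
print(cpu_cores_reserved(gpu_usage, cpu_usage))  # 3/2
```

So with these settings the four tasks together budget one and a half CPU cores for a single GPU.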
The default in Folding is to leave one CPU per GPU. But it's a guess. It depends how fast the CPU and GPU are relative to each other, Folding doesn't account for that itself. I have one machine which is running 5 GPUs on folding, but it only needs 1 core to support them all. So I'm running the other cores on Boinc (because Folding refuses). I have another machine, the one pictured above, which does things other than science. It's the computer I'm using right now to type this, it runs two security cameras, etc. So the default in this case is wrong the other way. I need to turn down the number of cores for the Folding CPU task, to make the GPU reasonable. But I can't make them both flat out at once like I can in Boinc, because Folding won't allow two tasks to run on the GPU at once. It's a Folding limitation, not a GPU limitation. If the GPU is fast enough to need more than one CPU core to help it, it's going to be sat idle unless you can give it another task to do at the same time.
-
- Posts: 2121
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: Fast GPU, not enough CPU power to keep up?
Since there is no documentation on how BOINC is managing to overcome driver limitations, let's shelve that discussion to the side.
I can also play games while folding, but believe me, during that period there is no science being done, a lot of times things just crash completely due to GPU drivers not supporting context switching.
The default in F@H is to leave one thread for each GPU, and it is not a guess. The GPU needs that thread because of driver overhead, and because every now and then F@H GPU work is double checked by the CPU. The GPU does not care about the CPU's performance relative to its own. As long as the CPU is decently recent, the GPU doesn't give a rat's a**. You will not make both of them run flat out, because the CPU already gives one core to the GPU, the GPU does not use that core fully, and GPU load fluctuates due to driver overhead and various other reasons. I hope you understand the concept of different workloads.
The reason folding does not allow 2 or more tasks on a GPU is that it follows the driver support model. If context switching were possible on desktop GPUs, OpenMM would have added the ability to run more than one workload on a single GPU. Heck, even though server GPUs support context switching, OpenMM did not think it was worth adding support for multiple processes on the same GPU.
There is no GPU supported by FAH which is running idle. 80-95% GPU load is typical usage, and that is fine, and it is not caused by "slow" cpus
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
Boinc does nothing clever, it's a very basic scheduler. It simply starts and stops other applications. If I say to run 4 per card, it will just start 4 separate instances. If I happen to have a load of Milkyway and a load of Einstein tasks queued up, it may well pick 2 of each. Nothing crashes. A card is perfectly capable of multitasking, precisely like a CPU does. I'm quite surprised to hear you think it can't, since I've never known anyone say that. Boinc programs on AMD use OpenCL. On Nvidia they use Cuda. Both work perfectly fine with multiple and different apps on the GPU simultaneously. I'm not making it up, just go look in the Einstein forum where they discuss how to push your GPU to the max. If folding has a problem it's because OpenMM is different. I'm not familiar with that, is it similar to OpenCL?
User A could have a fast GPU and a slow CPU. User B could have a slow GPU and a fast CPU.
I'm user B with one of my machines, where 1 CPU core is perfectly capable of servicing 5 GPUs. The calls to the CPU are not that often as the GPUs are slow. Therefore I'm currently running Boinc on all the remaining cores, leaving only one free, and the GPUs do not slow down one bit, they sit from 95% to 98%.
User A would have a problem, which is the reason I started this thread. You can get phenomenally fast GPUs now, like the Nvidias that do 80 TFLOPS. CPUs are also fast, but not per core. 1 core cannot possibly feed one of those cards fast enough. But if I was to run two folding instances on the same card, each instance could call on a separate CPU core. Now it keeps up.
80% is not fine, I could be doing 25% more science.
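The 25% figure follows from simple proportion, sketched below in Python. This assumes, purely for illustration, that science output scales linearly with the reported GPU load, which is exactly the point other posters in this thread dispute; `headroom` is a made-up helper name.

```python
def headroom(utilisation: float) -> float:
    """Extra throughput, relative to now, if utilisation rose to 100%.

    Assumes output is proportional to reported GPU load (a simplification).
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return 1 / utilisation - 1


for u in (0.80, 0.95):
    print(f"{u:.0%} busy -> up to {headroom(u):.1%} more work")
# 80% busy -> up to 25.0% more work
# 95% busy -> up to 5.3% more work
```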
-
- Site Moderator
- Posts: 6497
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: Fast GPU, not enough CPU power to keep up?
You are just exploiting the fact that most BOINC projects have poorly optimised GPU applications that still have (sometimes a lot of) operations that can't be done on the GPU ... they are usually unable to fill the command queue (which is what you are looking at with GPU load information) while waiting for CPU operations to finish. Running multiple BOINC WUs is just filling the blanks of one WU with another while waiting for those CPU operations to finish.
FahCores are already highly optimised and you could feed a fast GPU with a slow CPU core ... (Spoiler: some people are feeding multiple GPUs with low-end dual-core CPUs in former mining systems)
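The "filling the blanks" effect can be pictured with a toy model in Python. It assumes each work unit keeps the GPU busy for a fixed fraction of wall time and that the gaps of different work units interleave perfectly, with no contention or switching cost; the busy fractions below are invented for illustration, not measurements of any real project.

```python
def gpu_utilisation(busy_fraction: float, tasks: int) -> float:
    """Idealised GPU load when `tasks` work units share one GPU.

    `busy_fraction` is the share of wall time a single work unit keeps
    the GPU busy; the rest is spent waiting on CPU-side operations.
    """
    return min(1.0, busy_fraction * tasks)


# Hypothetical poorly optimised app (30% busy) vs. well optimised app (90% busy).
for busy in (0.3, 0.9):
    loads = [gpu_utilisation(busy, n) for n in range(1, 5)]
    print(busy, [f"{load:.0%}" for load in loads])
```

In this model the poorly optimised application gains a lot from extra tasks, while the well optimised one is almost full with a single task, so a second task has very little idle time left to fill.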
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
Folding may be better optimised than Boinc, but it's not good enough to fill the GPU, depending on the relative speeds.
Folding also needs more communication between the GPU and the CPU, so using mining risers like I do can cause a bottleneck. For example I have a machine with only one single lane PCI Express V2 slot! A very basic cheap Packard Bell small case. With Boinc, I plugged in a 4 way multiplexing riser into that socket, and connected 4 GPUs (it has a quad core CPU, so they can have a core each if necessary). That worked fine, but with folding they slowed down because they were sharing the bandwidth to the motherboard.
-
- Posts: 2121
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580 - Location: London
- Contact:
Re: Fast GPU, not enough CPU power to keep up?
You realise that communication between the CPU and GPU goes through the PCIe slot? You are telling us yourself that your CPU is being bottlenecked by the PCIe slot. You are trying to push 2-3-4GB of data through a 500MB/s link split 4 ways. It would be faster to hand feed your GPUs with data than to use that PCIe interface. I think this thread has gone on long enough.
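To put rough numbers on that, here is a back-of-envelope Python sketch using the figures from the post (a 500MB/s link shared by 4 GPUs, 2 to 4 GB of data). It assumes the bandwidth is split evenly, decimal units (1 GB = 1000 MB) and no protocol overhead, so real transfers would be somewhat slower; `transfer_seconds` is a made-up helper name.

```python
LINK_MB_PER_S = 500  # figure quoted in the post for the single shared lane


def transfer_seconds(data_gb: float, gpus_sharing: int,
                     link_mb_per_s: float = LINK_MB_PER_S) -> float:
    """Seconds to move `data_gb` to one GPU when the link is shared evenly."""
    per_gpu_mb_per_s = link_mb_per_s / gpus_sharing
    return data_gb * 1000 / per_gpu_mb_per_s


for gb in (2, 3, 4):
    print(f"{gb} GB over a 4-way shared lane: {transfer_seconds(gb, 4):.0f} s")
# 2 GB over a 4-way shared lane: 16 s
# 3 GB over a 4-way shared lane: 24 s
# 4 GB over a 4-way shared lane: 32 s
```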
-
- Posts: 370
- Joined: Wed Feb 16, 2022 1:18 am
- Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 2980X (Tahiti) GPUs.
5 other smaller computers. - Location: Scotland
Re: Fast GPU, not enough CPU power to keep up?
I've eliminated all those 4 way adapters, as I now have more computers, especially one with 12 slots (a mining MB). Boinc was fine with the adapters, Folding is also happy with 2 or 3 GPUs per adapter. The adapters themselves are only using one lane at v2, no matter what the MB will do. Although I do have (not plugged in just now) a fancy adapter I was given which uses 4 lanes at v3 to connect 8 GPUs.
It would be better if Folding stored the data on the GPU, but perhaps the computation makes this impossible. A lot of Boinc projects will load a few GB of data onto the GPU at the start, and it can refer to that much faster than having to get it from main RAM.