Running CPU + GPU results in lower PPD than GPU alone?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 18
- Joined: Thu Sep 03, 2020 1:12 am
Re: Running CPU + GPU results in lower PPD than GPU alone?
Both my CPU and GPU are thermally throttled on my laptop, so as an experiment, I selectively paused the CPU slot and PPD went down, and back up when resumed, so at least in my case, running both at the same time is best.
Re: Running CPU + GPU results in lower PPD than GPU alone?
How long did you pause the CPU? You need to let it run for a few percent before the client's PPD estimate becomes reasonable. That is, when you pause the CPU, the estimate will drop immediately, but it won't reflect the speed gain in the GPU slot yet. It also depends on the atom count of the work unit, so a definitive answer might require benchmarking across several work units - but pausing the CPU for about 5% of a GPU work unit is usually enough to get a feel for whether your GPU is CPU-limited.
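For anyone wondering why the estimate takes a few frames to settle, here is a rough Python sketch of a bonus-free PPD estimate driven by time per frame. This is my own illustration, not the client's actual scoring code, and every number in it is made up.

```python
# A rough sketch (not the client's actual formula) of why the PPD estimate lags
# behind configuration changes: the client derives PPD from recently observed
# time per frame (TPF), so right after pausing the CPU slot the GPU's TPF
# samples still reflect the old, slower pace. This ignores the quick-return
# bonus entirely, so treat it only as a relative indicator.

def naive_ppd(base_credit: float, tpf_seconds: float, frames_per_wu: int = 100) -> float:
    """Points per day if every frame took tpf_seconds and no bonus applied."""
    seconds_per_wu = frames_per_wu * tpf_seconds
    return base_credit * 86_400 / seconds_per_wu

# Example: if pausing the CPU slot shaves the GPU's TPF from 130 s to 125 s,
# the estimate only improves after a few frames at the new pace have completed.
print(round(naive_ppd(40_000, 130)), "->", round(naive_ppd(40_000, 125)))
```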
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Re: Running CPU + GPU results in lower PPD than GPU alone?
gunnarre wrote: GPU: RTX 2070 Super Hybrid (Mobile)
That's a hardware description, not a driver version number. Did you install drivers from https://www.nvidia.com/en-us/geforce/drivers/ ?
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 18
- Joined: Thu Sep 03, 2020 1:12 am
Re: Running CPU + GPU results in lower PPD than GPU alone?
gunnarre wrote: How long did you pause the CPU? You need to let it run for a few percent before the client's PPD estimate becomes reasonable.
After your suggestion, I did a longer test: I paused the CPU and let the GPU run alone for over 4 hours, completing 9.5% of the WU. During the pause I lost about 8,200 PPD from the CPU but only gained about 1,250 PPD on the GPU, for a net loss of about 6,950. So for me the answer is to run both slots.
Thanks for the help.
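To make the comparison explicit, here is a minimal Python sketch of the arithmetic in the post above; the figures are the ones reported there, and the helper name is just for illustration.

```python
# Minimal sketch of the comparison described above: pausing the CPU slot is
# only worth it if the GPU gains more PPD than the CPU slot was producing.
# The figures are the ones reported in this post; substitute your own.

def net_ppd_change(cpu_slot_ppd: float, gpu_ppd_gain_when_paused: float) -> float:
    """Positive means pausing the CPU slot raises total PPD; negative means it costs points."""
    return gpu_ppd_gain_when_paused - cpu_slot_ppd

change = net_ppd_change(cpu_slot_ppd=8_200, gpu_ppd_gain_when_paused=1_250)
print(f"Net PPD change from pausing the CPU slot: {change:+,.0f}")
# -> about -6,950 here, so running both slots wins on this machine.
```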
Re: Running CPU + GPU results in lower PPD than GPU alone?
I'm glad to have found this thread as it had good relevant information.
On my Windows system I have a 6-core Xeon and a Quadro T2000 Mobile GPU, and I have just started "playing" with the settings. By default, the CPU slot is allotted 10 threads (6 cores x 2 = 12 threads, minus 1 for the GPU and minus 1 to keep the count even). In that configuration, the GPU would vary between 85k-110k PPD per WU. The issue was on the CPU side, which was getting about 2k points per day - really low.
I read that mobile GPUs are not typically designed for, or expected to handle, 24/7 usage, which I understand. So I am going to experiment with CPU-only folding after the current WU finishes. I am curious whether there is any way to manually throttle the GPU or control how many of the GPU cores are used, like we can with CPUs. I have not found a way to do this in any of the advanced settings.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Running CPU + GPU results in lower PPD than GPU alone?
CPUs get low ppd ... tbh your GPU, even though not a fast one, will deliver more science than the CPU ... letting the CPU tick over and produce what it can without impacting/reducing the GPU is probably best ... please note that there is an old issue where the first CPU WU completed on an install runs on only one core - you need to check your log to ensure that your 2k ppd was on a full 10 core slot, not a single core.
This isn't to say the CPU is less important than the GPU (note I fold mainly CPU); it just doesn't process science as fast as the GPU does.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Running CPU + GPU results in lower PPD than GPU alone?
belloq wrote: On my Windows system I have a 6-core Xeon and a Quadro T2000 Mobile GPU, and I have just started "playing" with the settings. By default, the CPU slot is allotted 10 threads (6 cores x 2 = 12 threads, minus 1 for the GPU and minus 1 to keep the count even). In that configuration, the GPU would vary between 85k-110k PPD per WU. The issue was on the CPU side, which was getting about 2k points per day - really low.
GPUs do not run their own OS, so management is rather limited. In effect, the GPU is being used as a math coprocessor and it's either ON or OFF. There are a few useful settings in the drivers, but they're rather limited. You'll find several discussions in this forum on adjusting the power limit on the GPU.
belloq wrote: I read that mobile GPUs are not typically designed for, or expected to handle, 24/7 usage, which I understand. So I am going to experiment with CPU-only folding after the current WU finishes. I am curious whether there is any way to manually throttle the GPU or control how many of the GPU cores are used, like we can with CPUs. I have not found a way to do this in any of the advanced settings.
Consider the fact that your CPU runs 10 parallel threads and your Quadro T2000 Mobile has 1024 individual cores which can run in parallel. I'm not suggesting that they're one-for-one equivalent, but it does suggest that the GPU can do a lot more processing than the CPU. Many GPUs do run 24x7. I'm not intimately familiar with your Quadro, but a lot depends on the effectiveness of your cooling subsystem.
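Since the question about throttling the GPU came up: as an illustrative sketch only (not anything built into FAH), the stock nvidia-smi tool can report the board's power draw and limits and, on GPUs that allow it, request a lower cap. Changing the limit needs administrator rights, and many mobile parts lock it.

```python
# Illustrative sketch only: query an NVIDIA GPU's power draw and limits, and
# optionally request a lower power cap, using the stock nvidia-smi tool.
# Changing the limit requires administrator rights, and many mobile GPUs
# (laptop Quadros included) simply refuse the change.
import subprocess

def query_power() -> str:
    """Return current draw plus the enforced/min/max power limits as CSV text."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=power.draw,power.limit,power.min_limit,power.max_limit",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def set_power_limit(watts: int) -> None:
    """Ask the driver to cap board power at `watts`; raises if it is not allowed."""
    subprocess.run(["nvidia-smi", "-pl", str(watts)], check=True)

if __name__ == "__main__":
    print(query_power())
    # set_power_limit(40)  # uncomment to try a 40 W cap, if the GPU permits it
```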
While you're "playing" with CPU settings, consider that FAHCore_a7 will partition real space into a 3D grid of 5x2x1 blocks to be processed in parallel. Depending on the protein, you may find that 9 threads (=3x3x1) works effectively, too. Your CPU will use one thread to transfer data between main RAM and the GPU, and one thread will be available to run your OS and any foreground tasks you may invoke.
FAHCore_a8 is nearing full release and it does things somewhat differently. We're still learning about it.
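As a concrete illustration of the factor arithmetic above (my own sketch, not FAHCore or GROMACS code), the snippet below lists the X*Y*Z grids a given thread count can be split into; counts that only offer a large prime times 1x1 are the awkward ones.

```python
# Illustration of the factor arithmetic above (not actual FAHCore/GROMACS
# code): a CPU thread count is split across an X*Y*Z grid of domains, so
# 10 threads can run as 5x2x1 and 9 threads as 3x3x1. A count like 11 only
# offers 11x1x1, which decomposes poorly.
from itertools import product

def grids(threads: int) -> list[tuple[int, int, int]]:
    """Every X*Y*Z split of `threads`, each listed once with dimensions sorted."""
    return sorted(
        {tuple(sorted((x, y, threads // (x * y)), reverse=True))
         for x, y in product(range(1, threads + 1), repeat=2)
         if threads % (x * y) == 0},
        reverse=True,
    )

for n in (9, 10, 11, 12):
    print(n, grids(n))
```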
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Running CPU + GPU results in lower PPD than GPU alone?
Neil-B wrote: CPUs get low ppd ... tbh your GPU, even though not a fast one, will deliver more science than the CPU ... letting the CPU tick over and produce what it can without impacting/reducing the GPU is probably best ... please note that there is an old issue where the first CPU WU completed on an install runs on only one core - you need to check your log to ensure that your 2k ppd was on a full 10 core slot, not a single core.
Using very few cores on my CPU may be what is best. Based on my last check of CPU-only folding, I can only get up to 5k-6k PPD. On a 6-core Xeon that's pretty low, but I think thermal throttling is coming into play. This was also with the GPU slot paused.
bruce wrote: While you're "playing" with CPU settings, consider that FAHCore_a7 will partition real space into a 3D grid of 5x2x1 blocks to be processed in parallel. Depending on the protein, you may find that 9 threads (=3x3x1) works effectively, too. Your CPU will use one thread to transfer data between main RAM and the GPU, and one thread will be available to run your OS and any foreground tasks you may invoke.
Thanks Bruce, though I am not following the math here. 5x2x1 = 10, right? Why would I want to use 3x3x1 = 9 threads? I thought an odd number of threads was not ideal? If I am running CPU+GPU, that's 9 threads for the CPU, +1 for GPU/RAM, +1 for the OS/foreground = 11, which leaves one thread "unused" (not really - the OS will use it, of course).
bruce wrote: FAHCore_a8 is nearing full release and it does things somewhat differently. We're still learning about it.
I've received some a8 projects on all my machines in the last few months, and I feel like they, for whatever reason, have a higher PPD.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Running CPU + GPU results in lower PPD than GPU alone?
5 can cause issues as it is close to being a "big prime" (with 7 counting as big) .. if you search for big primes you should find threads where JimboPalmer explains it more clearly than I can .. this applies to a7 core projects; I believe a8 takes a different approach
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Running CPU + GPU results in lower PPD than GPU alone?
The A8 core can use CPUs better ... searching for avx should pick up topics that discuss this
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Site Admin
- Posts: 7927
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Running CPU + GPU results in lower PPD than GPU alone?
Early on the limits on thread counts were expressed in a way that implied odd-numbered counts are not good. I don't recall the source, and it may have been more of a rule of thumb. A better way of stating the issue is that the domain decomposition has issues with factors being "large primes" or their multiples. Initial limits kept the folding core from using factors that were primes 11 or larger or their multiples.
Later on, a number of projects were run that had issues with 7 and its multiples, and then some were identified that had problems with multiples of 5. Not all projects have these problems; those that are large enough can work fine using those counts. The researchers do try to determine during internal testing which projects need limits on 5, and they usually don't bother restricting 7.
Complicating things a bit for systems with more than about 18-20 threads is that, by the current default, Core_A7 will start allocating some threads to separate PME calculations. Another poster here has delved into the underlying GROMACS code, analyzed how it works on some different projects, and collaborated with the researchers on better settings for projects. As an example, some projects work well at a thread count of 21. That is not decomposed as 3x7x1, but more likely as 3x3x2 with 3 threads assigned to PME, or 2x2x4 with 5 PME threads.
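To make the "large primes" point concrete, here is a small illustrative helper (my own sketch, not part of any FAH tooling) that factors a thread count and flags factors of 5, 7, or larger primes.

```python
# Illustrative helper (not part of any FAH tooling): factor a thread count and
# flag the "large prime" factors discussed above. With separate PME threads,
# 21 does not have to decompose as 3x7x1 - it can run as 18 domain threads
# (3x3x2) plus 3 PME threads, or 16 (2x2x4) plus 5 PME threads.

def prime_factors(n: int) -> list[int]:
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def describe(threads: int) -> str:
    factors = prime_factors(threads)
    large = [p for p in factors if p >= 5]
    note = (f"contains {large}; some projects restrict these"
            if large else "small factors only; fine for most projects")
    return f"{threads}: factors {factors} -> {note}"

for n in (9, 10, 14, 18, 21):
    print(describe(n))
```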
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: Running CPU + GPU results in lower PPD than GPU alone?
MajorCaliber wrote:
gunnarre wrote: How long did you pause the CPU? You need to let it run for a few percent before the client's PPD estimate becomes reasonable.
After your suggestion, I did a longer test: I paused the CPU and let the GPU run alone for over 4 hours, completing 9.5% of the WU. During the pause I lost about 8,200 PPD from the CPU but only gained about 1,250 PPD on the GPU, for a net loss of about 6,950. So for me the answer is to run both slots. Thanks for the help.
Thanks for testing, then accepting feedback, then testing again more robustly. This is helpful information and I appreciate it.
Out of curiosity, how many threads do you have going on your CPU client and how many cores are available?