Running CPU + GPU results in lower PPD than GPU alone?

Moderators: Site Moderators, FAHC Science Team

MajorCaliber
Posts: 18
Joined: Thu Sep 03, 2020 1:12 am

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by MajorCaliber »

Both my CPU and GPU are thermally throttled on my laptop, so as an experiment, I selectively paused the CPU slot and PPD went down, and back up when resumed, so at least in my case, running both at the same time is best.
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by gunnarre »

How long did you pause the CPU? You need to let it run for a few percent of the work unit before the client's PPD estimate becomes reasonable. When you pause the CPU, the estimate drops immediately, but it won't yet reflect the speed gain in the GPU slot. It also depends on the atom count of the work unit, so a definitive answer might require some benchmarking across several work units, but just pausing the CPU for 5% of a GPU work unit is usually enough to get a feel for whether your GPU is CPU limited.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by bruce »

gunnarre wrote:GPU: RTX 2070 Super Hybrid (Mobile)
That's a hardware description, not a driver version number. Did you install drivers from https://www.nvidia.com/en-us/geforce/drivers/?
MajorCaliber
Posts: 18
Joined: Thu Sep 03, 2020 1:12 am

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by MajorCaliber »

gunnarre wrote:How long did you pause the CPU? You need to let it run for a few percentages before the client's PPD estimate becomes reasonable.
After your suggestion, I did a longer test and paused the CPU and let the GPU run alone for over 4 hours and complete 9.5% of the WU. During the pause I lost about 8,200 PPD from the CPU but only gained about 1,250 PPD on the GPU for a net loss of about 6,950. So for me the answer is to run both cores.
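
For anyone repeating this test, the comparison above can be written out as a tiny calculation. This is only a sketch of the arithmetic; the function name `net_ppd_change` is made up for illustration, and the numbers are the approximate ones from this test.

```python
def net_ppd_change(cpu_ppd_lost, gpu_ppd_gained):
    """Net change in total PPD when the CPU slot is paused.

    A negative result means pausing the CPU slot costs more PPD
    than the GPU slot gains back.
    """
    return gpu_ppd_gained - cpu_ppd_lost


# Approximate figures from the 4-hour test described above.
change = net_ppd_change(cpu_ppd_lost=8200, gpu_ppd_gained=1250)
print(change)  # -6950
```

A negative number means running both slots is the better choice on that machine; a positive one would suggest the GPU is being held back by the CPU slot.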

Thanks for the help.
belloq
Posts: 42
Joined: Thu Sep 24, 2020 12:58 pm

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by belloq »

I'm glad to have found this thread as it had good relevant information.

On my Windows system I have a 6-core Xeon and a Quadro T2000 Mobile GPU. I have just started "playing" with the settings. By default, the CPU slot is allotted 10 threads (6x2=12, minus 1 for the GPU and minus 1 to get to an even thread count). In that config, the GPU would vary between 85k-110k per WU. The issue was on the CPU side, where it was getting 2k points per day, which is really really low.
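
The thread count described above can be sketched as follows. This only mirrors the description in this post (reserve one thread per GPU slot, then round down to an even number); it is an assumption about the default, not the client's actual code, and `default_cpu_threads` is a hypothetical name.

```python
def default_cpu_threads(physical_cores, threads_per_core=2, gpu_slots=1):
    """Estimate the CPU slot's thread count as described in the post.

    Assumption: one logical thread is reserved per GPU slot, and an odd
    remainder is rounded down to the next even number.
    """
    logical_threads = physical_cores * threads_per_core
    available = logical_threads - gpu_slots
    if available % 2 == 1:
        available -= 1
    return available


print(default_cpu_threads(6))  # 10
```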

I read that mobile GPUs are not typically designed for 24/7 usage, which I understand. So I am going to experiment with CPU-only folding after the current WU finishes. I am curious if there is any way to manually throttle the GPU or control how many of the GPU cores are being used, like we can with CPUs? I have not found a way to do this in any advanced settings.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by Neil-B »

CPUs get low ppd ... tbh your GPU even though not a fast one will deliver more science than the CPU ... letting the CPU tick over and produce what it can without impacting/reducing the GPU is probably best ... please note that there is an old issue where the first CPU WU completed on an install is only one core - you need to check your log to ensure that your 2k ppd was on a full 10 core slot not a single core.

This isn't to say CPU is less important than GPU (note I fold mainly CPU), it just doesn't process science as fast as GPU.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by bruce »

belloq wrote:On my Windows system I have a 6-core Xeon and a Quadro T2000 Mobile GPU. I have just started "playing" with the settings. On default, the CPU is allotted 10 cores (6x2=12-1 for GPU and -1 for getting to even threads/cores). In that config, the GPU would vary between 85k-110k per WU. The issue was on the CPU side, where it was getting 2k points per day, which is really really low.
I read that mobile GPUs are not typically designed for or expect 24/7 usage, which I understand. So I am going to experiment with CPU-only folding after the current WU finishes. I am curious if there is any way to manually throttle the GPU or control how many of the GPU cores are being used, like we can with CPUs? I have not found out how to do this in any advanced settings.
GPUs do not run their own OS, so management is rather limited. In effect, the GPU is being used as a math coprocessor and it's either ON or OFF. There are a few useful settings in the drivers, but not many. You'll find several discussions in this forum on adjusting the power limit on the GPU.

Consider the fact that your CPU runs 10 parallel threads and your Quadro T2000 Mobile has 1024 individual cores which can run in parallel. I'm not suggesting that they're one-for-one equivalent, but it does suggest that the GPU can do a lot more processing than the CPU. Many GPUs do run 24x7. I'm not intimately familiar with your Quadro, but a lot depends on the effectiveness of your cooling subsystem.

While you're "playing" with CPU settings, consider that FAHCore_a7 will partition real space into a 3D grid of 5x2x1 blocks to be processed in parallel. Depending on the protein, you may find that 9 threads (=3x3x1) works effectively, too. Your CPU will use one thread to transfer data to/from main RAM and the GPU, and one thread will be available to run your OS and any foreground tasks you may invoke.
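
To see which 3D grids a given thread count allows, here is a minimal sketch that lists every way to write the count as a product of three whole numbers. It is plain factoring, not the logic FAHCore_a7 or Gromacs actually uses to pick a grid (that also depends on the simulation box), and `grids` is a hypothetical helper name.

```python
def grids(thread_count):
    """List all (x, y, z) with x * y * z == thread_count and x >= y >= z."""
    result = []
    for x in range(1, thread_count + 1):
        if thread_count % x:
            continue
        remainder = thread_count // x
        # y may not exceed x, and must divide what is left.
        for y in range(1, min(x, remainder) + 1):
            if remainder % y:
                continue
            z = remainder // y
            if z <= y:
                result.append((x, y, z))
    return result


print(grids(10))  # [(5, 2, 1), (10, 1, 1)]
print(grids(9))   # [(3, 3, 1), (9, 1, 1)]
```

With 10 threads the only non-trivial grid involves a 5, while 9 threads gives the more balanced 3x3x1.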

FAHCore_a8 is nearing full release and it does things somewhat differently. We're still learning about it.
belloq
Posts: 42
Joined: Thu Sep 24, 2020 12:58 pm

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by belloq »

Neil-B wrote:CPUs get low ppd ... tbh your GPU even though not a fast one will deliver more science than the CPU ... letting the CPU tick over and produce what it can without impacting/reducing the GPU is probably best ... please note that there is an old issue where the first CPU WU completed on an install is only one core - you need to check your log to ensure that your 2k ppd was on a full 10 core slot not a single core.
Using very few cores on my CPU may be what is best. Based on the last check of CPU folding alone, I can only get up to 5k-6k PPD. On a 6-core Xeon that's pretty low, but I think the thermal throttling is coming into play. This was also with the GPU slot paused.
bruce wrote:While you're "playing" with CPU settings, consider that FAHCore_a7 will partition real space into a 3D grid of 5x2x1 blocks to be processed in parallel. Depending on the protein, you may find that 9 threads (=3x3x1) works effectively, too. Your CPU will use one thread to transfer data to/from main RAM and the GPU, and one thread will be available to run your OS and any foreground tasks you may invoke.

FAHCore_a8 is nearing full release and it does things somewhat differently. We're still learning about it.
Thanks Bruce, though I am not following the math here. 5x2x1 = 10, right? Why would I want to use 3x3x1 threads? I thought odd thread counts were not ideal? If I am running CPU+GPU, that's 9 threads for CPU, +1 for GPU/RAM, +1 for OS/foreground = 11. That leaves one of the 12 threads "unused" (not really, the OS will use it of course).

I've received some a8 projects on all my machines in the last few months, and I feel like they, for whatever reason, have a higher PPD.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by Neil-B »

5 can cause issues as it is nearly a big prime (with 7 counting as big) .. if you search for big primes you should find threads where JimboPalmer explains it more clearly than I can .. this applies to a7 core projects as I believe a8 takes a different approach
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by Neil-B »

A8 core can use CPUs better ... a search for AVX should pick up topics that discuss this
Joe_H
Site Admin
Posts: 7927
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by Joe_H »

Early on, the limits on thread counts were expressed in a way that implied odd-numbered counts are not good. I don't recall the source, and it may have been more of a rule of thumb. A better way of stating the issue is that the domain decomposition has issues with factors that are "large primes" or their multiples. Initial limits kept the folding core from using thread counts that were primes 11 or larger, or their multiples.

Later on, a number of projects were run that had issues with 7 and its multiples, and then some were identified that had problems with multiples of 5. Not all projects have these problems; those that are large enough can work fine with these counts. They do try to determine during internal testing which projects need limits on 5, and they usually don't bother with 7.
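
As a rough illustration of the "large prime" rule of thumb, the sketch below flags thread counts whose largest prime factor reaches a chosen limit (5, 7 or 11, depending on the project). The helper names are hypothetical, and a flagged count is not necessarily a problem for a given project; this is not how the folding core itself decides.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (1 for n == 1)."""
    factor = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            factor = p
            n //= p
        p += 1
    if n > 1:
        # Whatever is left after trial division is itself prime.
        factor = n
    return factor


def flagged_counts(max_threads, limit):
    """Thread counts up to max_threads with a prime factor >= limit."""
    return [n for n in range(1, max_threads + 1)
            if largest_prime_factor(n) >= limit]


print(flagged_counts(12, 5))   # [5, 7, 10, 11]
print(flagged_counts(24, 11))  # [11, 13, 17, 19, 22, 23]
```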

Complicating things a bit for systems that support more than about 18-20 threads, Core_A7 will by current default start allocating some threads to separate PME calculations. Another poster here has delved into the underlying Gromacs code, analyzed how it works on several projects, posted the results here, and collaborated with the researchers on better settings for projects. As an example, some projects work well on a thread count of 21. That is not decomposed as 3x7x1, but more likely as 3x3x2 with 3 threads assigned to PME, or 2x2x4 with 5 PME threads.
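
The two splits mentioned for 21 threads can be checked with a few lines. This is only the bookkeeping (grid threads plus PME threads), with a made-up helper name `total_threads`; it says nothing about which split the core would actually choose.

```python
from math import prod


def total_threads(grid, pme_threads):
    """Threads used by a decomposition grid plus separate PME threads."""
    return prod(grid) + pme_threads


# The two example splits for a 21-thread slot.
for grid, pme in [((3, 3, 2), 3), ((2, 2, 4), 5)]:
    print(grid, pme, total_threads(grid, pme))
# (3, 3, 2) 3 21
# (2, 2, 4) 5 21
```

Either way the grid itself only uses factors of 2 and 3, which is why 21 can work even though 7 divides it.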

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
joncrane
Posts: 10
Joined: Mon Oct 19, 2020 6:00 pm

Re: Running CPU + GPU results in lower PPD than GPU alone?

Post by joncrane »

MajorCaliber wrote:
gunnarre wrote:How long did you pause the CPU? You need to let it run for a few percentages before the client's PPD estimate becomes reasonable.
After your suggestion, I did a longer test and paused the CPU and let the GPU run alone for over 4 hours and complete 9.5% of the WU. During the pause I lost about 8,200 PPD from the CPU but only gained about 1,250 PPD on the GPU for a net loss of about 6,950. So for me the answer is to run both cores.

Thanks for the help.
Thanks for testing, then accepting feedback, then testing again more robustly. This is helpful information and I appreciate it.

Out of curiosity, how many threads do you have going on your CPU client and how many cores are available?