More CPU = slower
Posted: Sun Apr 26, 2020 12:30 am
by GunnarB_Hamburg
I have a CPU work package (project 14534, unit 0x00000001cedfaa925ea34b233aef4e3b) running on an Intel 8700K with 6 cores/12 threads.
I needed some CPU capacity for myself and reduced the slot from 12 CPUs to 4. This did not result in a significantly longer calculation time, which piqued my interest.
I then reduced it to only 1 CPU and got a TPF of about 5:30. I then doubled it to 2 CPUs and again to 4 CPUs. The TPF dropped to 1:50, which is to be expected.
I then added 2 more CPUs, but the TPF rose to 2:00 (with the GPU also running and taking CPU time), which I find odd. And no, the CPU is not throttling. Going back to 4 CPUs, it dropped again, now to 1:42.
Summary
1 CPU = 5:35 TPF with xx% total CPU load and 4,29 GHz.
4 CPU = 1:42 TPF with 41% total CPU load and 4,29 GHz.
6 CPU = 1:20 TPF with 61% total CPU load and 4,29 GHz. (no GPU calculation)
12 CPU/Threads = 1:33 TPF with 81% total CPU load and 4,31 GHz. (no GPU calculation)
4 CPU = 1: TPF with 52% total CPU load and 4,32 GHz. (with GPU calculation on another package)
6 CPU = 1:53 TPF with 71% total CPU load and 4,29 GHz. (with GPU calculation on another package)
Since this is using a lot more watts, why is it not faster?
I believe that the CPU is doing more paging and cannot use the cache as efficiently.
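To put the no-GPU rows of the summary in perspective, the speedup and per-thread efficiency can be worked out with a short Python sketch. It assumes the 1 CPU row is a fair baseline (the post does not say whether the GPU was running for it), and the names `tpf_seconds` and `scaling_table` are made up for this illustration.

```python
def tpf_seconds(tpf):
    """Convert an 'm:ss' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


# (threads, TPF) pairs copied from the summary: the 1 CPU row and the
# two rows marked "no GPU calculation".
MEASUREMENTS = [(1, "5:35"), (6, "1:20"), (12, "1:33")]


def scaling_table(rows):
    """Return (threads, seconds, speedup, efficiency) relative to the first row."""
    base_threads, base_tpf = rows[0]
    base_seconds = tpf_seconds(base_tpf)
    table = []
    for threads, tpf in rows:
        seconds = tpf_seconds(tpf)
        speedup = base_seconds / seconds
        # Efficiency: how much of the ideal linear speedup was achieved.
        efficiency = speedup / (threads / base_threads)
        table.append((threads, seconds, speedup, efficiency))
    return table


if __name__ == "__main__":
    for threads, seconds, speedup, efficiency in scaling_table(MEASUREMENTS):
        print(f"{threads:2d} threads: {seconds:3d} s/frame, "
              f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under that assumption, 6 threads are a bit more than four times as fast as 1, while 12 threads are slower per frame than 6, so efficiency per thread drops sharply once both hardware threads of each core are in use.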
Re: More CPU = slower
Posted: Sun Apr 26, 2020 1:36 am
by _r2w_ben
Scaling is a challenge for parallel processing. As the number of threads increases, so does the time spent communicating and waiting to synchronize.
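As a rough illustration of that trade-off, here is a toy Python model in which the frame time has a fixed serial part, a part that divides across threads, and a synchronization cost that grows with the thread count. All three constants are invented for the example and are not measured from any FAH project; the function names are likewise hypothetical.

```python
def modeled_frame_time(threads, serial=10.0, parallel=320.0, sync_cost=5.0):
    """Toy model: seconds per frame for a given thread count.

    serial     -- work that cannot be split (made-up value)
    parallel   -- work that divides evenly across threads (made-up value)
    sync_cost  -- extra seconds per additional thread spent communicating
                  and waiting (made-up value)
    """
    return serial + parallel / threads + sync_cost * (threads - 1)


def best_thread_count(candidates):
    """Return the candidate thread count with the lowest modeled frame time."""
    return min(candidates, key=modeled_frame_time)


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8, 12):
        print(f"{n:2d} threads -> {modeled_frame_time(n):6.1f} s/frame")
    print("fastest in this toy model:", best_thread_count(range(1, 13)))
```

With these made-up constants the modeled time bottoms out before the maximum thread count, which is the general shape to expect whenever the per-thread overhead grows faster than the per-thread savings shrink.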
If the operating system is scheduling well, 6 threads would each end up on a physical core. Beyond that, threads will be sharing resources on physical cores. HT generally improves FAH performance.
Deep in the FAH work folder, there should be a file named md.log. Can you find this section and post it? This will give a bit more information about the characteristics of this project.
Code:
Initializing Domain Decomposition on 12 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.415 nm, LJ-14, atoms 974 982
multi-body bonded interactions: 0.415 nm, Proper Dih., atoms 974 982
Minimum cell size due to bonded interactions: 0.457 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.166 nm
Estimated maximum distance required for P-LINCS: 1.166 nm
This distance will limit the DD cell size, you can override this with -rcon
Using 0 separate PME ranks, as there are too few total
ranks for efficient splitting
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 12 cells with a minimum initial size of 1.457 nm
The maximum allowed number of cells is: X 3 Y 3 Z 3
Domain decomposition grid 2 x 3 x 2, separate PME ranks 0
PME domain decomposition: 2 x 6 x 1
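If it helps when comparing logs from different thread counts, the key numbers in this section can be pulled out with a few regular expressions. This is only a sketch in Python: the patterns are written against the wording of the excerpt above, so a log from a different core version may need adjusted patterns, and `parse_dd_summary` is a hypothetical helper name.

```python
import re

# A few lines copied from the md.log excerpt above.
MD_LOG_EXCERPT = """\
Initializing Domain Decomposition on 12 ranks
The maximum allowed number of cells is: X 3 Y 3 Z 3
Domain decomposition grid 2 x 3 x 2, separate PME ranks 0
PME domain decomposition: 2 x 6 x 1
"""

# Patterns match the wording of the excerpt; other versions may differ.
PATTERNS = {
    "ranks": r"Initializing Domain Decomposition on (\d+) ranks",
    "max_cells": r"maximum allowed number of cells is: X (\d+) Y (\d+) Z (\d+)",
    "grid": r"Domain decomposition grid (\d+) x (\d+) x (\d+)",
}


def parse_dd_summary(log_text):
    """Return a dict of integer tuples, or None for any line not found."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        summary[key] = tuple(int(g) for g in match.groups()) if match else None
    return summary


if __name__ == "__main__":
    print(parse_dd_summary(MD_LOG_EXCERPT))
```

In the excerpt, the grid 2 x 3 x 2 multiplies out to the 12 ranks, and each dimension stays within the reported maximum of 3 cells per axis.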
Re: More CPU = slower
Posted: Sun Apr 26, 2020 1:55 am
by PantherX
Welcome to the F@H Forum GunnarB_Hamburg,
Out of curiosity, did you wait for at least 3% progress after each change to the CPU value? The reason is that it takes a bit of time for the TPF value to update once the CPU value has changed.
Re: More CPU = slower
Posted: Sun Apr 26, 2020 2:07 am
by GunnarB_Hamburg
The work package is done, so md.log is gone. I am starting a new package and will then wait 3%. Then I can post again.
Unfortunately, I got a package that does not use more than 4 cores. I need to wait for the next package.
Re: More CPU = slower
Posted: Sun Apr 26, 2020 2:19 am
by PantherX
The WU (Work Unit) has a maximum number of CPUs, which is determined when the CPU slot contacts the AS (Assignment Server) to get a WU. If you had 4 CPUs assigned to it initially and then increased the value, the change won't take effect until the next WU is downloaded. You can always decrease the number of CPUs while folding a WU, but you cannot increase it beyond the value it was downloaded with.
Re: More CPU = slower
Posted: Sun Apr 26, 2020 4:16 am
by MeeLee
Keep an eye on CPU frequency and temperature, and the balance between the two.
The only reason a higher thread count slows down a WU is when temperatures force the boost clocks down.
Re: More CPU = slower
Posted: Sun Apr 26, 2020 6:17 pm
by GunnarB_Hamburg
MeeLee, what you say is just totally wrong. There is always more than one reason.
Also, if you had read my post, you would have seen that I checked that.
Re: More CPU = slower
Posted: Sun Apr 26, 2020 6:19 pm
by GunnarB_Hamburg
More measurements:
Summary (all at around 4,3 GHz, no GPU calculation; CPU load is from Windows Task Manager, values in brackets are from Intel Extreme Tuning Utility)
2 CPU = 5:54 TPF with 22% total CPU load (19% CPU, 60°C, 49W, no throttling)
4 CPU = 3:02 TPF with 41% total CPU load (34% CPU, 69°C, 72W, no throttling)
6 CPU = 2:10 TPF with 61% total CPU load (51% CPU, 72°C, 97W, no throttling)
12 CPU/Threads = 2:10 TPF with 98% total CPU load (86% CPU, 80°C, 103W, no throttling)
With GPU (a different package running in parallel):
4 CPU = 3:06 TPF with 41% total CPU load (43% CPU, 75°C, 75W, no throttling)
6 CPU = 3:26 TPF with 72% total CPU load (60% CPU, 80°C, 82W, no throttling)
11 CPU/Threads = 2:12 TPF with 100% total CPU load (93% CPU, 86°C, 100W, no throttling)
Here I used 11 CPUs because this is the automatic selection.
The impact on the GPU calculation time was close to none/not measurable.
If one runs only the CPU, there does not seem to be an issue. If the GPU is also working on a package, there is a slowdown. I still believe it has to do with different packages sharing the same first-, second- and third-level cache.
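For the two thread counts measured both ways, the size of the slowdown can be expressed as a percentage with a small Python sketch. The TPF values are copied from the measurements above; the helper names are made up for this example.

```python
def tpf_seconds(tpf):
    """Convert an 'm:ss' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


# TPF per thread count, copied from the measurements in this post.
CPU_ONLY = {4: "3:02", 6: "2:10"}
WITH_GPU = {4: "3:06", 6: "3:26"}


def gpu_slowdown_percent(threads):
    """Percent increase in TPF when the GPU folds another package in parallel."""
    alone = tpf_seconds(CPU_ONLY[threads])
    shared = tpf_seconds(WITH_GPU[threads])
    return (shared - alone) / alone * 100


if __name__ == "__main__":
    for threads in sorted(CPU_ONLY):
        print(f"{threads} CPUs: {gpu_slowdown_percent(threads):.1f}% slower with GPU")
```

On these numbers the 4 CPU case is almost unaffected, while the 6 CPU case loses well over half its speed, so the slowdown only shows up once the CPU slot occupies every physical core.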