Re: CPU Cores/Threads vs GPU
Posted: Wed Jul 01, 2020 5:42 am
by bruce
A GPU's speed comes from both the basic clock rate and the number of parallel operations that the project can produce. If the GPU has, say, 5000 shaders, it takes the GPU the same amount of time to perform 5000 floating point operations as it does to perform one floating point operation. If the protein has very few atoms, then the problem cannot be structured to perform a lot of operations in parallel. Many CPUs can perform 32 or 64 floating point operations in parallel with SSE or AVX, and clock rate still matters. All FAHCores still have to deal with a percentage of operations that are serial in nature.
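The point about serial operations is usually expressed as Amdahl's law. Below is a minimal sketch of that bound, not anything taken from a FAHCore: the function name `amdahl_speedup` is made up for illustration, 5000 is the example shader count from the post, and the serial fractions are invented values chosen only to show the trend.

```python
def amdahl_speedup(serial_fraction: float, parallel_units: int) -> float:
    """Upper bound on speedup when only the non-serial part is spread over parallel units."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if parallel_units < 1:
        raise ValueError("parallel_units must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_units)


if __name__ == "__main__":
    # 5000 is the example shader count; the serial fractions are assumptions.
    for serial in (0.0, 0.01, 0.05):
        bound = amdahl_speedup(serial, 5000)
        print(f"serial part {serial:.0%}: at most {bound:.1f}x faster than one unit")
```

Even a small serial share caps the benefit of thousands of shaders, which is why a work unit that cannot keep them all busy gains little from a wider GPU.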
Re: CPU Cores/Threads vs GPU
Posted: Wed Jul 01, 2020 7:53 am
by Sparkly
Just for comparison, so people can get an idea of how much difference a very low versus a somewhat high atom count makes on GPU systems.
Constant CPU load with the same GPU and hardware:
P14251 – 371k Atoms
P11761 – 62k Atoms
P13415 – 4k Atoms
You might spot the difference.
Re: CPU Cores/Threads vs GPU
Posted: Wed Jul 01, 2020 2:24 pm
by bruce
@Sparkly: The real work is being done on your GPU, not the CPU. Those reports show the activity on the CPU, which is strictly doing a support role; they do not show the GPU computations.
Re: CPU Cores/Threads vs GPU
Posted: Wed Jul 01, 2020 3:51 pm
by Sparkly
bruce wrote:@Sparkly: The real work is being done on your GPU, not the CPU. Those reports show the activity on the CPU, which is strictly doing a support role; they do not show the GPU computations.
Should be rather obvious from my different posts that I am perfectly aware of this, but the point here was to show the impact on the CPU in that support role, when handling a very low atom count WU vs a higher atom count WU.
The GPU activity on the very low atom count WUs is basically negligible, and hardly even peaks most of the time, since the GPU spends more time waiting for work from the CPU than actually doing work.
This impacts a user's system in a way that is visible to the user, be it slower response from their Excel sheets or whatever, which is bad practice if you want to keep the free donors around and minimize the likelihood of them just turning the client off or uninstalling it.
But by all means keep sending very low atom count WUs to GPU and see if the active donor count can be diminished even further, since the calculation capacity in the network has only dropped by a tiny bit over 40% so far in the last month or so.
Re: CPU Cores/Threads vs GPU
Posted: Fri Jul 03, 2020 3:23 pm
by Sparkly
Good job on the upgrades to the programming of the core from v0.0.10 to v0.0.11, since multiple low atom count WUs are now handled significantly better due to it, on top of the overall speed increase to everything.
From the G4400 (2 core 2 thread) setup:
v0.0.11 – P13417 – 4k Atoms
v0.0.10 – P13415 – 4k Atoms
Re: CPU Cores/Threads vs GPU
Posted: Fri Jul 03, 2020 3:48 pm
by MeeLee
Interesting. It means that with a low atom count and a fast GPU, the CPU can become the bottleneck.
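A toy model makes the bottleneck easier to picture. This is only a sketch under loud assumptions: every step costs a fixed amount of CPU support time, GPU time grows linearly with atom count, and the two do not overlap. The function name and both timing constants are invented for illustration, not measured; only the atom counts (4k, 62k, 371k) come from the projects mentioned earlier in the thread.

```python
def gpu_busy_fraction(atoms: int,
                      cpu_overhead_us: float = 50.0,
                      gpu_us_per_atom: float = 0.002) -> float:
    """Share of each step the GPU spends computing rather than waiting on the CPU.

    Both timing defaults are hypothetical placeholders, not measurements.
    """
    if atoms < 0:
        raise ValueError("atoms must not be negative")
    gpu_us = atoms * gpu_us_per_atom
    return gpu_us / (gpu_us + cpu_overhead_us)


if __name__ == "__main__":
    # Atom counts taken from the projects listed earlier in the thread.
    for project, atoms in (("P13415", 4_000), ("P11761", 62_000), ("P14251", 371_000)):
        print(f"{project}: GPU busy about {gpu_busy_fraction(atoms):.0%} of each step")
```

With a fixed per-step CPU cost, a faster GPU shrinks only the GPU term, so the busy fraction on small work units drops further and the CPU share dominates.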
Re: CPU Cores/Threads vs GPU
Posted: Fri Jul 03, 2020 9:04 pm
by _r2w_ben
13417 is writing "Global context and integrator variables" 10x less often than 13415.
Code:
19:46:25:WU01:FS01:0x22:Project: 13417 (Run 122, Clone 87, Gen 1)
19:46:25:WU01:FS01:0x22:Unit: 0x0000000312bc7d9a5efeb57be7135027
19:46:25:WU01:FS01:0x22:Reading tar file core.xml
19:46:25:WU01:FS01:0x22:Reading tar file integrator.xml
19:46:25:WU01:FS01:0x22:Reading tar file state.xml.bz2
19:46:25:WU01:FS01:0x22:Reading tar file system.xml.bz2
19:46:25:WU01:FS01:0x22:Digital signatures verified
19:46:25:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:46:25:WU01:FS01:0x22:Version 0.0.11
19:46:25:WU01:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
19:46:25:WU01:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:46:25:WU01:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
19:46:25:WU01:FS01:0x22: Global context and integrator variables write interval: 2500 steps (0.25%) [400 total]
Code:
07:03:14:WU00:FS01:0x22:Project: 13415 (Run 1624, Clone 19, Gen 1)
07:03:14:WU00:FS01:0x22:Unit: 0x0000000112bc7d9a5ef1ae9bf101f441
07:03:14:WU00:FS01:0x22:Reading tar file core.xml
07:03:14:WU00:FS01:0x22:Reading tar file integrator.xml
07:03:14:WU00:FS01:0x22:Reading tar file state.xml
07:03:14:WU00:FS01:0x22:Reading tar file system.xml
07:03:14:WU00:FS01:0x22:Digital signatures verified
07:03:14:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:03:14:WU00:FS01:0x22:Version 0.0.10
07:03:14:WU00:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
07:03:14:WU00:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
07:03:14:WU00:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
07:03:14:WU00:FS01:0x22: Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
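For anyone who wants to compare these intervals across their own logs, here is a rough sketch that pulls the "write interval" lines out of log text and compares the totals. The regular expression is my own guess at the line format based only on the two excerpts above, and the names `WRITE_INTERVAL`, `parse_write_intervals`, `V11_LINE`, `V10_LINE` and `KEY` are made up for this example.

```python
import re

# Assumed line shape, taken from the excerpts above:
#   "<timestamp>:WU01:FS01:0x22: <name> write interval: <n> steps (<p>%) [<t> total]"
WRITE_INTERVAL = re.compile(
    r":\s*(?P<name>[^:]+?) write interval: (?P<steps>\d+) steps "
    r"\((?P<percent>[\d.]+)%\) \[(?P<total>\d+) total\]"
)

V11_LINE = ("19:46:25:WU01:FS01:0x22: Global context and integrator variables "
            "write interval: 2500 steps (0.25%) [400 total]")
V10_LINE = ("07:03:14:WU00:FS01:0x22: Global context and integrator variables "
            "write interval: 250 steps (0.025%) [4000 total]")
KEY = "Global context and integrator variables"


def parse_write_intervals(log_text: str) -> dict:
    """Map each interval name to (steps between writes, total writes per WU)."""
    intervals = {}
    for line in log_text.splitlines():
        match = WRITE_INTERVAL.search(line)
        if match:
            name = match.group("name").strip()
            intervals[name] = (int(match.group("steps")), int(match.group("total")))
    return intervals


if __name__ == "__main__":
    _, new_total = parse_write_intervals(V11_LINE)[KEY]
    _, old_total = parse_write_intervals(V10_LINE)[KEY]
    print(f"13415 writes {old_total // new_total}x as often as 13417")
```

Lines that do not match the pattern are skipped, so the whole log file can be passed in as one string.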
Re: CPU Cores/Threads vs GPU
Posted: Sat Jul 04, 2020 1:05 am
by bruce
I suspect that the difference is more attributable to the change from project 13415 to 13417 than the change from core22 v0.0.10 to v0.0.11. I'm not suggesting that there's anything wrong with the core update ... that's a good thing, but project 13417 is based on a number of things learned in project 13415.
Re: CPU Cores/Threads vs GPU
Posted: Sat Jul 04, 2020 7:48 am
by Sparkly
bruce wrote:I suspect that the difference is more attributable to the change from project 13415 to 13417 than the change from core22 v0.0.10 to v0.0.11. I'm not suggesting that there's anything wrong with the core update ... that's a good thing, but project 13417 is based on a number of things learned in project 13415.
Well, as pointed out by _r2w_ben, 13417 writes its "Global context and integrator variables" 10x less often than 13415, and that will remove a lot of overhead when handling the small WUs, compared to the larger ones that do not have this change. But since other projects look faster too, not only the 134xx ones, something was changed in v0.0.11 for the better for everything.
Re: CPU Cores/Threads vs GPU
Posted: Sat Jul 04, 2020 4:21 pm
by MeeLee
I wonder if the same update interval (in percent) is used for large WUs (transferring larger amounts of data over PCIe) as for small WUs?
E.g., if the program is set up to upload x-amount of updates per WU to the GPU, rather than controlling the size of the packets to be similar, which would result in the same data being transferred but in fewer transactions on small atom count WUs.
I'm fairly sure it might be an easy thing to do, to send larger data packets to the GPU on small atom count WUs.
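To put rough numbers on that idea, here is a simple cost model, not a description of how the core actually moves data: total transfer time is a fixed latency per transaction plus the payload divided by bandwidth. The function name, latency, bandwidth and payload size are all hypothetical placeholders; the 400 and 4000 transaction counts just echo the write totals from the logs posted above.

```python
def transfer_time_us(total_bytes: int,
                     transactions: int,
                     latency_us: float = 10.0,
                     bytes_per_us: float = 10_000.0) -> float:
    """Estimated time to move total_bytes split evenly over a number of transactions.

    latency_us and bytes_per_us are invented values, not PCIe measurements.
    """
    if transactions < 1:
        raise ValueError("transactions must be at least 1")
    return transactions * latency_us + total_bytes / bytes_per_us


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical total bytes per WU
    for count in (4000, 400):
        print(f"{count} transactions: about {transfer_time_us(payload, count):.0f} us")
```

In this model the payload term is identical in both cases, so any saving comes purely from paying the per-transaction latency fewer times.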