Folding Forum

Posted: **Wed Jul 01, 2020 5:42 am**

A GPU's speed comes from both the basic clock rate and the number of parallel operations that the project can produce. If the GPU has, say, 5000 shaders, it takes the GPU the same amount of time to perform 5000 floating point operations as it does to perform one floating point operation. If the protein has very few atoms, then the problem cannot be structured to perform a lot of operations in parallel. Many CPUs can perform 32 or 64 floating point operations in parallel with SSE or AVX, and clock rate still matters. All FAHCores still have to deal with a percentage of operations that are serial in nature.
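To put rough numbers on that, here is a minimal Python sketch. It assumes an idealized GPU where each shader performs one operation per pass, and it uses Amdahl's law for the serial part. The function names and the 5% serial fraction are made up for illustration and are not taken from any FAHCore.

```python
import math

def gpu_passes(num_ops: int, num_shaders: int) -> int:
    """Passes needed if every shader performs one operation per pass (idealized)."""
    return math.ceil(num_ops / num_shaders)

def amdahl_speedup(serial_fraction: float, parallel_units: int) -> float:
    """Amdahl's law: best-case speedup when part of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_units)

# One operation and 5000 operations both fit in a single pass on 5000 shaders.
print(gpu_passes(1, 5000), gpu_passes(5000, 5000), gpu_passes(5001, 5000))

# Hypothetical 5% serial work: the speedup stays below 20x even with 5000 shaders.
print(round(amdahl_speedup(0.05, 5000), 1))
```

Under these assumptions, a small protein that only fills a fraction of the shaders finishes a pass no faster than one that fills them all, and the serial share caps the gain no matter how wide the GPU is.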

Posted: **Wed Jul 01, 2020 7:53 am**

Just for comparison, so people can get an idea of how much difference a very low versus a somewhat high atom count makes on GPU systems.

Constant CPU load with the same GPU and hardware:

P14251 – 371k Atoms

P11761 – 62k Atoms

P13415 – 4k Atoms

You might spot the difference.

Posted: **Wed Jul 01, 2020 2:24 pm**

@sparky: The real work is being done on your GPU, not the CPU. Those reports are showing the activity on the CPU which is strictly doing a support role, not where the GPU computations are being shown.

Posted: **Wed Jul 01, 2020 3:51 pm**

bruce wrote: @sparky: The real work is being done on your GPU, not the CPU. Those reports are showing the activity on the CPU which is strictly doing a support role, not where the GPU computations are being shown.

Should be rather obvious from my different posts that I am perfectly aware of this, but the point here was to show the impact on the CPU in that support role, when handling a very low atom count WU vs a higher atom count WU.

The GPU activity on the very low atom count WUs is basically negligible and hardly even peaks most of the time, since the GPU spends more time waiting for work from the CPU than actually doing work.

This impacts a user's system in a way that is visible to the user, be it slower response from their Excel sheets or whatever, which is bad practice if you want to keep the free donors around and minimize the likelihood of them just turning the client off or uninstalling it.

But by all means keep sending very low atom count WUs to GPU and see if the active donor count can be diminished even further, since the calculation capacity in the network has only dropped by a tiny bit over 40% so far in the last month or so.

Posted: **Fri Jul 03, 2020 3:23 pm**

Good job on the upgrades to the programming of the core from v0.0.10 to v0.0.11, since multiple low atom count WUs are now handled significantly better due to it, on top of the overall speed increase to everything.

From the G4400 (2 core 2 thread) setup:

v0.0.11 – P13417 – 4k Atoms

v0.0.10 – P13415 – 4k Atoms

Posted: **Fri Jul 03, 2020 3:48 pm**

Interesting. It means that with a low atom count and a fast GPU, the CPU can become the bottleneck.
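A toy model makes the bottleneck easier to see. This is only a sketch: it assumes the CPU and GPU take turns without overlapping, that the CPU-side overhead per step is fixed, and that GPU time grows linearly with atom count. All the numbers below are hypothetical and not measurements from any of the projects mentioned.

```python
def gpu_busy_fraction(cpu_ms_per_step: float, gpu_ms_per_step: float) -> float:
    """Share of wall-clock time the GPU is busy if CPU and GPU work never overlap."""
    return gpu_ms_per_step / (cpu_ms_per_step + gpu_ms_per_step)

# Hypothetical values, chosen only to show the trend.
CPU_MS_PER_STEP = 0.2        # fixed CPU-side overhead per step (assumption)
GPU_MS_PER_1K_ATOMS = 0.01   # GPU time per step per 1000 atoms (assumption)

for atoms_k in (4, 62, 371):
    gpu_ms = atoms_k * GPU_MS_PER_1K_ATOMS
    busy = gpu_busy_fraction(CPU_MS_PER_STEP, gpu_ms)
    print(f"{atoms_k:>4}k atoms: GPU busy about {busy:.0%} of the time")
```

With a fixed CPU cost per step, the smaller the WU, the larger the share of time the GPU sits idle waiting on the CPU.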

Posted: **Fri Jul 03, 2020 9:04 pm**

13417 is writing "Global context and integrator variables" 10x less often than 13415.

Code:

19:46:25:WU01:FS01:0x22:Project: 13417 (Run 122, Clone 87, Gen 1)
19:46:25:WU01:FS01:0x22:Unit: 0x0000000312bc7d9a5efeb57be7135027
19:46:25:WU01:FS01:0x22:Reading tar file core.xml
19:46:25:WU01:FS01:0x22:Reading tar file integrator.xml
19:46:25:WU01:FS01:0x22:Reading tar file state.xml.bz2
19:46:25:WU01:FS01:0x22:Reading tar file system.xml.bz2
19:46:25:WU01:FS01:0x22:Digital signatures verified
19:46:25:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:46:25:WU01:FS01:0x22:Version 0.0.11
19:46:25:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:46:25:WU01:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:46:25:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:46:25:WU01:FS01:0x22:  Global context and integrator variables write interval: 2500 steps (0.25%) [400 total]

Code:

07:03:14:WU00:FS01:0x22:Project: 13415 (Run 1624, Clone 19, Gen 1)
07:03:14:WU00:FS01:0x22:Unit: 0x0000000112bc7d9a5ef1ae9bf101f441
07:03:14:WU00:FS01:0x22:Reading tar file core.xml
07:03:14:WU00:FS01:0x22:Reading tar file integrator.xml
07:03:14:WU00:FS01:0x22:Reading tar file state.xml
07:03:14:WU00:FS01:0x22:Reading tar file system.xml
07:03:14:WU00:FS01:0x22:Digital signatures verified
07:03:14:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:03:14:WU00:FS01:0x22:Version 0.0.10
07:03:14:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
07:03:14:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
07:03:14:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
07:03:14:WU00:FS01:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
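For anyone who wants to compare these settings across their own logs, here is a small Python sketch that pulls the write intervals out of lines in the format shown above. The regular expression and the function name are my own and only assume that the lines keep this exact layout.

```python
import re

# Matches lines such as:
#   "...:  Checkpoint write interval: 50000 steps (5%) [20 total]"
INTERVAL_RE = re.compile(
    r"(?P<name>[A-Za-z ]+?) write interval: "
    r"(?P<steps>\d+) steps \((?P<pct>[\d.]+)%\) \[(?P<total>\d+) total\]"
)

def parse_write_intervals(log_text: str) -> dict:
    """Return {name: (steps_between_writes, total_writes)} for each interval line."""
    intervals = {}
    for match in INTERVAL_RE.finditer(log_text):
        name = match.group("name").strip()
        intervals[name] = (int(match.group("steps")), int(match.group("total")))
    return intervals

# The two relevant lines, copied from the logs above.
LOG_13417 = "19:46:25:WU01:FS01:0x22:  Global context and integrator variables write interval: 2500 steps (0.25%) [400 total]"
LOG_13415 = "07:03:14:WU00:FS01:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]"

KEY = "Global context and integrator variables"
new_steps, new_total = parse_write_intervals(LOG_13417)[KEY]
old_steps, old_total = parse_write_intervals(LOG_13415)[KEY]

print(old_total // new_total)                           # 10
print(new_steps * new_total, old_steps * old_total)     # 1000000 1000000
```

Both WUs run the same 1,000,000 steps, so the only difference here is 400 writes versus 4000.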

Posted: **Sat Jul 04, 2020 1:05 am**

I suspect that the difference is more attributable to the change from project 13415 to 13417 than the change from core22 v0.0.10 to v0.0.11. I'm not suggesting that there's anything wrong with the core update ... that's a good thing, but project 13417 is based on a number of things learned in project 13415.

Posted: **Sat Jul 04, 2020 7:48 am**

bruce wrote: I suspect that the difference is more attributable to the change from project 13415 to 13417 than the change from core22 v0.0.10 to v0.0.11. I'm not suggesting that there's anything wrong with the core update ... that's a good thing, but project 13417 is based on a number of things learned in project 13415.

Well, as pointed out by _r2w_ben_, seeing as the “write interval” for 13417 is 10x longer than for 13415 (2500 steps versus 250), the writes happen 10x less often, and that will remove a lot of overhead when handling the small WUs. The larger WUs do not have this change, but since other projects look faster too, not only the 134xx ones, something was changed in v0.0.11 for the better for everything.

Posted: **Sat Jul 04, 2020 4:21 pm**

I wonder if the same update interval (in percent) is used for large WUs (transferring larger amounts of data over PCIe) as for small WUs?
E.g., the program might be set up to send a fixed number of updates per WU to the GPU, rather than keeping the size of the packets similar. The latter would result in the same data being transferred, but in fewer transactions on small atom count WUs.

I'm fairly sure it might be an easy thing to do, to send larger data packets to the GPU on small atom count WUs.
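Here is a back-of-the-envelope Python sketch of that idea. It is pure guesswork on my side: the 12 bytes per atom (three 4-byte floats), the count of 4000 updates per WU, and the function names are all assumptions, and nothing here reflects how core22 actually moves data.

```python
def per_update_bytes(atoms: int, bytes_per_atom: int = 12) -> int:
    """Bytes moved in one update, assuming three 4-byte floats per atom."""
    return atoms * bytes_per_atom

def transfers_with_fixed_packet(atoms: int, updates: int, packet_bytes: int,
                                bytes_per_atom: int = 12) -> int:
    """Transfers needed if the same total data is sent in fixed-size packets."""
    total_bytes = updates * per_update_bytes(atoms, bytes_per_atom)
    return -(-total_bytes // packet_bytes)  # ceiling division on integers

# Hypothetical small WU: 4000 atoms, 4000 updates over the whole WU.
small_atoms, small_updates = 4_000, 4_000

# Size the packet like a single update of a 371k atom WU.
packet = per_update_bytes(371_000)

print(small_updates)                                                    # 4000
print(transfers_with_fixed_packet(small_atoms, small_updates, packet))  # 44
```

Under these assumptions the small WU would need 44 transfers instead of 4000 for the same amount of data. Whether the data can actually be held back and batched like that depends on the core, which this sketch does not model.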

CPU Cores/Threads vs GPU
