GPU utilization drops regularily
Posted: Mon Feb 20, 2017 3:38 pm
by foldinghomealone
I'm using 7.4.4 and a GTX 1070 on Win10.
There are utilization drops which occur very often, like every few minutes. Is there a reason behind this?
I worry a bit about steep temp gradients which appear every few minutes on the GPU.
Is there a way I can stop this or at least make it happen less frequently? Like a config setting?
In case those drops are necessary - maybe because CPU only is used for some checkup - and it is not possible to continue GPU calculation during those checkups, I would rather prefer that you calculate some garbage with the GPU meanwhile to keep GPU utilization up and therefore temps at the same level.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 5:43 pm
by bruce
There are two possible things that happen periodically:
* A FAHCore suspends processing long enough to write checkpoint data to disk (via whatever cache you have)
* A GPU FAHCore does a sanity check periodically to make sure the simulation hasn't encountered certain types of errors. It uses the CPU to check the results produced up to that point by the GPU.
For GPU cores (at least), the frequency of these events is set by the scientist in the configuration of the WU.
In my experience, they often seem to happen at the same time, but I'm not sure that's true in all cases.
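The two periodic pauses described above can be pictured with a rough sketch. This is not the actual FAHCore code: the function name `run_work_unit`, the event labels, and the intervals are all hypothetical, and the real intervals are set per WU by the scientist.

```python
def run_work_unit(total_steps, checkpoint_every, check_every):
    """Return a timeline of what a simulated GPU core does at each step.

    Hypothetical model only: the real core's structure and intervals
    are not documented in this thread.
    """
    timeline = []
    for step in range(1, total_steps + 1):
        timeline.append(("gpu_compute", step))
        if step % check_every == 0:
            # GPU sits idle while the CPU verifies the results so far.
            timeline.append(("cpu_sanity_check", step))
        if step % checkpoint_every == 0:
            # GPU sits idle while the state is written to disk.
            timeline.append(("write_checkpoint", step))
    return timeline


timeline = run_work_unit(total_steps=8, checkpoint_every=4, check_every=4)
pauses = [event for event in timeline if event[0] != "gpu_compute"]
print(pauses)
```

With equal intervals, as assumed here, both pauses land on the same steps, which would match the observation that they often seem to happen at the same time.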
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 7:05 pm
by foldinghomealone
Thanks again for your fast answer.
I doubt it is the checkpoint data which causes the GPU to stop for a 'long' time like this. I use an SSD and a few MB should be written very fast.
Why can't sanity checks be done in parallel to GPU folding? Why does GPU folding have to be stopped?
I would be happy if sanity checks could be done in parallel to GPU folding, then I wouldn't need to worry about my GPU and processing of WUs would be even faster.
I haven't measured but I guess that the processing time of a medium sized WU could be reduced by a few minutes which would result in higher overall GPU yields.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 7:11 pm
by 7im
The GPU usage charts I have seen have a square, saw-toothed shape. Nature of the beast. How long is "long"? And how often? Except for the checkpoints every 5 frames, the temp shouldn't fluctuate much. How large are the temp changes you are seeing?
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 9:19 pm
by foldy
foldinghomealone wrote:I would rather prefer that you calculate some garbage with the GPU meanwhile to keep GPU utilization up and therefore temps at the same level.
I don't get it, what is the problem when GPU temps do not stay at the full load level during checkpoints?
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 9:24 pm
by ComputerGenie
If you mean something like this:
That's perfectly normal.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 9:48 pm
by foldinghomealone
ComputerGenie wrote:That's perfectly normal.
That's a perfect waste of time
Currently I'm folding Project 9151 (7, 21, 607). Utilization drops every 4 frames for about 5 sec. Temp drops by about 14-18 K.
https://ibb.co/d4Eu6F
On other WUs I can hear every time when a checkpoint is reached because temps drop much further and therefore the fans almost stop.
I would prefer constant temps for durability of GPU.
And I would prefer that whatever causes the utilization drops be done in parallel to the GPU computing.
5 secs every 4 frames means that the WU takes about 2 mins longer, or around 2%, than (in my opinion) necessary.
What is the reason to stop GPU computing to write checkpoints or make sanity checks?
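The overhead estimate above can be checked with a quick calculation. Two inputs are assumptions, not figures from this post: a WU of 100 frames, and a total runtime of roughly 100 minutes (which is what "2 mins" being "around 2%" implies). The function name is made up for illustration.

```python
def checkpoint_overhead(pause_s, frames_per_pause, total_frames, wu_minutes):
    """Estimate time lost to periodic pauses over one WU.

    Returns (seconds lost, percent of total WU runtime).
    """
    pauses = total_frames // frames_per_pause
    lost_s = pauses * pause_s
    percent = 100.0 * lost_s / (wu_minutes * 60)
    return lost_s, percent


# 5 s pause every 4 frames; 100 frames and 100 minutes are assumed values.
lost_s, percent = checkpoint_overhead(
    pause_s=5, frames_per_pause=4, total_frames=100, wu_minutes=100
)
print(lost_s, round(percent, 1))
```

Under these assumptions the pauses add up to 125 seconds, a little over 2% of the runtime. A longer-running WU with the same pause pattern would lose a smaller percentage.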
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 9:50 pm
by ComputerGenie
foldinghomealone wrote:ComputerGenie wrote:That's perfectly normal.
That's a perfect waste of time
...
As you can see, every project is going to be different. Just relax and let it do what it does.
P.S. - if you prefer the software to act differently than it's designed to act, then, perhaps, you should get on the team and get involved in a rewrite.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 9:56 pm
by ComputerGenie
foldinghomealone wrote:...I would prefer constant temps for durability of GPU...
That statement doesn't match reality. Permanently sustained high temps lower durability and longevity.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 10:02 pm
by foldinghomealone
ComputerGenie wrote:foldinghomealone wrote:...I would prefer constant temps for durability of GPU...
That statement doesn't match reality. Permanently sustained high temps lower durability and longevity.
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 10:04 pm
by foldinghomealone
ComputerGenie wrote:P.S. - if you prefer the software to act differently than it's designed to act, then, perhaps, you should get on the team and get involved in a rewrite.
I don't demand things but I see room for optimization
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 10:19 pm
by bruce
There's absolutely nothing unusual about the first and third GPU. It's perfectly reasonable to assume that the dips you see are repeated many times at equal intervals but outside of the field of view.
In the middle image, a periodic pattern is harder to discern. There is no reason to be bothered by a variation in the height/width of individual pulses. If the cache is mostly empty, it will look different than if the cache is mostly full of data that needs to sync to disk (e.g., the third pulse from the left compared to the first two).
What's important here is that the total time each GPU is waiting on the hard disk adds up to almost nothing. (That's why running FAH on an SSD is only a little bit faster than running it on an HD.)
Re: GPU utilization drops regularily
Posted: Mon Feb 20, 2017 10:35 pm
by foldinghomealone
bruce wrote:What's important here is that the total time each GPU is waiting on the hard disk adds up to almost nothing. (That's why running FAH on an SSD is only a little bit faster than running it on an HD.)
Still, each time almost nothing adds up to a >2% longer processing time for each WU.
For sure, 2% performance increase is nothing compared to waiting for next GPU generation.
Re: GPU utilization drops regularily
Posted: Tue Feb 21, 2017 4:46 pm
by Joe_H
foldinghomealone wrote:
What is the reason to stop GPU computing to write checkpoints or make sanity checks?
To write a checkpoint, the data structures describing the current state of the WU need to be in a consistent and static state while they are written out. Continuing to compute would not allow that. My assumption is that the calculations for the sanity checks need that same static, consistent set of data structures.
As for the necessity of doing sanity checks, they were found to be needed for computational results from consumer-level GPU cards. Those cards are not optimized for stable numerical calculations like the "Pro" series of cards sold explicitly for that purpose.
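The point about needing a static, consistent state can be illustrated with a toy example. This is only a sketch: `SimState` and its invariant (every position equals the step count) are invented for illustration and have nothing to do with the real core's data structures.

```python
import copy


class SimState:
    """Toy simulation state; a full update changes positions AND step."""

    def __init__(self):
        self.step = 0
        self.positions = [0.0, 0.0, 0.0]

    def advance_positions(self):
        self.positions = [p + 1.0 for p in self.positions]

    def advance_step(self):
        self.step += 1


def is_consistent(state):
    # Toy invariant: after a complete update, each position equals the step.
    return all(p == float(state.step) for p in state.positions)


state = SimState()
state.advance_positions()
mid_update = copy.deepcopy(state)  # snapshot taken while an update is in flight
state.advance_step()
paused = copy.deepcopy(state)      # snapshot taken after the update has finished

print(is_consistent(mid_update), is_consistent(paused))
```

The snapshot taken mid-update mixes new positions with an old step count, so restarting from it would resume from a state the simulation never actually passed through. Pausing first avoids that, at the cost of idle time.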