
Re: Core 21 causing severe GUI lag on Linux

Posted: Mon Oct 26, 2015 8:59 pm
by JohnChodera
Ah, nevermind. I see it *was* 0.0.12 in the log snippet.

Re: Core 21 causing severe GUI lag on Linux

Posted: Mon Oct 26, 2015 9:59 pm
by JohnChodera
We found and fixed a few memory leaks in earlier core 21 versions, so it is surprising we are still seeing leaks.

We have found an issue (since 0.0.12) where the simulation might encounter a `NaN` that is not detected until the next snapshot/checkpoint. The simulation slows down at this point as the `NaN`s are propagated throughout the GPU code. I am speculating, but this might also cause an increase in memory usage if exceptions are tracked during this time.

I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier. We're also working to add more instrumentation to return useful information when Bad State errors occur so that we can figure out and eliminate their causes.
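To illustrate why checking more often helps, here is a minimal Python sketch. It is not the core 21 code, and every name in it (`run_with_nan_checks`, `make_faulty_step`, `check_every`) is hypothetical. A toy "simulation" goes bad at a known step, and the loop stops at the first periodic check that sees a `NaN`.

```python
import math


def make_faulty_step(bad_at):
    """Return a toy step function that injects a NaN on call number `bad_at`.

    Hypothetical stand-in for a real integrator; it only nudges each value.
    """
    calls = {"n": 0}

    def step(state):
        calls["n"] += 1
        new_state = [x + 0.1 for x in state]
        if calls["n"] == bad_at:
            new_state[0] = float("nan")  # once present, NaN + 0.1 stays NaN
        return new_state

    return step


def run_with_nan_checks(step, state, n_steps, check_every):
    """Advance `state`, testing for NaN every `check_every` steps.

    Returns (step_at_which_NaN_was_detected or None, final_state).
    """
    for i in range(1, n_steps + 1):
        state = step(state)
        if i % check_every == 0 and any(math.isnan(x) for x in state):
            return i, state  # stop early instead of computing on garbage
    return None, state


# Checking only every 25 steps (like a checkpoint interval) wastes 18 steps
# after the NaN appears at step 7; checking every step wastes none.
late, _ = run_with_nan_checks(make_faulty_step(7), [0.0, 1.0], 100, 25)
early, _ = run_with_nan_checks(make_faulty_step(7), [0.0, 1.0], 100, 1)
print(late, early)
```

The trade-off is that each check costs time, so a real implementation would presumably pick an interval somewhere between every step and every checkpoint.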

Re: Core 21 causing severe GUI lag on Linux

Posted: Mon Oct 26, 2015 11:20 pm
by weirddan455
JohnChodera wrote:@weirddan: Thanks for pointing this out! Can you confirm this was with core 21 version 0.0.12?
Yes, it was.
JohnChodera wrote:Ah, nevermind. I see it *was* 0.0.12 in the log snippet.
And I didn't click the 2nd page :P
JohnChodera wrote:I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier.
The WU that terminated early on me turned out not to be bad. I made another thread asking about it and bruce said someone else completed it successfully. If you're saying these `NaN`s only happen on bad WUs, that's probably not my issue, as the WU turned out to be good but still caused a memory leak and terminated early for me.

Not sure whether the early termination is related to the memory leak. When my VRAM gets used up, the WU just slows to a crawl; it doesn't crash. I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.

Re: Core 21 causing severe GUI lag on Linux

Posted: Tue Oct 27, 2015 3:05 pm
by bruce
weirddan455 wrote:...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?

Re: Core 21 causing severe GUI lag on Linux

Posted: Wed Oct 28, 2015 2:20 pm
by weirddan455
bruce wrote:
weirddan455 wrote:...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?
I think I've seen it do that, but it takes an hour or two after it eats all the VRAM, during which time folding slows to a crawl. I was on the computer at the time, so I restarted it manually to get it back to folding faster.

Re: Core 21 causing severe GUI lag on Linux

Posted: Wed Oct 28, 2015 9:51 pm
by davidcoton
Yes, that's the way it usually happens. The Bad State is only detected at the checkpoint, and when the "go-slow" bug strikes, it takes a long time to get there. The manual restart is the best available workaround at the moment, until a fixed Core-21 is available (soon™).

As for what makes a bad WU, if you can't complete it, it's bad for you. If someone else completes it, it's not inherently bad. Some people have reduced the number of problems by reducing the GPU clock (on the 7xx series it seems to be a GPU main clock issue; on the 9xx series it seems to be the GPU memory clock). Ask if you want to try that and don't know how -- there is a way on Linux :).

Edited to keep Bruce smiling :D

Re: Core 21 causing severe GUI lag on Linux

Posted: Wed Oct 28, 2015 11:21 pm
by jimerickson
how does one reduce memory clocks on linux?

Re: Core 21 causing severe GUI lag on Linux

Posted: Thu Oct 29, 2015 12:42 am
by Grandpa_01
davidcoton, you can only change the P0 state memory clocks in Linux. The P2 memory clocks are locked at 6000 MHz, and the lowest you can set them in P0 state is 6000 MHz. The memory needs to be below 5975 MHz on my best 980 to stop the Bad State errors and slowdowns.

Nvidia can enable this in their drivers; we just need to get them to do it.

Re: Core 21 causing severe GUI lag on Linux

Posted: Thu Oct 29, 2015 1:13 am
by davidcoton
Grandpa, I said the GPU clock could be controlled because the OP's card is a 770, and the problem there is not the same as on 9xx. In fact the memory clock is also controllable, at least on my 780Ti; it seems this doesn't work on 9xx. My card runs in P3 state and there is reasonable control of both clocks, up or down.

The way to do it is `sudo nvidia-xconfig --cool-bits=12` in a Terminal window, then (IIRC) reboot.
The nvidia-settings app now gets two extra settings boxes on the PowerMizer tab.
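For anyone wondering where the 12 comes from: Coolbits is a bitmask, and 12 is 4 + 8. The small Python sketch below decodes a value into the features it should enable. The bit meanings are paraphrased from memory of the NVIDIA driver README, so treat the table as an assumption and check the README for your driver version; `COOLBITS_FLAGS` and `decode_coolbits` are made-up names for illustration only.

```python
# Bit meanings paraphrased from the NVIDIA Linux driver README (assumed;
# verify against the README shipped with your driver version).
COOLBITS_FLAGS = {
    1: "clock controls for older (pre-Fermi) GPUs",
    2: "SLI between GPUs with different amounts of video memory",
    4: "manual fan speed control",
    8: "clock offset controls on the PowerMizer page",
    16: "overvoltage via the nvidia-settings command line",
}


def decode_coolbits(value):
    """Return descriptions of the Coolbits flags set in `value`."""
    return [desc for bit, desc in sorted(COOLBITS_FLAGS.items()) if value & bit]


for feature in decode_coolbits(12):
    print(feature)
```

If you only want the clock controls and not the fan control, 8 on its own should be enough under the same assumptions.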