Core 21 causing severe GUI lag on Linux

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: Core 21 causing severe GUI lag on Linux

Post by JohnChodera »

Ah, nevermind. I see it *was* 0.0.12 in the log snippet.
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: Core 21 causing severe GUI lag on Linux

Post by JohnChodera »

We found and fixed a few memory leaks in earlier core 21 versions, so it is surprising we are still seeing leaks.

We have found an issue (since 0.0.12) where the simulation might encounter a `NaN` that is not detected until the next snapshot/checkpoint. The simulation slows down at this point as the `NaN`s are propagated throughout the GPU code. I am speculating, but this might also cause an increase in memory usage if exceptions are tracked during this time.

I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier. We're also working to add more instrumentation to return useful information when Bad State errors occur so that we can figure out and eliminate their causes.
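
As a rough illustration of the early-detection idea (this is not the actual core 21 code, just a sketch in plain Python), the loop below checks the state for `NaN` every few steps instead of waiting for the checkpoint. The names `run_with_nan_checks`, `make_step`, and `check_every` are hypothetical and exist only for this example.

```python
import math


def run_with_nan_checks(step, state, n_steps, check_every=100):
    """Advance `state` with `step` and stop at the first check that sees a NaN.

    Returns (steps_run, ok). This is a toy stand-in for a simulation loop,
    not FahCore_21's real logic.
    """
    for i in range(1, n_steps + 1):
        state = step(state)
        # Only inspect the state every `check_every` steps to keep checks cheap.
        if i % check_every == 0 and any(math.isnan(x) for x in state):
            return i, False
    return n_steps, True


def make_step(bad_at):
    """Build a hypothetical step function that starts emitting NaN at call `bad_at`."""
    calls = {"n": 0}

    def step(state):
        calls["n"] += 1
        if calls["n"] >= bad_at:
            # Once the state goes bad, every later step stays bad.
            return [float("nan")] * len(state)
        return [x + 1.0 for x in state]

    return step


# Frequent checks catch the problem soon after it appears ...
early = run_with_nan_checks(make_step(250), [0.0, 0.0], 1000, check_every=100)
# ... while checking only at the "checkpoint" wastes the remaining steps.
late = run_with_nan_checks(make_step(250), [0.0, 0.0], 1000, check_every=1000)
```

With the bad value appearing at step 250, the frequent check stops shortly afterwards, while the checkpoint-only check keeps running to the end on already-bad data. That is the slow stretch described above.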
weirddan455
Posts: 12
Joined: Mon Oct 19, 2015 12:53 pm

Re: Core 21 causing severe GUI lag on Linux

Post by weirddan455 »

JohnChodera wrote:@weirddan: Thanks for pointing this out! Can you confirm this was with core 21 version 0.0.12?
Yes, it was.
JohnChodera wrote:Ah, nevermind. I see it *was* 0.0.12 in the log snippet.
And I didn't click the 2nd page :P
JohnChodera wrote:I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier.
The WU that terminated early on me turned out not to be bad. I made another thread asking about it, and bruce said someone else completed it successfully. If you're saying these `NaN`s only happen on bad WUs, that's probably not my issue, since the WU turned out to be good but still caused a memory leak and terminated early for me.

I'm not sure whether the early termination is related to the memory leak. When my VRAM gets used up, the WU just slows to a crawl; it doesn't crash. I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Core 21 causing severe GUI lag on Linux

Post by bruce »

weirddan455 wrote:...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?
weirddan455
Posts: 12
Joined: Mon Oct 19, 2015 12:53 pm

Re: Core 21 causing severe GUI lag on Linux

Post by weirddan455 »

bruce wrote:
weirddan455 wrote:...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?
I think I've seen it do that, but it takes an hour or two after it eats all the VRAM, during which time folding slows to a crawl. I was on the computer at the time, so I restarted it manually to get it back to folding faster.
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Core 21 causing severe GUI lag on Linux

Post by davidcoton »

Yes, that's the way it usually happens. The Bad State is only detected at the checkpoint, and when the "go-slow" bug strikes, it takes a long time to get there. The manual restart is the best available workaround at the moment, until a fixed Core-21 is available (soon™).

As for what makes a bad WU: if you can't complete it, it's bad for you. If someone else completes it, it's not inherently bad. Some people have reduced the number of problems by reducing the GPU clock (on the 7xx series it seems to be a GPU main clock issue; on the 9xx series it seems to be the GPU memory clock). Ask if you want to try that and don't know how -- there is a way on Linux :).

Edited to keep Bruce smiling :D
jimerickson
Posts: 533
Joined: Tue May 27, 2008 11:56 pm
Hardware configuration: Parts:
Asus H370 Mining Master motherboard (X2)
Patriot Viper DDR4 memory 16gb stick (X4)
Nvidia GeForce GTX 1080 gpu (X16)
Intel Core i7 8700 cpu (X2)
Silverstone 1000 watt psu (X4)
Veddha 8 gpu miner case (X2)
Thermaltake hsf (X2)
Ubit riser card (X16)
Location: ames, iowa

Re: Core 21 causing severe GUI lag on Linux

Post by jimerickson »

How does one reduce memory clocks on Linux?
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Core 21 causing severe GUI lag on Linux

Post by Grandpa_01 »

davidcoton, you can only change the P0 state memory clocks in Linux. The P2 memory clocks are locked at 6000 MHz, and the most you can lower them in the P0 state is down to 6000 MHz. The memory needs to be below 5975 MHz on my best 980 to stop the Bad State errors and slowdowns.

Nvidia can enable this in their drivers; we just need to get them to do it.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Core 21 causing severe GUI lag on Linux

Post by davidcoton »

Grandpa, I said the GPU clock could be controlled because the OP's card is a 770, and the problem there is not the same as on 9xx. In fact the memory clock is also controllable, at least on my 780Ti; it seems this doesn't work on 9xx. My card runs in P3 state and there is reasonable control of both clocks, up or down.

The way to do it is to run `sudo nvidia-xconfig --cool-bits=12` in a Terminal window, then (IIRC) reboot.
The nvidia-settings app then gets two extra settings boxes on the PowerMizer tab.
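
For anyone wondering what the 12 means: Coolbits is a bitmask, so 12 is 8 + 4. The little Python sketch below decodes a value into its flags. The flag descriptions are my recollection of the NVIDIA Linux driver README and should be treated as an assumption; check the README for your driver version before relying on them. `COOLBITS_FLAGS` and `decode_coolbits` are made-up names for this example.

```python
# Bit meanings as I recall them from the NVIDIA Linux driver README.
# Treat these descriptions as an assumption and verify against your driver's README.
COOLBITS_FLAGS = {
    1: "clock control on older (pre-Fermi) GPUs",
    2: "attempt SLI with mismatched video memory",
    4: "manual fan speed control in nvidia-settings",
    8: "per-performance-level clock offsets on the PowerMizer page",
    16: "overvoltage control via the nvidia-settings command line",
}


def decode_coolbits(value):
    """Return the descriptions of every flag set in a Coolbits value."""
    return [desc for bit, desc in sorted(COOLBITS_FLAGS.items()) if value & bit]


for line in decode_coolbits(12):
    print(line)
```

If that recollection is right, 8 alone would be enough for the clock offsets, and the 4 adds fan control on top.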