Core 21 causing severe GUI lag on Linux
Moderators: Site Moderators, FAHC Science Team
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Core 21 causing severe GUI lag on Linux
Ah, nevermind. I see it *was* 0.0.12 in the log snippet.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Core 21 causing severe GUI lag on Linux
We found and fixed a few memory leaks in earlier core 21 versions, so it is surprising we are still seeing leaks.
We have found an issue (since 0.0.12) where the simulation might encounter a `NaN` that is not detected until the next snapshot/checkpoint. The simulation slows down at this point as the `NaN`s are propagated throughout the GPU code. I am speculating, but this might also cause an increase in memory usage if exceptions are tracked during this time.
I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier. We're also working to add more instrumentation to return useful information when Bad State errors occur so that we can figure out and eliminate their causes.
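For anyone wondering why scanning for `NaN`s more often helps: below is a toy Python sketch, not the actual Core 21 code. All names (`toy_step`, `first_detection_step`, `check_every`) and the step numbers are made up for illustration. It only shows that once a `NaN` appears it never goes away, so the delay between the bad step and its detection depends on how often the state is scanned.

```python
import math


def has_non_finite(values):
    """True if any value is NaN or +/-inf."""
    return any(not math.isfinite(v) for v in values)


def toy_step(state, i):
    """Hypothetical integrator step; injects a NaN at step 130 to mimic a blow-up."""
    if i == 130:
        return [float("nan")] + state[1:]
    # Any arithmetic on a NaN yields a NaN, so the bad value persists.
    return [x * 0.5 + 1.0 for x in state]


def first_detection_step(step, state, total_steps, check_every):
    """Run the toy simulation and return the step at which the NaN is noticed.

    The state is only scanned every `check_every` steps, so a NaN that
    appears between scans goes unnoticed until the next scan.
    Returns None if nothing bad was ever seen.
    """
    for i in range(1, total_steps + 1):
        state = step(state, i)
        if i % check_every == 0 and has_non_finite(state):
            return i
    return None


if __name__ == "__main__":
    start = [1.0, 2.0, 3.0]
    # Frequent scan versus a scan only at (hypothetical) checkpoints.
    print(first_detection_step(toy_step, list(start), 250, check_every=10))
    print(first_detection_step(toy_step, list(start), 250, check_every=100))
```

With the `NaN` injected at step 130, scanning every 10 steps notices it at step 130, while scanning only every 100 steps does not notice until step 200. In the real core the steps in between are the ones that run slowly.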
-
- Posts: 12
- Joined: Mon Oct 19, 2015 12:53 pm
Re: Core 21 causing severe GUI lag on Linux
JohnChodera wrote: @weirddan: Thanks for pointing this out! Can you confirm this was with core 21 version 0.0.12?
Yes, it was.
JohnChodera wrote: Ah, nevermind. I see it *was* 0.0.12 in the log snippet.
And I didn't click the 2nd page.
JohnChodera wrote: I'll see if we can at least get a core revision out that fixes the slowdown issue using some new code that detects `NaN`s and terminates bad work units earlier.
The WU that terminated early on me turned out not to be bad. I made another thread asking about it, and bruce said someone else completed it successfully. If you're saying these `NaN`s only happen on bad WUs, that's probably not my issue, as the WU turned out to be good but still caused a memory leak and terminated early for me.
Not sure if the early termination is related to the memory leak or not. When my VRAM gets used up, the WU just slows to a crawl; it doesn't crash. I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early, but it never leaked memory during that time.
Re: Core 21 causing severe GUI lag on Linux
weirddan455 wrote: ...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 12
- Joined: Mon Oct 19, 2015 12:53 pm
Re: Core 21 causing severe GUI lag on Linux
weirddan455 wrote: ...I restarted the client after that and it went from 74% (where the memory leak happened) to 87% and then ended early but it never leaked memory during that time.
bruce wrote: FahCore_21 should report the bad state and automatically restart from the last checkpoint. Is that not working for you?
I think I've seen it do that, but it takes an hour or two after it eats all the VRAM, during which time folding slows to a crawl. I was on the computer at the time, so I restarted it manually so it would get back to folding faster.
-
- Posts: 1094
- Joined: Wed Nov 05, 2008 3:19 pm
- Location: Cambridge, UK
Re: Core 21 causing severe GUI lag on Linux
Yes, that's the way it usually happens. The Bad State is only detected at the checkpoint, and when the "go-slow" bug strikes, it takes a long time to get there. The manual restart is the best available workaround at the moment, until a fixed Core-21 is available (soon™).
As for what makes a bad WU, if you can't complete it, it's bad for you. If someone else completes it, it's not inherently bad. Some people have reduced the number of problems by reducing the GPU clock (on the 7xx series it seems to be a GPU main clock issue; on the 9xx series it seems to be the GPU memory clock). Ask if you want to try that and don't know how -- there is a way on Linux.
Edited to keep Bruce smiling ™
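If you want to catch the VRAM fill-up before folding slows to a crawl, a rough Python sketch like this can poll `nvidia-smi`. The 95% threshold and the function names are my own assumptions, not anything from FAH. It only reports which GPUs look nearly full; restarting the client, as in the manual workaround mentioned above, is still up to you.

```python
import subprocess

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn the query output into [(used_mib, total_mib), ...], one tuple per GPU.

    Assumes every field is a plain integer; a GPU that reports
    something else (for example "[N/A]") will raise ValueError.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def nearly_full(rows, threshold=0.95):
    """Return the indices of GPUs whose VRAM use is at or above `threshold`.

    The default of 0.95 is an arbitrary illustrative value.
    """
    return [i for i, (used, total) in enumerate(rows) if used / total >= threshold]


def check_once():
    """Run the query once and return the indices of nearly full GPUs."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return nearly_full(parse_memory(out))


if __name__ == "__main__":
    full = check_once()
    if full:
        print("VRAM nearly full on GPU(s):", full)
    else:
        print("VRAM usage looks normal")
```

Run it from cron or a loop every few minutes; a GPU that shows up in the list while a Core 21 WU is running is a hint that the leak has hit.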
-
- Posts: 533
- Joined: Tue May 27, 2008 11:56 pm
- Hardware configuration: Parts:
Asus H370 Mining Master motherboard (X2)
Patriot Viper DDR4 memory 16gb stick (X4)
Nvidia GeForce GTX 1080 gpu (X16)
Intel Core i7 8700 cpu (X2)
Silverstone 1000 watt psu (X4)
Veddha 8 gpu miner case (X2)
Thermaltake hsf (X2)
Ubit riser card (X16)
- Location: ames, iowa
Re: Core 21 causing severe GUI lag on Linux
how does one reduce memory clocks on linux?
-
- Posts: 1122
- Joined: Wed Mar 04, 2009 7:36 am
- Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M
Re: Core 21 causing severe GUI lag on Linux
davidcotton, you can only change the P0 state memory clocks in Linux. The P2 memory clocks are locked at 6000MHz, and the lowest you can set them in P0 state is 6000MHz. The memory needs to be below 5975MHz on my best 980 to stop the Bad State errors and slowdowns.
Nvidia can enable this in their drivers; we just need to get them to do it.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
-
- Posts: 1094
- Joined: Wed Nov 05, 2008 3:19 pm
- Location: Cambridge, UK
Re: Core 21 causing severe GUI lag on Linux
Grandpa, I said the GPU clock could be controlled because the OP's card is a 770, and the problem there is not the same as on 9xx. In fact, the memory clock is also controllable, at least on my 780Ti; it seems this doesn't work on 9xx. My card runs in P3 state and there is reasonable control of both clocks, up or down.
The way to do it is "sudo nvidia-xconfig --cool-bits=12" in a Terminal window, then (IIRC) reboot.
The nvidia-settings app now gets two extra settings boxes on the PowerMizer tab.