
Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Fri Oct 16, 2015 4:18 pm
by bigblock990
Grandpa_01 wrote: I agree with the temporary solution, but it shouldn't be considered a permanent one. There is definitely a problem and it needs to be addressed. I believe removing them from Linux would be a better choice at this point in time. I am not quite sure I understand the concept of forcing a person to run something that doesn't work at default settings when they can easily be removed at Stanford's end. They work great on Windows but really struggle on Linux.
billford wrote: The problem is only with the FAH software running Core_21 under Linux; it has nothing to do with NVidia.

Your statement exemplifies PG's ivory-tower approach and lack of appreciation of the real world outside academia.
billford wrote:
Grandpa_01 wrote: I believe removing them from Linux would be a better choice at this point in time
Or at least putting them back to adv so the donors have a choice.
I agree on all three.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Fri Oct 16, 2015 4:34 pm
by artoar_11
Gamers discovered a similar problem a year ago:

https://forums.geforce.com/default/topi ... ituations/

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Fri Oct 16, 2015 8:14 pm
by toTOW
And how did they fix it?

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Fri Oct 16, 2015 10:12 pm
by artoar_11
If I'm not mistaken, this is an NVidia forum. I did not see any response there from an NV representative, only complaints from consumers.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 6:06 am
by bruce
One of the posts in that topic suggests that NV should fix it but that it's probably not high on their list. That seems to still be the case.

My personal theory is that NV's boost clocks constitute their own version of overclocking. It seems likely that their transitions to/from the boosted clock and voltage state work smoothly in their tests, but if you start from an overclocked base setting and add their boost to that, it's not so smooth.

Unfortunately, I don't expect that people who overclock take that into account. The OC rules have changed.

Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.

If you do overclock, have you disabled the boost function?

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 9:34 am
by billford
bruce wrote: My personal theory is that NV's boost clocks constitute their own version of overclocking.
I think that goes without saying.
bruce wrote: Stanford does not support overclocking.
.
.
If you do overclock, have you disabled the boost function?
Putting your statements together, it seems you are suggesting that a GPU used for folding should have any factory overclocking removed and the boost function disabled. If that is not what you meant, perhaps you would like to clarify?

You also say:
bruce wrote: The OC rules have changed.
You are not, I trust, implying that PG's developers were unaware of this and hence haven't taken it into account?


My GTX 980s are factory overclocked, and I apply further overclocking on top of that. I fully accept that any instability due to the latter is entirely my responsibility. I do not consider that that applies to the factory overclock, and even less to the boost.

I think it likely that most people buy a GPU primarily for gaming, and get the fastest one they can afford. (Maybe using other criteria, but speed is likely to be high on the list of priorities.) They expect it to work "out of the box", and in general (and for the intended purpose) it will. Not many will know about the fine detail of boost clocks, fan profiles, maximum power demand, thermal environments etc. etc.; it's just a consumer component of their computer that gives them a better gaming experience.

As these are the sort of people PG would like to attract to the folding project (and are signally failing to do so) it behoves them to look outside the cosy world of academia and make sure their software works on the hardware that people actually buy.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 2:41 pm
by toTOW
Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?

It would be so much easier if we could go back to the old hardware behaviour... it was easier to control what was happening.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 2:51 pm
by bruce
You are not, I trust, implying that PG's developers were unaware of this and hence haven't taken it into account?
What part of "not supported" do you fail to understand? Even if they are aware, their policy is that they WILL NOT take it into account. That's the responsibility of the OC community.

Putting those two statements together without the intervening words changes the meaning to something that I did not imply. If you overclock, whether you disable BOOST or not, instabilities are your responsibility. I SUGGEST that if you overclock and you do not disable boost, your system is probably going to be unstable, and it's still your responsibility.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:00 pm
by billford
toTOW wrote: Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?
It should be possible to control voltage and clock frequency: programs such as Afterburner can do it, and NVidia say that anything that can be changed from the NVidia X Server Settings app (i.e. the amount of over/underclock) can also be changed from the command line, so some sort of API presumably exists.

Don't know if it's possible to turn boost off.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:03 pm
by bruce
billford wrote: Don't know if it's possible to turn boost off.
I don't know either, but that should have been addressed by the year-old discussion on NV forums.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:11 pm
by billford
bruce wrote: What part of "not supported" do you fail to understand?
I understand it perfectly well, but you state that PG do not support overclocking, and (as your own opinion, admittedly) that boost is NVidia's version of overclocking. So do they support the built-in boost without any extra user-applied overclocking? Clarification is all I'm asking for.
bruce wrote: Even if they are aware, their policy is that they WILL NOT take it into account.
That implies that they don't.
bruce wrote: If you overclock, whether you disable BOOST or not, instabilities are your responsibility.
I accept that, I said so.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:12 pm
by billford
bruce wrote:
billford wrote: Don't know if it's possible to turn boost off.
I don't know either, but that should have been addressed by the year-old discussion on NV forums.
"Someone else should have done it" is just another abdication of responsibility.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:28 pm
by Grandpa_01
bruce wrote: One of the posts in that topic suggests that NV should fix it but that it's probably not high on their list. That seems to still be the case.
I agree, and they probably never will, so who will that leave the responsibility with?

Remember that in many cases these are factory OC cards that work with all of the other WUs, including Core21, except the new 9xxx ones on Linux. Windows does not appear to have the same problem.
My personal theory is that NV's boost clocks constitute their own version of overclocking. It seems likely that their transitions to/from the boosted clock and voltage state work smoothly in their tests, but if you start from an overclocked base setting and add their boost to that, it's not so smooth.
This could be true and could be the cause, but we do not know that. On the GTX 980 Classified cards I cannot find any evidence of any boost state being used on Linux when the card is running on the LN2 BIOS, which disables all of NVidia's limitations. I see it being used on Windows, and Windows works fine with them, so the problem may be that boost is not being used on Linux. That also suggests there may be a simple cure for now: bumping the voltage up a bit on the 980 Classifieds. I am currently working on that. The 970 SC is utilizing boost, though, and there may not be a cure for it if that is the problem.
bruce wrote: Unfortunately, I don't expect that people who overclock take that into account. The OC rules have changed.
Probably true for the majority.
bruce wrote: Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.

If you do overclock, have you disabled the boost function?
They may not support it, but on their Teams & Stats web page they sure encourage it, and people are doing exactly what they are told to do on that page.
Teams & Stats wrote: reporting bugs and making suggestions about how to improve the software
That statement suggests PG will try to work on problems and that they are taking the responsibility. If it is different from that, it should read "Factory OCed hardware is not supported".
Teams & Stats wrote: Teams & Stats


One of the best ways to help Folding@home is by recruiting your friends and family. Start by sharing our project with them. Then join a team or even start your own team. The more points you and your team earn, the closer we come to finding cures.

On this page you will find access to statistics for individuals and teams who have joined together to earn points and compete with other teams. Some of us are quite intense in our approach to folding. We have team websites, we supe up our computers' and we drive the technology forward by reporting bugs and making suggestions about how to improve the software.

It’s really the best way to … maximize your effort.

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:34 pm
by toTOW
billford wrote:
toTOW wrote: Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?
It should be possible to control voltage and clock frequency: programs such as Afterburner can do it, and NVidia say that anything that can be changed from the NVidia X Server Settings app (i.e. the amount of over/underclock) can also be changed from the command line, so some sort of API presumably exists.

Don't know if it's possible to turn boost off.
I'm afraid that all the tools only apply an offset (positive or negative) to the voltage or the clock; the boost algorithm is still there :(

Re: 9634 (Run 0, Clone 9, Gen 5)

Posted: Sat Oct 17, 2015 3:58 pm
by billford
bruce wrote: Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.
OK, here's one: GTX 980, Linux, 355.11 drivers, running with my normal overclock:

13:37:17:WU00:FS01:Starting 
13:37:17:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1445 -checkpoint 15 -gpu 0 -gpu-vendor nvidia 
13:37:17:WU00:FS01:Started FahCore on PID 12262 
13:37:17:WU00:FS01:Core PID:12266 
13:37:17:WU00:FS01:FahCore 0x21 started 
13:37:17:WU00:FS01:0x21:*********************** Log Started 2015-10-17T13:37:17Z *********************** 
13:37:17:WU00:FS01:0x21:Project: 9630 (Run 0, Clone 33, Gen 9) 
13:37:17:WU00:FS01:0x21:Unit: 0x0000000bab436c9b5609bee27d40327a 
13:37:17:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000 
13:37:17:WU00:FS01:0x21:Machine: 1 
13:37:17:WU00:FS01:0x21:Reading tar file core.xml 
13:37:17:WU00:FS01:0x21:Reading tar file integrator.xml 
13:37:17:WU00:FS01:0x21:Reading tar file state.xml 
13:37:17:WU00:FS01:0x21:Reading tar file system.xml 
13:37:17:WU00:FS01:0x21:Digital signatures verified 
13:37:17:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core 
13:37:17:WU00:FS01:0x21:Version 0.0.11 
13:37:38:WU00:FS01:0x21:Completed 0 out of 2000000 steps (0%) 
13:37:38:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900 
13:39:16:WU00:FS01:0x21:Completed 20000 out of 2000000 steps (1%) 
.
.
13:59:43:WU00:FS01:0x21:Completed 280000 out of 2000000 steps (14%) 
14:01:17:WU00:FS01:0x21:Completed 300000 out of 2000000 steps (15%) 
14:01:21:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint 
14:02:57:WU00:FS01:0x21:Completed 220000 out of 2000000 steps (11%) 
14:04:37:WU00:FS01:0x21:Completed 240000 out of 2000000 steps (12%) 
.
.
15:40:30:WU00:FS01:0x21:Completed 1380000 out of 2000000 steps (69%) 
15:42:10:WU00:FS01:0x21:Completed 1400000 out of 2000000 steps (70%) 
15:42:14:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint 
15:43:54:WU00:FS01:0x21:Completed 1320000 out of 2000000 steps (66%) 
15:45:33:WU00:FS01:0x21:Completed 1340000 out of 2000000 steps (67%) 
At the first bad state I immediately removed my overclock, i.e. returned it to the factory base settings. As far as I'm concerned (supported, I think, by Grandpa_01's post) it was then no longer overclocked, but it still hit a bad state.

I'm still waiting for it to finish (or get dumped with another bad state). (Edit: it completed successfully.)

I'll concede that one swallow doth not a summer make, but I'll keep note of future behaviour.
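For anyone wanting to keep note of future behaviour in the same way, here is a minimal Python sketch (not an official FAH tool) that scans FAHClient log lines for "Bad State detected" and reports the last step logged before each rollback and the first step logged after resuming. The line layout is assumed from the log excerpt in this post; the names `Rollback`, `find_rollbacks` and `SAMPLE` are hypothetical and only for illustration. The exact checkpoint step is not logged, so the sketch reports the drop in reported progress rather than the precise amount of work lost.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed log layout, taken from the excerpt in the post:
#   HH:MM:SS:WUxx:FSyy:0xNN:<core message>
CORE_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS(\d+):0x[0-9a-fA-F]+:(.*)$")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


@dataclass
class Rollback:  # hypothetical helper type
    time: str                         # timestamp of the "Bad State detected" line
    slot: str                         # work unit and folding slot, e.g. "WU00:FS01"
    step_before: int                  # last step reported before the bad state
    step_after: Optional[int] = None  # first step reported after resuming

    @property
    def progress_drop(self) -> Optional[int]:
        """Drop in reported progress (the exact checkpoint step is not logged)."""
        if self.step_after is None:
            return None
        return self.step_before - self.step_after


def find_rollbacks(lines: Iterable[str]) -> List[Rollback]:
    """Return one Rollback per "Bad State detected" line, in log order."""
    last_step = {}  # slot -> most recent "Completed N" value
    pending = {}    # slot -> Rollback still waiting for its first post-resume progress line
    events: List[Rollback] = []
    for raw in lines:
        m = CORE_LINE.match(raw.strip())
        if m is None:
            continue  # client lines without a core prefix, or "." elisions
        time, wu, fs, message = m.groups()
        slot = f"WU{wu}:FS{fs}"
        if message.startswith("Bad State detected"):
            event = Rollback(time, slot, last_step.get(slot, 0))
            events.append(event)
            pending[slot] = event
            continue
        progress = PROGRESS.search(message)
        if progress:
            step = int(progress.group(1))
            if slot in pending:
                pending.pop(slot).step_after = step
            last_step[slot] = step
    return events


# Lines copied from the log excerpt in the post.
SAMPLE = """\
14:01:17:WU00:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
14:01:21:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
14:02:57:WU00:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
.
15:42:10:WU00:FS01:0x21:Completed 1400000 out of 2000000 steps (70%)
15:42:14:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
15:43:54:WU00:FS01:0x21:Completed 1320000 out of 2000000 steps (66%)
"""

if __name__ == "__main__":
    for ev in find_rollbacks(SAMPLE.splitlines()):
        print(f"{ev.time} {ev.slot}: {ev.step_before} -> {ev.step_after} "
              f"(progress dropped by {ev.progress_drop} steps)")
```

To check a real log, pass it the lines of the client's log file, e.g. `find_rollbacks(open(path))`; where that file lives depends on the install and isn't stated in this thread. Tallying rollbacks per slot with and without a user overclock would turn "one swallow" into a small sample.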