9634 (Run 0, Clone 9, Gen 5)

Moderators: Site Moderators, FAHC Science Team

bigblock990
Posts: 20
Joined: Wed Sep 09, 2015 12:42 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bigblock990 »

Grandpa_01 wrote: I agree with the temporary solution, but it shouldn't be considered a permanent one. There is definitely a problem and it needs to be addressed. I believe removing them from Linux would be a better choice at this point in time. I am not quite sure I understand the concept of forcing a person to run something that doesn't work at default settings when they can easily be removed at Stanford's end. They work great on Windows but really struggle on Linux.
billford wrote: The problem is only with FAH software, running Core_21 under Linux, nothing to do with NVidia.

Your statement exemplifies PG's ivory-tower approach and lack of appreciation of the real world outside academia.
billford wrote:
Grandpa_01 wrote:I believe removing them from Linux would be a better choice at this point in time
Or at least putting them back to adv so the donors have a choice.
I agree on all three.
artoar_11
Posts: 652
Joined: Sun Nov 22, 2009 8:42 pm
Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by artoar_11 »

Gamers discovered a similar problem a year ago:

https://forums.geforce.com/default/topi ... ituations/
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

And how did they fix it?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
artoar_11
Posts: 652
Joined: Sun Nov 22, 2009 8:42 pm
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by artoar_11 »

If I'm not mistaken, this is an NVidia forum. I did not see a response there from an NV representative, only complaints from consumers.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bruce »

One of the posts in that topic suggests that NV should fix it but that it's probably not high on their list. That seems to still be the case.

My personal theory is that NV's boost clocks constitute their own version of overclocking. It seems likely that their transitions to/from the boosted clock and voltage state work smoothly in their tests, but if you start from an overclocked base setting and add their boost to that, it's not so smooth.

Unfortunately I don't expect that people who overclock take that into account. The OC rules have changed.

Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.

If you do overclock, have you disabled the boost function?
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Hardware configuration: Full Time:

2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)

Retired:

3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

bruce wrote: My personal theory is that NV's boost clocks constitute their own version of overclocking.
I think that goes without saying.
bruce wrote: Stanford does not support overclocking.
.
.
If you do overclock, have you disabled the boost function?
Putting your statements together, it seems you are suggesting that a GPU used for folding should have any factory overclocking removed and the boost function disabled. If that is not what you meant, perhaps you would like to clarify?

You also say:
bruce wrote:The OC rules have changed.
You are not, I trust, implying that PG's developers were unaware of this hence haven't taken it into account?


My GTX 980s are factory overclocked, and I apply further overclocking on top of that. I fully accept that any instability due to the latter is my responsibility completely. I do not consider that that applies to the factory overclock, and even less to the boost.

I think it likely that most people buy a GPU primarily for gaming, and get the fastest one they can afford. (Maybe using other criteria, but speed is likely to be high on the list of priorities.) They expect it to work "out of the box", and in general (and for the intended purpose) it will. Not many will know about the fine detail of boost clocks, fan profiles, maximum power demand, thermal environments etc etc- it's just a consumer component of their computer that gives them a better gaming experience.

As these are the sort of people PG would like to attract to the folding project (and are signally failing to do so) it behoves them to look outside the cosy world of academia and make sure their software works on the hardware that people actually buy.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?

It would be so much easier if we could get back to the old behaviour of the hardware... it was easier to control what was happening.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bruce »

billford wrote:You are not, I trust, implying that PG's developers were unaware of this hence haven't taken it into account?
What part of "not supported" do you fail to understand? Even if they are aware, their policy is that they WILL NOT take it into account. That's the responsibility of the OC community.

Putting those two statements together without the intervening words changes the meaning to something that I did not imply. If you overclock, whether you disable BOOST or not, instabilities are your responsibility. I SUGGEST that if you overclock and you do not disable boost, your system is probably going to be unstable, and it's still your responsibility.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

toTOW wrote:Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?
It should be possible to control voltage and clock frequency- programs such as Afterburner can do it and NVidia say that anything that can be changed from the NVidia X Server Settings app (ie amount of over/underclock) can also be changed from a command line, so some sort of API presumably exists.

Don't know if it's possible to turn boost off.
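
Whether boost can be switched off is left open here, but the clocks a card is actually running at can at least be watched from a command line. A minimal sketch in Python, assuming the driver's `nvidia-smi` tool is installed and that the card reports the queried fields (some consumer cards may return "[Not Supported]" for some of them). The helper names `parse_smi_csv` and `query_gpus` and the sample line are hypothetical, made up for illustration, and this only reads values; it does not change any clock or boost setting.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; order must match the parsed columns.
FIELDS = ["index", "name", "pstate", "clocks.sm", "clocks.mem", "temperature.gpu"]


def parse_smi_csv(text):
    """Parse 'nvidia-smi --format=csv,noheader,nounits' output into dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != len(FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in rec))))
    return rows


def query_gpus():
    """Run nvidia-smi (must be on PATH) and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_smi_csv(out)


if __name__ == "__main__":
    # Hypothetical sample line, NOT a measurement from this thread.
    sample = "0, GeForce GTX 980, P2, 1303, 3004, 62\n"
    for gpu in parse_smi_csv(sample):
        print(gpu["name"], gpu["pstate"], gpu["clocks.sm"], "MHz")
```

Calling `query_gpus()` every few seconds while a work unit runs would show whether the SM clock moves under load. How those readings relate to the bad states is not something this thread establishes.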
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bruce »

billford wrote:Don't know if it's possible to turn boost off.
I don't know either, but that should have been addressed by the year-old discussion on NV forums.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

bruce wrote:What part of "not supported" do you fail to understand?
I understand it perfectly well, but you state that PG do not support overclocking, and (as your own opinion, admittedly) that boost is NVidia's version of overclocking. So do they support the built-in boost without any extra user-applied overclocking? Clarification is all I'm asking for.
bruce wrote:Even if they are aware, their policy is that they WILL NOT take it into account.
That implies that they don't.
bruce wrote:If you overclock, whether you disable BOOST or not, instabilities are your responsibility.
I accept that, I said so.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

bruce wrote:
billford wrote:Don't know if it's possible to turn boost off.
I don't know either, but that should have been addressed by the year-old discussion on NV forums.
"Someone else should have done it" is just another abdication of responsibility.
Last edited by billford on Sat Oct 17, 2015 3:40 pm, edited 1 time in total.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by Grandpa_01 »

bruce wrote:One of the posts in that topic suggests that NV should fix it but that it's probably not high on their list. That seems to still be the case.
I agree, and they probably never will, so who will that leave the responsibility with?

Remember, in many cases these are factory OC cards that work with all of the other WUs, including Core21, except the new 9xxx on Linux. Windows does not appear to have the same problem.
bruce wrote:My personal theory is that NV's boost clocks constitute their own version of overclocking. It seems likely that their transitions to/from the boosted clock and voltage state work smoothly in their tests, but if you start from an overclocked base setting and add their boost to that, it's not so smooth.
This could be true and could be the cause, but we do not know that. On the GTX 980 Classified cards I cannot find any evidence of any boost state being used on Linux when the card is running on the LN2 BIOS, which disables all of Nvidia's limitations. I see it being used on Windows, and Windows works fine with them, so the problem may be that it is not being used on Linux. That also suggests there may be a simple cure for now of bumping the voltage up a bit on the 980 Classifieds; I am currently working on that. The 970 SC is utilizing boost, though, and there may not be a cure for it if that is the problem.
bruce wrote:Unfortunately I don't expect that people who overclock take that into account. The OC rules have changed.
Probably true for the majority.
bruce wrote:Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.

If you do overclock, have you disabled the boost function?
They may not support it, but on their Teams & Stats web page they sure encourage it, and people are doing exactly what they are told to do on that page.
Teams & Stats wrote:reporting bugs and making suggestions about how to improve the software
That statement suggests PG will try to work on problems and that they are taking responsibility. If it is different from that, it should read "Factory OCed hardware is not supported".
Teams & Stats wrote:Teams & Stats


One of the best ways to help Folding@home is by recruiting your friends and family. Start by sharing our project with them. Then join a team or even start your own team. The more points you and your team earn, the closer we come to finding cures.

On this page you will find access to statistics for individuals and teams who have joined together to earn points and compete with other teams. Some of us are quite intense in our approach to folding. We have team websites, we supe up our computers' and we drive the technology forward by reporting bugs and making suggestions about how to improve the software.

It’s really the best way to …

maximize-your-effort


Statistics
Last edited by Grandpa_01 on Sat Oct 17, 2015 4:01 pm, edited 1 time in total.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

billford wrote:
toTOW wrote:Is it even possible to disable the boost function (or any dynamic management of voltage and frequency)?
It should be possible to control voltage and clock frequency- programs such as Afterburner can do it and NVidia say that anything that can be changed from the NVidia X Server Settings app (ie amount of over/underclock) can also be changed from a command line, so some sort of API presumably exists.

Don't know if it's possible to turn boost off.
I'm afraid that all the tools only apply an offset (either positive or negative) to the voltage or the clock, but the boost algorithm is still there :(
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

bruce wrote: Stanford does not support overclocking. Many Donors do overclock but I find it difficult to separate their reports from those from non-OC people.
OK, here's one: GTX 980, Linux, 355.11 drivers, running with my normal overclock:

Code: Select all

13:37:17:WU00:FS01:Starting 
13:37:17:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1445 -checkpoint 15 -gpu 0 -gpu-vendor nvidia 
13:37:17:WU00:FS01:Started FahCore on PID 12262 
13:37:17:WU00:FS01:Core PID:12266 
13:37:17:WU00:FS01:FahCore 0x21 started 
13:37:17:WU00:FS01:0x21:*********************** Log Started 2015-10-17T13:37:17Z *********************** 
13:37:17:WU00:FS01:0x21:Project: 9630 (Run 0, Clone 33, Gen 9) 
13:37:17:WU00:FS01:0x21:Unit: 0x0000000bab436c9b5609bee27d40327a 
13:37:17:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000 
13:37:17:WU00:FS01:0x21:Machine: 1 
13:37:17:WU00:FS01:0x21:Reading tar file core.xml 
13:37:17:WU00:FS01:0x21:Reading tar file integrator.xml 
13:37:17:WU00:FS01:0x21:Reading tar file state.xml 
13:37:17:WU00:FS01:0x21:Reading tar file system.xml 
13:37:17:WU00:FS01:0x21:Digital signatures verified 
13:37:17:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core 
13:37:17:WU00:FS01:0x21:Version 0.0.11 
13:37:38:WU00:FS01:0x21:Completed 0 out of 2000000 steps (0%) 
13:37:38:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900 
13:39:16:WU00:FS01:0x21:Completed 20000 out of 2000000 steps (1%) 
.
.
13:59:43:WU00:FS01:0x21:Completed 280000 out of 2000000 steps (14%) 
14:01:17:WU00:FS01:0x21:Completed 300000 out of 2000000 steps (15%) 
14:01:21:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint 
14:02:57:WU00:FS01:0x21:Completed 220000 out of 2000000 steps (11%) 
14:04:37:WU00:FS01:0x21:Completed 240000 out of 2000000 steps (12%) 
.
.
15:40:30:WU00:FS01:0x21:Completed 1380000 out of 2000000 steps (69%) 
15:42:10:WU00:FS01:0x21:Completed 1400000 out of 2000000 steps (70%) 
15:42:14:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint 
15:43:54:WU00:FS01:0x21:Completed 1320000 out of 2000000 steps (66%) 
15:45:33:WU00:FS01:0x21:Completed 1340000 out of 2000000 steps (67%) 
At the first bad state I immediately removed my overclock, i.e. returned it to factory base. As far as I'm concerned (supported, I think, by Grandpa_01's post) it was then no longer overclocked, but it still hit a second bad state.

I'm still waiting for it to finish (or get dumped with another bad state). (Edit- it completed successfully.)

I'll concede that one swallow doth not a summer make, but I'll keep note of future behaviour.
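
To keep note of future behaviour, the rollback pattern in a log like the one above can be pulled out mechanically. A minimal sketch in Python, assuming the line layout seen in the excerpt (the `0x21` core tag and the "Completed N out of M steps" and "Bad State detected" wording). The function name `find_rollbacks` and the keys it returns are made up for illustration and are not part of any FAH tool.

```python
import re

# Progress line, e.g.
# "14:01:17:WU00:FS01:0x21:Completed 300000 out of 2000000 steps (15%)"
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x21:Completed (\d+) out of (\d+) steps"
)
# Bad-state line, e.g.
# "14:01:21:WU00:FS01:0x21:Bad State detected... attempting to resume ..."
BAD_STATE = re.compile(r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x21:Bad State detected")


def find_rollbacks(lines):
    """Return one dict per bad-state event, with the step count reported
    just before it and the first step count reported after the resume."""
    events = []
    last_steps = None  # most recent step count reported
    pending = None     # bad-state event waiting for the next progress line
    for raw in lines:
        line = raw.strip()
        m = PROGRESS.match(line)
        if m:
            steps = int(m.group(2))
            if pending is not None:
                pending["steps_after"] = steps
                # The checkpoint lies at or below steps_after, so this
                # difference is a lower bound on the work that is repeated.
                pending["min_steps_redone"] = pending["steps_before"] - steps
                events.append(pending)
                pending = None
            last_steps = steps
            continue
        m = BAD_STATE.match(line)
        if m and last_steps is not None:
            pending = {"time": m.group(1), "steps_before": last_steps}
    return events


# Lines copied from the log excerpt in this post.
SAMPLE = [
    "14:01:17:WU00:FS01:0x21:Completed 300000 out of 2000000 steps (15%) ",
    "14:01:21:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint ",
    "14:02:57:WU00:FS01:0x21:Completed 220000 out of 2000000 steps (11%) ",
    "15:42:10:WU00:FS01:0x21:Completed 1400000 out of 2000000 steps (70%) ",
    "15:42:14:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint ",
    "15:43:54:WU00:FS01:0x21:Completed 1320000 out of 2000000 steps (66%) ",
]

for event in find_rollbacks(SAMPLE):
    print(event)
```

For the excerpt this reports two rollbacks, at 14:01:21 and 15:42:14, each repeating at least 80000 steps (300000 back to 220000, and 1400000 back to 1320000). At the roughly 100 seconds per 20000 steps seen in the log, that is about six to seven minutes of repeated work per bad state.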
Last edited by billford on Sat Oct 17, 2015 4:44 pm, edited 1 time in total.