It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.
snapshot wrote:If someone can point me to instructions on how to hack the relevant files then I might give it a try.
I did not try this myself but if you want to experiment: http://null-bin.blogspot.de/2015/08/how ... river.html
But don't blame me if it does not work or Windows refuses to boot anymore.
It's a shame that NV has not fixed their drivers yet. Apparently the 1050 and the 1050 Ti are the only two GPUs which cannot revert to older drivers without some kind of a hack. If you do try foldy's hack, let us know how it went.
So this is why the 750 Ti I have throws all sorts of crazy Event Log messages?
\Device\Video4
Graphics SM Global Exception on (GPC 0, TPC 4): Physical Multiple Warp Errors
\Device\Video4
Graphics SM Warp Exception on (GPC 0, TPC 4): Out Of Range Address
\Device\Video4
Graphics Exception: ESR 0x505e48=0x11000e 0x505e50=0x4 0x505e44=0xd3eff2 0x505e4c=0x7f
I recognize what warps are and how NVIDIA uses them to move data through the pipeline. It's pretty insane if NVIDIA still has not fixed this, and it's been going on for over six weeks...
Huh, odd. Yes, this first started occurring after I installed the 376.19 drivers. My Titan Black blew itself up, so I moved the 750 Ti into this system. In the original system the 750 Ti was using the 372 drivers.
Dunno if you want the reply in its own thread or not. Here's a partial log; I found the system had stopped GPU folding because of the number of failed units, so for now I just removed the GPU folding slot entirely. I believe the event log has an error for every single failed WU, always one of the three causes listed above.
Yes, the message "exception: Error downloading array interactionCount: clEnqueueReadBuffer (-5)" has only been found with the 375.xx/376.xx drivers. You can restart GPU folding if you reinstall an older driver version.
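For anyone wondering what that -5 actually is: it's OpenCL's CL_OUT_OF_RESOURCES error code, and on NVIDIA's OpenCL stack that's a common way for a kernel-side fault (the same class of fault as the "Out Of Range Address" warp exception in the event log above) to show up on the next host API call. Here's a minimal sketch of the pattern, written purely for illustration and not core_21's actual code; the kernel, buffer size, and the deliberate bug are all made up:

// Minimal OpenCL sketch (illustration only, not core_21 code): a kernel
// with an out-of-range write, followed by a blocking read whose error
// code is decoded. The fault and all names here are hypothetical.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kSrc =
    "__kernel void bad_write(__global float* out, int n) {\n"
    "    int i = get_global_id(0);\n"
    "    out[i + n] = 1.0f;   /* deliberately past the end of the buffer */\n"
    "}\n";

int main() {
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

    const int n = 1024;
    std::vector<float> host(n, 0.0f);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), nullptr, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "bad_write", nullptr);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(k, 1, sizeof(int), &n);

    size_t global = n;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);

    // After a faulting kernel, this blocking read typically comes back
    // with -5 (CL_OUT_OF_RESOURCES) on NVIDIA's OpenCL implementation.
    cl_int err = clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float),
                                     host.data(), 0, nullptr, nullptr);
    if (err == CL_OUT_OF_RESOURCES)          // CL_OUT_OF_RESOURCES == -5
        std::printf("clEnqueueReadBuffer failed: CL_OUT_OF_RESOURCES (-5)\n");
    else if (err != CL_SUCCESS)
        std::printf("clEnqueueReadBuffer failed: %d\n", err);
    return 0;
}

The point isn't the toy kernel itself, just that the error FAHClient logs is the host-side read reporting that something already went wrong earlier on the device.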
Thanks for the help Bruce. I made a regression post in the 376.19 thread.
Quick aside: do any flags give preference for the AVX WUs, or are those luck of the draw? Trying to keep my PPD above the EVGA monthly minimum with the Titan out of commission.
The explanation makes perfect sense and if it IS a bug that's been surfaced by the driver code, it's in ALL of our best interest for OpenMM to dig into it to fix it, because that's a possible vulnerability that can be leveraged by not so nice coders to make our hardware do not so nice things . . .
Kougar wrote:... do any flags give preference for the AVX WUs or are those luck of the draw? Trying to keep my PPD above the EVGA monthly minimum with the Titan out of commission
Off Topic, but see viewtopic.php?f=105&t=29273&p=291129#p291129
The explanation makes perfect sense and if it IS a bug that's been surfaced by the driver code, it's in ALL of our best interest for OpenMM to dig into it to fix it, because that's a possible vulnerability that can be leveraged by not so nice coders to make our hardware do not so nice things . . .
True, but it's strange that it appeared only in the 375-6 series of drivers.
Yes, a bug has been found in OpenMM, but fixing it doesn't resolve the issue. In other words (contrary to external appearances) SOMETHING is happening.
At this point they're continuing to coordinate, and when there is any concrete information about a fix, we'll hear about it, whether it's in OpenMM, in the 375-6 drivers, or both.
bruce wrote:
True, but it's strange that it appeared only in the 375-6 series of drivers.
Nah, not strange at all. The way I parse the message from NVidia is "We did some optimization of our code based on expected results and pushed it out in that version that started breaking things. Our 'optimization' brought to light unexpected behavior in the underlying support code of the OpenMM infrastructure."
Basically, NVidia is owning up to making a change in how the CUDA code interfaces with the OpenMM software, but it is causing entirely unexpected results to be returned, and they are under the belief that there's a synchronization error in the underlying code.
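To make "synchronization error" a bit more concrete in OpenCL terms (which is what core_21 drives on NVIDIA): the classic bug of that class is reading results back without expressing a dependency on the kernel that produced them. This is purely my own sketch of the bug class being described, not OpenMM's actual code, and the context/queue setup is omitted:

// Illustration only (not OpenMM code): with an out-of-order command queue,
// commands may overlap unless the host states the dependency explicitly.
#include <CL/cl.h>

void read_results(cl_command_queue q, cl_kernel k, cl_mem buf,
                  float* host, size_t bytes, size_t global)
{
    cl_event kernel_done;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr,
                           0, nullptr, &kernel_done);

    // Buggy pattern: a non-blocking read with no wait list. On an
    // out-of-order queue the copy can start before the kernel finishes,
    // so the host sees stale or partially written data.
    // clEnqueueReadBuffer(q, buf, CL_FALSE, 0, bytes, host, 0, nullptr, nullptr);

    // Correct pattern: make the read wait on the kernel's completion event
    // (or call clFinish(q) before touching the results on the host).
    clEnqueueReadBuffer(q, buf, CL_FALSE, 0, bytes, host,
                        1, &kernel_done, nullptr);
    clFinish(q);                 // make sure the copy has landed in host memory
    clReleaseEvent(kernel_done);
}

A blocking read on an in-order queue hides this sort of thing, which is why an "optimization" that changes how commands get scheduled could be exactly what exposed it.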
I'm glad to know that there is cooperation between the two teams, but now my curious nerd side has been poked with a sharp stick and wants to know the nitty gritty details, once it all gets hammered out.
I hope the OpenMM team will see it Nvidia's way. I would hate for Nvidia to have to branch or suppress some of their future Game Ready optimizations just to keep a bunch of non-gaming do-gooders like us happy! On the other hand, if they can control it at the app profile level at compile time, perhaps that isn't even an issue. Of course, this ultimately will mean that core_21 will need to be rebuilt if OpenMM implements a fix.
snapshot wrote:If someone can point me to instructions on how to hack the relevant files then I might give it a try.
I did not try this myself but if you want to experiment: http://null-bin.blogspot.de/2015/08/how ... river.html
But don't blame me if it does not work or Windows refuses to boot anymore.
Thanks for that. It's a bit laptop-specific but I think there are enough clues in there to at least have a look. I'll be using a test PC and taking a fresh Acronis TIH image first....