Anyone having trouble with the newest NVIDIA Driver 546.65?

Moderators: Site Moderators, FAHC Science Team

Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

Today FaH started crashing my PC. Either the screen goes black and Windows restarts, or the system freezes completely and I have to hold down the power button. It happened with the DUD-E and Alzheimer's tasks, and it is reproducible every time. Neither the CPU nor the GPU overheats, and other non-FaH computing jobs run perfectly fine.

Is anyone aware of any changes on the FaH side that may cause this? The only thing I changed is that I updated the graphics driver, but I'd hate to roll that back...

Anyone else with the same issue?
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by HaloJones »

I haven't updated my driver and I'm getting the same behaviour you describe.
single 1070

Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

Ah, that is very interesting. Makes me think it might be unrelated to the driver and saves me from the hassle of DDUing the driver and all those shenanigans.... For now I think I will try again later today and see if it is fixed...
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by HaloJones »

I'm getting a Windows DPC Watchdog Violation. If I have the logs open at the moment of failure, they show a CUDA error before the freeze. No changes on my end. The computer is 100% stable if folding is paused. No overclock. Single watercooled 1070. No CPU folding.
single 1070

Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

Having the log open when one anticipates a crash is a very smart idea. I might try that later IF I can be bothered to crash my PC on purpose. Well, I hope it's just an issue of broken work units that resolves itself soon.....
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by HaloJones »

if it's any use to you, testing my gpu has caused it to fail completely. I think it has been failing for a few days and is now gone entirely.
single 1070

Joe_H
Site Admin
Posts: 7922
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Joe_H »

There is no general report of problems with the most recent driver from Nvidia. Please provide extracts from your logs showing the system and folding configuration as well as the WUs that are failing with the associated error messages.
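
For anyone pulling such extracts by hand, a minimal Python sketch along these lines may help. It assumes the log line layout seen in the extracts posted in this thread (for example `18:17:01:I1::WU88:...`); `extract_wu_lines` and its keyword list are hypothetical and not part of the Folding@home client.

```python
# Hypothetical helper, not part of the Folding@home client.
# Assumption: every work-unit line carries a tag such as "::WU88:".

def extract_wu_lines(lines, wu_ids, keywords=("error", "exception", "failed")):
    """Return lines that belong to the given WUs or mention a keyword."""
    tags = tuple(f"::{wu}:" for wu in wu_ids)
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        lowered = text.lower()
        # Keep the line if it is tagged with a requested WU,
        # or if it contains one of the (assumed) trouble keywords.
        if any(tag in text for tag in tags) or any(k in lowered for k in keywords):
            kept.append(text)
    return kept
```

Called as `extract_wu_lines(open("log.txt"), ["WU87", "WU88"])`, it returns only the lines for those two work units plus any line mentioning one of the keywords; the file name here is a placeholder.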

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

HaloJones wrote: Tue Jan 23, 2024 12:31 pm if it's any use to you, testing my gpu has caused it to fail completely. I think it has been failing for a few days and is now gone entirely.
Well, that is troubling news, that your GPU failed. It doesn't bode well for mine either. So far I only have issues with FaH, so maybe the CUDA platform in my GPU has failed...

Which test did you run, so I can try that myself?
Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

I had some time and ran a few tests. It is not driver related: I reverted to 546.33, which I know for a fact has worked flawlessly in the past. Same crashes.

I checked the Windows logs, nothing suspicious there either. Since the GPU runs fine for hours during heavy gaming where she also pulls upwards of 440 Watts, it can't reasonably be the PSU. Temps are not the issue either.

So yeah, it points to a faulty CUDA platform in my GPU. Sad times....

I can't attach text files here, so I will keep the extracts from the log files as brief as I can while retaining the relevant information. Please let me know what additional data might be helpful.

18:17:01:I1::WU88:Project: 18215 (Run 11155, Clone 1, Gen 11)
...
18:17:01:I1::WU88:       Core: Core23
18:17:01:I1::WU88:       Type: 0x23
18:17:01:I1::WU88:    Version: 8.0.3
...
18:17:01:I1::WU88:There are 4 platforms available.
18:17:01:I1::WU88:Platform 0: Reference
18:17:01:I1::WU88:Platform 1: CPU
18:17:01:I1::WU88:Platform 2: OpenCL
18:17:01:I1::WU88:  opencl-device 0 specified
18:17:01:I1::WU88:Platform 3: CUDA
18:17:01:I1::WU88:  cuda-device 0 specified
18:17:17:I1::WU88:Attempting to create CUDA context:
18:17:17:I1::WU88:  Configuring platform CUDA
18:17:22:I1::WU88:  Using CUDA on CUDA Platform and gpu 0
18:17:22:I1::WU88:  GPU info: Platform: CUDA
18:17:22:I1::WU88:  GPU info: PlatformIndex: 0
18:17:22:I1::WU88:  GPU info: Device: NVIDIA GeForce RTX 3090 Ti
18:17:22:I1::WU88:  GPU info: DeviceIndex: 0
18:17:22:I1::WU88:  GPU info: Vendor: 0x10de
18:17:22:I1::WU88:  GPU info: PCI: 45:00:00
18:17:22:I1::WU88:  GPU info: Compute: 8.6
18:17:22:I1::WU88:  GPU info: Driver: 12.3
18:17:22:I1::WU88:  GPU info: GPU: true
18:17:22:I1::WU88:Completed 0 out of 1250000 steps (0%)
18:17:23:I1::WU88:Checkpoint completed at step 0
18:18:10:I1::WU88:Completed 12500 out of 1250000 steps (1%)
18:18:57:I1::WU88:Completed 25000 out of 1250000 steps (2%)
18:18:58:I1::WU88:Checkpoint completed at step 25000
18:19:45:I1::WU88:Completed 37500 out of 1250000 steps (3%)
[log ends abruptly]

17:56:11:I1::WU87:Project: 12245 (Run 0, Clone 337, Gen 12)
17:56:11:I1::WU87:Reading tar file core.xml
17:56:11:I1::WU87:Reading tar file integrator.xml
17:56:11:I1::WU87:Reading tar file state.xml.bz2
17:56:11:I1::WU87:Reading tar file system.xml.bz2
17:56:11:I1::WU87:Digital signatures verified
17:56:11:I1::WU87:Folding@home GPU Core23 Folding@home Core
17:56:11:I1::WU87:Version 8.0.3
17:56:11:I1::WU87:  Checkpoint write interval: 50000 steps (2%) [50 total]
17:56:11:I1::WU87:  JSON viewer frame write interval: 25000 steps (1%) [100 total]
17:56:11:I1::WU87:  XTC frame write interval: 25000 steps (1%) [100 total]
17:56:11:I1::WU87:  Global context and integrator variables write interval: disabled
17:56:11:I1::WU87:There are 4 platforms available.
17:56:11:I1::WU87:Platform 0: Reference
17:56:11:I1::WU87:Platform 1: CPU
17:56:11:I1::WU87:Platform 2: OpenCL
17:56:11:I1::WU87:  opencl-device 0 specified
17:56:11:I1::WU87:Platform 3: CUDA
17:56:11:I1::WU87:  cuda-device 0 specified
17:56:14:I1::WU87:Attempting to create CUDA context:
17:56:14:I1::WU87:  Configuring platform CUDA
17:56:17:I1::WU87:  Using CUDA on CUDA Platform and gpu 0
17:56:17:I1::WU87:  GPU info: Platform: CUDA
17:56:17:I1::WU87:  GPU info: PlatformIndex: 0
17:56:17:I1::WU87:  GPU info: Device: NVIDIA GeForce RTX 3090 Ti
17:56:17:I1::WU87:  GPU info: DeviceIndex: 0
17:56:17:I1::WU87:  GPU info: Vendor: 0x10de
17:56:17:I1::WU87:  GPU info: PCI: 45:00:00
17:56:17:I1::WU87:  GPU info: Compute: 8.6
17:56:17:I1::WU87:  GPU info: Driver: 12.3
17:56:17:I1::WU87:  GPU info: GPU: true
17:56:17:I1::WU87:Completed 0 out of 2500000 steps (0%)
17:56:17:I1::WU87:Checkpoint completed at step 0
.....
18:06:08:I1::WU87:Completed 875000 out of 2500000 steps (35%)
18:06:25:I1::WU87:Completed 900000 out of 2500000 steps (36%)
18:06:25:I1::WU87:Checkpoint completed at step 900000
18:06:42:I1::WU87:Completed 925000 out of 2500000 steps (37%)
[log ends]
same issue for "Project: 12280 (Run 0, Clone 336, Gen 46)", log looks basically the same so I didn't post it

I did not receive any error message and am unsure where I would find one.
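
One way to see where each WU stopped is to scan the log for the last progress line per work unit and check whether a FINISHED_UNIT line ever followed. This is a minimal sketch, assuming the line layout of the extracts above; `summarize_wus` is a hypothetical helper, not a client feature. A WU that shows progress but no finish marker only tells you where the log stopped, not why.

```python
import re

# Assumed layout, taken from the extracts above: "HH:MM:SS:I1::WU88:message"
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d):\w+::(WU\d+):(.*)$")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize_wus(lines):
    """Map each WU tag to its last timestamp, last progress, and finish flag."""
    summary = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a work-unit line (or an unexpected layout)
        stamp, wu, message = match.groups()
        entry = summary.setdefault(
            wu, {"last_seen": stamp, "steps": None, "total": None, "finished": False}
        )
        entry["last_seen"] = stamp
        progress = STEP_RE.search(message)
        if progress:
            entry["steps"] = int(progress.group(1))
            entry["total"] = int(progress.group(2))
        if "FINISHED_UNIT" in message:
            entry["finished"] = True
    return summary
```

Feeding it the whole log file line by line gives one entry per WU; entries with `finished` still `False` are the ones whose log stopped mid-run.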

Also, if anyone knows a good way to test and diagnose the CUDA platform of one's GPU, any info would be appreciated.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by HaloJones »

I tried Heaven (ancient) which seemed to work. Tried to set up Timespy but I hate Steam with an overriding passion. I tried Furmark and the computer instantly crashed. On restart the card was no longer present in Device Manager. Thankfully I have a CPU with an IGP so am working off that until I get a replacement GPU.
single 1070

Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

HaloJones wrote: Wed Jan 24, 2024 2:31 pm I tried Heaven (ancient) which seemed to work. Tried to set up Timespy but I hate Steam with an overriding passion. I tried Furmark and the computer instantly crashed. On restart the card was no longer present in Device Manager. Thankfully I have a CPU with an IGP so am working off that until I get a replacement GPU.
Ah, that sucks. Sorry to hear that!

Well, now I am extremely hesitant to stress my GPU further, because everything except CUDA works fine, SO FAR.... Thanks for the info nonetheless. I might have to bite the bullet and try that at some point to know for sure....

My CPU doesn't have an IGP, so I'd have to raid my B-rig for a GPU.... :D Sad times.
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by toTOW »

I was about to suggest testing the GPU and/or PSU for stability after reading the first post ... HaloJones proved me right. :D

Levels of GPU hardware issues:
- NaNs detected on GPU
- GPU/driver resets
- system shutdowns/PSU triggering protections
- smoke
I had a 980 Ti that went through all steps ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Beercules48
Posts: 7
Joined: Sun Jan 21, 2024 10:16 pm
Hardware configuration: 5900X OC @ 4650 MHz | 3090 Ti OC @ 2175 MHz | 64 GB RAM | Win 11
Location: Germany

Re: Anyone having trouble with the newest NVIDIA Driver 546.65?

Post by Beercules48 »

Well, I tried it again after the most recent NVIDIA driver update. To my surprise, it worked. I dunno what happened, but I'm happy it works again and my GPU seems to be fine.

22:01:31:I1::WU89:Completed 5000000 out of 5000000 steps (100%)
22:01:31:I1::WU89:Average performance: 263.415 ns/day
22:01:31:I1::WU89:Checkpoint completed at step 5000000
22:01:36:I1::WU89:Saving result file ..\logfile_01.txt
22:01:36:I1::WU89:Saving result file checkpointIntegrator.xml.bz2
22:01:36:I1::WU89:Saving result file checkpointState.xml.bz2
22:01:36:I1::WU89:Saving result file positions.xtc
22:01:36:I1::WU89:Saving result file science.log
22:01:36:I1::WU89:Folding@home Core Shutdown: FINISHED_UNIT
22:01:37:I1::WU89:Core returned FINISHED_UNIT (100)
22:01:37:I1::Added new work unit: cpus:0 gpus:gpu:45:00:00
22:01:37:I1::WU89:Uploading WU results 
....
22:02:18:I1::WU89:Credited 
so I'm back in the fold :lol: