I usually process some work units for you as a soak test of new gaming/server hardware. Recently I had a computer come in with a 3070 that was getting an upgrade to a 4090. The client stated that there were some stability concerns, so I started with a soak test using an older version of F@H with the advanced control: 1x GPU slot + 7x CPU slots (8-core CPU). This was run for some hours, then the 4090 was added and it was run some more with no issues. Folding was then paused (I pressed Finish and waited for it to end) and some real-world 3D application testing was completed, all without issues. Usually I remove F@H before delivering the finished systems, or simply reinstall the entire O/S in the case of servers.
Fast forward 4 days for the 4090 PC and the computer is reliably BSODing as soon as it logs in. Safe mode is not impacted, and normal mode with no GPU driver is not impacted. It was running driver 566.36, clean installed. I got the system back in and put the old card back in: no stability problems. I then started an RMA for the GPU before noticing the computer was using a lot of power for an idle system. Upon further investigation, I had forgotten to remove F@H from the system. Typically I would still say sorry, I think the GPU is faulty, but something doesn't feel 'right' here.
No other software is triggering instability in my testing, including other GPU compute tasks that cause higher GPU power usage than F@H. FurMark ran for several hours without crashing or overheating. The power target was then increased to ensure we had headroom, and no issues were seen (500 W power draw). MSI Kombustor ran for several hours with no errors or issues detected. The BSOD observed is always DPC_WATCHDOG_VIOLATION, basically as soon as the computer logs in, even when stone cold. Looking around on the forum, other people report that this can be a hardware or software issue.
Sadly I needed to get the system operational and delivered back to the client, so I have not managed to collect information on whether a specific work unit triggers this, but I felt you may find my experience useful.
4090 dpc watchdog violation BSOD
-
- Site Admin
- Posts: 7986
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4, MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: 4090 dpc watchdog violation BSOD
A problem that has been reported in the past with newer-generation cards is how fast the power draw can rise as a WU starts processing on a GPU. Some power supplies can't keep up and temporarily drop the voltage on the 12 V rail. In this case the system has gone from a GPU with a typical TDP of 220 W to one with a TDP of 450 W. F@h GPU processing can often approach 90% of that, and in a fraction of a second. It could also be a faulty VRM on the card, but that is less likely since it runs okay with other high-power applications.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: 4090 dpc watchdog violation BSOD
Please try an older driver, such as 535.?? for Linux or 546.65 for Windows. The 566 drivers are known to cause problems:
https://forums.developer.nvidia.com/t/d ... dll/316960
A search for "nvidia driver problems 566.36" will reveal a number of reports like the above.
Re: 4090 dpc watchdog violation BSOD
Thanks for the suggestions. Sadly I didn't get time to look into this any further than above because the system needed to be back with the user. I simply wanted to point out my experience in case it was useful for you.
Power-supply-wise, I went from a 650 W PSU with the 3070 up to a 1000 W PSU for the 4090. Maximum observed CPU power draw with O/C is 200 W. Maximum observed from the GPU is 500 W. So, at least in theory, the power supply is sufficient even with ancillary loads. I am uncertain whether FurMark is better or worse in terms of causing a power surge, but it didn't crash the system.
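For anyone wanting to repeat the sum, here is a rough sketch of the sustained power budget using the figures above. The 75 W ancillary load (fans, drives, motherboard) is an assumed placeholder, not something measured on this system, and the function name is just for illustration. It only covers steady-state draw; it says nothing about how a PSU copes with very fast load steps.

```python
def psu_headroom_watts(psu_rating_w, cpu_peak_w, gpu_peak_w, ancillary_w=0.0):
    """Return sustained-load headroom in watts (negative means over budget)."""
    return psu_rating_w - (cpu_peak_w + gpu_peak_w + ancillary_w)


PSU_RATING_W = 1000.0  # PSU fitted for the 4090 (from the post)
CPU_PEAK_W = 200.0     # maximum observed CPU draw with O/C (from the post)
GPU_PEAK_W = 500.0     # maximum observed GPU draw (from the post)
ANCILLARY_W = 75.0     # ASSUMPTION: placeholder for fans, drives, motherboard

headroom = psu_headroom_watts(PSU_RATING_W, CPU_PEAK_W, GPU_PEAK_W, ANCILLARY_W)
utilisation = (PSU_RATING_W - headroom) / PSU_RATING_W

print(f"Headroom: {headroom:.0f} W, utilisation: {utilisation:.1%}")
```

With these inputs the sustained load sits well inside the PSU rating, which is why a steady-state budget alone does not explain the crashes.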
I would lean towards driver issues personally on this occasion.
-
- Site Moderator
- Posts: 6394
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: 4090 dpc watchdog violation BSOD
It might just be a bad GPU ... was it factory overclocked? Did you try to reduce the settings to stock ones?
Re: 4090 dpc watchdog violation BSOD
The card was BSODing at stock settings. I tried increasing the power target with FurMark running to ensure I had headroom, then reset to default.
I guess the card could very well be bad; it was a factory-refurbished Zotac one which was £1000 less than retail price. To the eye it looked like a new card. I stopped short of sending the card back because it completed several hours in FurMark with no stability issues apparent. Some time was spent in 'artifact scanning' mode and none were found. If I were going to send it back I would have to demonstrate the issue, and doing so using F@H would be tricky because different work units have different demands.
To be clear, no instability was found in any other software or test. There was also no instability in 3+ hours of playing video games and running other workloads, including some of your work units, before I delivered the system to the end user.
I'm not complaining here, merely sharing my experience in the hope you or others could find it useful. The suggestion that the driver I used could be buggy feels very possible to me.
I've taken the 3070 I removed, installed a water block on it and am currently stability testing my overclock with some folding.