random system hang

Posted: Sun Jan 03, 2021 7:55 pm
by djgibbons
I have started getting random system hangs, or freezes. It happens after the monitors turn off due to inactivity and the GPUs start folding (on LIGHT setting). Everything is fine for 5-8 hours, and then the system is unresponsive. I can get the same problem if I run folding on MEDIUM setting for 1-6 hours. At this time I have not allowed the monitors to turn off (just screen saver), and it is very stable. But I don't think the GPUs do much folding with the screen saver on.

My computer is about 2 years old and otherwise robust. I have the latest update for the WIN10 OS. I have an Nvidia Quadro RTX4000 driving 2 monitors, and a Quadro K4200 I keep just for folding. I have contacted Nvidia about this behaviour to make sure the cards are not too hot and are still running within their design parameters.

I have updated all the drivers and BIOS in the past month. I also use Restoro to do some clean-up, but this only delays the hang for a little while.

Are there any known issues with 2 cards? Should I have each card driving one monitor?

Re: random system hang

Posted: Sun Jan 03, 2021 8:28 pm
by JimboPalmer
If you go to Control Panel and search for power, go to the power and screen settings. Reduce the time before the screen powers down and make sure the PC never sleeps.

Re: random system hang

Posted: Sun Jan 03, 2021 8:46 pm
by djgibbons
Done. I will post the results when I have some.

Re: random system hang

Posted: Sun Jan 03, 2021 9:54 pm
by PantherX
Just wondering if you have installed Nvidia drivers from the official site? Reason is that if you have done the feature update to 20H2, then you might be using Microsoft Drivers which are known to be troublesome when it comes to folding. Using the drivers from Nvidia would resolve that issue.

Re: random system hang

Posted: Mon Jan 04, 2021 12:25 am
by djgibbons
I did get a hang about 3 hours after changing the screen settings with reduced power down time. I already had the PC at not sleeping. When I checked the driver version it matched Nvidia's latest, but I reinstalled it as a clean installation to be sure. What I did notice is that the file path given as the first step of the process was historically correct, but I can't find the actual directory based on the driver version name (460.89). So, something has changed with this RTX change-over.

Could there be a problem if the 2 graphics cards don't know that the other one exists?

Re: random system hang

Posted: Mon Jan 04, 2021 2:11 am
by JimboPalmer
https://www.techpowerup.com/gpu-specs/q ... 4000.c3336
is going to pull 165 watts and hopes for a 450 watt power supply.

https://www.techpowerup.com/gpu-specs/q ... 4200.c2602
pulls another 108 watts, so a 600 watt power supply may be needed (though there is a real chance 550 watts is enough).
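
As a rough sketch of the arithmetic above (not a proper power budget): take the 450 watt supply suggested for the first card and add the 108 watts quoted for the second. The list of common supply sizes and both function names are assumptions made up for illustration.

```python
# Figures quoted in this post.
RTX4000_SUGGESTED_PSU_W = 450  # supply size suggested for a system with the RTX 4000
K4200_DRAW_W = 108             # additional draw quoted for the K4200

# Assumption: an illustrative list of retail power supply sizes, not an official one.
COMMON_PSU_SIZES_W = [450, 550, 600, 650, 750, 850, 1000]


def estimate_psu_w(base_psu_w, extra_draw_w):
    """Add the second card's draw on top of the supply suggested for the first."""
    return base_psu_w + extra_draw_w


def next_common_size(required_w, sizes=COMMON_PSU_SIZES_W):
    """Return the smallest listed supply size that covers required_w, or None."""
    for size in sorted(sizes):
        if size >= required_w:
            return size
    return None


required = estimate_psu_w(RTX4000_SUGGESTED_PSU_W, K4200_DRAW_W)
print(required, next_common_size(required))
```

The estimate lands just above 550 watts, which is why 600 watts is the safe pick and 550 watts is only probably enough.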

How many watts is your power supply rated for?

And in a similar vein, is your PC overheating?

Both Speccy and GPU-Z can measure temperatures.

https://www.ccleaner.com/speccy/download/standard

https://www.techpowerup.com/gpuz/

Re: random system hang

Posted: Mon Jan 04, 2021 2:22 am
by djgibbons
I have a 1000W power supply, so there should be plenty of juice. I have GPU-Z installed per Nvidia's request and have sent them data. I also use CPUID HWMonitor for getting an idea of where things are at. My case has 3 input fans and 3 output fans, with another on the CPU cooler (heat pipe). I have no signs of overheating, unless I have exceeded the temperature limit on one or both of the cards. I have a JPG image of steady state conditions while running FAH at MEDIUM. How do I attach it here?

Re: random system hang

Posted: Mon Jan 04, 2021 2:24 am
by PantherX
djgibbons wrote:...When I checked the driver version it matched Nvidia's latest, but I reinstalled it as a clean installation to be sure. What I did notice is that the file path given as the first step of the process was historically correct, but I can't find the actual directory based on the driver version name (460.89). So, something has changed with this RTX change-over...
Apologies for not providing a bit more context... the driver that Microsoft would install would show the "right" value. The issue is that it would only do the Driver install and not the additional OpenCL stuff that is normally packaged in Nvidia's Driver. However, I don't know of a way to identify if the driver was installed by Microsoft or Nvidia... it's only when F@H encounters issues and a re-install magically fixes it.
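
This does not tell you who installed the driver, but a possible quick check is whether any OpenCL platform is visible at all. A minimal sketch, assuming 64-bit Python and that the OpenCL loader library can be found on the system search path; the function name `count_opencl_platforms` is made up for illustration. It calls the standard OpenCL function `clGetPlatformIDs` through `ctypes`.

```python
import ctypes
import ctypes.util


def count_opencl_platforms():
    """Return the number of visible OpenCL platforms, or None if no loader is found."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return None  # no OpenCL loader library on the search path
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None  # library was located but could not be loaded

    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    lib.clGetPlatformIDs.argtypes = [
        ctypes.c_uint,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_uint),
    ]
    lib.clGetPlatformIDs.restype = ctypes.c_int

    count = ctypes.c_uint(0)
    status = lib.clGetPlatformIDs(0, None, ctypes.byref(count))
    # A non-zero status (for example when no vendor driver is registered)
    # is treated here as "no platforms".
    return count.value if status == 0 else 0


print(count_opencl_platforms())
```

If this prints `None` or `0` on a machine with an Nvidia card, that would be consistent with the OpenCL part of the driver package being missing, and a clean reinstall from Nvidia's installer would be the thing to try.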
djgibbons wrote:...Could there be a problem if the 2 graphics cards don't know that the other one exists?
For F@H, as long as it sees two supported GPUs, it will continue to work without issues.
For the drivers, as long as both GPUs are supported by the same driver package, it would work fine without issues... theoretically speaking.

Have you tried to fold only on 1 GPU at a time to test it out?
djgibbons wrote:...I have a JPG image of steady state conditions while running FAH at MEDIUM. How do I attach it here?
The forum doesn't provide an image hosting feature. Instead, you can use third party sites to upload the image and then share it here. A commonly used one is: https://imgur.com/

Re: random system hang

Posted: Thu Jan 07, 2021 10:59 pm
by djgibbons
I had installed the latest driver straight from Nokia. I heard back from them, and all the data I sent them showed normal operation of both cards. My next step is to disable one of them to see if that helps.

Re: random system hang

Posted: Thu Jan 07, 2021 11:05 pm
by JimboPalmer
Is Nokia a typo? I would not use a driver from Nokia.

Re: random system hang

Posted: Fri Jan 08, 2021 10:15 pm
by djgibbons
Sorry, Nvidia. One of my company's customers is Nokia, and they are very demanding. So, I disabled the K4200 as it is the oldest card and has been repaired once before under warranty. I then set FAH to Medium and it has run without a hang now for almost 24 hours. If I can't use the K4200 for FAH, I might as well remove it from my computer.

Re: random system hang

Posted: Fri Jan 08, 2021 10:54 pm
by JimboPalmer
It may be possible the K4200 will work in another PC (this assumes you have more than one PC).

Re: random system hang

Posted: Sat Jan 09, 2021 4:40 pm
by djgibbons
I do have another desktop, as well as 2 older laptops and an all-in-one. These four machines are running FAH on High setting without issues, so I might let sleeping dogs lie. I am also running FAH on my work computer at Medium, so this gives me 12 clients on average. It would be nice to have the K4200 in the mix, but I am still talking to Nvidia to see if there is any way to test it for internal defects that can cause a random hang. What would help me the most, I think, is a tool that can log all system component behaviours every few seconds to see if we can capture the moment a hang hits.
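
One possible way to get that kind of log for the GPUs, as a sketch only: poll `nvidia-smi` every few seconds and append the readings to a CSV file, forcing each sample to disk so the last line written before a hang survives the reboot. This assumes `nvidia-smi` (installed with the Nvidia driver) is on the PATH. The file name, the 5 second interval, and the function names are made up for illustration, and it covers only the GPUs, not every system component.

```python
import csv
import os
import subprocess
import time
from datetime import datetime

# Fields requested from nvidia-smi; an older card may report some as unsupported.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "power.draw", "clocks.sm"]


def parse_sample(text):
    """Split nvidia-smi CSV output into one list of stripped values per GPU."""
    return [[value.strip() for value in line.split(",")]
            for line in text.splitlines() if line.strip()]


def read_gpus():
    """Run nvidia-smi once and return one row of readings per GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_sample(result.stdout)


def log_gpus(path="gpu_log.csv", interval_s=5, max_samples=None):
    """Append timestamped GPU readings to path until max_samples is reached."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        taken = 0
        while max_samples is None or taken < max_samples:
            stamp = datetime.now().isoformat(timespec="seconds")
            for row in read_gpus():
                writer.writerow([stamp] + row)
            # Push the sample to disk now; buffered data would be lost in a hang.
            f.flush()
            os.fsync(f.fileno())
            taken += 1
            time.sleep(interval_s)
```

Calling `log_gpus()` before starting a folding session and reading the tail of `gpu_log.csv` after the next hang would show the last recorded temperature, load, and power draw for each card.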

Re: random system hang

Posted: Sun Jan 10, 2021 1:27 am
by bruce
With the right drivers, the K4200 is supported. Professional graphics boards are especially designed to be powerful on FP64 computations. FAH uses mainly FP32 calculations plus a small percentage of FP64. It should work, though.

Re: random system hang

Posted: Sun Jan 10, 2021 2:12 am
by djgibbons
The drivers are the latest that Nvidia offers. And both cards use the same one, so there shouldn't be a conflict. I know that Nvidia offers an interface to connect 2 cards to make them work together, but have not tried it. Would FAH see them as 2 slots if tied together?

I am holding onto the idea that there is an interface problem where the traffic on the motherboard hits a jam because the 2 graphics cards are processing so much more data than the CPU, but the K4200 is lagging both the RTX4000 and the CPU.

I am also considering that having 2 monitors on the RTX4000 and none on the K4200 could present an imbalance of sorts. It might be worth a test to put my 1K monitor on the K4200 and keep the 4K monitor on the RTX4000, just for shiggles.