Desktop freeze when GPU folding
Posted: Thu Apr 02, 2020 8:02 pm
by LancerDL
I suspect it is related to GPU folding, as when there's no GPU work unit and only CPU folding, there's no freeze. However, the whole system seems to freeze, not just the screen. The freeze manifests as a screen freeze first, and an audio freeze shortly afterwards.
I have a GIGABYTE AORUS GTX 1080 Ti. I used a monitoring app and the fans are spinning and GPU temp goes to a high of about 79 Celsius. I've actually seen the system freeze when it drops a bit to 75 Celsius.
I've only started running Folding@Home recently, so I don't know if this is a hardware or software issue. Where should I start looking?
Re: Desktop freeze when GPU folding
Posted: Thu Apr 02, 2020 10:44 pm
by jrweiss
I'd start with looking at your GPU cooling.
First open the case and ensure it is free of dust. Clean as needed.
Then make sure you have adequate air flow through the case. You should have 2 or more 120-140mm intake fans in front, and a 120mm exhaust fan at the rear as a minimum. If you have solid backplane covers on either side of the GPU backplane, replace them with slotted covers (or temporarily remove them if you don't have slotted covers handy). That will help remove hot air from the GPU cooler. It may help to cover up any other slots in the back of the case (other than the exhaust fan) with tape to help direct the air around the GPU card.
Re: Desktop freeze when GPU folding
Posted: Thu Apr 02, 2020 11:32 pm
by ipkh
In general, freezes like that are a bad sign, usually of a failing GPU. Desktop video stutters might just be excessive load on the GPU, but audio shouldn't freeze because of that. It would indicate a more serious instability in either the CPU or the GPU.
Re: Desktop freeze when GPU folding
Posted: Sat Apr 04, 2020 1:54 am
by LancerDL
I have a front intake and back exhaust fans, each ~120 mm. The Power Supply is on the bottom with a fan directed upwards. There's also venting on the top. All the fans are operating and unobstructed. I don't think it's a cooling issue, as the temperature doesn't exceed 80 degrees Celsius on either CPU or GPU. I just can't conceive of it being anything other than a heating issue. How hot do other people's rigs get when folding at Medium level?
Regarding the audio, it freezes maybe 2 or more minutes after the video freezes. I assume the system has gone into full blown crash by that point. I don't know.
Re: Desktop freeze when GPU folding
Posted: Sat Apr 04, 2020 2:02 am
by PantherX
Welcome to the F@H Forum LancerDL,
Just wondering what brand of PSU (Power Supply Unit) do you have and what wattage is it?
Re: Desktop freeze when GPU folding
Posted: Sat Apr 04, 2020 6:03 am
by Rel25917
Temporary or permanent freeze? The GPU core runs a sanity check using the CPU. If it's a temporary freeze at a regular interval, maybe it's just the CPU getting swamped with the sanity check.
Re: Desktop freeze when GPU folding
Posted: Sat Apr 04, 2020 3:44 pm
by jrweiss
Many GPUs are VERY sensitive to voltage fluctuations. If your Power Supply is unstable at high loads, it might have voltage fluctuations that cause the freeze.
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 2:04 am
by LancerDL
Sorry for the delay everyone.
My power supply is an Antec EarthWatts Platinum 450 W 80+ ATX Power Supply. PC Part Picker says my profile should consume only up to 432 W. Do you think the margin may be too narrow?
The freezes are permanent (until I power off/on the system). I let it run overnight once and it was frozen in the morning.
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 4:42 am
by PantherX
LancerDL wrote:...My power supply is an Antec EarthWatts Platinum 450 W 80+ ATX Power Supply. PC Part Picker says my profile should consume only up to 432 W. Do you think the margin may be too narrow?
The freezes are permanent (until I power off/on the system). I let it to run overnight once and it was frozen in the morning.
I would say that is likely the issue. Whatever wattage is calculated, you double it and then buy the PSU. In your case, the calculated figure is 432 Watts, so the PSU should be at least 850 Watts. The reason is that most PSUs operate most efficiently at ~50% load. While you have a high efficiency PSU, you still have to consider "lost" power due to conversion.
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 4:45 am
by Darth_Peter_dualxeon
Yes, that power supply is way too small. 432/450 = 96%, so you are using 96% of its rated capacity; that's far too small a margin.
(And PC Part Picker is only a guess; some fans, HDDs, and other parts may have different consumption.)
Also, a power supply usually isn't one single pool of power: there is a current limit (and so a power limit) on each of the different output voltages.
For example, the 12V output of the PSU may be overloaded if it is not enough for the CPU and GPU together...
(Look at the wattage of the 12V output; it should be printed on the side of the PSU.)
Usually people have a significantly bigger power supply than what the system needs.
The consumption of the system at full load should be less than circa 80% of the rating (ideally around 50%, so you have space to upgrade the system if you want).
Power supplies at nearly full load lose efficiency and overheat quickly.
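The headroom arithmetic in this post can be written out as a minimal sketch. The 432 W and 450 W figures are the ones quoted in the thread; the function names are made up for illustration, and the 80% and 50% targets are the rules of thumb given above, not a manufacturer specification.

```python
def psu_load_fraction(estimated_draw_w, psu_rating_w):
    """Fraction of the PSU's rated capacity that the estimated draw would use."""
    return estimated_draw_w / psu_rating_w


def min_psu_rating(estimated_draw_w, target_load):
    """Smallest PSU rating (in watts) that keeps the draw at or below target_load."""
    return estimated_draw_w / target_load


draw_w = 432    # PC Part Picker estimate quoted in the thread
rating_w = 450  # rating of the original Antec unit

print(f"Load on the 450 W unit: {psu_load_fraction(draw_w, rating_w):.0%}")
print(f"Rating needed for 80% load: {min_psu_rating(draw_w, 0.80):.0f} W")
print(f"Rating needed for 50% load: {min_psu_rating(draw_w, 0.50):.0f} W")
```

By these rules of thumb the estimate calls for roughly 540 W at the 80% target and roughly 864 W at the 50% target, which is where the "at least 850 Watts" suggestion comes from.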
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 5:03 am
by iceman1992
An alternative guess: besides power supply issues, I would suggest you also check for DPC latency issues (if you're on Windows).
https://www.thesycon.de/eng/latency_check.shtml
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 5:12 am
by DCWerick
Yes, I agree with the other folders.
My 1070 Ti temps are around 80-85 C (full fan speed), set from MSI Afterburner.
My PSU is an EVGA 850 watt unit.
If you have a spare PSU lying around (even if it's under 450 watts), you can 'frankenstein' it: plug that PSU into your 1080 Ti and turn it on before booting up your system.
Then buy one of at least 850 watts.
I'm going to get a new PSU of at least 1200 W when I upgrade to the 20xx series this year, once this virus is squashed and we all go back to work.
Re: Desktop freeze when GPU folding
Posted: Thu Apr 09, 2020 2:44 pm
by jrweiss
LancerDL wrote:My power supply is an Antec EarthWatts Platinum 450 W 80+ ATX Power Supply. PC Part Picker says my profile should consume only up to 432 W. Do you think the margin may be too narrow?
Yes. CPUs in "turbo" mode can draw WAY more power than their TDP suggests. Spikes in GPU load may also cause an instantaneous drop in voltage. Gigabyte recommends a minimum 600W PSU.
Even if the PSU could handle the total load, that PSU has a divided 12V rail, with 30A (360W) total on each rail (but not all of that 720W can be used!). One of the 12V rails may be overloaded, since the GPU can draw 250W and runs almost exclusively on 12V. You may have to check the spec sheet and see if you can re-allocate plugs so that the appropriate rails feed the appropriate accessories (e.g., one 8-pin plug from each rail to the GPU, or if the 2 plugs to the GPU are the ONLY load on a rail).
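As a rough sketch of the per-rail budget described above: the 30 A rail limit and the 250 W GPU figure come from this post, while the 84 W CPU figure and the idea that the CPU shares a rail with the GPU are placeholders assumed for illustration. The actual rail assignment has to come from the PSU's spec sheet.

```python
RAIL_VOLTAGE = 12.0  # volts


def rail_capacity_w(rail_amps, volts=RAIL_VOLTAGE):
    """Power available on one rail: P = I * V."""
    return rail_amps * volts


def rail_headroom_w(rail_amps, loads_w):
    """Watts left on a rail after subtracting every load plugged into it."""
    return rail_capacity_w(rail_amps) - sum(loads_w.values())


# GPU and CPU on the same rail (the CPU number is an assumed placeholder).
shared_rail = {"gpu": 250, "cpu (assumed)": 84}
# GPU plugs as the only load on a rail.
gpu_only_rail = {"gpu": 250}

print(rail_capacity_w(30))                 # 360.0
print(rail_headroom_w(30, shared_rail))    # 26.0
print(rail_headroom_w(30, gpu_only_rail))  # 110.0
```

With these assumed numbers the shared rail has only about 26 W to spare, so a CPU drawing more than its TDP in turbo mode could push it over the limit, while a rail feeding only the GPU keeps a much wider margin.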
Re: Desktop freeze when GPU folding
Posted: Fri Apr 10, 2020 12:04 am
by LancerDL
This is sound analysis. Thanks folks. I'll look into upgrading the PSU, and hope to remember to update you once it's done.
Re: Desktop freeze when GPU folding
Posted: Fri Apr 17, 2020 5:50 pm
by LancerDL
I upgraded my Power Supply to 850W. It is quieter now and I suspect it's less distressed. Unfortunately, I'm still getting the issue.
I'm providing more data from a recent freeze. I began folding at about 16:38:46 UTC. The screens froze at 16:49:33 UTC, just after reaching the 12% mark. From that point GPU folding appears to have stopped, but CPU folding continued in the background. Eventually a failure was reported. I had left the system frozen to give it time, and when I came back I saw the clock ticking again, perhaps because the run had been cancelled in the meantime.
Code:
16:38:46:WU01:FS00:0xa7:Completed 1 out of 2500000 steps (0%)
16:38:46:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
16:38:46:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
16:39:14:Saving configuration to config.xml
16:39:14:<config>
16:39:14: <!-- Network -->
16:39:14: <proxy v=':8080'/>
16:39:14:
16:39:14: <!-- User Information -->
16:39:14: <passkey v='*****'/>
16:39:14: <user v='LancerDL'/>
16:39:14:
16:39:14: <!-- Folding Slots -->
16:39:14: <slot id='0' type='CPU'/>
16:39:14: <slot id='1' type='GPU'/>
16:39:14:</config>
16:39:33:WU02:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
16:40:19:WU02:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
16:41:06:WU02:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
16:41:36:WU01:FS00:0xa7:Completed 25000 out of 2500000 steps (1%)
16:41:52:WU02:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
16:42:40:WU02:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
16:43:30:WU02:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
16:44:16:WU02:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
16:44:25:WU01:FS00:0xa7:Completed 50000 out of 2500000 steps (2%)
16:45:02:WU02:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
16:45:57:WU02:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
16:47:05:WU02:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
16:47:20:WU01:FS00:0xa7:Completed 75000 out of 2500000 steps (3%)
16:48:16:WU02:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
16:49:23:WU02:FS01:0x22:Completed 120000 out of 1000000 steps (12%) <-- Screen froze showing up to this log.
16:50:06:WU01:FS00:0xa7:Completed 100000 out of 2500000 steps (4%)
16:52:36:WU01:FS00:0xa7:Completed 125000 out of 2500000 steps (5%)
16:55:10:WU01:FS00:0xa7:Completed 150000 out of 2500000 steps (6%)
16:57:00:WU02:FS01:0x22:ERROR:exception: clWaitForEvents
16:57:00:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
16:57:00:WU02:FS01:0x22:Saving result file checkpointState.xml
16:57:00:WU02:FS01:0x22:Saving result file checkpt.crc
16:57:00:WU02:FS01:0x22:Saving result file positions.xtc
16:57:00:WU02:FS01:0x22:Saving result file science.log
16:57:00:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
16:57:01:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:57:01:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11762 run:0 clone:201 gen:19 core:0x22 unit:0x0000002e80fccb0a5e6d80c17ffb61f5
16:57:01:WU02:FS01:Uploading 26.09MiB to 128.252.203.10
16:57:01:WU02:FS01:Connecting to 128.252.203.10:8080
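To put a number on how long the GPU slot sat silent before the error, the log above can be scanned for the last progress line and the first ERROR line of one slot. This is a minimal sketch: the regular expressions are written against the lines shown in this post only, not against any documented FAHClient log format, and the function name is made up. It also ignores logs that cross midnight.

```python
import re
from datetime import datetime

# Matches lines such as "16:49:23:WU02:FS01:0x22:Completed ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x[0-9a-f]+:(.*)$")
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def stall_before_error(log_text, slot="FS01"):
    """Seconds between the slot's last progress line and its first ERROR line."""
    last_progress = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(3) != slot:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(4)
        if PROGRESS_RE.search(message):
            last_progress = stamp
        elif message.startswith("ERROR") and last_progress is not None:
            return (stamp - last_progress).total_seconds()
    return None  # no error seen for this slot


SAMPLE = """\
16:48:16:WU02:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
16:49:23:WU02:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
16:50:06:WU01:FS00:0xa7:Completed 100000 out of 2500000 steps (4%)
16:57:00:WU02:FS01:0x22:ERROR:exception: clWaitForEvents
"""

print(stall_before_error(SAMPLE))  # 457.0
```

For the excerpt above, the GPU slot went 457 seconds (about seven and a half minutes) without reporting progress before clWaitForEvents failed, while the CPU slot kept logging steps throughout.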
I also checked the Windows 10 EventViewer. The logs at the time had two warnings:
Code:
Warning 2020-04-17 12:49:41 PM Display 4101 None
Display driver nvlddmkm stopped responding and has successfully recovered.
followed by:
Code:
Warning 2020-04-17 12:49:41 PM Display 4109 None
Application FahCore_22.exe has been blocked from accessing Graphics hardware.
I'm not sure why FahCore_22.exe in particular was blocked, but presumably it was the only application related to the error. Blocking it did not unfreeze the screen, however.
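The Event Viewer timestamps are in local time while the FAH log is in UTC, so they need converting before they can be lined up. A minimal sketch, assuming the UTC offset of -4 hours that the client reports in its startup log and the timestamp layout shown in the warnings above (the function name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone


def event_time_to_utc(local_text, utc_offset_hours):
    """Convert an Event Viewer style local timestamp to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %I:%M:%S %p")
    return local.replace(tzinfo=local_zone).astimezone(timezone.utc)


driver_reset = event_time_to_utc("2020-04-17 12:49:41 PM", -4)
# Last GPU progress line in the FAH log (already UTC).
last_gpu_progress = datetime(2020, 4, 17, 16, 49, 23, tzinfo=timezone.utc)

print(driver_reset.strftime("%H:%M:%S"))                     # 16:49:41
print((driver_reset - last_gpu_progress).total_seconds())    # 18.0
```

So the nvlddmkm warning was logged at 16:49:41 UTC, about 18 seconds after the last GPU progress line at 16:49:23.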
Temperature of the CPU was around 87 Celsius, with the GPU at 73 Celsius, and stable. Resting GPU temperature is 50 Celsius. I don't think it's an over-heating issue. As the GPU temperature isn't very high, and yet the driver is reporting an issue that has to kick FahCore_22.exe, I doubt the CPU temps are relevant.
In my Registry, I have previously set TdrDelay and TdrDdiDelay to 20 seconds. The timeout for these is well above standard, so for nvlddmkm to stop responding suggests a real issue to me. I am using the latest NVIDIA drivers without the "GeForce Experience" stuff. I also installed their CUDA toolkit, though I'm not sure it was necessary. Any further ideas?
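For anyone wanting to confirm which TDR values are actually in effect, here is a minimal sketch using Python's standard winreg module. It assumes the values live under the GraphicsDrivers key that Microsoft documents for TDR, and it falls back to the documented defaults (2 seconds for TdrDelay, 5 seconds for TdrDdiDelay) when a value is not set. The helper names are made up for illustration, and on non-Windows systems the registry read is skipped.

```python
import sys

GRAPHICS_DRIVERS_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
# Defaults in seconds, as documented by Microsoft for TDR.
WINDOWS_DEFAULTS = {"TdrDelay": 2, "TdrDdiDelay": 5}


def read_tdr_overrides():
    """Return TDR values set explicitly in the registry ({} if none, or not on Windows)."""
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard library module

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY) as key:
        for name in WINDOWS_DEFAULTS:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
                found[name] = value
            except FileNotFoundError:
                pass  # value not set, so the default applies
    return found


def effective_tdr(overrides):
    """Merge explicit registry overrides over the documented defaults."""
    return {**WINDOWS_DEFAULTS, **overrides}


print(effective_tdr(read_tdr_overrides()))
print(effective_tdr({"TdrDelay": 20, "TdrDdiDelay": 20}))
```

The second call shows what the merged result would look like with both values set to 20 seconds, as described in this post.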
FWIW
Code:
*********************** Log Started 2020-04-17T16:27:02Z ***********************
16:27:02:****************************** FAHClient ******************************
16:27:02: Version: 7.6.8
16:27:02: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:27:02: Copyright: 2020 foldingathome.org
16:27:02: Homepage: https://foldingathome.org/
16:27:02: Date: Apr 15 2020
16:27:02: Time: 14:55:26
16:27:02: Revision: 34a6e026f7032e19bfa748ef56985f8e9fb09c99
16:27:02: Branch: master
16:27:02: Compiler: Visual C++ 2008
16:27:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
16:27:02: Platform: win32 10
16:27:02: Bits: 32
16:27:02: Mode: Release
16:27:02: Config: C:\Users\#######\AppData\Roaming\FAHClient\config.xml
16:27:02:******************************** CBang ********************************
16:27:02: Date: Apr 15 2020
16:27:02: Time: 14:52:13
16:27:02: Revision: 640ed4bc1084ddf2aa9cfa32aa9a2470dc8b7857
16:27:02: Branch: master
16:27:02: Compiler: Visual C++ 2008
16:27:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
16:27:02: Platform: win32 10
16:27:02: Bits: 32
16:27:02: Mode: Release
16:27:02:******************************* System ********************************
16:27:02: CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
16:27:02: CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
16:27:02: CPUs: 4
16:27:02: Memory: 15.89GiB
16:27:02: Free Memory: 12.41GiB
16:27:02: Threads: WINDOWS_THREADS
16:27:02: OS Version: 6.2
16:27:02: Has Battery: false
16:27:02: On Battery: false
16:27:02: UTC Offset: -4
16:27:02: PID: 6008
16:27:02: CWD: C:\Users\#######\AppData\Roaming\FAHClient
16:27:02: OS: Windows 10 Enterprise
16:27:02: OS Arch: AMD64
16:27:02: GPUs: 1
16:27:02: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
16:27:02: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.0
16:27:02:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:445.87
16:27:02:OpenCL Device 1: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:20.19
16:27:02: Win32 Service: false
16:27:02:******************************* libFAH ********************************
16:27:02: Date: Apr 15 2020
16:27:02: Time: 14:53:14
16:27:02: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
16:27:02: Branch: master
16:27:02: Compiler: Visual C++ 2008
16:27:02: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
16:27:02: Platform: win32 10
16:27:02: Bits: 32
16:27:02: Mode: Release
16:27:02:***********************************************************************