GPU Folding PPD Crash

Posted: Fri Feb 06, 2015 12:57 am
by LonePalm
This afternoon the PPD on my GPU suddenly crashed after folding about 5% of the work unit. I normally get about 120K PPD. The time per frame went from 8 minutes to 31 minutes, and it is now 93 minutes.

I tried rolling back the driver just in case an update happened without my realizing it. No change.

Please help.

Code:

*********************** Log Started 2015-02-05T23:47:15Z ***********************
23:47:15:************************* Folding@home Client *************************
23:47:15:      Website: http://folding.stanford.edu/
23:47:15:    Copyright: (c) 2009-2014 Stanford University
23:47:15:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:47:15:         Args: 
23:47:15:       Config: C:/Users/E****_R****/AppData/Roaming/FAHClient/config.xml
23:47:15:******************************** Build ********************************
23:47:15:      Version: 7.4.4
23:47:15:         Date: Mar 4 2014
23:47:15:         Time: 20:26:54
23:47:15:      SVN Rev: 4130
23:47:15:       Branch: fah/trunk/client
23:47:15:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
23:47:15:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
23:47:15:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
23:47:15:     Platform: win32 XP
23:47:15:         Bits: 32
23:47:15:         Mode: Release
23:47:15:******************************* System ********************************
23:47:15:          CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
23:47:15:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
23:47:15:         CPUs: 8
23:47:15:       Memory: 7.96GiB
23:47:15:  Free Memory: 5.63GiB
23:47:15:      Threads: WINDOWS_THREADS
23:47:15:   OS Version: 6.1
23:47:15:  Has Battery: false
23:47:15:   On Battery: false
23:47:15:   UTC Offset: -5
23:47:15:          PID: 6728
23:47:15:          CWD: C:/Users/Edward Rodman/AppData/Roaming/FAHClient
23:47:15:           OS: Windows 7 Professional
23:47:15:      OS Arch: AMD64
23:47:15:         GPUs: 1
23:47:15:        GPU 0: ATI:5 Tahiti XT [Radeon R9 200/HD 7900/8970]
23:47:15:         CUDA: Not detected
23:47:15:Win32 Service: false
23:47:15:***********************************************************************
23:47:15:<config>
23:47:15:  <!-- Folding Core -->
23:47:15:  <checkpoint v='9'/>
23:47:15:  <core-priority v='low'/>
23:47:15:
23:47:15:  <!-- Network -->
23:47:15:  <proxy v=':8080'/>
23:47:15:
23:47:15:  <!-- Slot Control -->
23:47:15:  <power v='full'/>
23:47:15:
23:47:15:  <!-- User Information -->
23:47:15:  <passkey v='********************************'/>
23:47:15:  <team v='36120'/>
23:47:15:  <user v='LonePalm'/>
23:47:15:
23:47:15:  <!-- Folding Slots -->
23:47:15:  <slot id='0' type='GPU'>
23:47:15:    <next-unit-percentage v='98'/>
23:47:15:  </slot>
23:47:15:  <slot id='1' type='CPU'>
23:47:15:    <cpus v='8'/>
23:47:15:    <next-unit-percentage v='98'/>
23:47:15:  </slot>
23:47:15:</config>
23:47:15:Trying to access database...
23:47:16:Successfully acquired database lock
23:47:16:Enabled folding slot 00: READY gpu:0:Tahiti XT [Radeon R9 200/HD 7900/8970]
23:47:16:Enabled folding slot 01: READY cpu:8
23:47:16:WU01:FS00:Starting
23:47:16:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Edward Rodman/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 6728 -checkpoint 9 -gpu 0 -gpu-vendor ati
23:47:21:WU01:FS00:Started FahCore on PID 6424
23:47:29:WU01:FS00:Core PID:7080
23:47:29:WU01:FS00:FahCore 0x17 started
23:47:29:WU00:FS01:Connecting to 171.67.108.200:8080
23:47:31:WU00:FS01:Assigned to work server 171.64.65.124
23:47:31:WU00:FS01:Requesting new work unit for slot 01: READY cpu:8 from 171.64.65.124
23:47:31:WU00:FS01:Connecting to 171.64.65.124:8080
23:47:33:WU00:FS01:Downloading 901.71KiB
23:47:33:WU00:FS01:Download complete
23:47:34:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9015 run:321 clone:1 gen:101 core:0xa4 unit:0x00000080664f2de453e55d29452a99ea
23:47:34:WU00:FS01:Starting
23:47:34:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Edward Rodman/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe" -dir 00 -suffix 01 -version 704 -lifeline 6728 -checkpoint 9 -np 8
23:47:35:WU00:FS01:Started FahCore on PID 6572
23:47:41:WU01:FS00:0x17:*********************** Log Started 2015-02-05T23:47:40Z ***********************
23:47:41:WU01:FS00:0x17:Project: 13001 (Run 206, Clone 2, Gen 74)
23:47:41:WU01:FS00:0x17:Unit: 0x00000089538b3db753288a1c15d48455
23:47:41:WU01:FS00:0x17:CPU: 0x00000000000000000000000000000000
23:47:41:WU01:FS00:0x17:Machine: 0
23:47:41:WU01:FS00:0x17:Digital signatures verified
23:47:41:WU01:FS00:0x17:Folding@home GPU core17
23:47:41:WU01:FS00:0x17:Version 0.0.52
23:47:53:WU00:FS01:Core PID:4572
23:47:53:WU00:FS01:FahCore 0xa4 started
23:47:54:WU00:FS01:0xa4:
23:47:54:WU00:FS01:0xa4:*------------------------------*
23:47:54:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
23:47:54:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
23:47:54:WU00:FS01:0xa4:
23:47:54:WU00:FS01:0xa4:Preparing to commence simulation
23:47:54:WU00:FS01:0xa4:- Looking at optimizations...
23:47:54:WU00:FS01:0xa4:- Created dyn
23:47:54:WU00:FS01:0xa4:- Files status OK
23:47:54:WU00:FS01:0xa4:- Expanded 922835 -> 1526808 (decompressed 165.4 percent)
23:47:54:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=922835 data_size=1526808, decompressed_data_size=1526808 diff=0
23:47:54:WU00:FS01:0xa4:- Digital signature verified
23:47:54:WU00:FS01:0xa4:
23:47:54:WU00:FS01:0xa4:Project: 9015 (Run 321, Clone 1, Gen 101)
23:47:54:WU00:FS01:0xa4:
23:47:54:WU00:FS01:0xa4:Assembly optimizations on if available.
23:47:54:WU00:FS01:0xa4:Entering M.D.
23:48:00:WU00:FS01:0xa4:Mapping NT from 8 to 8 
23:48:16:WU01:FS00:0x17:  Found a checkpoint file
23:48:26:WU00:FS01:0xa4:Completed 0 out of 250000 steps  (0%)
23:52:06:WU00:FS01:0xa4:Completed 2500 out of 250000 steps  (1%)
23:54:27:WU01:FS00:0x17:Completed 250000 out of 5000000 steps (5%)
23:55:01:WU01:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:55:04:WU00:FS01:0xa4:Completed 5000 out of 250000 steps  (2%)
23:56:30:WARNING:WU01:FS00:FahCore returned an unknown error code which probably indicates that it crashed
23:56:30:WARNING:WU01:FS00:FahCore returned: UNKNOWN_ENUM (-1073741819 = 0xc0000005)
23:56:30:WU01:FS00:Starting
23:56:30:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Edward Rodman/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 6728 -checkpoint 9 -gpu 0 -gpu-vendor ati
23:56:33:WU01:FS00:Started FahCore on PID 7652
23:56:39:WU01:FS00:Core PID:752
23:56:39:WU01:FS00:FahCore 0x17 started
23:56:40:WU01:FS00:0x17:*********************** Log Started 2015-02-05T23:56:39Z ***********************
23:56:40:WU01:FS00:0x17:Project: 13001 (Run 206, Clone 2, Gen 74)
23:56:40:WU01:FS00:0x17:Unit: 0x00000089538b3db753288a1c15d48455
23:56:40:WU01:FS00:0x17:CPU: 0x00000000000000000000000000000000
23:56:40:WU01:FS00:0x17:Machine: 0
23:56:40:WU01:FS00:0x17:Digital signatures verified
23:56:40:WU01:FS00:0x17:Folding@home GPU core17
23:56:40:WU01:FS00:0x17:Version 0.0.52
23:56:53:WU01:FS00:0x17:  Found a checkpoint file
00:00:15:WU01:FS00:0x17:Completed 250000 out of 5000000 steps (5%)
00:00:15:WU01:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:01:56:WU00:FS01:0xa4:Completed 7500 out of 250000 steps  (3%)
00:04:37:WU00:FS01:0xa4:Completed 10000 out of 250000 steps  (4%)
00:06:58:WU00:FS01:0xa4:Completed 12500 out of 250000 steps  (5%)
00:09:00:WU01:FS00:0x17:Completed 300000 out of 5000000 steps (6%)
00:09:14:WU00:FS01:0xa4:Completed 15000 out of 250000 steps  (6%)
00:11:30:WU00:FS01:0xa4:Completed 17500 out of 250000 steps  (7%)
00:13:46:WU00:FS01:0xa4:Completed 20000 out of 250000 steps  (8%)
00:16:13:WU00:FS01:0xa4:Completed 22500 out of 250000 steps  (9%)
00:18:08:WU01:FS00:0x17:Completed 350000 out of 5000000 steps (7%)
00:18:31:WU00:FS01:0xa4:Completed 25000 out of 250000 steps  (10%)
00:20:54:WU00:FS01:0xa4:Completed 27500 out of 250000 steps  (11%)
00:23:23:WU00:FS01:0xa4:Completed 30000 out of 250000 steps  (12%)
00:25:33:WU00:FS01:0xa4:Completed 32500 out of 250000 steps  (13%)
00:26:46:WU01:FS00:0x17:Completed 400000 out of 5000000 steps (8%)
00:27:42:WU00:FS01:0xa4:Completed 35000 out of 250000 steps  (14%)
00:29:49:WU00:FS01:0xa4:Completed 37500 out of 250000 steps  (15%)
00:32:14:WU00:FS01:0xa4:Completed 40000 out of 250000 steps  (16%)
00:34:29:WU00:FS01:0xa4:Completed 42500 out of 250000 steps  (17%)
00:35:08:WU01:FS00:0x17:Completed 450000 out of 5000000 steps (9%)
00:36:44:WU00:FS01:0xa4:Completed 45000 out of 250000 steps  (18%)
00:38:55:WU00:FS01:0xa4:Completed 47500 out of 250000 steps  (19%)
00:41:32:WU00:FS01:0xa4:Completed 50000 out of 250000 steps  (20%)
00:43:39:WU01:FS00:0x17:Completed 500000 out of 5000000 steps (10%)
00:45:00:WU00:FS01:0xa4:Completed 52500 out of 250000 steps  (21%)
00:47:39:WU00:FS01:0xa4:Completed 55000 out of 250000 steps  (22%)
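
The time per frame can be checked directly against a log like the one above, since each "Completed ... steps (N%)" line carries a timestamp. What follows is a minimal sketch, not part of FAHClient: the names `frame_times` and `exit_code_hex` are made up for illustration, and the regular expression assumes the line layout seen in this particular log. The second helper shows how the signed value in the WARNING line maps to the hex code next to it; 0xc0000005 is the Windows status code for an access violation, which fits the log's remark that the core probably crashed.

```python
import re
from datetime import datetime, timedelta

# Matches FahCore progress lines such as
# "00:09:00:WU01:FS00:0x17:Completed 300000 out of 5000000 steps (6%)"
# (layout assumed from the log in this thread).
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x\w+:"
    r"Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)


def frame_times(log_lines, slot):
    """Return (percent, minutes since the previous frame) pairs for one slot."""
    frames = []
    prev_time = prev_pct = None
    for line in log_lines:
        m = PROGRESS.match(line)
        if not m or m.group(3) != slot:
            continue
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        pct = int(m.group(4))
        # Only count consecutive frames; a repeated percentage after a
        # core restart resets the clock instead of producing a frame time.
        if prev_time is not None and pct == prev_pct + 1:
            delta = t - prev_time
            if delta < timedelta(0):  # the log rolled past midnight
                delta += timedelta(days=1)
            frames.append((pct, delta.total_seconds() / 60))
        prev_time, prev_pct = t, pct
    return frames


def exit_code_hex(code):
    """Render a signed 32-bit Windows exit code as unsigned hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "00:00:15:WU01:FS00:0x17:Completed 250000 out of 5000000 steps (5%)",
        "00:06:58:WU00:FS01:0xa4:Completed 12500 out of 250000 steps  (5%)",
        "00:09:00:WU01:FS00:0x17:Completed 300000 out of 5000000 steps (6%)",
        "00:18:08:WU01:FS00:0x17:Completed 350000 out of 5000000 steps (7%)",
    ]
    for pct, minutes in frame_times(sample, "FS00"):
        print(f"{pct}%: {minutes:.2f} min")
    print(exit_code_hex(-1073741819))
```

Run over the full log, the GPU slot (FS00) frames from 6% to 10% each come out at roughly 8.4 to 9.1 minutes, so the part of the log posted here does not capture the 31 or 93 minute frames.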

Re: GPU Folding PPD Crash

Posted: Fri Feb 06, 2015 1:16 am
by debs3759
I had something similar on the HD 7770 I was folding on with the 13001 work units (the same ones you are having trouble with). The ETA was so long that they would have expired before I could complete them. It seems there is a problem with projects 13000 and 13001.

Re: GPU Folding PPD Crash

Posted: Fri Feb 06, 2015 1:21 am
by LonePalm
OK, this is seriously strange. I just looked at my folding client and found it back down to 8.5 minutes per frame.

One other thing. Last September I was having a problem with the GPU not downloading unless I rebooted the computer. That problem fixed itself in late October.

Now I have the same problem with the CPU instead of the GPU. No clue why.

Re: GPU Folding PPD Crash

Posted: Fri Feb 06, 2015 7:06 am
by bruce
There's a known problem with error recovery during uploading/downloading. If communication is interrupted for ANY reason, you'll have to reboot or at least restart the client before any further communication can take place. It has nothing to do with CPU vs. GPU.

I'm not sure that's your problem, but it's worth investigating.