GPU WUs stuck at 99.99%?

If you think it might be a driver problem, see viewforum.php?f=79

SGM26
Posts: 10
Joined: Tue Nov 11, 2014 2:13 am

GPU WUs stuck at 99.99%?

Post by SGM26 »

I've had this issue with several different WUs for different projects lately. It will progress all the way to 99.99% completed but after that it just sits there for hours. At the time of posting with my current set of WUs (pictured below) they've been running at 99.99% for almost 3 hours.

Is there a way to fix this or will I just have to delete my GPU slots and get new WUs? If so that'll be two sets in a row I've had to get rid of.

http://i.imgur.com/Smoxs93.png

I don't have a CPU slot (I use that for BOINC) so I don't know if those are affected also.

Thanks.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU WUs stuck at 99.99%?

Post by Joe_H »

Please post your log; the image is actually not that useful. In particular, look for a point after which the log stops showing progress while the percentage still goes up on the FAHControl status display. That usually indicates a crash of the folding core caused by a GPU reset. You can check your system logs to see whether a reset occurred at a time that coincides with the progress stopping in the log.

Directions for posting a log file can be found in this topic - viewtopic.php?f=61&t=26036.

If your log does indicate that a GPU reset occurred, you would need to pause and restart folding to complete the WUs. But you should first correct whatever is causing the resets. Common causes are overheating, running with too high an overclock, or accessing the PC remotely using Windows Remote Desktop (RDP).
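
For anyone who would rather not read the log by eye, here is a rough Python sketch of the check described above. It assumes the log uses the `HH:MM:SS:WUxx:FSxx:0x17:Completed N out of M steps` line format; the function name `find_stalls` and the 30-minute threshold are made up for illustration and are not part of FAHClient.

```python
import re
from datetime import timedelta

# Core progress lines, e.g.
# "02:29:37:WU02:FS01:0x17:Completed 4950000 out of 5000000 steps (99%)"
PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):(WU\d+):(FS\d+):0x\w+:Completed (\d+) out of (\d+) steps"
)
# Any timestamped log line.
STAMP = re.compile(r"^(\d\d):(\d\d):(\d\d):")


def find_stalls(lines, threshold=timedelta(minutes=30)):
    """Return {(wu, slot): (steps, total, silence)} for unfinished units whose
    last progress line is at least `threshold` older than the last log line.

    Hypothetical helper: the threshold is an arbitrary choice.
    """
    last_progress = {}
    day = timedelta(0)
    prev = None
    now = None
    for line in lines:
        stamp = STAMP.match(line)
        if not stamp:
            continue
        h, m, s = (int(x) for x in stamp.groups())
        time_of_day = timedelta(hours=h, minutes=m, seconds=s)
        # Log timestamps carry no date, so treat a backwards jump as midnight.
        if prev is not None and time_of_day < prev:
            day += timedelta(days=1)
        prev = time_of_day
        now = day + time_of_day
        hit = PROGRESS.match(line)
        if hit:
            key = (hit.group(4), hit.group(5))
            last_progress[key] = (int(hit.group(6)), int(hit.group(7)), now)

    stalls = {}
    for key, (steps, total, seen) in last_progress.items():
        silence = now - seen
        if steps < total and silence >= threshold:
            stalls[key] = (steps, total, silence)
    return stalls
```

Feed it the lines of the log file (for example `find_stalls(open("log.txt"))`). A unit that was dumped and replaced will still show up, so treat the result as a hint about where to look, not a verdict.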

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
SGM26
Posts: 10
Joined: Tue Nov 11, 2014 2:13 am

Re: GPU WUs stuck at 99.99%?

Post by SGM26 »

*********************** Log Started 2014-12-03T23:40:07Z ***********************
23:40:07:************************* Folding@home Client *************************
23:40:07:      Website: http://folding.stanford.edu/
23:40:07:    Copyright: (c) 2009-2014 Stanford University
23:40:07:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:40:07:         Args: 
23:40:07:       Config: D:/Users/Sean/AppData/Roaming/FAHClient/config.xml
23:40:07:******************************** Build ********************************
23:40:07:      Version: 7.4.4
23:40:07:         Date: Mar 4 2014
23:40:07:         Time: 20:26:54
23:40:07:      SVN Rev: 4130
23:40:07:       Branch: fah/trunk/client
23:40:07:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
23:40:07:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
23:40:07:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
23:40:07:     Platform: win32 XP
23:40:07:         Bits: 32
23:40:07:         Mode: Release
23:40:07:******************************* System ********************************
23:40:07:          CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
23:40:07:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
23:40:07:         CPUs: 8
23:40:07:       Memory: 15.95GiB
23:40:07:  Free Memory: 13.19GiB
23:40:07:      Threads: WINDOWS_THREADS
23:40:07:   OS Version: 6.2
23:40:07:  Has Battery: false
23:40:07:   On Battery: false
23:40:07:   UTC Offset: 0
23:40:07:          PID: 4792
23:40:07:          CWD: D:/Users/Sean/AppData/Roaming/FAHClient
23:40:07:           OS: Windows 8.1 Pro
23:40:07:      OS Arch: AMD64
23:40:07:         GPUs: 2
23:40:07:        GPU 0: ATI:5 Hawaii [Radeon R9 200X Series]
23:40:07:        GPU 1: ATI:5 Hawaii [Radeon R9 200X Series]
23:40:07:         CUDA: Not detected
23:40:07:Win32 Service: false
23:40:07:***********************************************************************
23:40:07:<config>
23:40:07:  <!-- Folding Core -->
23:40:07:  <core-priority v='low'/>
23:40:07:
23:40:07:  <!-- Network -->
23:40:07:  <proxy v=':8080'/>
23:40:07:
23:40:07:  <!-- Slot Control -->
23:40:07:  <power v='full'/>
23:40:07:
23:40:07:  <!-- User Information -->
23:40:07:  <passkey v='********************************'/>
23:40:07:  <team v='223518'/>
23:40:07:  <user v='SGM26'/>
23:40:07:
23:40:07:  <!-- Folding Slots -->
23:40:07:  <slot id='1' type='GPU'>
23:40:07:    <paused v='true'/>
23:40:07:  </slot>
23:40:07:  <slot id='2' type='GPU'>
23:40:07:    <paused v='true'/>
23:40:07:  </slot>
23:40:07:</config>
23:40:07:Trying to access database...
23:40:07:Successfully acquired database lock
23:40:07:Enabled folding slot 01: PAUSED gpu:0:Hawaii [Radeon R9 200X Series] (by user)
23:40:07:Enabled folding slot 02: PAUSED gpu:1:Hawaii [Radeon R9 200X Series] (by user)
23:40:50:FS01:Unpaused
23:40:50:FS02:Unpaused
23:40:50:WU01:FS02:Starting
23:40:50:WU01:FS02:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 1 -gpu-vendor ati
23:40:51:WU01:FS02:Started FahCore on PID 5876
23:40:51:WU01:FS02:Core PID:5888
23:40:51:WU01:FS02:FahCore 0x17 started
23:40:51:WU02:FS01:Starting
23:40:51:WU02:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 02 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 0 -gpu-vendor ati
23:40:51:WU02:FS01:Started FahCore on PID 5896
23:40:51:WU02:FS01:Core PID:5908
23:40:51:WU02:FS01:FahCore 0x17 started
23:40:51:WU01:FS02:0x17:*********************** Log Started 2014-12-03T23:40:51Z ***********************
23:40:51:WU01:FS02:0x17:Project: 10469 (Run 0, Clone 308, Gen 58)
23:40:51:WU01:FS02:0x17:Unit: 0x00000070538b3db9538f407fa9091dc2
23:40:51:WU01:FS02:0x17:CPU: 0x00000000000000000000000000000000
23:40:51:WU01:FS02:0x17:Machine: 2
23:40:51:WU01:FS02:0x17:Digital signatures verified
23:40:51:WU01:FS02:0x17:Folding@home GPU core17
23:40:51:WU01:FS02:0x17:Version 0.0.52
23:40:51:WU02:FS01:0x17:*********************** Log Started 2014-12-03T23:40:51Z ***********************
23:40:51:WU02:FS01:0x17:Project: 13000 (Run 2082, Clone 1, Gen 45)
23:40:51:WU02:FS01:0x17:Unit: 0x00000055538b3db75311e8951fd23780
23:40:51:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
23:40:51:WU02:FS01:0x17:Machine: 1
23:40:51:WU02:FS01:0x17:Digital signatures verified
23:40:51:WU02:FS01:0x17:Folding@home GPU core17
23:40:51:WU02:FS01:0x17:Version 0.0.52
23:40:52:WU01:FS02:0x17:  Found a checkpoint file
23:40:52:WU02:FS01:0x17:  Found a checkpoint file
23:41:08:Removing old file 'configs/config-20141128-021842.xml'
23:41:08:Saving configuration to config.xml
23:41:08:<config>
23:41:08:  <!-- Folding Core -->
23:41:08:  <core-priority v='low'/>
23:41:08:
23:41:08:  <!-- Network -->
23:41:08:  <proxy v=':8080'/>
23:41:08:
23:41:08:  <!-- Slot Control -->
23:41:08:  <power v='full'/>
23:41:08:
23:41:08:  <!-- User Information -->
23:41:08:  <passkey v='********************************'/>
23:41:08:  <team v='223518'/>
23:41:08:  <user v='SGM26'/>
23:41:08:
23:41:08:  <!-- Folding Slots -->
23:41:08:  <slot id='1' type='GPU'/>
23:41:08:  <slot id='2' type='GPU'/>
23:41:08:</config>
23:41:16:17:127.0.0.1:New Web connection
23:44:10:WU01:FS02:0x17:Completed 2750000 out of 5000000 steps (55%)
23:44:10:WU01:FS02:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:44:12:WU02:FS01:0x17:Completed 3500000 out of 5000000 steps (70%)
23:44:12:WU02:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:49:17:WU01:FS02:0x17:Completed 2800000 out of 5000000 steps (56%)
23:50:08:WU02:FS01:0x17:Completed 3550000 out of 5000000 steps (71%)
23:54:08:WU01:FS02:0x17:Completed 2850000 out of 5000000 steps (57%)
23:55:43:WU02:FS01:0x17:Completed 3600000 out of 5000000 steps (72%)
23:59:17:WU01:FS02:0x17:Completed 2900000 out of 5000000 steps (58%)
00:01:40:WU02:FS01:0x17:Completed 3650000 out of 5000000 steps (73%)
00:04:07:WU01:FS02:0x17:Completed 2950000 out of 5000000 steps (59%)
00:07:08:WU02:FS01:0x17:Completed 3700000 out of 5000000 steps (74%)
00:08:58:WU01:FS02:0x17:Completed 3000000 out of 5000000 steps (60%)
00:12:48:WU02:FS01:0x17:Completed 3750000 out of 5000000 steps (75%)
00:14:07:WU01:FS02:0x17:Completed 3050000 out of 5000000 steps (61%)
00:18:44:WU02:FS01:0x17:Completed 3800000 out of 5000000 steps (76%)
00:18:57:WU01:FS02:0x17:Completed 3100000 out of 5000000 steps (62%)
00:24:04:WU01:FS02:0x17:Completed 3150000 out of 5000000 steps (63%)
00:24:16:WU02:FS01:0x17:Completed 3850000 out of 5000000 steps (77%)
00:28:55:WU01:FS02:0x17:Completed 3200000 out of 5000000 steps (64%)
00:30:08:WU02:FS01:0x17:Completed 3900000 out of 5000000 steps (78%)
00:33:47:WU01:FS02:0x17:Completed 3250000 out of 5000000 steps (65%)
00:35:47:WU02:FS01:0x17:Completed 3950000 out of 5000000 steps (79%)
00:38:56:WU01:FS02:0x17:Completed 3300000 out of 5000000 steps (66%)
00:41:23:WU02:FS01:0x17:Completed 4000000 out of 5000000 steps (80%)
00:43:47:WU01:FS02:0x17:Completed 3350000 out of 5000000 steps (67%)
00:47:22:WU02:FS01:0x17:Completed 4050000 out of 5000000 steps (81%)
00:48:56:WU01:FS02:0x17:Completed 3400000 out of 5000000 steps (68%)
00:53:03:WU02:FS01:0x17:Completed 4100000 out of 5000000 steps (82%)
00:53:48:WU01:FS02:0x17:Completed 3450000 out of 5000000 steps (69%)
00:58:39:WU01:FS02:0x17:Completed 3500000 out of 5000000 steps (70%)
00:59:03:WU02:FS01:0x17:Completed 4150000 out of 5000000 steps (83%)
01:03:47:WU01:FS02:0x17:Completed 3550000 out of 5000000 steps (71%)
01:04:38:WU02:FS01:0x17:Completed 4200000 out of 5000000 steps (84%)
01:08:38:WU01:FS02:0x17:Completed 3600000 out of 5000000 steps (72%)
01:10:14:WU02:FS01:0x17:Completed 4250000 out of 5000000 steps (85%)
01:13:47:WU01:FS02:0x17:Completed 3650000 out of 5000000 steps (73%)
01:16:14:WU02:FS01:0x17:Completed 4300000 out of 5000000 steps (86%)
01:18:38:WU01:FS02:0x17:Completed 3700000 out of 5000000 steps (74%)
01:21:52:WU02:FS01:0x17:Completed 4350000 out of 5000000 steps (87%)
01:23:29:WU01:FS02:0x17:Completed 3750000 out of 5000000 steps (75%)
01:27:51:WU02:FS01:0x17:Completed 4400000 out of 5000000 steps (88%)
01:28:38:WU01:FS02:0x17:Completed 3800000 out of 5000000 steps (76%)
01:33:25:WU02:FS01:0x17:Completed 4450000 out of 5000000 steps (89%)
01:33:30:WU01:FS02:0x17:Completed 3850000 out of 5000000 steps (77%)
01:38:37:WU01:FS02:0x17:Completed 3900000 out of 5000000 steps (78%)
01:39:04:WU02:FS01:0x17:Completed 4500000 out of 5000000 steps (90%)
01:43:27:WU01:FS02:0x17:Completed 3950000 out of 5000000 steps (79%)
01:45:02:WU02:FS01:0x17:Completed 4550000 out of 5000000 steps (91%)
01:48:18:WU01:FS02:0x17:Completed 4000000 out of 5000000 steps (80%)
01:50:41:WU02:FS01:0x17:Completed 4600000 out of 5000000 steps (92%)
01:53:27:WU01:FS02:0x17:Completed 4050000 out of 5000000 steps (81%)
01:56:26:WU02:FS01:0x17:Completed 4650000 out of 5000000 steps (93%)
01:58:18:WU01:FS02:0x17:Completed 4100000 out of 5000000 steps (82%)
02:01:54:WU02:FS01:0x17:Completed 4700000 out of 5000000 steps (94%)
02:03:26:WU01:FS02:0x17:Completed 4150000 out of 5000000 steps (83%)
02:07:25:WU02:FS01:0x17:Completed 4750000 out of 5000000 steps (95%)
02:08:17:WU01:FS02:0x17:Completed 4200000 out of 5000000 steps (84%)
02:13:05:WU02:FS01:0x17:Completed 4800000 out of 5000000 steps (96%)
02:13:08:WU01:FS02:0x17:Completed 4250000 out of 5000000 steps (85%)
02:18:13:WU01:FS02:0x17:Completed 4300000 out of 5000000 steps (86%)
02:18:30:WU02:FS01:0x17:Completed 4850000 out of 5000000 steps (97%)
02:23:03:WU01:FS02:0x17:Completed 4350000 out of 5000000 steps (87%)
02:24:14:WU02:FS01:0x17:Completed 4900000 out of 5000000 steps (98%)
02:28:12:WU01:FS02:0x17:Completed 4400000 out of 5000000 steps (88%)
02:29:37:WU02:FS01:0x17:Completed 4950000 out of 5000000 steps (99%)
04:59:52:58:127.0.0.1:New Web connection
05:01:10:Removing old file 'configs/config-20141128-040830.xml'
05:01:10:Saving configuration to config.xml
05:01:10:<config>
05:01:10:  <!-- Folding Core -->
05:01:10:  <core-priority v='low'/>
05:01:10:
05:01:10:  <!-- Network -->
05:01:10:  <proxy v=':8080'/>
05:01:10:
05:01:10:  <!-- Slot Control -->
05:01:10:  <power v='full'/>
05:01:10:
05:01:10:  <!-- User Information -->
05:01:10:  <passkey v='********************************'/>
05:01:10:  <team v='223518'/>
05:01:10:  <user v='SGM26'/>
05:01:10:
05:01:10:  <!-- Folding Slots -->
05:01:10:</config>
05:01:10:FS01:Shutting core down
05:01:10:FS02:Shutting core down
05:01:10:WU02:FS01:0x17:WARNING:Console control signal 1 on PID 5908
05:01:10:WU01:FS02:0x17:WARNING:Console control signal 1 on PID 5888
05:01:10:WU02:FS01:0x17:Exiting, please wait. . .
05:01:10:WU01:FS02:0x17:Exiting, please wait. . .
05:01:23:Removing old file 'configs/config-20141128-060153.xml'
05:01:23:Saving configuration to config.xml
05:01:23:<config>
05:01:23:  <!-- Folding Core -->
05:01:23:  <core-priority v='low'/>
05:01:23:
05:01:23:  <!-- Network -->
05:01:23:  <proxy v=':8080'/>
05:01:23:
05:01:23:  <!-- Slot Control -->
05:01:23:  <power v='full'/>
05:01:23:
05:01:23:  <!-- User Information -->
05:01:23:  <passkey v='********************************'/>
05:01:23:  <team v='223518'/>
05:01:23:  <user v='SGM26'/>
05:01:23:
05:01:23:  <!-- Folding Slots -->
05:01:23:</config>
05:01:30:Removing old file 'configs/config-20141130-005745.xml'
05:01:30:Saving configuration to config.xml
05:01:30:<config>
05:01:30:  <!-- Folding Core -->
05:01:30:  <core-priority v='low'/>
05:01:30:
05:01:30:  <!-- Network -->
05:01:30:  <proxy v=':8080'/>
05:01:30:
05:01:30:  <!-- Slot Control -->
05:01:30:  <power v='full'/>
05:01:30:
05:01:30:  <!-- User Information -->
05:01:30:  <passkey v='********************************'/>
05:01:30:  <team v='223518'/>
05:01:30:  <user v='SGM26'/>
05:01:30:
05:01:30:  <!-- Folding Slots -->
05:01:30:</config>
05:02:00:86:127.0.0.1:New Web connection
05:02:11:WARNING:FS01:Killing WU02
05:02:11:WARNING:FS02:Killing WU01
05:02:11:WU01:FS02:FahCore returned: INTERRUPTED (102 = 0x66)
05:02:11:WU02:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
05:02:11:WARNING:WU01:FS02:Slot ID 2 no longer exists and there are no other matching slots, dumping
05:02:11:WU01:FS02:Sending unit results: id:01 state:SEND error:DUMPED project:10469 run:0 clone:308 gen:58 core:0x17 unit:0x00000070538b3db9538f407fa9091dc2
05:02:11:WU01:FS02:Connecting to 140.163.4.233:8080
05:02:11:WARNING:WU02:Slot ID 1 no longer exists and there are no other matching slots, dumping
05:02:11:WU02:Sending unit results: id:02 state:SEND error:DUMPED project:13000 run:2082 clone:1 gen:45 core:0x17 unit:0x00000055538b3db75311e8951fd23780
05:02:11:WU02:Connecting to 140.163.4.231:8080
05:02:12:WU02:Server responded WORK_ACK (400)
05:02:12:WU02:Cleaning up
05:02:12:WU01:FS02:Server responded WORK_ACK (400)
05:02:12:WU01:FS02:Cleaning up
05:02:25:Adding folding slot 00: READY gpu:0:Hawaii [Radeon R9 200X Series]
05:02:25:Adding folding slot 01: READY gpu:1:Hawaii [Radeon R9 200X Series]
05:02:25:Removing old file 'configs/config-20141130-015541.xml'
05:02:25:Saving configuration to config.xml
05:02:25:<config>
05:02:25:  <!-- Folding Core -->
05:02:25:  <core-priority v='low'/>
05:02:25:
05:02:25:  <!-- Network -->
05:02:25:  <proxy v=':8080'/>
05:02:25:
05:02:25:  <!-- Slot Control -->
05:02:25:  <power v='full'/>
05:02:25:
05:02:25:  <!-- User Information -->
05:02:25:  <passkey v='********************************'/>
05:02:25:  <team v='223518'/>
05:02:25:  <user v='SGM26'/>
05:02:25:
05:02:25:  <!-- Folding Slots -->
05:02:25:  <slot id='0' type='GPU'/>
05:02:25:  <slot id='1' type='GPU'/>
05:02:25:</config>
05:02:26:WU00:FS00:Connecting to 171.67.108.200:80
05:02:26:WU01:FS01:Connecting to 171.67.108.200:80
05:02:27:WU01:FS01:Assigned to work server 171.67.108.52
05:02:27:WU01:FS01:Requesting new work unit for slot 01: READY gpu:1:Hawaii [Radeon R9 200X Series] from 171.67.108.52
05:02:27:WU00:FS00:Assigned to work server 171.67.108.52
05:02:27:WU01:FS01:Connecting to 171.67.108.52:8080
05:02:27:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:Hawaii [Radeon R9 200X Series] from 171.67.108.52
05:02:27:WU00:FS00:Connecting to 171.67.108.52:8080
05:02:27:WU01:FS01:Downloading 1.52MiB
05:02:27:WU00:FS00:Downloading 1.53MiB
05:02:30:WU00:FS00:Download complete
05:02:30:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9201 run:477 clone:4 gen:86 core:0x17 unit:0x000000726652edc45399e8d00d4fd778
05:02:30:WU01:FS01:Download complete
05:02:30:WU00:FS00:Starting
05:02:30:WU00:FS00:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 0 -gpu-vendor ati
05:02:30:WU00:FS00:Started FahCore on PID 3748
05:02:30:WU00:FS00:Core PID:3900
05:02:30:WU00:FS00:FahCore 0x17 started
05:02:30:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:810 clone:1 gen:158 core:0x17 unit:0x000001066652edc45399f5e2e8c30932
05:02:30:WU01:FS01:Starting
05:02:30:WU01:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 1 -gpu-vendor ati
05:02:30:WU01:FS01:Started FahCore on PID 3716
05:02:30:WU01:FS01:Core PID:5752
05:02:30:WU01:FS01:FahCore 0x17 started
05:02:30:WU00:FS00:0x17:*********************** Log Started 2014-12-04T05:02:30Z ***********************
05:02:30:WU00:FS00:0x17:Project: 9201 (Run 477, Clone 4, Gen 86)
05:02:30:WU00:FS00:0x17:Unit: 0x000000726652edc45399e8d00d4fd778
05:02:30:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
05:02:30:WU00:FS00:0x17:Machine: 0
05:02:30:WU00:FS00:0x17:Reading tar file state.xml
05:02:30:WU00:FS00:0x17:Reading tar file system.xml
05:02:31:WU01:FS01:0x17:*********************** Log Started 2014-12-04T05:02:30Z ***********************
05:02:31:WU01:FS01:0x17:Project: 9201 (Run 810, Clone 1, Gen 158)
05:02:31:WU01:FS01:0x17:Unit: 0x000001066652edc45399f5e2e8c30932
05:02:31:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
05:02:31:WU01:FS01:0x17:Machine: 1
05:02:31:WU01:FS01:0x17:Reading tar file state.xml
05:02:31:WU01:FS01:0x17:Reading tar file system.xml
05:02:31:WU00:FS00:0x17:Reading tar file integrator.xml
05:02:31:WU00:FS00:0x17:Reading tar file core.xml
05:02:31:WU00:FS00:0x17:Digital signatures verified
05:02:31:WU00:FS00:0x17:Folding@home GPU core17
05:02:31:WU00:FS00:0x17:Version 0.0.52
05:02:31:WU01:FS01:0x17:Reading tar file integrator.xml
05:02:31:WU01:FS01:0x17:Reading tar file core.xml
05:02:31:WU01:FS01:0x17:Digital signatures verified
05:02:31:WU01:FS01:0x17:Folding@home GPU core17
05:02:31:WU01:FS01:0x17:Version 0.0.52
05:02:42:111:127.0.0.1:New Web connection
05:02:58:WU01:FS01:0x17:Completed 0 out of 5000000 steps (0%)
05:02:58:WU00:FS00:0x17:Completed 0 out of 5000000 steps (0%)
05:02:58:WU01:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
05:02:58:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
05:03:25:Removing old file 'configs/config-20141130-043052.xml'
05:03:25:Saving configuration to config.xml
05:03:25:<config>
05:03:25:  <!-- Folding Core -->
05:03:25:  <core-priority v='low'/>
05:03:25:
05:03:25:  <!-- Network -->
05:03:25:  <proxy v=':8080'/>
05:03:25:
05:03:25:  <!-- Slot Control -->
05:03:25:  <power v='full'/>
05:03:25:
05:03:25:  <!-- User Information -->
05:03:25:  <passkey v='********************************'/>
05:03:25:  <team v='223518'/>
05:03:25:  <user v='SGM26'/>
05:03:25:
05:03:25:  <!-- Folding Slots -->
05:03:25:  <slot id='0' type='GPU'/>
05:03:25:  <slot id='1' type='GPU'/>
05:03:25:</config>
05:05:07:WU01:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
05:05:21:WU00:FS00:0x17:Completed 50000 out of 5000000 steps (1%)
05:07:17:WU01:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
05:07:44:WU00:FS00:0x17:Completed 100000 out of 5000000 steps (2%)
05:09:27:WU01:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
05:10:05:WU00:FS00:0x17:Completed 150000 out of 5000000 steps (3%)
05:11:38:WU01:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
05:12:33:WU00:FS00:0x17:Completed 200000 out of 5000000 steps (4%)
05:13:50:WU01:FS01:0x17:Completed 250000 out of 5000000 steps (5%)
05:15:00:WU00:FS00:0x17:Completed 250000 out of 5000000 steps (5%)
05:16:01:WU01:FS01:0x17:Completed 300000 out of 5000000 steps (6%)
05:17:26:WU00:FS00:0x17:Completed 300000 out of 5000000 steps (6%)
05:18:14:WU01:FS01:0x17:Completed 350000 out of 5000000 steps (7%)
05:19:54:WU00:FS00:0x17:Completed 350000 out of 5000000 steps (7%)
05:20:26:WU01:FS01:0x17:Completed 400000 out of 5000000 steps (8%)
05:22:18:WU00:FS00:0x17:Completed 400000 out of 5000000 steps (8%)
05:22:35:WU01:FS01:0x17:Completed 450000 out of 5000000 steps (9%)
******************************* Date: 2014-12-04 *******************************
21:22:53:WARNING:WU00:FS00:Detected clock skew (15 hours 47 mins), adjusting time estimates
21:22:53:WARNING:WU01:FS01:Detected clock skew (15 hours 47 mins), adjusting time estimates
21:25:20:146:127.0.0.1:New Web connection
00:47:51:FS00:Finishing
00:47:51:FS01:Finishing
01:54:37:196:127.0.0.1:New Web connection
02:00:10:222:127.0.0.1:New Web connection
02:00:57:WARNING:WU01:FS01:FahCore returned: FAILED_2 (1 = 0x1)
02:00:57:WU01:FS01:Starting
02:00:57:WU01:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 1 -gpu-vendor ati
02:00:57:WU01:FS01:Started FahCore on PID 6188
02:00:57:WU01:FS01:Core PID:5492
02:00:57:WU01:FS01:FahCore 0x17 started
02:00:58:WU01:FS01:0x17:*********************** Log Started 2014-12-05T02:00:57Z ***********************
02:00:58:WU01:FS01:0x17:Project: 9201 (Run 810, Clone 1, Gen 158)
02:00:58:WU01:FS01:0x17:Unit: 0x000001066652edc45399f5e2e8c30932
02:00:58:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
02:00:58:WU01:FS01:0x17:Machine: 1
02:00:58:WU01:FS01:0x17:Digital signatures verified
02:00:58:WU01:FS01:0x17:Folding@home GPU core17
02:00:58:WU01:FS01:0x17:Version 0.0.52
02:00:58:WU01:FS01:0x17:  Found a checkpoint file
02:00:59:WARNING:WU00:FS00:FahCore returned: FAILED_2 (1 = 0x1)
02:00:59:WU00:FS00:Starting
02:00:59:WU00:FS00:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/Users/Sean/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 4792 -checkpoint 15 -gpu 0 -gpu-vendor ati
02:00:59:WU00:FS00:Started FahCore on PID 7296
02:00:59:WU00:FS00:Core PID:2780
02:00:59:WU00:FS00:FahCore 0x17 started
02:00:59:WU00:FS00:0x17:*********************** Log Started 2014-12-05T02:00:59Z ***********************
02:00:59:WU00:FS00:0x17:Project: 9201 (Run 477, Clone 4, Gen 86)
02:00:59:WU00:FS00:0x17:Unit: 0x000000726652edc45399e8d00d4fd778
02:00:59:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
02:00:59:WU00:FS00:0x17:Machine: 0
02:00:59:WU00:FS00:0x17:Digital signatures verified
02:00:59:WU00:FS00:0x17:Folding@home GPU core17
02:00:59:WU00:FS00:0x17:Version 0.0.52
02:00:59:WU00:FS00:0x17:  Found a checkpoint file
SGM26
Posts: 10
Joined: Tue Nov 11, 2014 2:13 am

Re: GPU WUs stuck at 99.99%?

Post by SGM26 »

I've just restarted my computer and now both projects have regressed to just over 10% completed.

I don't have any overclocks on the GPUs and they're certainly not overheating (72C at the very most). I do, however, get a crash where the screen goes black and recovers a few seconds later with a balloon popup from Windows.

The message reads:
"Display driver stopped responding and has recovered - Display driver AMD driver stopped responding and has recovered"
Folding continues after this occurs, or at least the progress counter continues to go up.

Not sure if it's also worth mentioning, but when I'm folding I'm also using the computer to watch videos on YouTube, Netflix, Twitch, etc., as well as running BOINC on the CPU.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU WUs stuck at 99.99%?

Post by Joe_H »

You have to look at the log. The FAHControl display keeps ticking the percentage up because it has not received any indication that the core has stopped. The message you have seen does indicate a GPU driver reset, and after that no further progress will be made by the folding core. There is some work being done to have the core recover from that kind of error, but it does not always work.
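
To show why the display can sit at 99.99% while the log says something much lower, here is a toy model in Python. It is not FAHControl's actual code. The function `displayed_percent`, the linear extrapolation, and the 99.99 cap are all assumptions made up to match the symptom described in this topic.

```python
def displayed_percent(last_logged_pct, seconds_per_pct, seconds_since_log):
    """Toy estimate of a status display that extrapolates from the last
    logged percentage at the previously observed rate.

    Hypothetical model: the cap of 99.99 is inferred from the symptom in this
    topic, not taken from FAHControl.
    """
    estimate = last_logged_pct + seconds_since_log / seconds_per_pct
    return min(estimate, 99.99)
```

With the last logged value at 9% and roughly 130 seconds per percent, `displayed_percent(9, 130, 16 * 3600)` is already pinned at 99.99, even though the core has written nothing new to the log.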

It is possible that one of the other processes using the GPU is causing the driver reset, less likely that the BOINC process running on the CPU is responsible.

Another thing I noticed in your log file is the clock skew messages at 21:22:53. These sometimes come from having a system hibernate or sleep. The RAM on your GPU is not backed up as part of the sleep process, so it is not restored when the system wakes. That will also cause folding on a GPU to fail. If you are going to put a system to sleep, GPU folding should be paused first. This is not an issue with CPU folding, as main memory is backed up for sleep/hibernate periods and restored on waking.
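
If you want to pull those warnings out of a long log, something like the following Python sketch should do it. It assumes the warning text looks like the lines in the log above (`Detected clock skew (15 hours 47 mins)`). The helper names are made up, and the handling of units other than hours and minutes is a guess, since only that one format appears here.

```python
import re
from datetime import timedelta

# e.g. "21:22:53:WARNING:WU00:FS00:Detected clock skew (15 hours 47 mins), ..."
SKEW = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:(WU\d+):(FS\d+):Detected clock skew \((.+?)\)"
)
# Assumption: durations are written as "<number> <unit>" pairs.
UNIT = re.compile(r"(\d+)\s*(day|hour|min|sec)")
SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}


def parse_duration(text):
    """Turn text such as '15 hours 47 mins' into a timedelta."""
    total = sum(int(n) * SECONDS[unit] for n, unit in UNIT.findall(text))
    return timedelta(seconds=total)


def find_clock_skew(lines):
    """Return a list of (timestamp, wu, slot, skew) for each skew warning."""
    events = []
    for line in lines:
        m = SKEW.match(line)
        if m:
            stamp, wu, slot, text = m.groups()
            events.append((stamp, wu, slot, parse_duration(text)))
    return events
```

A skew of many hours on a GPU slot is a strong hint that the machine slept or hibernated while the core was still running.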
SGM26
Posts: 10
Joined: Tue Nov 11, 2014 2:13 am

Re: GPU WUs stuck at 99.99%?

Post by SGM26 »

I see.

I do use sleep, but I need to pause FAH in order to sleep; otherwise the computer stays turned on (i.e. I click the sleep button, the screen turns off, but the system continues to run). I'll try a full shutdown next time it is required.

I guess it's just one of those things that needs to be dealt with until there's a fix or some other workaround. Next time it crashes I will pause and restart folding, then check the log after a few minutes to see if it is working again.

Thanks for your help so far, it is appreciated.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU WUs stuck at 99.99%?

Post by Joe_H »

Well, in the case of the skew messages I pointed out, it was apparent that the GPU folding had not been paused. If it had been, there would have been messages about the cores stopping in the log.
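
The same check can be sketched in Python. This assumes a stopped core leaves a `FSxx:Shutting core down` line like the ones earlier in this log. The function name `paused_before_skew` is made up, and a real log may use other messages for a pause that this sketch does not look for.

```python
def paused_before_skew(lines, slot):
    """Report whether `slot` (e.g. 'FS01') logged a core shutdown between its
    last progress line and its first clock-skew warning.

    Returns True or False, or None if the slot has no clock-skew warning.
    Hypothetical helper: only the 'Shutting core down' message is recognised.
    """
    stopped = False
    for line in lines:
        if f":{slot}:" not in line:
            continue
        if "Shutting core down" in line:
            stopped = True
        elif "Completed" in line and "steps" in line:
            # New progress means the core was running again after any stop.
            stopped = False
        elif "Detected clock skew" in line:
            return stopped
    return None
```

For the log posted above, the FS01 progress line at 05:22:35 is followed directly by the skew warning at 21:22:53, so this would return False for that slot.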

In the meantime if you can figure out if there is some particular app or action that causes the driver reset message to show, perhaps someone here has come across a workaround to avoid the crash.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: GPU WUs stuck at 99.99%?

Post by bcavnaugh »

I had the same thing today as well, on Core 17 P9201. I paused, then restarted my computer. Sad that I dropped over 20K in credits.
Funny, though: while the FAH client showed 99.99% complete, HFM.NET http://weather.mfc-cs.com/haf/ showed 24%.
After the reboot the client showed 24%.
US Army Retired | Folding@EVGA The Number One Team in the Folding@Home Community.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: GPU WUs stuck at 99.99%?

Post by 7im »

Yes. Same issue as explained above.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU WUs stuck at 99.99%?

Post by bruce »

It is not necessary to suspend CPU WUs before sleeping, but it is essential for GPU slots. By default, FAH should have disabled sleeping.

At some point, I expect the FAH developers will figure out a way to recover after a sleep or other kinds of interruptions to GPU processing but that's not possible today.

If you're comfortable visiting FAH's new subreddit, I think we should publish this problem there so that others with the same problem can upvote the topic.