project:13001 run:155 clone:2 gen:9 BAD_WORK_UNIT

Posted: Tue Jun 03, 2014 8:41 pm
by GreyWhiskers
This is a "Bad Work Unit" report for a Core 17 work unit from Project 13001. From a look through the forum, no such BAD_WORK_UNIT failures appear to have been reported to the board since about 3 May.

Running on ancient hardware, with the 96-CUDA-core GT430 Fermi GPU:
b) 2004 HP a475c desktop, 1 core Pent 4 HT@3.2 GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

This hardware, per HFM logs, completed one P13001 (520/5/1) on 31 March, then was powered off until early May while I was on a cross-country road trip. Since it was restarted a month ago, it has successfully completed a number of Core 15 work units from Projects 7626, 7622, 7621, and 7624. It is currently 39% into P7623 (R261, C0, G169), projecting 6798.6 PPD.

*********************** Log Started 2014-05-30T17:43:26Z ***********************
17:43:26:************************* Folding@home Client *************************
17:43:26:      Website: http://folding.stanford.edu/
17:43:26:    Copyright: (c) 2009-2013 Stanford University
17:43:26:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:43:26:         Args: 
17:43:26:       Config: C:/Documents and Settings/Owner/Application
17:43:26:               Data/FAHClient/config.xml
17:43:26:******************************** Build ********************************
17:43:26:      Version: 7.3.6
17:43:26:         Date: Feb 18 2013
17:43:26:         Time: 15:25:17
17:43:26:      SVN Rev: 3923
17:43:26:       Branch: fah/trunk/client
17:43:26:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
17:43:26:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
17:43:26:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
17:43:26:     Platform: win32 XP
17:43:26:         Bits: 32
17:43:26:         Mode: Release
17:43:26:******************************* System ********************************
17:43:26:          CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz
17:43:26:       CPU ID: GenuineIntel Family 15 Model 2 Stepping 9
17:43:26:         CPUs: 2
17:43:26:       Memory: 2.00GiB
17:43:26:  Free Memory: 413.76MiB
17:43:26:      Threads: WINDOWS_THREADS
17:43:26:  Has Battery: false
17:43:26:   On Battery: false
17:43:26:   UTC offset: -7
17:43:26:          PID: 5328
17:43:26:          CWD: C:/Documents and Settings/Owner/Application Data/FAHClient
17:43:26:           OS: Microsoft Windows XP Service Pack 3
17:43:26:      OS Arch: X86
17:43:26:         GPUs: 1
17:43:26:        GPU 0: NVIDIA:2 GF108 [GeForce GT 430]
17:43:26:         CUDA: 2.1
17:43:26:  CUDA Driver: 4020
17:43:26:Win32 Service: false
17:43:26:***********************************************************************
17:43:26:<config>
17:43:26:  <!-- Folding Slot Configuration -->
17:43:26:  <extra-core-args v='-forceasm'/>
17:43:26:  <power v='full'/>
17:43:26:
17:43:26:  <!-- Logging -->
17:43:26:  <log-rotate-max v='60'/>
17:43:26:
17:43:26:  <!-- Network -->
17:43:26:  <proxy v=':8080'/>
17:43:26:
17:43:26:  <!-- Remote Command Server -->
17:43:26:  <password v='********************************'/>
17:43:26:
17:43:26:  <!-- User Information -->
17:43:26:  <passkey v='********************************'/>
17:43:26:  <user v='GreyWhiskers'/>
17:43:26:
17:43:26:  <!-- Work Unit Control -->
17:43:26:  <next-unit-percentage v='100'/>
17:43:26:
17:43:26:  <!-- Folding Slots -->
17:43:26:  <slot id='1' type='GPU'>
17:43:26:    <core-priority v='low'/>
17:43:26:  </slot>
17:43:26:</config>

******************************* Date: 2014-05-31 *******************************
...

09:18:06:WU01:FS01:Connecting to assign-GPU.stanford.edu:80
09:18:06:WU01:FS01:News: Welcome to Folding@Home
09:18:06:WU01:FS01:Assigned to work server 140.163.4.231
09:18:06:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GF108 [GeForce GT 430] from 140.163.4.231
09:18:06:WU01:FS01:Connecting to 140.163.4.231:8080
09:18:07:WU01:FS01:Downloading 4.84MiB
09:18:09:WU01:FS01:Download complete
09:18:09:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:155 clone:2 gen:9 core:0x17 unit:0x00000018538b3db753287b9d0d440452

... [Successful completion of prior Work Unit]

09:49:40:WU01:FS01:Starting
09:49:40:WU01:FS01:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" "C:/Documents and Settings/Owner/Application Data/FAHClient/cores/web.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 703 -lifeline 5328 -checkpoint 15 -gpu 0 -gpu-vendor nvidia -forceasm
09:49:41:WU01:FS01:Started FahCore on PID 216
09:49:42:WU01:FS01:Core PID:2252
09:49:42:WU01:FS01:FahCore 0x17 started

... [Successful upload of prior work unit]

09:49:43:WU01:FS01:0x17:*********************** Log Started 2014-05-31T09:49:43Z ***********************
09:49:43:WU01:FS01:0x17:Project: 13001 (Run 155, Clone 2, Gen 9)
09:49:43:WU01:FS01:0x17:Unit: 0x00000018538b3db753287b9d0d440452
09:49:43:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
09:49:43:WU01:FS01:0x17:Machine: 1
09:49:43:WU01:FS01:0x17:Reading tar file state.xml
09:49:48:WU01:FS01:0x17:Reading tar file system.xml
09:49:51:WU01:FS01:0x17:Reading tar file integrator.xml
09:49:51:WU01:FS01:0x17:Reading tar file core.xml
09:49:51:WU01:FS01:0x17:Digital signatures verified
09:49:51:WU01:FS01:0x17:Folding@home GPU core17
09:49:51:WU01:FS01:0x17:Version 0.0.52
10:00:59:WU01:FS01:0x17:Completed 0 out of 5000000 steps (0%)
10:01:00:WU01:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:47:16:WU01:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
******************************* Date: 2014-05-31 *******************************
13:31:32:WU01:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
15:17:02:WU01:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
17:02:33:WU01:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
18:46:52:WU01:FS01:0x17:Completed 250000 out of 5000000 steps (5%)
******************************* Date: 2014-05-31 *******************************
20:32:38:WU01:FS01:0x17:Completed 300000 out of 5000000 steps (6%)
22:18:35:WU01:FS01:0x17:Completed 350000 out of 5000000 steps (7%)
23:18:36:WU01:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
01:03:06:WU01:FS01:0x17:Completed 300000 out of 5000000 steps (6%)
******************************* Date: 2014-06-01 *******************************
02:48:59:WU01:FS01:0x17:Completed 350000 out of 5000000 steps (7%)
04:34:27:WU01:FS01:0x17:Completed 400000 out of 5000000 steps (8%)
06:20:39:WU01:FS01:0x17:Completed 450000 out of 5000000 steps (9%)
08:05:46:WU01:FS01:0x17:Completed 500000 out of 5000000 steps (10%)
******************************* Date: 2014-06-01 *******************************
09:51:25:WU01:FS01:0x17:Completed 550000 out of 5000000 steps (11%)
11:37:17:WU01:FS01:0x17:Completed 600000 out of 5000000 steps (12%)
13:22:42:WU01:FS01:0x17:Completed 650000 out of 5000000 steps (13%)
15:08:13:WU01:FS01:0x17:Completed 700000 out of 5000000 steps (14%)
******************************* Date: 2014-06-01 *******************************
16:52:35:WU01:FS01:0x17:Completed 750000 out of 5000000 steps (15%)
18:37:58:WU01:FS01:0x17:Completed 800000 out of 5000000 steps (16%)
20:23:48:WU01:FS01:0x17:Completed 850000 out of 5000000 steps (17%)
22:09:18:WU01:FS01:0x17:Completed 900000 out of 5000000 steps (18%)
******************************* Date: 2014-06-02 *******************************
00:14:53:WU01:FS01:0x17:Completed 950000 out of 5000000 steps (19%)
02:22:47:WU01:FS01:0x17:Completed 1000000 out of 5000000 steps (20%)
02:22:48:WU01:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
03:14:52:WU01:FS01:0x17:Completed 900000 out of 5000000 steps (18%)
04:59:31:WU01:FS01:0x17:Completed 950000 out of 5000000 steps (19%)
******************************* Date: 2014-06-02 *******************************
06:44:50:WU01:FS01:0x17:Completed 1000000 out of 5000000 steps (20%)
08:31:51:WU01:FS01:0x17:Completed 1050000 out of 5000000 steps (21%)
10:16:20:WU01:FS01:0x17:Completed 1100000 out of 5000000 steps (22%)
12:01:51:WU01:FS01:0x17:Completed 1150000 out of 5000000 steps (23%)
******************************* Date: 2014-06-02 *******************************
13:45:38:WU01:FS01:0x17:Completed 1200000 out of 5000000 steps (24%)
15:30:42:WU01:FS01:0x17:Completed 1250000 out of 5000000 steps (25%)
17:15:35:WU01:FS01:0x17:Completed 1300000 out of 5000000 steps (26%)
18:59:27:WU01:FS01:0x17:Completed 1350000 out of 5000000 steps (27%)
******************************* Date: 2014-06-02 *******************************
20:45:21:WU01:FS01:0x17:Completed 1400000 out of 5000000 steps (28%)
22:34:11:WU01:FS01:0x17:Completed 1450000 out of 5000000 steps (29%)
00:44:05:WU01:FS01:0x17:Completed 1500000 out of 5000000 steps (30%)
00:44:06:WU01:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
00:44:06:WU01:FS01:0x17:Max number of retries reached. Aborting.
00:44:06:WU01:FS01:0x17:ERROR:exception: Max Retries Reached
00:44:06:WU01:FS01:0x17:Saving result file logfile_01.txt
00:44:06:WU01:FS01:0x17:Saving result file log.txt
00:44:07:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
00:44:11:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:44:11:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:155 clone:2 gen:9 core:0x17 unit:0x00000018538b3db753287b9d0d440452
00:44:11:WU01:FS01:Uploading 2.81KiB to 140.163.4.231
00:44:11:WU01:FS01:Connecting to 140.163.4.231:8080
00:44:12:WU00:FS01:Connecting to assign-GPU.stanford.edu:80
00:44:12:WU01:FS01:Upload complete
00:44:12:WU01:FS01:Server responded WORK_ACK (400)
00:44:12:WU01:FS01:Cleaning up
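
For anyone who wants to pull the same numbers out of their own log, here is a rough Python sketch. It is not part of FAHClient; the function name `summarize` and the regular expressions are my own assumptions, based only on the line formats visible in the log above. It counts the "Bad State detected" lines, picks up the status the core returned, and averages the time between consecutive progress lines. Timing restarts after each rollback, and the midnight wrap assumes a frame takes less than 24 hours.

```python
import re

# Patterns are assumptions derived from the log lines quoted above.
PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):WU\d+:FS\d+:0x17:Completed (\d+) out of (\d+) steps"
)
RETURNED = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")
BAD_STATE = "Bad State detected"


def summarize(lines):
    """Return (bad_state_count, (status, code) or None, mean seconds per frame or None)."""
    bad_states = 0
    result = None
    frame_times = []
    prev = None  # (seconds since midnight, steps) of the previous progress line

    for line in lines:
        if BAD_STATE in line:
            bad_states += 1
            prev = None  # progress rolls back to a checkpoint, so restart timing
            continue
        m = RETURNED.search(line)
        if m:
            result = (m.group(1), int(m.group(2)))
            continue
        m = PROGRESS.match(line)
        if m:
            h, mi, s, steps, _total = map(int, m.groups())
            now = h * 3600 + mi * 60 + s
            if prev is not None and steps > prev[1]:
                # Timestamps carry no date, so wrap differences across midnight.
                frame_times.append((now - prev[0]) % 86400)
            prev = (now, steps)

    mean = sum(frame_times) / len(frame_times) if frame_times else None
    return bad_states, result, mean


sample = [
    "22:18:35:WU01:FS01:0x17:Completed 350000 out of 5000000 steps (7%)",
    "23:18:36:WU01:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint",
    "01:03:06:WU01:FS01:0x17:Completed 300000 out of 5000000 steps (6%)",
    "02:48:59:WU01:FS01:0x17:Completed 350000 out of 5000000 steps (7%)",
    "00:44:11:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]
print(summarize(sample))
```

To run it against a full log, pass the lines of the log file instead of `sample`; the file location differs per install, so none is hard-coded here.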

Re: project:13001 run:155 clone:2 gen:9 BAD_WORK_UNIT

Posted: Tue Jun 03, 2014 8:53 pm
by Joe_H
There are a couple of complete failures reported in the database for this WU, in addition to your partial credit for the incomplete WU. Another folder has successfully processed this WU.