
9401 - Bad state detected

Posted: Wed Feb 12, 2014 4:31 am
by Jim Saunders
Hi, I didn't see anything else in the thread on it while the project was in beta, but I got this:


02:01:41:WU00:FS01:Connecting to assign-GPU.stanford.edu:80
02:01:53:WU00:FS01:News: Welcome to Folding@Home
02:01:53:WU00:FS01:Assigned to work server 171.67.108.31
02:01:53:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:"GF116 [GeForce GT 610]" from 171.67.108.31
02:01:53:WU00:FS01:Connecting to 171.67.108.31:8080
02:01:57:WU00:FS01:Downloading 4.32MiB
02:03:07:WU00:FS01:Download complete
02:03:07:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9401 run:87 clone:0 gen:9 core:0x17 unit:0x0000000e6652edaf52eae313afb39c24
02:03:07:WU00:FS01:Downloading project 9401 description
02:03:07:WU00:FS01:Connecting to fah-web.stanford.edu:80
02:03:10:WU00:FS01:Project 9401 description downloaded successfully
02:06:37:WU00:FS01:Starting
02:06:37:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Jim/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 702 -lifeline 5804 -checkpoint 30 -gpu 0
02:06:37:WU00:FS01:Started FahCore on PID 6072
02:06:37:WU00:FS01:Core PID:6484
02:06:37:WU00:FS01:FahCore 0x17 started
02:06:37:WU00:FS01:0x17:*********************** Log Started 2014-02-12T02:06:37Z ***********************
02:06:37:WU00:FS01:0x17:Project: 9401 (Run 87, Clone 0, Gen 9)
02:06:37:WU00:FS01:0x17:Unit: 0x0000000e6652edaf52eae313afb39c24
02:06:37:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
02:06:37:WU00:FS01:0x17:Machine: 1
02:06:37:WU00:FS01:0x17:Reading tar file state.xml
02:06:38:WU00:FS01:0x17:Reading tar file system.xml
02:06:39:WU00:FS01:0x17:Reading tar file integrator.xml
02:06:39:WU00:FS01:0x17:Reading tar file core.xml
02:06:39:WU00:FS01:0x17:Digital signatures verified
02:06:39:WU00:FS01:0x17:Folding@home GPU core17
02:06:39:WU00:FS01:0x17:Version 0.0.52
02:10:49:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
02:10:49:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:27:37:WU00:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
03:02:09:WU00:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
03:02:09:WU00:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
03:35:19:WU00:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
******************************** Date: 12/02/14 ********************************
03:52:06:WU00:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
04:09:00:WU00:FS01:0x17:Completed 250000 out of 5000000 steps (5%)

GTX 580 on Win7, stock clocks, SMP running on 7 cores of an i7-950. V7 indicates 33K PPD, HFM 22K. I'm not concerned about the score, but I wanted to pass this up in case it indicates something. As near as I can tell, though, the slot carries on as normal. I've never seen a log entry like it for any of the other projects (8018 and 8900 in recent memory), and another unit on a different GPU has run to 17% without a similar report.

Jim

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 5:09 am
by PantherX
IIRC, this is a feature built into FahCore_17 that attempts to recover from NaNs(?) before giving up on the WU. I encountered this issue once on my GPUs, and the WU still completed successfully and was credited. If you can successfully fold the WU and upload it, you should be credited for it.
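PantherX's description suggests a detect-and-roll-back loop. The sketch below is a hypothetical illustration of that idea in Python, not FahCore_17's actual code. It assumes a "bad state" means any NaN or infinity in the simulation state, checks only at checkpoints, and uses a made-up retry limit (`MAX_RETRIES`); the names `fold`, `state_is_bad` and `BadWorkUnit` are invented for this example.

```python
import math
from typing import Callable, List

# Hypothetical limit: the real FahCore_17 retry limit is not stated in this thread.
MAX_RETRIES = 2


class BadWorkUnit(Exception):
    """Raised when the retry budget is exhausted (cf. BAD_WORK_UNIT in client logs)."""


def state_is_bad(state: List[float]) -> bool:
    # Assumption: a "bad state" is any NaN or infinite value in the state.
    return any(math.isnan(x) or math.isinf(x) for x in state)


def fold(total_steps: int, checkpoint_every: int,
         step: Callable[[int, List[float]], List[float]],
         state: List[float]) -> List[float]:
    good_step, good_state = 0, list(state)  # last good checkpoint
    retries = 0                             # counted over the whole WU, never reset
    current, i = list(state), 0
    while i < total_steps:
        current = step(i, current)
        i += 1
        # Only inspect the state at checkpoints (and at the very end).
        if i % checkpoint_every == 0 or i == total_steps:
            if state_is_bad(current):
                if retries >= MAX_RETRIES:
                    raise BadWorkUnit("Max Retries Reached")
                retries += 1
                # Roll back to the last good checkpoint and try again.
                i, current = good_step, list(good_state)
            else:
                good_step, good_state = i, list(current)
    return current
```

Counting retries over the whole WU, rather than resetting after each good checkpoint, means a unit that keeps going bad is eventually abandoned even if it makes some progress in between. A transient fault is survived, while a state that goes bad every time exhausts the budget and raises `BadWorkUnit`.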

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 5:31 am
by Jim Saunders
Thanks, I figured it was something like this. If it goes sideways I'll report, but otherwise I see no reason to worry about it.

Jim

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 5:41 am
by Jim Saunders
But then this happened:


02:10:49:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
02:10:49:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:27:37:WU00:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
03:02:09:WU00:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
03:02:09:WU00:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
03:35:19:WU00:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
******************************** Date: 12/02/14 ********************************
03:52:06:WU00:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
04:09:00:WU00:FS01:0x17:Completed 250000 out of 5000000 steps (5%)
04:38:36:WU00:FS01:0x17:Completed 300000 out of 5000000 steps (6%)
04:38:36:WU00:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
05:12:40:WU00:FS01:0x17:Completed 350000 out of 5000000 steps (7%)
05:12:40:WU00:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
05:12:41:WU00:FS01:0x17:Max number of retries reached. Aborting.
05:12:41:WU00:FS01:0x17:ERROR:exception: Max Retries Reached
05:12:41:WU00:FS01:0x17:Saving result file logfile_01.txt
05:12:41:WU00:FS01:0x17:Saving result file log.txt
05:12:41:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
05:12:41:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:12:41:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9401 run:87 clone:0 gen:9 core:0x17 unit:0x0000000e6652edaf52eae313afb39c24
05:12:41:WU00:FS01:Uploading 2.64KiB to 171.67.108.31
05:12:41:WU00:FS01:Connecting to 171.67.108.31:8080
05:12:41:WU01:FS01:Connecting to assign-GPU.stanford.edu:80
05:12:42:WU00:FS01:Upload complete
05:12:42:WU00:FS01:Server responded WORK_ACK (400)
05:12:42:WU00:FS01:Cleaning up

I'm not going to worry about it until it becomes a pattern; the other slot mentioned above is up to 20% with no sign of the same problem.
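To check whether this is becoming a pattern, the client log can be scanned for these messages. This is a minimal Python sketch that assumes FAHClient v7 log lines in the `HH:MM:SS:[WARNING:]WUxx:FSxx:message` shape seen in the logs in this thread; `scan` is a hypothetical helper, and lines in any other format (such as the date banners) are simply skipped.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches log lines such as
#   03:02:09:WU00:FS01:0x17:Bad State detected... attempting to resume ...
#   05:12:41:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
LINE = re.compile(r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def scan(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count 'Bad State detected' and BAD_WORK_UNIT lines per (WU, slot)."""
    bad_states: Counter = Counter()
    bad_units: Counter = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # date banners and other non-WU lines
        key = (m.group(1), m.group(2))
        msg = m.group(3)
        if "Bad State detected" in msg:
            bad_states[key] += 1
        if "FahCore returned: BAD_WORK_UNIT" in msg:
            bad_units[key] += 1
    return bad_states, bad_units
```

Usage would be along the lines of `with open(path) as f: bad_states, bad_units = scan(f)`, where `path` points at your client's log file. Note that WU IDs such as `WU00` are reused by the client, so counts from a long log mix several units in the same slot.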

Jim

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 6:40 am
by bruce
Molecular simulation of folding involves a degree of randomness depending on the temperature of the sample being simulated. Unfortunately, that means that once in a while what we call a "bad WU" is issued, though nobody knows it's bad until somebody runs it. We hope that the "bad WUs" are weeded out during beta testing, but that is never certain, since it's a probabilistic process.

The same WU was reassigned and somebody else encountered an error, too, so don't worry about trying to fix your system.

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 3:15 pm
by Rel25917
Are those reference stock clocks or factory-overclocked stock clocks? This can be a sign of a bit too much overclock. Core 17 is sensitive to memory OC. I had to reduce the memory speed on my EVGA Superclocked Titan to get rid of that error (but I'm running +15 MHz on the core over the Superclocked speed). If you keep seeing it every now and then, you may need to try tweaking the memory speed.

Re: 9401 - Bad state detected

Posted: Wed Feb 12, 2014 11:31 pm
by Jim Saunders
So far it's been a one-off incident, on a project I haven't seen much of before; HFM didn't keep it in the log and I don't have any more running. If it happens again I'll pass it up, but I have no reason to think anything is wrong on my end. Off the top of my head, the card isn't one of the factory-overclocked ones.

Jim

Re: 9401 - Bad state detected

Posted: Thu Feb 13, 2014 1:29 pm
by Ripshod
I'm struggling to get a stable machine with these 9401s. 8900s were absolutely fine with everything, including my overclocks, but 9401s just crash constantly. I'm now on a fresh install with zero modifications (crikey, it's slooooow) and the 13.12 WHQL drivers. I'll report back here if I still have problems.

There's nothing in the logs or in the Event Viewer.

Re: 9401 - Bad state detected

Posted: Thu Feb 13, 2014 3:14 pm
by 7im
Stability issues always mean too much overclock. Do you have this problem at OEM stock speeds?

Re: 9401 - Bad state detected

Posted: Thu Feb 13, 2014 3:29 pm
by Ripshod
Funnily enough, yeah, at stock everything and on a fresh install.

Got it sorted. For some reason, uninstalling 'CCC' (Catalyst Control Center) and all the other extras worked. With just the basic drivers installed, everything is good again, and the overclocks are fine now too.

Gotta say I didn't see that one coming!!

Re: 9401 - Bad state detected

Posted: Sat Feb 15, 2014 7:40 pm
by Jim Saunders
As a postscript, this GPU has also demonstrated instability on P8900 WUs; any criticism of P9401 should be considered in that context.

Jim