Suffering from constant crashes on core 21 WUs!


billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Hardware configuration: Full Time:

2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)

Retired:

3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
Location: Near Oxford, United Kingdom

Re: Suffering from constant crashes on core 21 WUs!

Post by billford »

7im wrote: Just like one expects a few extra bugs in the first model year of a newly redesigned car or truck.
Yes, and if the bugs are severe then the manufacturer will perform a recall.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Suffering from constant crashes on core 21 WUs!

Post by bruce »

Do you consider a "bad-state" severe if it's recoverable from the last checkpoint?
Ar`Kritz
Posts: 9
Joined: Wed Oct 08, 2014 1:58 pm

Re: Suffering from constant crashes on core 21 WUs!

Post by Ar`Kritz »

bruce wrote: Do you consider a "bad-state" severe if it's recoverable from the last checkpoint?
Now that's a big if:

04:16:52:WU01:FS00:0x21:Completed 900000 out of 2000000 steps (45%)
04:17:00:WU01:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
04:17:00:WU01:FS00:0x21:Max number of retries reached. Aborting.
04:17:00:WU01:FS00:0x21:ERROR:Max Retries Reached
04:17:00:WU01:FS00:0x21:Saving result file logfile_01.txt
04:17:00:WU01:FS00:0x21:Saving result file log.txt
04:17:00:WU01:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
04:17:01:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:17:01:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:9626 run:1 clone:12 gen:35 core:0x21 unit:0x0000003dab436c9b5609bee1fab4a345
04:17:01:WU01:FS00:Uploading 12.50KiB to 171.67.108.155
04:17:01:WU01:FS00:Connecting to 171.67.108.155:8080
04:17:02:WU00:FS00:Connecting to 171.67.108.45:80
04:17:02:WU01:FS00:Upload complete
04:17:02:WU01:FS00:Server responded WORK_ACK (400)
04:17:02:WU01:FS00:Cleaning up

Core 21, P9627 on a 980 Ti; it usually happens around 45%. Not on every WU from this WS, of course, but on a LOT of them...
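For anyone who wants to put a number on "a LOT of them", here is a minimal Python sketch that counts bad-state detections and BAD_WORK_UNIT aborts per WU/slot. It assumes log lines look exactly like the excerpt above (time, optional WARNING:, WU id, slot id, message). The function name summarize_bad_states is made up for illustration and is not part of the FAH client.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this thread:
#   HH:MM:SS:[WARNING:]WUnn:FSnn:message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def summarize_bad_states(lines):
    """Count bad-state detections and BAD_WORK_UNIT aborts per (WU, slot)."""
    bad_states = Counter()
    aborted = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        _time, wu, slot, message = match.groups()
        key = (wu, slot)
        if "Bad State detected" in message:
            bad_states[key] += 1
        elif "FahCore returned: BAD_WORK_UNIT" in message:
            # Count only the client's summary line, so the core's own
            # "Core Shutdown: BAD_WORK_UNIT" line is not counted twice.
            aborted[key] += 1
    return bad_states, aborted


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py path/to/log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        bad, dead = summarize_bad_states(handle)
    for key in sorted(set(bad) | set(dead)):
        print(key, "bad states:", bad[key], "aborted WUs:", dead[key])
```

Comparing the two counts across a few days of logs would show whether bad states are usually recovered from or usually end in an abort, which is the distinction being argued about here.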
billford

Re: Suffering from constant crashes on core 21 WUs!

Post by billford »

bruce wrote: Do you consider a "bad-state" severe if it's recoverable from the last checkpoint?
If it then runs to the end, no.

If there are routinely so many bad states (possibly only on certain cards) that the WUs abort after an arbitrary three retries, then yes.

If the parenthesised comment is true then those WUs should not be issued to those cards unless the client's beta flag is set.