
Project: 10493 (Run 5, Clone 38, Gen 285) - UNKNOWN_ENUM

Posted: Mon Dec 12, 2016 7:10 pm
by Nicolas_orleans
Hello
Strange error on this WU: I got an UNKNOWN_ENUM (127 = 0x7f), and then FAHClient restarted with a TPF of 15 minutes on a 980 Ti.
After the computer rebooted, FAHClient fails to start, even from the command line. I am trying to find a way to reinstall it (uninstalling and reinstalling with gdebi does not work).

Code:

13:20:35:WU03:FS00:0x21:Project: 10493 (Run 5, Clone 38, Gen 285)
13:20:35:WU03:FS00:0x21:Unit: 0x0000018f8ca304f555d616a56df00c60
13:20:35:WU03:FS00:0x21:CPU: 0x00000000000000000000000000000000
13:20:35:WU03:FS00:0x21:Machine: 0
13:20:35:WU03:FS00:0x21:Reading tar file core.xml
13:20:35:WU03:FS00:0x21:Reading tar file system.xml
13:20:35:WU03:FS00:0x21:Reading tar file integrator.xml
13:20:35:WU03:FS00:0x21:Reading tar file state.xml
13:20:36:WU03:FS00:0x21:Digital signatures verified
13:20:36:WU03:FS00:0x21:Folding@home GPU Core21 Folding@home Core
13:20:36:WU03:FS00:0x21:Version 0.0.17
13:20:36:WU02:FS00:Upload 24.54%
13:20:42:WU03:FS00:0x21:Completed 0 out of 5000000 steps (0%)
13:20:42:WU03:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:23:44:WU03:FS00:0x21:Completed 50000 out of 5000000 steps (1%)
13:26:45:WU03:FS00:0x21:Completed 100000 out of 5000000 steps (2%)
13:29:48:WU03:FS00:0x21:Completed 150000 out of 5000000 steps (3%)
13:32:49:WU03:FS00:0x21:Completed 200000 out of 5000000 steps (4%)
13:35:50:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
13:38:53:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
13:41:55:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
13:52:40:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
13:52:40:WU03:FS00:Starting
13:52:40:WU03:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/beta/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 704 -lifeline 3499 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
13:52:40:WU03:FS00:Started FahCore on PID 25376
13:52:40:WU03:FS00:Core PID:25380
13:52:40:WU03:FS00:FahCore 0x21 started
13:52:41:WU03:FS00:0x21:*********************** Log Started 2016-12-12T13:52:41Z ***********************
13:52:41:WU03:FS00:0x21:Project: 10493 (Run 5, Clone 38, Gen 285)
13:52:41:WU03:FS00:0x21:Unit: 0x0000018f8ca304f555d616a56df00c60
13:52:41:WU03:FS00:0x21:CPU: 0x00000000000000000000000000000000
13:52:41:WU03:FS00:0x21:Machine: 0
13:52:41:WU03:FS00:0x21:Digital signatures verified
13:52:41:WU03:FS00:0x21:Folding@home GPU Core21 Folding@home Core
13:52:41:WU03:FS00:0x21:Version 0.0.17
13:52:41:WU03:FS00:0x21:  Found a checkpoint file
13:52:47:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
13:52:47:WU03:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
14:17:06:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
14:41:24:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
15:05:43:WU03:FS00:0x21:Completed 400000 out of 5000000 steps (8%)
15:29:59:WU03:FS00:0x21:Completed 450000 out of 5000000 steps (9%)
15:54:18:WU03:FS00:0x21:Completed 500000 out of 5000000 steps (10%)
16:18:36:WU03:FS00:0x21:Completed 550000 out of 5000000 steps (11%)
16:42:55:WU03:FS00:0x21:Completed 600000 out of 5000000 steps (12%)
17:07:14:WU03:FS00:0x21:Completed 650000 out of 5000000 steps (13%)
17:31:30:WU03:FS00:0x21:Completed 700000 out of 5000000 steps (14%)
17:55:49:WU03:FS00:0x21:Completed 750000 out of 5000000 steps (15%)
******************************* Date: 2016-12-12 *******************************
18:20:08:WU03:FS00:0x21:Completed 800000 out of 5000000 steps (16%)
18:44:24:WU03:FS00:0x21:Completed 850000 out of 5000000 steps (17%)
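TPF (time per frame) is the time between consecutive 1% progress lines. The minimal sketch below computes it from log lines. It assumes the line layout shown in the excerpt above (`HH:MM:SS:WUxx:FSxx:0x21:Completed N out of M steps`); that layout is read off this log, not taken from any FAHClient documentation. Intervals where the step count goes backwards (the core reloading its checkpoint after the crash) are skipped. In the excerpt, frames take roughly 3 minutes before the error and roughly 24 minutes after the restart.

```python
import re
from datetime import datetime, timedelta

# Matches core progress lines as they appear in the log above, e.g.
# "13:23:44:WU03:FS00:0x21:Completed 50000 out of 5000000 steps (1%)"
# (layout inferred from this log, not from FAHClient documentation)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-f]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def frame_times(lines):
    """Return (percent_reached, seconds_per_frame) pairs.

    A frame is taken to be 1% of the work unit (total_steps / 100).
    If the step count goes backwards (core restarted from a checkpoint),
    that interval is skipped because it spans the crash and restart.
    """
    result = []
    prev_t = prev_steps = None
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue  # ignore warnings, date banners, other messages
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        steps, total = int(m.group(2)), int(m.group(3))
        if prev_t is not None and steps > prev_steps:
            delta = t - prev_t
            if delta < timedelta(0):  # timestamps wrapped past midnight
                delta += timedelta(days=1)
            frames = (steps - prev_steps) / (total / 100)
            result.append((steps * 100 // total, delta.total_seconds() / frames))
        prev_t, prev_steps = t, steps
    return result


# Lines copied from the log excerpt in this thread.
SAMPLE = """\
13:38:53:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
13:41:55:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
13:52:40:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
13:52:47:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
14:17:06:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
14:41:24:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
""".splitlines()

if __name__ == "__main__":
    for pct, secs in frame_times(SAMPLE):
        minutes, seconds = divmod(round(secs), 60)
        print(f"frame ending at {pct}%: {minutes}:{seconds:02d}")
```

On the sample this prints 3:02 for the last frame before the error, then 24:19 and 24:18 for the first frames after the restart.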

Re: Project: 10493 (Run 5, Clone 38, Gen 285) - UNKNOWN_ENUM

Posted: Sat Dec 17, 2016 7:19 am
by Nicolas_orleans
I think the cause is a hardware failure. I reinstalled the system, and the drivers appear to reset randomly, although I had folded 24/7 for months with them. My best guess is that one of the cards is failing and resets the driver for all cards. I will need to run each card separately to check this assumption.
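Alongside running each card on its own, the existing log can hint at which slot is misbehaving. The sketch below counts abnormal FahCore exits per folding slot (FS00, FS01, ...). It assumes one GPU slot per card and the `FahCore returned:` line layout seen in the log above; treating every result other than `FINISHED_UNIT` as a failure is also an assumption of this sketch, not a documented FAHClient rule. The log file path is passed on the command line rather than assumed.

```python
import re
import sys
from collections import Counter

# Matches core exit lines such as
# "13:52:40:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)"
# The WARNING: prefix is optional so that normal exits also match.
CORE_EXIT = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?WU\d+:(FS\d+):FahCore returned: (\w+)"
)


def failures_by_slot(lines):
    """Count abnormal FahCore exits, keyed by (slot, result code name).

    Any result other than FINISHED_UNIT counts as a failure
    (an assumption of this sketch).
    """
    counts = Counter()
    for line in lines:
        m = CORE_EXIT.match(line)
        if m and m.group(2) != "FINISHED_UNIT":
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Lines copied from the log excerpt in this thread.
SAMPLE = [
    "13:52:40:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)",
    "13:52:40:WU03:FS00:Starting",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
    else:
        lines = SAMPLE
    for (slot, code), n in sorted(failures_by_slot(lines).items()):
        print(f"{slot}: {code} x{n}")
```

A slot that keeps reporting failures while the others run cleanly would point at its card, but the per-card test runs remain the stronger check, since a failing card can reset the driver for every slot.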

Re: Project: 10493 (Run 5, Clone 38, Gen 285) - UNKNOWN_ENUM

Posted: Sat Dec 17, 2016 8:07 pm
by bruce
That's a reasonable assumption, and you have a good plan to isolate it. A message reporting UNKNOWN* means that something happened that is probably "impossible", and hardware failures are known to be a chief cause of such messages.