Re: Core 17 has suddenly started crashing
Posted: Mon Jun 02, 2014 5:36 pm
Any recent Windows updates? Try a clean install of the appropriate drivers from nvidia.
Community driven support forum for Folding@home
https://foldingforum.org/
Eagle wrote: Regarding the error, it doesn't happen at exactly 0%. It's mostly around 1% when the error happens - if that makes a difference.
From the log, the WU never reaches 1%. In fact, it never started processing beyond the setup phase. Do you have examples of WUs that fail at 1%? If you are going by the percentage displayed in FAHControl or Web Control, that can be off when a WU has just been started or restarted, especially while errors are being processed.
Eagle wrote: Yes, but mostly Windows Defender definition updates, a couple of Office updates and a handful with regards to security.
Windows Update, if left on automatic, will sometimes replace the NV driver with the MS version. The NV version is needed. For your card, the latest version is fine.
Re-installing didn't help. At first, I believed the newer nVidia drivers were the cause, but even going back to 335.23 didn't resolve the issue (although it worked back in March/April). When did FAH update from 0.0.49 to 0.0.52?
Joe_H wrote: From the log, the WU never reaches 1%. In fact it never started processing beyond the setup phase. Do you have examples of WU's that fail at 1%? If you are going by the percentage displayed in the FAHControl or Web Control display, that can be off when a WU is just started or restarted and especially where errors are being processed.
Thanks for the clarification - indeed, I judged from the values displayed within FAHControl. I just watched it reach just above 2% before failing, with no percentage in the log:
As for the WU, it has no entries in the database yet. So it is too early to tell if it is a bad WU.
Code:
17:57:45:WU00:FS01:Starting
17:57:45:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
17:57:45:WU00:FS01:Started FahCore on PID 7480
17:57:45:WU00:FS01:Core PID:2444
17:57:45:WU00:FS01:FahCore 0x17 started
17:57:46:WU00:FS01:0x17:*********************** Log Started 2014-06-02T17:57:45Z ***********************
17:57:46:WU00:FS01:0x17:Project: 13000 (Run 1869, Clone 0, Gen 16)
17:57:46:WU00:FS01:0x17:Unit: 0x00000028538b3db75311ac46997470b8
17:57:46:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
17:57:46:WU00:FS01:0x17:Machine: 1
17:57:46:WU00:FS01:0x17:Reading tar file state.xml
17:57:46:WU00:FS01:0x17:Reading tar file system.xml
17:57:47:WU00:FS01:0x17:Reading tar file integrator.xml
17:57:47:WU00:FS01:0x17:Reading tar file core.xml
17:57:47:WU00:FS01:0x17:Digital signatures verified
17:57:47:WU00:FS01:0x17:Folding@home GPU core17
17:57:47:WU00:FS01:0x17:Version 0.0.52
17:58:29:WU02:FS00:0xa3:Completed 355000 out of 500000 steps (71%)
18:01:37:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
18:01:37:WU00:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
18:01:38:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
18:01:38:WARNING:WU00:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
18:01:38:WARNING:WU00:FS01:Too many errors, failing
18:01:38:WU00:FS01:Sending unit results: id:00 state:SEND error:FAILED project:13000 run:1869 clone:0 gen:16 core:0x17 unit:0x00000028538b3db75311ac46997470b8
18:01:38:WU00:FS01:Connecting to 140.163.4.231:8080
18:01:38:WU01:FS01:Connecting to 171.67.108.201:80
18:01:38:WU00:FS01:Server responded WORK_ACK (400)
18:01:38:WU00:FS01:Cleaning up
P5-133XL wrote: There is no point in bigadv with a CPU limit of 7 when the minimum CPU cores are 24.
"bigadv" was once required to get the newer FAH core. I assume it can be safely removed then?
I do not know why you want to limit the number of logs to 5, but that is irrelevant to the issue.
OC'ing would be the obvious likelihood.
Eagle wrote: ...When did FAH update from 0.0.49 to 0.0.52?
It was a while back, 7 November 2013 (https://folding.stanford.edu/home/chang ... -full-fah/). However, the minimum FahCore_17 version requirement is determined by the WU, so if you get a WU which requires version 0.0.52, FahCore_17 will be updated automatically.
Eagle wrote: ..."bigadv" was once required to get the newer FAH core. I assume it can be safely removed then?...
Pretty sure that it was "advanced" when FahCore_17 was released for the first time (https://folding.stanford.edu/home/welco ... core-17-2/). You can now remove it. Moreover, the setting was in the CPU slot, so it wouldn't have affected the GPU slot.
Eagle wrote: ...When I set up FAH 7, this was suggested to prevent the log directory from getting filled with dozens of log files...
Humm, not sure why that is, since by default there can be a maximum of 17 log files: the current log file in the FAHClient folder and the previous 16 log files in the logs folder. The oldest one is automatically deleted, and a new log file is generated every time FAHClient is restarted.
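As a rough illustration of the rotation described above (one current log plus at most 16 previous ones, oldest deleted first), here is a minimal Python sketch. It is not the client's code: `rotate_logs`, its parameters and the file names are hypothetical and only model the behaviour as described in this post.

```python
def rotate_logs(previous_logs, new_log, max_previous=16):
    """Archive new_log and return (kept, deleted).

    previous_logs is ordered oldest first. max_previous=16 mirrors the
    default described in the post; the function itself is a hypothetical
    illustration, not taken from FAHClient.
    """
    logs = list(previous_logs) + [new_log]
    overflow = max(0, len(logs) - max_previous)
    # The oldest entries (front of the list) are the ones dropped.
    return logs[overflow:], logs[:overflow]


# Hypothetical file names: 16 archived logs already exist.
archive = [f"log-{i:02d}.txt" for i in range(16)]
kept, deleted = rotate_logs(archive, "log-16.txt")
print(len(kept), deleted)
```

With 16 archived logs already present, archiving one more keeps the count at 16 and drops only the oldest file, so the logs folder should not grow without bound under this model.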
Eagle wrote: ...As I never OC'ed the card and it worked just fine until a couple of days/weeks, this seems a bit strange to me.
Apart from Windows Update, did anything else on the system change?
These error messages are uncommon for FahCore_17 and I don't remember seeing them recently:
Eagle wrote: ...
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
....
However, after a quick search, it seems that something is killing the client (viewtopic.php?p=164030#p164030). Maybe the updated security application is interfering with F@H files? Have you created an exception for F@H in your security application?
PantherX wrote: It was a while back, 7 November 2013 (https://folding.stanford.edu/home/chang ... -full-fah/). However, the minimum FahCore_17 version requirement is determined by the WU so if you get a WU which required version 0.0.52, it will automatically update FahCore_17.
Thanks for the information.
PantherX wrote: Pretty sure that it was advanced when FahCore_17 was released for the first time (https://folding.stanford.edu/home/welco ... core-17-2/). You can now remove it. Moreover, the setting was in the CPU Slot so wouldn't have effected the GPU Slot.
Regarding the GPU slot, you are right. But when it comes to the CPU one, I was once told (and it was here at Folding Forum) to add bigadv to the CPU slot. But that might be out of date as well.
PantherX wrote: Humm, not sure why that is since by default, there can be a maximum of 17 log files, the log file in the FAHClient folder which is the current one and the previous 16 log files in the logs folder. The oldest one is automatically deleted and a new log file is generated every time FAHClient is restarted.
Thanks for the clarification. I removed that setting as well.
Code:
<config>
  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='********************************'/>
  <team v='34361'/>
  <user v='Eagle3386'/>

  <!-- Folding Slots -->
  <slot id='1' type='GPU'/>
  <slot id='0' type='CPU'>
    <cpus v='7'/>
  </slot>
</config>
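For anyone who wants to double-check which options belong to which slot, here is a small Python sketch that reads a trimmed copy of the config above. `list_slots` is a hypothetical helper name, and the only assumption is the layout visible in the posted config (options stored as child elements of `<slot>` with a `v` attribute).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted config (passkey, team and user left out).
CONFIG = """<config>
  <power v='full'/>
  <slot id='1' type='GPU'/>
  <slot id='0' type='CPU'>
    <cpus v='7'/>
  </slot>
</config>"""


def list_slots(xml_text):
    """Return {slot id: (type, {option: value})} for every <slot> element."""
    root = ET.fromstring(xml_text)
    slots = {}
    for slot in root.findall("slot"):
        # Per-slot options are the child elements of the slot itself.
        options = {child.tag: child.get("v") for child in slot}
        slots[slot.get("id")] = (slot.get("type"), options)
    return slots


print(list_slots(CONFIG))
```

With the posted config, `cpus` shows up under slot 0 only, which fits the earlier remark that a setting placed in the CPU slot does not touch the GPU slot.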
PantherX wrote: Apart from Windows Update, did anything else on the system change?
I updated Logitech G-Software, Miranda NG, Mozilla Firefox (I'm on the Nightly channel, so it updates almost every day), Paint.NET (4.0 Beta), Skype and Steam. Apart from that, both hardware and software have been running on the same revisions for months, 24/7.
PantherX wrote: These error messages are uncommon for FahCore_17 and I don't remember seeing them recently:
Eagle wrote: ...
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
....
However, a quick search later, and it seems that something is killing the client (viewtopic.php?p=164030#p164030). Maybe the updated security application is interfering with F@H files? Have you created an exception for F@H in your security application?
That might have happened due to the frequent slot deaths, with the client eventually dying. To test again without waiting a couple of hours, I normally pause and un-pause the slot. That way, it retries working on WUs.
Code:
12:36:25:FS01:Paused
12:36:26:FS01:Unpaused
12:36:26:WU00:FS01:Connecting to 171.67.108.201:80
12:36:27:WU00:FS01:Assigned to work server 171.64.65.105
12:36:27:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 171.64.65.105
12:36:27:WU00:FS01:Connecting to 171.64.65.105:8080
12:36:28:WU00:FS01:Downloading 122.32KiB
12:36:29:WU00:FS01:Download complete
12:36:29:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:7623 run:349 clone:0 gen:168 core:0x15 unit:0x000000de664f2dd14fe4fa6aca9dcf16
12:36:29:WU00:FS01:Starting
12:36:29:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:36:29:WU00:FS01:Started FahCore on PID 4432
12:36:29:WU00:FS01:Core PID:1820
12:36:29:WU00:FS01:FahCore 0x15 started
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:*------------------------------*
12:36:30:WU00:FS01:0x15:Folding@Home GPU Core
12:36:30:WU00:FS01:0x15:Version 2.25 (Wed May 9 17:03:01 EDT 2012)
12:36:30:WU00:FS01:0x15:Build host AmoebaRemote
12:36:30:WU00:FS01:0x15:Board Type NVIDIA/CUDA
12:36:30:WU00:FS01:0x15:Core 15
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Window's signal control handler registered.
12:36:30:WU00:FS01:0x15:Preparing to commence simulation
12:36:30:WU00:FS01:0x15:- Looking at optimizations...
12:36:30:WU00:FS01:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
12:36:30:WU00:FS01:0x15:- Created dyn
12:36:30:WU00:FS01:0x15:- Files status OK
12:36:30:WU00:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
12:36:30:WU00:FS01:0x15:- Expanded 124739 -> 501826 (decompressed 402.3 percent)
12:36:30:WU00:FS01:0x15:Called DecompressByteArray: compressed_data_size=124739 data_size=501826, decompressed_data_size=501826 diff=0
12:36:30:WU00:FS01:0x15:- Digital signature verified
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Project: 7623 (Run 349, Clone 0, Gen 168)
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Assembly optimizations on if available.
12:36:30:WU00:FS01:0x15:Entering M.D.
12:36:32:WU00:FS01:0x15:Tpr hash 00/wudata_01.tpr: 3019852558 3487233801 375813301 3414467450 1681703861
12:36:32:WU00:FS01:0x15:GPU device id=0
12:36:32:WU00:FS01:0x15:Working on Protein
12:36:32:WU00:FS01:0x15:Client config unavailable.
12:36:32:WU00:FS01:0x15:Starting GUI Server
12:37:45:WU00:FS01:0x15:Setting checkpoint frequency: 400000
12:37:45:WU00:FS01:0x15:Completed 3 out of 40000000 steps (0%).
12:41:20:WU00:FS01:0x15:Completed 400000 out of 40000000 steps (1%).
12:45:00:WU00:FS01:0x15:Completed 800000 out of 40000000 steps (2%).
12:48:42:WU00:FS01:0x15:Completed 1200000 out of 40000000 steps (3%).
12:52:22:WU00:FS01:0x15:Completed 1600000 out of 40000000 steps (4%).
12:56:04:WU00:FS01:0x15:Completed 2000000 out of 40000000 steps (5%).
Code:
16:23:32:WU00:FS01:Connecting to 171.67.108.201:80
16:23:34:WU00:FS01:Assigned to work server 140.163.4.231
16:23:34:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 140.163.4.231
16:23:34:WU00:FS01:Connecting to 140.163.4.231:8080
16:23:35:WU00:FS01:Downloading 4.83MiB
16:23:40:WU00:FS01:Download complete
16:23:40:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:434 clone:1 gen:23 core:0x17 unit:0x00000028538b3db75328cabc362813c8
16:23:40:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
16:23:40:WU00:FS01:Connecting to web.stanford.edu:80
16:23:41:WU00:FS01:FahCore 17: Downloading 2.55MiB
16:23:47:WU00:FS01:FahCore 17: 36.76%
16:23:53:WU00:FS01:FahCore 17: 75.97%
16:23:56:WU00:FS01:FahCore 17: Download complete
16:23:56:WU00:FS01:Valid core signature
16:23:56:WARNING:WU00:FS01:FahCore has not changed since last download, aborting core update
16:23:56:WU00:FS01:Starting
16:23:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
16:23:56:WU00:FS01:Started FahCore on PID 6964
16:23:57:WU00:FS01:Core PID:4132
16:23:57:WU00:FS01:FahCore 0x17 started
PantherX wrote: Humm, none of the additional updates would normally interfere with F@H. FahCore_15 is working which means that at least the CUDA code is running properly. FahCore_17 uses OpenCL so maybe a reinstallation of Nvidia Driver with the clean option and install only the drivers and not the additional features if you aren't using it might be next.
I always install only the display, audio and PhysX drivers; no 3D stuff, and applications aren't offered (at least not via GeForce Experience).
PantherX wrote: Also, when reviewing your log file (viewtopic.php?p=265425#p265425) I noticed an anomaly in the message sequence:
Code:
16:23:32:WU00:FS01:Connecting to 171.67.108.201:80
16:23:34:WU00:FS01:Assigned to work server 140.163.4.231
16:23:34:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 140.163.4.231
16:23:34:WU00:FS01:Connecting to 140.163.4.231:8080
16:23:35:WU00:FS01:Downloading 4.83MiB
16:23:40:WU00:FS01:Download complete
16:23:40:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:434 clone:1 gen:23 core:0x17 unit:0x00000028538b3db75328cabc362813c8
16:23:40:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
16:23:40:WU00:FS01:Connecting to web.stanford.edu:80
16:23:41:WU00:FS01:FahCore 17: Downloading 2.55MiB
16:23:47:WU00:FS01:FahCore 17: 36.76%
16:23:53:WU00:FS01:FahCore 17: 75.97%
16:23:56:WU00:FS01:FahCore 17: Download complete
16:23:56:WU00:FS01:Valid core signature
16:23:56:WARNING:WU00:FS01:FahCore has not changed since last download, aborting core update
16:23:56:WU00:FS01:Starting
16:23:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
16:23:56:WU00:FS01:Started FahCore on PID 6964
16:23:57:WU00:FS01:Core PID:4132
16:23:57:WU00:FS01:FahCore 0x17 started
The message about aborting FahCore update shouldn't occur after the download is over and FahCore is verified. It should happen during the download. Maybe, you can delete the FahCore_17 in the folder which will automatically force a fresh download of FahCore_17.
I've paused, deleted and un-paused the GPU slot. The result is this:
Code:
19:13:40:FS01:Unpaused
19:13:40:WU02:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
19:13:40:WU02:FS01:Connecting to web.stanford.edu:80
19:13:40:WU02:FS01:FahCore 17: Downloading 2.55MiB
19:13:46:WU02:FS01:FahCore 17: 26.96%
19:13:52:WU02:FS01:FahCore 17: 58.81%
19:13:58:WU02:FS01:FahCore 17: 90.67%
19:13:59:WU02:FS01:FahCore 17: Download complete
19:13:59:WU02:FS01:Valid core signature
19:13:59:WU02:FS01:Unpacked 8.60MiB to cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe
19:14:00:WU02:FS01:Starting
19:14:00:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
19:14:00:WU02:FS01:Started FahCore on PID 5784
19:14:00:WU02:FS01:Core PID:4248
19:14:00:WU02:FS01:FahCore 0x17 started
19:14:01:WU02:FS01:0x17:*********************** Log Started 2014-06-03T19:14:00Z ***********************
19:14:01:WU02:FS01:0x17:Project: 9406 (Run 34, Clone 0, Gen 61)
19:14:01:WU02:FS01:0x17:Unit: 0x000000520a3b1e5c533dd2516a29146a
19:14:01:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:14:01:WU02:FS01:0x17:Machine: 1
19:14:01:WU02:FS01:0x17:Reading tar file state.xml
19:14:02:WU02:FS01:0x17:Reading tar file system.xml
19:14:03:WU02:FS01:0x17:Reading tar file integrator.xml
19:14:03:WU02:FS01:0x17:Reading tar file core.xml
19:14:03:WU02:FS01:0x17:Digital signatures verified
19:14:03:WU02:FS01:0x17:Folding@home GPU core17
19:14:03:WU02:FS01:0x17:Version 0.0.52
19:19:59:WU02:FS01:0x17:Completed 0 out of 2000000 steps (0%)
19:19:59:WU02:FS01:0x17:Lost lifeline PID 5784, exiting
19:19:59:WU02:FS01:0x17:Lost lifeline PID 5784, exiting
19:19:59:WU02:FS01:0x17:ERROR:103: Lost client lifeline
19:19:59:WU02:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
19:19:59:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
19:19:59:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
19:19:59:WU02:FS01:Starting
19:19:59:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
19:19:59:WU02:FS01:Started FahCore on PID 4580
19:19:59:WU02:FS01:Core PID:6528
19:19:59:WU02:FS01:FahCore 0x17 started
19:20:00:WU02:FS01:0x17:*********************** Log Started 2014-06-03T19:20:00Z ***********************
19:20:00:WU02:FS01:0x17:Project: 9406 (Run 34, Clone 0, Gen 61)
19:20:00:WU02:FS01:0x17:Unit: 0x000000520a3b1e5c533dd2516a29146a
19:20:00:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:20:00:WU02:FS01:0x17:Machine: 1
19:20:00:WU02:FS01:0x17:Reading tar file state.xml
19:20:01:WU02:FS01:0x17:Reading tar file system.xml
19:20:02:WU02:FS01:0x17:Reading tar file integrator.xml
19:20:02:WU02:FS01:0x17:Reading tar file core.xml
19:20:02:WU02:FS01:0x17:Digital signatures verified
19:20:02:WU02:FS01:0x17:Folding@home GPU core17
19:20:02:WU02:FS01:0x17:Version 0.0.52
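Since the percentage shown in FAHControl can be misleading, one way to see what a slot actually logged is to scan the log for the progress and return-code lines. Below is a minimal Python sketch; `summarize_slot` and both regular expressions are my own assumptions, based only on the line formats that appear in the logs posted in this thread, so other client or core versions may need different patterns.

```python
import re

# Patterns derived only from log lines shown in this thread (assumption:
# other client/core versions may format these lines differently).
PROGRESS = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x[0-9a-f]+:"
    r"Completed (\d+) out of (\d+) steps \((\d+)%\)"
)
RETURNED = re.compile(
    r"^\d\d:\d\d:\d\d:WARNING:(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-f]+\)"
)


def summarize_slot(lines, slot):
    """Return (highest logged percent or None, FahCore return names) for a slot."""
    best = None
    returned = []
    for line in lines:
        m = PROGRESS.match(line)
        if m and m.group(2) == slot:
            pct = int(m.group(5))
            best = pct if best is None else max(best, pct)
            continue
        m = RETURNED.match(line)
        if m and m.group(2) == slot:
            returned.append(m.group(3))
    return best, returned


# Sample lines copied from the first log posted in this thread.
sample = [
    "17:58:29:WU02:FS00:0xa3:Completed 355000 out of 500000 steps (71%)",
    "18:01:37:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)",
    "18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline",
    "18:01:38:WARNING:WU00:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)",
]
print(summarize_slot(sample, "FS01"))
```

For the sample lines, slot FS01 never logs more than 0% before CLIENT_DIED is returned, while the CPU slot FS00 is at 71% with no errors, which matches what the logs in this thread show.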