Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 2:03 am
by PantherX
Eagle wrote:...I don't know why that "Date: 2014-06-05"-line is written into the log, although the day (and hence the date) is still the same...
This information is always printed in the log file every 6 hours by default.
Eagle wrote:...The only "new thing" I found is this line:

Code: Select all

11:24:25:WU01:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
However, 7im told me that it's unused and hence can be ignored...
To elaborate a little, this feature was introduced in FahCore_17 version 0.0.52 (IIRC) so that users with only a single Nvidia GPU can pause folding for X time once temperature Y is reached. It doesn't work on AMD GPUs or with multiple Nvidia GPUs. Since you haven't seen this message before, it suggests that something fishy occurred with the FahCore_17 update.
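As a rough illustration of the requirements named in that log line, here is a minimal Python sketch. The function name and its parameters are hypothetical and only mirror the wording of the message; this is not the core's actual code, and the log does not say which units tmax and twait use.

```python
def temperature_control_enabled(nvidia_gpus: int, other_gpus: int,
                                tmax: float, twait: float) -> bool:
    """Check the three requirements quoted in the log message.

    Hypothetical helper: it restates "single Nvidia GPU, tmax must be
    < 110 and twait >= 900" and nothing more.
    """
    single_nvidia = nvidia_gpus == 1 and other_gpus == 0
    return single_nvidia and tmax < 110 and twait >= 900


# One Nvidia GPU with values inside the stated limits.
print(temperature_control_enabled(1, 0, tmax=90, twait=900))
# Two Nvidia GPUs: the first requirement fails.
print(temperature_control_enabled(2, 0, tmax=90, twait=900))
```

If any one of the three conditions fails, a check like this reports the feature as disabled, which would match the wording of the message.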
Eagle wrote:...So, the only real change is the change of the work-directory from my hard disk to my solid-state drive. Can this really be causing "lost lifeline" and things like that? I can hardly imagine that, but then again, I'm just a passionate FAH user, no insider...
That shouldn't cause an issue at all. I keep the data directory on an HDD and the program folder on an SSD and can fold without any issues. Here's my configuration:

Code: Select all

*********************** Log Started 2014-05-28T20:00:37Z ***********************
20:00:37:************************* Folding@home Client *************************
20:00:37:      Website: http://folding.stanford.edu/
20:00:37:    Copyright: (c) 2009-2014 Stanford University
20:00:37:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
20:00:37:         Args: 
20:00:37:       Config: D:/FAH/V7/config.xml
20:00:37:******************************** Build ********************************
20:00:37:      Version: 7.4.4
20:00:37:         Date: Mar 4 2014
20:00:37:         Time: 20:26:54
20:00:37:      SVN Rev: 4130
20:00:37:       Branch: fah/trunk/client
20:00:37:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
20:00:37:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
20:00:37:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
20:00:37:     Platform: win32 XP
20:00:37:         Bits: 32
20:00:37:         Mode: Release
20:00:37:******************************* System ********************************
20:00:37:          CPU: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
20:00:37:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
20:00:37:         CPUs: 8
20:00:37:       Memory: 15.89GiB
20:00:37:  Free Memory: 12.08GiB
20:00:37:      Threads: WINDOWS_THREADS
20:00:37:   OS Version: 6.2
20:00:37:  Has Battery: true
20:00:37:   On Battery: false
20:00:37:   UTC Offset: 3
20:00:37:          PID: 2096
20:00:37:          CWD: D:/FAH/V7
20:00:37:           OS: Windows 8 Pro
20:00:37:      OS Arch: AMD64
20:00:37:         GPUs: 1
20:00:37:        GPU 0: NVIDIA:2 GF114 [GeForce GTX 675M]
20:00:37:         CUDA: 2.1
20:00:37:  CUDA Driver: 6000
20:00:37:Win32 Service: false
20:00:37:***********************************************************************
20:00:38:<config>
20:00:38:  <!-- Network -->
20:00:38:  <proxy v=':8080'/>
20:00:38:
20:00:38:  <!-- Remote Command Server -->
20:00:38:  <password v='*********'/>
20:00:38:
20:00:38:  <!-- Slot Control -->
20:00:38:  <power v='full'/>
20:00:38:
20:00:38:  <!-- User Information -->
20:00:38:  <passkey v='********************************'/>
20:00:38:  <team v='69411'/>
20:00:38:  <user v='PantherX'/>
20:00:38:
20:00:38:  <!-- Folding Slots -->
20:00:38:  <slot id='0' type='CPU'>
20:00:38:    <cpus v='7'/>
20:00:38:    <max-packet-size v='small'/>
20:00:38:    <max-slot-errors v='1'/>
20:00:38:    <max-unit-errors v='1'/>
20:00:38:    <next-unit-percentage v='100'/>
20:00:38:    <pause-on-start v='true'/>
20:00:38:  </slot>
20:00:38:  <slot id='1' type='GPU'>
20:00:38:    <max-slot-errors v='1'/>
20:00:38:    <max-unit-errors v='1'/>
20:00:38:    <next-unit-percentage v='100'/>
20:00:38:    <pause-on-start v='true'/>
20:00:38:  </slot>
20:00:38:</config>
20:00:38:Connecting to assign-GPU.stanford.edu:80
20:00:39:Updated GPUs.txt
20:00:39:Read GPUs.txt
20:00:39:Trying to access database...
20:00:40:Successfully acquired database lock
20:00:40:Enabled folding slot 00: PAUSED cpu:7 (by user)
20:00:40:Enabled folding slot 01: PAUSED gpu:0:GF114 [GeForce GTX 675M] (by user)
20:04:22:Clean exit
Eagle wrote:...Any further information would be greatly appreciated!
Since this issue started a few weeks ago, my best guess would be a mangled FahCore update. As a test, can you please replace the beta value with advanced and see if it continues to fold FahCore_17 WUs properly? Do note that with the advanced flag, you might download FahCore_17 version 0.0.52 into a different directory and continue to fold FahCore_17 WUs without issues. If this works fine, then you can revert to your original setup by doing a fresh installation.
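For anyone who prefers to see the change as data, here is a small Python sketch of switching the flag in a config fragment. It assumes the flag is stored as a top-level `client-type` option in config.xml; that option name, the helper `set_client_type` and the sample fragment are assumptions for illustration, so check your own config (or use FAHControl) before relying on it.

```python
import xml.etree.ElementTree as ET


def set_client_type(config_xml: str, value: str) -> str:
    """Return config_xml with the (assumed) <client-type v='...'/> set to value."""
    root = ET.fromstring(config_xml)
    node = root.find("client-type")
    if node is None:
        # Option not present yet: add it at the top level.
        node = ET.SubElement(root, "client-type")
    node.set("v", value)
    return ET.tostring(root, encoding="unicode")


# Made-up fragment, not taken from anyone's real config.
example = "<config><client-type v='beta'/><slot id='1' type='GPU'/></config>"
updated = set_client_type(example, "advanced")
print(updated)
```

The sketch only rewrites a string; it does not touch any file on disk.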

Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 9:14 am
by Eagle
PantherX wrote:This information is always printed in the log file every 6 hours by default.
Thanks for the information!
PantherX wrote:To elaborate a little, this feature was introduced in FahCore_17 version 0.0.52 (IIRC) so that users with only a single Nvidia GPU can pause folding for X time once temperature Y is reached. It doesn't work on AMD GPUs or with multiple Nvidia GPUs. Since you haven't seen this message before, it suggests that something fishy occurred with the FahCore_17 update.
That makes me wonder..
PantherX wrote:That shouldn't cause an issue at all. I keep the data directory on an HDD and the program folder on an SSD and can fold without any issues. Here's my configuration:

(...)
Since this issue started a few weeks ago, my best guess would be a mangled FahCore update.
So, although your basic setup equaled mine, you didn't experience the error, but I did. Very strange..
PantherX wrote:As a test, can you please replace the beta value with advanced and see if it continues to fold FahCore_17 WUs properly? Do note that with the advanced flag, you might download FahCore_17 version 0.0.52 into a different directory and continue to fold FahCore_17 WUs without issues. If this works fine, then you can revert to your original setup by doing a fresh installation.
The current Core 17 beta WU will take about an hour and a half to finish. Right after that, I'm going to try your suggestion and report back.

Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 12:52 pm
by Eagle
Alright, this is getting strange now! Everything runs just fine:

Code: Select all

12:00:02:WU00:FS01:Connecting to 171.67.108.201:80
12:00:02:WU00:FS01:Assigned to work server 171.64.65.93
12:00:02:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 171.64.65.93
12:00:02:WU00:FS01:Connecting to 171.64.65.93:8080
12:00:03:WU00:FS01:Downloading 2.92MiB
12:00:09:WU00:FS01:Download 23.51%
12:00:15:WU00:FS01:Download 57.70%
12:00:21:WU00:FS01:Download 89.76%
12:00:22:WU00:FS01:Download complete
12:00:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9102 run:2 clone:19 gen:25 core:0x17 unit:0x0000001e0a3b1e81537c069339e5f10d
12:00:22:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
12:00:22:WU00:FS01:Connecting to web.stanford.edu:80
12:00:22:WU00:FS01:FahCore 17: Downloading 2.55MiB
12:00:28:WU00:FS01:FahCore 17: 34.31%
12:00:34:WU00:FS01:FahCore 17: 73.52%
12:00:38:WU00:FS01:FahCore 17: Download complete
12:00:38:WU00:FS01:Valid core signature
12:00:38:WU00:FS01:Unpacked 8.60MiB to cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe
12:00:38:WU00:FS01:Starting
12:00:38:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/USER/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 6128 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:00:38:WU00:FS01:Started FahCore on PID 1620
12:00:38:WU00:FS01:Core PID:1664
12:00:38:WU00:FS01:FahCore 0x17 started
12:00:39:WU00:FS01:0x17:*********************** Log Started 2014-06-06T12:00:38Z ***********************
12:00:39:WU00:FS01:0x17:Project: 9102 (Run 2, Clone 19, Gen 25)
12:00:39:WU00:FS01:0x17:Unit: 0x0000001e0a3b1e81537c069339e5f10d
12:00:39:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:00:39:WU00:FS01:0x17:Machine: 1
12:00:39:WU00:FS01:0x17:Reading tar file state.xml
12:00:39:WU00:FS01:0x17:Reading tar file system.xml
12:00:40:WU00:FS01:0x17:Reading tar file integrator.xml
12:00:40:WU00:FS01:0x17:Reading tar file core.xml
12:00:40:WU00:FS01:0x17:Digital signatures verified
12:00:40:WU00:FS01:0x17:Folding@home GPU core17
12:00:40:WU00:FS01:0x17:Version 0.0.52
12:02:08:WU00:FS01:0x17:Completed 0 out of 2500000 steps (0%)
12:02:08:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:05:00:WU00:FS01:0x17:Completed 25000 out of 2500000 steps (1%)
12:07:38:WU00:FS01:0x17:Completed 50000 out of 2500000 steps (2%)
12:10:22:WU00:FS01:0x17:Completed 75000 out of 2500000 steps (3%)
12:12:54:WU00:FS01:0x17:Completed 100000 out of 2500000 steps (4%)
12:15:42:WU00:FS01:0x17:Completed 125000 out of 2500000 steps (5%)
12:18:16:WU00:FS01:0x17:Completed 150000 out of 2500000 steps (6%)
12:22:01:WU00:FS01:0x17:Completed 175000 out of 2500000 steps (7%)
12:25:14:WU00:FS01:0x17:Completed 200000 out of 2500000 steps (8%)
12:28:58:WU00:FS01:0x17:Completed 225000 out of 2500000 steps (9%)
12:32:11:WU00:FS01:0x17:Completed 250000 out of 2500000 steps (10%)
(...)
No lost lifeline, not a single error up until 10% - and since the initial error(s) happened way before reaching 5%, I assume the current "incarnation" of Core 17, version 0.0.52, is correct. Should I try and return to my previous setup? I don't want a WU mess being sent back to Stanford because of all these trial-and-error actions..

Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 1:32 pm
by 7im
This is at least the second time I have seen moving the data location from a secondary drive back to the primary drive fix a weird problem. But I have also seen dual locations work as PX has shown again.

I also don't know why Eagle's X: drive location worked for so long and then stopped working. Maybe drives A through F work better, and G - Z have an issue?

Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 5:10 pm
by davidcoton
I suggest that it was the core update that didn't like the data directory location. That could explain the timing in this case. But I don't know why, and I can't explain why it seems to work in other cases -- presumably there would be more reports if core updates always failed with a non-standard data directory.

David

Re: Core 17 has suddenly started crashing

Posted: Fri Jun 06, 2014 9:10 pm
by PantherX
Eagle wrote:...So, although your basic setup equaled mine, you didn't experience the error, but I did. Very strange...
You might be in luck since I haven't folded on my laptop with V7, only the Folding App. Thus, the cores haven't updated. I can start-up V7 and monitor it to see if the issue is replicated or not.
Eagle wrote:Alright, this is getting strange now! Everything runs just fine:...No lost lifeline, not a single error up until 10% - and since the initial error(s) happened way before reaching 5%, I assume the current "incarnation" of Core 17, version 0.0.52, is correct. Should I try and return to my previous setup? I don't want a WU mess being sent back to Stanford because of all these trial-and-error actions..
That's good news. If you do want to return to your previous set-up, set the slot to Finish and, once all WUs are completed, perform a fresh installation (during uninstallation, remember to select the option to delete the data) and hope for the best. To avoid burning through WUs, once folding starts on the new set-up, select the slot, right-click it and choose Finish. That way, if the WU errors out, no new WU will be downloaded; if it finishes successfully, you can keep the set-up and fold happily.

Re: Core 17 has suddenly started crashing

Posted: Tue Jun 17, 2014 8:58 pm
by Eagle
Sorry for the late reply, but I had personal issues to deal with and I also wanted to get reliable results.
Now, let's get right to it:
7im wrote:This is at least the second time I have seen moving the data location from a secondary drive back to the primary drive fix a weird problem. But I have also seen dual locations work as PX has shown again.

I also don't know why Eagle's X: drive location worked for so long and then stopped working. Maybe drives A through F work better, and G - Z have an issue?
FAH finished and was removed completely (including the folders on the X: drive) afterwards. I then rebooted my machine, installed FAH via the freshly downloaded installer (with the data directory being on the X: drive again) and configured it like this:

Code: Select all

20:30:23:<config>
20:30:23:  <!-- Network -->
20:30:23:  <proxy v=':8080'/>
20:30:23:
20:30:23:  <!-- Slot Control -->
20:30:23:  <power v='full'/>
20:30:23:
20:30:23:  <!-- User Information -->
20:30:23:  <passkey v='********************************'/>
20:30:23:  <team v='34361'/>
20:30:23:  <user v='Eagle'/>
20:30:23:
20:30:23:  <!-- Folding Slots -->
20:30:23:  <slot id='0' type='CPU'/>
20:30:23:  <slot id='1' type='GPU'/>
20:30:23:</config>
As you can see, I didn't enable advanced and/or beta folding. However, right after installing I did notice these two changes:
1. Right-clicking FAH's Systray-icon opened the context menu right away and at the icon's position - previously, it took about 1-3 seconds and it opened at the cursor's position (wherever I moved it to in the meantime).
2. Although no beta (i.e. FAH 0.0.55 right now, IIRC) was requested and it did download 0.0.52, I received (and still receive) the temperature line:

Code: Select all

WU02:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
davidcoton wrote:I suggest that it was the core update that didn't like the data directory location. That could explain the timing in this case. But I don't know why, and I can't explain why it seems to work in other cases -- presumably there would be more reports if core updates always failed with a non-standard data directory.

David
Since FAH has been folding for a while now, I don't understand why it folds with 0.0.52 again (using the same program and data locations as before the "single drive test") without the previous error?!
PantherX wrote:You might be in luck since I haven't folded on my laptop with V7, only the Folding App. Thus, the cores haven't updated. I can start-up V7 and monitor it to see if the issue is replicated or not.
It would be great if you can do so. :)
PantherX wrote:That's good news. If you do want to return to your previous set-up, set the slot to Finish and, once all WUs are completed, perform a fresh installation (during uninstallation, remember to select the option to delete the data) and hope for the best. To avoid burning through WUs, once folding starts on the new set-up, select the slot, right-click it and choose Finish. That way, if the WU errors out, no new WU will be downloaded; if it finishes successfully, you can keep the set-up and fold happily.
Thanks for all those handy tips. They're all appreciated and being followed. :) Results were/are as described above.

Although I'd like to investigate this to the point of "total clarification", I do want to take a break here and thank you guys for helping me out and guiding me to a point where my FAH setup returned to its desired state. Thanks a lot! :)
If I can be of any further help, please let me know!

Re: Core 17 has suddenly started crashing

Posted: Wed Jun 18, 2014 12:38 pm
by PantherX
Eagle wrote:...1. Right-clicking FAH's Systray-icon opened the context menu right away and at the icon's position - previously, it took about 1-3 seconds and it opened at the cursor's position (wherever I moved it to in the meantime)...
Hmm, that was fixed in V7.3.7 (https://fah.stanford.edu/projects/FAHClient/ticket/994), which means that once you installed V7.4.4, you should already have had the fix.
Eagle wrote:...2. Although no beta (i.e. FAH 0.0.55 right now, IIRC) was requested and it did download 0.0.52, I received (and still receive) the temperature line:

Code: Select all

WU02:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
...
That is expected, since it is a new feature introduced in the Windows-only version 0.0.52.
Eagle wrote:...It would be great if you can do so. :)...
Sure, might have to schedule it for the weekend (at the earliest) so I can monitor it. Will post the results here once I have completed the test.

Re: Core 17 has suddenly started crashing

Posted: Wed Jun 18, 2014 1:22 pm
by Eagle
That's strange, because I do swear that I've experienced it with 7.4.4.

Regarding the message: but I've never seen it before the advanced/beta try, although FAH was already at version 0.0.52 back then?!

Alright, I'll wait then. But please note that the error returned today, shortly after my post yesterday:

Code: Select all

02:59:26:WU01:FS01:0x15:Folding@home Core Shutdown: FINISHED_UNIT
02:59:26:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:59:26:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:7624 run:423 clone:0 gen:299 core:0x15 unit:0x00000199664f2dd14fe612f7ab56012e
02:59:26:WU01:FS01:Uploading 807.35KiB to 171.64.65.105
02:59:26:WU02:FS01:Starting
02:59:26:WU01:FS01:Connecting to 171.64.65.105:8080
02:59:26:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 8264 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
02:59:26:WU02:FS01:Started FahCore on PID 12180
02:59:27:WU02:FS01:Core PID:10900
02:59:27:WU02:FS01:FahCore 0x17 started
02:59:27:WU02:FS01:0x17:*********************** Log Started 2014-06-18T02:59:27Z ***********************
02:59:27:WU02:FS01:0x17:Project: 9406 (Run 29, Clone 0, Gen 61)
02:59:27:WU02:FS01:0x17:Unit: 0x000000600a3b1e5c533dd15c8f1365b9
02:59:27:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
02:59:27:WU02:FS01:0x17:Machine: 1
02:59:27:WU02:FS01:0x17:Reading tar file state.xml
02:59:28:WU02:FS01:0x17:Reading tar file system.xml
02:59:29:WU02:FS01:0x17:Reading tar file integrator.xml
02:59:29:WU02:FS01:0x17:Reading tar file core.xml
02:59:29:WU02:FS01:0x17:Digital signatures verified
02:59:29:WU02:FS01:0x17:Folding@home GPU core17
02:59:29:WU02:FS01:0x17:Version 0.0.52
02:59:32:WU01:FS01:Upload 47.56%
02:59:37:WU01:FS01:Upload complete
02:59:38:WU01:FS01:Server responded WORK_ACK (400)
02:59:38:WU01:FS01:Final credit estimate, 14093.00 points
02:59:38:WU01:FS01:Cleaning up
03:04:10:WU02:FS01:0x17:Completed 0 out of 2000000 steps (0%)
03:04:10:WU02:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:07:56:WU02:FS01:0x17:Completed 20000 out of 2000000 steps (1%)
(...)
All Folding went just fine with only percentages being logged, hence cut.
(...)
08:11:48:WU02:FS01:0x17:Completed 1740000 out of 2000000 steps (87%)

--- REBOOT (caused by Windows updates) ---

*********************** Log Started 2014-06-18T08:16:34Z ***********************
08:16:34:************************* Folding@home Client *************************
08:16:34:      Website: http://folding.stanford.edu/
08:16:34:    Copyright: (c) 2009-2014 Stanford University
08:16:34:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:16:34:         Args: 
08:16:34:       Config: X:/Folding At Home/config.xml
08:16:34:******************************** Build ********************************
08:16:34:      Version: 7.4.4
08:16:34:         Date: Mar 4 2014
08:16:34:         Time: 20:26:54
08:16:34:      SVN Rev: 4130
08:16:34:       Branch: fah/trunk/client
08:16:34:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
08:16:34:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
08:16:34:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
08:16:34:     Platform: win32 XP
08:16:34:         Bits: 32
08:16:34:         Mode: Release
08:16:34:******************************* System ********************************
08:16:34:          CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
08:16:34:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:16:34:         CPUs: 8
08:16:34:       Memory: 31.97GiB
08:16:34:  Free Memory: 20.66GiB
08:16:34:      Threads: WINDOWS_THREADS
08:16:34:   OS Version: 6.1
08:16:34:  Has Battery: false
08:16:34:   On Battery: false
08:16:34:   UTC Offset: 2
08:16:34:          PID: 5196
08:16:34:          CWD: X:/Folding At Home
08:16:34:           OS: Windows 7 Ultimate
08:16:34:      OS Arch: AMD64
08:16:34:         GPUs: 1
08:16:34:        GPU 0: NVIDIA:3 GK110 [GeForce GTX 780]
08:16:34:         CUDA: 3.5
08:16:34:  CUDA Driver: 6000
08:16:34:Win32 Service: false
08:16:34:***********************************************************************
08:16:34:<config>
08:16:34:  <!-- Network -->
08:16:34:  <proxy v=':8080'/>
08:16:34:
08:16:34:  <!-- Slot Control -->
08:16:34:  <power v='full'/>
08:16:34:
08:16:34:  <!-- User Information -->
08:16:34:  <passkey v='********************************'/>
08:16:34:  <team v='34361'/>
08:16:34:  <user v='Eagle'/>
08:16:34:
08:16:34:  <!-- Folding Slots -->
08:16:34:  <slot id='0' type='CPU'/>
08:16:34:  <slot id='1' type='GPU'/>
08:16:34:</config>
08:16:34:Trying to access database...
08:16:35:Successfully acquired database lock
08:16:35:Enabled folding slot 00: READY cpu:7
08:16:35:Enabled folding slot 01: READY gpu:0:GK110 [GeForce GTX 780]
08:16:35:WU00:FS00:Starting
08:16:35:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe" -dir 00 -suffix 01 -version 704 -lifeline 5196 -checkpoint 15 -np 7
08:16:35:WU00:FS00:Started FahCore on PID 6840
08:16:35:WU00:FS00:Core PID:6852
08:16:35:WU00:FS00:FahCore 0xa3 started

--- NOTE: I've cut the CPU slot log below this line ---

08:16:35:WU02:FS01:Starting
08:16:35:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5196 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
08:16:35:WU02:FS01:Started FahCore on PID 6952
08:16:36:WU02:FS01:Core PID:7012
08:16:36:WU02:FS01:FahCore 0x17 started
08:16:37:WU02:FS01:0x17:*********************** Log Started 2014-06-18T08:16:37Z ***********************
08:16:37:WU02:FS01:0x17:Project: 9406 (Run 29, Clone 0, Gen 61)
08:16:37:WU02:FS01:0x17:Unit: 0x000000600a3b1e5c533dd15c8f1365b9
08:16:37:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
08:16:37:WU02:FS01:0x17:Machine: 1
08:16:37:WU02:FS01:0x17:Digital signatures verified
08:16:37:WU02:FS01:0x17:Folding@home GPU core17
08:16:37:WU02:FS01:0x17:Version 0.0.52
08:16:37:WU02:FS01:0x17:  Found a checkpoint file
08:21:29:WU02:FS01:0x17:Completed 1700000 out of 2000000 steps (85%)
08:21:29:WU02:FS01:0x17:Lost lifeline PID 6952, exiting
08:21:29:WU02:FS01:0x17:Lost lifeline PID 6952, exiting
08:21:29:WU02:FS01:0x17:ERROR:103: Lost client lifeline
08:21:29:WU02:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
08:21:30:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
08:21:30:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
08:21:30:WU02:FS01:Starting
08:21:30:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5196 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
08:21:30:WU02:FS01:Started FahCore on PID 8120
08:21:30:WU02:FS01:Core PID:4220
08:21:30:WU02:FS01:FahCore 0x17 started
08:21:30:WU02:FS01:0x17:*********************** Log Started 2014-06-18T08:21:30Z ***********************
08:21:30:WU02:FS01:0x17:Project: 9406 (Run 29, Clone 0, Gen 61)
08:21:30:WU02:FS01:0x17:Unit: 0x000000600a3b1e5c533dd15c8f1365b9
08:21:30:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
08:21:30:WU02:FS01:0x17:Machine: 1
08:21:30:WU02:FS01:0x17:Digital signatures verified
08:21:30:WU02:FS01:0x17:Folding@home GPU core17
08:21:30:WU02:FS01:0x17:Version 0.0.52
08:21:30:WU02:FS01:0x17:  Found a checkpoint file
08:26:50:WU02:FS01:0x17:Completed 1700000 out of 2000000 steps (85%)
08:26:50:WU02:FS01:0x17:Lost lifeline PID 8120, exiting
08:26:50:WU02:FS01:0x17:Lost lifeline PID 8120, exiting
08:26:50:WU02:FS01:0x17:ERROR:103: Lost client lifeline
08:26:50:WU02:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
08:26:50:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
08:26:50:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
08:26:50:WU02:FS01:Starting
08:26:50:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5196 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
08:26:50:WU02:FS01:Started FahCore on PID 7488
08:26:50:WU02:FS01:Core PID:5940
08:26:50:WU02:FS01:FahCore 0x17 started
08:26:51:WU02:FS01:0x17:*********************** Log Started 2014-06-18T08:26:50Z ***********************
08:26:51:WU02:FS01:0x17:Project: 9406 (Run 29, Clone 0, Gen 61)
08:26:51:WU02:FS01:0x17:Unit: 0x000000600a3b1e5c533dd15c8f1365b9
08:26:51:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
08:26:51:WU02:FS01:0x17:Machine: 1
08:26:51:WU02:FS01:0x17:Digital signatures verified
08:26:51:WU02:FS01:0x17:Folding@home GPU core17
08:26:51:WU02:FS01:0x17:Version 0.0.52
08:26:51:WU02:FS01:0x17:  Found a checkpoint file
08:31:38:WU02:FS01:0x17:Completed 1700000 out of 2000000 steps (85%)
08:31:38:WU02:FS01:0x17:Lost lifeline PID 7488, exiting
08:31:38:WU02:FS01:0x17:Lost lifeline PID 7488, exiting
08:31:38:WU02:FS01:0x17:ERROR:103: Lost client lifeline
08:31:38:WU02:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
08:31:38:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
08:31:38:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
In between, the following updates were installed (in chronological order from old to new) and I finally rebooted about 5 hours ago:

Code: Select all

Definition Update for Windows Defender - KB915597 (Definition 1.175.1478.0)
Definition Update for Windows Defender - KB915597 (Definition 1.175.1813.0)
Security update for Microsoft Office 2010 (KB2767915) 32-Bit-Edition
Update for Windows 7 for x64 based Systems (KB2952664)
Security update for Windows 7 for x64 based Systems (KB2957503)
Definition Update for Microsoft Office 2010 (KB982726) 32-Bit-Edition
Cumulative Security Update for Internet Explorer 11 for Windows 7 for x64 Systems (KB2957689)
Update for Microsoft Word 2010 (KB2880529) 32-Bit-Edition
Security Update for Windows 7 for x64 based Systems (KB2965788)
Update for Windows 7 for x64 based Systems (KB2800095)
Security Update for Windows 7 for x64 based Systems (KB2939576)
Security Update for Windows 7 for x64 based Systems (KB2957189)
Security Update for Windows 7 for x64 based Systems (KB2957509)
Windows Malicious Software Removal Tool x64 - June 2014 (KB890830)
Definition Update for Windows Defender - KB915597 (Definition 1.175.2521.0)
I'm starting to believe FAH wants my SSD to be heavily used (which I try to avoid for obvious reasons).. :?
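The restart loop in a log like the one above can also be spotted mechanically. Below is a minimal Python sketch that counts the "FahCore returned: ..." lines per work unit and slot. The regular expression is inferred from the lines pasted in this thread only, the helper name `count_core_returns` is made up, and other client versions may format these lines differently.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread. The closing
# parenthesis is optional so a truncated last line still matches.
RETURN_RE = re.compile(
    r"(WU\d+):(FS\d+):FahCore returned: (\w+) \(\d+ = 0x[0-9a-fA-F]+\)?"
)


def count_core_returns(log_text: str) -> Counter:
    """Count FahCore return statuses, keyed by (work unit, slot, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RETURN_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


sample = """\
02:59:26:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:21:30:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
08:21:30:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
08:26:50:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
"""

for (wu, slot, status), n in sorted(count_core_returns(sample).items()):
    print(wu, slot, status, n)
```

Several CLIENT_DIED entries for the same WU within a few minutes are the pattern seen above: the core resumes from the checkpoint, loses its lifeline and is started again.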

Re: Core 17 has suddenly started crashing

Posted: Wed Jun 18, 2014 7:38 pm
by bruce
Eagle wrote:That's strange, because I do swear that I've experienced it with 7.4.4.

Regarding the message: but I've never seen it before the advanced/beta try, although FAH was already at version 0.0.52 back then?!
Version numbers of FAHClient, of a Linux FahCore_*, and of a Windows FahCore_* are independent of each other. Last I checked, the latest Linux FahCore_17 was 0.0.46, the latest Windows FahCore_17 was 0.0.52 (0.0.55 is being tested) and the FAHClient was 7.4.4. Each one can be updated independently and presumably work with unmodified versions of something else. You're responsible for updating FAHClient. Periodically the FahCores update themselves automatically, depending on the requirements of the WU you're assigned.
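To make the distinction concrete, here is a minimal Python sketch that pulls both numbers out of a client log. The two patterns are inferred from the log excerpts posted in this thread (the client's "Version:" header line and the core's own "Version" line); the helper name `versions_in_log` is hypothetical and other log layouts may need different patterns.

```python
import re

# Client header line, e.g. "08:16:34:      Version: 7.4.4"
CLIENT_RE = re.compile(r"^\d\d:\d\d:\d\d:\s+Version: (\S+)$")
# Core line, e.g. "08:16:37:WU02:FS01:0x17:Version 0.0.52"
CORE_RE = re.compile(r"^\d\d:\d\d:\d\d:WU\d+:FS\d+:(0x[0-9a-fA-F]+):Version (\S+)$")


def versions_in_log(log_text: str):
    """Return (client_version, {core_id: core_version}) found in log_text."""
    client = None
    cores = {}
    for raw in log_text.splitlines():
        line = raw.rstrip()
        match = CLIENT_RE.match(line)
        if match:
            client = match.group(1)
            continue
        match = CORE_RE.match(line)
        if match:
            cores[match.group(1)] = match.group(2)
    return client, cores


sample = """\
08:16:34:      Version: 7.4.4
08:16:37:WU02:FS01:0x17:Folding@home GPU core17
08:16:37:WU02:FS01:0x17:Version 0.0.52
"""
print(versions_in_log(sample))
```

The first value is the FAHClient version that you install yourself; the dictionary holds the versions of whichever FahCores ran in that log.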

Re: Core 17 has suddenly started crashing

Posted: Wed Jun 18, 2014 11:10 pm
by PantherX
Eagle wrote:...I'm starting to believe FAH wants my SSD to be heavily used (which I try to avoid for obvious reasons).. :?
It is understandable if you want to minimize data written on an SSD. However, it seems that unless you write several GBs of data every day, a consumer SSD would last quite a while with normal usage (http://techreport.com/review/26523/the- ... a-petabyte). Pretty sure that F@H alone is nowhere close to writing that much data, so it should be fairly safe.

I started up my GTX 675M and so far it is working fine:

Code: Select all

*********************** Log Started 2014-06-18T21:06:56Z ***********************
21:06:56:************************* Folding@home Client *************************
21:06:56:      Website: http://folding.stanford.edu/
21:06:56:    Copyright: (c) 2009-2014 Stanford University
21:06:56:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:06:56:         Args: 
21:06:56:       Config: D:/FAH/V7/config.xml
21:06:56:******************************** Build ********************************
21:06:56:      Version: 7.4.4
21:06:56:         Date: Mar 4 2014
21:06:56:         Time: 20:26:54
21:06:56:      SVN Rev: 4130
21:06:56:       Branch: fah/trunk/client
21:06:56:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
21:06:56:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
21:06:56:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
21:06:56:     Platform: win32 XP
21:06:56:         Bits: 32
21:06:56:         Mode: Release
21:06:56:******************************* System ********************************
21:06:56:          CPU: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
21:06:56:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
21:06:56:         CPUs: 8
21:06:56:       Memory: 15.89GiB
21:06:56:  Free Memory: 11.29GiB
21:06:56:      Threads: WINDOWS_THREADS
21:06:56:   OS Version: 6.2
21:06:56:  Has Battery: true
21:06:56:   On Battery: false
21:06:56:   UTC Offset: 3
21:06:56:          PID: 10568
21:06:56:          CWD: D:/FAH/V7
21:06:56:           OS: Windows 8 Pro
21:06:56:      OS Arch: AMD64
21:06:56:         GPUs: 1
21:06:56:        GPU 0: NVIDIA:2 GF114 [GeForce GTX 675M]
21:06:56:         CUDA: 2.1
21:06:56:  CUDA Driver: 6000
21:06:56:Win32 Service: false
21:06:56:***********************************************************************
21:06:56:<config>
21:06:56:  <!-- Network -->
21:06:56:  <proxy v=':8080'/>
21:06:56:
21:06:56:  <!-- Remote Command Server -->
21:06:56:  <password v='*********'/>
21:06:56:
21:06:56:  <!-- Slot Control -->
21:06:56:  <power v='full'/>
21:06:56:
21:06:56:  <!-- User Information -->
21:06:56:  <passkey v='********************************'/>
21:06:56:  <team v='69411'/>
21:06:56:  <user v='PantherX'/>
21:06:56:
21:06:56:  <!-- Folding Slots -->
21:06:56:  <slot id='0' type='CPU'>
21:06:56:    <cpus v='7'/>
21:06:56:    <max-packet-size v='small'/>
21:06:56:    <max-slot-errors v='1'/>
21:06:56:    <max-unit-errors v='1'/>
21:06:56:    <next-unit-percentage v='100'/>
21:06:56:    <pause-on-start v='true'/>
21:06:56:  </slot>
21:06:56:  <slot id='1' type='GPU'>
21:06:56:    <max-slot-errors v='1'/>
21:06:56:    <max-unit-errors v='1'/>
21:06:56:    <next-unit-percentage v='100'/>
21:06:56:    <pause-on-start v='true'/>
21:06:56:  </slot>
21:06:56:</config>
21:06:56:Trying to access database...
21:06:56:Successfully acquired database lock
21:06:56:Enabled folding slot 00: PAUSED cpu:7 (by user)
21:06:56:Enabled folding slot 01: PAUSED gpu:0:GF114 [GeForce GTX 675M] (by user)
21:07:06:FS01:Unpaused
21:07:07:WU00:FS01:Connecting to 171.67.108.201:80
21:07:11:WU00:FS01:Assigned to work server 140.163.4.231
21:07:11:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF114 [GeForce GTX 675M] from 140.163.4.231
21:07:11:WU00:FS01:Connecting to 140.163.4.231:8080
21:07:12:WU00:FS01:Downloading 4.84MiB
21:07:17:WU00:FS01:Download complete
21:07:17:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:316 clone:0 gen:16 core:0x17 unit:0x00000028538b3db75328a93e6b24aff3
21:07:18:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
21:07:18:WU00:FS01:Connecting to web.stanford.edu:80
21:07:32:WU00:FS01:FahCore 17: Downloading 2.55MiB
21:07:38:WU00:FS01:FahCore 17: 19.60%
21:07:44:WU00:FS01:FahCore 17: 46.56%
21:07:50:WU00:FS01:FahCore 17: 71.07%
21:07:56:WU00:FS01:FahCore 17: 98.02%
21:07:56:WU00:FS01:FahCore 17: Download complete
21:07:57:WU00:FS01:Valid core signature
21:07:57:WU00:FS01:Unpacked 8.60MiB to cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe
21:07:57:WU00:FS01:Starting
21:07:57:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" D:/FAH/V7/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 10568 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:07:57:WU00:FS01:Started FahCore on PID 4744
21:07:57:WU00:FS01:Core PID:10104
21:07:57:WU00:FS01:FahCore 0x17 started
21:07:58:WU00:FS01:0x17:*********************** Log Started 2014-06-18T21:07:58Z ***********************
21:07:58:WU00:FS01:0x17:Project: 13001 (Run 316, Clone 0, Gen 16)
21:07:58:WU00:FS01:0x17:Unit: 0x00000028538b3db75328a93e6b24aff3
21:07:58:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
21:07:58:WU00:FS01:0x17:Machine: 1
21:07:58:WU00:FS01:0x17:Reading tar file state.xml
21:08:00:WU00:FS01:0x17:Reading tar file system.xml
21:08:01:WU00:FS01:0x17:Reading tar file integrator.xml
21:08:01:WU00:FS01:0x17:Reading tar file core.xml
21:08:01:WU00:FS01:0x17:Digital signatures verified
21:08:01:WU00:FS01:0x17:Folding@home GPU core17
21:08:01:WU00:FS01:0x17:Version 0.0.52
21:11:13:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
21:11:13:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:13:41:FS01:Finishing
21:45:34:WU00:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
22:20:10:WU00:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
22:55:32:WU00:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
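For anyone who wants to check the pace from a log like this one, the time per frame (TPF) can be estimated from the "Completed ... steps" lines. This is a minimal sketch, not anything FAHClient provides: the helper names are hypothetical, the pattern is an assumption based on the lines above, and one frame is taken to be 1% of the WU.

```python
import re
from datetime import datetime

# Assumed shape: "21:45:34:WU00:FS01:0x17:Completed 50000 out of 5000000 steps (1%)"
PROGRESS_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed \d+ out of \d+ steps \((\d+)%\)"
)


def frame_points(log_lines):
    """Collect (timestamp, percent) pairs from progress lines."""
    points = []
    for line in log_lines:
        m = PROGRESS_RE.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2))))
    return points


def mean_tpf_seconds(points):
    """Average seconds per 1% frame between consecutive progress lines."""
    spans = []
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        if dt < 0:
            dt += 24 * 3600  # log timestamps carry no date; assume one midnight rollover
        if p1 > p0:
            spans.append(dt / (p1 - p0))
    if not spans:
        raise ValueError("need at least two progress lines with increasing percent")
    return sum(spans) / len(spans)


log = [
    "21:11:13:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)",
    "21:13:41:FS01:Finishing",
    "21:45:34:WU00:FS01:0x17:Completed 50000 out of 5000000 steps (1%)",
    "22:20:10:WU00:FS01:0x17:Completed 100000 out of 5000000 steps (2%)",
    "22:55:32:WU00:FS01:0x17:Completed 150000 out of 5000000 steps (3%)",
]
tpf = mean_tpf_seconds(frame_points(log))
print("TPF: %.1f minutes" % (tpf / 60))
```

With the four progress lines above this comes out at a little under 35 minutes per frame.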

Re: Core 17 has suddenly started crashing

Posted: Thu Jun 19, 2014 10:29 pm
by Eagle
bruce wrote:Version numbers of FAHClient, of a Linux FahCore_*, and of a Windows FahCore_* are independent of each other. Last I checked, the latest Linux FahCore_17 was 0.0.46, the latest Windows FahCore_17 was 0.0.52 (0.0.55 is being tested) and the FAHClient was 7.4.4. Each one can be updated independently and presumably work with unmodified versions of something else. You're responsible for updating FAHClient. Periodically the FahCores update themselves automatically, depending on the requirements of the WU you're assigned.
You got me wrong on this. I had 7.4.4 installed right when it came out. About every 2-4 weeks, I check for a new FAH client package, because the client still lacks an auto-updater (no offense intended here, I'm patiently waiting for it).
However, AFAIR, the message of FAH 0.0.52 in question was logged _after_ I switched program- and data-directory onto C:, but _never_ before the switch, i.e. on the X: drive.
PantherX wrote:It is understandable if you want to minimize data written on an SSD. However, it seems that unless you write several GBs of data every day, a consumer SSD would last quite a while with normal usage (http://techreport.com/review/26523/the- ... a-petabyte). Pretty sure that F@H alone is nowhere close to writing that much data, so it should be fairly safe.
I know the reviews on wear-leveling. I also know that NAND flash from the Intel-Micron joint venture (I own a Crucial C300 with 256 GB) is considered (one of the) best in class, with about 72 TB of data that can be written per cell before EOL, and that using a RAM disk for all temporary data, complemented by software adjustments like disabling the hard drive cache in Firefox, will greatly extend the expected lifetime of my SSD. But I'm a software architect/engineer, so I'm aiming for improvement: the CPU slot has no such issues, only the GPU one. That's why I do believe it's a bug, and bugs can be fixed. ;)
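To put rough numbers on the endurance question, here is a back-of-the-envelope sketch. It reads the 72 TB figure above as the total write endurance of the whole drive, which is an assumption about how to interpret that number. The daily write volumes are hypothetical examples, not measurements of what FAH writes, and write amplification is ignored.

```python
def years_until_worn(endurance_tb, daily_writes_gb):
    """Years until the rated endurance is used up (decimal units, 1 TB = 1000 GB)."""
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    days = endurance_tb * 1000 / daily_writes_gb
    return days / 365.25


# Hypothetical daily write volumes in GB, for illustration only.
for daily_gb in (5, 20, 50):
    years = years_until_worn(72, daily_gb)
    print("%2d GB/day -> %.1f years" % (daily_gb, years))
```

Even at the highest of these made-up rates the drive would take several years to reach the assumed limit, which is the point PantherX was making.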

Regarding my FAH setup, I'm not happy about the SSD usage. Weeks have passed since the error's origin was found, but no fix has been provided so far. However, I do respect all the people working on FAH and the fact that their time is limited, which might be why they are working on other issues right now. So, I'm only waiting for GROMACS to finish and will then switch to the one-drive workaround until a fix is available.
PantherX wrote:I started up my GTX 675M and so far it is working fine:

Excuse my nit-picking question, but did it fold until 100%? I'm just asking because mine went fine for about a week, and even on the first failing WU it worked until 87%, which, after a reboot, was "reset" to 85% (latest checkpoint), and then the client immediately died. :(

Re: Core 17 has suddenly started crashing

Posted: Thu Jun 19, 2014 11:02 pm
by 7im
The FAHClient and FAHCores would seem to be in a feature freeze at the moment. This bug also has a low occurrence rate (a few) and has a valid workaround. It will get very low priority, unfortunately.

Over time, FAH may be switching to a new type of FAHCore, which may render this (and many other pending issues) a non-issue. Realistically, I would not expect this issue to be addressed in the current FAHClient, unless the bug was embarrassingly obvious and easy to fix. Sorry.

Re: Core 17 has suddenly started crashing

Posted: Thu Jun 19, 2014 11:24 pm
by Eagle
I'm an optimist - I'm waiting for 7.4.5 / 0.0.56 then and if that doesn't work, well, Maxwell is coming..
Thanks anyway, 7im! ;)

Re: Core 17 has suddenly started crashing

Posted: Thu Jun 19, 2014 11:28 pm
by PantherX
Eagle wrote:...Excuse my nit-picking question, but did it fold until 100%? I'm just asking because mine went fine for about a week, and even on the first failing WU it worked until 87%, which, after a reboot, was "reset" to 85% (latest checkpoint), and then the client immediately died. :(
It is still folding non-stop on the same WU of Project 13001 with a TPF of ~35 minutes. It is currently at 45% and folding (needs ~1.33 days to finish). Do you want me to reboot the system? If so, do you remember how you rebooted the system:
1) Exited FAHClient, waited for some time, then rebooted the system
2) Simply rebooted the system without exiting FAHClient

Also, how do you start up FAHClient:
1) Automatically at Windows log-in
2) Manually via a shortcut

With the above information, I have a better chance of reproducing your error and, hopefully, of finding further similarities, if not the root cause.