
Issue with Project: 8049 (R 777, C 4, G 107) and FAH Core A4

Posted: Fri Nov 09, 2012 2:08 pm
by Fahrenheit451
Since yesterday, FAH Core A4 has been crashing (Windows APPCRASH) while processing Project: 8049 (Run 777, Clone 4, Gen 107). After I confirm the Windows message, the core restarts and continues folding.
Today, after the first crash, I stopped folding using FAHControl and deleted the folder "C:\Users\myUsername\AppData\Roaming\FAHClient\cores\www.stanford.edu\~pande\Win32\x86\Core_a4.fah". After a restart, FAH downloaded FAH Core A4 again and continued folding, but Core A4 has now crashed again. Is it the core itself, or does the WU cause the crash? Should I finish the WU or dump it?

Here is the latest logfile:

*********************** Log Started 2012-11-09T12:26:49Z ***********************
12:26:49:WU00:FS01:Downloading core from http://www.stanford.edu/~pande/Win32/x86/Core_a4.fah
12:26:49:WU00:FS01:Connecting to www.stanford.edu:80
12:26:51:WU00:FS01:FahCore a4: Downloading 2.89MiB
12:26:57:WU00:FS01:FahCore a4: 34.63%
12:27:03:WU00:FS01:FahCore a4: 69.25%
12:27:08:WU00:FS01:FahCore a4: Download complete
12:27:08:WU00:FS01:Valid core signature
12:27:08:WU00:FS01:Unpacked 9.59MiB to cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe
12:27:08:WU00:FS01:Starting
12:27:08:WU00:FS01:Running FahCore: D:\Programme\FAHClient/FAHCoreWrapper.exe C:/Users/myUsername/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 701 -lifeline 5976 -checkpoint 15 -np 2
12:27:08:WU00:FS01:Started FahCore on PID 3828
12:27:09:WU00:FS01:Core PID:3712
12:27:09:WU00:FS01:FahCore 0xa4 started
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:*------------------------------*
12:27:11:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
12:27:11:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Preparing to commence simulation
12:27:11:WU00:FS01:0xa4:- Looking at optimizations...
12:27:11:WU00:FS01:0xa4:- Files status OK
12:27:11:WU00:FS01:0xa4:- Expanded 967903 -> 2212988 (decompressed 228.6 percent)
12:27:11:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=967903 data_size=2212988, decompressed_data_size=2212988 diff=0
12:27:11:WU00:FS01:0xa4:- Digital signature verified
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Project: 8049 (Run 777, Clone 4, Gen 107)
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Assembly optimizations on if available.
12:27:11:WU00:FS01:0xa4:Entering M.D.
12:27:17:WU00:FS01:0xa4:Using Gromacs checkpoints
12:27:17:WU00:FS01:0xa4:Mapping NT from 2 to 2 
12:27:17:WU00:FS01:0xa4:Resuming from checkpoint
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.log
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.trr
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.edr
12:27:17:WU00:FS01:0xa4:Completed 115310 out of 250000 steps  (46%)
12:32:06:WU00:FS01:0xa4:Completed 117500 out of 250000 steps  (47%)
12:38:09:WU00:FS01:0xa4:Completed 120000 out of 250000 steps  (48%)
12:43:42:WU00:FS01:0xa4:Completed 122500 out of 250000 steps  (49%)
12:49:23:WU00:FS01:0xa4:Completed 125000 out of 250000 steps  (50%)
12:55:32:WU00:FS01:0xa4:Completed 127500 out of 250000 steps  (51%)
13:01:34:WU00:FS01:0xa4:Completed 130000 out of 250000 steps  (52%)
13:04:12:WU00:FS01:0xa4:Gromacs cannot continue further.
13:04:12:WU00:FS01:0xa4:Going to send back what have done -- stepsTotalG=250000
13:04:12:WU00:FS01:0xa4:Work fraction=0.5249 steps=250000.
13:36:52:WU00:FS01:FahCore returned: FAILED_3 (255 = 0xff)
13:36:52:WU00:FS01:Starting
13:36:52:WU00:FS01:Running FahCore: D:\Programme\FAHClient/FAHCoreWrapper.exe C:/Users/myUsername/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 701 -lifeline 5976 -checkpoint 15 -np 2
13:36:52:WU00:FS01:Started FahCore on PID 4600
13:36:52:WU00:FS01:Core PID:4728
13:36:52:WU00:FS01:FahCore 0xa4 started
13:36:52:WU00:FS01:0xa4:
13:36:52:WU00:FS01:0xa4:*------------------------------*
13:36:52:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
13:36:52:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
13:36:52:WU00:FS01:0xa4:
13:36:52:WU00:FS01:0xa4:Preparing to commence simulation
13:36:52:WU00:FS01:0xa4:- Ensuring status. Please wait.
13:37:01:WU00:FS01:0xa4:- Looking at optimizations...
13:37:01:WU00:FS01:0xa4:- Working with standard loops on this execution.
13:37:01:WU00:FS01:0xa4:- Previous termination of core was improper.
13:37:01:WU00:FS01:0xa4:- Files status OK
13:37:01:WU00:FS01:0xa4:- Expanded 967903 -> 2212988 (decompressed 228.6 percent)
13:37:01:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=967903 data_size=2212988, decompressed_data_size=2212988 diff=0
13:37:01:WU00:FS01:0xa4:- Digital signature verified
13:37:01:WU00:FS01:0xa4:
13:37:01:WU00:FS01:0xa4:Project: 8049 (Run 777, Clone 4, Gen 107)
13:37:01:WU00:FS01:0xa4:
13:37:01:WU00:FS01:0xa4:Entering M.D.
13:37:07:WU00:FS01:0xa4:Using Gromacs checkpoints
13:37:07:WU00:FS01:0xa4:Mapping NT from 2 to 2 
13:37:08:WU00:FS01:0xa4:Resuming from checkpoint
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.log
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.trr
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.edr
13:37:08:WU00:FS01:0xa4:Completed 128240 out of 250000 steps  (51%)
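
For anyone who wants to check a log like this for checkpoint rollbacks, here is a rough Python sketch. It assumes the progress lines keep the "Completed N out of M steps" format shown above; the function names parse_progress and rollbacks are made up for illustration and are not part of any FAH tool. It flags places where the step counter goes backwards, which in the log above happens when the core resumes at 128240 after having reached 130000.

```python
import re

# Matches core progress lines such as
# "12:32:06:WU00:FS01:0xa4:Completed 117500 out of 250000 steps  (47%)"
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-f]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def parse_progress(lines):
    """Return (seconds_since_midnight, steps_done, steps_total) tuples."""
    out = []
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            h, mi, s, done, total = map(int, m.groups())
            out.append((h * 3600 + mi * 60 + s, done, total))
    return out


def rollbacks(progress):
    """Return (previous_steps, resumed_steps) wherever the step count drops."""
    return [(a[1], b[1]) for a, b in zip(progress, progress[1:]) if b[1] < a[1]]


# Three lines copied from the log above.
sample = [
    "13:01:34:WU00:FS01:0xa4:Completed 130000 out of 250000 steps  (52%)",
    "13:04:12:WU00:FS01:0xa4:Gromacs cannot continue further.",
    "13:37:08:WU00:FS01:0xa4:Completed 128240 out of 250000 steps  (51%)",
]
print(rollbacks(parse_progress(sample)))
```

A rollback only shows that the core restarted from an earlier checkpoint. It does not say why the core stopped.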

And here is the system info part:

*********************** Log Started 2012-11-09T12:26:49Z ***********************
12:26:49:************************* Folding@home Client *************************
12:26:49:      Website: http://folding.stanford.edu/
12:26:49:    Copyright: (c) 2009-2012 Stanford University
12:26:49:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:26:49:         Args: --lifeline 3852 --command-port=36330
12:26:49:       Config: C:/Users/myUsername/AppData/Roaming/FAHClient/config.xml
12:26:49:******************************** Build ********************************
12:26:49:      Version: 7.1.52
12:26:49:         Date: Mar 20 2012
12:26:49:         Time: 19:37:42
12:26:49:      SVN Rev: 3515
12:26:49:       Branch: fah/trunk/client
12:26:49:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
12:26:49:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
12:26:49:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
12:26:49:     Platform: win32 XP
12:26:49:         Bits: 32
12:26:49:         Mode: Release
12:26:49:******************************* System ********************************
12:26:49:          CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
12:26:49:       CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
12:26:49:         CPUs: 2
12:26:49:       Memory: 2.00GiB
12:26:49:  Free Memory: 1.20GiB
12:26:49:      Threads: WINDOWS_THREADS
12:26:49:   On Battery: false
12:26:49:   UTC offset: 1
12:26:49:          PID: 5976
12:26:49:          CWD: C:/Users/myUsername/AppData/Roaming/FAHClient
12:26:49:           OS: Windows Vista (TM) Ultimate Service Pack 2
12:26:49:      OS Arch: X86
12:26:49:         GPUs: 1
12:26:49:        GPU 0: NVIDIA:1 G92 [GeForce 8800 GTS 512]
12:26:49:         CUDA: 1.1
12:26:49:  CUDA Driver: 5000
12:26:49:Win32 Service: false
12:26:49:***********************************************************************
12:26:49:<config>
12:26:49:  <!-- Folding Slot Configuration -->
12:26:49:  <gpu v='true'/>
12:26:49:
12:26:49:  <!-- User Information -->
12:26:49:  <passkey v='********************************'/>
12:26:49:  <user v='superduper4711'/>
12:26:49:
12:26:49:  <!-- Folding Slots -->
12:26:49:</config>
12:26:49:Trying to access database...
12:26:49:Successfully acquired database lock
12:26:49:Enabled folding slot 00: READY gpu:0:"G92 [GeForce 8800 GTS 512]"
12:26:49:Enabled folding slot 01: READY smp:2

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Posted: Fri Nov 09, 2012 2:30 pm
by bollix47
The WU has been completed by another folder:

Hi xxxxxxx (team xxxxx),
Your WU (P8049 R777 C4 G107) was added to the stats database on 2012-11-08 17:08:08 for 1338.9 points of credit.

In this case it should be okay to dump work/00.

You might want to reboot your computer as well before restarting the client.

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Posted: Fri Nov 09, 2012 8:06 pm
by bruce
Fahrenheit451 wrote: Is it the core itself or does the WU cause the crash?
Apparently neither.

Part of the original design of older clients included a feature that attempted to answer your question. After certain failures like that, the client assumed the WU had been corrupted during download, so it re-downloaded the WU. After several more failures, it assumed the FahCore had been corrupted during download, so it re-downloaded the FahCore and restarted the WU. Occasionally, one or the other worked and the WU was completed. More commonly, the fault was neither, so the WU was then assigned to someone else.
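
To make that escalation order concrete, here is a small Python sketch. It is only an illustration of the idea described above, not the actual client code: the thresholds and the names Action and next_action are hypothetical, since the exact failure counts the older clients used are not given here.

```python
from enum import Enum


class Action(Enum):
    RETRY = "restart the core from the last checkpoint"
    REDOWNLOAD_WU = "assume a corrupted WU download and fetch the WU again"
    REDOWNLOAD_CORE = "assume a corrupted FahCore download and fetch the core again"
    DUMP = "give up so the WU can be assigned to someone else"


# Hypothetical thresholds, chosen only to show the ordering of the steps.
WU_THRESHOLD = 2
CORE_THRESHOLD = 5
DUMP_THRESHOLD = 8


def next_action(failure_count):
    """Pick the recovery step for the given number of consecutive failures."""
    if failure_count >= DUMP_THRESHOLD:
        return Action.DUMP
    if failure_count == CORE_THRESHOLD:
        return Action.REDOWNLOAD_CORE
    if failure_count == WU_THRESHOLD:
        return Action.REDOWNLOAD_WU
    return Action.RETRY


for n in range(1, 9):
    print(n, next_action(n).name)
```

The design point is that each step rules out one cheap explanation (bad WU download, bad core download) before the WU is handed to another machine.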

Error 255 = 0xff is a generic error code that probably points to a hardware fault (memory??), which is pretty much confirmed by the fact that the WU was reissued and completed by somebody else.
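
If you want to pull these return codes out of a log yourself, a minimal Python sketch follows. It assumes the "FahCore returned: NAME (decimal = hex)" format seen in the log above; parse_core_exit is a made-up helper name, not part of the client.

```python
import re

# Matches e.g. "FahCore returned: FAILED_3 (255 = 0xff)"
RETURNED = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def parse_core_exit(line):
    """Return (status_name, exit_code), or None if the line does not match."""
    m = RETURNED.search(line)
    if m is None:
        return None
    name = m.group(1)
    decimal = int(m.group(2))
    hexadecimal = int(m.group(3), 16)  # int() accepts the "0x" prefix with base 16
    if decimal != hexadecimal:
        raise ValueError("decimal and hex exit codes disagree: " + line)
    return name, decimal


# Line copied from the log in the first post.
line = "13:36:52:WU00:FS01:FahCore returned: FAILED_3 (255 = 0xff)"
print(parse_core_exit(line))
```

The code only extracts the status and confirms that 255 and 0xff are the same number. It cannot tell you what caused the failure.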

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Posted: Sat Nov 10, 2012 12:34 am
by Fahrenheit451
Ok. FAH dumped the WU and downloaded another one (Project: 8055 (Run 1326, Clone 3, Gen 30)). Let's see if the core will continue to crash.