Issue with Project: 8049 (R 777, C 4, G 107) and FAH Core A4


Fahrenheit451
Posts: 90
Joined: Sun Sep 19, 2010 9:25 am
Hardware configuration: System 1:
CPU: Intel Core 2 Duo E6850, 3000 MHz (9 x 333)
Mainboard: Gigabyte P35-DS3P v1.1
RAM: 2 GB (2x1024 MB DDR2 Dual-Channel, DRAM frequency 400 MHz)
OS: Windows Vista Ultimate, 32 bit, Service Pack 2
HD: Samsung HD501LJ (500 GB)
PSU: BeQuiet Straight Power E8 CM 580W
Case: CoolerMaster Cosmos 1000
GPU: NVIDIA GeForce 8800 GTS 512, XFX Pine Group
GFX driver version: 285.62
The card runs at stock speed: GPU 648 MHz, Memory 972 MHz, Shader 1620 MHz

System 2:
CPU: Intel i7 2600 (Sandy Bridge) @ 3400 MHz (4C/8T)
Mainboard: Asus P8H67-M-Pro
RAM: 8GB (2x4 GB DDR3, 1333 MHz, Kingston)
OS: Windows 7 Ultimate, 64bit
HD: Samsung HD103SJ (1TB)
PSU: BeQuiet Straight Power E8 CM 580 W
Case: Corsair Obsidian 800D with H100 water cooling
GPU: i7 2600 IGP (not for folding) / Gigabyte GTX 560 Ti OC 1GB
Location: Bonn, Germany

Issue with Project: 8049 (R 777, C 4, G 107) and FAH Core A4

Post by Fahrenheit451 »

Since yesterday, FAH Core A4 has been crashing (Windows APPCRASH) while processing Project: 8049 (Run 777, Clone 4, Gen 107). After I confirm the Windows error message, the core restarts and continues folding.
Today, after the first crash, I stopped folding using FAHControl and deleted the folder "C:\Users\myUsername\AppData\Roaming\FAHClient\cores\www.stanford.edu\~pande\Win32\x86\Core_a4.fah". After a restart, FAH downloaded FAH Core A4 again and continued folding, but Core A4 has now crashed again. Is the core itself at fault, or is the WU causing the crash? Should I finish the WU or dump it?

Here is the latest logfile:


*********************** Log Started 2012-11-09T12:26:49Z ***********************
12:26:49:WU00:FS01:Downloading core from http://www.stanford.edu/~pande/Win32/x86/Core_a4.fah
12:26:49:WU00:FS01:Connecting to www.stanford.edu:80
12:26:51:WU00:FS01:FahCore a4: Downloading 2.89MiB
12:26:57:WU00:FS01:FahCore a4: 34.63%
12:27:03:WU00:FS01:FahCore a4: 69.25%
12:27:08:WU00:FS01:FahCore a4: Download complete
12:27:08:WU00:FS01:Valid core signature
12:27:08:WU00:FS01:Unpacked 9.59MiB to cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe
12:27:08:WU00:FS01:Starting
12:27:08:WU00:FS01:Running FahCore: D:\Programme\FAHClient/FAHCoreWrapper.exe C:/Users/myUsername/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 701 -lifeline 5976 -checkpoint 15 -np 2
12:27:08:WU00:FS01:Started FahCore on PID 3828
12:27:09:WU00:FS01:Core PID:3712
12:27:09:WU00:FS01:FahCore 0xa4 started
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:*------------------------------*
12:27:11:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
12:27:11:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Preparing to commence simulation
12:27:11:WU00:FS01:0xa4:- Looking at optimizations...
12:27:11:WU00:FS01:0xa4:- Files status OK
12:27:11:WU00:FS01:0xa4:- Expanded 967903 -> 2212988 (decompressed 228.6 percent)
12:27:11:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=967903 data_size=2212988, decompressed_data_size=2212988 diff=0
12:27:11:WU00:FS01:0xa4:- Digital signature verified
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Project: 8049 (Run 777, Clone 4, Gen 107)
12:27:11:WU00:FS01:0xa4:
12:27:11:WU00:FS01:0xa4:Assembly optimizations on if available.
12:27:11:WU00:FS01:0xa4:Entering M.D.
12:27:17:WU00:FS01:0xa4:Using Gromacs checkpoints
12:27:17:WU00:FS01:0xa4:Mapping NT from 2 to 2 
12:27:17:WU00:FS01:0xa4:Resuming from checkpoint
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.log
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.trr
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
12:27:17:WU00:FS01:0xa4:Verified 00/wudata_01.edr
12:27:17:WU00:FS01:0xa4:Completed 115310 out of 250000 steps  (46%)
12:32:06:WU00:FS01:0xa4:Completed 117500 out of 250000 steps  (47%)
12:38:09:WU00:FS01:0xa4:Completed 120000 out of 250000 steps  (48%)
12:43:42:WU00:FS01:0xa4:Completed 122500 out of 250000 steps  (49%)
12:49:23:WU00:FS01:0xa4:Completed 125000 out of 250000 steps  (50%)
12:55:32:WU00:FS01:0xa4:Completed 127500 out of 250000 steps  (51%)
13:01:34:WU00:FS01:0xa4:Completed 130000 out of 250000 steps  (52%)
13:04:12:WU00:FS01:0xa4:Gromacs cannot continue further.
13:04:12:WU00:FS01:0xa4:Going to send back what have done -- stepsTotalG=250000
13:04:12:WU00:FS01:0xa4:Work fraction=0.5249 steps=250000.
13:36:52:WU00:FS01:FahCore returned: FAILED_3 (255 = 0xff)
13:36:52:WU00:FS01:Starting
13:36:52:WU00:FS01:Running FahCore: D:\Programme\FAHClient/FAHCoreWrapper.exe C:/Users/myUsername/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 701 -lifeline 5976 -checkpoint 15 -np 2
13:36:52:WU00:FS01:Started FahCore on PID 4600
13:36:52:WU00:FS01:Core PID:4728
13:36:52:WU00:FS01:FahCore 0xa4 started
13:36:52:WU00:FS01:0xa4:
13:36:52:WU00:FS01:0xa4:*------------------------------*
13:36:52:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
13:36:52:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
13:36:52:WU00:FS01:0xa4:
13:36:52:WU00:FS01:0xa4:Preparing to commence simulation
13:36:52:WU00:FS01:0xa4:- Ensuring status. Please wait.
13:37:01:WU00:FS01:0xa4:- Looking at optimizations...
13:37:01:WU00:FS01:0xa4:- Working with standard loops on this execution.
13:37:01:WU00:FS01:0xa4:- Previous termination of core was improper.
13:37:01:WU00:FS01:0xa4:- Files status OK
13:37:01:WU00:FS01:0xa4:- Expanded 967903 -> 2212988 (decompressed 228.6 percent)
13:37:01:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=967903 data_size=2212988, decompressed_data_size=2212988 diff=0
13:37:01:WU00:FS01:0xa4:- Digital signature verified
13:37:01:WU00:FS01:0xa4:
13:37:01:WU00:FS01:0xa4:Project: 8049 (Run 777, Clone 4, Gen 107)
13:37:01:WU00:FS01:0xa4:
13:37:01:WU00:FS01:0xa4:Entering M.D.
13:37:07:WU00:FS01:0xa4:Using Gromacs checkpoints
13:37:07:WU00:FS01:0xa4:Mapping NT from 2 to 2 
13:37:08:WU00:FS01:0xa4:Resuming from checkpoint
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.log
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.trr
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
13:37:08:WU00:FS01:0xa4:Verified 00/wudata_01.edr
13:37:08:WU00:FS01:0xa4:Completed 128240 out of 250000 steps  (51%)

And here is the system info section of the log:


*********************** Log Started 2012-11-09T12:26:49Z ***********************
12:26:49:************************* Folding@home Client *************************
12:26:49:      Website: http://folding.stanford.edu/
12:26:49:    Copyright: (c) 2009-2012 Stanford University
12:26:49:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:26:49:         Args: --lifeline 3852 --command-port=36330
12:26:49:       Config: C:/Users/myUsername/AppData/Roaming/FAHClient/config.xml
12:26:49:******************************** Build ********************************
12:26:49:      Version: 7.1.52
12:26:49:         Date: Mar 20 2012
12:26:49:         Time: 19:37:42
12:26:49:      SVN Rev: 3515
12:26:49:       Branch: fah/trunk/client
12:26:49:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
12:26:49:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
12:26:49:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
12:26:49:     Platform: win32 XP
12:26:49:         Bits: 32
12:26:49:         Mode: Release
12:26:49:******************************* System ********************************
12:26:49:          CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
12:26:49:       CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
12:26:49:         CPUs: 2
12:26:49:       Memory: 2.00GiB
12:26:49:  Free Memory: 1.20GiB
12:26:49:      Threads: WINDOWS_THREADS
12:26:49:   On Battery: false
12:26:49:   UTC offset: 1
12:26:49:          PID: 5976
12:26:49:          CWD: C:/Users/myUsername/AppData/Roaming/FAHClient
12:26:49:           OS: Windows Vista (TM) Ultimate Service Pack 2
12:26:49:      OS Arch: X86
12:26:49:         GPUs: 1
12:26:49:        GPU 0: NVIDIA:1 G92 [GeForce 8800 GTS 512]
12:26:49:         CUDA: 1.1
12:26:49:  CUDA Driver: 5000
12:26:49:Win32 Service: false
12:26:49:***********************************************************************
12:26:49:<config>
12:26:49:  <!-- Folding Slot Configuration -->
12:26:49:  <gpu v='true'/>
12:26:49:
12:26:49:  <!-- User Information -->
12:26:49:  <passkey v='********************************'/>
12:26:49:  <user v='superduper4711'/>
12:26:49:
12:26:49:  <!-- Folding Slots -->
12:26:49:</config>
12:26:49:Trying to access database...
12:26:49:Successfully acquired database lock
12:26:49:Enabled folding slot 00: READY gpu:0:"G92 [GeForce 8800 GTS 512]"
12:26:49:Enabled folding slot 01: READY smp:2
Last edited by Fahrenheit451 on Sat Nov 10, 2012 12:35 am, edited 1 time in total.
bollix47
Posts: 2957
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Post by bollix47 »

The WU has been completed by another folder:

Hi xxxxxxx (team xxxxx),
Your WU (P8049 R777 C4 G107) was added to the stats database on 2012-11-08 17:08:08 for 1338.9 points of credit.

In this case it should be okay to dump work/00.

You might want to reboot your computer as well before restarting the client.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Post by bruce »

Fahrenheit451 wrote: Is it the core itself or does the WU cause the crash?
Apparently neither.

Part of the original design of older clients included a feature that attempted to answer your question. After certain failures like that, the client assumed the WU had been corrupted during download, so it re-downloaded the WU. After several more failures, it assumed the FahCore had been corrupted during download, so it re-downloaded the FahCore and restarted the WU. Occasionally, one or the other worked and the WU was completed. More commonly, the fault was neither, so the WU was then assigned to someone else.
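To make that escalation easier to picture, here is a rough Python sketch of the order of steps described above. It is not the actual client code: the failure thresholds, the `Action` names, and the `next_action` function are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    RETRY = "restart the core from the last checkpoint"
    REDOWNLOAD_WU = "assume the WU is corrupt and re-download it"
    REDOWNLOAD_CORE = "assume the FahCore is corrupt, re-download it, restart the WU"
    GIVE_UP = "abandon the WU so the server can assign it to someone else"


# Hypothetical thresholds; the real clients' limits are not stated in this thread.
WU_THRESHOLD = 2
CORE_THRESHOLD = 5
GIVE_UP_THRESHOLD = 8


def next_action(failures: int) -> Action:
    """Pick a recovery step from the number of consecutive failures so far."""
    if failures >= GIVE_UP_THRESHOLD:
        return Action.GIVE_UP
    if failures == CORE_THRESHOLD:
        return Action.REDOWNLOAD_CORE
    if failures == WU_THRESHOLD:
        return Action.REDOWNLOAD_WU
    return Action.RETRY


# Show which step would be taken after each consecutive failure.
for n in range(1, 9):
    print(n, next_action(n).name)
```

The point is only the ordering: suspect the WU first, then the FahCore, and finally hand the WU back for reassignment.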

Error 255 = 0xff is a generic error code that probably points to a hardware fault (memory??), which is pretty much confirmed by the fact that the WU was reissued and completed by somebody else.
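If you want to check how often this happens, a small script can pull the "FahCore returned" lines out of a saved log. This is only a sketch: the regular expression is based on the single line format shown in the log above, and the function name `core_returns` is made up for this example.

```python
import re

# Matches lines such as: "...FahCore returned: FAILED_3 (255 = 0xff)"
# The pattern is an assumption based on the one example in this thread.
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def core_returns(log_lines):
    """Return (status name, decimal code, hex code as int) for each match."""
    results = []
    for line in log_lines:
        match = RETURN_RE.search(line)
        if match:
            name = match.group(1)
            decimal_code = int(match.group(2))
            hex_code = int(match.group(3), 16)
            results.append((name, decimal_code, hex_code))
    return results


sample = [
    "13:04:12:WU00:FS01:0xa4:Gromacs cannot continue further.",
    "13:36:52:WU00:FS01:FahCore returned: FAILED_3 (255 = 0xff)",
]
print(core_returns(sample))  # [('FAILED_3', 255, 255)]
```

The two numbers in the tuple are the same value, since 255 decimal and 0xff hexadecimal are just two spellings of one exit code.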
Fahrenheit451
Posts: 90
Joined: Sun Sep 19, 2010 9:25 am

Re: Issue with Project: 8049 (R 777, C 4, G 107) and FAH Cor

Post by Fahrenheit451 »

Ok. FAH dumped the WU and downloaded another one (Project: 8055 (Run 1326, Clone 3, Gen 30)). Let's see if the core will continue to crash.