P6041 (R0 C247 G155) Unstable WU

Moderators: Site Moderators, FAHC Science Team

Blazin420
Posts: 6
Joined: Fri Apr 29, 2011 3:00 pm

P6041 (R0 C247 G155) Unstable WU

Post by Blazin420 »

I've tried this one 4 times now, and it ends in a looping failure. The first pass went to 65%, and each
pass after that only got to around 14%. I reduced my overclock in an attempt to run this one. Unlike some
of the A4's, which are a little overclock sensitive, this one continues to fail even after reducing the overclock.

I reduced my overclock from 4.0 GHz to 3.8 GHz (which runs even the sensitive A4's). This machine has passed
12 hours of MemTest86+, 12 hours of Prime95, and 24 hours of StressCPU @ 4.0 GHz, so I'm going to go out
on a limb here and say the WU's at fault.

I know I stopped the process twice during the first pass to play video games (which may have helped the WU
get to 65% on that pass), but it started back up and ran for almost 2 hours after the final restart before failing.

Here's the log:



[15:29:52] + Attempting to send results [July 16 15:29:52 UTC]
[15:29:52] - Reading file work/wuresults_04.dat from core
[15:29:52]   (Read 3794987 bytes from disk)
[15:29:52] Connecting to http://171.64.65.54:8080/
[15:31:01] Posted data.
[15:31:02] Initial: 0000; - Uploaded at ~52 kB/s
[15:31:02] - Averaged speed for that direction ~52 kB/s
[15:31:02] + Results successfully sent
[15:31:02] Thank you for your contribution to Folding@Home.
[15:31:02] + Number of Units Completed: 772

[15:31:06] Trying to send all finished work units
[15:31:06] + No unsent completed units remaining.
[15:31:06] - Preparing to get new work unit...
[15:31:06] Cleaning up work directory
[15:31:06] + Attempting to get work packet
[15:31:06] Passkey found
[15:31:06] - Will indicate memory of 4098 MB
[15:31:06] - Connecting to assignment server
[15:31:06] Connecting to http://assign.stanford.edu:8080/
[15:31:07] Posted data.
[15:31:07] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[15:31:07] + News From Folding@Home: Welcome to Folding@Home
[15:31:07] Loaded queue successfully.
[15:31:07] Sent data
[15:31:07] Connecting to http://171.64.65.54:8080/
[15:31:09] Posted data.
[15:31:10] Initial: 0000; - Receiving payload (expected size: 7885241)
[15:31:34] - Downloaded at ~320 kB/s
[15:31:34] - Averaged speed for that direction ~296 kB/s
[15:31:34] + Received work.
[15:31:34] Trying to send all finished work units
[15:31:34] + No unsent completed units remaining.
[15:31:34] + Closed connections
[15:31:34] 
[15:31:34] + Processing work unit
[15:31:34] Core required: FahCore_a3.exe
[15:31:34] Core found.
[15:31:34] Working on queue slot 05 [July 16 15:31:34 UTC]
[15:31:34] + Working ...
[15:31:34] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 6 -checkpoint 5 -verbose -lifeline 1196 -version 634'

[15:31:34] 
[15:31:34] *------------------------------*
[15:31:34] Folding@Home Gromacs SMP Core
[15:31:34] Version 2.27 (Dec. 15, 2010)
[15:31:34] 
[15:31:34] Preparing to commence simulation
[15:31:34] - Looking at optimizations...
[15:31:34] - Created dyn
[15:31:34] - Files status OK
[15:31:36] - Expanded 7884729 -> 10126021 (decompressed 128.4 percent)
[15:31:36] Called DecompressByteArray: compressed_data_size=7884729 data_size=10126021, decompressed_data_size=10126021 diff=0
[15:31:36] - Digital signature verified
[15:31:36] 
[15:31:36] Project: 6041 (Run 0, Clone 247, Gen 155)
[15:31:36] 
[15:31:36] Assembly optimizations on if available.
[15:31:36] Entering M.D.
[15:31:42] Mapping NT from 6 to 6 
[15:31:43] Completed 0 out of 250000 steps  (0%)
[15:44:44] Completed 2500 out of 250000 steps  (1%)
[15:57:43] Completed 5000 out of 250000 steps  (2%)
[16:10:31] Completed 7500 out of 250000 steps  (3%)
[16:23:18] Completed 10000 out of 250000 steps  (4%)
[16:36:06] Completed 12500 out of 250000 steps  (5%)
[16:49:02] Completed 15000 out of 250000 steps  (6%)
[17:01:55] Completed 17500 out of 250000 steps  (7%)
[17:14:40] Completed 20000 out of 250000 steps  (8%)
[17:27:21] Completed 22500 out of 250000 steps  (9%)
[17:40:01] Completed 25000 out of 250000 steps  (10%)
[17:52:57] Completed 27500 out of 250000 steps  (11%)
[18:05:54] Completed 30000 out of 250000 steps  (12%)
[18:18:49] Completed 32500 out of 250000 steps  (13%)
[18:31:37] Completed 35000 out of 250000 steps  (14%)
[18:44:40] Completed 37500 out of 250000 steps  (15%)
[18:57:36] Completed 40000 out of 250000 steps  (16%)
[19:10:23] Completed 42500 out of 250000 steps  (17%)
[19:15:08] - Autosending finished units... [July 16 19:15:08 UTC]
[19:15:08] Trying to send all finished work units
[19:15:08] + No unsent completed units remaining.
[19:15:08] - Autosend completed
[19:23:04] Completed 45000 out of 250000 steps  (18%)
[19:35:44] Completed 47500 out of 250000 steps  (19%)
[19:48:36] Completed 50000 out of 250000 steps  (20%)
[20:01:26] Completed 52500 out of 250000 steps  (21%)
[20:14:33] Completed 55000 out of 250000 steps  (22%)
[20:27:48] Completed 57500 out of 250000 steps  (23%)
[20:41:11] Completed 60000 out of 250000 steps  (24%)
[20:55:13] Completed 62500 out of 250000 steps  (25%)
[21:09:13] Completed 65000 out of 250000 steps  (26%)
[21:22:48] Completed 67500 out of 250000 steps  (27%)
[21:35:57] Completed 70000 out of 250000 steps  (28%)
[21:49:37] Completed 72500 out of 250000 steps  (29%)
[22:03:23] Completed 75000 out of 250000 steps  (30%)
[22:16:40] Completed 77500 out of 250000 steps  (31%)
[22:29:42] Completed 80000 out of 250000 steps  (32%)
[22:42:26] Completed 82500 out of 250000 steps  (33%)
[22:55:17] Completed 85000 out of 250000 steps  (34%)
[23:08:14] Completed 87500 out of 250000 steps  (35%)
[23:21:42] Completed 90000 out of 250000 steps  (36%)
[23:34:20] Completed 92500 out of 250000 steps  (37%)
[23:46:58] Completed 95000 out of 250000 steps  (38%)
[23:59:44] Completed 97500 out of 250000 steps  (39%)
[00:12:35] Completed 100000 out of 250000 steps  (40%)
[00:25:25] Completed 102500 out of 250000 steps  (41%)
[00:38:11] Completed 105000 out of 250000 steps  (42%)
[00:46:52] Killing all core threads
[00:46:52] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown at user request.
[00:46:52] ***** Got a SIGTERM signal (2)
[00:46:52] Killing all core threads
[00:46:52] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown.


--- Opening Log file [July 17 02:14:40 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Blaine\F@H_SMP\Folding@Home Windows SMP Client V1.01
Executable: C:\Users\Blaine\F@H_SMP\Folding@Home Windows SMP Client V1.01\F@H_SMP-6.34.exe
Arguments: -smp -verbosity 9 

[02:14:40] - Ask before connecting: No
[02:14:40] - User name: Blazin420 (Team 420)
[02:14:40] - User ID: 7FEB5F862FE09BC8
[02:14:40] - Machine ID: 1
[02:14:40] 
[02:14:40] Loaded queue successfully.
[02:14:40] 
[02:14:40] - Autosending finished units... [July 17 02:14:40 UTC]
[02:14:40] + Processing work unit
[02:14:40] Trying to send all finished work units
[02:14:40] Core required: FahCore_a3.exe
[02:14:40] + No unsent completed units remaining.
[02:14:40] Core found.
[02:14:40] - Autosend completed
[02:14:40] Working on queue slot 05 [July 17 02:14:40 UTC]
[02:14:40] + Working ...
[02:14:40] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 6 -checkpoint 5 -verbose -lifeline 4708 -version 634'

[02:14:40] 
[02:14:40] *------------------------------*
[02:14:40] Folding@Home Gromacs SMP Core
[02:14:40] Version 2.27 (Dec. 15, 2010)
[02:14:40] 
[02:14:40] Preparing to commence simulation
[02:14:40] - Ensuring status. Please wait.
[02:14:50] - Looking at optimizations...
[02:14:50] - Working with standard loops on this execution.
[02:14:50] - Previous termination of core was improper.
[02:14:50] - Files status OK
[02:14:51] - Expanded 7884729 -> 10126021 (decompressed 128.4 percent)
[02:14:51] Called DecompressByteArray: compressed_data_size=7884729 data_size=10126021, decompressed_data_size=10126021 diff=0
[02:14:51] - Digital signature verified
[02:14:51] 
[02:14:51] Project: 6041 (Run 0, Clone 247, Gen 155)
[02:14:51] 
[02:14:51] Entering M.D.
[02:14:57] Using Gromacs checkpoints
[02:14:57] Mapping NT from 6 to 6 
[02:14:59] Resuming from checkpoint
[02:14:59] Verified work/wudata_05.log
[02:14:59] Verified work/wudata_05.trr
[02:14:59] Verified work/wudata_05.xtc
[02:14:59] Verified work/wudata_05.edr
[02:15:00] Completed 106686 out of 250000 steps  (42%)
[02:19:14] Completed 107500 out of 250000 steps  (43%)
[02:32:14] Completed 110000 out of 250000 steps  (44%)
[02:45:16] Completed 112500 out of 250000 steps  (45%)
[02:58:02] Completed 115000 out of 250000 steps  (46%)
[03:10:53] Completed 117500 out of 250000 steps  (47%)
[03:23:39] Completed 120000 out of 250000 steps  (48%)
[03:36:34] Completed 122500 out of 250000 steps  (49%)
[03:49:29] Completed 125000 out of 250000 steps  (50%)
[04:02:26] Completed 127500 out of 250000 steps  (51%)
[04:15:11] Completed 130000 out of 250000 steps  (52%)
[04:27:54] Completed 132500 out of 250000 steps  (53%)
[04:40:55] Completed 135000 out of 250000 steps  (54%)
[04:54:06] Completed 137500 out of 250000 steps  (55%)
[05:07:15] Completed 140000 out of 250000 steps  (56%)
[05:20:21] Completed 142500 out of 250000 steps  (57%)
[05:33:16] Completed 145000 out of 250000 steps  (58%)
[05:46:13] Completed 147500 out of 250000 steps  (59%)
[05:49:16] Killing all core threads
[05:49:16] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown at user request.
[05:49:16] ***** Got a SIGTERM signal (2)
[05:49:16] Killing all core threads
[05:49:16] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown.


--- Opening Log file [July 17 07:00:56 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Blaine\F@H_SMP\Folding@Home Windows SMP Client V1.01
Executable: C:\Users\Blaine\F@H_SMP\Folding@Home Windows SMP Client V1.01\F@H_SMP-6.34.exe
Arguments: -smp -verbosity 9 

[07:00:56] - Ask before connecting: No
[07:00:56] - User name: Blazin420 (Team 420)
[07:00:56] - User ID: 7FEB5F862FE09BC8
[07:00:56] - Machine ID: 1
[07:00:56] 
[07:00:56] Loaded queue successfully.
[07:00:56] 
[07:00:56] - Autosending finished units... [July 17 07:00:56 UTC]
[07:00:56] + Processing work unit
[07:00:56] Trying to send all finished work units
[07:00:56] Core required: FahCore_a3.exe
[07:00:56] + No unsent completed units remaining.
[07:00:56] Core found.
[07:00:56] - Autosend completed
[07:00:56] Working on queue slot 05 [July 17 07:00:56 UTC]
[07:00:56] + Working ...
[07:00:56] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 6 -checkpoint 5 -verbose -lifeline 2888 -version 634'

[07:00:56] 
[07:00:56] *------------------------------*
[07:00:56] Folding@Home Gromacs SMP Core
[07:00:56] Version 2.27 (Dec. 15, 2010)
[07:00:56] 
[07:00:56] Preparing to commence simulation
[07:00:56] - Ensuring status. Please wait.
[07:01:19] - Looking at optimizations...
[07:01:19] - Working with standard loops on this execution.
[07:01:20] - Previous termination of core was improper.
[07:01:20] - Going to use standard loops.
[07:01:20] - Files status OK
[07:01:21] - Expanded 7884729 -> 10126021 (decompressed 128.4 percent)
[07:01:21] Called DecompressByteArray: compressed_data_size=7884729 data_size=10126021, decompressed_data_size=10126021 diff=0
[07:01:21] - Digital signature verified
[07:01:21] 
[07:01:21] Project: 6041 (Run 0, Clone 247, Gen 155)
[07:01:21] 
[07:01:21] Entering M.D.
[07:01:27] Using Gromacs checkpoints
[07:01:28] Mapping NT from 6 to 6 
[07:01:29] Resuming from checkpoint
[07:01:29] Verified work/wudata_05.log
[07:01:30] Verified work/wudata_05.trr
[07:01:30] Verified work/wudata_05.xtc
[07:01:30] Verified work/wudata_05.edr
[07:01:31] Completed 147264 out of 250000 steps  (58%)
[07:02:46] Completed 147500 out of 250000 steps  (59%)
[07:15:32] Completed 150000 out of 250000 steps  (60%)
[07:28:16] Completed 152500 out of 250000 steps  (61%)
[07:42:47] Completed 155000 out of 250000 steps  (62%)
[07:55:42] Completed 157500 out of 250000 steps  (63%)
[08:08:29] Completed 160000 out of 250000 steps  (64%)
[08:21:14] Completed 162500 out of 250000 steps  (65%)
[08:26:04] Gromacs cannot continue further.
[08:26:04] Going to send back what have done -- stepsTotalG=250000
[08:26:04] Work fraction=0.6537 steps=250000.
[08:26:11] logfile size=132714 infoLength=132714 edr=0 trr=23
[08:26:11] logfile size: 132714 info=132714 bed=0 hdr=23
[08:26:11] - Writing 133250 bytes of core data to disk...
[08:26:14] CoreStatus = C0000005 (-1073741819)
[08:26:14] Client-core communications error: ERROR 0xc0000005
[08:26:14] Deleting current work unit & continuing...
[08:26:28] Trying to send all finished work units
[08:26:28] + No unsent completed units remaining.
[08:26:28] - Preparing to get new work unit...
[08:26:28] Cleaning up work directory
[08:26:28] + Attempting to get work packet
[08:26:28] Passkey found
[08:26:28] - Will indicate memory of 4098 MB
[08:26:28] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 10, Stepping: 0
[08:26:28] - Connecting to assignment server
[08:26:28] Connecting to http://assign.stanford.edu:8080/
[08:26:29] Posted data.
[08:26:29] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[08:26:29] + News From Folding@Home: Welcome to Folding@Home
[08:26:29] Loaded queue successfully.
[08:26:29] Sent data
[08:26:29] Connecting to http://171.64.65.54:8080/
[08:26:32] Posted data.
[08:26:32] Initial: 0000; - Receiving payload (expected size: 7885241)
[08:26:57] - Downloaded at ~308 kB/s
[08:26:57] - Averaged speed for that direction ~298 kB/s
[08:26:57] + Received work.
[08:26:57] + Closed connections
[08:27:02] 
[08:27:02] + Processing work unit
[08:27:02] Core required: FahCore_a3.exe
[08:27:02] Core found.
[08:27:02] Working on queue slot 06 [July 17 08:27:02 UTC]
[08:27:02] + Working ...
[08:27:02] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 6 -checkpoint 5 -verbose -lifeline 2888 -version 634'

[08:27:02] 
[08:27:02] *------------------------------*
[08:27:02] Folding@Home Gromacs SMP Core
[08:27:02] Version 2.27 (Dec. 15, 2010)
[08:27:02] 
[08:27:02] Preparing to commence simulation
[08:27:02] - Looking at optimizations...
[08:27:02] - Created dyn
[08:27:02] - Files status OK
[08:27:03] - Expanded 7884729 -> 10126021 (decompressed 128.4 percent)
[08:27:03] Called DecompressByteArray: compressed_data_size=7884729 data_size=10126021, decompressed_data_size=10126021 diff=0
[08:27:03] - Digital signature verified
[08:27:03] 
[08:27:03] Project: 6041 (Run 0, Clone 247, Gen 155)
[08:27:03] 
[08:27:03] Assembly optimizations on if available.
[08:27:03] Entering M.D.
[08:27:09] Mapping NT from 6 to 6 
[08:27:10] Completed 0 out of 250000 steps  (0%)
[08:40:12] Completed 2500 out of 250000 steps  (1%)
[08:53:17] Completed 5000 out of 250000 steps  (2%)
[09:06:15] Completed 7500 out of 250000 steps  (3%)
[09:19:05] Completed 10000 out of 250000 steps  (4%)
[09:32:04] Completed 12500 out of 250000 steps  (5%)
[09:47:34] Completed 15000 out of 250000 steps  (6%)
[10:00:42] Completed 17500 out of 250000 steps  (7%)
[10:13:33] Completed 20000 out of 250000 steps  (8%)
[10:26:19] Completed 22500 out of 250000 steps  (9%)
[10:39:29] Completed 25000 out of 250000 steps  (10%)
[10:52:45] Completed 27500 out of 250000 steps  (11%)
[11:06:05] Completed 30000 out of 250000 steps  (12%)
[11:19:06] Completed 32500 out of 250000 steps  (13%)
[11:32:17] Completed 35000 out of 250000 steps  (14%)
[11:45:16] CoreStatus = C0000029 (-1073741783)
[11:45:16] Client-core communications error: ERROR 0xc0000029
[11:45:16] Deleting current work unit & continuing...
Had to change my machine ID to pick up a new WU (currently running P6053 (R1 C84 G309)).
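
For anyone wanting to sanity-check the two CoreStatus lines in the log above: the number in parentheses is just the same 32-bit hex code read as a signed integer. 0xC0000005 is the Windows NTSTATUS value for an access violation; I haven't tried to interpret the second code here. This is a rough Python sketch, and the helper names (`to_signed32`, `parse_core_status`) are made up for illustration, not anything from the client.

```python
import re

# Matches lines such as "[08:26:14] CoreStatus = C0000005 (-1073741819)".
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]{1,8}) \((-?\d+)\)")


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def parse_core_status(line):
    """Return (unsigned_code, signed_code) for a CoreStatus line, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    unsigned = int(match.group(1), 16)
    return unsigned, to_signed32(unsigned)


# Sample lines copied from the log in this post.
sample = [
    "[08:26:14] CoreStatus = C0000005 (-1073741819)",
    "[11:45:16] CoreStatus = C0000029 (-1073741783)",
]

for line in sample:
    unsigned, signed = parse_core_status(line)
    logged = int(STATUS_RE.search(line).group(2))
    # The signed value computed from the hex code should equal the logged one.
    print(f"0x{unsigned:08X} -> {signed} (log says {logged})")
```

Both computed values agree with what the client printed, so the decimal numbers carry no extra information beyond the hex codes.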
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: P6041 (R0 C247 G155) Unstable WU

Post by bruce »

One person had trouble with this WU. It was reassigned to them and was successfully completed.

Have you tested your overclock with stresscpu2 and thoroughly tested your memory? The A4's are not "more sensitive" to overclocking ... they simply use more of your hardware's resources. There's no such thing as a "mostly stable" system, just stable systems and unstable systems. An unstable system that was never tested at full utilization may still run successfully at lower utilization.
Blazin420
Posts: 6
Joined: Fri Apr 29, 2011 3:00 pm

Re: P6041 (R0 C247 G155) Unstable WU

Post by Blazin420 »

I'm glad someone got it to complete on their SECOND run! It just wasn't one of my 4 attempts at it.

Anyone who's been doing this for a while will (usually) stress their overclock before deploying it.
As I stated above, it has passed the following @ 4.0 GHz (although that was around 3 months ago):
A) 12 hours of MemTest86+, 4x2 GB Corsair XMS3 DDR3 1600 @ 1.651 V 9-9-9-24 2T (rated spec)
B) 12 hours of Prime95, AMD 1090T @ 4.0 GHz 1.453 V, & 4 hours @ 4.1 GHz 1.458 V (BIOS voltage)
C) 24 hours of Stresscpu2 x32

After reducing it to 3.8 GHz and going to 1.455 V, it failed the 4th time with a clean download of both the WU and core.
Without a reboot, I played 2 hours of Dirt3, cleared out the work folder & queue, changed my machine ID, & it's been
back up and folding for more than 24 hours now. Usually if it's a bad OC you'll get a BSOD with both CPU & GPU clients
running without core locking. If a core starts blowing it, eventually the Nvidia clients will hit that core, and poof.
This time both Nvidia clients continued to run, and the WU kept failing at around 14% after the first failure at 65%.
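
If anyone wants to pull the same "how far did each attempt get" numbers out of their own log, here's a minimal Python sketch. It assumes the log uses the same "Completed N out of M steps" and "CoreStatus = ..." line formats as the one I posted. The function name `summarize_core_exits` is made up, and the sketch reports every CoreStatus line it sees, so a normal finish would show up too and would need filtering by hand.

```python
import re

PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+)")


def summarize_core_exits(lines):
    """Return (status_hex, last_step, total_steps) for each CoreStatus line.

    last_step/total_steps come from the most recent progress line seen
    before the CoreStatus line; both are 0 if no progress line was seen.
    """
    exits = []
    last_progress = None
    for line in lines:
        progress = PROGRESS_RE.search(line)
        if progress:
            last_progress = (int(progress.group(1)), int(progress.group(2)))
            continue
        status = STATUS_RE.search(line)
        if status:
            step, total = last_progress if last_progress else (0, 0)
            exits.append((status.group(1).upper(), step, total))
            last_progress = None  # the next attempt starts fresh
    return exits


# A few lines copied from the log earlier in this thread.
log = """\
[08:21:14] Completed 162500 out of 250000 steps  (65%)
[08:26:14] CoreStatus = C0000005 (-1073741819)
[08:27:10] Completed 0 out of 250000 steps  (0%)
[11:32:17] Completed 35000 out of 250000 steps  (14%)
[11:45:16] CoreStatus = C0000029 (-1073741783)
""".splitlines()

for status, step, total in summarize_core_exits(log):
    percent = 100 * step // total if total else 0
    print(f"{status}: last progress {step}/{total} steps ({percent}%)")
```

Only the two failures in the posted log are covered by the sample; the other attempts were not captured in what I pasted.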

After finding a bunch of posts about 6041 EUIE's (funny how certain projects have WAY more complaints than others), I
figured I had just gotten a bad WU.

It's not exactly considered a 100% stable WU if it takes that many runs to get successful results!! It's considered a "sensitive"
project series when a bunch of people start complaining about EUIE's. Certain projects do have higher EUIE counts than others;
that's why some projects don't have a single post, while others have tons.