Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Wed May 21, 2008 9:50 pm
by DocJonz
Had an EUE (error 0x7b) with this WU on a long-running C2Q machine (no OC). :cry:

Code:

[04:56:57] - Starting from initial work packet
[04:56:57] 
[04:56:57] Project: 3062 (Run 2, Clone 105, Gen 4)
[04:56:57] 
[04:56:57] Assembly optimizations on if available.
[04:56:57] Entering M.D.
[04:56:57] itial work packet
[04:56:57] 
[04:56:57] Project: 3062 (Run 2, Clone 105, Gen 4)
[04:56:57] 
[04:56:57] Assembly optimizations on if available.
[04:56:57] Entering M.D.
[04:57:04] Protein: p3062_lambda5_99sb
[04:57:04] Writing local files
[04:57:10] Extra SSE boost OK.
[05:15:38] ed 50000 out of 5000000 steps  (1 percent)
[05:34:34] Writing local files
[05:34:34] Completed 100000 out of 5000000 steps  (2 percent)
[05:53:48] Writing local files
[05:53:48] Completed 150000 out of 5000000 steps  (3 percent)
[06:12:53] Writing local files
[06:12:53] Completed 200000 out of 5000000 steps  (4 percent)
[06:31:49] Writing local files
[06:31:49] Completed 250000 out of 5000000 steps  (5 percent)
[06:50:46] Writing local files
[06:50:46] Completed 300000 out of 5000000 steps  (6 percent)
[07:10:07] Writing local files
[07:10:08] Completed 350000 out of 5000000 steps  (7 percent)
[07:29:04] Writing local files
[07:29:04] Completed 400000 out of 5000000 steps  (8 percent)
[07:47:58] Writing local files
[07:47:58] Completed 450000 out of 5000000 steps  (9 percent)
[08:06:51] Writing local files
[08:06:51] Completed 500000 out of 5000000 steps  (10 percent)
[08:25:46] Writing local files
[08:25:46] Completed 550000 out of 5000000 steps  (11 percent)
[08:44:39] Writing local files
[08:44:39] Completed 600000 out of 5000000 steps  (12 percent)
[09:03:35] Writing local files
[09:03:35] Completed 650000 out of 5000000 steps  (13 percent)
[09:22:30] Writing local files
[09:22:30] Completed 700000 out of 5000000 steps  (14 percent)
[09:41:27] Writing local files
[09:41:27] Completed 750000 out of 5000000 steps  (15 percent)
[10:00:16] Writing local files
[10:00:16] Completed 800000 out of 5000000 steps  (16 percent)
[10:21:02] Writing local files
[10:21:03] Completed 850000 out of 5000000 steps  (17 percent)
[10:40:02] Writing local files
[10:40:02] Completed 900000 out of 5000000 steps  (18 percent)
[10:59:03] Writing local files
[10:59:03] Completed 950000 out of 5000000 steps  (19 percent)
[11:18:03] Writing local files
[11:18:03] Completed 1000000 out of 5000000 steps  (20 percent)
[11:37:02] Writing local files
[11:37:02] Completed 1050000 out of 5000000 steps  (21 percent)
[11:55:59] Writing local files
[11:55:59] Completed 1100000 out of 5000000 steps  (22 percent)
[12:14:57] Writing local files
[12:14:57] Completed 1150000 out of 5000000 steps  (23 percent)
[12:33:56] Writing local files
[12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)
[12:47:05] Gromacs cannot continue further.
[12:47:05] Going to send back what have done.
[12:47:05] logfile size: 39226
[12:47:05] - Writing 39762 bytes of core data to disk...
[12:47:05]   ... Done.
[12:47:05] - Failed to delete work/wudata_08.arc
[12:47:05] No C.P. to delete.
[12:47:05] - Failed to delete work/wudata_08.sas
[12:47:05] - Failed to delete work/wudata_08.goe
[12:47:05] - Failed to delete work/wudata_08.pdo
[12:47:05] Warning:  check for stray files
[12:49:05] 
[12:49:05] Folding@home Core Shutdown: EARLY_UNIT_END
[12:49:05] 
[12:49:05] Folding@home Core Shutdown: EARLY_UNIT_END
[12:49:08] CoreStatus = 7B (123)
[12:49:08] Client-core communications error: ERROR 0x7b
[12:49:08] Deleting current work unit & continuing...
[12:51:49] - Preparing to get new work unit...
[12:51:49] + Attempting to get work packet
[12:51:49] - Connecting to assignment server
[12:51:51] - Successful: assigned to (171.64.65.63).
[12:51:51] + News From Folding@Home: Welcome to Folding@Home
[12:51:51] Loaded queue successfully.
[12:52:01] + Closed connections
[12:52:06] 
[12:52:06] + Processing work unit
[12:52:06] Core required: FahCore_a1.exe
[12:52:06] Core found.
[12:52:06] Working on Unit 09 [May 21 12:52:06]
[12:52:06] + Working ...
[12:52:07] 
[12:52:07] *------------------------------*
[12:52:07] Folding@Home Gromacs SMP Core
[12:52:07] Version 1.74 (March 10, 2007)
[12:52:07] 
[12:52:07] Preparing to commence simulation
[12:52:07] - Ensuring status. Please wait.
[12:52:24] - Assembly optimizations manually forced on.
[12:52:24] - Not checking prior termination.
[12:52:24] - Expanded 609531 -> 3263133 (decompressed 535.3 percent)
[12:52:24] No C.P. to delete.
[12:52:24] - Failed to delete - Failed to delete - Failed to delete - Assembly optimizations on if available.
[12:52:24] Entering M.D.
[12:52:24] rWarning:  check for stray files
[12:52:24] - Starting from i
[12:52:24] Project: 3062 (Run
[12:52:24] Project: 3062 (Run 2, Assembly optimizatiAssembly optimizations on if available.
[12:52:24] Entering M.D.
[12:52:24] ations on if available.
[12:52:24] Entering M.D.
[12:52:31]  boost OK.
[12:52:31] Writing local files
[12:52:31] Extra SSE boost OK.
[12:52:31] Writing local files
[12:52:31] Completed 0 out of 5000000 steps  (0 percent)
[13:11:26] Writing local files
[13:11:26] Completed 50000 out of 5000000 steps  (1 percent)
[13:30:27] Writing local files
[13:30:27] Completed 100000 out of 5000000 steps  (2 percent)
[13:49:28] Writing local files
[13:49:29] Completed 150000 out of 5000000 steps  (3 percent)
[14:08:30] Writing local files
[14:08:30] Completed 200000 out of 5000000 steps  (4 percent)
[14:27:32] Writing local files
[14:27:32] Completed 250000 out of 5000000 steps  (5 percent)
[14:46:33] Writing local files
[14:46:33] Completed 300000 out of 5000000 steps  (6 percent)
[15:05:34] Writing local files
[15:05:34] Completed 350000 out of 5000000 steps  (7 percent)
[15:24:35] Writing local files
[15:24:35] Completed 400000 out of 5000000 steps  (8 percent)
[15:43:36] Writing local files
[15:43:37] Completed 450000 out of 5000000 steps  (9 percent)
[16:02:35] Writing local files
[16:02:35] Completed 500000 out of 5000000 steps  (10 percent)
[16:22:08] Writing local files
[16:22:08] Completed 550000 out of 5000000 steps  (11 percent)
[16:41:09] Writing local files
[16:41:09] Completed 600000 out of 5000000 steps  (12 percent)
[17:00:10] Writing local files
[17:00:10] Completed 650000 out of 5000000 steps  (13 percent)
[17:19:11] Writing local files
[17:19:11] Completed 700000 out of 5000000 steps  (14 percent)
[17:38:30] Writing local files
[17:38:30] Completed 750000 out of 5000000 steps  (15 percent)
[17:57:39] Writing local files
[17:57:39] Completed 800000 out of 5000000 steps  (16 percent)
[18:17:00] Writing local files
[18:17:00] Completed 850000 out of 5000000 steps  (17 percent)
[18:36:02] Writing local files
[18:36:02] Completed 900000 out of 5000000 steps  (18 percent)
[18:55:02] Writing local files
[18:55:02] Completed 950000 out of 5000000 steps  (19 percent)
[19:14:18] Writing local files
[19:14:18] Completed 1000000 out of 5000000 steps  (20 percent)
[19:33:30] Writing local files
[19:33:30] Completed 1050000 out of 5000000 steps  (21 percent)
[19:53:31] Writing local files
[19:53:32] Completed 1100000 out of 5000000 steps  (22 percent)
[20:12:52] Writing local files
[20:12:52] Completed 1150000 out of 5000000 steps  (23 percent)
[20:31:52] Writing local files
[20:31:52] Completed 1200000 out of 5000000 steps  (24 percent)
[20:45:01] Gromacs cannot continue further.
[20:45:01] Going to send back what have done.
[20:45:01] logfile size: 39226
[20:45:01] - Writing 39762 bytes of core data to disk...
[20:45:01]   ... Done.
[20:45:01] - Failed to delete work/wudata_09.xtc
[20:45:01] No C.P. to delete.
[20:45:01] - Failed to delete work/wudata_09.goe
[20:45:01] - Failed to delete work/wudata_09.pdo
[20:45:01] - Failed to delete work/wudata_09.xvg
[20:45:01] Warning:  check for stray files
[20:47:01] 
[20:47:01] Folding@home Core Shutdown: EARLY_UNIT_END
[20:47:01] 
[20:47:01] Folding@home Core Shutdown: EARLY_UNIT_END
[20:47:05] CoreStatus = 7B (123)
[20:47:05] Client-core communications error: ERROR 0x7b
[20:47:05] Deleting current work unit & continuing...
[20:49:25] - Preparing to get new work unit...
[20:49:25] + Attempting to get work packet
[20:49:25] - Connecting to assignment server
[20:49:26] - Successful: assigned to (171.64.65.64).
[20:49:26] + News From Folding@Home: Welcome to Folding@Home
[20:49:26] Loaded queue successfully.
[20:51:16] + Closed connections
[20:51:21] 
[20:51:21] + Processing work unit
[20:51:21] Core required: FahCore_a1.exe
[20:51:21] Core found.
[20:51:21] Working on Unit 00 [May 21 20:51:21]
[20:51:21] + Working ...
[20:51:21] 
[20:51:21] *------------------------------*
[20:51:21] Folding@Home Gromacs SMP Core
[20:51:21] Version 1.74 (March 10, 2007)
[20:51:21] 
[20:51:21] Preparing to commence simulation
[20:51:21] - Ensuring status. Please wait.
[20:51:38] - Assembly optimizations manually forced on.
[20:51:38] - Not checking prior termination.
[20:51:43] - Expanded 2449201 -> 12890333 (decompressed 526.3 percent)
[20:51:43] - Starting from initial work packet
[20:51:43] 
[20:51:43] Project: 2653 (Run 13, Clone 28, Gen 66)
[20:51:43] 
[20:51:43] Assembly optimizations on if available.
[20:51:43] Entering M.D.
[20:51:50] Rejecting checkpoint
[20:51:51] Protein: Protein in POPCExtra SSE boost OK.
[20:51:51] 
[20:51:52] Extra SSE boost OK.
[20:51:53] Writing local files
[20:51:53] Completed 0 out of 500000 steps  (0 percent)
[21:07:46] Writing local files
[21:07:46] Completed 5000 out of 500000 steps  (1 percent)

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Wed May 21, 2008 11:24 pm
by toTOW
You should try qfix and see if it helps ...

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Thu May 22, 2008 1:03 am
by anandhanju
If you get it again, try stopping your client at around 20% and then restart it. Hopefully, it should go beyond the failure point safely.
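
To know where to stop the client, it helps to confirm the failure point from the log first. The following is only a rough sketch, assuming the log lines look like the ones posted above; the function name `failure_points` is made up for illustration and is not part of any Folding@home tool.

```python
import re

# Matches progress lines such as:
# [12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+) percent\)")

def failure_points(log_lines):
    """Return the last reported percent before each 'Gromacs cannot continue' line.

    Hypothetical helper: it only understands the line formats seen in the
    log posted in this thread.
    """
    last_percent = None
    failures = []
    for line in log_lines:
        match = PROGRESS.search(line)
        if match:
            last_percent = int(match.group(3))
        elif "Gromacs cannot continue further" in line:
            failures.append(last_percent)
            last_percent = None  # a new work unit starts after a failure
    return failures

sample = [
    "[12:14:57] Completed 1150000 out of 5000000 steps  (23 percent)",
    "[12:33:56] Writing local files",
    "[12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)",
    "[12:47:05] Gromacs cannot continue further.",
]
print(failure_points(sample))
```

If the same percentage shows up for repeated attempts at one WU, stopping a few percent before that value is the point to aim for.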

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Thu May 22, 2008 3:35 am
by nwkelley
Client-core communication errors are "business as usual" on SMP projects. Hopefully one of these options will work (i.e. receiving partial credit from qfix). Thanks for your post, good luck, and let us know if you have further trouble with it.
nick

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Thu May 22, 2008 6:01 am
by DocJonz
toTOW wrote: You should try qfix and see if it helps ...
At the end of the log I posted, it said that it deleted the WU and started a new one - I'm assuming, therefore, that qfix isn't going to help in this case :(
anandhanju wrote: If you get it again, try stopping your client at around 20% and then restart it. Hopefully, it should go beyond the failure point safely.
This would be a good thing to try - unfortunately I won't necessarily be around at the right time to give it a go ... maybe one day I'll catch one :wink:
nwkelley wrote: Client-core communication errors are "business as usual" on SMP projects. Hopefully one of these options will work (i.e. receiving partial credit from qfix). Thanks for your post, good luck, and let us know if you have further trouble with it.
nick
Are these 'client core comms' errors always going to be there on the SMP client? Or can we expect the client-writing whizzes at Stanford to make this a thing of the past - I'm hoping the latter :D

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Posted: Thu May 22, 2008 9:38 am
by toTOW
DocJonz wrote:
toTOW wrote: You should try qfix and see if it helps ...
At the end of the log I posted, it said that it deleted the WU and started a new one - I'm assuming, therefore, that qfix isn't going to help in this case :(
If you see this message in the log, something has been written to disk ... look in your /work folder, and if you find a wuresult_xx.dat file, qfix has something to fix ;) :

Code:

[20:45:01] Going to send back what have done.
[20:45:01] logfile size: 39226
[20:45:01] - Writing 39762 bytes of core data to disk...
[20:45:01]   ... Done.
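
A quick way to do that check is a small script that lists any result files left behind. This is just a sketch: the `wuresult_xx.dat` name pattern is taken from the post above, the helper name `find_result_files` is made up, and the demo runs against a temporary folder rather than a real client directory.

```python
from pathlib import Path
import tempfile

def find_result_files(work_dir):
    """List wuresult_*.dat files in work_dir, sorted by name.

    Hypothetical helper; returns an empty list if the folder is missing.
    """
    folder = Path(work_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("wuresult_*.dat"))

# Demo against a throwaway folder that mimics a work directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("wuresult_09.dat", "wudata_09.xtc", "wudata_09.goe"):
        (Path(tmp) / name).write_bytes(b"")
    demo_found = find_result_files(tmp)

print(demo_found)
```

Point it at your own client's work folder; a non-empty list means there is a result for qfix to look at.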