Project: 2665 (Run 2, Clone 650, Gen 55) 'EUE' ?
Posted: Sun Oct 19, 2008 8:42 pm
Failed at the same point on both of my sister machines (well, the same % progress anyway; looking at the logs, the slightly higher-spec one (in my signature) got a few KB further into the WU). I have just Qfixed the result; logs from both machines follow.
Cheers.
LOWER spec Machine;
[03:42:30] + Processing work unit
[03:42:30] Work type a1 not eligible for variable processors
[03:42:30] Core required: FahCore_a1.exe
[03:42:30] Core found.
[03:42:30] Using generic mpiexec calls
[03:42:30] Working on queue slot 01 [October 19 03:42:30 UTC]
[03:42:30] + Working ...
[03:42:30] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 996 -version 622'
[03:42:30]
[03:42:30] *------------------------------*
[03:42:30] Folding@Home Gromacs SMP Core
[03:42:30] Version 1.74 (March 10, 2007)
[03:42:30]
[03:42:30] Preparing to commence simulation
[03:42:30] - Ensuring status. Please wait.
[03:42:39] - Starting from initial work packet
[03:42:39]
[03:42:39] Project: 2665 (Run 2, Clone 650, Gen 55)
[03:42:39]
[03:42:39] Assembly optimizations on if available.
[03:42:39] Entering M.D.
[03:43:00] al work packet
[03:43:00]
[03:43:00] Project: 2665 (Run 2, Clone 650, Gen 55)
[03:43:00]
[03:43:03] Entering M.D.
[03:43:06] ne 650, Gen 55)
[03:43:06]
[03:43:06] Entering M.D.
[03:43:15] GG with glycosylations
[03:43:15] Writing local files
[03:43:15] cal files
[03:43:17] Extra SSE boost OK.
[03:43:29] cal files
[03:43:29] Completed 0 out of 250000 steps (0 percent)
[03:58:29] Timered checkpoint triggered.
[04:13:22] Writing local files
[04:13:23] Completed 2500 out of 250000 steps (1 percent)
[04:28:24] Timered checkpoint triggered.
[04:42:26] Writing local files
[04:42:27] Completed 5000 out of 250000 steps (2 percent)
[04:57:28] Timered checkpoint triggered.
[05:11:33] Writing local files
[05:11:33] Completed 7500 out of 250000 steps (3 percent)
[05:26:34] Timered checkpoint triggered.
[05:40:40] Writing local files
[05:40:41] Completed 10000 out of 250000 steps (4 percent)
[05:55:41] Timered checkpoint triggered.
[06:09:47] Writing local files
[06:09:48] Completed 12500 out of 250000 steps (5 percent)
[06:24:48] Timered checkpoint triggered.
[06:31:41] - Autosending finished units... [October 19 06:31:41 UTC]
[06:31:41] Trying to send all finished work units
[06:31:41] + No unsent completed units remaining.
[06:31:41] - Autosend completed
[06:38:56] Writing local files
[06:38:56] Completed 15000 out of 250000 steps (6 percent)
[06:53:57] Timered checkpoint triggered.
[07:08:02] Writing local files
[07:08:02] Completed 17500 out of 250000 steps (7 percent)
[07:23:03] Timered checkpoint triggered.
[07:37:08] Writing local files
[07:37:08] Completed 20000 out of 250000 steps (8 percent)
[07:52:09] Timered checkpoint triggered.
[08:06:13] Writing local files
[08:06:14] Completed 22500 out of 250000 steps (9 percent)
[08:21:15] Timered checkpoint triggered.
[08:35:21] Writing local files
[08:35:22] Completed 25000 out of 250000 steps (10 percent)
[08:50:23] Timered checkpoint triggered.
[09:04:27] Writing local files
[09:04:27] Completed 27500 out of 250000 steps (11 percent)
[09:15:42] Warning: long 1-4 interactions
[09:15:42] Gromacs cannot continue further.
[09:15:42] Going to send back what have done.
[09:15:42] logfile size: 30037
[09:15:42] - Writing 30573 bytes of core data to disk...
[09:15:42] ... Done.
[09:17:42]
[09:17:42] Folding@home Core Shutdown: EARLY_UNIT_END
[09:17:42]
[09:17:42] Folding@home Core Shutdown: EARLY_UNIT_END
[09:17:47] CoreStatus = 7B (123)
[09:17:47] Client-core communications error: ERROR 0x7b
[09:17:47] This is a sign of more serious problems, shutting down.
--- Opening Log file [October 19 11:28:38 UTC]
[11:28:38] - Ask before connecting: No
[11:28:38] - User name: al (Team 13505)
[11:28:38] - User ID: 29363BE33B18C144
[11:28:38] - Machine ID: 2
[11:28:38]
[11:28:38] Loaded queue successfully.
[11:28:38] Deleting work unit #1 from work queue...
[11:28:38] Using generic mpiexec calls
[11:30:41] - Warning: Could not delete all work unit files (1): Core returned invalid code
[11:30:41] - Failed to delete the requested work unit
[11:30:41] ***** Got a SIGTERM signal (2)
[11:30:41] Killing all core threads
[11:30:41] Killing 2 cores
[11:30:41] Killing core 0
[11:30:41] Killing core 1
Folding@Home Client Shutdown.
HIGHER spec Machine;
[11:47:16] - Ask before connecting: No
[11:47:16] - User name: al (Team 13505)
[11:47:16] - User ID: 1D00E77F41982A15
[11:47:16] - Machine ID: 2
[11:47:16]
[11:47:16] Loaded queue successfully.
[11:47:16]
[11:47:16] - Autosending finished units... [October 19 11:47:16 UTC]
[11:47:16] + Processing work unit
[11:47:16] Trying to send all finished work units[11:47:16] Work type a1 not eligible for variable processors
[11:47:16] Core required: FahCore_a1.exe
[11:47:16] + No unsent completed units remaining.
[11:47:16] - Autosend completed
[11:47:16] Core found.
[11:47:16] Using generic mpiexec calls
[11:47:16] Working on queue slot 02 [October 19 11:47:16 UTC]
[11:47:16] + Working ...
[11:47:16] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 3604 -version 622'
[11:47:17]
[11:47:17] *------------------------------*
[11:47:17] Folding@Home Gromacs SMP Core
[11:47:17] Version 1.74 (March 10, 2007)
[11:47:17]
[11:47:17] Preparing to commence simulation
[11:47:17] - Ensuring status. Please wait.
[11:47:34] - Looking at optimizations...
[11:47:34] - Working with standard loops on this execution.
[11:47:34] - Previous termination of core was improper.
[11:47:34] - Going to use standard loops.
[11:47:34] - Files status OK
[11:47:54] - Expanded 4795617 -> 24810145 (decompressed 517.3 percent)
[11:47:54]
[11:47:54] Project: 2665 (Run 2, Clone 650, Gen 55)
[11:47:54]
[11:47:56] Entering M.D.
[11:48:02] Calling FAH init
[11:48:04] Read topology
[11:48:04] s
[11:48:04] Writing local files
[11:48:04] int)
[11:48:04] Read checkpoint
[11:48:04] Protein: HGG with glycosylations
[11:48:04] Writing local files
[11:48:14] Extra SSE boost OK.
[11:48:14] Writing local files
[11:48:15] Completed 0 out of 250000 steps (0 percent)
[12:03:16] Timered checkpoint triggered.
[12:14:20] Writing local files
[12:14:21] Completed 2500 out of 250000 steps (1 percent)
[12:29:21] Timered checkpoint triggered.
[12:39:24] Writing local files
[12:39:24] Completed 5000 out of 250000 steps (2 percent)
[12:54:25] Timered checkpoint triggered.
[13:03:55] Writing local files
[13:03:55] Completed 7500 out of 250000 steps (3 percent)
[13:18:56] Timered checkpoint triggered.
[13:27:37] Writing local files
[13:27:38] Completed 10000 out of 250000 steps (4 percent)
[13:42:38] Timered checkpoint triggered.
[13:51:01] Writing local files
[13:51:02] Completed 12500 out of 250000 steps (5 percent)
[14:06:04] Timered checkpoint triggered.
[14:14:05] Writing local files
[14:14:05] Completed 15000 out of 250000 steps (6 percent)
[14:29:06] Timered checkpoint triggered.
[14:37:34] Writing local files
[14:37:34] Completed 17500 out of 250000 steps (7 percent)
[14:52:35] Timered checkpoint triggered.
[15:00:34] Writing local files
[15:00:34] Completed 20000 out of 250000 steps (8 percent)
[15:15:35] Timered checkpoint triggered.
[15:23:35] Writing local files
[15:23:35] Completed 22500 out of 250000 steps (9 percent)
[15:38:36] Timered checkpoint triggered.
[15:46:37] Writing local files
[15:46:37] Completed 25000 out of 250000 steps (10 percent)
[16:01:39] Timered checkpoint triggered.
[16:10:03] Writing local files
[16:10:03] Completed 27500 out of 250000 steps (11 percent)
[16:18:59] Warning: long 1-4 interactions
[16:18:59] Gromacs cannot continue further.
[16:18:59] Going to send back what have done.
[16:18:59] logfile size: 39459
[16:18:59] - Writing 39995 bytes of core data to disk...
[16:18:59] ... Done.
[16:18:59] - Failed to delete work/wudata_02.sas
[16:18:59] - Failed to delete work/wudata_02.goe
[16:18:59] - Failed to delete work/wudata_02.pdo
[16:18:59] Warning: check for stray files
[16:20:59]
[16:20:59] Folding@home Core Shutdown: EARLY_UNIT_END
[16:20:59]
[16:20:59] Folding@home Core Shutdown: EARLY_UNIT_END
[16:21:02] CoreStatus = 7B (123)
[16:21:02] Client-core communications error: ERROR 0x7b
[16:21:02] This is a sign of more serious problems, shutting down.
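For anyone who wants to check the "same point on both machines" claim without eyeballing the logs, here is a rough Python sketch. It assumes the log format shown above ("Completed N out of M steps" and the EARLY_UNIT_END shutdown line); the `summarize` helper and the trimmed sample strings are just for illustration and are not part of the client or of Qfix.

```python
import re

# Matches progress lines such as:
# "[09:04:27] Completed 27500 out of 250000 steps (11 percent)"
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize(log_text):
    """Return (last_completed_step, total_steps, ended_early) for one log.

    Hypothetical helper: it only looks for the progress lines and the
    EARLY_UNIT_END marker seen in the logs quoted in this post.
    """
    last_step, total = None, None
    for match in PROGRESS.finditer(log_text):
        last_step, total = int(match.group(1)), int(match.group(2))
    ended_early = "EARLY_UNIT_END" in log_text
    return last_step, total, ended_early


# Trimmed excerpts copied from the two logs above.
lower = """[09:04:27] Completed 27500 out of 250000 steps (11 percent)
[09:15:42] Warning: long 1-4 interactions
[09:17:42] Folding@home Core Shutdown: EARLY_UNIT_END"""

higher = """[16:10:03] Completed 27500 out of 250000 steps (11 percent)
[16:18:59] Warning: long 1-4 interactions
[16:20:59] Folding@home Core Shutdown: EARLY_UNIT_END"""

print(summarize(lower))
print(summarize(higher))
```

If both calls report the same last completed step and both flag the early end, that points at the WU itself rather than at either machine, though it is only a quick sanity check and not a diagnosis.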