Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Sat Feb 21, 2009 10:23 pm
by Flathead74
Warning: long 1-4 interactions

This WU seems to have been rattling about for quite a while, seeing that it is Run 0.

...another case of the old "Error: Could not read unit 05 file. Removing from queue.", striking again. :lol:

[07:27:40] Project: 3065 (Run 0, Clone 156, Gen 20)
[07:27:40] 
[07:27:40] Assembly optimizations on if available.
[07:27:40] Entering M.D.
[07:27:46] Rejecting checkpoint
[07:27:47] Protein: 66728 p3065_lambda5_99sb_bigExtra SSE boost OK.
[07:27:47] 
[07:27:47] Extra SSE boost OK.
[07:27:47] Writing local files
[07:27:47] Completed 0 out of 2500000 steps  (0 percent)
[07:36:36] Writing local files
[07:36:36] Completed 25000 out of 2500000 steps  (1 percent)
<snip>
[10:44:34] Completed 550000 out of 2500000 steps  (22 percent)
[10:53:22] Warning:  long 1-4 interactions
[10:53:26] CoreStatus = 1 (1)
[10:53:26] Sending work to server
[10:53:26] Project: 3065 (Run 0, Clone 156, Gen 20)
[10:53:26] - Error: Could not get length of results file work/wuresults_05.dat
[10:53:26] - Error: Could not read unit 05 file. Removing from queue.
[10:53:26] Trying to send all finished work units
[10:53:26] + No unsent completed units remaining.
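For anyone who wants to scan a FAHlog for this kind of failure, here is a rough Python sketch. It assumes log lines look like the excerpt above. The function name `scan_log` and the list of failure markers are hypothetical choices for illustration and are not part of the FAH client.

```python
import re

# Patterns are assumptions based on the log excerpt in this post,
# not on any official FAH client log specification.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FAILURE_MARKERS = (
    "Could not read unit",
    "NaN detected",
    "EARLY_UNIT_END",
    "long 1-4 interactions",
)


def scan_log(lines):
    """Return the last (project, run, clone, gen) seen and all failure lines."""
    prcg = None
    failures = []
    for line in lines:
        match = PRCG_RE.search(line)
        if match:
            # Convert the four captured numbers to integers.
            prcg = tuple(int(group) for group in match.groups())
        if any(marker in line for marker in FAILURE_MARKERS):
            failures.append(line.strip())
    return prcg, failures


sample = [
    "[10:53:22] Warning:  long 1-4 interactions",
    "[10:53:26] Project: 3065 (Run 0, Clone 156, Gen 20)",
    "[10:53:26] - Error: Could not read unit 05 file. Removing from queue.",
]
prcg, failures = scan_log(sample)
print(prcg, len(failures))
```

Substring matching is deliberately loose here, so a marker still matches if the client changes the spacing or prefix around it.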

Re: Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Sat Feb 21, 2009 10:42 pm
by toTOW
There is only one report in the DB for this WU: it's for partial credit.

Re: Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Sun Feb 22, 2009 12:16 am
by Flathead74
Thanks for checking, toTOW.

Re: Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Wed Feb 25, 2009 2:49 pm
by Outback_Jon
I had a p3065 r0 c214 g10 get an EUE on me overnight.

[23:10:56] *-----------------Folding@Home Gromacs SMP Core
[23:10:56] Version 1.7- Assembly optimizations manually forced on.
[23:11:13] - Not checking prior termination.
[23:11:17] - Expanded 1651703 -> 9524377 (decompressed 576.6 percent)
[23:11:18] 
[23:11:18] Project: 3065 (Run 0, Clone 214, Gen 10)
[23:11:18] 
[23:11:21] Assembly optimizations on if available.
[23:11:21] Entering M.D.
[23:11:31] Calling FAH init
[23:11:32] Read topology
[23:11:32] sb_big
[23:11:32] ng from checkpoint)
[23:11:32] Read checkpoint
[23:11:32] Protein: 66728 p3065_lambda5_99sb_big
[23:11:34] Writing local files
[23:11:34] Completed 17740 out of 2500000 steps  (0 percent)
[23:11:35] Extra SSE boost OK.
[23:28:14] Writing local files
[23:28:42] Completed 25000 out of 2500000 steps  (1 percent)
[23:59:08] Timered checkpoint triggered.
[00:12:44] Writing local files
[00:13:10] Completed 50000 out of 2500000 steps  (2 percent)
[00:43:36] Timered checkpoint triggered.
[00:51:59] Writing local files
[00:52:25] Completed 75000 out of 2500000 steps  (3 percent)
[01:22:51] Timered checkpoint triggered.
[01:30:56] Writing local files
[01:31:21] Completed 100000 out of 2500000 steps  (4 percent)
[02:01:47] Timered checkpoint triggered.
[02:09:52] Writing local files
[02:10:18] Completed 125000 out of 2500000 steps  (5 percent)
[02:40:45] Timered checkpoint triggered.
[02:48:55] Writing local files
[02:49:21] Completed 150000 out of 2500000 steps  (6 percent)
[03:19:47] Timered checkpoint triggered.
[03:27:51] Writing local files
[03:28:16] Completed 175000 out of 2500000 steps  (7 percent)
[03:58:43] Timered checkpoint triggered.
[04:08:42] Writing local files
[04:09:07] Completed 200000 out of 2500000 steps  (8 percent)
[04:39:34] Timered checkpoint triggered.
[04:47:43] Writing local files
[04:48:09] Completed 225000 out of 2500000 steps  (9 percent)
[05:10:53] - Autosending finished units... [February 25 05:10:53 UTC]
[05:10:53] Trying to send all finished work units
[05:10:53] + No unsent completed units remaining.
[05:10:53] - Autosend completed
[05:18:38] Timered checkpoint triggered.
[05:27:43] Writing local files
[05:28:09] Completed 250000 out of 2500000 steps  (10 percent)
[05:58:35] Timered checkpoint triggered.
[06:12:04] Writing local files
[06:12:30] Completed 275000 out of 2500000 steps  (11 percent)
[06:42:56] Timered checkpoint triggered.
[06:52:07] Writing local files
[06:52:33] Completed 300000 out of 2500000 steps  (12 percent)
[07:22:58] Timered checkpoint triggered.
[07:32:06] Writing local files
[07:32:31] Completed 325000 out of 2500000 steps  (13 percent)
[08:02:56] Timered checkpoint triggered.
[08:12:00] Writing local files
[08:12:26] Completed 350000 out of 2500000 steps  (14 percent)
[08:42:50] Timered checkpoint triggered.
[08:51:56] Writing local files
[08:52:21] Completed 375000 out of 2500000 steps  (15 percent)
[09:22:46] Timered checkpoint triggered.
[09:32:03] Writing local files
[09:32:28] Completed 400000 out of 2500000 steps  (16 percent)
[10:02:54] Timered checkpoint triggered.
[10:12:32] Writing local files
[10:12:58] Completed 425000 out of 2500000 steps  (17 percent)
[10:43:22] Timered checkpoint triggered.
[10:53:15] Writing local files
[10:53:41] Completed 450000 out of 2500000 steps  (18 percent)
[10:59:14] Quit 101 - NaN detected: (ener[20])
[10:59:14] 
[10:59:14] Simulation instability has been encountered. The run has entered a
[10:59:14]   state from which no further progress can be made.
[10:59:14] This may be the correct result of the simulation, however if you
[10:59:14]   often see other project units terminating early like this
[10:59:14]   too, you may wish to check the stability of your computer (issues
[10:59:14]   such as high temperature, overclocking, etc.).
[10:59:14] Going to send back what have done.
[10:59:14] logfile size: 28600
[10:59:14] - Writing 29150 bytes of core data to disk...
[10:59:15]   ... Done.
[10:59:17] - Failed to delete work/wudata_00.xtc
[10:59:53] No C.P. to delete.
[11:00:40] Warning:  check for stray files
[11:02:53] 
[11:02:53] Folding@home Core Shutdown: EARLY_UNIT_END
[11:02:53] 
[11:02:53] Folding@home Core Shutdown: EARLY_UNIT_END
[11:03:04] CoreStatus = 63 (99)

Re: Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Wed Feb 25, 2009 6:40 pm
by bruce
Flathead74 wrote: ...another case of the old "Error: Could not read unit 05 file. Removing from queue.", striking again. :lol:
Presumably you don't have an overenthusiastic anti-virus program that has permission to prune the FAH work directory.

Re: Project: 3065 (Run 0, Clone 156, Gen 20)

Posted: Wed Feb 25, 2009 7:40 pm
by Flathead74
bruce wrote:
Flathead74 wrote: ...another case of the old "Error: Could not read unit 05 file. Removing from queue.", striking again. :lol:
Presumably you don't have an overenthusiastic anti-virus program that has permission to prune the FAH work directory.
You would be presuming correctly... it is a dedicated system with no a/v program.

I really suspect that this WU has been assigned many times, considering the Run number.

For example, our team has records of one of our mates processing Project: 3065 (Run 0, Clone 15, Gen 0) on 1/24/2008.
As you can see, that's over a year ago.
It seems to me that a Run 0 WU would more likely have already been completed by now, if it were good.

Client is 6.24beta_alt