Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Tue Sep 28, 2010 1:46 pm
by todh
I've got a machine that's been stuck on Project: 6509 (Run 15, Clone 232, Gen 14) for a while now:

Code:

[02:42:51] Writing local files
[02:42:51] Completed 15000 out of 250000 steps  (6%)
[02:57:52] Timered checkpoint triggered.
[03:09:04] CoreStatus = 0 (0)
[03:09:04] Sending work to server
[03:09:04] Project: 6509 (Run 15, Clone 232, Gen 14)
[03:09:04] - Error: Could not get length of results file work/wuresults_09.dat
[03:09:04] - Error: Could not read unit 09 file. Removing from queue.
[03:09:04] Trying to send all finished work units
[03:09:04] + No unsent completed units remaining.
[03:09:04] - Preparing to get new work unit...
[03:09:04] Cleaning up work directory
--
[05:54:47] Writing local files
[05:54:47] Completed 15000 out of 250000 steps  (6%)
[06:09:47] Timered checkpoint triggered.
[06:21:01] CoreStatus = 0 (0)
[06:21:01] Sending work to server
[06:21:01] Project: 6509 (Run 15, Clone 232, Gen 14)
[06:21:01] - Error: Could not get length of results file work/wuresults_00.dat
[06:21:01] - Error: Could not read unit 00 file. Removing from queue.
[06:21:01] Trying to send all finished work units
[06:21:01] + No unsent completed units remaining.
[06:21:01] - Preparing to get new work unit...
[06:21:01] Cleaning up work directory
--
[09:06:59] Writing local files
[09:06:59] Completed 15000 out of 250000 steps  (6%)
[09:21:59] Timered checkpoint triggered.
[09:33:14] CoreStatus = 0 (0)
[09:33:14] Sending work to server
[09:33:14] Project: 6509 (Run 15, Clone 232, Gen 14)
[09:33:14] - Error: Could not get length of results file work/wuresults_01.dat
[09:33:14] - Error: Could not read unit 01 file. Removing from queue.
[09:33:14] Trying to send all finished work units
[09:33:14] + No unsent completed units remaining.
[09:33:14] - Preparing to get new work unit...
[09:33:14] Cleaning up work directory
--
[12:18:42] Writing local files
[12:18:42] Completed 15000 out of 250000 steps  (6%)
[12:33:42] Timered checkpoint triggered.
[12:44:53] CoreStatus = 0 (0)
[12:44:53] Sending work to server
[12:44:53] Project: 6509 (Run 15, Clone 232, Gen 14)
[12:44:53] - Error: Could not get length of results file work/wuresults_02.dat
[12:44:53] - Error: Could not read unit 02 file. Removing from queue.
[12:44:53] Trying to send all finished work units
[12:44:53] + No unsent completed units remaining.
[12:44:53] - Preparing to get new work unit...
[12:44:53] Cleaning up work directory
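
To confirm that it is the same work unit failing every time, the log can be scanned for the "Could not get length of results file" error and tallied per Project/Run/Clone/Gen. This is only a rough sketch: the function name and the regular expression are my own, written against the line formats visible in the excerpt above, and not anything the client provides.

```python
import re
from collections import Counter

# Matches the work-unit identifier line as it appears in the log, e.g.
# "[03:09:04] Project: 6509 (Run 15, Clone 232, Gen 14)"
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FAILURE = "Could not get length of results file"


def count_failed_uploads(log_lines):
    """Count results-file errors per (project, run, clone, gen) tuple."""
    failures = Counter()
    current = None  # most recent work unit named in the log
    for line in log_lines:
        match = PRCG.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
        elif FAILURE in line and current is not None:
            failures[current] += 1
    return failures
```

Passing it the open log file (any iterable of lines works) gives a count per work unit; a count above one for a single tuple means the client keeps being handed the same unit and keeps failing on it.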

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Wed Sep 29, 2010 4:22 am
by todh
Any word on this one? It's still failing at the same place. Should I just delete it?

Code:

[01:05:43] Completed 15000 out of 250000 steps  (6%)
[01:20:43] Timered checkpoint triggered.
[01:31:54] CoreStatus = 0 (0)
[01:31:54] Sending work to server
[01:31:54] Project: 6509 (Run 15, Clone 232, Gen 14)
[01:31:54] - Error: Could not get length of results file work/wuresults_06.dat
[01:31:54] - Error: Could not read unit 06 file. Removing from queue.
[01:31:54] Trying to send all finished work units
[01:31:54] + No unsent completed units remaining.
[01:31:54] - Preparing to get new work unit...
[01:31:54] Cleaning up work directory
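
"The same place" can be pinned down a little further: in every excerpt posted so far, the core exits with CoreStatus = 0 roughly 11 minutes after the last timed checkpoint at 6%. A small sketch for pulling that delay out of a log follows. The function name is made up, and it assumes each relevant line starts with an `[HH:MM:SS]` stamp as in the excerpts here.

```python
import re
from datetime import timedelta

STAMP = re.compile(r"\[(\d\d):(\d\d):(\d\d)\]")


def checkpoint_to_exit_gaps(log_lines):
    """Return the delay between each timed checkpoint and the next CoreStatus line."""
    gaps = []
    last_checkpoint = None
    for line in log_lines:
        match = STAMP.match(line)
        if not match:
            continue  # separators, banners and other unstamped lines
        hours, minutes, seconds = (int(part) for part in match.groups())
        stamp = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if "Timered checkpoint triggered" in line:
            last_checkpoint = stamp
        elif "CoreStatus" in line and last_checkpoint is not None:
            gap = stamp - last_checkpoint
            if gap < timedelta(0):
                gap += timedelta(days=1)  # the log stamps carry no date, so allow for midnight
            gaps.append(gap)
            last_checkpoint = None
    return gaps
```

If the gaps come out nearly identical from one attempt to the next, that suggests the simulation is dying at the same step each time, which points at the work unit more than at the machine.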

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Wed Sep 29, 2010 5:09 am
by John_Weatherman
todh wrote: Any word on this one? It's still failing at the same place. Should I just delete it?
Yep. You might have to delete the queue.dat too to get a new WU.

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Wed Sep 29, 2010 12:54 pm
by todh
John_Weatherman wrote: Yep. You might have to delete the queue.dat too to get a new WU.
Mmm, apparently that's not enough; it came back:

Code:

[12:26:56] Loaded queue successfully.
[12:26:56] Printing Queue Information
Current Queue: 
Slot 01  Empty/Deleted

Slot 02  Empty/Deleted

Slot 03  Empty/Deleted

Slot 04  Empty/Deleted

Slot 05  Empty/Deleted

Slot 06  Empty/Deleted

Slot 07  Empty/Deleted

Slot 08  Empty/Deleted

Slot 09  Empty/Deleted

Slot 00 *Empty/Deleted

PF: 0.000000 based on last 0 slot(s)

Folding@Home Client Shutdown.


--- Opening Log file [September 29 12:29:01 UTC] 


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/fah/hitchcock
Executable: /home/fah/bin/fah6
Arguments: -verbosity 9 

[12:29:01] - Ask before connecting: No
[12:29:01] - User name: bondcliff (Team 163)
[12:29:01] - User ID: 5B39A2904F168873
[12:29:01] - Machine ID: 4
[12:29:01] 
[12:29:01] Loaded queue successfully.
[12:29:01] - Preparing to get new work unit...
[12:29:01] Cleaning up work directory
[12:29:01] - Autosending finished units... [September 29 12:29:01 UTC]
[12:29:01] Trying to send all finished work units
[12:29:01] + No unsent completed units remaining.
[12:29:01] - Autosend completed
[12:29:01] + Attempting to get work packet
[12:29:01] - Will indicate memory of 983 MB
[12:29:01] - Connecting to assignment server
[12:29:01] Connecting to http://assign.stanford.edu:8080/
[12:29:01] Posted data.
[12:29:01] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[12:29:01] + News From Folding@Home: Welcome to Folding@Home
[12:29:02] Loaded queue successfully.
[12:29:02] Connecting to http://171.64.65.62:8080/
[12:29:03] Posted data.
[12:29:03] Initial: 0000; - Receiving payload (expected size: 997268)
[12:29:04] - Downloaded at ~973 kB/s
[12:29:04] - Averaged speed for that direction ~973 kB/s
[12:29:04] + Received work.
[12:29:04] + Closed connections
[12:29:04] 
[12:29:04] + Processing work unit
[12:29:04] Core required: FahCore_78.exe
[12:29:04] Core found.
[12:29:04] Working on queue slot 01 [September 29 12:29:04 UTC]
[12:29:04] + Working ...
[12:29:04] - Calling './FahCore_78.exe -dir work/ -nice 19 -suffix 01 -np 0 -checkpoint 15 -verbose -lifeline 3805 -version 629'

[12:29:04] 
[12:29:04] *------------------------------*
[12:29:04] Folding@Home Gromacs Core
[12:29:04] Version 1.90 (March 8, 2006)
[12:29:04] 
[12:29:04] Preparing to commence simulation
[12:29:04] - Looking at optimizations...
[12:29:04] - Created dyn
[12:29:04] - Files status OK
[12:29:05] - Expanded 996756 -> 5048061 (decompressed 506.4 percent)
[12:29:05] - Starting from initial work packet
[12:29:05] 
[12:29:05] Project: 6509 (Run 15, Clone 232, Gen 14)
[12:29:05] 
[12:29:05] Assembly optimizations on if available.
[12:29:05] Entering M.D.
[12:29:11] Protein: TR574_16 in water
[12:29:11] 
[12:29:11] Writing local files
[12:29:11] Extra SSE boost OK.
[12:29:12] Writing local files
[12:29:12] Completed 0 out of 250000 steps  (0%)
[12:44:13] Timered checkpoint triggered.
I've shut down the client for the time being.

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Wed Sep 29, 2010 2:04 pm
by John_Weatherman
OK, try deleting the work folder. If that doesn't work, try changing the machine ID. Hopefully one of those will do the trick.
In the meantime, hopefully the Mods can pass on to Stanford that this is a duff WU.

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Wed Sep 29, 2010 6:47 pm
by todh
John_Weatherman wrote: OK, try deleting the work folder. If that doesn't work, try changing the machine ID. Hopefully one of those will do the trick.
In the meantime, hopefully the Mods can pass on to Stanford that this is a duff WU.
Wow, it took changing the machine ID to get a new WU; deleting the work folder and queue.dat wasn't enough. Thanks.
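
For anyone else who ends up here, the first two steps can be sketched roughly as below. A few assumptions to flag: the client is already shut down, queue.dat sits next to the work folder in the launch directory (/home/fah/hitchcock in the log above), and the function name and backup folder name are made up for this example. It moves the files aside instead of deleting them outright, which is my own precaution and not something suggested in this thread. It does not touch the machine ID, which is the step that actually worked here; that has to be changed in the client's own configuration.

```python
import shutil
import time
from pathlib import Path


def set_aside_client_state(client_dir, backup_root=None):
    """Move queue.dat and work/ out of client_dir.

    Returns (backup_path, names_moved). Run this only while the client is stopped.
    """
    client_dir = Path(client_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical backup location; defaults to a dated folder inside client_dir.
    backup = Path(backup_root or client_dir) / f"fah-backup-{stamp}"
    backup.mkdir(parents=True)
    moved = []
    for name in ("queue.dat", "work"):
        source = client_dir / name
        if source.exists():
            shutil.move(str(source), str(backup / name))
            moved.append(name)
    return backup, moved
```

Calling `set_aside_client_state("/home/fah/hitchcock")` and then restarting the client should leave it with an empty queue. As seen above, the same WU may still be reassigned until the machine ID changes.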

Re: Project: 6509 (Run 15, Clone 232, Gen 14)

Posted: Fri Oct 15, 2010 11:12 pm
by sortofageek
For the record, Project: 6509 (Run 15, Clone 232, Gen 14) was added to the stats database on 2010-10-09 07:08:18 for 192 points of credit.