
Picking up WUs already started? *SOLVED*

Posted: Wed Mar 02, 2011 4:05 am
by Full_Taoer
Of the last 4 WUs my laptop has picked up, 3 of them started at a step other than 0. One started at 89%, another at 13%, and the latest one at 51%.

Are these units that EUE'd for someone else and I'm picking up where they left off, or is this an error in its own right? They are getting credited as if I completed the whole WU myself when submitted.

Re: Picking up WUs already started?

Posted: Wed Mar 02, 2011 5:02 am
by PantherX
Whenever a WU is assigned, it always starts from 0%, so I believe that you restarted the client and it resumed from the checkpoint. However, if in doubt, please post the FAHlog for further analysis.

Re: Picking up WUs already started?

Posted: Fri Mar 04, 2011 5:28 am
by Full_Taoer
There are two examples in the log below, one with the old client and one with the new. The first unit finishes and the client shuts down due to the -oneunit flag. My intention was to upgrade to the v6.34 client. I came home to find the laptop screen frozen. On restart, the client is set as a service and it picks up a new WU, which starts at 51%.

I stop it, give the -oneunit flag again and send it on its way. The WU finishes and the client shuts down. I upgrade the client from v6.30 to v6.34. Run -configonly and then it shuts down. On restarting the new client, it picks up a WU and starts at 13%.


[08:18:14] Completed 485000 out of 500000 steps  (97%)
[08:34:50] Completed 490000 out of 500000 steps  (98%)
[08:51:30] Completed 495000 out of 500000 steps  (99%)
[09:08:09] Completed 500000 out of 500000 steps  (100%)
[09:08:11] DynamicWrapper: Finished Work Unit: sleep=10000
[09:08:21] 
[09:08:21] Finished Work Unit:
[09:08:21] - Reading up to 3702144 from "work/wudata_00.trr": Read 3702144
[09:08:21] trr file hash check passed.
[09:08:21] edr file hash check passed.
[09:08:21] logfile size: 70836
[09:08:21] Leaving Run
[09:08:24] - Writing 3808532 bytes of core data to disk...
[09:08:24]   ... Done.
[09:08:28] - Shutting down core
[09:08:28] 
[09:08:28] Folding@home Core Shutdown: FINISHED_UNIT
[09:08:32] CoreStatus = 64 (100)
[09:08:32] Sending work to server
[09:08:32] Project: 6053 (Run 1, Clone 109, Gen 161)
[09:08:32] + Attempting to send results [February 28 09:08:32 UTC]


[09:09:41] + Results successfully sent
[09:09:41] Thank you for your contribution to Folding@Home.
[09:09:41] + Number of Units Completed: 107

[09:09:46] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.


--- Opening Log file [February 28 23:34:21 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH
Service: C:\FAH\fah6
Arguments: -svcstart -d C:\FAH -smp 2 

Launched as a service.
Entered C:\FAH to do work.

[23:34:21] - Ask before connecting: No
[23:34:21] - User name: Full_Taoer (Team 11108)
[23:34:21] - User ID: 4E0AB6AD04C4F0F2
[23:34:21] - Machine ID: 1
[23:34:21] 
[23:34:21] Loaded queue successfully.
[23:34:21] 
[23:34:21] + Processing work unit
[23:34:21] Core required: FahCore_a3.exe
[23:34:21] Core found.
[23:34:21] Working on queue slot 01 [February 28 23:34:21 UTC]
[23:34:21] + Working ...
[23:34:22] 
[23:34:22] *------------------------------*
[23:34:22] Folding@Home Gromacs SMP Core
[23:34:22] Version 2.22 (Mar 12, 2010)
[23:34:22] 
[23:34:22] Preparing to commence simulation
[23:34:22] - Ensuring status. Please wait.
[23:34:32] - Looking at optimizations...
[23:34:32] - Working with standard loops on this execution.
[23:34:32] - Previous termination of core was improper.
[23:34:32] - Files status OK
[23:34:32] - Expanded 1766781 -> 2253305 (decompressed 127.5 percent)
[23:34:32] Called DecompressByteArray: compressed_data_size=1766781 data_size=2253305, decompressed_data_size=2253305 diff=0
[23:34:32] - Digital signature verified
[23:34:32] 
[23:34:32] Project: 6056 (Run 1, Clone 189, Gen 140)
[23:34:32] 
[23:34:32] Entering M.D.
[23:34:38] Using Gromacs checkpoints
[23:34:39] Resuming from checkpoint
[23:34:40] Verified work/wudata_01.log
[23:34:40] Verified work/wudata_01.trr
[23:34:40] Verified work/wudata_01.edr
[23:34:40] Completed 256662 out of 500000 steps  (51%)
[23:38:58] Service stop request received.

Folding@Home Client Shutdown.


--- Opening Log file [March 1 00:09:08 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH
Service: C:\FAH\fah6
Arguments: -svcstart -d C:\FAH -oneunit -smp 2 

Launched as a service.
Entered C:\FAH to do work.

[00:09:08] - Ask before connecting: No
[00:09:08] - User name: Full_Taoer (Team 11108) 
[00:09:08] - User ID: 4E0AB6AD04C4F0F2
[00:09:08] - Machine ID: 1
[00:09:08] 
[00:09:09] Loaded queue successfully.
[00:09:09] 
[00:09:09] + Processing work unit
[00:09:09] Core required: FahCore_a3.exe
[00:09:09] Core found.
[00:09:09] Working on queue slot 01 [March 1 00:09:09 UTC]
[00:09:09] + Working ...
[00:09:09] 
[00:09:09] *------------------------------*
[00:09:09] Folding@Home Gromacs SMP Core
[00:09:09] Version 2.22 (Mar 12, 2010)
[00:09:09] 
[00:09:09] Preparing to commence simulation
[00:09:09] - Looking at optimizations...
[00:09:09] - Files status OK
[00:09:09] - Expanded 1766781 -> 2253305 (decompressed 127.5 percent)
[00:09:09] Called DecompressByteArray: compressed_data_size=1766781 data_size=2253305, decompressed_data_size=2253305 diff=0
[00:09:09] - Digital signature verified
[00:09:09] 
[00:09:09] Project: 6056 (Run 1, Clone 189, Gen 140)
[00:09:09] 
[00:09:09] Assembly optimizations on if available.
[00:09:09] Entering M.D.
[00:09:15] Using Gromacs checkpoints
[00:09:16] Resuming from checkpoint
[00:09:16] Verified work/wudata_01.log
[00:09:16] Verified work/wudata_01.trr
[00:09:16] Verified work/wudata_01.edr
[00:09:17] Completed 257652 out of 500000 steps  (51%)
[00:17:14] Completed 260000 out of 500000 steps  (52%)
[00:35:54] Completed 265000 out of 500000 steps  (53%)
[00:54:41] Completed 270000 out of 500000 steps  (54%)
[01:11:26] Completed 275000 out of 500000 steps  (55%)
[01:28:04] Completed 280000 out of 500000 steps  (56%)
[01:44:40] Completed 285000 out of 500000 steps  (57%)
[02:01:19] Completed 290000 out of 500000 steps  (58%)
[02:17:55] Completed 295000 out of 500000 steps  (59%)
[02:34:31] Completed 300000 out of 500000 steps  (60%)
[02:51:24] Completed 305000 out of 500000 steps  (61%)
[03:08:07] Completed 310000 out of 500000 steps  (62%)
[03:24:44] Completed 315000 out of 500000 steps  (63%)
[03:41:24] Completed 320000 out of 500000 steps  (64%)
[03:58:10] Completed 325000 out of 500000 steps  (65%)
[04:14:47] Completed 330000 out of 500000 steps  (66%)
[04:31:26] Completed 335000 out of 500000 steps  (67%)
[04:48:06] Completed 340000 out of 500000 steps  (68%)
[05:04:50] Completed 345000 out of 500000 steps  (69%)
[05:21:30] Completed 350000 out of 500000 steps  (70%)
[05:39:24] Completed 355000 out of 500000 steps  (71%)
[05:56:13] Completed 360000 out of 500000 steps  (72%)
[06:13:06] Completed 365000 out of 500000 steps  (73%)
[06:29:54] Completed 370000 out of 500000 steps  (74%)
[06:48:26] Completed 375000 out of 500000 steps  (75%)
[07:05:09] Completed 380000 out of 500000 steps  (76%)
[07:21:51] Completed 385000 out of 500000 steps  (77%)
[07:38:34] Completed 390000 out of 500000 steps  (78%)
[07:55:15] Completed 395000 out of 500000 steps  (79%)
[08:15:38] Completed 400000 out of 500000 steps  (80%)
[08:32:22] Completed 405000 out of 500000 steps  (81%)
[08:49:11] Completed 410000 out of 500000 steps  (82%)
[09:06:56] Completed 415000 out of 500000 steps  (83%)
[09:23:37] Completed 420000 out of 500000 steps  (84%)
[09:40:16] Completed 425000 out of 500000 steps  (85%)
[09:56:54] Completed 430000 out of 500000 steps  (86%)
[10:13:32] Completed 435000 out of 500000 steps  (87%)
[10:30:08] Completed 440000 out of 500000 steps  (88%)
[10:46:52] Completed 445000 out of 500000 steps  (89%)
[11:03:33] Completed 450000 out of 500000 steps  (90%)
[11:20:10] Completed 455000 out of 500000 steps  (91%)
[11:36:47] Completed 460000 out of 500000 steps  (92%)
[11:53:23] Completed 465000 out of 500000 steps  (93%)
[12:10:03] Completed 470000 out of 500000 steps  (94%)
[12:26:41] Completed 475000 out of 500000 steps  (95%)
[12:43:21] Completed 480000 out of 500000 steps  (96%)
[13:00:07] Completed 485000 out of 500000 steps  (97%)
[13:16:45] Completed 490000 out of 500000 steps  (98%)
[13:33:24] Completed 495000 out of 500000 steps  (99%)
[13:50:04] Completed 500000 out of 500000 steps  (100%)
[13:50:05] DynamicWrapper: Finished Work Unit: sleep=10000
[13:50:15] 
[13:50:15] Finished Work Unit:
[13:50:15] - Reading up to 3704976 from "work/wudata_01.trr": Read 3704976
[13:50:15] trr file hash check passed.
[13:50:15] edr file hash check passed.
[13:50:15] logfile size: 71408
[13:50:15] Leaving Run
[13:50:17] - Writing 3811936 bytes of core data to disk...
[13:50:17]   ... Done.
[13:50:20] - Shutting down core
[13:50:20] 
[13:50:20] Folding@home Core Shutdown: FINISHED_UNIT
[13:50:24] CoreStatus = 64 (100)
[13:50:24] Sending work to server
[13:50:24] Project: 6056 (Run 1, Clone 189, Gen 140)


[13:50:24] + Attempting to send results [March 1 13:50:24 UTC]
[13:51:33] + Results successfully sent
[13:51:33] Thank you for your contribution to Folding@Home.
[13:51:33] + Number of Units Completed: 108

[13:51:39] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.


--- Opening Log file [March 1 17:41:01 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH
Executable: fah6
Arguments: -configonly -smp 2 

[17:41:01] - Ask before connecting: No
[17:41:01] - User name: Full_Taoer (Team 11108)
[17:41:01] - User ID: 4E0AB6AD04C4F0F2
[17:41:01] - Machine ID: 1
[17:41:01] 
[17:41:01] Configuring Folding@Home...


[17:42:55] - Ask before connecting: No
[17:42:55] - User name: Full_Taoer (Team 11108)
[17:42:55] - User ID: 4E0AB6AD04C4F0F2
[17:42:55] - Machine ID: 1
[17:42:55] 
[17:42:55] -configonly flag given, so exiting.


--- Opening Log file [March 1 17:43:38 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH
Service: C:\FAH\fah6
Arguments: -svcstart -d C:\FAH -smp 

Launched as a service.
Entered C:\FAH to do work.

[17:43:38] - Ask before connecting: No
[17:43:38] - User name: Full_Taoer (Team 11108)
[17:43:38] - User ID: 4E0AB6AD04C4F0F2
[17:43:38] - Machine ID: 1
[17:43:38] 
[17:43:39] Loaded queue successfully.
[17:43:39] 
[17:43:39] + Processing work unit
[17:43:39] Core required: FahCore_a3.exe
[17:43:39] Core found.
[17:43:39] Working on queue slot 02 [March 1 17:43:39 UTC]
[17:43:39] + Working ...
[17:43:39] 
[17:43:39] *------------------------------*
[17:43:39] Folding@Home Gromacs SMP Core
[17:43:39] Version 2.22 (Mar 12, 2010)
[17:43:39] 
[17:43:39] Preparing to commence simulation
[17:43:39] - Looking at optimizations...
[17:43:39] - Files status OK
[17:43:39] - Expanded 1765647 -> 2253729 (decompressed 127.6 percent)
[17:43:39] Called DecompressByteArray: compressed_data_size=1765647 data_size=2253729, decompressed_data_size=2253729 diff=0
[17:43:39] - Digital signature verified
[17:43:39] 
[17:43:39] Project: 6053 (Run 1, Clone 60, Gen 174)
[17:43:39] 
[17:43:39] Assembly optimizations on if available.
[17:43:39] Entering M.D.
[17:43:45] Using Gromacs checkpoints
[17:43:46] Resuming from checkpoint
[17:43:46] Verified work/wudata_02.log
[17:43:46] Verified work/wudata_02.trr
[17:43:46] Verified work/wudata_02.edr
[17:43:46] Completed 66662 out of 500000 steps  (13%)
[17:55:15] Completed 70000 out of 500000 steps  (14%)
[18:12:35] Completed 75000 out of 500000 steps  (15%)
[18:29:27] Completed 80000 out of 500000 steps  (16%)
It doesn't seem to cause any problems for me, but it seemed strange enough to report.

The latest WU started from 0%.
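For anyone who wants to check their own FAHlog for the same pattern, here is a minimal Python sketch. It relies only on the two line formats visible in the log above ("Resuming from checkpoint" and "Completed N out of M steps"); the function name `find_resumes` is made up for illustration and is not part of any FAH tool.

```python
import re

# Matches progress lines such as:
# [23:34:40] Completed 256662 out of 500000 steps  (51%)
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")

def find_resumes(log_lines):
    """Return (completed, total) from the first progress line after each resume."""
    resumes = []
    awaiting_progress = False
    for line in log_lines:
        if "Resuming from checkpoint" in line:
            awaiting_progress = True
            continue
        match = STEP_RE.search(line)
        if match and awaiting_progress:
            resumes.append((int(match.group(1)), int(match.group(2))))
            awaiting_progress = False
    return resumes

sample = [
    "[23:34:39] Resuming from checkpoint",
    "[23:34:40] Verified work/wudata_01.log",
    "[23:34:40] Completed 256662 out of 500000 steps  (51%)",
    "[00:17:14] Completed 260000 out of 500000 steps  (52%)",
]
for completed, total in find_resumes(sample):
    print(f"Resumed at step {completed} of {total}")
```

A resume line followed by a non-zero step count means the WU had already been worked on locally, not that it arrived from the server partly done.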

Re: Picking up WUs already started?

Posted: Fri Mar 04, 2011 5:42 am
by bruce
The -oneunit and installing as a service are incompatible. It goes something like this:

FAH and Windows configure a service with the default option to always restart the service if it is terminated for any reason. You add the -oneunit flag, which causes the service to shut down when it finishes the current WU, but Windows promptly restarts it. Later you come back and discover that the new WU has already made some significant progress.

To use the -oneunit flag, you have to go into the Windows Service manager and stop the service (perhaps setting it to Manual, too). Then you restart the client interactively with the -oneunit flag. The interactive client finishes the current WU and shuts down properly. Later, whenever you're ready, you restart the service, either manually (if you set it to Manual) or perhaps by rebooting (if it's still set to Automatic).

Re: Picking up WUs already started? *SOLVED*

Posted: Sat Mar 05, 2011 12:16 am
by Full_Taoer
That is the method I used. From the Windows Services manager, I stopped the service, then added the -oneunit flag as a start parameter. When the WU finished, the client stopped as it was supposed to, according to the log. I dropped in the new binary and ran it with -configonly from the Command Line. I then restarted the service. It started that WU at 13%.

Perhaps after shutting down for the -oneunit command, it did as you suggest and restarted after a period of time. Maybe it was folding in some state that didn't log the activity. There was about a 3 hour span from the -oneunit shutdown to the time I restarted with the new client. The progress for that WU on my laptop in three hours would be close to 13%.
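As a rough check of that estimate, here is a short Python sketch using timestamps from the log I posted. It assumes the laptop folds at the same rate on the new WU as it did at the end of the previous one, and it ignores the time needed to download the new WU, so treat the result as a ballpark figure only.

```python
from datetime import datetime

def seconds_between(start, end):
    """Seconds from start to end (HH:MM:SS), assuming both are on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

TOTAL_STEPS = 500000

# Rate from two consecutive progress lines in the log (99% -> 100%).
secs_per_5000 = seconds_between("13:33:24", "13:50:04")

# Gap between the -oneunit exit and the restart with the v6.34 client.
gap = seconds_between("13:51:39", "17:43:38")

estimated_steps = gap / secs_per_5000 * 5000
estimated_percent = 100 * estimated_steps / TOTAL_STEPS
print(f"Estimated progress in the gap: {estimated_percent:.1f}%")
```

The log shows the WU resuming at step 66662 (13%), which is in the same range as this estimate.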

Thanks for the responses, Panther and Bruce. I'm going to call this solved.

Re: Picking up WUs already started?

Posted: Sat Mar 05, 2011 12:28 am
by bruce
If you rebooted, that's probably what started the service. In my experience (I have not tested it carefully) if there are no changes to the default service configuration settings, it can be stopped manually and will certainly restart after a reboot. I don't know if some other event triggers a restart. I do know that if you put -oneunit in the configuration and you start the service, the oneunit shutdown will trigger an automatic restart.

Note my instructions:
"Then you restart the client interactively with the -oneunit flag"
Now look at the log you posted:
Arguments: -svcstart -d C:\FAH -oneunit -smp 2
Your log shows that you restarted the client as a service, so you got an automatic restart.
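If you want to spot this combination in a log quickly, here is a small Python sketch. It treats the presence of -svcstart on the "Arguments:" line as the sign of a service launch, which is an assumption based only on the logs in this thread; the function name `restart_risk` is hypothetical.

```python
def restart_risk(arguments_line):
    """True if the client was launched as a service with -oneunit set."""
    flags = arguments_line.split()
    return "-svcstart" in flags and "-oneunit" in flags

# Argument lines copied from the posted log.
print(restart_risk("Arguments: -svcstart -d C:\\FAH -oneunit -smp 2 "))  # True
print(restart_risk("Arguments: -svcstart -d C:\\FAH -smp 2 "))           # False
```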