
70XX project didn't support checkpoint parameter

Posted: Fri Aug 17, 2012 1:52 am
by vmzy
I have set the checkpoint interval to 3 minutes.

Code:

<!-- FahCore Control -->
  <checkpoint v='3'/>

  <!-- Folding Slot Configuration -->
  <extra-core-args v='-forceasm'/>
But 70xx projects seem to checkpoint only at each percent and do not use the checkpoint interval setting.

Code:

15:43:51:WU01:FS00:0xa4:Completed 2200000 out of 10000000 steps  (22%)
15:53:23:Server connection id=1 ended
15:53:25:Lost lifeline PID 2148, exiting
15:53:26:FS00:Shutting core down
15:53:36:WU01:FS00:0xa4:Client no longer detected. Shutting down core 
15:53:36:WU01:FS00:0xa4:
15:53:36:WU01:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
15:53:36:Clean exit
You can see from the shutdown log that I quit v7 about 10 minutes after the last percent was reached. It should have checkpointed 3 times by then.
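
The arithmetic behind that expectation can be sketched in Python. This is only an illustration: the function name is made up, and it assumes both timestamps fall on the same day and that a timed checkpoint would be written once per interval, counted from the last percent mark.

```python
from datetime import datetime

def expected_timed_checkpoints(last_progress: str, shutdown: str, interval_min: int) -> int:
    """Count how many full checkpoint intervals fit between two log timestamps.

    Assumption (not confirmed by the client docs): the timer starts at the
    last percent mark and both timestamps are on the same day.
    """
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(shutdown, fmt) - datetime.strptime(last_progress, fmt)
    return int(elapsed.total_seconds() // (interval_min * 60))

# Timestamps taken from the shutdown log above: 22% mark and "Lost lifeline".
print(expected_timed_checkpoints("15:43:51", "15:53:25", 3))  # 3
```

With the default 15-minute interval the same gap would give 0, so no timed checkpoint would be expected at all.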

Code:

01:08:43:WU01:FS00:Starting
01:08:43:WU01:FS00:Running FahCore: "D:\Program Files\FAHClient/FAHCoreWrapper.exe" D:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 701 -lifeline 2520 -checkpoint 3 -np 4 -forceasm
01:08:43:WU01:FS00:Started FahCore on PID 2756
01:08:43:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
01:08:43:WU01:FS00:Core PID:2772
01:08:43:WU01:FS00:FahCore 0xa4 started
01:08:43:WU01:FS00:0xa4:
01:08:43:WU01:FS00:0xa4:*------------------------------*
01:08:43:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
01:08:44:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Preparing to commence simulation
01:08:44:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
01:08:44:WU01:FS00:0xa4:- Not checking prior termination.
01:08:44:WU01:FS00:0xa4:- Expanded 49744 -> 192928 (decompressed 387.8 percent)
01:08:44:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=49744 data_size=192928, decompressed_data_size=192928 diff=0
01:08:44:WU01:FS00:0xa4:- Digital signature verified
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Project: 7027 (Run 1, Clone 448, Gen 14)
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Assembly optimizations on if available.
01:08:44:WU01:FS00:0xa4:Entering M.D.
01:08:49:WU01:FS00:0xa4:Using Gromacs checkpoints
01:08:50:WU01:FS00:0xa4:Mapping NT from 4 to 4 
01:08:50:WU01:FS00:0xa4:Resuming from checkpoint
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.log
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.trr
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.xtc
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.edr
01:08:50:WU01:FS00:0xa4:Completed 2200001 out of 10000000 steps  (22%)
But when I restarted v7, it just continued from the percent checkpoint, wasting 10 minutes of calculation. The checkpoint interval setting had not taken effect. Please check it.


Furthermore, 80xx projects do not seem to support the next-unit-percentage parameter. The client starts downloading a new WU when the current one is 100% finished rather than at 99%.

Re: 70XX project didn't support checkpoint parameter

Posted: Fri Aug 17, 2012 7:32 pm
by compdewd
vmzy wrote: Furthermore, 80xx projects do not seem to support the next-unit-percentage parameter. The client starts downloading a new WU when the current one is 100% finished rather than at 99%.
I do not think this is project specific; I have seen it happen too. Think of it this way: if next-unit-percentage is set to 99, the next work unit download starts once the 99% mark is complete. If you want the next work unit to be downloaded once the unit reaches 99%, you must set next-unit-percentage to 98. I also thought next-unit-percentage was not working when I first saw this, but after changing the parameter to a different value, I saw how it worked.
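
One way to picture the behaviour described here is a strict "greater than" comparison against whole-percent progress reports. The sketch below is only a hypothetical model of what was observed, not the client's actual code, and the function name is invented for illustration.

```python
from typing import Optional

def first_trigger_percent(next_unit_percentage: int) -> Optional[int]:
    """Return the first whole-percent report at which a download would start.

    Hypothetical model: the download starts only when the reported percent
    is strictly greater than the setting.
    """
    for reported in range(101):  # progress reports 0% .. 100%
        if reported > next_unit_percentage:
            return reported
    return None  # a setting of 100 or more would never trigger in this model

print(first_trigger_percent(99))  # 100
print(first_trigger_percent(98))  # 99
```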

Re: 70XX project didn't support checkpoint parameter

Posted: Fri Aug 17, 2012 7:55 pm
by 7im
Different fahcores handle the checkpoint request differently. Some fahcores save a checkpoint at every percent, so they don't save any extra checkpoints on a timed basis. Others work very much in line with timed checkpoints. YMMV.
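
To make the difference concrete, here is a rough Python sketch of how much work would be lost on an unexpected shutdown under each behaviour. The function and parameter names are made up, and it assumes a timed core restarts its timer at each percent mark, which may not match any particular fahcore.

```python
from typing import Optional

def lost_seconds(since_last_percent_s: int, timed_interval_s: Optional[int]) -> int:
    """Estimate the work lost when the core is killed.

    timed_interval_s=None models a core that checkpoints only at each percent.
    Otherwise the core is assumed to also checkpoint every timed_interval_s
    seconds, counted from the last percent mark.
    """
    if timed_interval_s is None:
        return since_last_percent_s
    return since_last_percent_s % timed_interval_s

# 574 s is the gap between the 22% mark and the shutdown in the log above.
print(lost_seconds(574, None))  # 574
print(lost_seconds(574, 180))   # 34
```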

As to the 99% vs. 100%, that's a similar fahcore-related issue with V7, and it has been documented.

Re: 70XX project didn't support checkpoint parameter

Posted: Fri Aug 17, 2012 8:53 pm
by Nathan_P
There's no need to have your checkpoints set so low. If your machine crashes or loses power whilst writing the checkpoint, the WU will be corrupted and lost. 15 minutes is the default, and I run mine at 30.

Regarding next-unit-percentage: I have noticed that 80xx projects are so quick that, if you set it to 99, the current WU will be finished before the download is finished. Nothing is wrong; it's just normal. On a 764x project you will have a good 10 minutes to finish the download before you hit 100%.
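
The time available for the download scales with the total run time of the WU, which a short Python sketch can show. The WU durations below are hypothetical, picked only so that the longer one leaves the 10 minutes mentioned above; the function name is also made up.

```python
def minutes_left(total_wu_minutes: float, next_unit_percentage: int) -> float:
    """Minutes of computation remaining once the given percent is reached,
    assuming progress is roughly linear over the WU."""
    return total_wu_minutes * (100 - next_unit_percentage) / 100

# Hypothetical long WU (1000 minutes total): 10 minutes left at 99%.
print(minutes_left(1000, 99))
# Hypothetical short WU (30 minutes total): well under a minute left at 99%.
print(minutes_left(30, 99))
```

On a short WU, lowering next-unit-percentage gives the download more of a head start.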