Feature Request: Pause at next checkpoint

Moderators: Site Moderators, FAHC Science Team

Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

Hello Folding Team,

I use the pause feature a lot. Sometimes it's because I want to do something else on my PC that requires cycles, sometimes because I just want this corner of my basement to cool down. Either way, I would like a better way to pause so that X amount of work isn't lost and has to be redone. Currently I have checkpoints (CPs) set to every five minutes, but even 4:59 worth of wasted work seems unnecessary to me.

I would like another button to be added: "Pause at next checkpoint". It would apply to every folding slot, pausing each one at its own next checkpoint.

This way I could click the button and futz around until everything is paused, and then do my thing.
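To make the requested behaviour concrete, here is a minimal sketch of such a button. It assumes a hypothetical per-slot hook, `on_checkpoint_written()`, that fires right after a core saves its state. The FAHClient is not known to expose such a hook, and `Slot` and `PauseAtNextCheckpoint` are illustrative names, not part of any Folding@home API.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a folding slot (not the FAHClient API)."""
    name: str
    running: bool = True
    pause_requested: bool = False

    def on_checkpoint_written(self) -> None:
        # Hypothetical hook, assumed to fire right after the core saves its
        # state. Pausing at this moment loses (almost) no work.
        if self.pause_requested:
            self.running = False
            self.pause_requested = False


class PauseAtNextCheckpoint:
    """Sketch of the requested button: flag every running slot, then let
    each slot pause itself independently at its own next checkpoint."""

    def __init__(self, slots):
        self.slots = list(slots)

    def request(self) -> None:
        for slot in self.slots:
            if slot.running:
                slot.pause_requested = True

    def all_paused(self) -> bool:
        # True once every slot has reached a checkpoint and stopped.
        return all(not slot.running for slot in self.slots)
```

Each slot stops on its own schedule, so the CPU and GPU slots can pause minutes apart; `all_paused()` is the "futz around until everything is paused" check.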

Also there's a whole "my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

Agreed, I would like this too.
Frogging101
Posts: 78
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Re: Feature Request: Pause at next checkpoint

Post by Frogging101 »

I think it checkpoints when you pause it. The UI will go back to the last whole percentage point, but when you unpause it the "Completed steps" will show that you've kept your progress since then.
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

I wish that were the case, Frogging101, but check out my log file from when I arbitrarily paused, waited a minute, then unpaused.

The GPU slot (core 0x22) went from 68% to 65% after unpausing.

Code:

03:15:19:WU00:FS00:0xa7:Completed 15000 out of 125000 steps (12%)
03:15:43:WU01:FS01:0x22:Completed 670000 out of 1000000 steps (67%)
03:16:22:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
03:16:39:WU01:FS01:0x22:Completed 680000 out of 1000000 steps (68%)
03:17:03:FS00:Paused
03:17:03:FS01:Paused
03:17:03:FS00:Shutting core down
03:17:03:FS01:Shutting core down
03:17:03:WU01:FS01:0x22:WARNING:Console control signal 1 on PID 3584
03:17:03:WU00:FS00:0xa7:WARNING:Console control signal 1 on PID 16184
03:17:03:WU01:FS01:0x22:Exiting, please wait. . .
03:17:03:WU00:FS00:0xa7:Exiting, please wait. . .
03:17:04:WU01:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
03:17:04:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
03:17:05:Removing old file 'configs/config-20200410-193411.xml'
03:17:05:Saving configuration to config.xml
03:17:05:<config>
03:17:05:  <!-- Folding Core -->
03:17:05:  <checkpoint v='5'/>
03:17:05:
03:17:05:  <!-- Network -->
03:17:05:  <proxy v=':8080'/>
03:17:05:
03:17:05:  <!-- Slot Control -->
03:17:05:  <power v='MEDIUM'/>
03:17:05:
03:17:05:  <!-- User Information -->
03:17:05:  <passkey v='********************************'/>
03:17:05:  <team v='64'/>
03:17:05:  <user v='Crawdaddy79'/>
03:17:05:
03:17:05:  <!-- Folding Slots -->
03:17:05:  <slot id='0' type='CPU'>
03:17:05:    <paused v='true'/>
03:17:05:  </slot>
03:17:05:  <slot id='1' type='GPU'>
03:17:05:    <paused v='true'/>
03:17:05:  </slot>
03:17:05:</config>
03:17:05:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
03:17:05:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
03:17:21:FS00:Unpaused
03:17:21:FS01:Unpaused
03:17:21:WU01:FS01:Starting
03:17:21:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 10740 -checkpoint 5 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
03:17:21:WU01:FS01:Started FahCore on PID 1776
03:17:21:WU01:FS01:Core PID:14812
03:17:21:WU01:FS01:FahCore 0x22 started
03:17:21:WU01:FS01:0x22:*********************** Log Started 2020-04-11T03:17:21Z ***********************
03:17:21:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
03:17:21:WU01:FS01:0x22:       Type: 0x22
03:17:21:WU01:FS01:0x22:       Core: Core22
03:17:21:WU01:FS01:0x22:    Website: https://foldingathome.org/
03:17:21:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
03:17:21:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
03:17:21:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
03:17:21:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 1776 -checkpoint 5
03:17:21:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
03:17:21:WU01:FS01:0x22:     Config: <none>
03:17:21:WU01:FS01:0x22:************************************ Build *************************************
03:17:21:WU01:FS01:0x22:    Version: 0.0.2
03:17:21:WU01:FS01:0x22:       Date: Dec 6 2019
03:17:21:WU01:FS01:0x22:       Time: 21:30:31
03:17:21:WU01:FS01:0x22: Repository: Git
03:17:21:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
03:17:21:WU01:FS01:0x22:     Branch: HEAD
03:17:21:WU01:FS01:0x22:   Compiler: Visual C++ 2008
03:17:21:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:21:WU01:FS01:0x22:   Platform: win32 10
03:17:21:WU01:FS01:0x22:       Bits: 64
03:17:21:WU01:FS01:0x22:       Mode: Release
03:17:21:WU01:FS01:0x22:************************************ System ************************************
03:17:21:WU01:FS01:0x22:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
03:17:21:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
03:17:21:WU01:FS01:0x22:       CPUs: 16
03:17:21:WU01:FS01:0x22:     Memory: 31.95GiB
03:17:21:WU01:FS01:0x22:Free Memory: 23.59GiB
03:17:21:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
03:17:21:WU01:FS01:0x22: OS Version: 6.2
03:17:21:WU01:FS01:0x22:Has Battery: false
03:17:21:WU01:FS01:0x22: On Battery: false
03:17:21:WU01:FS01:0x22: UTC Offset: -4
03:17:21:WU01:FS01:0x22:        PID: 14812
03:17:21:WU01:FS01:0x22:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
03:17:21:WU01:FS01:0x22:         OS: Windows 10 Home
03:17:21:WU01:FS01:0x22:    OS Arch: AMD64
03:17:21:WU01:FS01:0x22:********************************************************************************
03:17:21:WU01:FS01:0x22:Project: 11745 (Run 0, Clone 2225, Gen 26)
03:17:21:WU01:FS01:0x22:Unit: 0x000000388ca304f15e67f104dec31f90
03:17:21:WU01:FS01:0x22:Digital signatures verified
03:17:21:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
03:17:21:WU01:FS01:0x22:Version 0.0.2
03:17:22:WU01:FS01:0x22:  Found a checkpoint file
03:17:22:WU00:FS00:Starting
03:17:22:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 705 -lifeline 10740 -checkpoint 5 -np 14
03:17:22:WU00:FS00:Started FahCore on PID 7020
03:17:22:WU00:FS00:Core PID:15444
03:17:22:WU00:FS00:FahCore 0xa7 started
03:17:22:WU00:FS00:0xa7:*********************** Log Started 2020-04-11T03:17:22Z ***********************
03:17:22:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
03:17:22:WU00:FS00:0xa7:       Type: 0xa7
03:17:22:WU00:FS00:0xa7:       Core: Gromacs
03:17:22:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 7020 -checkpoint 5 -np 14
03:17:22:WU00:FS00:0xa7:************************************ CBang *************************************
03:17:22:WU00:FS00:0xa7:       Date: Oct 26 2019
03:17:22:WU00:FS00:0xa7:       Time: 01:38:25
03:17:22:WU00:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
03:17:22:WU00:FS00:0xa7:     Branch: master
03:17:22:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
03:17:22:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:22:WU00:FS00:0xa7:   Platform: win32 10
03:17:22:WU00:FS00:0xa7:       Bits: 64
03:17:22:WU00:FS00:0xa7:       Mode: Release
03:17:22:WU00:FS00:0xa7:************************************ System ************************************
03:17:22:WU00:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
03:17:22:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
03:17:22:WU00:FS00:0xa7:       CPUs: 16
03:17:22:WU00:FS00:0xa7:     Memory: 31.95GiB
03:17:22:WU00:FS00:0xa7:Free Memory: 23.53GiB
03:17:22:WU00:FS00:0xa7:    Threads: WINDOWS_THREADS
03:17:22:WU00:FS00:0xa7: OS Version: 6.2
03:17:22:WU00:FS00:0xa7:Has Battery: false
03:17:22:WU00:FS00:0xa7: On Battery: false
03:17:22:WU00:FS00:0xa7: UTC Offset: -4
03:17:22:WU00:FS00:0xa7:        PID: 15444
03:17:22:WU00:FS00:0xa7:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
03:17:22:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
03:17:22:WU00:FS00:0xa7:    Version: 0.0.18
03:17:22:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
03:17:22:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
03:17:22:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
03:17:22:WU00:FS00:0xa7:       Date: Oct 26 2019
03:17:22:WU00:FS00:0xa7:       Time: 01:52:30
03:17:22:WU00:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
03:17:22:WU00:FS00:0xa7:     Branch: master
03:17:22:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
03:17:22:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:22:WU00:FS00:0xa7:   Platform: win32 10
03:17:22:WU00:FS00:0xa7:       Bits: 64
03:17:22:WU00:FS00:0xa7:       Mode: Release
03:17:22:WU00:FS00:0xa7:************************************ Build *************************************
03:17:22:WU00:FS00:0xa7:       SIMD: avx_256
03:17:22:WU00:FS00:0xa7:********************************************************************************
03:17:22:WU00:FS00:0xa7:Project: 13870 (Run 0, Clone 529, Gen 59)
03:17:22:WU00:FS00:0xa7:Unit: 0x000000440d5262775e764918e9059201
03:17:22:WU00:FS00:0xa7:Digital signatures verified
03:17:22:WU00:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
03:17:22:WU00:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
03:17:22:WU00:FS00:0xa7:Calling: mdrun -s frame59.tpr -o frame59.trr -x frame59.xtc -e frame59.edr -cpi state.cpt -cpt 5 -nt 12
03:17:22:WU00:FS00:0xa7:Steps: first=7375000 total=125000
03:17:24:WU00:FS00:0xa7:Completed 17072 out of 125000 steps (13%)
03:17:36:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
03:17:41:WU01:FS01:0x22:Completed 650000 out of 1000000 steps (65%)
03:17:41:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:17:46:WU00:FS00:0xa7:Completed 17500 out of 125000 steps (14%)
03:18:06:Removing old file 'configs/config-20200410-204117.xml'
03:18:06:Saving configuration to config.xml
03:18:06:<config>
03:18:06:  <!-- Folding Core -->
03:18:06:  <checkpoint v='5'/>
03:18:06:
03:18:06:  <!-- Network -->
03:18:06:  <proxy v=':8080'/>
03:18:06:
03:18:06:  <!-- Slot Control -->
03:18:06:  <power v='MEDIUM'/>
03:18:06:
03:18:06:  <!-- User Information -->
03:18:06:  <passkey v='********************************'/>
03:18:06:  <team v='64'/>
03:18:06:  <user v='Crawdaddy79'/>
03:18:06:
03:18:06:  <!-- Folding Slots -->
03:18:06:  <slot id='0' type='CPU'/>
03:18:06:  <slot id='1' type='GPU'/>
03:18:06:</config>
03:18:38:WU01:FS01:0x22:Completed 660000 out of 1000000 steps (66%)
03:18:49:WU00:FS00:0xa7:Completed 18750 out of 125000 steps (15%)
Frogging101
Posts: 78
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Re: Feature Request: Pause at next checkpoint

Post by Frogging101 »

Interesting. Your CPU slot did save, though:

Code:

03:16:22:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
03:17:03:FS00:Paused
[...]
03:17:21:FS00:Unpaused
03:17:22:WU00:FS00:Starting
03:17:22:WU00:FS00:0xa7:Steps: first=7375000 total=125000
03:17:24:WU00:FS00:0xa7:Completed 17072 out of 125000 steps (13%)
Also, I've not seen a WU lose more than 1% before. I thought it checkpointed at least every 1%. How odd.
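The loss can be measured directly from the progress lines in a client log. A minimal sketch, assuming the v7 log line formats seen in the log posted in this thread (`HH:MM:SS:WUxx:FSxx:0x..:Completed N out of M steps (P%)` and `HH:MM:SS:FSxx:Paused`); `steps_lost` is a hypothetical helper, not a client feature.

```python
import re

# Progress line, e.g.
# 03:16:39:WU01:FS01:0x22:Completed 680000 out of 1000000 steps (68%)
PROGRESS = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:(?P<slot>FS\d+):0x[0-9a-f]+:"
    r"Completed (?P<done>\d+) out of \d+ steps"
)
# Slot pause line, e.g. 03:17:03:FS00:Paused
PAUSED = re.compile(r"^\d\d:\d\d:\d\d:(?P<slot>FS\d+):Paused$")


def steps_lost(log_lines):
    """Map each paused slot to the steps it had to redo: the last progress
    reported before the pause minus the first progress reported after it."""
    last_done = {}  # slot -> most recent "Completed N" value
    at_pause = {}   # slot -> value of last_done when the slot was paused
    lost = {}
    for line in log_lines:
        line = line.strip()
        m = PAUSED.match(line)
        if m and m["slot"] in last_done:
            at_pause[m["slot"]] = last_done[m["slot"]]
            continue
        m = PROGRESS.match(line)
        if m:
            slot, done = m["slot"], int(m["done"])
            if slot in at_pause:
                # First progress line after resuming; 0 means nothing was lost.
                lost[slot] = max(0, at_pause.pop(slot) - done)
            last_done[slot] = done
    return lost
```

Fed the log above (for example via `with open("log.txt", encoding="utf-8") as f: steps_lost(f)`, assuming the log was saved under that name), this reports 0 steps for FS00, matching the observation that the CPU slot saved on pause, and 30,000 steps (3%) for FS01.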
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Feature Request: Pause at next checkpoint

Post by Joe_H »

GPU WUs checkpoint at fixed percentages that the researcher sets when the project is set up. Depending on the project, typical values are every 2-5%.

CPU WUs running on the A7 core write out a checkpoint at the time interval set in the client through FAHControl. Depending on how it is paused, the A7 core will attempt to write a checkpoint then as well.
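One rough way to compare the two policies is to bound what a badly timed pause can cost. The sketch below assumes the worst case is one full checkpoint interval and takes its timing from the log posted earlier in the thread, where the GPU slot went from 67% to 68% between 03:15:43 and 03:16:39. The function names are hypothetical.

```python
from datetime import datetime


def seconds_per_pct(t1: str, pct1: int, t2: str, pct2: int) -> float:
    """Average seconds per percent between two HH:MM:SS log timestamps.
    Does not handle a midnight rollover between the two lines."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
    return elapsed.total_seconds() / (pct2 - pct1)


def gpu_worst_case_loss_s(interval_pct: float, sec_per_pct: float) -> float:
    # Percent-based checkpoints: up to one full interval is redone, so the
    # cost in time grows with how slowly the WU progresses.
    return interval_pct * sec_per_pct


def cpu_worst_case_loss_s(checkpoint_minutes: float) -> float:
    # Time-based checkpoints: up to one client interval is redone,
    # independent of WU speed (less if the core also saves on pause).
    return checkpoint_minutes * 60.0
```

With 56 s per percent, a 5% GPU interval can cost up to 280 s of folding versus up to 300 s for a five-minute CPU checkpoint. On a slower GPU WU the figure grows proportionally, because the interval is fixed in percent rather than in minutes.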

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Frogging101
Posts: 78
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Re: Feature Request: Pause at next checkpoint

Post by Frogging101 »

Joe_H wrote: GPU WUs checkpoint at fixed percentages that the researcher sets when the project is set up. Depending on the project, typical values are every 2-5%.
That figures, actually, since GPU computing generally processes data in large chunks for efficiency.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Feature Request: Pause at next checkpoint

Post by PantherX »

Crawdaddy79 wrote:...Sometimes because I want to do something else on my PC that requires cycles...
Generally speaking, if you're using the CPU, folding will hardly affect whatever tasks you're running, because folding runs at very low priority while most other tasks run at a higher priority. I understand if you want to pause the GPU slot, since you may encounter screen lag. You can reduce or eliminate the screen lag by disabling hardware acceleration in the application (if it's supported) and/or disabling Windows animations.
Crawdaddy79 wrote:..."my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.
That's an indication of something else. I have folded on my CPU at 100% for months without issues. The only time I would restart was for the monthly Windows updates; apart from that, it would fold day and night without crashing. If you would like us to investigate, please share your log and as much detail as possible about your system setup and usage.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

The other day I needed to pause the GPU slot for a minute because I had to move the power plug for the laptop (the old battery isn't strong enough to sustain a full load), and I lost around 45 minutes of work.
foldinghomealone2
Posts: 146
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

Hence my! general recommendation: never ever pause a GPU slot.
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

PantherX wrote:
Crawdaddy79 wrote:..."my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.
That's an indication of something else. I have folded on my CPU at 100% for months without issues. The only time I would restart was for the monthly Windows updates; apart from that, it would fold day and night without crashing. If you would like us to investigate, please share your log and as much detail as possible about your system setup and usage.
I agree that the CPU impact is minimal and that it's 100% reliable when pegged at 100%. It's the GPU that I have issues with. It seems to become unstable if it runs at 100% for too long (60-90 minutes is OK, 120+ minutes is not), but that's a digression.
Joe_H wrote:GPU WUs checkpoint at fixed percentages that the researcher sets when the project is set up. Depending on the project, typical values are every 2-5%.

CPU WUs running on the A7 core write out a checkpoint at the time interval set in the client through FAHControl. Depending on how it is paused, the A7 core will attempt to write a checkpoint then as well.
This is good info. Thank you.
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: Feature Request: Pause at next checkpoint

Post by uyaem »

iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

uyaem wrote:
iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)
That would be, I would guess, the easiest update that could solve this problem.
Rel25917
Posts: 303
Joined: Wed Aug 15, 2012 2:31 am

Re: Feature Request: Pause at next checkpoint

Post by Rel25917 »

I haven't looked at any of the new COVID projects, but before those, when I checked, 2.5% was pretty much the norm. A safe bet would be to stop just after multiples of 5%.
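Following that rule of thumb, the next safe point to pause can be estimated from a single progress line. A minimal sketch, assuming the project checkpoints on boundaries that evenly divide 5% (for example every 2.5% or 5%). A project that checkpoints every 2% or 3% would not line up with multiples of 5%, so treat the result as an estimate. Function names are hypothetical.

```python
import math


def next_boundary_steps(done: int, total: int, every_pct: float = 5.0) -> int:
    """Smallest multiple of `every_pct` percent (in steps) strictly above
    `done`, capped at the end of the WU."""
    width = total * every_pct / 100.0   # steps per boundary interval
    k = math.floor(done / width) + 1    # index of the next boundary
    return min(total, round(k * width))


def wait_seconds(done: int, total: int, steps_per_second: float,
                 every_pct: float = 5.0) -> float:
    """Estimated time until the WU reaches that boundary at the current rate."""
    return (next_boundary_steps(done, total, every_pct) - done) / steps_per_second
```

For the GPU WU in the earlier log, at 680000 of 1000000 steps and roughly 10000 steps per 56 s, the next 5% boundary is 700000 steps, about 112 s away; for the CPU WU at 17072 of 125000 steps it is 18750. Pausing a little after the boundary is reported, rather than exactly on it, gives the core time to finish writing its checkpoint (an assumption, not documented client behaviour).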