
Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 12:29 pm
by Peter_Hucker
Task jumps to 70% immediately - is this normal?

Code:

12:18:52:FS04:Unpaused
12:18:52:WU00:FS04:Starting
12:18:52:WU00:FS04:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.20/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 12472 -checkpoint 15 -opencl-platform 0 -opencl-device 1 -gpu-vendor amd -gpu 1 -gpu-usage 100
12:18:52:WU00:FS04:Started FahCore on PID 18764
12:18:52:WU00:FS04:Core PID:15964
12:18:52:WU00:FS04:FahCore 0x22 started
12:18:52:WU00:FS04:0x22:*********************** Log Started 2022-02-22T12:18:52Z ***********************
12:18:52:WU00:FS04:0x22:*************************** Core22 Folding@home Core ***************************
12:18:52:WU00:FS04:0x22:       Core: Core22
12:18:52:WU00:FS04:0x22:       Type: 0x22
12:18:52:WU00:FS04:0x22:    Version: 0.0.20
12:18:52:WU00:FS04:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:18:52:WU00:FS04:0x22:  Copyright: 2020 foldingathome.org
12:18:52:WU00:FS04:0x22:   Homepage: https://foldingathome.org/
12:18:52:WU00:FS04:0x22:       Date: Jan 20 2022
12:18:52:WU00:FS04:0x22:       Time: 01:15:36
12:18:52:WU00:FS04:0x22:   Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
12:18:52:WU00:FS04:0x22:     Branch: HEAD
12:18:52:WU00:FS04:0x22:   Compiler: Visual C++
12:18:52:WU00:FS04:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
12:18:52:WU00:FS04:0x22:             -DOPENMM_VERSION="\"7.7.0\""
12:18:52:WU00:FS04:0x22:   Platform: win32 10
12:18:52:WU00:FS04:0x22:       Bits: 64
12:18:52:WU00:FS04:0x22:       Mode: Release
12:18:52:WU00:FS04:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
12:18:52:WU00:FS04:0x22:             <peastman@stanford.edu>
12:18:52:WU00:FS04:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 18764 -checkpoint 15
12:18:52:WU00:FS04:0x22:             -opencl-platform 0 -opencl-device 1 -gpu-vendor amd -gpu 1
12:18:52:WU00:FS04:0x22:             -gpu-usage 100
12:18:52:WU00:FS04:0x22:************************************ libFAH ************************************
12:18:52:WU00:FS04:0x22:       Date: Jan 20 2022
12:18:52:WU00:FS04:0x22:       Time: 01:14:17
12:18:52:WU00:FS04:0x22:   Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
12:18:52:WU00:FS04:0x22:     Branch: HEAD
12:18:52:WU00:FS04:0x22:   Compiler: Visual C++
12:18:52:WU00:FS04:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
12:18:52:WU00:FS04:0x22:   Platform: win32 10
12:18:52:WU00:FS04:0x22:       Bits: 64
12:18:52:WU00:FS04:0x22:       Mode: Release
12:18:52:WU00:FS04:0x22:************************************ CBang *************************************
12:18:52:WU00:FS04:0x22:       Date: Jan 20 2022
12:18:52:WU00:FS04:0x22:       Time: 01:13:20
12:18:52:WU00:FS04:0x22:   Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
12:18:52:WU00:FS04:0x22:     Branch: HEAD
12:18:52:WU00:FS04:0x22:   Compiler: Visual C++
12:18:52:WU00:FS04:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
12:18:52:WU00:FS04:0x22:   Platform: win32 10
12:18:52:WU00:FS04:0x22:       Bits: 64
12:18:52:WU00:FS04:0x22:       Mode: Release
12:18:52:WU00:FS04:0x22:************************************ System ************************************
12:18:52:WU00:FS04:0x22:        CPU: AMD Ryzen 9 3900XT 12-Core Processor
12:18:52:WU00:FS04:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
12:18:52:WU00:FS04:0x22:       CPUs: 24
12:18:52:WU00:FS04:0x22:     Memory: 63.93GiB
12:18:52:WU00:FS04:0x22:Free Memory: 48.96GiB
12:18:52:WU00:FS04:0x22:    Threads: WINDOWS_THREADS
12:18:52:WU00:FS04:0x22: OS Version: 6.2
12:18:52:WU00:FS04:0x22:Has Battery: true
12:18:52:WU00:FS04:0x22: On Battery: false
12:18:52:WU00:FS04:0x22: UTC Offset: 0
12:18:52:WU00:FS04:0x22:        PID: 15964
12:18:52:WU00:FS04:0x22:        CWD: C:\ProgramData\FAHClient\work
12:18:52:WU00:FS04:0x22:************************************ OpenMM ************************************
12:18:52:WU00:FS04:0x22:    Version: 7.7.0
12:18:52:WU00:FS04:0x22:********************************************************************************
12:18:52:WU00:FS04:0x22:Project: 17258 (Run 1596, Clone 0, Gen 23)
12:18:52:WU00:FS04:0x22:Digital signatures verified
12:18:52:WU00:FS04:0x22:Folding@home GPU Core22 Folding@home Core
12:18:52:WU00:FS04:0x22:Version 0.0.20
12:18:53:WU00:FS04:0x22:  Checkpoint write interval: 10000 steps (2%) [50 total]
12:18:53:WU00:FS04:0x22:  JSON viewer frame write interval: 5000 steps (1%) [100 total]
12:18:53:WU00:FS04:0x22:  XTC frame write interval: 50000 steps (10%) [10 total]
12:18:53:WU00:FS04:0x22:  Global context and integrator variables write interval: disabled
12:18:53:WU00:FS04:0x22:There are 3 platforms available.
12:18:53:WU00:FS04:0x22:Platform 0: Reference
12:18:53:WU00:FS04:0x22:Platform 1: CPU
12:18:53:WU00:FS04:0x22:Platform 2: OpenCL
12:18:53:WU00:FS04:0x22:  opencl-device 1 specified
12:20:17:WU00:FS04:0x22:Attempting to create OpenCL context:
12:20:17:WU00:FS04:0x22:  Configuring platform OpenCL
12:20:36:WU00:FS04:0x22:  Using OpenCL on platformId 0 and gpu 1
12:20:37:WU00:FS04:0x22:Completed 350000 out of 500000 steps (70%)
12:28:02:WU00:FS04:0x22:Completed 355000 out of 500000 steps (71%)

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 12:49 pm
by gunnarre
Yes, it is resuming from a checkpoint. Depending on the project, GPU projects may checkpoint every 2% or 5% of work. This is set by the researcher.
CPU projects checkpoint based on time - it is possible to adjust this in the advanced settings, but I'm not sure if the new CPU cores respect this setting.
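
As a rough illustration of the arithmetic, here is a minimal sketch that rounds a step count down to the last checkpoint. It uses the figures from the log above (500000 total steps, a checkpoint every 10000 steps). The interruption point of 357000 steps is a made-up value, and the function names are hypothetical, not part of the client.

```python
def last_checkpoint(step: int, interval: int) -> int:
    """Return the most recent checkpointed step at or before `step`."""
    return (step // interval) * interval


def percent(step: int, total: int) -> int:
    """Whole-number percentage of `total` represented by `step`."""
    return step * 100 // total


TOTAL_STEPS = 500_000          # from "out of 500000 steps" in the log
CHECKPOINT_INTERVAL = 10_000   # from "Checkpoint write interval: 10000 steps (2%)"

# Hypothetical: the task was interrupted somewhere after step 350000.
interrupted_at = 357_000
resume_step = last_checkpoint(interrupted_at, CHECKPOINT_INTERVAL)

print(resume_step, percent(resume_step, TOTAL_STEPS))  # 350000 70
```

Anything done after the last checkpoint is redone on resume, which is why a restarted task reports a round figure such as 70% straight away.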

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 2:36 pm
by Peter_Hucker
But this was a freshly downloaded task. Can someone 70% complete it and upload their checkpoint, so I get given a half done task?

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 2:56 pm
by gunnarre
I have never heard of that happening. What can happen, though, is this: if your client has already downloaded a new WU (for a second GPU, or because you're almost finished with a WU) and you pause the client, it will start folding the newest WU instead of completing the one you've already started. If that happened with the last WU, you will continue with the old WU once the new WU has been done. It is rare, but I've seen it happen after a pause or a failed WU.

Your log shows that the client was unpaused though.

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 3:06 pm
by Peter_Hucker
I'd set it to "finish" yesterday, intending to go do Milkyway@Home on that GPU today. But Milkyway@Home is down at the moment, so I set it to "fold". Therefore it had no tasks, then downloaded one. It shot to 70% immediately, and has been progressing normally since then.

The log doesn't show anything being downloaded, which is weird, I watched it do it!

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 3:25 pm
by gunnarre
Do you have two clients working in the same folder, perhaps?

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 3:30 pm
by Peter_Hucker
No, one client on each of 7 PCs. They're set up in the normal way, just linked to one to observe using HFM.

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 5:26 pm
by Joe_H
The necessary information to determine what happened would either be in the log section prior to the "Unpaused" message or in a previous log file. Somehow you had an unfinished WU with a valid checkpoint for the client to start processing when you started the client up.

If you are looking at the log file through FAHControl's log window, you can use Refresh to reload the log from the beginning. Removing the check on "Follow" will keep it from scrolling as new log entries are added. You can also open the log files in just about any text editor; the current log should be read-only and locked from updating by an editor.
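
If the log is long, a small script can pull out the progress lines for you. The sketch below is only an illustration: it takes log lines as a list of strings (how you load the file is up to you), matches the "Completed N out of M steps" format seen in the log posted above, and assumes that a freshly downloaded task would report its first progress at step 0. The function names are hypothetical.

```python
import re

# Matches lines such as "Completed 350000 out of 500000 steps (70%)".
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps \((\d+)%\)")


def first_progress(lines):
    """Return (done, total) from the first progress line, or None if absent."""
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def resumed_from_checkpoint(lines):
    """True if the first reported progress is already past step 0."""
    first = first_progress(lines)
    return first is not None and first[0] > 0


sample = [
    "12:18:52:FS04:Unpaused",
    "12:18:52:WU00:FS04:Starting",
    "12:20:37:WU00:FS04:0x22:Completed 350000 out of 500000 steps (70%)",
    "12:28:02:WU00:FS04:0x22:Completed 355000 out of 500000 steps (71%)",
]

print(first_progress(sample))           # (350000, 500000)
print(resumed_from_checkpoint(sample))  # True
```

A first progress line well above zero points to a resumed WU, so the download you are looking for will be further back in the log or in a previous log file.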

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 5:41 pm
by Peter_Hucker
I'm really confused now. I've looked back at the logs and it looks like I paused it from just before 9am to midday, but don't remember doing so. The log I posted was the wrong 12 o'clock. My brain is currently operating on a 32 hour day, so I don't know what bloody day it is sometimes. I'll put it down to user error.

EDIT: Found what happened! I looked through the Boinc logs and found that just before 9am, I restarted the computer, I think because testing USB sticks was making it sticky. Because I've set FAH to start paused (so it doesn't get new tasks for a slot I want semi-permanently off), it did so. At midday the confusion arose because I thought it had finished and I'd forgotten to give it another task, so I pressed fold and it continued, whereas I thought it had got a new task.

Could FAH please continue in the same state after a restart? There seems to be no way to tell it to do this.

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 7:25 pm
by aetch
Thought - rather than have your machines switching between BOINC and FAH, why not dedicate some machines to BOINC and other machines to FAH?
Keep it simple.

Re: Task jumps to 70% immediately

Posted: Tue Feb 22, 2022 7:43 pm
by Joe_H
Peter_Hucker wrote: Could FAH please continue in the same state after a restart? There seems to be no way to tell it to do this.
This is one of those cases where they can't please everyone. With the 'pause-on-start' option set to false, the client used to start up after a reboot even if you had paused it before the reboot. Many people complained, and that was changed to make the 'pause' sticky in a later version. There is also a corner case that resulted in the 'finish' setting working the way it does.

The state logic may be getting reworked for the next version. A beta of version 8, which from what I understand is mostly being written from the ground up, should be out later this year. Exactly when will depend on getting all of the pieces coded and tested working together.

Re: Task jumps to 70% immediately

Posted: Wed Feb 23, 2022 6:25 am
by Whompithian
Could FAH please continue in the same state after a restart?
It does, if "pause-on-start" is disabled. Normally, when you pause a slot before shutting down, that slot will remain paused the next time you start the computer. If a slot is running when you shut down, then it will immediately continue running when you turn on the computer. The only state that doesn't carry over on restart is "finish." In that case, it will continue folding after a restart, whether it actually finished or not, and it will continue to fold after the work unit has finished, as though the "finish" request was never made.

"pause-on-start" does exactly what it says - it forces a slot to start in a paused state, whether or not that is how it was left. There is no equivalent to force a slot to start in an active state, but there is a command-line flag to start in a "finish" state.
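
The rules described above can be written down as a small lookup. This is only a model of the behaviour as described in this post, not the client's actual code. The state labels "running", "paused" and "finish" and the function name are chosen for the sketch.

```python
VALID_STATES = {"running", "paused", "finish"}


def state_after_restart(state_before: str, pause_on_start: bool) -> str:
    """Model the slot state after a reboot, per the description above."""
    if state_before not in VALID_STATES:
        raise ValueError(f"unknown state: {state_before!r}")
    if pause_on_start:
        # Forces a paused start regardless of how the slot was left.
        return "paused"
    if state_before == "finish":
        # The "finish" request is not carried over; the slot just folds.
        return "running"
    # "running" and "paused" both carry over unchanged.
    return state_before


# The situation from earlier in the thread: slot set to "finish",
# machine rebooted, pause-on-start enabled.
print(state_after_restart("finish", pause_on_start=True))   # paused
print(state_after_restart("finish", pause_on_start=False))  # running
print(state_after_restart("paused", pause_on_start=False))  # paused
```

Written this way, "finish" is the only state that comes back as something different when "pause-on-start" is disabled.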

I find that the most reliable way to shut down F@H is to instruct all slots to finish, wait for them to finish, manually pause them, then stop folding. Fortunately, my folding system runs ~24x6.3, so I don't have to go through all that too often.

Re: Task jumps to 70% immediately

Posted: Wed Feb 23, 2022 7:52 am
by gunnarre
If I'm restarting the machine, I don't finish the WU but I tend to wait until a checkpoint occurs, so every 5% or 2% of a GPU WU.

Re: Task jumps to 70% immediately

Posted: Wed Feb 23, 2022 3:23 pm
by Peter_Hucker
It's very simple, the state should always carry over. Restarting the machine should change nothing.

The reason I switch about is that I might want to do a load of work on a particular project, e.g. when Rosetta, which has sporadic work, suddenly has some.

Re: Task jumps to 70% immediately

Posted: Wed Feb 23, 2022 3:37 pm
by Neil-B
... and for me it is very simple ... I always want FaH on my kit to default to paused if for some reason there is an unintended interruption (be it the client or wider os/hardware/power related) for a whole bag load of system management reasons.

The hard bit is that there are very many "simples" depending on the folder/scenario ... The current system is what we have ... What will be in place in v8 is at this time unknown/uncertain - given it is a total rewrite with the aim of being fully open source it wouldn't surprise me if pretty much everything changes including the way folders interact/manage work so at this time the only options are to be patient and use the mechanisms currently in place or tbh choose not to use FaH :(