
Ability to abort/cancel a workunit?

Posted: Thu Feb 17, 2022 10:05 pm
by Peter_Hucker
All I can find on this is someone asking on GitHub a year ago, and a reason given about cherry picking for points. I don't understand why you give out tasks with different amounts of points. I often find a work unit isn't running fast enough, or isn't running at all, and there's no easy way to cancel it. I can of course always delete the slot and re-add it, which I'm sure cherry pickers are already doing. Anyway, if I don't bother reassigning the slot, I end up returning a late work unit and having to find another project to run on that chip until it expires.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 7:52 am
by PaulTV
It's possible to choose a cause preference, and thus choose a cause/project with a relatively high number of points. And that's as far as you should go.

Work units are based on the results returned by a previous work unit. So dumping a WU, and letting it reach its expiration, slows down the research, and we're all into folding to help science (I hope). That's why you shouldn't dump WUs, from a science perspective.
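To put rough numbers on why a dumped WU slows the science, here is a minimal Python sketch of one trajectory whose generations must run in sequence. It assumes, for illustration only, that the next generation is issued only when the previous result is returned, and that an abandoned WU sits idle until its timeout before being reissued; the function name and the hour values are made up.

```python
from __future__ import annotations


def chain_wall_time(gens: int, run_hours: float, timeout_hours: float,
                    dumped: set[int]) -> float:
    """Hours to finish `gens` sequential generations of one trajectory.

    Assumptions (illustrative only): each gen is issued only after the
    previous gen's result returns, and a dumped WU sits unworked until its
    timeout, then is reissued and run normally.
    """
    total = 0.0
    for gen in range(gens):
        if gen in dumped:
            total += timeout_hours  # time lost waiting for the WU to expire
        total += run_hours          # time for whoever actually completes it
    return total


# Example: 10 gens of 2 h each with a made-up 24 h timeout.
print(chain_wall_time(10, 2.0, 24.0, set()))  # 20.0
print(chain_wall_time(10, 2.0, 24.0, {3}))    # 44.0
```

Under these assumptions one dumped WU adds the whole timeout to the trajectory, regardless of how fast the rest of the chain runs.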

When you return WUs quickly enough, you get a quick return bonus (QRB). But you have to return at least 80% of your WUs successfully, otherwise you won't be eligible for the QRB. So from a rewards perspective, you shouldn't drop WUs either.
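The 80% rule can be expressed as a simple check. This is a sketch that assumes the rate is just successful returns divided by assigned WUs; how the FAH servers actually count successes, failures and expirations is not described here, and `qrb_eligible` is a hypothetical name.

```python
QRB_MIN_SUCCESS_PERCENT = 80  # threshold quoted in the post above


def qrb_eligible(successful: int, assigned: int) -> bool:
    """True if at least 80% of assigned WUs were returned successfully.

    Integer arithmetic avoids floating-point edge cases at exactly 80%.
    Treating "no history yet" as ineligible is an assumption.
    """
    if assigned <= 0:
        return False
    return successful * 100 >= QRB_MIN_SUCCESS_PERCENT * assigned


print(qrb_eligible(8, 10))  # True: exactly 80%
print(qrb_eligible(9, 12))  # False: 75%, e.g. after dumping 3 of 12 WUs
```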

The teams are trying to keep the rewards balanced, including by running jobs on reference machines, as far as I understand. But with hardware changing at such a fast pace, GPUs getting more and more cores, and researched molecules varying in size, it's very hard to do so.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 9:12 am
by Peter_Hucker
I'm not trying to get more points, I'm trying to do useful work for science, but I'm being hampered by this interface. When I receive a work unit and I can see there's no way I can do it on time, or that work unit is stuck, I should be able to abort it, so someone else can have a go immediately. What on earth is the point in me having to let it expire? Either my computer is doing nothing, or it's going to return it late; either way, someone else could have received that unit sooner. Boinc has this working just fine: we're free to stop and start work units as we see fit.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 9:53 am
by aetch
Can you post your log?

The GPUs have been categorised so they should be receiving work units they can complete in time.
CPUs are a bit more problematic: the only way FAH have to allocate appropriate work units is by core/thread count. They don't have a method to differentiate the capabilities of different CPUs yet.

Also bear in mind that the ETA should not be trusted until the first 2-3% of the work unit has been processed.
When the client receives a work unit, it bases the initial ETA on one of two things:
1) previous work units your client has run for the same project
2) the timeout, if this is the first work unit your client has received for that project.
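The two cases above, plus the advice not to trust the ETA for the first 2-3%, might look like this as a Python sketch. The history structure, the use of a plain average, and the 3% cut-off parameter are assumptions for illustration, not the client's actual code.

```python
from __future__ import annotations

from typing import Optional


def initial_eta(project: int, history: dict[int, list[float]],
                timeout_s: float) -> float:
    """Initial ETA (seconds) for a newly received WU.

    `history` maps a project number to the run times of WUs this client
    has already completed for that project (hypothetical structure).
    """
    past = history.get(project)
    if past:
        return sum(past) / len(past)  # averaging is an assumption
    return timeout_s                  # first WU of this project: use the timeout


def progress_eta(elapsed_s: float, fraction_done: float,
                 min_fraction: float = 0.03) -> Optional[float]:
    """Remaining seconds extrapolated from observed progress.

    Returns None until ~3% is done, mirroring the advice not to trust
    the ETA during the first 2-3%.
    """
    if fraction_done < min_fraction:
        return None
    return elapsed_s / fraction_done - elapsed_s


print(initial_eta(18201, {}, 86400.0))  # 86400.0 (made-up 1-day timeout)
print(progress_eta(900.0, 0.25))        # 2700.0
```

With no history for a project, `initial_eta` simply returns the timeout, which may make a first WU look slower than it really is until real progress data comes in.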

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:07 am
by Peter_Hucker
I've only just started folding so it's quite possible it doesn't have an accurate picture of how fast everything is. Especially as I had it set to get work from any project so it was maybe a different one each time.

So do I have to let it fail to finish one on time so it can work out it's too slow at that one? If I pause it until it's expired, it won't gather that info.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:15 am
by aetch
I'm not asking you to wait for a work unit to complete or fail, just for the log so far.

I assume you're using the advanced control panel.

Follow these instructions on how to post it -> viewtopic.php?f=24&t=26036#p327412

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:19 am
by Peter_Hucker
I meant that completing the failed one could teach it not to get ones that are too big for that chip. I assume it will give up when it reaches the deadline? Unlike Boinc, which will continue anything it's started.

Not sure which bit of the log you want. I've got 7 computers doing various workunits on CPUs and GPUs, which I've paused and restarted and gone off to do Boinc etc. I've lost track of which ones were annoying me.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:20 am
by PaulTV
Peter_Hucker wrote:I'm not trying to get more points, I'm trying to do useful work for science, but I'm being hampered by this interface.
My apologies, I misunderstood your initial question.

I don't know if there'll be an option to ditch a WU in the interface in the next release of the client (it won't be in the current version), but to be honest I kinda doubt it, for several reasons. It'd require not only client changes but also server changes (to tell the server that a job won't be finished by that client), and that would open a huge can of worms with cherry picking for the wrong reasons. I do hope, though, that the client will at some point be smarter about choosing jobs that can be finished in time.

If WUs are expiring even if you're folding 24/7, it may indicate that the computer/component is too slow. It may help to tune specific settings, to get smaller jobs. When a job is requested, it takes the number of assigned cores/threads into account - large jobs will require a minimum of available threads.
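A minimal sketch of that thread-count filter, assuming each project simply declares a minimum number of CPU threads that the assignment server compares with the slot's configured CPUs. The `Project` class, the project names and the thresholds are hypothetical; the real assignment rules are set server-side.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    min_threads: int  # hypothetical minimum CPU threads the project requires


def eligible_projects(projects: list[Project], slot_threads: int) -> list[Project]:
    """Projects a CPU slot with `slot_threads` threads could be assigned."""
    return [p for p in projects if p.min_threads <= slot_threads]


catalog = [Project("small", 2), Project("medium", 8), Project("large", 24)]
print([p.name for p in eligible_projects(catalog, 5)])   # ['small']
print([p.name for p in eligible_projects(catalog, 12)])  # ['small', 'medium']
```

In this model, lowering a slot's thread count shrinks the set of eligible projects, which is the mechanism behind tuning settings to get smaller jobs.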

When you tune the number of threads in FAHControl (where they're called CPUs, not threads), be aware that lowering that number is applied immediately to the running job. Increasing that number is applied only to the next job.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:23 am
by Peter_Hucker
I've taken my two slowest computers off FAH, they can run simple Boinc projects.

I've lowered the threads on some, both to allow other things (like security cameras) to run OK, and because FAH wasn't using all the cores with one slot, so I split the CPU into 2 slots. That would have caused things to take too long as well. I'm going to assume it'll eventually get the hang of what size tasks to give me for each chip. The only problem I can see continuing is GPU tasks that get stuck.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 10:34 am
by aetch
The client does not learn from failed work units.

Please be aware that when you change the CPU core/thread count for a slot, the following happens:
down - immediate effect, the work unit will take longer to process
up - wait for the next work unit before taking effect, the current work unit won't process any quicker
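The asymmetry described above can be modelled in a few lines. This hypothetical `CpuSlot` class only illustrates the rule as stated in this thread; it is not how FAHClient is implemented.

```python
class CpuSlot:
    """Hypothetical model: lowering the thread count affects the running WU
    immediately, raising it only takes effect when the next WU starts."""

    def __init__(self, threads: int) -> None:
        self.configured = threads  # value set in FAHControl
        self.active = threads      # threads the running WU actually uses

    def set_threads(self, n: int) -> None:
        self.configured = n
        if n < self.active:
            self.active = n        # down: applied to the current WU at once

    def start_next_wu(self) -> None:
        self.active = self.configured  # up: only picked up here


slot = CpuSlot(6)
slot.set_threads(4)   # active drops to 4 immediately
slot.set_threads(6)   # configured is 6 again, but active stays 4
slot.start_next_wu()  # active becomes 6
print(slot.active)    # 6
```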

If you want help with your folding computers, we need the logs from those machines.
We can deal with one at a time if necessary.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 11:45 am
by gunnarre
I am particularly concerned about GPU WUs that "get stuck". This sounds like a problem with the machines it is happening on, and those are the ones I'd most like to see logs from.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 2:55 pm
by Peter_Hucker
aetch wrote:The client does not learn from failed work units.
What do you mean by failed? If a work unit takes too long and is returned late, or cancelled because it's very late, does the client not work out that it shouldn't do jobs that long? How else will it learn?

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 3:15 pm
by Peter_Hucker
gunnarre wrote:I am particularly concerned about GPU WUs that "get stuck". This sounds like a problem with the machines it is happening on, and those are the ones I'd most like to see logs from.
I think there's something up with the driver. MSI Afterburner and GPU-Z show the GPU at precisely 64% load (which is odd) whether FAH is running or not. It isn't actually under load, as it's generating no heat. But if I run Milkyway at Home through Boinc, the load goes up to 99% and the temperature rises, and it calculates that project at the expected speed. So there are two oddities: the GPU shows 64% load instead of 0%, and FAH won't calculate while Milkyway at Home will.

I just started the GPU on that machine, and got the following log. It shows two entries because I started it (which downloaded a task), then paused it to see if Boinc made the GPU work, then restarted FAH again. I can't see anything of interest in the log.

******************************* Date: 2022-02-18 *******************************
14:58:23:WU00:FS02:Connecting to assign1.foldingathome.org:80
14:58:24:WU00:FS02:Assigned to work server 128.252.203.11
14:58:24:WU00:FS02:Requesting new work unit for slot 02: gpu:1:0 Tahiti XT [Radeon R9 200/HD 7900/8970] from 128.252.203.11
14:58:24:WU00:FS02:Connecting to 128.252.203.11:8080
14:58:34:WU00:FS02:Downloading 26.57MiB
14:58:40:WU00:FS02:Download 35.99%
14:58:46:WU00:FS02:Download 81.15%
14:58:49:WU00:FS02:Download complete
14:58:49:WU00:FS02:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:18201 run:35807 clone:1 gen:11 core:0x22 unit:0x000000010000000b0000471900008bdf
14:58:49:WU00:FS02:Starting
14:58:49:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.20/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 9384 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
14:58:49:WU00:FS02:Started FahCore on PID 12976
14:58:49:WU00:FS02:Core PID:9704
14:58:49:WU00:FS02:FahCore 0x22 started
14:58:50:WU00:FS02:0x22:*********************** Log Started 2022-02-18T14:58:49Z ***********************
14:58:50:WU00:FS02:0x22:*************************** Core22 Folding@home Core ***************************
14:58:50:WU00:FS02:0x22: Core: Core22
14:58:50:WU00:FS02:0x22: Type: 0x22
14:58:50:WU00:FS02:0x22: Version: 0.0.20
14:58:50:WU00:FS02:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:58:50:WU00:FS02:0x22: Copyright: 2020 foldingathome.org
14:58:50:WU00:FS02:0x22: Homepage: https://foldingathome.org/
14:58:50:WU00:FS02:0x22: Date: Jan 20 2022
14:58:50:WU00:FS02:0x22: Time: 01:15:36
14:58:50:WU00:FS02:0x22: Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
14:58:50:WU00:FS02:0x22: Branch: HEAD
14:58:50:WU00:FS02:0x22: Compiler: Visual C++
14:58:50:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
14:58:50:WU00:FS02:0x22: -DOPENMM_VERSION="\"7.7.0\""
14:58:50:WU00:FS02:0x22: Platform: win32 10
14:58:50:WU00:FS02:0x22: Bits: 64
14:58:50:WU00:FS02:0x22: Mode: Release
14:58:50:WU00:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
14:58:50:WU00:FS02:0x22: <peastman@stanford.edu>
14:58:50:WU00:FS02:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 12976 -checkpoint 15
14:58:50:WU00:FS02:0x22: -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0
14:58:50:WU00:FS02:0x22: -gpu-usage 100
14:58:50:WU00:FS02:0x22:************************************ libFAH ************************************
14:58:50:WU00:FS02:0x22: Date: Jan 20 2022
14:58:50:WU00:FS02:0x22: Time: 01:14:17
14:58:50:WU00:FS02:0x22: Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
14:58:50:WU00:FS02:0x22: Branch: HEAD
14:58:50:WU00:FS02:0x22: Compiler: Visual C++
14:58:50:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
14:58:50:WU00:FS02:0x22: Platform: win32 10
14:58:50:WU00:FS02:0x22: Bits: 64
14:58:50:WU00:FS02:0x22: Mode: Release
14:58:50:WU00:FS02:0x22:************************************ CBang *************************************
14:58:50:WU00:FS02:0x22: Date: Jan 20 2022
14:58:50:WU00:FS02:0x22: Time: 01:13:20
14:58:50:WU00:FS02:0x22: Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
14:58:50:WU00:FS02:0x22: Branch: HEAD
14:58:50:WU00:FS02:0x22: Compiler: Visual C++
14:58:50:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
14:58:50:WU00:FS02:0x22: Platform: win32 10
14:58:50:WU00:FS02:0x22: Bits: 64
14:58:50:WU00:FS02:0x22: Mode: Release
14:58:50:WU00:FS02:0x22:************************************ System ************************************
14:58:50:WU00:FS02:0x22: CPU: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
14:58:50:WU00:FS02:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
14:58:50:WU00:FS02:0x22: CPUs: 6
14:58:50:WU00:FS02:0x22: Memory: 15.84GiB
14:58:50:WU00:FS02:0x22:Free Memory: 10.04GiB
14:58:50:WU00:FS02:0x22: Threads: WINDOWS_THREADS
14:58:50:WU00:FS02:0x22: OS Version: 6.2
14:58:50:WU00:FS02:0x22:Has Battery: false
14:58:50:WU00:FS02:0x22: On Battery: false
14:58:50:WU00:FS02:0x22: UTC Offset: 0
14:58:50:WU00:FS02:0x22: PID: 9704
14:58:50:WU00:FS02:0x22: CWD: C:\ProgramData\FAHClient\work
14:58:50:WU00:FS02:0x22:************************************ OpenMM ************************************
14:58:50:WU00:FS02:0x22: Version: 7.7.0
14:58:50:WU00:FS02:0x22:********************************************************************************
14:58:50:WU00:FS02:0x22:Project: 18201 (Run 35807, Clone 1, Gen 11)
14:58:50:WU00:FS02:0x22:Reading tar file core.xml
14:58:50:WU00:FS02:0x22:Reading tar file integrator.xml
14:58:50:WU00:FS02:0x22:Reading tar file state.xml
14:58:51:WU00:FS02:0x22:Reading tar file system.xml
14:58:52:WU00:FS02:0x22:Digital signatures verified
14:58:52:WU00:FS02:0x22:Folding@home GPU Core22 Folding@home Core
14:58:52:WU00:FS02:0x22:Version 0.0.20
14:58:53:WU00:FS02:0x22: Checkpoint write interval: 25000 steps (2%) [50 total]
14:58:53:WU00:FS02:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
14:58:53:WU00:FS02:0x22: XTC frame write interval: 20000 steps (1.6%) [62 total]
14:58:53:WU00:FS02:0x22: Global context and integrator variables write interval: disabled
14:58:53:WU00:FS02:0x22:There are 3 platforms available.
14:58:53:WU00:FS02:0x22:Platform 0: Reference
14:58:53:WU00:FS02:0x22:Platform 1: CPU
14:58:53:WU00:FS02:0x22:Platform 2: OpenCL
14:58:53:WU00:FS02:0x22: opencl-device 0 specified
14:59:17:WU00:FS02:0x22:Attempting to create OpenCL context:
14:59:17:WU00:FS02:0x22: Configuring platform OpenCL
14:59:40:WU00:FS02:0x22: Using OpenCL on platformId 1 and gpu 0
14:59:40:WU00:FS02:0x22:Completed 0 out of 1250000 steps (0%)
14:59:42:WU00:FS02:0x22:Checkpoint completed at step 0
15:01:07:FS02:Paused
15:01:07:FS02:Shutting core down
15:01:07:WU00:FS02:0x22:WARNING:Console control signal 1 on PID 9704
15:01:07:WU00:FS02:0x22:Exiting, please wait. . .
15:01:07:WU00:FS02:0x22:Folding@home Core Shutdown: INTERRUPTED
15:01:10:WU00:FS02:FahCore returned: INTERRUPTED (102 = 0x66)
15:01:52:Removing old file 'configs/config-20220216-005625.xml'
15:01:52:Saving configuration to config.xml
15:01:52:<config>
15:01:52: <!-- Folding Slot Configuration -->
15:01:52: <cause v='COVID_19'/>
15:01:52:
15:01:52: <!-- Network -->
15:01:52: <proxy v=':8080'/>
15:01:52:
15:01:52: <!-- Remote Command Server -->
15:01:52: <password v='*****'/>
15:01:52:
15:01:52: <!-- Slot Control -->
15:01:52: <power v='FULL'/>
15:01:52:
15:01:52: <!-- User Information -->
15:01:52: <passkey v='*****'/>
15:01:52: <team v='224497'/>
15:01:52: <user v='PeterHucker_1HK9mWMp2xTK3f7fjowi1mCCbczu2EgFyR'/>
15:01:52:
15:01:52: <!-- Folding Slots -->
15:01:52: <slot id='0' type='CPU'>
15:01:52: <cpus v='5'/>
15:01:52: </slot>
15:01:52: <slot id='2' type='GPU'>
15:01:52: <paused v='true'/>
15:01:52: <pci-bus v='1'/>
15:01:52: <pci-slot v='0'/>
15:01:52: </slot>
15:01:52:</config>
15:05:10:FS02:Unpaused
15:05:10:WU00:FS02:Starting
15:05:10:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.20/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 9384 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
15:05:10:WU00:FS02:Started FahCore on PID 12676
15:05:10:WU00:FS02:Core PID:12792
15:05:10:WU00:FS02:FahCore 0x22 started
15:05:10:WU00:FS02:0x22:*********************** Log Started 2022-02-18T15:05:10Z ***********************
15:05:10:WU00:FS02:0x22:*************************** Core22 Folding@home Core ***************************
15:05:10:WU00:FS02:0x22: Core: Core22
15:05:10:WU00:FS02:0x22: Type: 0x22
15:05:10:WU00:FS02:0x22: Version: 0.0.20
15:05:10:WU00:FS02:0x22: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:05:10:WU00:FS02:0x22: Copyright: 2020 foldingathome.org
15:05:10:WU00:FS02:0x22: Homepage: https://foldingathome.org/
15:05:10:WU00:FS02:0x22: Date: Jan 20 2022
15:05:10:WU00:FS02:0x22: Time: 01:15:36
15:05:10:WU00:FS02:0x22: Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
15:05:10:WU00:FS02:0x22: Branch: HEAD
15:05:10:WU00:FS02:0x22: Compiler: Visual C++
15:05:10:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:05:10:WU00:FS02:0x22: -DOPENMM_VERSION="\"7.7.0\""
15:05:10:WU00:FS02:0x22: Platform: win32 10
15:05:10:WU00:FS02:0x22: Bits: 64
15:05:10:WU00:FS02:0x22: Mode: Release
15:05:10:WU00:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
15:05:10:WU00:FS02:0x22: <peastman@stanford.edu>
15:05:10:WU00:FS02:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 12676 -checkpoint 15
15:05:10:WU00:FS02:0x22: -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0
15:05:10:WU00:FS02:0x22: -gpu-usage 100
15:05:10:WU00:FS02:0x22:************************************ libFAH ************************************
15:05:10:WU00:FS02:0x22: Date: Jan 20 2022
15:05:10:WU00:FS02:0x22: Time: 01:14:17
15:05:10:WU00:FS02:0x22: Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
15:05:10:WU00:FS02:0x22: Branch: HEAD
15:05:10:WU00:FS02:0x22: Compiler: Visual C++
15:05:10:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:05:10:WU00:FS02:0x22: Platform: win32 10
15:05:10:WU00:FS02:0x22: Bits: 64
15:05:10:WU00:FS02:0x22: Mode: Release
15:05:10:WU00:FS02:0x22:************************************ CBang *************************************
15:05:10:WU00:FS02:0x22: Date: Jan 20 2022
15:05:10:WU00:FS02:0x22: Time: 01:13:20
15:05:10:WU00:FS02:0x22: Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
15:05:10:WU00:FS02:0x22: Branch: HEAD
15:05:10:WU00:FS02:0x22: Compiler: Visual C++
15:05:10:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:05:10:WU00:FS02:0x22: Platform: win32 10
15:05:10:WU00:FS02:0x22: Bits: 64
15:05:10:WU00:FS02:0x22: Mode: Release
15:05:10:WU00:FS02:0x22:************************************ System ************************************
15:05:10:WU00:FS02:0x22: CPU: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
15:05:10:WU00:FS02:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
15:05:10:WU00:FS02:0x22: CPUs: 6
15:05:10:WU00:FS02:0x22: Memory: 15.84GiB
15:05:10:WU00:FS02:0x22:Free Memory: 9.97GiB
15:05:10:WU00:FS02:0x22: Threads: WINDOWS_THREADS
15:05:10:WU00:FS02:0x22: OS Version: 6.2
15:05:10:WU00:FS02:0x22:Has Battery: false
15:05:10:WU00:FS02:0x22: On Battery: false
15:05:10:WU00:FS02:0x22: UTC Offset: 0
15:05:10:WU00:FS02:0x22: PID: 12792
15:05:10:WU00:FS02:0x22: CWD: C:\ProgramData\FAHClient\work
15:05:10:WU00:FS02:0x22:************************************ OpenMM ************************************
15:05:10:WU00:FS02:0x22: Version: 7.7.0
15:05:10:WU00:FS02:0x22:********************************************************************************
15:05:10:WU00:FS02:0x22:Project: 18201 (Run 35807, Clone 1, Gen 11)
15:05:10:WU00:FS02:0x22:Digital signatures verified
15:05:10:WU00:FS02:0x22:Folding@home GPU Core22 Folding@home Core
15:05:10:WU00:FS02:0x22:Version 0.0.20
15:05:11:WU00:FS02:0x22: Checkpoint write interval: 25000 steps (2%) [50 total]
15:05:11:WU00:FS02:0x22: JSON viewer frame write interval: 12500 steps (1%) [100 total]
15:05:11:WU00:FS02:0x22: XTC frame write interval: 20000 steps (1.6%) [62 total]
15:05:11:WU00:FS02:0x22: Global context and integrator variables write interval: disabled
15:05:11:WU00:FS02:0x22:There are 3 platforms available.
15:05:11:WU00:FS02:0x22:Platform 0: Reference
15:05:11:WU00:FS02:0x22:Platform 1: CPU
15:05:11:WU00:FS02:0x22:Platform 2: OpenCL
15:05:11:WU00:FS02:0x22: opencl-device 0 specified
15:05:36:WU00:FS02:0x22:Attempting to create OpenCL context:
15:05:36:WU00:FS02:0x22: Configuring platform OpenCL
15:05:56:Removing old file 'configs/config-20220216-013518.xml'
15:05:56:Saving configuration to config.xml
15:05:56:<config>
15:05:56: <!-- Folding Slot Configuration -->
15:05:56: <cause v='COVID_19'/>
15:05:56:
15:05:56: <!-- Network -->
15:05:56: <proxy v=':8080'/>
15:05:56:
15:05:56: <!-- Remote Command Server -->
15:05:56: <password v='*****'/>
15:05:56:
15:05:56: <!-- Slot Control -->
15:05:56: <power v='FULL'/>
15:05:56:
15:05:56: <!-- User Information -->
15:05:56: <passkey v='*****'/>
15:05:56: <team v='224497'/>
15:05:56: <user v='PeterHucker_1HK9mWMp2xTK3f7fjowi1mCCbczu2EgFyR'/>
15:05:56:
15:05:56: <!-- Folding Slots -->
15:05:56: <slot id='0' type='CPU'>
15:05:56: <cpus v='5'/>
15:05:56: </slot>
15:05:56: <slot id='2' type='GPU'>
15:05:56: <pci-bus v='1'/>
15:05:56: <pci-slot v='0'/>
15:05:56: </slot>
15:05:56:</config>
15:05:58:WU00:FS02:0x22: Using OpenCL on platformId 1 and gpu 0
15:05:58:WU00:FS02:0x22:Completed 0 out of 1250000 steps (0%)
15:06:00:WU00:FS02:0x22:Checkpoint completed at step 0

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 4:36 pm
by aetch
Peter_Hucker wrote:What do you mean by failed? If a work unit takes too long and is returned late, or cancelled because it's very late, does the client not work out that it shouldn't do jobs that long? How else will it learn?
The client does not learn; it is not designed to.
It's designed to tell the server what resources are available, and it's the server that has been programmed to send work units appropriate for the offered hardware.
The client can and will dump a work unit if it's causing too many errors/crashes, if it expires, or for some other reason I'm not aware of.
None of that is taken into account when requesting the next work unit.

If your GPU is too slow, then your folding rig has an issue.
FAH maintain a list of GPUs and roughly break down their capabilities.
Prior to release, projects are tested on a range of hardware so targeting can be refined when they are released to full FAH.
Your GPU should be getting assigned work units it is capable of completing within the timeout.
Typically FAH will blacklist GPUs which are too old or deemed not capable, so they don't receive any work units.
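As a rough illustration of the server-side flow described above, here is a sketch in which the request carries only a hardware description, the server looks the GPU up in a list, and blacklisted GPUs get nothing. The GPU names, categories and project names are hypothetical placeholders, and treating unknown GPUs as blacklisted is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkRequest:
    gpu_name: str  # hardware description only; no record of late or dumped WUs


# Hypothetical stand-in for FAH's GPU list. None marks a blacklisted GPU.
GPU_LIST: dict[str, Optional[str]] = {
    "example-fast-gpu": "high",
    "example-mid-gpu": "mid",
    "example-old-gpu": None,
}

# Hypothetical mapping from GPU category to projects tested on that hardware class.
PROJECTS_BY_CATEGORY: dict[str, list[str]] = {
    "high": ["large-project", "small-project"],
    "mid": ["small-project"],
}


def assign(request: WorkRequest) -> Optional[str]:
    """Pick a project for the requesting GPU, or None if it gets no work."""
    # Unknown GPUs also map to None here (assumption).
    category = GPU_LIST.get(request.gpu_name)
    if category is None:
        return None
    candidates = PROJECTS_BY_CATEGORY.get(category, [])
    return candidates[0] if candidates else None


print(assign(WorkRequest("example-mid-gpu")))  # small-project
print(assign(WorkRequest("example-old-gpu")))  # None
```

Nothing about earlier late or dumped WUs appears in `WorkRequest`, so in this model the next assignment is the same whatever happened to the previous one.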

If your CPU is too slow, then it's up to you (the donor/user) to diagnose it, with help if needed, and then modify your config to get smaller work units or possibly withdraw it from service.
FAH do not maintain a CPU list. They assume all CPU slots of the same size have the same capability and performance, within reason.

That said, there are edge cases which slip through the gaps and the forums right here are the right place to air those issues.

Re: Ability to abort/cancel a workunit?

Posted: Fri Feb 18, 2022 4:53 pm
by Peter_Hucker
My computers work perfectly fine. They run full speed, no overheating, no overclocking, nothing messed around with. They function perfectly in Boinc. The problem is with the dodgy FAH software.