Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 5:16 am
by sejtam
I just had a WU killed by FAHcontrol because it exceeded the final deadline. That deadline
apparently was very short.


******************************* Date: 2020-10-22 *******************************
23:22:10:WU02:FS01:0xa7:Completed 177500 out of 250000 steps (71%)
23:32:59:WU02:FS01:0xa7:Completed 180000 out of 250000 steps (72%)
03:56:34:WU01:FS01:Connecting to assign2.foldingathome.org:80
03:56:35:WU01:FS01:Assigned to work server 66.170.111.50
03:56:35:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:1 from 66.170.111.50
03:56:35:WU01:FS01:Connecting to 66.170.111.50:8080
03:56:35:WU01:FS01:Downloading 7.36MiB
03:56:41:WU01:FS01:Download 55.17%
03:56:44:WU01:FS01:Download complete
03:56:44:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17408 run:0 clone:668 gen:85 core:0xa7 unit:0x0000006742aa6f325f61846850e1c007
04:02:07:WU02:FS01:0xa7:Completed 250000 out of 250000 steps (100%)
04:02:12:WU02:FS01:0xa7:Saving result file ../logfile_01.txt
04:02:12:WU01:FS01:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-sse2/a7-0.0.19/Core_a7.fah/FahCore_a7" -dir 01 -suffix 01 -version 706 -lifeline 75 -checkpoint 30 -np 1
04:02:12:WU01:FS01:Started FahCore on PID 8409
04:02:13:WU01:FS01:Core PID:8410
04:02:13:WU01:FS01:FahCore 0xa7 started
04:02:13:WU01:FS01:0xa7:*********************** Log Started 2020-10-23T04:02:13Z ***********************
04:02:13:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
[...] until
23:47:32:WU01:FS01:0xa7:Completed 103750 out of 125000 steps (83%)
******************************* Date: 2020-10-25 *******************************
00:24:16:WU01:FS01:0xa7:Completed 105000 out of 125000 steps (84%)
01:00:10:WU01:FS01:0xa7:Completed 106250 out of 125000 steps (85%)
01:38:11:WU01:FS01:0xa7:Completed 107500 out of 125000 steps (86%)
02:14:46:WU01:FS01:0xa7:Completed 108750 out of 125000 steps (87%)
02:53:04:WU01:FS01:0xa7:Completed 110000 out of 125000 steps (88%)
03:27:19:WU01:FS01:0xa7:Completed 111250 out of 125000 steps (89%)
03:52:53:WARNING:WU01:FS01:Past final deadline 2020-10-25T03:52:52Z, dumping
03:52:53:WU01:FS01:Shutting core down
03:52:53:WU01:FS01:0xa7:Caught signal SIGINT(2) on PID 8410
03:52:53:WU01:FS01:0xa7:Exiting, please wait. . .
03:52:53:WU00:FS01:Connecting to assign1.foldingathome.org:80
So that workload had a final deadline of just 48 hours after assignment?

I don't find any mention of the timeouts or deadlines in the logs.
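The question above can be checked with a little arithmetic on the log. The following is a rough sketch, not an official tool: the calendar dates are inferred from the "Date:" banners and the core's "Log Started 2020-10-23T04:02:13Z" line, and the projection assumes the step rate seen in the last few progress lines would have stayed constant.

```python
from datetime import datetime, timedelta, timezone

def utc(text):
    """Parse a timestamp such as '2020-10-25T03:52:52' as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)

# Values copied from the log. Dates are an inference (see the note above).
assigned = utc("2020-10-23T03:56:35")  # "Assigned to work server 66.170.111.50"
deadline = utc("2020-10-25T03:52:52")  # "Past final deadline ..., dumping"

# Two progress samples from 2020-10-25: (time, completed steps).
first_sample = (utc("2020-10-25T00:24:16"), 105000)
last_sample = (utc("2020-10-25T03:27:19"), 111250)
total_steps = 125000

# Time between assignment and the final deadline.
window = deadline - assigned

# Average speed between the two samples, in steps per second.
elapsed = (last_sample[0] - first_sample[0]).total_seconds()
rate = (last_sample[1] - first_sample[1]) / elapsed

# Projected finish time if that speed had been sustained.
remaining_seconds = (total_steps - last_sample[1]) / rate
projected_finish = last_sample[0] + timedelta(seconds=remaining_seconds)
overrun = projected_finish - deadline

print("assignment to deadline:", window)  # 1 day, 23:56:17
print("projected finish:", projected_finish.isoformat(timespec="seconds"))
print(f"projected overrun: {overrun.total_seconds() / 3600:.1f} h")
```

Under those assumptions the window is just under 48 hours, and the WU would have needed roughly six more hours past the deadline to finish.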

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 7:27 am
by Neil-B
Timeouts and Expiration Deadlines are indicated in both the web and advanced controls .. the project summary webpage linked from the top of the forum also indicates these .. 1 day Timeouts and 2 day Expirations have been common for covid projects.

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 8:27 am
by sejtam
But they are not logged, so after the fact it is impossible to see what was actually indicated as timeout and deadline... unless you run into the deadline. *Then* it logs them at the point it stops processing, at which point I have wasted almost 2 days of processing for nothing. Why is my client (macOS, no GPU) even getting WUs with deadlines that are almost impossible to meet?

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 8:45 am
by ajm
The Timeout and Deadline values are constant for a given project. You can see them here: https://apps.foldingathome.org/psummary

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 9:16 am
by Neil-B
I am guessing that you may not be running 24/7 ... or are running fairly old kit ... or possibly are running a CPU slot limited to few cores ... the WUs might struggle to complete under those circumstances ... the web and advanced controls are the tools intended for monitoring progress of WUs ... the log is simply a record of what has happened ... FaH WUs will normally complete within Timeout on relatively recent kit folding 24/7.

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 9:40 am
by sejtam
I am running 24/7. Yes, it is limited (201- Mac mini). I am running 2 slots with one core each (as running one slot with 2 cores only uses about half of each CPU, which seems like a waste also). It would be nice if the client could request only WUs that fit its limited performance.

And I understand that the web controls are there for monitoring, but once the WU has been cancelled/deleted you cannot find the timeout and expiry anymore, and I surely don't watch this all the time. Just logging these values on assignment would help later to figure out what happened.
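Until the client logs these values itself, a small script could reconstruct them from the "Received Unit" line plus the per-project values on the project summary page. This is only a sketch: `PROJECT_LIMITS_DAYS` and `describe_assignment` are hypothetical names, the table has to be filled in by hand, and the 1-day/2-day entry for project 17408 is an assumption taken from the "common for covid projects" remark, not a confirmed value. It also counts from the time the unit was received, whereas the deadline in the log above fell a few minutes earlier than that, so the result is an estimate.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical table: project -> (timeout days, expiration days).
# The entry below is an assumed example, to be replaced with real values
# copied from the project summary page.
PROJECT_LIMITS_DAYS = {17408: (1.0, 2.0)}

# Matches lines such as
# "03:56:44:WU01:FS01:Received Unit: id:01 ... project:17408 run:0 ..."
RECEIVED = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:Received Unit:.*\bproject:(\d+)\b"
)

def describe_assignment(line, log_date):
    """Return estimated timeout/expiry for a 'Received Unit' log line.

    log_date is the 'YYYY-MM-DD' date the line belongs to, because the
    log lines themselves only carry the time of day.
    Returns None for other lines or for projects missing from the table.
    """
    match = RECEIVED.match(line)
    if match is None:
        return None
    project = int(match.group(2))
    if project not in PROJECT_LIMITS_DAYS:
        return None
    received = datetime.strptime(
        f"{log_date}T{match.group(1)}", "%Y-%m-%dT%H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    timeout_days, expiry_days = PROJECT_LIMITS_DAYS[project]
    return {
        "project": project,
        "received": received,
        "timeout": received + timedelta(days=timeout_days),
        "expires": received + timedelta(days=expiry_days),
    }

line = ("03:56:44:WU01:FS01:Received Unit: id:01 state:DOWNLOAD "
        "error:NO_ERROR project:17408 run:0 clone:668 gen:85 core:0xa7")
info = describe_assignment(line, "2020-10-23")
print(info["project"], info["timeout"].isoformat(), info["expires"].isoformat())
```

Running something like this over the log once a day and saving the output would keep a record that survives the WU being dumped.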

Re: Very short final deadline? 48 h?

Posted: Sun Oct 25, 2020 11:23 am
by Neil-B
1 slot with 2 cores is the way to go and should use both cores 100% .. a single core on a Mac mini will struggle to meet expirations tbh ... You can always check the timeout and expiration of past WUs, as the top level project details are posted in the links previously provided and the deadlines are the same for all WUs in each project.

It is possible (due to an old bug) for the first WU after a slot is set up to use only a single core (which might be why you have seen 50%), but after that it should run fine on 2 cores (and pretty much twice as quickly) - you may even find the WUs get in close to Timeout ... If you let one of the slots finish its WU, delete that slot, and then increase the cores on the remaining slot to 2, you should get what you want ... If you increase the slot to 2 cores mid-WU, the change only takes effect at the next WU, not mid-WU - but it won't harm the running WU to do so.

Re: Very short final deadline? 48 h?

Posted: Wed Oct 28, 2020 3:22 pm
by bruce
If, in response to the old bug mentioned above, you decided to reconfigure the Mini to run two independent slots, that was a bad idea. Before a new WU is downloaded, reconfigure your system for one slot using two threads (CPUs). If the slot is set for 1 CPU when a WU is downloaded, you cannot reconfigure that WU to use both CPUs.