Frequently downloading work units with expirations too short

Posted: Thu Sep 03, 2020 1:34 am
by MajorCaliber
In the last few weeks I have had a recurring problem with client 7.6.13 downloading work units with expirations so short that they do not have time to finish. It happens more frequently on my GPU slot, but also happens with the CPU slot. I have a not insubstantial laptop with a Core i7-8550U CPU @ 1.80GHz (4 cores, 8 threads) and a Radeon M530 GPU. The CPU is good for 8-20K PPD depending on temperature. The GPU is good for 15-30K PPD, also depending on temperature. I am running FAH 24/7 at maximum priority. As an example, I was watching when my current GPU unit was assigned: at that moment it was worth 45,700 points and expected to complete in 2.14 days, but with precisely 2 days before expiration. I used to get expiration dates of 4 or 5 days. This unit will eventually expire and get dumped. I can’t even find a way to dump it now and save the 2 days of wasted effort.

How can I fix this and is it only happening to me?
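
To make the arithmetic in the example above concrete, here is a minimal sketch that compares the client's estimated completion time with the time left before expiration. The function name and parameters are hypothetical, and it assumes the 2.14-day estimate is accurate and the folding rate stays constant.

```python
def deadline_shortfall_hours(eta_days: float, days_to_expiration: float) -> float:
    """Hours by which the WU would finish after expiration.

    A value of zero or less means the WU is projected to finish on time.
    """
    return (eta_days - days_to_expiration) * 24.0


# Figures from the post: 2.14 days estimated, exactly 2 days until expiration.
shortfall = deadline_shortfall_hours(eta_days=2.14, days_to_expiration=2.0)
print(f"Projected to miss expiration by {shortfall:.2f} hours")
```

With those figures the unit would miss its expiration by a little over three hours, which is why it cannot finish even when folding 24/7.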

Re: Frequently downloading work units with expirations too s

Posted: Thu Sep 03, 2020 8:01 am
by JimboPalmer
Welcome to Folding@Home!

https://www.techpowerup.com/gpu-specs/r ... bile.c3002

Your GPU is fully supported: it has OpenCL support of 2.0 (1.2 is the minimum) and supports Double Precision floating point math (FP64)

However, it is a laptop GPU with pretty mild specs: 384 cores and a clock speed of only 750 MHz are both low.

https://ark.intel.com/content/www/us/en ... 0-ghz.html

The CPU is fully modern and should be having no difficulty if it is not overheating.

The Windows utility I use to monitor temps is Speccy.

https://www.ccleaner.com/speccy/download/standard

Others exist, but I like Speccy.

If you are thermally throttled, you could try only using one or the other of the CPU and GPU, or buying a cooling pad. Reducing the heat should make both faster.

Re: Frequently downloading work units with expirations too s

Posted: Thu Sep 03, 2020 9:07 am
by gunnarre
I assume you let the work unit fold at least 5% before you look at the PPD estimation. The estimation in Advanced Control is often completely off in the beginning of a WU.

The FAH project is working to better segment the work units so they get assigned to a GPU that can complete them on time, and utilize it fully - but this work is not yet complete, so you might get sub-optimal work units for your GPU still.

Most laptops with discrete graphics are made for workloads which either load the GPU or use the CPU in short bursts - not both at the same time. As Jimbo says, a cooling pad and folding on only the GPU or only the CPU (set it manually to 8 threads) seems to be the way to go - the CPU has AVX support, so it should be no slouch on CPU folding.

Re: Frequently downloading work units with expirations too s

Posted: Sun Sep 06, 2020 1:39 am
by PantherX
Welcome to the F@H Forum MajorCaliber,

Please note that there are plans on optimizing the allocation of WU to GPUs based on the hardware specifications to ensure that the most optimum WU is allocated to it. While this would solve your issue to a certain degree, there's no ETA on it and would take time to implement.

Re: Frequently downloading work units with expirations too s

Posted: Sun Sep 06, 2020 3:29 am
by MajorCaliber
PantherX wrote:Welcome to the F@H Forum MajorCaliber,

Please note that there are plans on optimizing the allocation of WU to GPUs based on the hardware specifications to ensure that the most optimum WU is allocated to it. While this would solve your issue to a certain degree, there's no ETA on it and would take time to implement.
Thanks for the response, Panther.

While that kind of sophisticated logic is certainly the "classier" solution, a far simpler one would be just to give us longer expiration times. I used to get typically 4 or 5 day periods to complete a chunk of work, either a CPU chunk that takes 12 hours or a 45K point GPU chunk that would take 2.5 days. Everything was fine then. I still get 5 days for CPU chunks, but my last few GPU chunks have only been allocated exactly 48 hours before expiration and as a result the project has not received GPU work for me in weeks.

I understand there must be SOME expiration to account for work units you will never get back, but I have to believe that you are losing more than you are gaining with such short expirations. The situation is even worse because when the GPU chunk finally does time out, the slot does not automatically download a new chunk. The slot stays idle until I restart my computer. That is the only way I have to download a new chunk.

As long as I have the attention of an "insider", is there any way I can dump a work unit in process when it is apparent it will not finish before it expires? At least I can roll the dice sooner to see if I can get a packet I can complete?

Re: Frequently downloading work units with expirations too s

Posted: Sun Sep 06, 2020 4:14 am
by PantherX
MajorCaliber wrote:...my last few GPU chunks have only been allocated exactly 48 hours before expiration and as a result the project has not received GPU work for me in weeks...
Please note that only Moonshot WUs have the shortened timeline, due to the requirements of the chemists that F@H is collaborating with. Given that it's a multidisciplinary collaboration across various companies/universities and countries, there's virtually no wiggle room for pushing out the deadlines because of the domino effect. Thus, the Moonshot WUs (Project 134XX series) have the reduced timeline, and this will not change for the foreseeable future. However, other Projects do have normal time frames.
MajorCaliber wrote:...but I have to believe that you are losing more than you are gaining with such short expirations...
Can you please post the log file? Ensure you include the first 100 lines which will inform us of what the system configuration is and what the client settings are. If you require guidance, please view this topic: viewtopic.php?f=24&t=26036
The reason is that you seem to have a mobile AMD GPU folding 24/7 that can't meet the Moonshot deadline, so if we have those details, we might be able to exclude it from those Projects. However, there's no guarantee of that.
Moreover, we do have weekly sprints, which provide an indication of how many WUs have been folded for the Moonshot Project: https://twitter.com/jchodera/status/1302297941466427393
MajorCaliber wrote:...The situation is even worse because when the GPU chunk finally does time out, the slot does not automatically download a new chunk. The slot stays idle until I restart my computer. That is the only way I have to download a new chunk...
Without the log file to confirm, it seems that you are encountering a known network issue (https://github.com/FoldingAtHome/fah-issues/issues/983) on a regular basis which seems a bit unusual.
MajorCaliber wrote:...is there any way I can dump a work unit in process when it is apparent it will not finish before it expires? At least I can roll the dice sooner to see if I can get a packet I can complete?
Unfortunately, there's no official way and dumping WUs to cherry-pick is actually not allowed and hurts the progress of science. Instead, an alternative is to post the Project numbers of the long WU which can allow us to reach out to the researchers directly and see if we can do something about it since there might be other donors in your situation too.
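
For the log request above, a minimal sketch of pulling the first 100 lines out of a log file so they can be pasted into a post. The function name is hypothetical, and the path to the client's log file is an assumption you would need to supply yourself; the linked topic remains the authoritative guide on where to find it.

```python
from itertools import islice
from pathlib import Path


def first_lines(log_path: str, count: int = 100) -> list[str]:
    """Return up to `count` lines from the start of a text file.

    `log_path` is supplied by the caller; this sketch does not assume
    where the client stores its log.
    """
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        # islice stops after `count` lines without reading the whole file.
        return [line.rstrip("\n") for line in islice(handle, count)]
```

Calling `print("\n".join(first_lines("path/to/your/log.txt")))` with your own path prints the section that shows the system configuration and client settings.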

Re: Frequently downloading work units with expirations too s

Posted: Mon Sep 14, 2020 7:52 pm
by MajorCaliber
Panther X,

Thanks for the help. You have been truly informative and addressed my concerns.

Just a bit of an update. The situation seems to have resolved itself, at least for me. I now get GPU work blocks with more reasonable expirations of 5 days or so although I usually complete them in a bit more than 2 days, as opposed to the 48 hour expirations I was getting. Also I have not had the known network issue you referred to. The link you provided seemed to describe my problem.

I did, however, figure out how to dump a work block that I know will not complete and get a new one. I'm not sure what you meant by cherry picking, but as soon as I get a work block with an expiration so short I know it will not finish, it is a foregone conclusion that the server will not get it back from me and it will need to be assigned to somebody else after expiration. There is no course of action that will get that completed work back to the server any sooner. The only options are that I dump it and get to work on a new block ASAP or I continue to waste time on it and not start on a potentially viable block for 48 hours. It seems to me the science progresses faster with the second option.

I no longer have any log entries from expired units since I have restarted for travel and updates since then. If it happens again, I will post the log segment. I check my stats once or twice a day and it is always gratifying to see me climbing my team's (Anandtech) rankings.

Re: Frequently downloading work units with expirations too s

Posted: Mon Sep 14, 2020 8:53 pm
by ThWuensche
That unfortunately might be an observation changing soon :( . I guess your problem is with moonshot WUs and the server for these has been down for a few hours. So it's likely that you will get one of these WUs again after finishing the current one.

When I had problems with moonshot WUs I had set the disease selection to cancer. After that I still did get COVID WUs, but no moonshot WUs any more. The other WUs have longer timeouts, you might try that.

Re: Frequently downloading work units with expirations too s

Posted: Tue Sep 15, 2020 3:54 am
by PantherX
MajorCaliber wrote:...I'm not sure what you meant by cherry picking, but as soon as I get a work block with an expiration so short I know it will not finish, it is a foregone conclusion that the server will not get it back from me and it will need to be assigned to somebody else after expiration. There is no course of action that will get that completed work back to the server any sooner. The only options are that I dump it and get to work on a new block ASAP or I continue to waste time on it and not start on a potentially viable block for 48 hours. It seems to me the science progresses faster with the second option...
Generally speaking, cherry picking is when donors dump WUs that don't provide them a high enough PPD even though they can fold it before the expiration deadline. In your case, since the assigned WU can't be completed before the expiration deadline, you're dumping it which is a workaround. I would personally report it here in the Forum which allows us to inform the Project Owners and fix it to prevent donors from having this issue.

However, there are plans to automate this process where the classification of GPUs is done on a data-driven approach instead of using GPU architecture. Work is underway but there's no ETA.

Re: Frequently downloading work units with expirations too s

Posted: Tue Sep 15, 2020 4:31 am
by JohnChodera
Just wanted to apologize for the Moonshot server downtime. We had two different failures occur at once here: The first was the work server not coming back after configuration updates; the second was that some WUs were sent out without a collection server attached. We're working to make sure we don't have an issue collision like this again!

We're also extending the deadlines with the Moonshot projects, and working to accelerate the calculations. Exciting updates are happening soon.

Thanks for sticking with us!

~ John Chodera // MSKCC

Re: Frequently downloading work units with expirations too s

Posted: Tue Sep 15, 2020 7:06 am
by MajorCaliber
PantherX wrote: Generally speaking, cherry picking is when donors dump WUs that don't provide them a high enough PPD even though they can fold it before the expiration deadline. In your case, since the assigned WU can't be completed before the expiration deadline, you're dumping it which is a workaround. I would personally report it here in the Forum which allows us to inform the Project Owners and fix it to prevent donors from having this issue.
I see what you mean. I would never dump a work unit I could finish just to get a "better one". Somebody has to do that work, it might as well be me. I will report it back here if it happens again. So far so good for the last few GPU units. The CPU units have never had that problem for me.

Re: Frequently downloading work units with expirations too s

Posted: Tue Sep 15, 2020 7:14 am
by MajorCaliber
JohnChodera wrote:Just wanted to apologize for the Moonshot server downtime. We had two different failures occur at once here: The first was the work server not coming back after configuration updates; the second was that some WUs were sent out without a collection server attached...

We're also extending the deadlines with the Moonshot projects, and working to accelerate the calculations.
I had a few instances of that second problem of no collection server assigned about a month ago but not since. I don't know about others, but I have benefited tremendously from the extended deadlines already. Thanks for helping us help you help us, LOL.

limiting WU selection to short ones?

Posted: Mon Sep 21, 2020 2:13 am
by sejtam
I am folding on an old 2-CPU Mac mini (no GPU) and have found that a few times now I got WUs with an ETA of 5 days which ultimately take quite a bit longer than 5 days (after 2.5 days one was at 44% and still showed 4.99 days remaining, so it came very close to or went over the expiry time, in which case the effort would be wasted).

Is there a way to tell the client to collect only smaller work units, so that they can be sure to finish before the expiry?
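
As a rough check on the figures above, here is a minimal sketch that extrapolates the total run time from the progress so far, instead of relying on the client's displayed ETA. It assumes a constant folding rate; the function names are hypothetical, and the expiration values used at the end are illustrative assumptions, since the actual expiry is not stated.

```python
def projected_total_days(elapsed_days: float, fraction_done: float) -> float:
    """Extrapolate the total run time, assuming a constant folding rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in the range (0, 1]")
    return elapsed_days / fraction_done


def will_finish(elapsed_days: float, fraction_done: float, expiration_days: float) -> bool:
    """True if the extrapolated total run time fits within the expiration."""
    return projected_total_days(elapsed_days, fraction_done) <= expiration_days


# Figures from the post: 44% complete after 2.5 days.
total = projected_total_days(elapsed_days=2.5, fraction_done=0.44)
remaining = total - 2.5
print(f"Projected total: {total:.2f} days, remaining: {remaining:.2f} days")

# Hypothetical expirations, for illustration only.
print(will_finish(2.5, 0.44, expiration_days=5.0))
print(will_finish(2.5, 0.44, expiration_days=6.0))
```

At that rate the unit needs roughly 5.7 days in total, so it would overrun a 5-day window even though the displayed ETA suggested otherwise.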

Re: Frequently downloading work units with expirations too s

Posted: Mon Sep 21, 2020 2:26 am
by sejtam
MajorCaliber,
MajorCaliber wrote: I did, however, figure out how to dump a work block that I know will not complete and get a new one.
Could you share how you do that? The only way I found was to delete all my slots and then re-create them (which means both slots will have their WUs dumped).

I have an issue where my old MacBook with 2 CPUs (no GPUs here) sometimes gets a workload that says 5 days and, after 3 days at 44%, still estimates 5 more days, which is then past the expiration. I'd rather not finish those workloads since they won't be accepted.

And yes, I'd need a way to select shorter workloads as well...

Re: limiting WU selection to short ones?

Posted: Mon Sep 21, 2020 7:27 am
by Hopfgeist
I don't think there is a way of doing this.

I stopped folding on my old Athlon II Neo N36L for the same reason (it is an otherwise very nice and compact HP MicroServer which acts as a remote backup server, so its CPU is mostly idle). It was doing far less than 1000 PPD (whereas my main server would do closer to 100,000), and often would not finish the WU before the timeout. It would not normally run into the final deadline, so I would still get some points, but since the WU had almost always been reassigned and finished by someone with a much faster machine before I could deliver a result, it felt a bit like cheating. I wasn't contributing anything.

Btw, all Intel Mac Minis have some sort of GPU, but GPU-folding is not supported on macOS.

Cheers,
HG