128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 6:38 pm
by rusty
Hello,

I observed the following a few hours ago and thought I would report it.

Log for WU [Project:11759 (Run 0, Clone 5086, Gen 42)]
(Upload Successful. Final credit estimate, 132696.00 points)

14:36:48:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
14:36:50:WU00:FS01:0x22:Saving result file ../logfile_01.txt
14:36:50:WU00:FS01:0x22:Saving result file checkpointState.xml
14:36:50:WU00:FS01:0x22:Saving result file checkpt.crc
14:36:50:WU00:FS01:0x22:Saving result file positions.xtc
14:36:50:WU00:FS01:0x22:Saving result file science.log
14:36:50:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
14:36:51:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:36:51:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11759 run:0 clone:5086 gen:42 core:0x22 unit:0x0000004680fccb0a5e6e859a0f61666c
14:36:51:WU00:FS01:Uploading 50.00MiB to 128.252.203.10
14:36:51:WU00:FS01:Connecting to 128.252.203.10:8080
14:37:17:WU00:FS01:Upload 0.75%
14:37:23:WU00:FS01:Upload 17.75%
14:37:29:WU00:FS01:Upload 46.75%
14:37:35:WU00:FS01:Upload 90.00%
14:37:36:WU00:FS01:Upload complete
14:37:36:WU00:FS01:Server responded WORK_ACK (400)
14:37:36:WU00:FS01:Final credit estimate, 132696.00 points
14:37:36:WU00:FS01:Cleaning up

Checking the WU Status Page (link):
"Not found."

I waited about 3 hours before posting this report, to give whatever cron job (or similar) updates the WU Status time to run, but alas...

This particular machine has returned other WUs since 11759 (0, 5086, 42), and those show up as having been returned (for example: 11764 (0, 3122, 33)).

Hopefully returned results are not getting "lost".
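
For anyone who wants to check a log excerpt like the one above without reading it line by line, here is a minimal Python sketch. It assumes the log lines look exactly like the ones pasted in this post ("Sending unit results: ... project:N run:N clone:N gen:N" and "Server responded WORK_ACK"). The function name `upload_summary` is made up for illustration and is not part of any F@H tool.

```python
import re

# Patterns based only on the log lines quoted in this thread (an assumption:
# other client versions may format these lines differently).
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")
ACK_RE = re.compile(r"Server responded WORK_ACK")


def upload_summary(log_text):
    """Return (prcg, acked) for a log excerpt.

    prcg is a (project, run, clone, gen) tuple taken from the
    "Sending unit results" line, or None if no such line is present.
    acked is True if a WORK_ACK response line appears in the excerpt.
    """
    prcg = None
    acked = False
    for line in log_text.splitlines():
        if "Sending unit results" in line:
            match = PRCG_RE.search(line)
            if match:
                prcg = tuple(int(g) for g in match.groups())
        elif ACK_RE.search(line):
            acked = True
    return prcg, acked


sample = """\
14:36:51:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11759 run:0 clone:5086 gen:42 core:0x22 unit:0x0000004680fccb0a5e6e859a0f61666c
14:37:36:WU00:FS01:Upload complete
14:37:36:WU00:FS01:Server responded WORK_ACK (400)
"""

print(upload_summary(sample))  # ((11759, 0, 5086, 42), True)
```

This only tells you what the client logged. It says nothing about what the server or the stats database has recorded.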

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 7:57 pm
by davidcoton
Experience suggests credit is almost never lost, but frequently delayed. The stats server is under pressure (like the whole F@H system), but is not a high priority compared to the servers that affect the science.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 8:00 pm
by rusty
Thanks. I'm not really concerned about the stats/credit -- just the loss of the work.

Even if the "points" aren't updated, I would think that the WU status tool would still be correct @ https://apps.foldingathome.org/wu (unless there is a "real" problem, of course). Otherwise, the tool is a useless novelty, which I would assume is not the case.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 8:13 pm
by davidcoton
AFAICT the status tool uses the stats database to get its information.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 8:53 pm
by rusty
Interesting, I would have expected WU status queries to go one level deeper than the statistics database; otherwise, I wouldn't have bothered posting.

In any case, I'll keep an eye on this and bump the thread if this WU still doesn't register in the next week or so.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 9:02 pm
by PantherX
This is a simplified version of how you get points:

Finished WU uploaded to WS/CS -> WS/CS verifies WU and calculates points -> Stats Server gets all the data from WS/CS and processes it -> Stats are updated

Thus, if you have successfully uploaded the WU and the server has acknowledged it, I am pretty certain that you will eventually get the points. Do note that collecting any missing points is generally done manually and is rather labour-intensive, so I would expect it to be addressed once the supply of WUs reliably meets demand. I can't remember an instance where any points were lost.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 10:07 pm
by rusty
Okay, I have to admit, I am getting pretty frustrated here.

I am not posting to inquire about my point total or the credit for completing this WU.

Like I mentioned before, I do not care about the points.

I was posting because it appeared that the collection server did not properly record the receipt of the WU. I simply wanted it to be on the admin staff's radar if this was the case.

The confusion here stems from the fact that I believed I could use the WU Status tool to check the true status of a WU. Now that I understand that that is, in fact, not the case, I believe we are square here. Thanks for your help.

Again, if the WU doesn't show as being received by the WU Status tool in a week or so, I will bump the thread.

Re: 128.252.203.10 Losing Work Units?

Posted: Sat Apr 04, 2020 10:44 pm
by uyaem
Fair enough, just wanna say I'm in the same boat at the moment.
Completed the following today, none of which are available on the ../wu app.
It has been the case before; it just requires some patience, as the delay varies a LOT. :)

FS00:0xa7:Project: 16411 (Run 705, Clone 0, Gen 4)
FS01:0x22:Project: 13879 (Run 0, Clone 68, Gen 20)
FS00:0xa7:Project: 14592 (Run 705, Clone 2, Gen 15)
FS00:0xa7:Project: 16405 (Run 0, Clone 700, Gen 16)
FS00:0xa7:Project: 16418 (Run 0, Clone 951, Gen 39)
FS00:0xa7:Project: 14614 (Run 239, Clone 0, Gen 2)
FS01:0x22:Project: 14541 (Run 0, Clone 1724, Gen 10)

Re: 128.252.203.10 Losing Work Units?

Posted: Sun Apr 05, 2020 9:44 am
by uyaem
Please check again with the WU status tool; all mine have been processed successfully now.

Re: 128.252.203.10 Losing Work Units?

Posted: Sun Apr 05, 2020 2:02 pm
by Neil-B
They have caught up a bit (circa 1.1m WUs): https://apps.foldingathome.org/credit-log

Re: 128.252.203.10 Losing Work Units?

Posted: Sun Apr 05, 2020 2:23 pm
by rusty
Thanks everyone. Fortunately, it looks like the situation was the same for me. The original WU in question is now showing as received, so no worries.

I learned an important lesson here about the WU Status tool: it's not necessarily up-to-date and shouldn't be used to try and identify potential problems.

Neil-B:
The link you posted to the Credit Log app is a nice resource to have (oddly, not listed at https://apps.foldingathome.org). Thanks!

Re: 128.252.203.10 Losing Work Units?

Posted: Mon Apr 06, 2020 4:20 pm
by davidcoton
The summary is this: if you get credit or see the WU in the WU status tool, then it has been received.
Otherwise, if the upload in your log looks correct, the science has almost certainly been received but there is a delay in the link to stats. The stats server will usually be a lower priority than anything on the science path.
If the upload did not succeed, the client will retry it until it succeeds or the WU expires.
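
The three cases above can be written down as a small decision function. This is only a sketch of the rule of thumb in this post, not anything the F@H servers or client actually run. The function name and its three boolean inputs are made up for illustration.

```python
def wu_received(credited, in_status_tool, log_shows_ack):
    """Classify a WU using the rule of thumb from this thread.

    credited:        points for the WU appear in your stats
    in_status_tool:  the WU status tool lists the WU
    log_shows_ack:   the client log shows a completed upload and a
                     server acknowledgement (e.g. WORK_ACK)
    """
    if credited or in_status_tool:
        return "received"
    if log_shows_ack:
        return "almost certainly received; stats delayed"
    return "not uploaded yet; client retries until success or expiry"


# The situation in the first post: acknowledged upload, status tool says "Not found."
print(wu_received(credited=False, in_status_tool=False, log_shows_ack=True))
```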

The system is designed to be robust -- loss of science is in no-one's interest. That implies that the vast majority of WUs will be uploaded before they expire. Even with the recent step-change in the work throughput, very little work has been lost (sorry I don't have figures, but certainly nothing that worries the F@H team). There were delays in work allocation and return while servers were overloaded, but those should mostly be solved now (several new powerful servers are online).

Once a WU is uploaded, loss of points credit is also extremely rare, though the credit may be delayed. Work on the stats system has been carried out; if more is necessary it will be done, but at lower priority than the science systems.