Odd upload failure

Posted: Mon Nov 24, 2014 10:51 am
by billford
Log extract:

******************************* Date: 2014-11-22 *******************************
.
.
09:28:50:WU02:FS00:0xa4:- Shutting down core
09:28:50:WU02:FS00:0xa4:
09:28:50:WU02:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
09:29:01:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:29:02:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:9015 run:467 clone:2 gen:51 core:0xa4 unit:0x00000042664f2de453e55e1e332f655a
09:29:02:WU02:FS00:Uploading 1.64MiB to 171.64.65.124
09:29:02:WU02:FS00:Connecting to 171.64.65.124:8080
And there it stayed… If the server responded (and I suspect it did), the client didn't see it, but there was no timeout and the client didn't go into the usual retry-at-increasing-intervals sequence. (It carried on as usual with a newly downloaded WU, of course.)

I only noticed it on this morning's check around the clients; after a reboot it connected to 171.64.65.124 and uploaded the WU with no bother.

I'm fairly sure it was caused by some sort of OS/client/network problem at my end, but I'd be interested in comments from others.

In particular, is this what I would expect from the known bug where FAH doesn't recover gracefully from a loss of internet connection during upload? I haven't had that happen before.

Though I should add that if there were such a break it would have been very brief: there's no sign of it in the router log or my monitoring app.
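
For anyone unfamiliar with the "retry-at-increasing-intervals" behaviour mentioned above, here is a rough sketch of the general idea. It is not taken from the FAH client: the function name, starting delay, growth factor and cap are all made up for illustration.

```python
def backoff_delays(base=60.0, factor=2.0, cap=3600.0, attempts=6):
    """Return the waits (in seconds) before each successive retry.

    All defaults are hypothetical; the real client's schedule may differ.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        # Never wait longer than the cap, however many retries have failed.
        delays.append(min(delay, cap))
        delay *= factor
    return delays


if __name__ == "__main__":
    for n, wait in enumerate(backoff_delays(), start=1):
        print(f"retry {n}: wait {wait:.0f}s")
```

The point is that a schedule like this only starts once the client decides an attempt has failed, which never happened here.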

Re: Odd upload failure

Posted: Mon Nov 24, 2014 6:18 pm
by bruce
Odd is a good word for it.
It's not clear from your log whether or not a connection was ever established with that server. In all the other reports of Internet interruptions that I've seen, there were % reports in the log, so it was clear the connection was established and then interrupted, followed by a hang. Maybe this is the same bug and maybe not.

Restarting the client is still the only known recovery. The WU upload should restart.

Re: Odd upload failure

Posted: Mon Nov 24, 2014 6:23 pm
by davidcoton
billford wrote: In particular, is this what I would expect from the known bug where FAH doesn't recover gracefully from a loss of internet connection during upload? I haven't had that happen before.
While it is not entirely clear, I think yes. What seems to happen is that the ACK packet (very small) gets lost en route from server to client, and the link just hangs. The logs do not reveal enough to be certain about what happened, but the programmers ought to be able to solve it when it gets enough priority to be looked at. (Disclaimer -- I haven't looked at the code. I could be completely wrong :twisted: )
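
To illustrate the kind of hang described above (purely a sketch, since I haven't seen the client code either): a blocking read with no timeout will wait forever for a reply that never arrives, whereas a read with a timeout turns the missing reply into an error the caller can retry on. The helper name `wait_for_ack` and the timeout values are assumptions for the example, and a local socket pair stands in for the client/server link.

```python
import socket


def wait_for_ack(sock, timeout_s):
    """Wait up to timeout_s seconds for a reply; return None if none arrives.

    Hypothetical helper, not part of the FAH client.
    """
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(16)
    except socket.timeout:
        # No reply within the limit: report failure instead of hanging.
        return None
    # An empty read means the peer closed the connection without replying.
    return data or None


if __name__ == "__main__":
    client, server = socket.socketpair()
    # The "server" stays silent, as if its reply had been lost.
    print(wait_for_ack(client, 0.5))
    # Now the reply does arrive.
    server.sendall(b"ACK")
    print(wait_for_ack(client, 0.5))
    client.close()
    server.close()
```

With a `None` result the caller could fall into the normal retry sequence rather than sitting on the connection indefinitely.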

Re: Odd upload failure

Posted: Mon Nov 24, 2014 6:54 pm
by billford
Thanks both. Looks like I'll have to put it down as another of life's little mysteries.

And remember to check the clients a bit more often: it was so late it only got 530 points, and I probably lost more than that with the reboot and the 780 Ti in the same machine having to restart from a checkpoint :shock: