Re: 155.247.166.220 downloads stalled

Posted: Thu May 07, 2020 5:12 pm
by vvoelz
Thanks rickoic -- high traffic remains an issue, so we might put some assignment filters on to see if that helps.

Re: 155.247.166.220 downloads stalled

Posted: Fri May 08, 2020 11:37 am
by rickoic
Well, I got woken up early by thunderstorms moving through, so I checked and found only 1 GPU frozen on download at 3.65%. It had sat there for 5 hours or so.

Re: 155.247.166.220 downloads stalled

Posted: Fri May 08, 2020 11:41 am
by vvoelz
P.S. Our head software dev is "working on a rewrite of the WS networking code that should help a lot with uploads/downloads." Fingers crossed.

Re: 155.247.166.220 downloads stalled

Posted: Fri May 08, 2020 12:51 pm
by rickoic
Guru/witchdoctor/medicine man. Wish him heap big magic.

Re: 155.247.166.220 downloads stalled

Posted: Sun May 10, 2020 4:08 pm
by info2x
I wish them luck as well.

After several days of not being assigned to this server, I got an assignment and it failed to download again.

Re: 155.247.166.220 downloads stalled

Posted: Mon May 11, 2020 2:28 am
by HaloJones
Just found one and had to reboot to clear it.

Re: 155.247.166.220 downloads stalled

Posted: Mon May 11, 2020 2:10 pm
by rickoic
Sat had 1 to reboot.
Sun had 2 to reboot.
Mon had 2 to reboot.
Plus, throughout the day Sun, I caught 3 more that I had to reboot.

Tks
Rick

Re: 155.247.166.220 downloads stalled

Posted: Mon May 18, 2020 5:14 pm
by rickoic
Appears Tech Guru had heap big magic. Been 3 days or more since I've had a problem with the server.

Tks for the good work.

Re: 155.247.166.220 downloads stalled

Posted: Tue May 26, 2020 12:46 am
by cfhdev
The last few days this has been stalling again.

Re: 155.247.166.220 downloads stalled

Posted: Tue May 26, 2020 7:48 am
by PantherX
I can access the landing page without issues. Thus, I would chalk it up to peak traffic potentially saturating the bandwidth. If the situation worsens, please post the log file so we can investigate it further :)

Re: 155.247.166.220 downloads stalled

Posted: Tue May 26, 2020 3:50 pm
by cfhdev
I can too. I don't think the issue is just a bandwidth issue. The client should have a timeout period after which it will either dump the process and try again or switch to another server.

Example from the log of WU02:FS02 on one client, before and after the reboot. Normally I dump the WU from /var/lib/fahclient/work; this time I just rebooted.

*********************** Log Started 2020-05-26T01:13:35Z ***********************
******************************* Date: 2020-05-26 *******************************
07:44:25:WU02:FS02:Connecting to assign1.foldingathome.org:80
07:44:25:WARNING:WU02:FS02:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
07:44:25:WU02:FS02:Connecting to assign2.foldingathome.org:80
07:44:26:WU02:FS02:Assigned to work server 155.247.166.220
07:44:26:WU02:FS02:Requesting new work unit for slot 02: RUNNING gpu:1:********* from 155.247.166.220
07:44:26:WU02:FS02:Connecting to 155.247.166.220:8080
07:44:26:WU02:FS02:Downloading 12.96MiB
******************************* Date: 2020-05-26 *******************************
******************************* Date: 2020-05-26 *******************************
*********************** Log Started 2020-05-26T15:22:32Z ***********************
*********************** Log Started 2020-05-26T15:22:32Z ***********************
15:23:19:WU02:FS02:Connecting to assign1.foldingathome.org:80
15:23:24:WARNING:WU02:FS02:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
15:23:24:WU02:FS02:Connecting to assign2.foldingathome.org:80
15:23:25:WU02:FS02:Assigned to work server 128.252.203.10
15:23:25:WU02:FS02:Requesting new work unit for slot 02: READY gpu:1:********* from 128.252.203.10
15:23:25:WU02:FS02:Connecting to 128.252.203.10:8080
15:25:34:WARNING:WU02:FS02:WorkServer connection failed on port 8080 trying 80
15:25:34:WU02:FS02:Connecting to 128.252.203.10:80
15:26:28:WU02:FS02:Downloading 86.23MiB
15:26:34:WU02:FS02:Download 20.58%
15:26:40:WU02:FS02:Download 36.38%
15:26:46:WU02:FS02:Download 45.23%
15:26:52:WU02:FS02:Download 52.69%
15:26:58:WU02:FS02:Download 61.32%
15:27:04:WU02:FS02:Download 68.71%
15:27:10:WU02:FS02:Download 77.84%
15:27:16:WU02:FS02:Download 89.22%
15:27:22:WU02:FS02:Download 97.77%
15:27:22:WU02:FS02:Download complete

It just stops/hangs in the middle of the download and never tries anything again until the reboot 8 hours later.
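
To make the timeout idea above concrete, here is a rough Python sketch of what "give up on a stalled transfer, retry, then switch servers" could look like. This is not FAHClient's code and not its protocol. The names `StallError`, `fetch`, and `fetch_with_fallback` are made up for illustration, and it assumes the server closes the connection when the transfer is finished.

```python
import socket


class StallError(Exception):
    """Raised when a transfer makes no progress within the stall timeout."""


def fetch(host, port, request, stall_timeout):
    """Read a full response, failing if any single recv() waits too long.

    Hypothetical helper: assumes the peer closes the connection when done.
    """
    with socket.create_connection((host, port), timeout=stall_timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            try:
                # The timeout applies to each recv(), not to the whole transfer.
                chunk = sock.recv(65536)
            except socket.timeout as exc:
                raise StallError(f"{host}:{port} stalled") from exc
            if not chunk:  # peer closed the connection: transfer finished
                return b"".join(chunks)
            chunks.append(chunk)


def fetch_with_fallback(servers, fetch_one, retries_per_server=2):
    """Try each server in order; move on after repeated stalls or errors."""
    last_error = None
    for server in servers:
        for _ in range(retries_per_server):
            try:
                return server, fetch_one(server)
            except (StallError, OSError) as exc:
                last_error = exc  # remember why the last attempt failed
    raise RuntimeError("all servers failed") from last_error
```

Because the timeout bounds each read rather than the total transfer time, a slow but still moving download would finish, while one that sits at the same percentage for hours would be dropped and retried or handed to the next server.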

Re: 155.247.166.220 downloads stalled

Posted: Wed May 27, 2020 3:17 am
by PantherX
When you encounter the network issue, there's no need to delete any files. Simply restart FAHClient. It's a known bug: https://github.com/FoldingAtHome/fah-issues/issues/983
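
Until the bug is fixed, spotting the hang early is most of the battle. Below is a minimal Python sketch that scans log lines for a download that started but never progressed or completed. It is only an illustration: the line format is inferred from the log excerpt posted in this thread, the function name `stalled_downloads` and the 30 minute threshold are made up, and it assumes a `Date:` line precedes the entries it applies to. It does not restart anything; that step is left to you.

```python
import re
from datetime import date, datetime, time, timedelta

# Formats inferred from the log excerpt in this thread (an assumption).
DATE_RE = re.compile(r"Date: (\d{4})-(\d{2})-(\d{2})")
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$"
)


def stalled_downloads(lines, now, max_idle=timedelta(minutes=30)):
    """Return {(wu, slot): last_seen} for downloads idle longer than max_idle."""
    current_date = None
    pending = {}  # (wu, slot) -> time of the last download-related line
    for raw in lines:
        line = raw.strip()
        if "Log Started" in line:
            pending.clear()  # client restarted, nothing is stuck any more
            continue
        m = DATE_RE.search(line)
        if m:
            current_date = date(*map(int, m.groups()))
            continue
        m = LINE_RE.match(line)
        if not m or current_date is None:
            continue  # unrecognised line, or no date seen yet
        hh, mm, ss, wu, slot, msg = m.groups()
        stamp = datetime.combine(current_date, time(int(hh), int(mm), int(ss)))
        if msg.startswith("Downloading ") or re.match(r"Download \d", msg):
            pending[(wu, slot)] = stamp
        else:
            # Completion, a new connection attempt, or a warning for this
            # WU/slot means it has moved past the download we were tracking.
            pending.pop((wu, slot), None)
    return {key: seen for key, seen in pending.items() if now - seen > max_idle}
```

Fed the first half of the log posted above, with `now` set a few hours after 07:44:26, this would flag WU02/FS02 as stuck. A cron job or similar could run it against the current log and alert you that FAHClient needs a restart.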

Re: 155.247.166.220 downloads stalled

Posted: Sat May 30, 2020 2:39 am
by cfhdev
Thanks for that bug page. It's interesting that it only happens for us on that one IP across multiple machines.

Re: 155.247.166.220 downloads stalled

Posted: Wed Jun 10, 2020 7:56 am
by Sparkly
This issue is still occurring a lot on this server/IP (155.247.166.220), by the looks of it. As far as I can see from my setups, which are currently running 12 slots total, it happens several times a day for me alone. Each time the process needs a restart, since there is no recovery and the slot hangs in Ready.

Re: 155.247.166.220 downloads stalled

Posted: Wed Jun 10, 2020 4:27 pm
by bruce
Yes, it happens on specific servers. They do get overloaded and data transfer rates become slower and slower ... and eventually it starts to hang, which leads to the need to restart FAHClient. Posting topics like this one often leads to a correction on the server side (except at night, when we can't contact the appropriate people if they happen to have a life).

155.247.166.220 (vav4) reportedly had two CSs, one of which has failed. We reported a problem last night and it looks like somebody is working on it. (It was rebooted 44 minutes ago.)