
FAH gets stuck every few days

Posted: Sun Feb 02, 2020 9:34 am
by Akaanc
Hello,

I run FAH on a dedicated computer; however, every 3-4 days I find it stuck on the CPU or GPU.

It says ready but doesn't download a new WU, or it says download but doesn't start the new WU.

I need to restart my PC each time to fix it. Is there a way to reset it automatically? It's wasting time and energy when my PC sits idle.

Re: FAH gets stuck every few days

Posted: Sun Feb 02, 2020 10:50 am
by HaloJones
We need to see the log. There was a problem with one particular server, but that's supposedly fixed now.

Download Bug Happening Again (*.220)

Posted: Mon Feb 17, 2020 10:20 pm
by gordonbb
Had one system stall yesterday and another today.

A download from 155.247.166.220 was the culprit today.

Client-type is "Advanced" on both of these systems.

Code:

21:29:36:WU00:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
21:29:37:WU01:FS01:Connecting to 65.254.110.245:8080
21:29:38:WU01:FS01:Assigned to work server 155.247.166.220
21:29:38:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:TU106 [GeForce RTX 2070 Rev. A] M 7465 from 155.247.166.220
21:29:38:WU01:FS01:Connecting to 155.247.166.220:8080
21:29:38:WU01:FS01:Downloading 7.40MiB
21:29:44:WU01:FS01:Download 5.91%
21:29:50:WU01:FS01:Download 10.98%
21:29:56:WU01:FS01:Download 16.89%
21:30:17:WU01:FS01:Download 20.27%
21:30:24:WU01:FS01:Download 21.96%
21:30:30:WU01:FS01:Download 27.02%
21:30:39:WU01:FS01:Download 29.56%
21:31:08:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
21:31:11:WU01:FS01:Download 30.40%
21:31:18:WU00:FS01:0x22:Saving result file ../logfile_01.txt
21:31:18:WU00:FS01:0x22:Saving result file checkpointState.xml
21:31:18:WU00:FS01:0x22:Saving result file checkpt.crc
21:31:18:WU00:FS01:0x22:Saving result file positions.xtc
21:31:19:WU00:FS01:0x22:Saving result file science.log
21:31:19:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
21:31:19:WU01:FS01:Download 32.09%
21:31:19:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:31:19:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11738 run:0 clone:630 gen:75 core:0x22 unit:0x000000558ca304f15e127b23b8bf8d21
21:31:19:WU00:FS01:Uploading 86.96MiB to 140.163.4.241
21:31:19:WU00:FS01:Connecting to 140.163.4.241:8080
21:31:25:WU00:FS01:Upload 9.99%
21:31:31:WU00:FS01:Upload 18.18%
21:31:37:WU00:FS01:Upload 25.30%
21:31:43:WU00:FS01:Upload 34.86%
21:31:49:WU00:FS01:Upload 43.20%
21:31:55:WU00:FS01:Upload 51.53%
21:32:01:WU00:FS01:Upload 59.87%
21:32:07:WU00:FS01:Upload 68.21%
21:32:13:WU00:FS01:Upload 76.54%
21:32:19:WU00:FS01:Upload 84.88%
21:32:25:WU00:FS01:Upload 93.22%
21:32:34:WU00:FS01:Upload complete
21:32:34:WU00:FS01:Server responded WORK_ACK (400)
21:32:34:WU00:FS01:Final credit estimate, 216117.00 points
21:32:34:WU00:FS01:Cleaning up
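
In the log above, WU01's last progress line is "Download 32.09%" at 21:31:19 and nothing follows for that work unit. A stall like this can be spotted from the log text alone. The following is a minimal sketch, not part of the FAH client: the function name `find_stalled_downloads`, the 10-minute timeout, and the rule "any other message for the same WU and slot ends the download" are all assumptions made for illustration, and midnight rollover is ignored.

```python
import re
from datetime import timedelta

# Client log lines look like "21:29:38:WU01:FS01:Downloading 7.40MiB";
# an optional level such as "ERROR:" may sit between the time and the WU id.
LINE = re.compile(r"^(\d\d):(\d\d):(\d\d):(?:[A-Z]+:)?(WU\d+):(FS\d+):(.*)$")
PROGRESS = re.compile(r"^Download (\d+(?:\.\d+)?)%$")


def find_stalled_downloads(log_text, now, timeout=timedelta(minutes=10)):
    """Return (wu, slot, last_percent, idle) for each download idle past timeout.

    `now` is a timedelta since midnight on the same clock as the log.
    Hypothetical helper for illustration only.
    """
    active = {}  # (wu, slot) -> (last percent seen, time of last progress)
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            continue
        hh, mm, ss, wu, slot, msg = m.groups()
        stamp = timedelta(hours=int(hh), minutes=int(mm), seconds=int(ss))
        key = (wu, slot)
        progress = PROGRESS.match(msg)
        if msg.startswith("Downloading "):
            active[key] = (0.0, stamp)          # download announced
        elif progress:
            active[key] = (float(progress.group(1)), stamp)
        else:
            # Assumption: any other message for this WU/slot means the
            # download is over (finished, failed, or not yet started).
            active.pop(key, None)

    stalled = []
    for (wu, slot), (pct, stamp) in active.items():
        idle = now - stamp
        if idle > timeout:
            stalled.append((wu, slot, pct, idle))
    return stalled


sample = """\
21:29:38:WU01:FS01:Downloading 7.40MiB
21:29:44:WU01:FS01:Download 5.91%
21:31:19:WU01:FS01:Download 32.09%
21:32:34:WU00:FS01:Upload complete
"""
stalled = find_stalled_downloads(sample, now=timedelta(hours=21, minutes=45))
print(stalled)
```

What to do once a stall is detected (restart the client, pause and unpause the slot, or just send an alert) is left open here, since that depends on the setup.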

Re: Download Bug Happening Again

Posted: Tue Feb 18, 2020 11:16 pm
by snapshot
Exactly the same here at about 19:40, same server. I was out so didn't spot it until 22:40.

Re: Download Bug Happening Again

Posted: Wed Feb 19, 2020 4:25 pm
by HaloJones
155.247.166.220 is not completing downloads. I had two clients this afternoon start a download that never finished or failed.

Re: Download Bug Happening Again

Posted: Wed Feb 19, 2020 7:41 pm
by HaloJones
Make that three clients

Re: Download Bug Happening Again

Posted: Wed Feb 19, 2020 9:32 pm
by ManiacJoe
Add me to the list for download problems with 155.247.166.220

Re: Download Bug Happening Again

Posted: Wed Feb 19, 2020 10:17 pm
by gordonbb
Another yesterday evening and again today on 155.247.166.220

Re: Download Bug Happening Again

Posted: Wed Feb 19, 2020 10:27 pm
by snapshot
Again tonight. One at 21:30 and another at 22:00.

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 1:30 am
by vvoelz
Thanks for alerting us! There may be some traffic issues at Temple, resulting in hung connections snowballing. We'll look into it, and restart the server in the meantime. --Vince

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 4:32 am
by gordonbb
vvoelz wrote: Thanks for alerting us! There may be some traffic issues at Temple, resulting in hung connections snowballing. We'll look into it, and restart the server in the meantime. --Vince
Thanks, Vince. I had another failure at 21:15 EST, and again at 22:15 after a restart.

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 7:32 am
by Gleep
I've been seeing this too, as recently as 5 minutes before this post.

Code:

*********************** Log Started 2020-02-20T07:26:10Z ***********************
07:26:10:WU03:FS02:Connecting to 65.254.110.245:8080
07:26:13:WU03:FS02:Assigned to work server 140.163.4.241
07:26:13:WU03:FS02:Requesting new work unit for slot 02: READY gpu:2:GP102 [TITAN Xp] 12150 from 140.163.4.241
07:26:13:WU03:FS02:Connecting to 140.163.4.241:8080
07:26:14:ERROR:WU03:FS02:Exception: Server did not assign work unit
07:26:14:WU03:FS02:Connecting to 65.254.110.245:8080
07:26:15:WU03:FS02:Assigned to work server 155.247.166.220
07:26:15:WU03:FS02:Requesting new work unit for slot 02: READY gpu:2:GP102 [TITAN Xp] 12150 from 155.247.166.220
07:26:15:WU03:FS02:Connecting to 155.247.166.220:8080
07:26:15:WU03:FS02:Downloading 5.82MiB
07:26:22:WU03:FS02:Download 3.22%
07:26:32:WU03:FS02:Download 5.37%
07:26:58:WU03:FS02:Download 6.45%
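
To put a number on how slow this download is, the average rate between two progress lines can be worked out from the announced size and the percentages. This is a rough sketch: the helper name `download_rate_kib_per_s` is made up for illustration, and the logged percentages are rounded, so the result is only approximate.

```python
def download_rate_kib_per_s(size_mib, pct_a, pct_b, seconds):
    """Average transfer rate in KiB/s between two progress readings."""
    transferred_kib = size_mib * 1024 * (pct_b - pct_a) / 100
    return transferred_kib / seconds


# From the log above: 5.82 MiB total, 3.22% at 07:26:22, 6.45% at 07:26:58.
rate = download_rate_kib_per_s(5.82, 3.22, 6.45, seconds=36)
print(f"{rate:.1f} KiB/s")
```

That comes out at roughly 5 KiB/s over those 36 seconds.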

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 1:28 pm
by HaloJones
vvoelz wrote: Thanks for alerting us! There may be some traffic issues at Temple, resulting in hung connections snowballing. We'll look into it, and restart the server in the meantime. --Vince
Hi Vince, did you get this server rebooted?

I ask because I've had two more clients get stuck today. Rebooting machines remotely is a real bore.

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 3:07 pm
by Joe_H
Server stats shows an uptime of 13 hours, looks like it was rebooted yesterday evening.

Re: Download Bug Happening Again

Posted: Thu Feb 20, 2020 3:46 pm
by HaloJones
Joe_H wrote: Server stats shows an uptime of 13 hours, looks like it was rebooted yesterday evening.
Then rebooting does not seem to resolve the problem.