Hung downloads from 171.67.108.52
Posted: Mon Oct 20, 2014 7:07 pm
by rwh202
Hi,
I've got 6 clients with hung downloads from 171.67.108.52. I've restarted a few times and my Kepler GPUs were given different servers, but no luck so far on the Maxwell cards - they get the same server each time. Downloads start very slowly, then just hang after a few percent (of 1.52MB).
Also, the Stanford website is even slower than normal - maybe more general network problems?
Anyone else experiencing issues?
Re: Hung downloads from 171.67.108.52
Posted: Mon Oct 20, 2014 7:20 pm
by ei57
Same here: upload is slow and sometimes fails, and download doesn't work at all. You can try shutting down FAH completely and restarting it, but you can't babysit it 24/7.
Re: Hung downloads from 171.67.108.52
Posted: Mon Oct 20, 2014 8:17 pm
by Breach
Same here, one of my WUs hung during download (which was very slow to begin with). Manually restarting FAHClient fixed it.
Re: Hung downloads from 171.67.108.52
Posted: Mon Oct 20, 2014 10:04 pm
by jadeshi
Hey guys,
Currently looking into this, sorry for the inconvenience. Looking at the logs on my end, this doesn't seem like a large-scale issue with the server itself, since I'm still assigning/getting a consistent stream of WUs. I'll try restarting the server and see if that fixes things. I suspect it may be an issue with the FAH client, since restarting the client seems to fix the issue for some people. Does reinstalling/updating the client help at all?
Re: Hung downloads from 171.67.108.52
Posted: Tue Oct 21, 2014 6:54 am
by rwh202
Thanks for looking into this.
I restarted my 4 remaining struggling clients this morning and they all got nice fast downloads from 171.67.108.52 and are happily folding again.
I think it's a known issue that the client doesn't handle dropped network connections elegantly and needs manual intervention.
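To illustrate what handling a dropped connection more gracefully might look like, here is a minimal Python sketch. It is not how FAHClient works internally; the names `receive_exact`, `download_with_retry`, `StalledTransfer` and the `connect` callback are all made up for this example. The idea is simply to put a timeout on each read, so a stalled transfer raises an error and can be retried instead of hanging forever.

```python
import socket


class StalledTransfer(Exception):
    """Raised when a transfer stops making progress (hypothetical name)."""


def receive_exact(sock, total_bytes, stall_timeout=30.0, chunk_size=65536):
    """Read exactly total_bytes, failing if no data arrives for stall_timeout seconds."""
    # The timeout applies to each recv() call, not to the whole transfer,
    # so a slow but steady download is still allowed to finish.
    sock.settimeout(stall_timeout)
    received = bytearray()
    while len(received) < total_bytes:
        try:
            chunk = sock.recv(min(chunk_size, total_bytes - len(received)))
        except socket.timeout:
            raise StalledTransfer(
                f"stalled at {len(received)} of {total_bytes} bytes")
        if not chunk:
            # An empty read means the peer closed the connection early.
            raise StalledTransfer(
                f"connection closed at {len(received)} of {total_bytes} bytes")
        received.extend(chunk)
    return bytes(received)


def download_with_retry(connect, total_bytes, attempts=3, stall_timeout=30.0):
    """Retry a stalled download on a fresh connection.

    `connect` is an assumed callback that returns a new connected socket.
    """
    last_error = None
    for _ in range(attempts):
        sock = connect()
        try:
            return receive_exact(sock, total_bytes, stall_timeout)
        except StalledTransfer as err:
            last_error = err
        finally:
            sock.close()
    raise last_error
```

With something along these lines, a download that stops after a few percent would be abandoned after the stall timeout and restarted on a new connection, which is roughly what restarting the client by hand achieves.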
Re: Hung downloads from 171.67.108.52
Posted: Tue Oct 21, 2014 2:03 pm
by 7im
Don't know if this server is on older code or newer, but maybe it's not handling dropped connections well either. Would like to see both fixed.
Re: Hung downloads from 171.67.108.52
Posted: Wed Oct 22, 2014 8:38 am
by WiSK
I'm having upload issues with 171.67.108.52 as well, but I don't have to restart - it seems to break the connection fine and fails over to a collection server, 171.65.103.160.
I am missing some points, so I checked the server stats page. But I can't see the collection server (171.65.103.160) listed there. Is it normal that the collection servers are not listed?
excerpt from log
23:23:08:WU01:FS01:Connecting to 171.67.108.201:80
23:23:09:WU01:FS01:Assigned to work server 171.67.108.52
23:23:09:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 980] from 171.67.108.52
23:23:09:WU01:FS01:Connecting to 171.67.108.52:8080
23:23:12:WU01:FS01:Downloading 1.52MiB
...
23:23:45:WU01:FS01:Download complete
23:23:45:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:797 clone:1 gen:96 core:0x17 unit:0x000000b46652edc45399f56058365161
23:23:45:WU01:FS01:Starting
...
23:24:07:WU01:FS01:0x17:Completed 0 out of 5000000 steps (0%)
23:25:46:WU01:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
...
02:10:58:WU01:FS01:0x17:Completed 5000000 out of 5000000 steps (100%)
...
02:11:03:WU01:FS01:0x17:Folding@home Core Shutdown: FINISHED_UNIT
02:11:03:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:11:03:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:9201 run:797 clone:1 gen:96 core:0x17 unit:0x000000b46652edc45399f56058365161
02:11:03:WU01:FS01:Uploading 8.36MiB to 171.67.108.52
02:11:03:WU01:FS01:Connecting to 171.67.108.52:8080
02:11:10:WU01:FS01:Upload 1.49%
02:11:16:WU01:FS01:Upload 3.74%
...
02:12:45:WU01:FS01:Upload 26.16%
02:13:05:WU01:FS01:Upload 26.91%
02:13:05:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
02:13:05:WU01:FS01:Trying to send results to collection server
02:13:05:WU01:FS01:Uploading 8.36MiB to 171.65.103.160
02:13:05:WU01:FS01:Connecting to 171.65.103.160:8080
02:13:13:WU01:FS01:Upload 2.24%
...
02:18:47:WU01:FS01:Upload 99.40%
02:18:49:WU01:FS01:Upload complete
02:18:49:WU01:FS01:Server responded WORK_ACK (400)
02:18:49:WU01:FS01:Final credit estimate, 36579.00 points
02:18:49:WU01:FS01:Cleaning up
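For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that looks for a failed upload to a work server followed by an upload to a different server. The line layout it assumes (`HH:MM:SS:[WARNING:]WUnn:FSnn:message`) is inferred only from the excerpt above, and the function name `find_upload_failovers` is made up for illustration; other client versions may log differently.

```python
import re

# Assumed line layout, based only on the excerpt above:
#   HH:MM:SS:[WARNING:|ERROR:]WUnn:FSnn:message
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:(WARNING|ERROR):)?(WU\d+):(FS\d+):(.*)$")
UPLOAD = re.compile(r"Uploading \S+ to (\S+)")


def find_upload_failovers(log_lines):
    """Return (work_unit, failed_server, next_server) tuples."""
    uploading_to = {}  # work unit -> server of the most recent upload attempt
    failed = {}        # work unit -> server whose upload failed
    results = []
    for raw in log_lines:
        match = LINE.match(raw.strip())
        if not match:
            continue  # skips elided "..." lines and anything unrecognised
        _, level, wu, _, message = match.groups()
        upload = UPLOAD.match(message)
        if upload:
            server = upload.group(1)
            if wu in failed:
                previous = failed.pop(wu)
                # Only report it when the retry went to a different server.
                if server != previous:
                    results.append((wu, previous, server))
            uploading_to[wu] = server
        elif (level == "WARNING"
              and "Failed to send results to work server" in message
              and wu in uploading_to):
            failed[wu] = uploading_to[wu]
    return results
```

Feeding it the lines of a log file (for example `find_upload_failovers(open("log.txt"))`, where the file name is just a placeholder) should list each work unit that ended up on a collection server.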
Re: Hung downloads from 171.67.108.52
Posted: Wed Oct 22, 2014 10:20 am
by bollix47
@ WiSK
Thank you for your report ... I'll ask PG to have a look since project:9201 run:797 clone:1 gen:96 does not appear in the database at this time.
Re: Hung downloads from 171.67.108.52
Posted: Wed Oct 22, 2014 2:29 pm
by 7im
Since all these growing pains continue to linger (upgrading AS code / adding Maxwell support), maybe PG should offer more frequent progress reports.
"We're working on it" is okay for a few days, even a few weeks with a few feedback posts, but now it's going on MONTHS and there is no apparent end to the problems in sight. The donor community deserves better at this point.
Re: Hung downloads from 171.67.108.52
Posted: Fri Oct 24, 2014 10:38 pm
by bollix47
The missing credit has shown up:
Hi WiSK (team 37726),
Your WU (P9201 R797 C1 G96) was added to the stats database on 2014-10-24 15:04:42 for 36576 points of credit.
Re: Hung downloads from 171.67.108.52
Posted: Sat Oct 25, 2014 1:26 am
by PS3EdOlkkola
7im, you've got a very good point. A little feedback on progress would be greatly appreciated. Thank you.