Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 9:38 am
by Simplex0
bruce wrote:jdmurray wrote:I see on the server stats page that 171.64.65.104 is full and rejecting connections
Where do you see that? I see it is in standby (and reject), which means it's intentionally off-line. That agrees with "p92xx temporarily offline", which states that the server is being worked on.
Yes, it sucks when a WU expires and is dumped.
The final deadline for those projects is 10 days after downloading. What GPU are you running? How many hours per week do you fold?
Here you can read the log
http://fah-web.stanford.edu/pybeta/logs ... 4.log.html
The message changes over time.
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 12:42 pm
by tofuwombat
Four days (94 attempts) of "PLEASE_WAIT" active refusals here. . .
Code: Select all
05:50:18:WU01:FS01:Upload complete
05:50:18:WU01:FS01:Server responded PLEASE_WAIT (464)
05:50:18:WARNING:WU01:FS01:Failed to send results, will try again later
05:50:18:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:9213 run:28 clone:11 gen:4 core:0x21 unit:0x0000000d664f2dd056fb298331e4c962
05:50:18:WU01:FS01:Uploading 37.94MiB to 171.64.65.104
05:50:18:WU01:FS01:Connecting to 171.64.65.104:8080
05:50:19:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
05:50:19:WU01:FS01:Connecting to 171.64.65.104:80
05:50:21:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: No connection could be made because the target machine actively refused it.
05:50:21:WU01:FS01:Trying to send results to collection server
05:50:21:WU01:FS01:Uploading 37.94MiB to 171.65.103.160
I hear that "the server is being worked on" and "p92xx temporarily offline".
I have a suggestion for "active refusals": They might be improved if they failed
before all the data gets re-sent. Seems like a waste of bandwidth.
I don't know how to read the collection server status link
http://fah-web.stanford.edu/logs/171.65 ... 0.log.html:
Code: Select all
"
Sat Sep 29 01:00:10 PDT 2012 171.65.103.160 classic VSPMF93 - accept Accepting 0.00 10 1 321 8800 1100 0 0 0 0 0 - - - - 2 0 - - 0 171.64.122.76
171.67.108.17
- 0 1 - - - - - - - - ; - - - - - - 0 - 0 VSPMF93
Sat Sep 29 02:00:11 PDT 2012 171.65.103.160 classic VSPMF93 - accept Accepting 0.00 20 0 321 8800 1100 1 0 0 0 0 - - - - 4 0 - - 0 171.64.122.76
171.67.108.17
- 0 1 - - - - - - - - ; - - - - - - 0 - 0 VSPMF93
Sat Sep 29 03:00:10 PDT 2012 171.65.103.160 classic VSPMF93 - accept Accepting 0.00 10 1 321 8800 1100 0 0 0 0 0 - - - - 0 0 - - 0 171.64.122.76
171.67.108.17
- 0 1 - - - - - - - - ; - - - - - - 0 - 0 VSPMF93
Sat Sep 29 04:00:10 PDT 2012 171.65.103.160 classic VSPMF93 - accept Accepting 0.00 13 0 374 8800 1100 0 0 0 0 0 - - - - 0 0 - - 0 171.64.122.76
171.67.108.17
- 0 1 - - - - - - - - ; - - - - - - 0 - 0 VSPMF93
Sat Sep 29 05:00:10 PDT 2012 171.65.103.160 classic VSPMF93 - accept Accepting 0.00 17 4 374 8800 1100 -1 0 0 0 0 - - - - 2 0 - - 0 171.64.122.76
171.67.108.17
- 0 1 - - - - - - - - ; - - - - - - 0 - 0 VSPMF93
I'm curious: is it relevant that the collection server seems to think it's FOUR years ago, or that a "classic" server is being sent a GPU WU?
Mostly, I don't know what shape my patience should be, or when to mention that something weird is happening.
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 1:46 pm
by Joe_H
The link you are using goes to the old logs that were part of the now-deprecated version of the Server Status page. It probably comes from the folding client, which is a couple of years old. Dr. Pande has announced that work is being done on an updated version of the client, which should correct such links when released. The current log entries are here:
http://fah-web.stanford.edu/pybeta/logs ... 0.log.html
You can get there from the Server Status page.
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 3:36 pm
by bruce
tofuwombat wrote:Four days (94 attempts) of "PLEASE_WAIT" active refusals here. . .
I have a suggestion for "active refusals": They might be improved if they failed before all the data gets re-sent. Seems like a waste of bandwidth.
I agree.
The code that coordinates the CS with the WS is being rewritten.
... but the upload to the Collection Server (CS) is independent of the upload attempted to the Work Server (WS). The CS doesn't know there was an active refusal, only that your client is attempting to upload a WU -- which is successful. Then, after receiving the WU, it attempts to verify whether the WU is a duplicate of one that was recently uploaded to the WS. In the old version of the server code, the list of acceptable uploads is limited until it can be refreshed by the WS.
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 4:41 pm
by DocJonz
I've just got back from work (UK time!) and am having similar upload issues:
Code: Select all
16:30:56:WU02:FS02:Upload 82.20%
16:31:03:WU02:FS02:Upload 93.90%
16:31:10:WU02:FS02:Upload complete
16:31:10:WU02:FS02:Server responded PLEASE_WAIT (464)
16:31:10:WARNING:WU02:FS02:Failed to send results, will try again later
16:31:10:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:9208 run:44 clone:9 gen:4 core:0x21 unit:0x0000000f664f2dd056fb2755cc358773
16:31:10:WU02:FS02:Uploading 37.94MiB to 171.64.65.104
16:31:10:WU02:FS02:Connecting to 171.64.65.104:8080
16:31:10:WARNING:WU02:FS02:WorkServer connection failed on port 8080 trying 80
16:31:10:WU02:FS02:Connecting to 171.64.65.104:80
16:31:10:WARNING:WU02:FS02:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: Connection refused
16:31:10:WU02:FS02:Trying to send results to collection server
16:31:10:WU02:FS02:Uploading 37.94MiB to 171.65.103.160
16:31:10:WU02:FS02:Connecting to 171.65.103.160:8080
16:31:16:WU02:FS02:Upload 6.10%
16:31:22:WU00:FS02:0x21:Completed 2775000 out of 7500000 steps (37%)
16:31:22:WU02:FS02:Upload 11.70%
16:31:28:WU02:FS02:Upload 17.46%
16:31:34:WU02:FS02:Upload 23.72%
16:31:34:WU01:FS01:0x18:Completed 25000 out of 2500000 steps (1%)
16:31:40:WU02:FS02:Upload 30.31%
16:31:46:WU02:FS02:Upload 41.68%
16:31:52:WU02:FS02:Upload 49.75%
16:31:58:WU02:FS02:Upload 57.16%
16:32:04:WU02:FS02:Upload 69.35%
16:32:10:WU02:FS02:Upload 80.72%
16:32:14:WU01:FS01:0x18:Completed 50000 out of 2500000 steps (2%)
16:32:16:WU02:FS02:Upload 92.09%
16:32:20:WU02:FS02:Upload complete
16:32:21:WU02:FS02:Server responded PLEASE_WAIT (464)
16:32:21:WARNING:WU02:FS02:Failed to send results, will try again later
16:32:47:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:9208 run:44 clone:9 gen:4 core:0x21 unit:0x0000000f664f2dd056fb2755cc358773
16:32:47:WU02:FS02:Uploading 37.94MiB to 171.64.65.104
16:32:47:WU02:FS02:Connecting to 171.64.65.104:8080
16:32:47:WARNING:WU02:FS02:WorkServer connection failed on port 8080 trying 80
16:32:47:WU02:FS02:Connecting to 171.64.65.104:80
16:32:47:WARNING:WU02:FS02:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: Connection refused
16:32:47:WU02:FS02:Trying to send results to collection server
16:32:47:WU02:FS02:Uploading 37.94MiB to 171.65.103.160
16:32:47:WU02:FS02:Connecting to 171.65.103.160:8080
16:32:53:WU02:FS02:Upload 8.40%
16:32:55:WU01:FS01:0x18:Completed 75000 out of 2500000 steps (3%)
16:32:59:WU02:FS02:Upload 15.65%
16:33:05:WU02:FS02:Upload 23.89%
16:33:11:WU02:FS02:Upload 35.25%
.......
I've got 4 machines which have a WU status at 'SEND', and they are going nowhere.
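For anyone who wants to see how often a slot has gone round this loop, here is a minimal Python sketch that counts the PLEASE_WAIT replies per work unit in a pasted log. The function name `count_please_wait` and the regular expression are assumptions based only on the log lines shown in this thread; they are not part of FAHClient.

```python
import re
from collections import Counter

# Matches lines such as "16:31:10:WU02:FS02:Server responded PLEASE_WAIT (464)".
# The pattern is an assumption derived from the log excerpts in this thread.
PLEASE_WAIT = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):Server responded PLEASE_WAIT"
)


def count_please_wait(lines):
    """Return a Counter mapping (work unit, slot) to the number of PLEASE_WAIT replies."""
    counts = Counter()
    for line in lines:
        match = PLEASE_WAIT.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "16:31:10:WU02:FS02:Upload complete",
        "16:31:10:WU02:FS02:Server responded PLEASE_WAIT (464)",
        "16:31:10:WARNING:WU02:FS02:Failed to send results, will try again later",
        "16:32:21:WU02:FS02:Server responded PLEASE_WAIT (464)",
    ]
    for (wu, slot), n in count_please_wait(sample).items():
        print(f"{wu}/{slot}: {n} PLEASE_WAIT replies")
```

To run it against a real log, pass the lines of the saved log file to the function, for example `count_please_wait(open("log.txt", encoding="utf-8"))`, where `log.txt` is whatever file you saved the log to.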
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 7:36 pm
by tofuwombat
OK, so I'm looking at the correct link (Thanks Joe_H) with lines like:
Code: Select all
Wed Jul 20 12:00:17 PDT 2016 171.65.103.160 VSPMF93 - classic accept Accepting 0.00 25 0 733 114 0 0 0 0 - - - - - 6 0 - - 1 - 0 1 - - - - - - - - - - -
which seems to mirror the Server Status page.
How can I tell from looking at server stats that there is something wrong?
Also, can a "classic" server accept a GPU WU?
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 10:19 pm
by bruce
The problem is not with 171.65.103.160. It stems back to a problem with 171.64.65.104 (See the title of this topic.) and the inability of the CS to function correctly when it can't contact the original WS.
171.65.103.160 is doing a good job of handling WUs that need to be redirected to other Work Servers, so there's nothing bad reported on the line you quoted.
Re: Failed to connect to 171.64.65.104:80
Posted: Wed Jul 20, 2016 11:07 pm
by Joe_H
The labelling of servers by client type has gotten a bit looser, and the labels do not always match the type of WU they provide. Part of this is that some WSs host both CPU and GPU projects. Another reason is that the distinction between "classic", which was the description for WUs designated for processing on a single CPU core, and SMP is no longer relevant, as Core_A4 can do both types of processing.
Re: Failed to connect to 171.64.65.104:80
Posted: Thu Jul 21, 2016 9:20 pm
by kwerboom
This problem started Sunday and still isn't resolved. Is there any ETA on a fix? This server ate one of my WUs back in March and I really don't want to lose another one. Also, this server seems to have recurring problems. Is there a more permanent fix so we don't have to do this tango every three or so months?
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 2:04 pm
by bs_texas
Having the same issue. (Sorry haven't read back through all the previous posts. Just throwing in my bit here.)
Code: Select all
13:45:07:WU02:FS01:Upload 97.99%
13:45:13:WU02:FS01:Upload 99.31%
13:45:17:WU02:FS01:Upload complete
13:45:17:WU02:FS01:Server responded PLEASE_WAIT (464)
13:45:17:WARNING:WU02:FS01:Failed to send results, will try again later
13:45:18:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9210 run:18 clone:6 gen:24 core:0x21 unit:0x0000004c664f2dd056fb27dc2eba629f
13:45:18:WU02:FS01:Uploading 37.95MiB to 171.64.65.104
13:45:18:WU02:FS01:Connecting to 171.64.65.104:8080
13:45:19:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
13:45:19:WU02:FS01:Connecting to 171.64.65.104:80
13:45:21:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: No connection could be made because the target machine actively refused it.
13:45:21:WU02:FS01:Trying to send results to collection server
13:45:21:WU02:FS01:Uploading 37.95MiB to 171.65.103.160
13:45:21:WU02:FS01:Connecting to 171.65.103.160:8080
13:45:27:WU02:FS01:Upload 1.32%
13:45:33:WU02:FS01:Upload 2.80%
13:45:39:WU02:FS01:Upload 4.28%
13:45:45:WU02:FS01:Upload 5.76%
13:45:51:WU02:FS01:Upload 7.25%
13:45:57:WU02:FS01:Upload 8.73%
13:46:03:WU02:FS01:Upload 10.21%
13:46:09:WU02:FS01:Upload 11.69%
13:46:15:WU02:FS01:Upload 13.18%
13:46:21:WU02:FS01:Upload 14.66%
13:46:27:WU02:FS01:Upload 16.14%
13:46:33:WU02:FS01:Upload 17.46%
13:46:39:WU02:FS01:Upload 18.94%
13:46:45:WU02:FS01:Upload 20.42%
13:46:51:WU02:FS01:Upload 21.90%
13:46:57:WU02:FS01:Upload 23.39%
13:47:03:WU02:FS01:Upload 24.87%
13:47:09:WU02:FS01:Upload 26.35%
13:47:15:WU02:FS01:Upload 27.83%
13:47:21:WU02:FS01:Upload 29.32%
13:47:27:WU02:FS01:Upload 30.80%
13:47:33:WU02:FS01:Upload 32.28%
13:47:39:WU02:FS01:Upload 33.60%
13:47:45:WU02:FS01:Upload 34.75%
13:47:51:WU02:FS01:Upload 35.90%
13:47:57:WU02:FS01:Upload 37.22%
13:48:04:WU02:FS01:Upload 38.37%
13:48:10:WU02:FS01:Upload 39.20%
13:48:16:WU02:FS01:Upload 40.19%
13:48:22:WU02:FS01:Upload 41.01%
13:48:29:WU02:FS01:Upload 42.00%
13:48:35:WU02:FS01:Upload 42.99%
13:48:41:WU02:FS01:Upload 43.81%
13:48:48:WU02:FS01:Upload 44.80%
13:48:54:WU02:FS01:Upload 45.79%
13:49:00:WU02:FS01:Upload 46.61%
13:49:06:WU02:FS01:Upload 47.43%
The problem is that it's in an endless loop. It says upload complete, then failed, then starts all over. This is on the Status tab:
How can I stop the endless loop of trying to upload?
Thanks . . .
Edit: Oh wait... this time it is trying to upload to 171.65.103.160. I'll see how that goes.
Nope, now it's trying again. I think it's been doing this for days....
Code: Select all
14:15:14:WU02:FS01:Upload 94.54%
14:15:20:WU02:FS01:Upload 95.36%
14:15:26:WU02:FS01:Upload 96.35%
14:15:32:WU02:FS01:Upload 97.17%
14:15:38:WU02:FS01:Upload 97.99%
14:15:44:WU02:FS01:Upload 98.82%
14:15:51:WU02:FS01:Upload 99.81%
14:15:52:WU02:FS01:Upload complete
14:15:52:WU02:FS01:Server responded PLEASE_WAIT (464)
14:15:52:WARNING:WU02:FS01:Failed to send results, will try again later
14:15:53:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9210 run:18 clone:6 gen:24 core:0x21 unit:0x0000004c664f2dd056fb27dc2eba629f
14:15:53:WU02:FS01:Uploading 37.95MiB to 171.64.65.104
14:15:53:WU02:FS01:Connecting to 171.64.65.104:8080
14:15:54:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
14:15:54:WU02:FS01:Connecting to 171.64.65.104:80
14:15:56:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: No connection could be made because the target machine actively refused it.
14:15:56:WU02:FS01:Trying to send results to collection server
14:15:56:WU02:FS01:Uploading 37.95MiB to 171.65.103.160
14:15:56:WU02:FS01:Connecting to 171.65.103.160:8080
14:16:02:WU02:FS01:Upload 1.32%
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 5:14 pm
by tofuwombat
Feeling similar pain and frustration here.
Noticed yesterday that the number of attempts field rolls over in the low triple digits,
caught it going from 103 to 11 over the course of a few hours.
WU has been done for a week. Bandwidth wasted. Points bleeding away. It will be automatically discarded in three days.
Not fun being insulted by someone else's infinite loop, with no good outcome in sight.
Would be much less frustrating if we could:
-- get credit for the work as if the collection server wasn't FUBAR
-- manually redirect the finished work to somewhere it could be useful
-- have any useful option
-- etc.
Even better if the credit/redirection would happen automagically. [which attempt has probably sponsored our infinite loop]
Next time instead of trying to be useful and mention the fault,
it will be very tempting to delete the orphaned WU, stay silent and move on.
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 7:12 pm
by foldy
Same here:
19:04:30:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80
ping 171.64.65.104 is fine
Server Status says
48 171.64.65.104 vspg14b jadeshi GPU full Reject
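A successful ping only shows that the host answers ICMP; it says nothing about whether anything is listening on port 8080 or 80. As a rough sketch, the Python below tries a plain TCP connection and reports whether it was accepted, refused, or timed out. The function name `probe_tcp` and the result labels are my own assumptions, not anything the client provides.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host or name resolution failure.
        return "error"


# Example use (not run automatically):
#   for port in (8080, 80):
#       print(port, probe_tcp("171.64.65.104", port))
```

A "refused" result on both ports while ping still works would be consistent with the "actively refused" messages in the logs above and with a server that is up but deliberately not accepting uploads.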
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 9:46 pm
by bs_texas
tofuwombat wrote:
it will be very tempting to delete the orphaned WU, stay silent and move on.
I'd just like to do this... if I knew how.
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 10:33 pm
by foldy
You can delete the hanging work unit. Here is a screenshot of my case.
https://postimg.org/image/ykelruplj/
1) First find the work unit number, in my case 00.
2) Then on Windows go to C:\ProgramData\FAHClient\work or %appdata%\FAHClient\work, and on Linux go to /var/lib/fahclient/work.
3) Close the Folding@home client and then delete the work unit number folder, in my case 00.
4) Start the Folding@home client again.
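If you are not sure which of those locations your install uses, a small Python sketch like the one below can check them for you. It only looks for the folder and never deletes anything. The helper names `candidate_work_dirs` and `find_wu_folder` are hypothetical, and the two-digit folder name (00, 01, ...) is an assumption based on the WU numbers shown in the logs.

```python
import os


def candidate_work_dirs(env=None):
    """Return the work-directory locations listed in the steps above, in order."""
    env = os.environ if env is None else env
    candidates = [r"C:\ProgramData\FAHClient\work"]
    appdata = env.get("APPDATA")  # Windows only; absent on Linux
    if appdata:
        candidates.append(appdata + r"\FAHClient\work")
    candidates.append("/var/lib/fahclient/work")
    return candidates


def find_wu_folder(wu_number, candidates):
    """Return the first existing '<work dir>/<NN>' folder, or None. Nothing is deleted."""
    name = f"{int(wu_number):02d}"  # assumption: folders are named 00, 01, ...
    for base in candidates:
        path = os.path.join(base, name)
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    found = find_wu_folder(0, candidate_work_dirs())
    print(found if found else "No work unit folder found in the usual locations")
```

Close the client before removing whatever folder this reports, as in step 3.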
Re: Failed to connect to 171.64.65.104:80
Posted: Sat Jul 23, 2016 10:47 pm
by bs_texas
^ Hey, thanks... But I had to search around and found it was actually in C:\Users\bs\AppData\Roaming\FAHClient\work.