Re: Failed to connect to 171.64.65.104:80

Posted: Thu Mar 17, 2016 11:53 pm
by Joe_H
While the remaining Core_15 projects are available, you opt in by adding the following settings to the GPU slot:

max-packet-size = small

client-type = beta

The WUs are not actually beta; this combination of settings is just used to indicate a willingness to process them. Core_15 is end of life, so once these projects finish, no further projects will use it. These projects also get only the base points indicated on the Project Summary page, with no QRB.

Re: Failed to connect to 171.64.65.104:80

Posted: Thu Mar 17, 2016 11:56 pm
by DarkFoss
Great to see it's up and running. Unfortunately, the server didn't like the results.

Code:

06:53:27:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9211 run:2 clone:11 gen:22 core:0x21 unit:0x00000075664f2dd055ee292a0f6788f6
06:53:28:WU01:FS00:Uploading 17.50MiB to 171.64.65.104
06:53:28:WU01:FS00:Connecting to 171.64.65.104:8080
06:53:34:WU01:FS00:Upload 27.86%
06:53:40:WU01:FS00:Upload 52.50%
06:53:46:WU01:FS00:Upload 77.14%
06:53:53:WU01:FS00:Upload complete
06:53:53:WU01:FS00:Server responded WORK_QUIT (404)
06:53:53:WARNING:WU01:FS00:Server did not like results, dumping
06:53:53:WU01:FS00:Cleaning up

Re: Failed to connect to 171.64.65.104:80

Posted: Fri Mar 18, 2016 12:48 am
by bruce
It's not obvious at all. The concept was adopted without requiring a new client version so it is, in fact, obscure. You have to make two specific settings:

client-type=beta
&
max-packet-size=small

The assignments are not necessarily small and they're certainly not beta projects.

Re: Failed to connect to 171.64.65.104:80

Posted: Sun Jul 17, 2016 9:41 pm
by PS3EdOlkkola
Looks like this collection server is having issues accepting work units:

Code:

21:24:14:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:9209 run:53 clone:1 gen:44 core:0x21 unit:0x0000005a664f2dd056fb27b4f3e98df7
21:24:14:WU00:FS01:Uploading 37.95MiB to 171.64.65.104
21:24:14:WU00:FS01:Connecting to 171.64.65.104:8080
21:24:16:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
21:24:16:WU00:FS01:Connecting to 171.64.65.104:80
21:24:17:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: No connection could be made because the target machine actively refused it.
21:24:17:WU00:FS01:Trying to send results to collection server
21:24:17:WU00:FS01:Uploading 37.95MiB to 171.65.103.160
21:24:17:WU00:FS01:Connecting to 171.65.103.160:8080
21:24:23:WU00:FS01:Upload 0.82%
21:24:29:WU00:FS01:Upload 1.48%
21:24:35:WU00:FS01:Upload 2.47%
.
.
21:35:09:WU00:FS01:Upload 99.47%
21:35:14:WU00:FS01:Upload complete
21:35:14:WU00:FS01:Server responded PLEASE_WAIT (464)
21:35:14:WARNING:WU00:FS01:Failed to send results, will try again later

It has repeated the above sequence 5 times so far. Other GPU slots on the same machine have uploaded to other collection servers without any issues. Looking at that particular server's status, it's configured to accept "classic", not GPU, but Status and Collect are "accept" and "Accepting", respectively.

Can someone please take a look?

Re: Failed to connect to 171.64.65.104:80

Posted: Sun Jul 17, 2016 9:54 pm
by bruce
@PS3EdOlkkola
Please scroll back to the point at which project:9209 run:53 clone:1 gen:44 reached 100%. (If FAH has been restarted, you may need to look in the "logs" subdirectory of FAH's data directory.)

I need to see the FIRST time or two that it tried to upload.

I believe this is a bug in the server code but I've never gotten enough information to prove that to Development.

Re: Failed to connect to 171.64.65.104:80

Posted: Sun Jul 17, 2016 10:36 pm
by PS3EdOlkkola
Thanks, bruce. Below are the first few lines from the first time the WU attempted an upload:

Code:

20:23:26:WU00:FS01:0x21:Completed 2500000 out of 2500000 steps (100%)
20:23:30:WU00:FS01:0x21:Saving result file logfile_01.txt
20:23:30:WU00:FS01:0x21:Saving result file checkpointState.xml
20:23:30:WU00:FS01:0x21:Saving result file checkpt.crc
20:23:30:WU00:FS01:0x21:Saving result file log.txt
20:23:30:WU00:FS01:0x21:Saving result file positions.xtc
20:23:30:WU00:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
20:23:31:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
20:23:31:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:9209 run:53 clone:1 gen:44 core:0x21 unit:0x0000005a664f2dd056fb27b4f3e98df7
20:23:31:WU00:FS01:Uploading 37.95MiB to 171.64.65.104
20:23:31:WU00:FS01:Connecting to 171.64.65.104:8080
20:23:50:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
20:23:50:WU00:FS01:Connecting to 171.64.65.104:80
20:23:51:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: No connection could be made because the target machine actively refused it.
20:23:51:WU00:FS01:Trying to send results to collection server
20:23:51:WU00:FS01:Uploading 37.95MiB to 171.65.103.160
20:23:51:WU00:FS01:Connecting to 171.65.103.160:8080
20:23:57:WU00:FS01:Upload 1.48%
20:24:03:WU00:FS01:Upload 2.80%
20:24:09:WU00:FS01:Upload 3.95%
....
20:33:04:WU00:FS01:Upload 99.63%
20:33:07:WU00:FS01:Upload complete
20:33:07:WU00:FS01:Server responded PLEASE_WAIT (464)
20:33:07:WARNING:WU00:FS01:Failed to send results, will try again later
20:33:07:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:9209 run:53 clone:1 gen:44 core:0x21 unit:0x0000005a664f2dd056fb27b4f3e98df7

From what I can tell, the client first tries to connect to 171.64.65.104:8080, which fails, then tries the same IP address on port 80, which the server also refuses. It then switches to a different address, 171.65.103.160:8080, and uploads there until that server responds with PLEASE_WAIT (464), at which point the send fails. I think the error is this: when the GPU collection server 171.64.65.104 has a Status of "full" and a Connect of "reject" (as it currently does), the client does not fail over to another GPU collection server. Instead it fails over to a "classic" server (171.65.103.160:8080), which can't interpret the WU (it is looking for SMP and gets a GPU unit).

Let me know if you need anything else.
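
For anyone who wants to check the same sequence in their own log, here is a rough Python sketch that lists each connection attempt and how it ended. The regular expressions are only modelled on the log lines quoted in this thread, so other client versions may need different patterns, and `summarize_attempts` is a made-up helper name, not part of FAHClient. It counts every "Connecting to" line, including assignment-server connections.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an official description of the FAHClient log format).
CONNECT = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):Connecting to ([\d.]+):(\d+)$")
FAILED = re.compile(
    r"^\d\d:\d\d:\d\d:WARNING:(WU\d+):(FS\d+):"
    r"(?:WorkServer connection failed|Exception: Failed to send results)"
)
RESPONSE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Server responded (\w+) \((\d+)\)$")


def summarize_attempts(lines):
    """Return one (time, wu, slot, host, port, outcome) tuple per connection attempt."""
    attempts = []
    pending = {}  # (wu, slot) -> index of the attempt still waiting for an outcome
    for raw in lines:
        line = raw.strip()
        m = CONNECT.match(line)
        if m:
            time, wu, slot, host, port = m.groups()
            pending[(wu, slot)] = len(attempts)
            attempts.append([time, wu, slot, host, int(port), "no outcome logged"])
            continue
        m = FAILED.match(line)
        if m:
            idx = pending.pop((m.group(1), m.group(2)), None)
            if idx is not None:
                attempts[idx][5] = "connection failed"
            continue
        m = RESPONSE.match(line)
        if m:
            idx = pending.pop((m.group(1), m.group(2)), None)
            if idx is not None:
                attempts[idx][5] = f"{m.group(3)} ({m.group(4)})"
    return [tuple(a) for a in attempts]


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "21:24:14:WU00:FS01:Connecting to 171.64.65.104:8080",
        "21:24:16:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80",
        "21:24:16:WU00:FS01:Connecting to 171.64.65.104:80",
        "21:24:17:WU00:FS01:Connecting to 171.65.103.160:8080",
        "21:35:14:WU00:FS01:Server responded PLEASE_WAIT (464)",
    ]
    for attempt in summarize_attempts(sample):
        print(attempt)
```

Feeding it a whole log file (for example `summarize_attempts(open(path))`, with `path` pointing at your own log) should make it easy to see whether every retry ends at the same server with the same response.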

Server 171.65.103.160

Posted: Sun Jul 17, 2016 11:58 pm
by RABishop
I have two jobs on two different computers for which the above server is not accepting the "send" product. On the computer I am using to post this, the same GPU is working on a new job that is about 27.6% finished, with 3 hours and 23 minutes to go. By my math, that means it has been waiting at this point for about 40 minutes. It's hard to be exact, since the estimate keeps changing. The other system, at last glance, has 5 hours 28 minutes remaining and is only 39% done, which could mean it has been waiting much longer: about 4.25 hours. I won't bother copying the log from the other system, since that is a bunch of work. Both machines are connected to the internet through the same wired network, and both have no trouble pulling up my homepage in Firefox. So it isn't my network.

This is about an hour of the stuff from the log on this machine:

Code:

22:44:53:WU04:FS02:Uploading 37.95MiB to 171.64.65.104
22:44:53:WU04:FS02:Connecting to 171.64.65.104:8080
22:44:54:WARNING:WU04:FS02:WorkServer connection failed on port 8080 trying 80
22:44:54:WU04:FS02:Connecting to 171.64.65.104:80
22:44:55:WARNING:WU04:FS02:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: Connection refused
22:44:55:WU04:FS02:Trying to send results to collection server
22:44:55:WU04:FS02:Uploading 37.95MiB to 171.65.103.160
22:44:55:WU04:FS02:Connecting to 171.65.103.160:8080
22:45:01:WU04:FS02:Upload 9.06%
22:45:07:WU04:FS02:Upload 17.13%
22:45:13:WU04:FS02:Upload 25.03%
22:45:19:WU04:FS02:Upload 33.27%
22:45:25:WU04:FS02:Upload 41.34%
22:45:31:WU04:FS02:Upload 49.41%
22:45:37:WU04:FS02:Upload 57.48%
22:45:43:WU04:FS02:Upload 65.55%
22:45:49:WU04:FS02:Upload 73.62%
22:45:50:WU02:FS00:0xa4:Completed 56000 out of 160000 steps  (35%)
22:45:55:WU04:FS02:Upload 81.69%
22:46:01:WU04:FS02:Upload 89.27%
22:46:07:WU04:FS02:Upload 97.34%
22:46:09:WU04:FS02:Upload complete
22:46:10:WU04:FS02:Server responded PLEASE_WAIT (464)
22:46:10:WARNING:WU04:FS02:Failed to send results, will try again later
22:46:26:WU01:FS02:0x21:Completed 220000 out of 2000000 steps (11%)
22:46:30:WU00:FS03:0x21:Completed 4600000 out of 5000000 steps (92%)
22:47:05:WU03:FS01:0x21:Completed 530000 out of 1000000 steps (53%)
22:47:07:WU02:FS00:0xa4:Completed 57600 out of 160000 steps  (36%)
22:48:23:WU02:FS00:0xa4:Completed 59200 out of 160000 steps  (37%)
22:49:09:WU01:FS02:0x21:Completed 240000 out of 2000000 steps (12%)
22:49:32:WU02:FS00:0xa4:Completed 60800 out of 160000 steps  (38%)
22:50:02:WU00:FS03:0x21:Completed 4650000 out of 5000000 steps (93%)
22:50:12:WU03:FS01:0x21:Completed 540000 out of 1000000 steps (54%)
22:50:41:WU02:FS00:0xa4:Completed 62400 out of 160000 steps  (39%)
22:51:49:WU02:FS00:0xa4:Completed 64000 out of 160000 steps  (40%)
22:51:59:WU01:FS02:0x21:Completed 260000 out of 2000000 steps (13%)
22:52:58:WU02:FS00:0xa4:Completed 65600 out of 160000 steps  (41%)
22:53:16:WU03:FS01:0x21:Completed 550000 out of 1000000 steps (55%)
22:53:32:WU00:FS03:0x21:Completed 4700000 out of 5000000 steps (94%)
22:54:08:WU02:FS00:0xa4:Completed 67200 out of 160000 steps  (42%)
22:54:43:WU01:FS02:0x21:Completed 280000 out of 2000000 steps (14%)
22:55:17:WU02:FS00:0xa4:Completed 68800 out of 160000 steps  (43%)
22:56:26:WU02:FS00:0xa4:Completed 70400 out of 160000 steps  (44%)
22:56:30:WU03:FS01:0x21:Completed 560000 out of 1000000 steps (56%)
22:57:02:WU00:FS03:0x21:Completed 4750000 out of 5000000 steps (95%)
22:57:26:WU01:FS02:0x21:Completed 300000 out of 2000000 steps (15%)
22:57:35:WU02:FS00:0xa4:Completed 72000 out of 160000 steps  (45%)
22:58:44:WU02:FS00:0xa4:Completed 73600 out of 160000 steps  (46%)
22:59:35:WU03:FS01:0x21:Completed 570000 out of 1000000 steps (57%)
22:59:53:WU02:FS00:0xa4:Completed 75200 out of 160000 steps  (47%)
23:00:10:WU01:FS02:0x21:Completed 320000 out of 2000000 steps (16%)
23:00:33:WU00:FS03:0x21:Completed 4800000 out of 5000000 steps (96%)
23:01:02:WU02:FS00:0xa4:Completed 76800 out of 160000 steps  (48%)
23:02:11:WU02:FS00:0xa4:Completed 78400 out of 160000 steps  (49%)
23:02:40:WU03:FS01:0x21:Completed 580000 out of 1000000 steps (58%)
23:02:49:WU04:FS02:Sending unit results: id:04 state:SEND error:NO_ERROR project:9209 run:41 clone:7 gen:14 core:0x21 unit:0x0000002e664f2dd056fb27a1d1bc4ae7
23:02:49:WU04:FS02:Uploading 37.95MiB to 171.64.65.104
23:02:49:WU04:FS02:Connecting to 171.64.65.104:8080
23:02:50:WARNING:WU04:FS02:WorkServer connection failed on port 8080 trying 80
23:02:50:WU04:FS02:Connecting to 171.64.65.104:80
23:02:52:WARNING:WU04:FS02:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: Connection refused
23:02:52:WU04:FS02:Trying to send results to collection server
23:02:52:WU04:FS02:Uploading 37.95MiB to 171.65.103.160
23:02:52:WU04:FS02:Connecting to 171.65.103.160:8080
23:02:54:WU01:FS02:0x21:Completed 340000 out of 2000000 steps (17%)
23:02:58:WU04:FS02:Upload 9.06%
23:03:04:WU04:FS02:Upload 17.13%
23:03:10:WU04:FS02:Upload 25.20%
23:03:16:WU04:FS02:Upload 33.27%
23:03:20:WU02:FS00:0xa4:Completed 80000 out of 160000 steps  (50%)
23:03:22:WU04:FS02:Upload 41.50%
23:03:28:WU04:FS02:Upload 49.58%
23:03:34:WU04:FS02:Upload 57.65%
23:03:40:WU04:FS02:Upload 65.72%
23:03:46:WU04:FS02:Upload 73.79%
23:03:52:WU04:FS02:Upload 82.02%
23:03:58:WU04:FS02:Upload 90.09%
23:04:03:WU00:FS03:0x21:Completed 4850000 out of 5000000 steps (97%)
23:04:04:WU04:FS02:Upload 98.16%
23:04:06:WU04:FS02:Upload complete
23:04:06:WU04:FS02:Server responded PLEASE_WAIT (464)
23:04:06:WARNING:WU04:FS02:Failed to send results, will try again later
23:04:30:WU02:FS00:0xa4:Completed 81600 out of 160000 steps  (51%)
23:05:37:WU01:FS02:0x21:Completed 360000 out of 2000000 steps (18%)
23:05:38:WU02:FS00:0xa4:Completed 83200 out of 160000 steps  (52%)
23:05:44:WU03:FS01:0x21:Completed 590000 out of 1000000 steps (59%)
23:06:48:WU02:FS00:0xa4:Completed 84800 out of 160000 steps  (53%)
23:07:35:WU00:FS03:0x21:Completed 4900000 out of 5000000 steps (98%)
23:08:03:WU02:FS00:0xa4:Completed 86400 out of 160000 steps  (54%)
23:08:28:WU01:FS02:0x21:Completed 380000 out of 2000000 steps (19%)
23:08:49:WU03:FS01:0x21:Completed 600000 out of 1000000 steps (60%)
23:09:18:WU02:FS00:0xa4:Completed 88000 out of 160000 steps  (55%)
23:10:30:WU02:FS00:0xa4:Completed 89600 out of 160000 steps  (56%)
23:11:04:WU00:FS03:0x21:Completed 4950000 out of 5000000 steps (99%)
23:11:12:WU01:FS02:0x21:Completed 400000 out of 2000000 steps (20%)
23:11:43:WU02:FS00:0xa4:Completed 91200 out of 160000 steps  (57%)
23:12:04:WU03:FS01:0x21:Completed 610000 out of 1000000 steps (61%)
23:12:53:WU02:FS00:0xa4:Completed 92800 out of 160000 steps  (58%)
23:13:55:WU01:FS02:0x21:Completed 420000 out of 2000000 steps (21%)
23:14:06:WU02:FS00:0xa4:Completed 94400 out of 160000 steps  (59%)
23:14:33:WU00:FS03:0x21:Completed 5000000 out of 5000000 steps (100%)
23:14:34:WU05:FS03:Connecting to 171.67.108.45:80
23:14:34:WU05:FS03:Assigned to work server 171.64.65.92
23:14:34:WU05:FS03:Requesting new work unit for slot 03: RUNNING gpu:2:GM200 [GeForce GTX 980 Ti] from 171.64.65.92
23:14:34:WU05:FS03:Connecting to 171.64.65.92:8080
23:14:35:WU05:FS03:Downloading 2.85MiB
23:14:35:WU00:FS03:0x21:Saving result file logfile_01.txt
23:14:35:WU00:FS03:0x21:Saving result file checkpointState.xml
23:14:35:WU05:FS03:Download complete
23:14:36:WU05:FS03:Received Unit: id:05 state:DOWNLOAD error:NO_ERROR project:9162 run:163 clone:0 gen:352 core:0x18 unit:0x00000194ab40415c56748147b6c09d56
23:14:37:WU00:FS03:0x21:Saving result file checkpt.crc
23:14:37:WU00:FS03:0x21:Saving result file log.txt
23:14:37:WU00:FS03:0x21:Saving result file positions.xtc
23:14:38:WU00:FS03:FahCore returned: FINISHED_UNIT (100 = 0x64)
23:14:38:WU00:FS03:Sending unit results: id:00 state:SEND error:NO_ERROR project:11423 run:5 clone:40 gen:8 core:0x21 unit:0x0000000f8ca304f1571a748ccdcc2acf
23:14:38:WU00:FS03:Uploading 9.35MiB to 140.163.4.241
23:14:38:WU00:FS03:Connecting to 140.163.4.241:8080
23:14:38:WU05:FS03:Starting
23:14:38:WU05:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 05 -suffix 01 -version 704 -lifeline 1812 -checkpoint 30 -gpu 2 -gpu-vendor nvidia
23:14:38:WU05:FS03:Started FahCore on PID 13340
23:14:38:WU05:FS03:Core PID:13344
23:14:38:WU05:FS03:FahCore 0x18 started
23:14:39:WU05:FS03:0x18:*********************** Log Started 2016-07-17T23:14:38Z ***********************
23:14:39:WU05:FS03:0x18:Project: 9162 (Run 163, Clone 0, Gen 352)
23:14:39:WU05:FS03:0x18:Unit: 0x00000194ab40415c56748147b6c09d56
23:14:39:WU05:FS03:0x18:CPU: 0x00000000000000000000000000000000
23:14:39:WU05:FS03:0x18:Machine: 3
23:14:39:WU05:FS03:0x18:Reading tar file core.xml
23:14:39:WU05:FS03:0x18:Reading tar file system.xml
23:14:39:WU05:FS03:0x18:Reading tar file integrator.xml
23:14:39:WU05:FS03:0x18:Reading tar file state.xml
23:14:39:WU05:FS03:0x18:Digital signatures verified
23:14:39:WU05:FS03:0x18:Folding@home GPU core18
23:14:39:WU05:FS03:0x18:Version 0.0.4
23:14:44:WU00:FS03:Upload 39.45%
23:14:50:WU00:FS03:Upload 76.22%
23:14:51:WU05:FS03:0x18:Completed 0 out of 2500000 steps (0%)
23:14:51:WU05:FS03:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:15:04:WU00:FS03:Upload complete
23:15:04:WU00:FS03:Server responded WORK_ACK (400)
23:15:04:WU00:FS03:Final credit estimate, 68114.00 points
23:15:04:WU00:FS03:Cleaning up
23:15:11:WU03:FS01:0x21:Completed 620000 out of 1000000 steps (62%)
23:15:17:WU02:FS00:0xa4:Completed 96000 out of 160000 steps  (60%)
23:16:29:WU05:FS03:0x18:Completed 25000 out of 2500000 steps (1%)
23:16:31:WU02:FS00:0xa4:Completed 97600 out of 160000 steps  (61%)
23:16:39:WU01:FS02:0x21:Completed 440000 out of 2000000 steps (22%)
23:17:41:WU02:FS00:0xa4:Completed 99200 out of 160000 steps  (62%)
23:18:03:WU05:FS03:0x18:Completed 50000 out of 2500000 steps (2%)
23:18:16:WU03:FS01:0x21:Completed 630000 out of 1000000 steps (63%)
23:18:53:WU02:FS00:0xa4:Completed 100800 out of 160000 steps  (63%)
23:19:22:WU01:FS02:0x21:Completed 460000 out of 2000000 steps (23%)
23:19:37:WU05:FS03:0x18:Completed 75000 out of 2500000 steps (3%)
23:20:02:WU02:FS00:0xa4:Completed 102400 out of 160000 steps  (64%)
23:21:11:WU02:FS00:0xa4:Completed 104000 out of 160000 steps  (65%)
23:21:12:WU05:FS03:0x18:Completed 100000 out of 2500000 steps (4%)
23:21:21:WU03:FS01:0x21:Completed 640000 out of 1000000 steps (64%)
23:22:06:WU01:FS02:0x21:Completed 480000 out of 2000000 steps (24%)
23:22:22:WU02:FS00:0xa4:Completed 105600 out of 160000 steps  (66%)
23:22:50:WU05:FS03:0x18:Completed 125000 out of 2500000 steps (5%)
23:23:31:WU02:FS00:0xa4:Completed 107200 out of 160000 steps  (67%)
23:24:24:WU05:FS03:0x18:Completed 150000 out of 2500000 steps (6%)
23:24:25:WU03:FS01:0x21:Completed 650000 out of 1000000 steps (65%)
23:24:41:WU02:FS00:0xa4:Completed 108800 out of 160000 steps  (68%)
23:24:50:WU01:FS02:0x21:Completed 500000 out of 2000000 steps (25%)
23:25:54:WU02:FS00:0xa4:Completed 110400 out of 160000 steps  (69%)
23:25:58:WU05:FS03:0x18:Completed 175000 out of 2500000 steps (7%)
23:27:08:WU02:FS00:0xa4:Completed 112000 out of 160000 steps  (70%)
23:27:32:WU05:FS03:0x18:Completed 200000 out of 2500000 steps (8%)
23:27:40:WU03:FS01:0x21:Completed 660000 out of 1000000 steps (66%)
23:27:40:WU01:FS02:0x21:Completed 520000 out of 2000000 steps (26%)
23:28:20:WU02:FS00:0xa4:Completed 113600 out of 160000 steps  (71%)
23:29:10:WU05:FS03:0x18:Completed 225000 out of 2500000 steps (9%)
23:29:30:WU02:FS00:0xa4:Completed 115200 out of 160000 steps  (72%)
23:30:24:WU01:FS02:0x21:Completed 540000 out of 2000000 steps (27%)
23:30:41:WU02:FS00:0xa4:Completed 116800 out of 160000 steps  (73%)
23:30:45:WU05:FS03:0x18:Completed 250000 out of 2500000 steps (10%)
23:30:45:WU03:FS01:0x21:Completed 670000 out of 1000000 steps (67%)
23:31:52:WU04:FS02:Sending unit results: id:04 state:SEND error:NO_ERROR project:9209 run:41 clone:7 gen:14 core:0x21 unit:0x0000002e664f2dd056fb27a1d1bc4ae7
23:31:52:WU04:FS02:Uploading 37.95MiB to 171.64.65.104
23:31:52:WU04:FS02:Connecting to 171.64.65.104:8080
23:31:53:WARNING:WU04:FS02:WorkServer connection failed on port 8080 trying 80
23:31:53:WU04:FS02:Connecting to 171.64.65.104:80
23:31:53:WU02:FS00:0xa4:Completed 118400 out of 160000 steps  (74%)
23:31:54:WARNING:WU04:FS02:Exception: Failed to send results to work server: Failed to connect to 171.64.65.104:80: Connection refused
23:31:54:WU04:FS02:Trying to send results to collection server
23:31:54:WU04:FS02:Uploading 37.95MiB to 171.65.103.160
23:31:54:WU04:FS02:Connecting to 171.65.103.160:8080
23:32:00:WU04:FS02:Upload 3.13%
23:32:06:WU04:FS02:Upload 8.73%
23:32:12:WU04:FS02:Upload 16.96%
23:32:18:WU04:FS02:Upload 24.71%
23:32:19:WU05:FS03:0x18:Completed 275000 out of 2500000 steps (11%)
23:32:24:WU04:FS02:Upload 33.11%
23:32:30:WU04:FS02:Upload 40.85%
23:32:36:WU04:FS02:Upload 49.25%
23:32:42:WU04:FS02:Upload 56.99%
23:32:48:WU04:FS02:Upload 65.22%
23:32:54:WU04:FS02:Upload 72.96%
23:33:00:WU04:FS02:Upload 81.36%
23:33:06:WU04:FS02:Upload 89.10%
23:33:06:WU02:FS00:0xa4:Completed 120000 out of 160000 steps  (75%)
23:33:07:WU01:FS02:0x21:Completed 560000 out of 2000000 steps (28%)
23:33:12:WU04:FS02:Upload 97.34%
23:33:15:WU04:FS02:Upload complete
23:33:15:WU04:FS02:Server responded PLEASE_WAIT (464)
23:33:15:WARNING:WU04:FS02:Failed to send results, will try again later
23:33:50:WU03:FS01:0x21:Completed 680000 out of 1000000 steps (68%)
23:33:53:WU05:FS03:0x18:Completed 300000 out of 2500000 steps (12%)
23:34:17:WU02:FS00:0xa4:Completed 121600 out of 160000 steps  (76%)
23:35:27:WU02:FS00:0xa4:Completed 123200 out of 160000 steps  (77%)
23:35:31:WU05:FS03:0x18:Completed 325000 out of 2500000 steps (13%)
23:35:51:WU01:FS02:0x21:Completed 580000 out of 2000000 steps (29%)
23:36:37:WU02:FS00:0xa4:Completed 124800 out of 160000 steps  (78%)
23:36:55:WU03:FS01:0x21:Completed 690000 out of 1000000 steps (69%)
23:37:06:WU05:FS03:0x18:Completed 350000 out of 2500000 steps (14%)
23:37:47:WU02:FS00:0xa4:Completed 126400 out of 160000 steps  (79%)
23:38:35:WU01:FS02:0x21:Completed 600000 out of 2000000 steps (30%)
23:38:39:WU05:FS03:0x18:Completed 375000 out of 2500000 steps (15%)
23:38:57:WU02:FS00:0xa4:Completed 128000 out of 160000 steps  (80%)
23:39:59:WU03:FS01:0x21:Completed 700000 out of 1000000 steps (70%)
23:40:10:WU02:FS00:0xa4:Completed 129600 out of 160000 steps  (81%)
23:40:13:WU05:FS03:0x18:Completed 400000 out of 2500000 steps (16%)
23:41:19:WU01:FS02:0x21:Completed 620000 out of 2000000 steps (31%)
23:41:21:WU02:FS00:0xa4:Completed 131200 out of 160000 steps  (82%)
23:41:51:WU05:FS03:0x18:Completed 425000 out of 2500000 steps (17%)
23:42:34:WU02:FS00:0xa4:Completed 132800 out of 160000 steps  (83%)
23:43:16:WU03:FS01:0x21:Completed 710000 out of 1000000 steps (71%)
23:43:25:WU05:FS03:0x18:Completed 450000 out of 2500000 steps (18%)
23:43:46:WU02:FS00:0xa4:Completed 134400 out of 160000 steps  (84%)
23:44:09:WU01:FS02:0x21:Completed 640000 out of 2000000 steps (32%)
23:44:59:WU02:FS00:0xa4:Completed 136000 out of 160000 steps  (85%)
23:44:59:WU05:FS03:0x18:Completed 475000 out of 2500000 steps (19%)
23:46:11:WU02:FS00:0xa4:Completed 137600 out of 160000 steps  (86%)

I notice the log names 171.64.65.104, which is the work server involved. It appears to be trying to send the results back to the work server instead of the collection server, which is the 171.65.103.160 address listed above. Is that right? Shouldn't it go back to the collection server? I just checked, and the same thing is happening on the other system. The work server is 171.64.65.104, with collection server 171.65.103.160.
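
To put a number on how long a unit has been stuck, one option is to pull the time of every "Server responded" line for that WU and slot and look at the gaps between retries. The Python below is only a sketch under the assumption that the log lines look like the ones pasted above; `response_times` and `retry_gaps` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Pattern modelled on the "Server responded" lines in the log above (an assumption).
RESPONDED = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):Server responded (\w+) \((\d+)\)")


def response_times(lines, wu, slot):
    """Collect (time, response name) for each server response to one WU/slot."""
    found = []
    for raw in lines:
        m = RESPONDED.match(raw.strip())
        if m and m.group(2) == wu and m.group(3) == slot:
            found.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(4)))
    return found


def retry_gaps(times):
    """Return the time between consecutive responses as timedelta objects."""
    gaps = []
    for (earlier, _), (later, _) in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            # The log only records time of day, so assume it rolled past midnight.
            delta += timedelta(days=1)
        gaps.append(delta)
    return gaps


if __name__ == "__main__":
    # The three PLEASE_WAIT responses for WU04/FS02 from the log above.
    sample = [
        "22:46:10:WU04:FS02:Server responded PLEASE_WAIT (464)",
        "23:04:06:WU04:FS02:Server responded PLEASE_WAIT (464)",
        "23:33:15:WU04:FS02:Server responded PLEASE_WAIT (464)",
    ]
    for gap in retry_gaps(response_times(sample, "WU04", "FS02")):
        print(gap)
```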

Mod edit: added Code tags to log, and merged with existing topic on WS 171.64.65.104 - j

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 12:24 am
by Joe_H
All WUs are returned first, by preference, to the WS that assigned them. The CS is used only if the return to the WS fails. Many projects do not have a designated CS; they only return to the WS. Those will show a 0.0.0.0 address in the CS field.
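
As a rough illustration of that return order (not the actual client code), the logic can be sketched in Python like this. The function names and the `try_upload` callback are hypothetical; the only details taken from this thread are "WS first, CS only on failure" and "0.0.0.0 means no CS".

```python
# Illustrative sketch only: the names below are assumptions, not FAHClient internals.
NO_COLLECTION_SERVER = "0.0.0.0"


def return_targets(work_server, collection_server):
    """List the servers to try, in order: the WS first, then the CS if one is set."""
    targets = [work_server]
    if collection_server and collection_server != NO_COLLECTION_SERVER:
        targets.append(collection_server)
    return targets


def return_results(work_server, collection_server, try_upload):
    """Try each target in order and return the address that accepted the results.

    try_upload is a hypothetical callable that takes an address and returns
    True when that server accepted the upload. None means every target
    failed and the client would have to retry later.
    """
    for address in return_targets(work_server, collection_server):
        if try_upload(address):
            return address
    return None


if __name__ == "__main__":
    # Mimic the situation in this thread: the WS refuses, the CS does not accept.
    print(return_targets("171.64.65.104", "171.65.103.160"))
    print(return_results("171.64.65.104", "171.65.103.160", lambda address: False))
```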

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 4:56 am
by 7im
Is that server message PLEASE WAIT a new type of message?

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 5:11 am
by PS3EdOlkkola
I now have 6 rigs unable to connect to the proper collection server. The collection server each of my rigs is supposed to connect to has a Status of "full" and a Connect condition of "reject". All of these are big work units, worth between 150,000 and 175,000 points each. Jadeshi, who is responsible for the full servers, needs to get their act together and free up some space to accept work units. Unless that happens soon, these processed work units will time out and the whole project will be set back, not to mention over a million points in one day that get flushed.

Seriously, is it really that hard to monitor your own servers to ensure they are configured correctly and maintained? Yes, I'm getting annoyed, and rightly so.

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 5:23 am
by bruce
I'd be annoyed, too, but Stanford has never had very good coverage on weekends. I don't know how many problems I've seen that were fixed on a Monday morning. I do see a lot of servers which have a status of either DOWN or Reject.

The Collection Servers can only recover a certain percentage of WUs associated with a Work Server that has failed (and that often results in a PLEASE WAIT message). My hunch is that the problem with the CS will go away once the WS is back on-line.

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 9:18 am
by Simplex0
Same problem here

"

Code:

08:03:22:WU00:FS02:Uploading 37.94MiB to 171.64.65.104
08:03:22:WU00:FS02:Connecting to 171.64.65.104:8080
08:03:24:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
08:03:24:WU00:FS02:Connecting to 171.64.65.104:80
08:03:32:WU00:FS02:Upload 2.31%
08:03:38:WU00:FS02:Upload 4.12%
08:03:44:WU00:FS02:Upload 5.93%
08:03:50:WU00:FS02:Upload 7.91%
08:03:56:WU00:FS02:Upload 9.72%
08:04:02:WU00:FS02:Upload 11.53%
08:04:08:WU00:FS02:Upload 13.34%
08:04:14:WU00:FS02:Upload 15.15%
08:04:20:WU00:FS02:Upload 17.13%
08:04:26:WU00:FS02:Upload 18.94%
08:04:26:WU04:FS01:0x21:Completed 12800 out of 640000 steps (2%)
08:04:32:WU00:FS02:Upload 20.76%
08:04:38:WU00:FS02:Upload 22.57%
08:04:44:WU00:FS02:Upload 24.38%
08:04:50:WU00:FS02:Upload 26.36%
08:04:56:WU00:FS02:Upload 27.84%
08:05:02:WU00:FS02:Upload 29.65%
08:05:08:WU00:FS02:Upload 31.46%
08:05:08:WU01:FS02:0x21:Completed 1400000 out of 2500000 steps (56%)
08:05:14:WU00:FS02:Upload 33.27%
08:05:20:WU00:FS02:Upload 35.25%
08:05:26:WU00:FS02:Upload 37.06%
08:05:32:WU00:FS02:Upload 38.87%
08:05:38:WU00:FS02:Upload 40.69%
08:05:44:WU00:FS02:Upload 42.50%
08:05:50:WU00:FS02:Upload 44.48%
08:05:56:WU00:FS02:Upload 46.12%
08:06:02:WU00:FS02:Upload 48.10%
08:06:08:WU00:FS02:Upload 49.91%
08:06:14:WU00:FS02:Upload 51.72%
08:06:20:WU00:FS02:Upload 53.54%
08:06:26:WU00:FS02:Upload 55.51%
08:06:32:WU00:FS02:Upload 57.32%
08:06:38:WU00:FS02:Upload 59.14%
08:06:44:WU00:FS02:Upload 60.95%
08:06:48:WU04:FS01:0x21:Completed 19200 out of 640000 steps (3%)
08:06:50:WU00:FS02:Upload 62.92%
08:06:56:WU00:FS02:Upload 64.74%
08:07:02:WU00:FS02:Upload 66.55%
08:07:08:WU00:FS02:Upload 68.36%
08:07:14:WU00:FS02:Upload 70.17%
08:07:21:WU00:FS02:Upload 72.15%
08:07:27:WU00:FS02:Upload 74.29%
08:07:33:WU00:FS02:Upload 76.10%
08:07:38:WU01:FS02:0x21:Completed 1425000 out of 2500000 steps (57%)
08:07:39:WU00:FS02:Upload 77.91%
08:07:45:WU00:FS02:Upload 79.73%
08:07:51:WU00:FS02:Upload 81.54%
08:07:57:WU00:FS02:Upload 83.35%
08:08:03:WU00:FS02:Upload 85.33%
08:08:09:WU00:FS02:Upload 87.14%
08:08:15:WU00:FS02:Upload 88.95%
08:08:21:WU00:FS02:Upload 90.76%
08:08:27:WU00:FS02:Upload 92.57%
08:08:33:WU00:FS02:Upload 94.55%
08:08:39:WU00:FS02:Upload 96.36%
08:08:45:WU00:FS02:Upload 98.01%
08:08:51:WU00:FS02:Upload 99.82%
08:08:52:WU00:FS02:Upload complete
08:08:52:WU00:FS02:Server responded PLEASE_WAIT (464)
08:08:52:WARNING:WU00:FS02:Failed to send results, will try again later
"

Mod edit: added Code tags to log file

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 3:05 pm
by tofuwombat
bruce wrote: I'd be annoyed, too, but Stanford has never had very good coverage on weekends. I don't know how many problems I've seen that were fixed on a Monday morning. I do see a lot of servers which have a status of either DOWN or Reject.

The Collection Servers can only recover a certain percentage of WUs associated with a Work Server that has failed (and that often results in a PLEASE WAIT message). My hunch is that the problem with the CS will go away once the WS is back on-line.

I'm having a similar "PLEASE WAIT" issue.

Thanks for this insight.

I will wait.

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 4:21 pm
by Simplex0
Does this mean that the server's hard drive is full?
"
Mon Jul 18 09:00:25 PDT 2016 171.64.65.104 vspg14b jadeshi GPU full Reject 0.00 0 0 50541 9913 -1563 0 0 0 - - - - - 0 0 - - 1 - 0 0 WL; WL; 10000, 10000 7.0, 7.0 - 49, 49 64, 64 - - 2, 1 B, B 8080G, 8080G
"

http://fah-web.stanford.edu/pybeta/logs ... 4.log.html

Re: Failed to connect to 171.64.65.104:80

Posted: Mon Jul 18, 2016 4:54 pm
by Joe_H
Maybe, or it might be another problem with the WS. The "full" in that status line refers to whether the server will both assign and receive WUs; the "Reject" means that it was not accepting connections at that time but should have been. Currently the status is "standby" and "Not accept".
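
For those who want to watch those two fields without reading the whole line, here is a small Python sketch that splits a status line like the one quoted above. The field names are my own labels, inferred from how the line is discussed in this thread (server type, Status, Connect); they are not official column headings, and the remaining columns are ignored because their meaning isn't given here.

```python
from typing import NamedTuple


class ServerStatus(NamedTuple):
    # Field names are assumptions inferred from this thread, not official headings.
    timestamp: str
    ip: str
    name: str
    owner: str
    kind: str
    status: str
    connect: str


def parse_status_line(line):
    """Split a server-status log line into its first few whitespace-separated fields."""
    tokens = line.split()
    if len(tokens) < 12:
        raise ValueError("status line is shorter than expected")
    # The first six tokens form the date, e.g. "Mon Jul 18 09:00:25 PDT 2016".
    return ServerStatus(" ".join(tokens[:6]), *tokens[6:12])


if __name__ == "__main__":
    # Start of the status line quoted earlier in the thread.
    line = "Mon Jul 18 09:00:25 PDT 2016 171.64.65.104 vspg14b jadeshi GPU full Reject 0.00 0 0"
    server = parse_status_line(line)
    print(server.ip, server.status, server.connect)
```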