
Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 1:38 am
by n1np
I am shutting my machines down for now, as I have seen more WU dumped.

I also want to make a comment about what I am seeing that I had not noted before. Many work units will show up to "Upload 100%", but never receive an acknowledgement from the server that they were uploaded or credited. Maybe they are then not released from the work queue and try to upload again (just speculating on that part)?

*EDIT*

Adding an example of a machine I want to shut down but the client will not exit because it has not received an ACK from the server (WU01):

Code:

02:40:23:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:40:23:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:40:23:WU01:FS00:Connecting to 171.64.65.124:8080
02:41:26:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:41:26:WU01:FS00:Connecting to 171.64.65.124:80
02:41:48:WU00:FS00:0xa4:Completed 2350000 out of 2500000 steps  (94%)
02:42:29:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
02:43:42:WU00:FS00:0xa4:Completed 2375000 out of 2500000 steps  (95%)
02:45:35:WU00:FS00:0xa4:Completed 2400000 out of 2500000 steps  (96%)
02:47:29:WU00:FS00:0xa4:Completed 2425000 out of 2500000 steps  (97%)
02:49:27:WU00:FS00:0xa4:Completed 2450000 out of 2500000 steps  (98%)
02:51:24:WU00:FS00:0xa4:Completed 2475000 out of 2500000 steps  (99%)
02:51:29:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:51:29:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:51:29:WU01:FS00:Connecting to 171.64.65.124:8080
02:52:32:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:52:32:WU01:FS00:Connecting to 171.64.65.124:80
02:52:40:WU01:FS00:Upload 4.26%
02:53:20:WU00:FS00:0xa4:Completed 2500000 out of 2500000 steps  (100%)
02:53:20:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
02:53:30:WU00:FS00:0xa4:
02:53:30:WU00:FS00:0xa4:Finished Work Unit:
02:53:30:WU00:FS00:0xa4:- Reading up to 1554192 from "00/wudata_01.trr": Read 1554192
02:53:30:WU00:FS00:0xa4:trr file hash check passed.
02:53:30:WU00:FS00:0xa4:- Reading up to 67748 from "00/wudata_01.xtc": Read 67748
02:53:30:WU00:FS00:0xa4:xtc file hash check passed.
02:53:30:WU00:FS00:0xa4:edr file hash check passed.
02:53:30:WU00:FS00:0xa4:logfile size: 56447
02:53:30:WU00:FS00:0xa4:Leaving Run
02:53:34:WU00:FS00:0xa4:- Writing 1715487 bytes of core data to disk...
02:53:34:WU00:FS00:0xa4:Done: 1714975 -> 1453655 (compressed to 84.7 percent)
02:53:34:WU00:FS00:0xa4:  ... Done.
02:54:11:WU01:FS00:Upload 12.78%
02:54:12:WU00:FS00:0xa4:- Shutting down core
02:54:12:WU00:FS00:0xa4:
02:54:12:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
02:54:15:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:54:15:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8608 run:16 clone:3 gen:257 core:0xa4 unit:0x000001400002894c55a952c3fc54ecdb
02:54:15:WU00:FS00:Uploading 1.39MiB to 155.247.166.220
02:54:15:WU00:FS00:Connecting to 155.247.166.220:8080
02:54:18:WU00:FS00:Upload complete
02:54:18:WU00:FS00:Server responded WORK_ACK (400)
02:54:18:WU00:FS00:Final credit estimate, 2390.00 points
02:54:18:WU00:FS00:Cleaning up
02:55:12:WU01:FS00:Upload 17.04%
02:55:21:WU01:FS00:Upload 21.30%
02:56:15:WU01:FS00:Upload 25.56%
02:56:57:WU01:FS00:Upload 29.82%
02:58:13:WU01:FS00:Upload 34.09%
02:58:33:WU01:FS00:Upload 38.35%
02:59:29:WU01:FS00:Upload 42.61%
02:59:56:WU01:FS00:Upload 46.87%
03:00:56:WU01:FS00:Upload 51.13%
03:01:15:WU01:FS00:Upload 55.39%
03:01:52:WU01:FS00:Upload 59.65%
03:03:07:WU01:FS00:Upload 63.91%
03:04:14:WU01:FS00:Upload 68.17%
03:04:29:WU01:FS00:Upload 72.43%
03:05:30:WU01:FS00:Upload 76.69%
03:06:49:WU01:FS00:Upload 80.95%
03:06:58:WU01:FS00:Upload 85.21%
03:07:40:WU01:FS00:Upload 89.47%
03:09:00:WU01:FS00:Upload 93.73%
03:10:12:WU01:FS00:Upload 97.99%
03:10:43:WU01:FS00:Upload 100.00%
I then pressed CTRL-C three times to force the client to close, and on restart it attempts to re-send the same WU:

Code:

03:33:08:Trying to access database...
03:33:08:Successfully acquired database lock
03:33:08:Enabled folding slot 00: READY cpu:8
03:33:08:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:33:08:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:33:08:WU01:FS00:Connecting to 171.64.65.124:8080
03:34:11:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
03:34:11:WU01:FS00:Connecting to 171.64.65.124:80
03:35:14:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
03:35:14:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:35:14:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:35:14:WU01:FS00:Connecting to 171.64.65.124:8080
Could this duplication of uploads be causing excess load on the server?

*EDIT 2*
I just shut that machine down (without grabbing the log first :oops: ). It had uploaded the same work unit again and got the 434 error, work unit already received, dumping. Was the first upload credited?
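
For anyone who wants to check their own log for the same condition (an upload that reaches 100% but is never acknowledged), here is a minimal Python sketch. It assumes the line layout seen in the logs above (`HH:MM:SS:WUnn:FSnn:message`); the function name `unacked_uploads` is made up for illustration and is not part of the client.

```python
import re

# Matches client log lines such as "03:10:43:WU01:FS00:Upload 100.00%".
LINE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")

def unacked_uploads(log_lines):
    """Return WU slot ids whose upload finished but never logged WORK_ACK."""
    finished = set()  # slots seen at "Upload 100..." or "Upload complete"
    acked = set()     # slots for which "Server responded WORK_ACK" was logged
    for line in log_lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # WARNING/ERROR and banner lines use a different layout
        _, wu, _, msg = m.groups()
        if msg.startswith("Upload 100") or msg.startswith("Upload complete"):
            finished.add(wu)
        elif msg.startswith("Server responded WORK_ACK"):
            acked.add(wu)
    return sorted(finished - acked)
```

Note that slot ids such as WU01 are reused over time, so this is only meaningful on a short excerpt of the log, not on a whole day's worth.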

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 2:28 am
by twizzle
I've got three work units hanging around that won't upload to that server, and no WUs are being assigned. Time to shut the client down... :(

02:26:21:WARNING:WU00:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE
02:26:22:WARNING:WU02:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE
02:26:22:WARNING:WU01:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 3:08 am
by mmonnin
Pausing my CPU clients as well. No point running them if we can't return the WUs.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 3:38 am
by onDvine
I still have one WU stuck. This afternoon I got 2-3 WUs from a different server and returned them to it. They've been accepted, so I will continue to run the client.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 4:19 am
by bruce
n1np wrote: Could this duplication of uploads be causing excess load on the server?

It had uploaded the same work unit again and got the 434 error, work unit already received, dumping. Was the first upload credited?
Yes. When a server is receiving more uploads than it can handle (for whatever reason), every little bit makes it worse.

At times, if the WU uploads successfully but the acknowledgement message doesn't get back to the Client, the Client will upload it again, but the WU will only be credited once, so dumping the duplicate is the right thing to do.

Code:

SEND error:NO_ERROR project:8608 run:16 clone:3 gen:257 
SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 


Hi n1np (team 12912),
Your WU (P8608 R16 C3 G257) was added to the stats database on 2015-12-09 18:11:36 for 2390.8 points of credit.
Your WU (P9017 R492 C3 G130) was added to the stats database on 2015-12-09 15:12:06 for 1492.88 points of credit.
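
To illustrate the "credited only once" behavior described above, here is a rough Python sketch. It is not the actual work server code: the class `StatsSketch` and its `receive` method are hypothetical, and keying on (project, run, clone, gen) is an assumption for the example. The two response codes are the ones reported in this thread (WORK_ACK 400 in the client log, 434 for a WU that was already received).

```python
from dataclasses import dataclass, field

WORK_ACK = 400          # code the client logged for a normal acknowledgement
ALREADY_RECEIVED = 434  # code reported in this thread for a repeated upload

@dataclass
class StatsSketch:
    """Hypothetical stand-in for the server's record of credited WUs."""
    # Maps (project, run, clone, gen) -> points credited.
    credited: dict = field(default_factory=dict)

    def receive(self, project, run, clone, gen, points):
        key = (project, run, clone, gen)
        if key in self.credited:
            # Duplicate upload: no second credit, the client dumps its copy.
            return ALREADY_RECEIVED
        self.credited[key] = points
        return WORK_ACK
```

With this model, a second upload of P9017 R492 C3 G130 would return 434 and leave the credit from the first upload untouched, which matches what the client log showed.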

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 7:48 am
by oxymoot
I am getting errors uploading to this server, and I have also experienced lag when downloading and uploading over the last few days. I have attached the log file.

Code:

07:26:52:WU00:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
07:27:43:WU00:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
07:27:44:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
07:27:54:WU00:FS00:0xa4:
07:27:54:WU00:FS00:0xa4:Finished Work Unit:
07:27:54:WU00:FS00:0xa4:- Reading up to 811512 from "00/wudata_01.trr": Read 811512
07:27:54:WU00:FS00:0xa4:trr file hash check passed.
07:27:54:WU00:FS00:0xa4:- Reading up to 745900 from "00/wudata_01.xtc": Read 745900
07:27:54:WU00:FS00:0xa4:xtc file hash check passed.
07:27:54:WU00:FS00:0xa4:edr file hash check passed.
07:27:54:WU00:FS00:0xa4:logfile size: 22797
07:27:54:WU00:FS00:0xa4:Leaving Run
07:27:56:WU00:FS00:0xa4:- Writing 1582697 bytes of core data to disk...
07:27:56:WU00:FS00:0xa4:Done: 1582185 -> 1537356 (compressed to 97.1 percent)
07:27:56:WU00:FS00:0xa4:  ... Done.
07:27:56:WU00:FS00:0xa4:- Shutting down core
07:27:56:WU00:FS00:0xa4:
07:27:56:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
07:27:57:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
07:27:57:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:27:57:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:27:57:WU00:FS00:Connecting to 171.64.65.124:8080
07:28:18:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:28:18:WU00:FS00:Connecting to 171.64.65.124:80
07:28:39:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:28:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:28:40:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:28:40:WU00:FS00:Connecting to 171.64.65.124:8080
07:29:01:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:29:01:WU00:FS00:Connecting to 171.64.65.124:80
07:29:22:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:29:40:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:29:40:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:29:40:WU00:FS00:Connecting to 171.64.65.124:8080
07:30:01:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:30:01:WU00:FS00:Connecting to 171.64.65.124:80
07:30:22:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:31:17:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:31:17:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:31:17:WU00:FS00:Connecting to 171.64.65.124:8080
07:31:38:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:31:38:WU00:FS00:Connecting to 171.64.65.124:80
07:31:59:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:33:54:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:33:54:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:33:54:WU00:FS00:Connecting to 171.64.65.124:8080
07:34:15:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:34:15:WU00:FS00:Connecting to 171.64.65.124:80
07:34:36:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:38:08:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:38:08:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:38:08:WU00:FS00:Connecting to 171.64.65.124:8080
07:38:29:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:38:29:WU00:FS00:Connecting to 171.64.65.124:80
07:38:50:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:45:00:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:45:00:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:45:00:WU00:FS00:Connecting to 171.64.65.124:8080
07:45:21:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:45:21:WU00:FS00:Connecting to 171.64.65.124:80
07:45:42:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.


Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 8:32 am
by vmzy
Is there something wrong with the AS server code?
171.64.65.124 has been in reject mode and its net load has been very high. Why does the AS still keep assigning clients to the 124 server to download new WUs?

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 2:24 pm
by NASAdude
I have two WUs ready for upload. They're paused until I see a reply here saying the server is back online. Thanks.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 2:44 pm
by ArticBlast
Has this been resolved?

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 3:53 pm
by bruce
It was resolved and the server accepted a considerable number of uploads from a very large backlog, but it seems to have gone off-line again. I'll confirm the server's owner is aware of the (new) problem.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 4:11 pm
by ArticBlast
Thanks Bruce!

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 11:17 pm
by davidcoton
Three WUs that appear to have uploaded 100% but have not been ack'ed. Can a mod confirm receipt so I can clear the slots, rather than letting it try again and get "dumped"?
9017(120,8,15), 9017(656,1,16), 9021(33,12,42).

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Thu Dec 10, 2015 11:49 pm
by Joe_H
All three have been entered into the stats database.

From what I have been seeing on my systems at home getting and returning assignments from this server, responses are slow. It has taken as much as 5-6 minutes for the server's ACK for receiving a WU to show up in the log. Uploads and downloads were also slow a few hours ago when I was home.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Fri Dec 11, 2015 4:01 am
by Nert
Looks as if the problem still exists. I think I'll shut down the CPU client until this gets resolved.

Code:

*********************** Log Started 2015-12-10T20:09:42Z ***********************
******************************* Date: 2015-12-11 *******************************
03:48:40:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:49:01:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:49:23:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:49:44:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:50:23:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:50:26:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:50:44:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:50:47:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:51:08:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:51:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:51:41:WU01:FS01:0x21:WARNING:Console control signal 1 on PID 6344
03:51:41:WU02:FS02:0x18:WARNING:Console control signal 1 on PID 7964
03:51:41:WU02:FS02:0x18:ERROR:103: Lost client lifeline
03:52:00:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:52:06:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
03:52:21:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:53:00:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:53:08:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:53:21:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:53:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Re: 171.64.65.124 SLOW upload/download and Dumping WU

Posted: Fri Dec 11, 2015 4:22 am
by ArticBlast
Yeah... it's Rejecting again according to the server status sheet.