171.64.65.124 SLOW upload/download and Dumping WU

n1np
Posts: 31
Joined: Sat Mar 14, 2009 2:16 pm
Hardware configuration: HP DL380g8 x12
Location: Virginia

Post by n1np »

I am shutting my machines down for now, as I have seen more WUs dumped.

I also want to comment on something I am seeing that I had not noted before. Many work units reach "Upload 100%" but never receive an acknowledgement from the server that they were uploaded or credited. Maybe they are then not released from the work queue and try to upload again (just speculating on that part)?
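
Uploads that reach 100% without an ACK can be spotted mechanically in a client log. Here is a minimal Python sketch, assuming the `HH:MM:SS:WUnn:FSnn:message` line shape seen in the logs posted in this thread. The name `find_unacked_uploads` is made up for illustration and is not part of the FAH client.

```python
import re

# Assumed line shape, based on the client logs quoted in this thread:
#   HH:MM:SS:WUnn:FSnn:message
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")


def find_unacked_uploads(log_lines):
    """Return {WU id: time the upload finished} for uploads with no later WORK_ACK."""
    waiting = {}
    for raw in log_lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # WARNING/ERROR lines and banners do not fit the pattern
        stamp, wu, _slot, message = match.groups()
        if message.startswith(("Upload 100", "Upload complete")):
            waiting[wu] = stamp          # upload finished, ACK still pending
        elif message.startswith("Server responded WORK_ACK"):
            waiting.pop(wu, None)        # server confirmed receipt
    return waiting
```

Calling it with the lines of a log file (for example `find_unacked_uploads(open("log.txt"))`, where the file name is a placeholder) lists the WU ids still waiting for an ACK. Moderators can still check the stats database to confirm receipt.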

*EDIT*

Adding an example of a machine I want to shut down but the client will not exit because it has not received an ACK from the server (WU01):

Code: Select all

02:40:23:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:40:23:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:40:23:WU01:FS00:Connecting to 171.64.65.124:8080
02:41:26:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:41:26:WU01:FS00:Connecting to 171.64.65.124:80
02:41:48:WU00:FS00:0xa4:Completed 2350000 out of 2500000 steps  (94%)
02:42:29:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
02:43:42:WU00:FS00:0xa4:Completed 2375000 out of 2500000 steps  (95%)
02:45:35:WU00:FS00:0xa4:Completed 2400000 out of 2500000 steps  (96%)
02:47:29:WU00:FS00:0xa4:Completed 2425000 out of 2500000 steps  (97%)
02:49:27:WU00:FS00:0xa4:Completed 2450000 out of 2500000 steps  (98%)
02:51:24:WU00:FS00:0xa4:Completed 2475000 out of 2500000 steps  (99%)
02:51:29:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:51:29:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:51:29:WU01:FS00:Connecting to 171.64.65.124:8080
02:52:32:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:52:32:WU01:FS00:Connecting to 171.64.65.124:80
02:52:40:WU01:FS00:Upload 4.26%
02:53:20:WU00:FS00:0xa4:Completed 2500000 out of 2500000 steps  (100%)
02:53:20:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
02:53:30:WU00:FS00:0xa4:
02:53:30:WU00:FS00:0xa4:Finished Work Unit:
02:53:30:WU00:FS00:0xa4:- Reading up to 1554192 from "00/wudata_01.trr": Read 1554192
02:53:30:WU00:FS00:0xa4:trr file hash check passed.
02:53:30:WU00:FS00:0xa4:- Reading up to 67748 from "00/wudata_01.xtc": Read 67748
02:53:30:WU00:FS00:0xa4:xtc file hash check passed.
02:53:30:WU00:FS00:0xa4:edr file hash check passed.
02:53:30:WU00:FS00:0xa4:logfile size: 56447
02:53:30:WU00:FS00:0xa4:Leaving Run
02:53:34:WU00:FS00:0xa4:- Writing 1715487 bytes of core data to disk...
02:53:34:WU00:FS00:0xa4:Done: 1714975 -> 1453655 (compressed to 84.7 percent)
02:53:34:WU00:FS00:0xa4:  ... Done.
02:54:11:WU01:FS00:Upload 12.78%
02:54:12:WU00:FS00:0xa4:- Shutting down core
02:54:12:WU00:FS00:0xa4:
02:54:12:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
02:54:15:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:54:15:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8608 run:16 clone:3 gen:257 core:0xa4 unit:0x000001400002894c55a952c3fc54ecdb
02:54:15:WU00:FS00:Uploading 1.39MiB to 155.247.166.220
02:54:15:WU00:FS00:Connecting to 155.247.166.220:8080
02:54:18:WU00:FS00:Upload complete
02:54:18:WU00:FS00:Server responded WORK_ACK (400)
02:54:18:WU00:FS00:Final credit estimate, 2390.00 points
02:54:18:WU00:FS00:Cleaning up
02:55:12:WU01:FS00:Upload 17.04%
02:55:21:WU01:FS00:Upload 21.30%
02:56:15:WU01:FS00:Upload 25.56%
02:56:57:WU01:FS00:Upload 29.82%
02:58:13:WU01:FS00:Upload 34.09%
02:58:33:WU01:FS00:Upload 38.35%
02:59:29:WU01:FS00:Upload 42.61%
02:59:56:WU01:FS00:Upload 46.87%
03:00:56:WU01:FS00:Upload 51.13%
03:01:15:WU01:FS00:Upload 55.39%
03:01:52:WU01:FS00:Upload 59.65%
03:03:07:WU01:FS00:Upload 63.91%
03:04:14:WU01:FS00:Upload 68.17%
03:04:29:WU01:FS00:Upload 72.43%
03:05:30:WU01:FS00:Upload 76.69%
03:06:49:WU01:FS00:Upload 80.95%
03:06:58:WU01:FS00:Upload 85.21%
03:07:40:WU01:FS00:Upload 89.47%
03:09:00:WU01:FS00:Upload 93.73%
03:10:12:WU01:FS00:Upload 97.99%
03:10:43:WU01:FS00:Upload 100.00%
I then pressed CTRL-C three times to force the client to close, and on restart it attempts to re-send the same WU:

Code: Select all

03:33:08:Trying to access database...
03:33:08:Successfully acquired database lock
03:33:08:Enabled folding slot 00: READY cpu:8
03:33:08:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:33:08:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:33:08:WU01:FS00:Connecting to 171.64.65.124:8080
03:34:11:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
03:34:11:WU01:FS00:Connecting to 171.64.65.124:80
03:35:14:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
03:35:14:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:35:14:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:35:14:WU01:FS00:Connecting to 171.64.65.124:8080
Could this duplication of uploads be causing excess load on the server?

*EDIT 2*
I just shut that machine down (without grabbing the log first :oops: ). It had uploaded the same work unit again and got the 434 error, work unit already received, dumping. Was the first upload credited?
Last edited by n1np on Thu Dec 10, 2015 3:08 am, edited 1 time in total.
twizzle
Posts: 25
Joined: Fri Aug 30, 2013 3:55 am

Post by twizzle »

I've got three work units hanging around that won't upload to that server, and no WUs are being assigned. Time to shut the client down... :(

02:26:21:WARNING:WU00:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE
02:26:22:WARNING:WU02:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE
02:26:22:WARNING:WU01:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_SERVICE_UNAVAILABLE
mmonnin
Posts: 324
Joined: Wed Dec 05, 2007 1:27 am

Post by mmonnin »

Pausing my CPU clients as well. No point running them if we can't return the WUs.
onDvine
Posts: 3
Joined: Mon May 07, 2012 4:10 pm

Post by onDvine »

I still have one WU stuck. This afternoon I got 2-3 WUs from a different server and returned them to it. They were accepted, so I will continue to run the client.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

n1np wrote:Could this duplication of uploads be causing excess load on the server?

It had uploaded the same work unit again and got the 434 error, work unit already received, dumping. Was the first upload credited?
Yes. When a server is receiving more uploads than it can handle (for whatever reason), every little bit makes it worse.

At times, if the WU uploads successfully but the acknowledgement message doesn't get back to the Client, it will upload again but it will only be credited once, so dumping the WU is the right thing to do.
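
The "credited only once" behaviour can be pictured as a lookup keyed on the WU's project/run/clone/gen. The Python sketch below is purely illustrative and is not the work server's actual code. `ResultStore` and its methods are hypothetical names. The only details taken from this thread are the `WORK_ACK (400)` response and the 434 "already received" code that n1np reported.

```python
from dataclasses import dataclass, field

WORK_ACK = 400          # response code shown in the client log in this thread
ALREADY_RECEIVED = 434  # code n1np reported for the repeated upload


@dataclass
class ResultStore:
    """Hypothetical stand-in for a server's record of returned WUs."""
    # (project, run, clone, gen) -> points credited for that WU
    credited: dict = field(default_factory=dict)

    def receive(self, project, run, clone, gen, points):
        key = (project, run, clone, gen)
        if key in self.credited:
            # Second copy of the same WU: no new credit, client dumps it.
            return ALREADY_RECEIVED
        self.credited[key] = points
        return WORK_ACK

    def total_credit(self):
        return sum(self.credited.values())
```

Under this model a repeated upload costs bandwidth and server time but never changes the credit total. That is why dumping the second copy is harmless to the donor.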

Code: Select all

SEND error:NO_ERROR project:8608 run:16 clone:3 gen:257 
SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 


Hi n1np (team 12912),
Your WU (P8608 R16 C3 G257) was added to the stats database on 2015-12-09 18:11:36 for 2390.8 points of credit.
Your WU (P9017 R492 C3 G130) was added to the stats database on 2015-12-09 15:12:06 for 1492.88 points of credit.
oxymoot
Posts: 12
Joined: Tue Mar 06, 2012 11:52 am

Post by oxymoot »

I am getting errors uploading to this server, and I have also seen slow downloads and uploads over the last few days. I have attached the log file.

Code: Select all

07:26:52:WU00:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
07:27:43:WU00:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
07:27:44:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
07:27:54:WU00:FS00:0xa4:
07:27:54:WU00:FS00:0xa4:Finished Work Unit:
07:27:54:WU00:FS00:0xa4:- Reading up to 811512 from "00/wudata_01.trr": Read 811512
07:27:54:WU00:FS00:0xa4:trr file hash check passed.
07:27:54:WU00:FS00:0xa4:- Reading up to 745900 from "00/wudata_01.xtc": Read 745900
07:27:54:WU00:FS00:0xa4:xtc file hash check passed.
07:27:54:WU00:FS00:0xa4:edr file hash check passed.
07:27:54:WU00:FS00:0xa4:logfile size: 22797
07:27:54:WU00:FS00:0xa4:Leaving Run
07:27:56:WU00:FS00:0xa4:- Writing 1582697 bytes of core data to disk...
07:27:56:WU00:FS00:0xa4:Done: 1582185 -> 1537356 (compressed to 97.1 percent)
07:27:56:WU00:FS00:0xa4:  ... Done.
07:27:56:WU00:FS00:0xa4:- Shutting down core
07:27:56:WU00:FS00:0xa4:
07:27:56:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
07:27:57:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
07:27:57:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:27:57:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:27:57:WU00:FS00:Connecting to 171.64.65.124:8080
07:28:18:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:28:18:WU00:FS00:Connecting to 171.64.65.124:80
07:28:39:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:28:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:28:40:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:28:40:WU00:FS00:Connecting to 171.64.65.124:8080
07:29:01:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:29:01:WU00:FS00:Connecting to 171.64.65.124:80
07:29:22:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:29:40:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:29:40:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:29:40:WU00:FS00:Connecting to 171.64.65.124:8080
07:30:01:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:30:01:WU00:FS00:Connecting to 171.64.65.124:80
07:30:22:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:31:17:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:31:17:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:31:17:WU00:FS00:Connecting to 171.64.65.124:8080
07:31:38:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:31:38:WU00:FS00:Connecting to 171.64.65.124:80
07:31:59:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:33:54:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:33:54:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:33:54:WU00:FS00:Connecting to 171.64.65.124:8080
07:34:15:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:34:15:WU00:FS00:Connecting to 171.64.65.124:80
07:34:36:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:38:08:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:38:08:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:38:08:WU00:FS00:Connecting to 171.64.65.124:8080
07:38:29:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:38:29:WU00:FS00:Connecting to 171.64.65.124:80
07:38:50:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:45:00:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9021 run:888 clone:7 gen:116 core:0xa4 unit:0x00000085ab40417c55e81ce905d675c5
07:45:00:WU00:FS00:Uploading 1.47MiB to 171.64.65.124
07:45:00:WU00:FS00:Connecting to 171.64.65.124:8080
07:45:21:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:45:21:WU00:FS00:Connecting to 171.64.65.124:80
07:45:42:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

vmzy
Posts: 136
Joined: Wed Apr 16, 2008 6:25 am

Post by vmzy »

Is there something wrong with the AS code?
171.64.65.124 has been in Reject mode and its network load has been very high. Why does the AS still keep assigning clients to the 124 server to download new WUs?
NASAdude
Posts: 1
Joined: Thu Dec 10, 2015 2:19 pm

Post by NASAdude »

I have two WUs ready for upload. They're paused until I see a reply here saying the server is back online. Thanks.
ArticBlast
Posts: 4
Joined: Thu Dec 10, 2015 2:39 pm

Post by ArticBlast »

Has this been resolved?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

It was resolved and the server accepted a considerable number of uploads from a very large backlog, but it seems to have gone off-line again. I'll confirm the server's owner is aware of the (new) problem.
ArticBlast
Posts: 4
Joined: Thu Dec 10, 2015 2:39 pm

Post by ArticBlast »

Thanks Bruce!
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Post by davidcoton »

I have three WUs that appear to have uploaded 100% but have not been ACKed. Can a mod confirm receipt so I can clear the slots, rather than letting the client try again and get "dumped"?
9017(120,8,15), 9017(656,1,16), 9021(33,12,42).
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

All three have been entered into the stats database.

From what I have been seeing on my systems at home getting and returning assignments from this server, responses are slow. It has taken as much as 5-6 minutes for the server's ACK for receiving a WU to show up in the log. Uploads and downloads were also slow a few hours ago when I was home.
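
For anyone who wants to measure that delay on their own machine, here is a rough Python sketch. It assumes the `HH:MM:SS:WUnn:FSnn:message` log lines quoted earlier in the thread, and the helper name `ack_delays` is invented for this example. It reports the seconds between an upload finishing and the `WORK_ACK` line for the same WU.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: HH:MM:SS:WUnn:FSnn:message
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:(.*)$")


def ack_delays(log_lines):
    """Return [(WU id, seconds from upload finishing to WORK_ACK), ...]."""
    finished = {}  # WU id -> time its upload reached 100% / completed
    delays = []
    for raw in log_lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        stamp, wu, message = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        if message.startswith(("Upload 100", "Upload complete")):
            finished.setdefault(wu, when)  # keep the first "finished" line
        elif message.startswith("Server responded WORK_ACK") and wu in finished:
            gap = when - finished.pop(wu)
            if gap < timedelta(0):
                # Log times carry no date; assume a single midnight rollover.
                gap += timedelta(days=1)
            delays.append((wu, int(gap.total_seconds())))
    return delays
```

A delay of 300-360 seconds here would match the 5-6 minutes described above. WUs that never appear in the result got no ACK at all within the log.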
Nert
Posts: 154
Joined: Wed Mar 26, 2014 7:46 pm

Post by Nert »

Looks as if the problem still exists. I think I'll shut down the CPU client until this gets resolved.

Code: Select all

*********************** Log Started 2015-12-10T20:09:42Z ***********************
******************************* Date: 2015-12-11 *******************************
03:48:40:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:49:01:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:49:23:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:49:44:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:50:23:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:50:26:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:50:44:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:50:47:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:51:08:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:51:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:51:41:WU01:FS01:0x21:WARNING:Console control signal 1 on PID 6344
03:51:41:WU02:FS02:0x18:WARNING:Console control signal 1 on PID 7964
03:51:41:WU02:FS02:0x18:ERROR:103: Lost client lifeline
03:52:00:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:52:06:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
03:52:21:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:53:00:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
03:53:08:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
03:53:21:ERROR:WU03:FS00:Exception: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:53:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
ArticBlast
Posts: 4
Joined: Thu Dec 10, 2015 2:39 pm

Post by ArticBlast »

Yeah... it's Rejecting again according to the server status sheet.