Re: 171.64.65.124 SLOW upload/download and Dumping WU
Posted: Thu Dec 10, 2015 1:38 am
I am shutting my machines down for now, as I have seen more WUs dumped.
I also want to comment on something I had not noticed before. Many work units get as far as "Upload 100%" but never receive an acknowledgement from the server that they were uploaded or credited. Maybe they are then not released from the work queue and the client tries to upload them again (just speculating on that part)?
*EDIT*
Adding an example of a machine I want to shut down, but the client will not exit because it has not received an ACK from the server for WU01 (first log below).
I then pressed CTRL-C three times to force it to close, and on restart it attempts to re-send the same WU (second log below).
Could this duplication of uploads be causing excess load on the server?
*EDIT 2*
I just shut that machine down (without grabbing the log first). It had uploaded the same work unit again and got the 434 error: work unit already received, dumping. Was the first upload credited?
Code (first log, before the forced close):
02:40:23:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:40:23:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:40:23:WU01:FS00:Connecting to 171.64.65.124:8080
02:41:26:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:41:26:WU01:FS00:Connecting to 171.64.65.124:80
02:41:48:WU00:FS00:0xa4:Completed 2350000 out of 2500000 steps (94%)
02:42:29:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
02:43:42:WU00:FS00:0xa4:Completed 2375000 out of 2500000 steps (95%)
02:45:35:WU00:FS00:0xa4:Completed 2400000 out of 2500000 steps (96%)
02:47:29:WU00:FS00:0xa4:Completed 2425000 out of 2500000 steps (97%)
02:49:27:WU00:FS00:0xa4:Completed 2450000 out of 2500000 steps (98%)
02:51:24:WU00:FS00:0xa4:Completed 2475000 out of 2500000 steps (99%)
02:51:29:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
02:51:29:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
02:51:29:WU01:FS00:Connecting to 171.64.65.124:8080
02:52:32:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
02:52:32:WU01:FS00:Connecting to 171.64.65.124:80
02:52:40:WU01:FS00:Upload 4.26%
02:53:20:WU00:FS00:0xa4:Completed 2500000 out of 2500000 steps (100%)
02:53:20:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
02:53:30:WU00:FS00:0xa4:
02:53:30:WU00:FS00:0xa4:Finished Work Unit:
02:53:30:WU00:FS00:0xa4:- Reading up to 1554192 from "00/wudata_01.trr": Read 1554192
02:53:30:WU00:FS00:0xa4:trr file hash check passed.
02:53:30:WU00:FS00:0xa4:- Reading up to 67748 from "00/wudata_01.xtc": Read 67748
02:53:30:WU00:FS00:0xa4:xtc file hash check passed.
02:53:30:WU00:FS00:0xa4:edr file hash check passed.
02:53:30:WU00:FS00:0xa4:logfile size: 56447
02:53:30:WU00:FS00:0xa4:Leaving Run
02:53:34:WU00:FS00:0xa4:- Writing 1715487 bytes of core data to disk...
02:53:34:WU00:FS00:0xa4:Done: 1714975 -> 1453655 (compressed to 84.7 percent)
02:53:34:WU00:FS00:0xa4: ... Done.
02:54:11:WU01:FS00:Upload 12.78%
02:54:12:WU00:FS00:0xa4:- Shutting down core
02:54:12:WU00:FS00:0xa4:
02:54:12:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
02:54:15:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:54:15:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8608 run:16 clone:3 gen:257 core:0xa4 unit:0x000001400002894c55a952c3fc54ecdb
02:54:15:WU00:FS00:Uploading 1.39MiB to 155.247.166.220
02:54:15:WU00:FS00:Connecting to 155.247.166.220:8080
02:54:18:WU00:FS00:Upload complete
02:54:18:WU00:FS00:Server responded WORK_ACK (400)
02:54:18:WU00:FS00:Final credit estimate, 2390.00 points
02:54:18:WU00:FS00:Cleaning up
02:55:12:WU01:FS00:Upload 17.04%
02:55:21:WU01:FS00:Upload 21.30%
02:56:15:WU01:FS00:Upload 25.56%
02:56:57:WU01:FS00:Upload 29.82%
02:58:13:WU01:FS00:Upload 34.09%
02:58:33:WU01:FS00:Upload 38.35%
02:59:29:WU01:FS00:Upload 42.61%
02:59:56:WU01:FS00:Upload 46.87%
03:00:56:WU01:FS00:Upload 51.13%
03:01:15:WU01:FS00:Upload 55.39%
03:01:52:WU01:FS00:Upload 59.65%
03:03:07:WU01:FS00:Upload 63.91%
03:04:14:WU01:FS00:Upload 68.17%
03:04:29:WU01:FS00:Upload 72.43%
03:05:30:WU01:FS00:Upload 76.69%
03:06:49:WU01:FS00:Upload 80.95%
03:06:58:WU01:FS00:Upload 85.21%
03:07:40:WU01:FS00:Upload 89.47%
03:09:00:WU01:FS00:Upload 93.73%
03:10:12:WU01:FS00:Upload 97.99%
03:10:43:WU01:FS00:Upload 100.00%
Code (second log, after the forced close and restart):
03:33:08:Trying to access database...
03:33:08:Successfully acquired database lock
03:33:08:Enabled folding slot 00: READY cpu:8
03:33:08:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:33:08:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:33:08:WU01:FS00:Connecting to 171.64.65.124:8080
03:34:11:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
03:34:11:WU01:FS00:Connecting to 171.64.65.124:80
03:35:14:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.64.65.124:80: Connection timed out
03:35:14:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9017 run:492 clone:3 gen:130 core:0xa4 unit:0x0000009bab40417c55b2caf5b7556523
03:35:14:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
03:35:14:WU01:FS00:Connecting to 171.64.65.124:8080
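For anyone who wants to check their own log for the same pattern (an upload that reaches 100% with no WORK_ACK afterwards), here is a minimal Python sketch. It is not part of the client. The function name find_unacknowledged_uploads is made up for illustration, and the line format is assumed only from the log excerpts above. WARNING lines are skipped because they do not start with the usual time:WU:FS prefix. Slot ids such as WU01 get reused for later units, so the result only reflects the most recent upload per id.

```python
import re

# Assumed line shape, taken from the excerpts above, e.g.
# "02:54:18:WU00:FS00:Server responded WORK_ACK (400)"
LINE_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:(.*)$")


def find_unacknowledged_uploads(log_lines):
    """Return WU ids whose latest upload hit 100% with no WORK_ACK after it."""
    state = {}  # WU id -> "sending", "waiting" (100% reached) or "acked"
    for raw in log_lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # WARNING lines and other formats are ignored
        wu, message = match.groups()
        if message.startswith("Sending unit results"):
            state[wu] = "sending"  # a new attempt resets the state
        elif message.startswith("Upload 100"):
            state[wu] = "waiting"
        elif "WORK_ACK" in message:
            state[wu] = "acked"
    return sorted(wu for wu, s in state.items() if s == "waiting")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_ack.py log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for wu in find_unacknowledged_uploads(handle):
            print(wu, "reached Upload 100% but has no WORK_ACK in this log")
```

Run against the first log above, this would flag WU01 and not WU00, since WU00 received WORK_ACK (400) right after its upload.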