Upload failures to mostly *.wustl.edu endpoints
Posted: Wed Jan 26, 2022 2:32 am
I have two machines running FAH, and both constantly show log output like this:
I don't think the problem is specific to my computers, because uploads succeed for some servers but not for others. In the last couple of days of logs, I have 17 WORK_ACKs from the following servers:
I have over 2000 upload failures (and 20 completed WUs still pending on this machine); here is a sampling of the servers involved in recent upload failures:
The stuck WUs are for projects 17257, 17258, and 18201.
Code: Select all
16:36:41:WU19:FS01:Sending unit results: id:19 state:SEND error:NO_ERROR project:17257 run:774 clone:5 gen:3 core:0x22 unit:0x00000005000000030000436900000306
16:36:41:WU19:FS01:Uploading 51.39MiB to 128.252.203.10
16:36:41:WU19:FS01:Connecting to 128.252.203.10:8080
16:37:11:WARNING:WU19:FS01:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
Code: Select all
[*] 207.53.233.146:8080 (fah.redesignscience.com) x6
[*] 34.72.228.44:8080 (stxfahwork01.silicontx.com) x3
[*] 140.163.4.210:8080 (pllwskifah2.mskcc.org) x2
[*] 128.252.203.13:8080 (highland3.engr.wustl.edu)
[*] 128.252.203.9:8080 (islay.seas.wustl.edu)
[*] 128.174.73.74:8080 (ds01.scs.illinois.edu) x2
[*] 129.32.209.202:8080 (vav19.fah.temple.edu)
[*] 178.174.196.138:8080 (fah-ws1.bahnhof.net)
Code: Select all
[*] 128.252.203.1:8080
[*] 128.252.203.9:8080 (But wait, I also have WORK_ACKs from here?)
[*] 128.252.203.10:8080
[*] 128.252.203.11:8080
[*] 128.252.203.13:8080
[*] 128.252.203.14:8080
[*] 206.223.170.146:8080
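For anyone who wants to produce per-server counts like the two lists above from their own log, here is a rough Python sketch. It attributes each failure or WORK_ACK to the server the same WU most recently connected to. The `Connecting to` and `Failed to send results` patterns come from the log excerpt in this post; the WORK_ACK pattern is only an assumption about how the client words a successful upload, and `tally` is a made-up helper name, so adjust it to match your logs.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpt in this post.
CONNECT = re.compile(r":(WU\d+):FS\d+:Connecting to (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")
FAILED = re.compile(r":WARNING:(WU\d+):FS\d+:Exception: Failed to send results")
# Assumption: a successful upload logs a line for the WU that contains "WORK_ACK".
ACKED = re.compile(r":(WU\d+):FS\d+:.*WORK_ACK")


def tally(lines):
    """Return (acks, failures) as Counters keyed by 'ip:port'."""
    last_server = {}  # WU label -> server it most recently connected to
    acks, failures = Counter(), Counter()
    for line in lines:
        m = CONNECT.search(line)
        if m:
            last_server[m.group(1)] = f"{m.group(2)}:{m.group(3)}"
            continue
        m = FAILED.search(line)
        if m and m.group(1) in last_server:
            failures[last_server[m.group(1)]] += 1
            continue
        m = ACKED.search(line)
        if m and m.group(1) in last_server:
            acks[last_server[m.group(1)]] += 1
    return acks, failures
```

To use it, open the log file and pass the file object to `tally`, then print `failures.most_common()` and `acks.most_common()`.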
Anything I can do to help diagnose? Thanks folks! (Hope your stats server issue gets fixed)
I've put all the logs in this gist:
Code: Select all
https://gist.github.com/BillyONeal/c97f452b99475f16a089a9206dc32140
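One quick check that might help narrow this down is whether a plain TCP connection to port 8080 on the affected servers can be opened at all from the same machine. The sketch below uses only Python's standard `socket` module; `probe` and `probe_all` are made-up helper names, and the 5-second timeout is an arbitrary choice. In the log excerpt the error is a short response rather than a refused connection, so a successful connect here would only rule out the simplest cause, not show that uploads work.

```python
import socket


def probe(host, port=8080, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def probe_all(hosts, port=8080, timeout=5.0):
    """Map each host to the result of probe()."""
    return {host: probe(host, port, timeout) for host in hosts}
```

For example, `probe_all(["128.252.203.10", "128.252.203.9"])` would return a dict showing which of those two addresses accepted a connection.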