20:40:10:WU00:FS00:0xa7:Project: 13871 (Run 0, Clone 1020, Gen 16)
20:40:10:WU00:FS00:0xa7:Unit: 0x000000140d5262775e791bc0cba96ccf
20:40:10:WU00:FS00:0xa7:Reading tar file core.xml
20:40:10:WU00:FS00:0xa7:Reading tar file frame16.tpr
20:40:10:WU00:FS00:0xa7:Digital signatures verified
20:40:10:WU00:FS00:0xa7:Calling: mdrun -s frame16.tpr -o frame16.trr -x frame16.xtc -e frame16.edr -cpt 15 -nt 8
20:40:10:WU00:FS00:0xa7:Steps: first=2000000 total=125000
20:40:16:WU00:FS00:0xa7:Completed 1 out of 125000 steps (0%)
20:40:28:WU01:FS00:Upload 1.50%
20:40:28:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
20:42:45:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14364 run:146 clone:6 gen:1 core:0xa7 unit:0x000000029bf7a4d65e7cbfc50fa5eca4
20:42:45:WU01:FS00:Uploading 8.32MiB to 155.247.164.214
20:42:45:WU01:FS00:Connecting to 155.247.164.214:8080
20:43:03:WU00:FS00:0xa7:Completed 1250 out of 125000 steps (1%)
20:43:46:WU01:FS00:Upload 1.50%
20:43:46:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
20:45:41:WU00:FS00:0xa7:Completed 2500 out of 125000 steps (2%)
20:47:00:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14364 run:146 clone:6 gen:1 core:0xa7 unit:0x000000029bf7a4d65e7cbfc50fa5eca4
20:47:00:WU01:FS00:Uploading 8.32MiB to 155.247.164.214
20:47:00:WU01:FS00:Connecting to 155.247.164.214:8080
20:47:19:WU01:FS00:Upload 1.50%
20:47:19:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
20:48:09:WU00:FS00:0xa7:Completed 3750 out of 125000 steps (3%)
20:50:40:WU00:FS00:0xa7:Completed 5000 out of 125000 steps (4%)
20:53:08:WU00:FS00:0xa7:Completed 6250 out of 125000 steps (5%)
20:53:51:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14364 run:146 clone:6 gen:1 core:0xa7 unit:0x000000029bf7a4d65e7cbfc50fa5eca4
20:53:51:WU01:FS00:Uploading 8.32MiB to 155.247.164.214
20:53:51:WU01:FS00:Connecting to 155.247.164.214:8080
20:54:12:WU01:FS00:Upload 1.50%
20:54:12:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
20:55:41:WU00:FS00:0xa7:Completed 7500 out of 125000 steps (6%)
20:58:10:WU00:FS00:0xa7:Completed 8750 out of 125000 steps (7%)
21:00:38:WU00:FS00:0xa7:Completed 10000 out of 125000 steps (8%)
21:03:07:WU00:FS00:0xa7:Completed 11250 out of 125000 steps (9%)
21:04:56:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14364 run:146 clone:6 gen:1 core:0xa7 unit:0x000000029bf7a4d65e7cbfc50fa5eca4
21:04:56:WU01:FS00:Uploading 8.32MiB to 155.247.164.214
21:04:56:WU01:FS00:Connecting to 155.247.164.214:8080
21:05:16:WU01:FS00:Upload 1.50%
21:05:17:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
21:05:39:WU00:FS00:0xa7:Completed 12500 out of 125000 steps (10%)
21:08:09:WU00:FS00:0xa7:Completed 13750 out of 125000 steps (11%)
21:10:42:WU00:FS00:0xa7:Completed 15000 out of 125000 steps (12%)
21:13:10:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
21:15:39:WU00:FS00:0xa7:Completed 17500 out of 125000 steps (14%)
21:18:08:WU00:FS00:0xa7:Completed 18750 out of 125000 steps (15%)
21:20:36:WU00:FS00:0xa7:Completed 20000 out of 125000 steps (16%)
21:22:53:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14364 run:146 clone:6 gen:1 core:0xa7 unit:0x000000029bf7a4d65e7cbfc50fa5eca4
21:22:53:WU01:FS00:Uploading 8.32MiB to 155.247.164.214
21:22:53:WU01:FS00:Connecting to 155.247.164.214:8080
21:23:04:WU00:FS00:0xa7:Completed 21250 out of 125000 steps (17%)
21:23:06:WU01:FS00:Upload 1.50%
21:23:31:WU01:FS00:Upload 2.25%
21:23:31:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
Yeah, not sure, to me it doesn't look exactly like the previous issue. The previous issues seemed to be related to large results, over 50 MiB or so, not being accepted, whereas this one is only 8.32 MiB.
Also they were usually rejected immediately, without any upload progress shown, and this one has some progress before the error. So it seems to be a different problem, but I might be wrong.
Using Wireshark to see exactly what is communicated by the server when the error is received might help identify the issue, as the client seems to hide some of the details about the error it receives from the server. But using Wireshark is not trivial, unless you know a bit about networking.
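Short of a packet capture, the client log itself can at least show how far each attempt got before it failed. Here is a minimal Python sketch that summarizes upload attempts from log lines in the format pasted in this thread. The function name `summarize_uploads` and the dictionary keys are my own choices, not anything from the client, and the regular expressions only assume the line layout seen in the logs above.

```python
import re

# Matches client log lines such as "21:23:31:WU01:FS00:Upload 2.25%",
# optionally with a WARNING: or ERROR: level between timestamp and work unit.
LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:(?P<level>WARNING|ERROR):)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)

def summarize_uploads(lines):
    """Return one dict per upload attempt (hypothetical helper, not a client API)."""
    attempts = []
    current = {}  # work unit id -> attempt currently in progress
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # posts, log headers, and other non-matching lines
        wu, msg = m["wu"], m["msg"]
        started = re.match(r"Uploading (\S+) to (\S+)", msg)
        if started:
            current[wu] = {
                "wu": wu,
                "start": m["time"],
                "size": started[1],
                "server": started[2],
                "last_pct": 0.0,
                "error": None,
            }
            attempts.append(current[wu])
        elif wu in current:
            progress = re.match(r"Upload (\d+(?:\.\d+)?)%", msg)
            if progress:
                current[wu]["last_pct"] = float(progress[1])
            elif msg.startswith("Exception:"):
                # The exception line closes the attempt for this work unit.
                current[wu]["error"] = msg[len("Exception:"):].strip()
                del current[wu]
    return attempts
```

Feeding it the lines of a saved log gives, per attempt, the server, the reported size, the last progress percentage, and the error text, which makes it easier to see whether failures always stall at about the same point.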
21:17:57:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14364 run:369 clone:5 gen:3 core:0xa7 unit:0x000000039bf7a4d65e7cbfc0c100874e
21:17:57:WU02:FS00:Uploading 8.25MiB to 155.247.164.214
21:17:57:WU02:FS00:Connecting to 155.247.164.214:8080
21:18:10:WU02:FS00:Upload 4.54%
21:18:16:WU02:FS00:Upload 6.82%
21:21:46:WU02:FS00:Upload 7.57%
21:21:46:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
First and foremost, the science part: what's the official word on work units that get dropped because server problems kept them from being submitted in time? Are we losing valuable scientific data, or do we get a chance to submit eventually? I mean, we're talking COVID-19 here, guys - the more WUs we solve, the sooner we get a cure!
Secondly, what's the official word on the (bonus) points for work our machines completed but partly or completely failed to submit because of those server problems? Do we still get those points, or are they simply lost?
*********************** Log Started 2020-04-03T12:15:16Z ***********************
12:15:16:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:15:16:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:15:16:WU01:FS01:Connecting to 140.163.4.231:8080
12:15:37:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:15:37:WU01:FS01:Connecting to 140.163.4.231:80
12:15:59:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
12:15:59:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:15:59:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:15:59:WU01:FS01:Connecting to 140.163.4.231:8080
12:16:20:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:16:20:WU01:FS01:Connecting to 140.163.4.231:80
12:16:23:WU01:FS01:Upload 0.43%
12:18:04:WU01:FS01:Upload 0.86%
12:18:04:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:18:04:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:18:04:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:18:04:WU01:FS01:Connecting to 140.163.4.231:8080
12:18:28:WU01:FS01:Upload 0.86%
12:18:28:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:19:41:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:19:41:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:19:41:WU01:FS01:Connecting to 140.163.4.231:8080
12:20:02:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:20:02:WU01:FS01:Connecting to 140.163.4.231:80
12:20:24:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
12:22:19:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:22:19:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:22:19:WU01:FS01:Connecting to 140.163.4.231:8080
12:22:40:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:22:40:WU01:FS01:Connecting to 140.163.4.231:80
12:23:01:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
12:26:33:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11750 run:0 clone:3212 gen:0 core:0x22 unit:0x000000068ca304e75e6a802ccb8c7fa8
12:26:33:WU01:FS01:Uploading 14.51MiB to 140.163.4.231
12:26:33:WU01:FS01:Connecting to 140.163.4.231:8080
12:26:54:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:26:54:WU01:FS01:Connecting to 140.163.4.231:80
12:27:15:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
08:04:15:WU01:FS00:Trying to send results to collection server
08:04:15:WU01:FS00:Uploading 8.24MiB to 155.247.166.219
08:04:15:WU01:FS00:Connecting to 155.247.166.219:8080
08:05:01:WU01:FS00:Upload 1.52%
08:05:02:ERROR:WU01:FS00:Exception: Transfer failed
I'm guessing it's a server overload, but my other PC doesn't seem to have much trouble uploading its results (I didn't check whether it was using the same server, though).
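One quick way to check whether two machines are uploading to the same server is to pull the destination addresses out of each client log and compare them. A small Python sketch follows; `upload_servers` is a made-up helper name, and it only assumes the "Uploading ... to ..." line format shown in the logs in this thread.

```python
import re

# Captures the IPv4 address from lines like
# "08:04:15:WU01:FS00:Uploading 8.24MiB to 155.247.166.219".
UPLOAD = re.compile(r"Uploading \S+ to (\d{1,3}(?:\.\d{1,3}){3})")

def upload_servers(log_text):
    """Return the set of server addresses found in 'Uploading ... to ...' lines."""
    return set(UPLOAD.findall(log_text))
```

With the text of each machine's log loaded into `log_a` and `log_b` (hypothetical variable names), `upload_servers(log_a) & upload_servers(log_b)` gives the servers both machines tried to reach. An empty result would mean the two PCs were not talking to the same server, so their upload behavior is not directly comparable.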