Client timeout issue when uploading to multiple CS servers

Posted: Tue Nov 10, 2020 9:39 am
by MaxTranced
https://imgur.com/a/TmaInta

The list of IPs I'm having trouble with:
150.136.14.110
129.213.40.229
128.252.203.1
140.163.4.231

I think (but have not checked) that most errors look like this:
15:56:00:WU27:FS02:Connecting to 129.213.40.229:8080
15:56:10:ERROR:WU19:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
I've already lost quite a lot of WUs that have expired due to this issue... Quite a shame... What should I do?
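
To check whether most errors really do look like the one quoted above, the log can be tallied by message. The following is a minimal sketch, assuming the log lines follow the `HH:MM:SS:LEVEL:WUnn:FSnn:message` layout seen in this thread; the names `PROBLEM_LINE` and `count_problems` are made up for illustration and are not part of FAHClient.

```python
import re
from collections import Counter

# Matches lines such as:
# 15:56:10:ERROR:WU19:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
# (layout assumed from the log lines quoted in this thread)
PROBLEM_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<level>ERROR|WARNING):"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<message>.*)$"
)

def count_problems(lines):
    """Return a Counter of (level, message) pairs found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = PROBLEM_LINE.match(line.strip())
        if match:
            counts[(match["level"], match["message"])] += 1
    return counts

# The two lines quoted in the post above.
sample = [
    "15:56:00:WU27:FS02:Connecting to 129.213.40.229:8080",
    "15:56:10:ERROR:WU19:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0",
]
counts = count_problems(sample)
```

Passing an open log file instead of `sample` and printing `counts.most_common()` would list the most frequent error messages first; where the log file lives depends on the installation.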

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 3:38 pm
by Joe_H
Welcome to the folding support forum.

This issue can be caused by problems with the servers and their settings, or by network traffic. It can also come from issues on the system itself or on the local network. Some ISPs' routers can also cause this by dropping acknowledgement packets for the HTTP connections used to upload and download WUs.

On your own system and local network, check that the FAHClient process has full access to HTTP or HTTPS for transferring packets up and down. If you are connected over Wi-Fi, try a wired connection.
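
One rough way to check basic reachability from the affected machine is to try opening a TCP connection to each server. This is a minimal sketch using only the Python standard library; the helper names `can_connect` and `check_servers` are made up for illustration, and ports 8080 and 80 are taken from the client log lines in this thread. A successful connection only shows that the port is reachable from Python. It does not prove that FAHClient itself is allowed through a firewall or that an upload would succeed.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_servers(hosts, ports=(8080, 80), timeout=5.0):
    """Map each (host, port) pair to whether a TCP connection could be opened."""
    return {(host, port): can_connect(host, port, timeout)
            for host in hosts for port in ports}
```

For example, `check_servers(["150.136.14.110", "129.213.40.229"])` would return a dictionary keyed by `(host, port)` with `True` or `False` for each pair.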

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 4:38 pm
by Neil-B
I have seen this a few times (intermittently) since Project 16927 came on line (in beta) ... in each case an upload attempt looks as if it gets close to uploading, fails with a short response message, then tries again and completes, but the WU is dumped because the server doesn't like it ... until the WU Status system is given a kick (chased this in the beta channel), the next bit is pure speculation on my part, but I have a suspicion that the first attempt is actually completing (this will be shown by the WU Status when it catches up) and that for some reason the message gets truncated when completion is sent to the client. The client tries to resend, and when it all gets to the server, the server says "don't like - dump" because it already has it ... This only happens occasionally (and matches symptoms reported by others), but until such time as the WU Status/Stats system is sorted I can't prove this ... Fairly sure I saw the same pattern on another project (one that is being reported by WU Status), just need to track down which one ... and my final WAG is that this is linked to the timeout changes that may (or may not have) recently (version .20 on) been applied (probably to sort out the hanging connection issue that used to happen) ... But I may be totally wrong and simply jumping at shadows!!

OK, so it is not proving easy to find a case that actually went through (maybe that was my imagination, lockdown might be getting to me), but currently a dumping message attempt is preceded by a failed short connection message attempt ... something makes me think these are linked.
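
The speculation above can be illustrated with a toy model. This is only a sketch of the hypothesised sequence (first upload stored, acknowledgement lost, retry rejected as already received), not a description of how the real servers or FAHClient behave; `ToyServer`, `send_with_lost_ack` and the reply strings are all invented for the example.

```python
class ToyServer:
    """Toy stand-in for a server that rejects results it has already stored."""

    def __init__(self):
        self.stored = set()

    def upload(self, unit_id):
        if unit_id in self.stored:
            return "DUPLICATE"  # hypothetical reply: server already has this unit
        self.stored.add(unit_id)
        return "ACK"

def send_with_lost_ack(server, unit_id, ack_lost_on_first_try):
    """Return the outcomes the client observes across its upload attempts."""
    seen = []
    reply = server.upload(unit_id)
    if ack_lost_on_first_try:
        # The server stored the unit, but the client never sees the ACK.
        seen.append("SHORT_RESPONSE")
        reply = server.upload(unit_id)  # client retries the same unit
    seen.append(reply)
    return seen

lost = send_with_lost_ack(ToyServer(), "unit-1", ack_lost_on_first_try=True)
normal = send_with_lost_ack(ToyServer(), "unit-1", ack_lost_on_first_try=False)
```

In this model `lost` is `["SHORT_RESPONSE", "DUPLICATE"]` and `normal` is `["ACK"]`, which mirrors the "short response, then dump" pattern described above without confirming that this is what actually happens.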

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 6:28 pm
by MaxTranced
Neil-B wrote:and my final WAG is that this is linked to the timeout changes that may (or may not have) recently (version .20 on) been applied (probably to sort out the hanging connection issue that used to happen ...
I suspect that the issue I'm seeing is correlated with me updating to the latest client version on Windows. There was no such issue before the update: I rarely saw more than 2-3 WUs waiting, but now I have 34. I updated to 7.6.21_x86 on the 2nd of November. I was using 7.6.13 before.

This is a wired connection. As far as I know the client has full access to the network, not sure how to check for particular issues.

I'll try downgrading and see if that solves the issue. I hope it's possible to downgrade! :D

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 6:41 pm
by MaxTranced
I've downgraded to 7.6.13 and the issue is gone (all WUs were sent on startup). I think it's safe to say that this is a client bug, most likely a timeout issue along the lines suggested by Neil-B.

Re: Client timeout issue when uploading to multiple CS serve

Posted: Tue Nov 10, 2020 6:46 pm
by Neil-B
As discussed with OP in Discord ... it could still be the network not liking the new client, but at least a short/medium term resolution has been found that cleared the fairly large backlog and got things flowing again ... it might be useful if a log of some of the failed upload attempts could be posted, if possible, but my guess is that in this case it is maybe not a server side issue

Re: Client timeout issue when uploading to multiple CS serve

Posted: Tue Nov 10, 2020 6:47 pm
by MaxTranced
I also posted about this on the client forum here https://foldingforum.org/viewtopic.php?f=108&t=36409.

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 6:48 pm
by Joe_H
With Windows firewall settings you may need to whitelist the FAHClient executable after updating. Though the updated version will have the same name, Windows also sometimes matches an executable given an exception against a checksum or other digital signature.

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Tue Nov 10, 2020 7:04 pm
by MaxTranced
Joe_H wrote:With Windows firewall settings you may need to whitelist the FAHClient executable after updating. Though the updated version will have the same name, Windows also sometimes matches an executable given an exception against a checksum or other digital signature.
Thank you for the info Joe_H. Will keep this in mind for the next client update.

Re: Client timeout issue when uploading to multiple CS serve

Posted: Wed Nov 11, 2020 12:38 am
by bruce
MaxTranced wrote: The list of IPs I'm having trouble with:
150.136.14.110
129.213.40.229
128.252.203.1
140.163.4.231
I am able to open the landing page in my browser on all of those servers. That's not enough to validate that the server is working properly, but it finds many problems and it's the first thing I try.
Another thing I always try is to search on https://apps.foldingathome.org/serverstats.

The following are officially DOWN
3.21.157.11
69.94.66.6
69.94.66.7
128.252.203.2
128.252.203.11
129.32.209.200
155.247.164.214
155.247.166.219

The following are in ACCEPT mode and are being drained which means they should not be assigning new work but previously assigned work should still be accepted.
128.174.73.74
128.252.203.4
129.32.209.202
129.32.209.205
129.32.209.206
129.32.209.207
130.237.11.145
140.163.4.200
140.163.4.210
140.163.4.231
140.163.4.241
146.94.192.82
150.136.14.110
155.247.164.213
168.245.198.125

Some are special cases. For example, 140.163.4.231. It's in ACCEPT mode so it should be accepting WUs. Also, under Collection Server, it says YES so failed connections SHOULD be redirected to another server. Please post a segment of the log showing a pair of those attempts. (Was this server called a Work Server or a Collection Server in that context?)
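
The cross-check described above can be sketched as a simple lookup. The sets below are copied from the lists in this post and are a snapshot from the time of posting, not live data; `status_of` is a made-up helper name. "not listed" only means the address appears in neither of the two lists quoted here.

```python
# Snapshot of the server lists quoted in this post (not live data).
DOWN = {
    "3.21.157.11", "69.94.66.6", "69.94.66.7", "128.252.203.2",
    "128.252.203.11", "129.32.209.200", "155.247.164.214", "155.247.166.219",
}
ACCEPT = {
    "128.174.73.74", "128.252.203.4", "129.32.209.202", "129.32.209.205",
    "129.32.209.206", "129.32.209.207", "130.237.11.145", "140.163.4.200",
    "140.163.4.210", "140.163.4.231", "140.163.4.241", "146.94.192.82",
    "150.136.14.110", "155.247.164.213", "168.245.198.125",
}

# The addresses reported in the first post of the thread.
REPORTED = ["150.136.14.110", "129.213.40.229", "128.252.203.1", "140.163.4.231"]

def status_of(ip):
    """Classify an address against the two quoted lists."""
    if ip in DOWN:
        return "DOWN"
    if ip in ACCEPT:
        return "ACCEPT"
    return "not listed"

statuses = {ip: status_of(ip) for ip in REPORTED}
```

With these lists, 150.136.14.110 and 140.163.4.231 come out as ACCEPT, while 129.213.40.229 and 128.252.203.1 appear in neither list.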

Re: Client timeout issue when uploading to multiple CS serve

Posted: Wed Nov 11, 2020 10:35 am
by MaxTranced
bruce wrote: Some are special cases. For example, 140.163.4.231. It's in ACCEPT mode so it should be accepting WUs. Also, under Collection Server, it says YES so failed connections SHOULD be redirected to another server. Please post a segment of the log showing a pair of those attempts. (Was this server called a Work Server or a Collection Server in that context?)
Here are a couple of chunks from the logs, I hope that they help:

Code:

12:21:04:WU00:FS02:0x22:Completed 4950000 out of 5000000 steps (99%)
12:21:04:WU01:FS02:Connecting to assign1.foldingathome.org:80
12:21:04:WU01:FS02:Assigned to work server 129.213.157.105
12:21:04:WU01:FS02:Requesting new work unit for slot 02: RUNNING gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 129.213.157.105
12:21:04:WU01:FS02:Connecting to 129.213.157.105:8080
12:21:05:ERROR:WU01:FS02:Exception: Server did not assign work unit
12:21:05:WU01:FS02:Connecting to assign1.foldingathome.org:80
12:21:06:WU01:FS02:Assigned to work server 206.223.170.146
12:21:06:WU01:FS02:Requesting new work unit for slot 02: RUNNING gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 206.223.170.146
12:21:06:WU01:FS02:Connecting to 206.223.170.146:8080
12:21:30:WU00:FS02:0x22:Completed 5000000 out of 5000000 steps (100%)
12:21:30:WU00:FS02:0x22:Average performance: 327.273 ns/day
12:21:31:WU00:FS02:0x22:Checkpoint completed at step 5000000
12:21:31:WU00:FS02:0x22:Saving result file ..\logfile_01.txt
12:21:31:WU00:FS02:0x22:Saving result file checkpointIntegrator.xml
12:21:31:WU00:FS02:0x22:Saving result file checkpointState.xml
12:21:31:WU00:FS02:0x22:Saving result file positions.xtc
12:21:31:WU00:FS02:0x22:Saving result file science.log
12:21:31:WU00:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
12:21:32:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:21:32:WU00:FS02:Sending unit results: id:00 state:SEND error:NO_ERROR project:16921 run:97 clone:4 gen:160 core:0x22 unit:0x000000c10002894c5f5bf34bf68ce549
12:21:32:WU00:FS02:Uploading 4.99MiB to 155.247.166.220
12:21:32:WU00:FS02:Connecting to 155.247.166.220:8080
12:21:37:WU02:FS01:0x22:Completed 750000 out of 5000000 steps (15%)
12:21:37:WU02:FS01:0x22:Checkpoint completed at step 750000
12:21:37:ERROR:WU01:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
12:21:42:WU00:FS02:Upload complete
12:21:42:WU00:FS02:Server responded WORK_ACK (400)
12:21:42:WU00:FS02:Final credit estimate, 50477.00 points
12:21:42:WU00:FS02:Cleaning up
12:22:05:WU01:FS02:Connecting to assign1.foldingathome.org:80
12:22:06:WU01:FS02:Assigned to work server 129.213.157.105
12:22:06:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 129.213.157.105
12:22:06:WU01:FS02:Connecting to 129.213.157.105:8080
12:22:06:ERROR:WU01:FS02:Exception: Server did not assign work unit
12:22:08:WU03:FS00:0xa8:Completed 85000 out of 500000 steps (17%)
12:22:37:WU02:FS01:0x22:Completed 800000 out of 5000000 steps (16%)
12:23:34:WU02:FS01:0x22:Completed 850000 out of 5000000 steps (17%)
12:23:42:WU01:FS02:Connecting to assign1.foldingathome.org:80
12:23:43:WU01:FS02:Assigned to work server 206.223.170.146
12:23:43:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 206.223.170.146
12:23:43:WU01:FS02:Connecting to 206.223.170.146:8080
12:23:59:WU03:FS00:0xa8:Completed 90000 out of 500000 steps (18%)
12:24:04:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
12:24:04:WU01:FS02:Connecting to 206.223.170.146:80
12:24:11:WU02:FS01:0x22:Completed 900000 out of 5000000 steps (18%)
12:24:13:ERROR:WU01:FS02:Exception: Server did not assign work unit
12:24:43:WU02:FS01:0x22:Completed 950000 out of 5000000 steps (19%)
12:25:14:WU02:FS01:0x22:Completed 1000000 out of 5000000 steps (20%)
12:25:14:WU02:FS01:0x22:Checkpoint completed at step 1000000
12:25:45:WU02:FS01:0x22:Completed 1050000 out of 5000000 steps (21%)
12:25:57:WU03:FS00:0xa8:Completed 95000 out of 500000 steps (19%)
12:26:15:WU02:FS01:0x22:Completed 1100000 out of 5000000 steps (22%)
12:26:20:WU01:FS02:Connecting to assign1.foldingathome.org:80
12:26:20:WU01:FS02:Assigned to work server 206.223.170.146
12:26:20:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 206.223.170.146
12:26:20:WU01:FS02:Connecting to 206.223.170.146:8080
12:26:27:ERROR:WU01:FS02:Exception: Server did not assign work unit

Code:

03:13:55:WU03:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
03:13:56:WU03:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
03:13:56:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:17309 run:0 clone:873 gen:57 core:0x22 unit:0x0000004512bc7d9a0000000000000369
03:13:56:WU03:FS02:Uploading 14.64MiB to 18.188.125.154
03:13:56:WU03:FS02:Connecting to 18.188.125.154:8080
03:13:56:WU12:FS02:Starting

<<<redacted>>>>

03:13:57:WU12:FS02:0x22:Digital signatures verified
03:13:57:WU12:FS02:0x22:Folding@home GPU Core22 Folding@home Core
03:13:57:WU12:FS02:0x22:Version 0.0.13
03:13:57:WU12:FS02:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
03:13:57:WU12:FS02:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
03:13:57:WU12:FS02:0x22:  XTC frame write interval: 125000 steps (10%) [10 total]
03:13:57:WU12:FS02:0x22:  Global context and integrator variables write interval: disabled
03:13:58:WU12:FS02:0x22:There are 4 platforms available.
03:13:58:WU12:FS02:0x22:Platform 0: Reference
03:13:58:WU12:FS02:0x22:Platform 1: CPU
03:13:58:WU12:FS02:0x22:Platform 2: OpenCL
03:13:58:WU12:FS02:0x22:  opencl-device 1 specified
03:13:58:WU12:FS02:0x22:Platform 3: CUDA
03:13:58:WU12:FS02:0x22:  cuda-device 1 specified
03:14:17:WU12:FS02:0x22:Attempting to create CUDA context:
03:14:17:WU12:FS02:0x22:  Configuring platform CUDA
03:14:22:WU12:FS02:0x22:  Using CUDA and gpu 1
03:14:22:WU12:FS02:0x22:Completed 0 out of 1250000 steps (0%)
03:14:23:WU12:FS02:0x22:Checkpoint completed at step 0
03:14:26:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:14:26:WU03:FS02:Trying to send results to collection server
03:14:26:WU03:FS02:Uploading 14.64MiB to 140.163.4.231
03:14:26:WU03:FS02:Connecting to 140.163.4.231:8080
03:14:33:WU10:FS00:0xa7:Completed 120000 out of 250000 steps (48%)
03:14:56:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:14:57:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:17309 run:0 clone:873 gen:57 core:0x22 unit:0x0000004512bc7d9a0000000000000369
03:14:57:WU03:FS02:Uploading 14.64MiB to 18.188.125.154
03:14:57:WU03:FS02:Connecting to 18.188.125.154:8080
03:15:15:WU12:FS02:0x22:Completed 12500 out of 1250000 steps (1%)
03:15:27:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:15:27:WU03:FS02:Trying to send results to collection server
03:15:27:WU03:FS02:Uploading 14.64MiB to 140.163.4.231
03:15:27:WU03:FS02:Connecting to 140.163.4.231:8080
03:15:28:WU00:FS01:0x22:Completed 4900000 out of 5000000 steps (98%)
03:15:29:WU00:FS01:0x22:Checkpoint completed at step 4900000
03:15:33:WU10:FS00:0xa7:Completed 122500 out of 250000 steps (49%)
03:15:57:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:15:57:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:17309 run:0 clone:873 gen:57 core:0x22 unit:0x0000004512bc7d9a0000000000000369
03:15:57:WU03:FS02:Uploading 14.64MiB to 18.188.125.154
03:15:57:WU03:FS02:Connecting to 18.188.125.154:8080
03:16:12:WU12:FS02:0x22:Completed 25000 out of 1250000 steps (2%)
03:16:28:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:16:28:WU03:FS02:Trying to send results to collection server
03:16:28:WU03:FS02:Uploading 14.64MiB to 140.163.4.231
03:16:28:WU03:FS02:Connecting to 140.163.4.231:8080
03:16:34:WU10:FS00:0xa7:Completed 125000 out of 250000 steps (50%)
03:16:58:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:17:09:WU12:FS02:0x22:Completed 37500 out of 1250000 steps (3%)
03:17:33:WU10:FS00:0xa7:Completed 127500 out of 250000 steps (51%)
03:17:35:WU11:FS02:Sending unit results: id:11 state:SEND error:NO_ERROR project:14909 run:45 clone:8 gen:72 core:0x22 unit:0x0000006181d59d695f526024535b608c
03:17:35:WU11:FS02:Uploading 16.92MiB to 129.213.157.105
03:17:35:WU11:FS02:Connecting to 129.213.157.105:8080
03:17:35:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:17309 run:0 clone:873 gen:57 core:0x22 unit:0x0000004512bc7d9a0000000000000369
03:17:35:WU03:FS02:Uploading 14.64MiB to 18.188.125.154
03:17:35:WU03:FS02:Connecting to 18.188.125.154:8080
03:18:05:WARNING:WU11:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:18:05:WU11:FS02:Trying to send results to collection server
03:18:05:WU11:FS02:Uploading 16.92MiB to 129.213.40.229
03:18:05:WU11:FS02:Connecting to 129.213.40.229:8080
03:18:05:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:18:05:WU03:FS02:Trying to send results to collection server
03:18:05:WU03:FS02:Uploading 14.64MiB to 140.163.4.231
03:18:05:WU03:FS02:Connecting to 140.163.4.231:8080
03:18:06:WU12:FS02:0x22:Completed 50000 out of 1250000 steps (4%)
03:18:33:WU10:FS00:0xa7:Completed 130000 out of 250000 steps (52%)
03:18:35:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:18:36:ERROR:WU11:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:19:04:WU12:FS02:0x22:Completed 62500 out of 1250000 steps (5%)
03:19:06:WU12:FS02:0x22:Checkpoint completed at step 62500
03:19:33:WU10:FS00:0xa7:Completed 132500 out of 250000 steps (53%)
03:20:04:WU12:FS02:0x22:Completed 75000 out of 1250000 steps (6%)
03:20:12:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:17309 run:0 clone:873 gen:57 core:0x22 unit:0x0000004512bc7d9a0000000000000369
03:20:12:WU03:FS02:Uploading 14.64MiB to 18.188.125.154
03:20:12:WU03:FS02:Connecting to 18.188.125.154:8080
03:20:32:WU10:FS00:0xa7:Completed 135000 out of 250000 steps (54%)
03:20:42:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:20:42:WU03:FS02:Trying to send results to collection server
03:20:42:WU03:FS02:Uploading 14.64MiB to 140.163.4.231
03:20:42:WU03:FS02:Connecting to 140.163.4.231:8080
03:21:03:WU12:FS02:0x22:Completed 87500 out of 1250000 steps (7%)
03:21:10:WU07:FS02:Sending unit results: id:07 state:SEND error:NO_ERROR project:14908 run:136 clone:2 gen:107 core:0x22 unit:0x0000008981d59d695f5260297031acbb
03:21:10:WU07:FS02:Uploading 19.79MiB to 129.213.157.105
03:21:10:WU07:FS02:Connecting to 129.213.157.105:8080
03:21:13:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
03:21:31:WU10:FS00:0xa7:Completed 137500 out of 250000 steps (55%)
03:21:38:WU00:FS01:0x22:Completed 4950000 out of 5000000 steps (99%)
03:21:39:WU13:FS01:Connecting to assign1.foldingathome.org:80
03:21:39:WU13:FS01:Assigned to work server 18.188.125.154
03:21:39:WU13:FS01:Requesting new work unit for slot 01: gpu:10:0 TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 18.188.125.154
03:21:39:WU13:FS01:Connecting to 18.188.125.154:8080
03:21:40:WARNING:WU07:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
03:21:40:WU07:FS02:Trying to send results to collection server
03:21:40:WU07:FS02:Uploading 19.79MiB to 150.136.14.110
03:21:40:WU07:FS02:Connecting to 150.136.14.110:8080
03:21:41:WU13:FS01:Downloading 12.01MiB
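
In the second excerpt above, the gap between each "Connecting to" line and the next short-response exception for the same WU is about 30 seconds, which would be consistent with a fixed timeout, though it does not prove one. The following is a minimal sketch for measuring those gaps, assuming the log layout shown in this thread; `CONNECT`, `FAILURE`, `to_seconds` and `failure_delays` are names invented for the example.

```python
import re

# Patterns assumed from the log lines quoted in this thread.
CONNECT = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):FS\d+:Connecting to (\S+):(\d+)$")
FAILURE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:ERROR|WARNING):(WU\d+):FS\d+:Exception: (.*)$")

def to_seconds(stamp):
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    hours, minutes, secs = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + secs

def failure_delays(lines):
    """Return a list of (wu, host, delay_seconds, message) for each failed connection."""
    pending = {}  # WU id -> (connect time in seconds, host)
    results = []
    for line in lines:
        line = line.strip()
        match = CONNECT.match(line)
        if match:
            pending[match[2]] = (to_seconds(match[1]), match[3])
            continue
        match = FAILURE.match(line)
        if match and match[2] in pending:
            start, host = pending.pop(match[2])
            # Modulo handles a log that rolls over midnight.
            delay = (to_seconds(match[1]) - start) % 86400
            results.append((match[2], host, delay, match[3]))
    return results

# Four lines copied from the second log excerpt above.
sample = [
    "03:13:56:WU03:FS02:Connecting to 18.188.125.154:8080",
    "03:14:26:WARNING:WU03:FS02:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0",
    "03:14:26:WU03:FS02:Connecting to 140.163.4.231:8080",
    "03:14:56:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0",
]
delays = failure_delays(sample)
```

For these four lines, both the work server attempt (18.188.125.154) and the collection server attempt (140.163.4.231) fail 30 seconds after the connection starts.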


Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Sun Nov 15, 2020 11:55 am
by tomillo_CY
MaxTranced wrote:I've downgraded to 7.6.13 and the issue is gone (all WUs were sent on startup). I think it's safe to say that this is a client bug, most likely a timeout issue along the lines suggested by Neil-B.
Can confirm, downgrading to 7.6.13 resolves the problem.

Also I see that the 7.6.21 client generates a lot of network traffic continuously trying to upload the WUs. It always ends with the "Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0" error and the network traffic floods my ADSL upload channel (and that is how I found the problem with 7.6.21).

(W10 20H2)

Re: Trouble uploading to CS 150.136.14.110, 129.213.40.229,

Posted: Fri Nov 27, 2020 11:41 am
by Jandska
tomillo_CY wrote:
MaxTranced wrote:I've downgraded to 7.6.13 and the issue is gone (all WUs were sent on startup). I think it's safe to say that this is a client bug, most likely a timeout issue along the lines suggested by Neil-B.
Can confirm, downgrading to 7.6.13 resolves the problem.

Also I see that the 7.6.21 client generates a lot of network traffic continuously trying to upload the WUs. It always ends with the "Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0" error and the network traffic floods my ADSL upload channel (and that is how I found the problem with 7.6.21).

(W10 20H2)
Hi, how did you downgrade to an earlier version of the client without losing all the work? I have these problems with the new version with various WUs, and it also kills my internet latency (not only on my computer; the router itself seems to be bogged down by it).

Re: Client timeout issue when uploading to multiple CS serve

Posted: Fri Nov 27, 2020 11:55 am
by MaxTranced
I simply started the installation of the old version and it worked properly. Latest Windows 10.

Re: Client timeout issue when uploading to multiple CS serve

Posted: Fri Nov 27, 2020 1:19 pm
by Jandska
MaxTranced: Thanks ... worked beautifully. The WUs were sent immediately.