140.163.4.200

Posted: Thu Sep 10, 2020 2:05 pm
by rickoic
Don't think it's passing returned WUs on for counting as completed.

Returned 12:13:46 13430(80,0,3) estimated points 225,605 comes back not found
Returned 12:42:42 13437(12,0,1) estimated points 379,636 comes back not found

Noticed a drop in my 3-hourly points on 9/9 that continued into 9/10. These are only the first 2 I've found, but I think there are 15-20 more that haven't been counted over the past 2 days.

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 2:26 pm
by rickoic
Returned 14:15:30 13434(12,1,1) est pts 379,081 comes back not found

Other WUs returned from the same computer during the last few hours have been counted.

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 2:33 pm
by Shirty
I think I've fallen into the same hole. I have been happily averaging 16-18 million ppd for quite a while and haven't changed anything in terms of cards or settings in the past few days, but popped over to EOC a short while ago and saw:

Yesterday's output looks to be 50% of normal, and I'd expect to have hit around 5-5.5 million points so far today, but have only been credited 2.3 million. My HFM shows normal output, I have only 2 failed WUs in the last week or two, and all my clients are folding away merrily showing a combined 16m ppd at present.

Where are all the points going? I am burning a LOT of electricity for this, about 2.5kW at the wall! I can only surmise that quite a few of my WUs are not being credited correctly in the past day and a bit. Will this be identified and credited retrospectively?

:?

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 4:00 pm
by Joe_H
This is a new server added recently; it is possible it has not been reporting results to the stats database. That could be due to a configuration issue or another problem. I will report this to the people managing this server.

As for the points, they are very rarely lost, but delays are quite common.

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 5:43 pm
by Bastiaan_NL
Seems I'm having the same issue.
Here are a few units that I could not find with the WU Status search tool.

Project: 13430 (Run 15, Clone 0, Gen 3)
Project: 13430 (Run 76, Clone 0, Gen 4)
Project: 13430 (Run 59, Clone 0, Gen 5)
Project: 13431 (Run 51, Clone 0, Gen 0)
Project: 13432 (Run 52, Clone 0, Gen 2)
Project: 13432 (Run 89, Clone 0, Gen 0)
Project: 13433 (Run 51, Clone 0, Gen 1)
Project: 13433 (Run 69, Clone 0, Gen 3)
Project: 13433 (Run 40, Clone 0, Gen 5)

09-09-2020:
Project: 13433 (Run 67, Clone 0, Gen 0)
14:45:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13433 run:67 clone:0 gen:0 core:0x22 unit:0x000000008ca304c85f5871b3c2dfef51
14:45:09:WU00:FS01:Uploading 3.82MiB to 140.163.4.200

10-09-2020:
Project: 13430 (Run 69, Clone 0, Gen 6)
15:33:57:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13430 run:69 clone:0 gen:6 core:0x22 unit:0x000000078ca304c85f58716b31df4f6b
15:33:57:WU02:FS01:Uploading 3.77MiB to 140.163.4.200
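For anyone who wants to build a list like the one above from their own client log before checking each unit in the WU Status tool, here is a minimal Python sketch. It assumes only the `Sending unit results:` line format visible in the log excerpts in this thread; `extract_prcg` is a hypothetical helper, not part of FAHClient or any Folding@home tool.

```python
import re

# Assumed from the log excerpts in this thread: a "Sending unit results" line
# carries "project:P run:R clone:C gen:G" somewhere after the prefix.
SEND_RE = re.compile(
    r"Sending unit results:.*?project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def extract_prcg(log_text):
    """Return 'Project: P (Run R, Clone C, Gen G)' strings for each WU sent.

    Hypothetical helper: scans a FAHClient log dump line by line.
    """
    entries = []
    for line in log_text.splitlines():
        m = SEND_RE.search(line)
        if m:
            entries.append("Project: {} (Run {}, Clone {}, Gen {})".format(*m.groups()))
    # A retried upload logs the same line again, so drop repeats but keep log order.
    return list(dict.fromkeys(entries))


if __name__ == "__main__":
    sample = (
        "14:45:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR "
        "project:13433 run:67 clone:0 gen:0 core:0x22 unit:0x000000008ca304c85f5871b3c2dfef51\n"
        "14:45:09:WU00:FS01:Uploading 3.82MiB to 140.163.4.200\n"
        "15:33:57:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR "
        "project:13430 run:69 clone:0 gen:6 core:0x22 unit:0x000000078ca304c85f58716b31df4f6b\n"
    )
    for entry in extract_prcg(sample):
        print(entry)
```

Deduplicating with `dict.fromkeys` keeps the first-seen order, which matches the chronological order of the log, so the output lines up with the dates in the excerpt.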

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 8:04 pm
by rickoic
Looks like it's been reconfigured, I just got a big bump in points at the 20Z update.
Tks

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 8:10 pm
by Bastiaan_NL
Same for me, thanks for fixing it! :)

Re: 140.163.4.200

Posted: Thu Sep 10, 2020 9:39 pm
by Shirty
+1

Many thanks for the swift fix.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 1:44 am
by ipkh
It is still buggy for me.

01:41:02:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13434 run:51 clone:9 gen:0 core:0x22 unit:0x000000018ca304c85f59831205100ac7
01:41:02:WU01:FS01:Uploading 5.52MiB to 140.163.4.200
01:41:02:WU01:FS01:Connecting to 140.163.4.200:8080
01:41:23:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
01:41:23:WU01:FS01:Connecting to 140.163.4.200:80

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 3:33 am
by mgetz

03:25:50:WU02:FS01:Uploading 5.56MiB to 140.163.4.200
03:25:50:WU02:FS01:Connecting to 140.163.4.200:8080
03:27:11:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
03:28:00:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
03:28:00:WU02:FS01:Connecting to 140.163.4.200:80
03:30:11:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: Connection timed out

I'm also seeing serious issues with this server. I have a WU waiting to be submitted, and the assignment server keeps pushing this one. It really shouldn't be doing this if the server is a black hole. At a minimum, can we set this server to stop assigning work until this is figured out?

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 3:41 am
by JohnChodera
Thanks for all the feedback! This is a new workserver we are gingerly testing on some small projects to support the COVID Moonshot.

> Don't think it's passing returned WUs on for counting as completed.

We fixed a stats server connection issue. This should be working now; I can see the stats for the first projects listed above.

> 01:41:23:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80

This might be a networking error. We'll look into this!

> I'm also seeing serious issues with this server, I have a WU waiting to be submitted and the assignment server keeps pushing this one. It really shouldn't be doing this if the server is a black hole. At a minimum can we set this server as not assigning until this is figured out?

I've zeroed out the weights until we can figure out what is wrong.

Apologies for this!

~ John Chodera // MSKCC

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 4:51 am
by aetch
Mine has been trying to upload for the past 2 hours. These are not small projects; the base points were 118,000.

04:12:47:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13436 run:2 clone:11 gen:0 core:0x22 unit:0x000000018ca304c85f598339547f200a
04:12:47:WU00:FS01:Uploading 5.95MiB to 140.163.4.200
04:12:47:WU00:FS01:Connecting to 140.163.4.200:8080
04:13:08:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
04:13:08:WU00:FS01:Connecting to 140.163.4.200:80
04:13:30:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 6:02 am
by Sevrin
Same problem.

05:57:54:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13431 run:81 clone:1 gen:1 core:0x22 unit:0x000000018ca304c85f587186e3aaaeea
05:57:54:WU00:FS01:Uploading 3.82MiB to 140.163.4.200
05:57:54:WU00:FS01:Connecting to 140.163.4.200:8080
05:58:15:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
05:58:15:WU00:FS01:Connecting to 140.163.4.200:80
05:58:36:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Restarting FAH left me stuck connecting, so I restarted my machine. Now I've connected and started on a new WU, while the old one is still trying to reconnect.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 6:16 am
by LazyDev
I've got two WUs, from projects 13432 and 13431, that are failing to upload to the same server, 140.163.4.200:80.

01:18:52:WARNING:WU05:FS04:WorkServer connection failed on port 8080 trying 80
01:19:14:ERROR:WU05:FS04:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:21:35:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:26:59:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed
01:27:21:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:27:42:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:28:20:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:28:42:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:29:58:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:30:19:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:32:35:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:36:45:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed
01:37:06:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:37:28:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:43:58:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:44:20:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
01:55:04:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
01:59:19:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed
02:13:00:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
02:13:22:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
02:43:33:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed
03:29:01:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
03:29:23:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:33:37:WARNING:WU07:FS00:AS lowered CPUs from 11 to 9
03:54:07:ERROR:WU06:FS04:Exception: Server did not assign work unit
04:29:02:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
04:29:23:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:31:19:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
04:31:41:WARNING:WU04:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:32:03:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
04:32:24:WARNING:WU04:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:34:46:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:35:07:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:37:11:WARNING:WU10:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
04:37:12:WARNING:WU00:FS02:Exception: Failed to send results to work server: Transfer failed
04:37:34:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:37:55:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:39:11:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:39:33:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:41:48:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:42:10:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:46:02:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:46:24:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:52:54:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
04:57:06:WARNING:WU00:FS02:Exception: Failed to send results to work server: Transfer failed
05:04:00:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
05:04:21:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
05:21:56:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
05:22:18:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
05:29:02:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80
05:33:18:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed
05:50:58:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
05:51:20:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2020-09-11 *******************************
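With a log this long it is hard to see how many upload attempts each slot has burned through. A rough Python sketch can tally the `Failed to send results to work server` exceptions per WU/FS pair. It assumes the `HH:MM:SS:[LEVEL:]WUnn:FSnn:` prefix seen in the lines above; `count_failed_sends` is a hypothetical helper, not a FAHClient feature.

```python
import re
from collections import Counter

# Prefix assumed from the FAHClient lines above: "HH:MM:SS:[LEVEL:]WUnn:FSnn:...".
# Only result-upload failures are counted, not assignment or other errors.
FAIL_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:\w+:)?(WU\d+):(FS\d+):.*Failed to send results to work server"
)


def count_failed_sends(log_text):
    """Count 'Failed to send results' exceptions per (WU slot, folding slot)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FAIL_RE.match(line)
        if m:
            counts[m.group(1), m.group(2)] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "01:19:14:ERROR:WU05:FS04:Exception: Failed to connect to 140.163.4.200:80",
        "01:26:59:WARNING:WU01:FS04:Exception: Failed to send results to work server: Transfer failed",
        "01:27:21:WARNING:WU01:FS04:WorkServer connection failed on port 8080 trying 80",
        "01:27:42:WARNING:WU01:FS04:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80",
        "04:37:12:WARNING:WU00:FS02:Exception: Failed to send results to work server: Transfer failed",
    ])
    for (wu, fs), n in sorted(count_failed_sends(sample).items()):
        print(f"{wu} {fs}: {n} failed upload attempt(s)")
```

Keying on both the WU and FS identifiers keeps separate slots apart, since the same WU number can appear on different folding slots in one log.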

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 6:37 am
by Teddy
I've got a similar work unit that won't send either.

06:30:13:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13433 run:92 clone:3 gen:0 core:0x22 unit:0x000000008ca304c85f5871b5f76c5193
06:30:13:WU01:FS01:Uploading 3.75MiB to 140.163.4.200
06:30:13:WU01:FS01:Connecting to 140.163.4.200:8080
06:30:34:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
06:30:34:WU01:FS01:Connecting to 140.163.4.200:80
06:30:34:WU02:FS00:0xa7:Completed 157500 out of 250000 steps (63%)
06:30:55:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond.
06:31:31:WU02:FS00:0xa7:Completed 160000 out of 250000 steps (64%)
06:31:34:WU00:FS01:0x22:Completed 1000000 out of 1250000 steps (80%)
06:32:24:WU02:FS00:0xa7:Completed 162500 out of 250000 steps (65%)
06:33:17:WU02:FS00:0xa7:Completed 165000 out of 250000 steps (66%)
06:34:09:WU02:FS00:0xa7:Completed 167500 out of 250000 steps (67%)
06:34:24:WU00:FS01:0x22:Completed 1012500 out of 1250000 steps (81%)
06:34:27:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13433 run:92 clone:3 gen:0 core:0x22 unit:0x000000008ca304c85f5871b5f76c5193
06:34:27:WU01:FS01:Uploading 3.75MiB to 140.163.4.200
06:34:27:WU01:FS01:Connecting to 140.163.4.200:8080

I have been out of the fold for a while and wasn't aware that server issues were still a thing.