Re: 140.163.4.200

Posted: Fri Sep 11, 2020 6:47 am
by aetch
Teddy wrote: I have been out of the fold for a while and wasn't aware that server issues were still a thing?
Covid-19 has brought an influx of new problems.
New folders (me included), which led to a shortage of work units.
The back-end was beefed up to handle the new folders.
The clients have been updated and the cores have seen development as well.
Servers periodically dropping out or filling up.
The Covid-19 units have seen some variety but now appear to be stable.
And now what appears to be a misconfigured collection server.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 6:56 am
by Teddy
aetch wrote:
Teddy wrote: I have been out of the fold for a while and wasn't aware that server issues were still a thing?
Covid-19 has brought an influx of new problems.
New folders (me included), which led to a shortage of work units.
The back-end was beefed up to handle the new folders.
The clients have been updated and the cores have seen development as well.
Servers periodically dropping out or filling up.
The Covid-19 units have seen some variety but now appear to be stable.
And now what appears to be a misconfigured collection server.
Yeah, a COVID-19 news article on Overclockers Australia got me interested in folding again; it’s been many years since I last fired up any distributed computing.
I guess a shortage of work units is both good and bad; new folders, I imagine, are always welcome.
Hopefully the misconfigured server will be fixed soon. I sometimes forget that these servers are on the other side of the world and it’s probably the middle of the night there.
It’s a bright, sunny Friday arvo here, so I assumed it would be fixed quickly.
Good to see Vijay Pande still involved in the research.

Teddy

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 7:40 am
by NGruia
07:39:23:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:13433 run:45 clone:2 gen:0 core:0x22 unit:0x000000008ca304c85f5871ae9996175e
07:39:23:WU02:FS00:Uploading 4.20MiB to 140.163.4.200
07:39:23:WU02:FS00:Connecting to 140.163.4.200:8080
07:39:24:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13433 run:93 clone:4 gen:0 core:0x22 unit:0x000000008ca304c85f5871b4fffe4555
07:39:24:WU01:FS01:Uploading 3.81MiB to 140.163.4.200
07:39:24:WU01:FS01:Connecting to 140.163.4.200:8080
07:39:33:WU04:FS00:0x22:Completed 580000 out of 2000000 steps (29%)
07:39:44:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
07:39:44:WU02:FS00:Connecting to 140.163.4.200:80
07:39:45:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
07:39:45:WU01:FS01:Connecting to 140.163.4.200:80
07:39:52:WU02:FS00:Upload 1.49%
07:40:06:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 7:47 am
by NGruia
07:45:15:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:13430 run:100 clone:2 gen:0 core:0x22 unit:0x000000008ca304c85f587170a32af3a8
07:45:15:WU01:FS00:Uploading 3.79MiB to 140.163.4.200
07:45:15:WU01:FS00:Connecting to 140.163.4.200:8080
07:45:23:WU04:FS01:0x22:Completed 587500 out of 1250000 steps (47%)
07:45:28:WU00:FS00:0x22:Completed 250000 out of 1250000 steps (20%)
07:45:36:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
07:45:36:WU01:FS00:Connecting to 140.163.4.200:80
07:45:58:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:46:10:WU04:FS01:0x22:Completed 600000 out of 1250000 steps (48%)
07:46:11:WU00:FS00:0x22:Completed 262500 out of 1250000 steps (21%)
07:46:35:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13432 run:90 clone:4 gen:0 core:0x22 unit:0x000000008ca304c85f58719e978f0989
07:46:35:WU02:FS01:Uploading 3.76MiB to 140.163.4.200
07:46:35:WU02:FS01:Connecting to 140.163.4.200:8080
07:46:54:WU00:FS00:0x22:Completed 275000 out of 1250000 steps (22%)
07:46:56:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
07:46:56:WU02:FS01:Connecting to 140.163.4.200:80
07:46:58:WU04:FS01:0x22:Completed 612500 out of 1250000 steps (49%)

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 7:49 am
by Shirty
Just to add my voice back into the conversation: after the missing points were credited yesterday, I went straight back to the previous issue, so whilst the manual credit worked, it appears that the auto-submission is still faulty.

The server simply doesn't seem to be accepting submissions at all now. I have multiple units stuck on send, which in some ways is worse, as the credit is just dropping by the minute.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 8:13 am
by Bastiaan_NL
Same for me, 2 units stuck trying to upload.

07:43:15:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13435 run:33 clone:0 gen:2 core:0x22 unit:0x000000028ca304c85f59891db0cc875f
07:43:15:WU02:FS01:Uploading 5.99MiB to 140.163.4.200
07:43:15:WU02:FS01:Connecting to 140.163.4.200:8080
07:43:36:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
07:43:36:WU02:FS01:Connecting to 140.163.4.200:80
07:43:57:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

05:48:14:WU00:FS02:Sending unit results: id:00 state:SEND error:NO_ERROR project:13436 run:57 clone:1 gen:0 core:0x22 unit:0x000000008ca304c85f598340fbc03b08
05:48:14:WU00:FS02:Uploading 5.87MiB to 140.163.4.200
05:48:14:WU00:FS02:Connecting to 140.163.4.200:8080
05:48:36:WARNING:WU00:FS02:WorkServer connection failed on port 8080 trying 80
05:48:36:WU00:FS02:Connecting to 140.163.4.200:80
05:48:57:WARNING:WU00:FS02:Exception: Failed to send results to work server: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
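For anyone wanting to check their own log for the same pattern, here is a rough Python sketch that counts upload attempts and failures per work unit. The regular expressions are modelled only on the log lines quoted in this thread, so treat them as assumptions; other client versions may format these lines differently. The function name `summarize_uploads` is made up for illustration.

```python
import re

# Patterns based solely on the log lines pasted in this thread (an assumption,
# not an official description of the client's log format).
UPLOAD_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Uploading (\S+) to (\S+)$")
FAIL_RE = re.compile(
    r"^\d\d:\d\d:\d\d:WARNING:(WU\d+):(FS\d+):Exception: Failed to send results"
)
DONE_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Upload complete$")


def summarize_uploads(lines):
    """Return a dict keyed by (work unit, slot) with upload attempt statistics."""
    summary = {}
    for raw in lines:
        line = raw.strip()

        match = UPLOAD_RE.match(line)
        if match:
            wu, slot, size, server = match.groups()
            entry = summary.setdefault(
                (wu, slot),
                {"server": server, "size": size,
                 "attempts": 0, "failures": 0, "done": False},
            )
            entry["attempts"] += 1
            continue

        match = FAIL_RE.match(line)
        if match and match.groups() in summary:
            summary[match.groups()]["failures"] += 1
            continue

        match = DONE_RE.match(line)
        if match and match.groups() in summary:
            summary[match.groups()]["done"] = True
    return summary
```

Feeding it the lines of a saved log file (for example `summarize_uploads(open("log.txt"))`, where `log.txt` is a placeholder name) gives one entry per unit, and any entry with failures and `done` still false is a unit stuck on send.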

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 8:17 am
by PantherX
Thanks for that. I have notified the person who looks after that so let's wait and see what happens.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 8:33 am
by Teddy
It took a while but the stuck unit has cleared :-)

06:34:24:WU00:FS01:0x22:Completed 1012500 out of 1250000 steps (81%)
06:34:27:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13433 run:92 clone:3 gen:0 core:0x22 unit:0x000000008ca304c85f5871b5f76c5193
06:34:27:WU01:FS01:Uploading 3.75MiB to 140.163.4.200
06:34:27:WU01:FS01:Connecting to 140.163.4.200:8080
06:35:01:WU02:FS00:0xa7:Completed 170000 out of 250000 steps (68%)
06:35:54:WU02:FS00:0xa7:Completed 172500 out of 250000 steps (69%)

snip
06:45:43:WU00:FS01:0x22:Completed 1062500 out of 1250000 steps (85%)
06:46:06:WU01:FS01:Upload 5.00%
06:46:12:WU01:FS01:Upload 41.69%
06:46:18:WU01:FS01:Upload 81.72%
06:46:21:WU01:FS01:Upload complete
06:46:21:WU01:FS01:Server responded WORK_ACK (400)
06:46:21:WU01:FS01:Final credit estimate, 180707.00 points
06:46:21:WU01:FS01:Cleaning up
All happy in the Bear household again.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 9:32 am
by Bastiaan_NL
Teddy wrote: It took a while but the stuck unit has cleared :-)
Still stuck on my end.
Finished one of the units 9 hours ago, credit (on the advanced control) dropped from 335k to 228k.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 12:42 pm
by Sevrin
Seems to have been sorted.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 1:24 pm
by Bastiaan_NL
The last one of my pending units was sent a few minutes ago!
I'll keep my eyes open for the next couple of hours :)

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 2:22 pm
by mgetz
Sadly, I'm still seeing transfer failures; the furthest I've gotten in uploading the WU I have is 2.25%. Hopefully this gets fixed before the WU times out altogether.

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 2:26 pm
by Bastiaan_NL
mgetz wrote: Sadly, I'm still seeing transfer failures; the furthest I've gotten in uploading the WU I have is 2.25%. Hopefully this gets fixed before the WU times out altogether.
I had to do some work on both systems, so I had to shut them down anyway.
Now, it doesn't make sense to me that a reboot would matter if the server is the problem, but strangely enough both units were returned a few minutes after starting the clients again.
So you could give a reboot a try if you haven't done it already, won't hurt I guess..

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 2:50 pm
by mgetz
Bastiaan_NL wrote: So you could give a reboot a try if you haven't done it already, won't hurt I guess..
I checked with a down detector... it's not just me, that endpoint is dead at the moment.

Looks like it finally went... at least the science is submitted; hopefully the server is in good enough condition to accept it and make that work.
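A do-it-yourself version of that check can be sketched in Python: try a plain TCP connection on port 8080 and then port 80, the same order the client logs in this thread show. The helper name `first_open_port` and the 5-second timeout are assumptions for illustration. Note that a successful connection only shows the port is reachable; it does not prove the server will accept a full upload, since some logs above stalled after the upload had already started.

```python
import socket


def first_open_port(host, ports=(8080, 80), timeout=5.0,
                    connect=socket.create_connection):
    """Return the first port that accepts a TCP connection, or None.

    `connect` is injectable so the function can be exercised without
    touching the network; by default it is socket.create_connection.
    """
    for port in ports:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            continue
        conn.close()
        return port
    return None
```

Calling `first_open_port("140.163.4.200")` returns the port that answered, or `None` if neither did within the timeout.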

Re: 140.163.4.200

Posted: Fri Sep 11, 2020 2:52 pm
by paulch2
Just seen signs of life.
After hung/timed out connection attempts for a few hours, it just uploaded a queued result in a few seconds.