
130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 9:36 am
by Mitsimonsta
I have 2x SMP bonus units sitting here that I would really like to upload for the science. I do see that the remaining available WU's on this server is very low.

Any chance of at least accepting units crunched?

Re: 130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 10:41 am
by kasson
We'll do our best, but this server may be down for a little while.

Re: 130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 10:49 am
by Mitsimonsta
Thanks for the speedy reply Peter. Even getting the WU details to the collection/results server would be nice.

It is slightly disappointing when the processing is done and you can't get the results sent back, and more so when it is multiple units.

I hope it is up and running again soon.

Re: 130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 5:15 pm
by MrNovi
When you say "a little while", do you mean hours or days? Since it's already been over 7 hours since it was reported, "hours" doesn't seem very likely. What happens to the WU's we have waiting to upload if it isn't fixed by the preferred deadline?

Re: 130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 10:47 pm
by MrNovi
Well, it's been 12 hours since it was first reported. Any updates as to when this is going to be fixed?

Re: 130.237.232.140 Not Accepting

Posted: Tue Sep 07, 2010 11:49 pm
by Mitsimonsta
I am sure they will have a fix in place as soon as possible. From Dr. Kasson's message it appears that the issue is not with the server instance or waiting for new WU's to generate, but possibly the underlying hardware (RAID volume failure, NIC failure possibly... who knows).
MrNovi wrote:What happens to the WU's we have waiting to upload if it isn't fixed by the preferred deadline?
You lose it, it will expire and auto-delete from the queue when it passes final deadline. Too bad, so sad.

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 12:05 am
by MrNovi
Mitsimonsta wrote:I am sure they will have a fix in place as soon as possible. From Dr. Kasson's message it appears that the issue is not with the server instance or waiting for new WU's to generate, but possibly the underlying hardware (RAID volume failure, NIC failure possibly... who knows).


I can understand that there may be more to it than just rebooting the server, but there should be redundancies in place for that occasion.
MrNovi wrote:What happens to the WU's we have waiting to upload if it isn't fixed by the preferred deadline?
Mitsimonsta wrote:You lose it, it will expire and auto-delete from the queue when it passes final deadline. Too bad, so sad.


That part I find unacceptable. If the completed WU's can't be uploaded they should NOT be distributing them to be worked on.

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 12:47 am
by 7im
Mitsimonsta isn't known for his diplomatic skills, but is technically correct. ;)

FAH does have redundancies. First, the completed WU is cached at the client level, and the client moves on to new work to stay productive. Next, the FAH client will automatically try to upload the completed WU every 6 hours. And most WS (work servers) have a backup CS (collection server) so that when the WS goes offline, the CS can accept the stranded work units.

The Server Status page (link on toolbar at top of page) shows the CS is 130.237.165.141 for WS 130.237.165.140. Does your fahlog show the client attempting to upload to the CS as well as the WS?
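
The upload order described above (work server first, then the collection server, with the unit kept locally for the next retry) could be sketched roughly as below. This is only an illustration and not the FAH client's actual code: `try_upload`, `send`, and the constant name are made up for the example, and the 6-hour figure is simply the one quoted in the post.

```python
from typing import Callable, Optional, Sequence

# Retry interval quoted in the post above; the constant name is hypothetical.
RETRY_INTERVAL_HOURS = 6


def try_upload(servers: Sequence[str], send: Callable[[str], bool]) -> Optional[str]:
    """Try each server in order; return the first one that accepts the result.

    `send` is a stand-in for the real network call (an assumption of this
    sketch). It should return True when the server accepted the upload.
    """
    for server in servers:
        if send(server):
            return server
    # Nothing accepted the unit: it stays queued until the next retry.
    return None


if __name__ == "__main__":
    work_server = "130.237.232.140:8080"
    collection_server = "130.237.165.141:8080"
    # Simulate the outage in this thread: both servers refuse the upload.
    result = try_upload([work_server, collection_server], send=lambda s: False)
    print(result)
```

The point of the ordering is that the collection server only helps if it is reachable itself; when both calls fail, the unit just waits for the next attempt.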

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 12:57 am
by Full_Taoer
7im wrote:The Server Status page (link on toolbar at top of page) shows the CS is 130.237.165.141 for WS 130.237.165.140. Does your fahlog show the client attempting to upload to the CS as well as the WS?
My HFM log shows the client attempting both of those servers for over 16 hours now. I am about 2 hours away from another retry, but as I write this the work server is still listed as not accepting. Maybe I am overlooking it, but I do not see the collection server listed on the status page.

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 1:00 am
by MrNovi
No, it tries 130.237.232.140:8080 first, then goes to 130.237.165.141:8080. At no time does it try to go to 130.237.165.140. 130.237.165.140 isn't even on your server status list at http://fah-web.stanford.edu/serverstat.html

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 1:04 am
by bollix47

[22:41:07] - Autosending finished units... [September 7 22:41:07 UTC]
[22:41:07] Trying to send all finished work units
[22:41:07] Project: 6015 (Run 1, Clone 63, Gen 282)


[22:41:07] + Attempting to send results [September 7 22:41:07 UTC]
[22:41:07] - Reading file work/wuresults_03.dat from core
[22:41:07]   (Read 20542514 bytes from disk)
[22:41:07] Connecting to http://130.237.232.140:8080/
[22:41:08] - Couldn't send HTTP request to server
[22:41:08] + Could not connect to Work Server (results)
[22:41:08]     (130.237.232.140:8080)
[22:41:08] + Retrying using alternative port
[22:41:08] Connecting to http://130.237.232.140:80/
[22:41:10] - Couldn't send HTTP request to server
[22:41:10] + Could not connect to Work Server (results)
[22:41:10]     (130.237.232.140:80)
[22:41:10] - Error: Could not transmit unit 03 (completed September 7) to work server.
[22:41:10] - 6 failed uploads of this unit.


[22:41:10] + Attempting to send results [September 7 22:41:10 UTC]
[22:41:10] - Reading file work/wuresults_03.dat from core
[22:41:10]   (Read 20542514 bytes from disk)
[22:41:10] Connecting to http://130.237.165.141:8080/
[22:41:31] - Couldn't send HTTP request to server
[22:41:31] + Could not connect to Work Server (results)
[22:41:31]     (130.237.165.141:8080)
[22:41:31] + Retrying using alternative port
[22:41:31] Connecting to http://130.237.165.141:80/
[22:41:52] - Couldn't send HTTP request to server
[22:41:52] + Could not connect to Work Server (results)
[22:41:52]     (130.237.165.141:80)
[22:41:52]   Could not transmit unit 03 to Collection server; keeping in queue.
[22:41:52] + Sent 0 of 1 completed units to the server
[22:41:52] - Autosend completed
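
For anyone checking their own FAHlog for the same pattern, a small script along these lines could list which addresses the client tried and how many units actually went out. It is a rough sketch that assumes the log lines look exactly like the excerpt above; the function name and the regular expressions are made up for this example and are not part of any FAH tool.

```python
import re

# Patterns based only on the log excerpt above (an assumption about the format).
CONNECT_RE = re.compile(r"Connecting to http://(\d+\.\d+\.\d+\.\d+):(\d+)/")
SUMMARY_RE = re.compile(r"Sent (\d+) of (\d+) completed units")


def summarize_autosend(log_text):
    """Return (attempts, sent, total) for one autosend log excerpt.

    attempts is a list of (ip, port) tuples in the order they were tried;
    sent and total are None if no summary line is present.
    """
    attempts = [(m.group(1), int(m.group(2))) for m in CONNECT_RE.finditer(log_text)]
    summary = SUMMARY_RE.search(log_text)
    if summary is None:
        return attempts, None, None
    return attempts, int(summary.group(1)), int(summary.group(2))


if __name__ == "__main__":
    sample = (
        "[22:41:07] Connecting to http://130.237.232.140:8080/\n"
        "[22:41:08] Connecting to http://130.237.232.140:80/\n"
        "[22:41:10] Connecting to http://130.237.165.141:8080/\n"
        "[22:41:31] Connecting to http://130.237.165.141:80/\n"
        "[22:41:52] + Sent 0 of 1 completed units to the server\n"
    )
    attempts, sent, total = summarize_autosend(sample)
    for ip, port in attempts:
        print(ip, port)
    print(sent, "of", total, "sent")
```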

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 1:48 am
by MrNovi
And as for the redundancies, if they are in place why the heck aren't the WU's uploading to them? It's obvious that they either don't actually have any redundancies, or they are so misconfigured as to be non-functional.

So now, when are they going to fix this so we can upload our finished Work Units?

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 2:49 am
by PantherX
The good news is that there will be more SMP Servers soon: http://folding.typepad.com/news/2010/08 ... -line.html

The "unknown" news is that there isn't any ETA.

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 3:11 am
by aoeu
Dear MrNovi,

If everything was perfect all the time, my mother would not have Alzheimer's. Please be patient. If I lose a WU it is the end of the world as I know it, not.

Re: 130.237.232.140 Not Accepting

Posted: Wed Sep 08, 2010 9:26 pm
by gwildperson
MrNovi wrote:What happens to the WU's we have waiting to upload if it isn't fixed by the preferred deadline?
Mitsimonsta wrote:You lose it, it will expire and auto-delete from the queue when it passes final deadline. Too bad, so sad.
Actually, you get base points with no bonus when it doesn't upload by the preferred deadline. The WU will auto-delete from the queue at the final deadline.
MrNovi wrote:That part I find unacceptable. If the completed WU's can't be uploaded they should NOT be distributing them to be worked on.
You can call it "unacceptable" if you want, but we certainly have been warned. None of us like it, but we have to accept that risk when we choose to run SMP units with the bonus plan, and it's really a rather rare phenomenon.
kasson wrote:Important: The bonus scheme is based on the time that returned work units are received by our servers. We make every effort to keep these servers available to receive work, but there will inevitably be congestion or downtimes. We do not guarantee server availability. If for some reason you do not receive the expected bonus please do let us know, but unlike base points, we will generally not give recredits for bonuses. Bonuses are not guaranteed. Similar policies apply for unexpected loss of work units, etc. The bonus program has some "slack" calculated in to allow for such unexpected events.
source: Subject: new release: extra-large work units
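
The deadline behaviour described in this post can be put into a few lines of code. This is a simplified sketch of the rule as stated in the thread only: the function name and the result labels are hypothetical, treating the deadlines as inclusive is an assumption, and it ignores any other conditions the bonus scheme may have.

```python
from datetime import datetime


def credit_outcome(received: datetime, preferred: datetime, final: datetime) -> str:
    """Classify a returned WU by when the server received it.

    Follows the rule as described in the thread; the labels are made up
    for this sketch and boundary handling (<=) is an assumption.
    """
    if received <= preferred:
        return "base + bonus eligible"
    if received <= final:
        # Past the preferred deadline: base points, no bonus.
        return "base only"
    # Past the final deadline: the client drops the WU from its queue.
    return "expired"


if __name__ == "__main__":
    # Hypothetical deadlines, not taken from any real project.
    preferred = datetime(2010, 9, 9, 12, 0)
    final = datetime(2010, 9, 12, 12, 0)
    print(credit_outcome(datetime(2010, 9, 10, 8, 0), preferred, final))
```

Note that the timestamp that matters is when the servers receive the unit, not when the client finished it, which is why a server outage can cost the bonus.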