
vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Fri Oct 17, 2008 11:26 pm
by gbowman
vsp05, vsp11, and vsp15 (171.64.122.72/78/82) are down for maintenance

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sat Oct 18, 2008 1:37 pm
by 314159
Argh!

I have two completed p4433 WUs queued for upload.
These SMP WUs have very short deadlines and were issued JUST prior to your bringing these servers down.

The final deadlines are this Sunday (like tomorrow) at 3AM and 10AM (EDT).

May I respectfully inquire whether the "maintenance" will be completed today (10/18)?
May I also respectfully inquire if the indicated collection server has record of these WUs? (unable to connect to CS so far - see below)

I assume that this was a planned maintenance.

Thank you in advance for a timely response.

Note: Edited after reviewing the load on the collection server.
These two WUs may well be on this overloaded server.
Best result so far has been to obtain a "503". :cry:

Too bad that we cannot just cancel "weekends" permanently. :)

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sat Oct 18, 2008 5:42 pm
by gbowman
I've brought vsp05 and vsp15 back up for now. I'll try and keep them up and running as much as possible this weekend.

Greg

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sat Oct 18, 2008 10:01 pm
by 314159
Thank you for "Customer Service" above and beyond the call of duty! :)

Both WUs were received and acknowledged (by xx.82) when sent automatically by the clients at the 6 hour interval points.

I hope that you and the other fine members of your group can find the time to enjoy this weekend (and others to follow). :!: :wink:

John

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sat Oct 18, 2008 11:13 pm
by gbowman
No problem :)

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sun Oct 19, 2008 12:34 am
by weedacres
Shortly after they went down for maintenance I had one ready to go. I still can't upload it to either xx.82 or xx.76.
I ran qfix; it shows no problems and reports the unit as ready to upload. I also tried sending several times, with the same result as below.

Code:

--- Opening Log file [October 19 00:18:50 UTC] 


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.23 Beta R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/meema/folding/FAH
Executable: ./fah6
Arguments: -send all 

[00:18:50] - Ask before connecting: No
[00:18:50] - User name: Meema_Ubuntu (Team 52523)
[00:18:50] - User ID: 7EE636CC39A2B04D
[00:18:50] - Machine ID: 1
[00:18:50] 
[00:18:50] Loaded queue successfully.
[00:18:50] Attempting to return result(s) to server...
[00:18:50] Project: 4433 (Run 16, Clone 11, Gen 11)
[00:18:50] - Read packet limit of 540015616... Set to 524286976.


[00:18:50] + Attempting to send results [October 19 00:18:50 UTC]
[00:20:14] - Couldn't send HTTP request to server
[00:20:14] + Could not connect to Work Server (results)
[00:20:14]     (171.64.122.82:8080)
[00:20:14] + Retrying using alternative port
[00:21:47] - Couldn't send HTTP request to server
[00:21:47] + Could not connect to Work Server (results)
[00:21:47]     (171.64.122.82:80)
[00:21:47] - Error: Could not transmit unit 06 (completed October 17) to work server.
[00:21:47] - Read packet limit of 540015616... Set to 524286976.


[00:21:47] + Attempting to send results [October 19 00:21:47 UTC]
[00:21:47] - Couldn't send HTTP request to server
[00:21:47]   (Got status 503)
[00:21:47] + Could not connect to Work Server (results)
[00:21:47]     (171.64.122.76:8080)
[00:21:47] + Retrying using alternative port
[00:21:48] - Couldn't send HTTP request to server
[00:21:48]   (Got status 503)
[00:21:48] + Could not connect to Work Server (results)
[00:21:48]     (171.64.122.76:80)
[00:21:48]   Could not transmit unit 06 to Collection server; keeping in queue.
[00:21:48] - Failed to send all units to server

Folding@Home Client Shutdown.


--- Opening Log file [October 19 00:24:36 UTC] 
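
For anyone who wants to see at a glance which servers and ports refused a unit, here is a minimal sketch that scans client log text like the above for failed connection attempts. It assumes only the line shapes visible in this log (a "Could not connect to Work Server" line followed by a "(host:port)" line, optionally preceded by "(Got status NNN)"). The function name `failed_attempts` and the regular expressions are illustrative and not part of the Folding@Home client.

```python
import re

# Excerpt of lines copied from the log above (work server attempt, then
# collection server attempt that returned a 503).
LOG = """\
[00:20:14] - Couldn't send HTTP request to server
[00:20:14] + Could not connect to Work Server (results)
[00:20:14]     (171.64.122.82:8080)
[00:21:47] - Couldn't send HTTP request to server
[00:21:47]   (Got status 503)
[00:21:47] + Could not connect to Work Server (results)
[00:21:47]     (171.64.122.76:8080)
"""

# Assumed line shapes, inferred from the log in this thread only.
STAMP = re.compile(r"^\[(\d\d:\d\d:\d\d)\]\s*(.*)$")
ENDPOINT = re.compile(r"^\((\d+\.\d+\.\d+\.\d+):(\d+)\)$")
STATUS = re.compile(r"^\(Got status (\d+)\)$")


def failed_attempts(log_text):
    """Return a list of (time, host, port, http_status_or_None) tuples,
    one per failed connection attempt found in the log text."""
    failures = []
    status = None          # HTTP status seen for the current attempt, if any
    expect_endpoint = False  # True right after a "Could not connect" line
    for line in log_text.splitlines():
        stamped = STAMP.match(line)
        if not stamped:
            continue
        time, msg = stamped.group(1), stamped.group(2).strip()
        if msg.startswith("- Couldn't send HTTP request"):
            status = None  # a new attempt starts here
            continue
        got_status = STATUS.match(msg)
        if got_status:
            status = int(got_status.group(1))
            continue
        if msg.startswith("+ Could not connect to Work Server"):
            expect_endpoint = True
            continue
        endpoint = ENDPOINT.match(msg)
        if endpoint and expect_endpoint:
            failures.append((time, endpoint.group(1), int(endpoint.group(2)), status))
        expect_endpoint = False
        status = None if endpoint else status
    return failures


for attempt in failed_attempts(LOG):
    print(attempt)
```

An attempt with no status (xx.82 here) never got an HTTP answer at all, while a 503 (xx.76 here) means the server answered but refused the upload.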



Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sun Oct 19, 2008 1:39 pm
by weedacres
The deadline has passed. Another work unit lost.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Tue Oct 21, 2008 3:31 am
by 314159
171.64.122.82 is down once again.

Have two WUs completed and queued and unable to send to this server or the "alleged" CS.

Earliest final deadline is (EDT) Tue Oct 21 11:59:35 2008 (2 days), i.e. in about 12 hours from now.

Might I suggest that these 2 day deadline SMPs be moved to another server if xx.82 is having problems?

Thanks,

John

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Tue Oct 21, 2008 11:58 pm
by 314159
Geesh Greg:

XX.82 is in reject mode once again.
Have two more SMPs awaiting upload with earliest final deadline (EDT) of Wed Oct 22 08:41:14 2008 (2 days).

I am an "old guy" and go to bed early and get up late. :wink:

The others did clear under clients' 6 hour auto-send. Hope that these do too.

Help!

Thanks,
John

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Wed Oct 22, 2008 3:03 pm
by 314159
Two lost. :(
These things happen. :wink:

Server's "WEIGHT" is set at 100000.
How about setting it down to around 1000 (or less) until this server is reliable?

I might also suggest resetting the final deadline on p4433 to 4 days.
I understand that quite a few of these WUs were lost. Given the problems with this server, a four day expiration would probably result in MORE science being submitted, along with fewer upset "Volunteers".

I am not particularly upset as this is a small fraction of my farm's production.
I am, however, concerned. :roll:

I would also mention that I seriously question the benchmark on this particular WU. :!:
Statistically, it diverges substantially from other A2 and even A1 WUs.

Code:

[07:01:00] - Autosending finished units...
[07:01:00] Trying to send all finished work units


[07:01:00] + Attempting to send results
[07:01:00] - Reading file work/wuresults_01.dat from core
[07:01:00]   (Read 1717366 bytes from disk)
[07:01:00] Connecting to http://171.64.122.82:8080/
[07:01:00] - Couldn't send HTTP request to server
[07:01:00] + Could not connect to Work Server (results)
[07:01:00]     (171.64.122.82:8080)
[07:01:00] - Error: Could not transmit unit 01 (completed October 21) to work server.
[07:01:00] - 5 failed uploads of this unit.


[07:01:00] + Attempting to send results
[07:01:00] - Reading file work/wuresults_01.dat from core
[07:01:00]   (Read 1717366 bytes from disk)
[07:01:00] Connecting to http://171.64.122.76:8080/
[07:01:01] - Couldn't send HTTP request to server
[07:01:01] + Could not connect to Work Server (results)
[07:01:01]     (171.64.122.76:8080)
[07:01:01]   Could not transmit unit 01 to Collection server; keeping in queue.
[07:01:01] + Sent 0 of 1 completed units to the server
[07:01:01] - Autosend completed
[07:01:45] Completed 47500 out of 250000 steps  (19%)

[12:57:37] Completed 80000 out of 250000 steps  (32%)
[13:01:01] Unit 1's deadline (October 22 12:41) has passed. <---------------------------------
[13:01:15] - Warning: Could not delete all work unit files (1): Core file absent
[13:01:15] - Autosending finished units...
[13:01:15] Trying to send all finished work units
[13:01:15] + No unsent completed units remaining.
[13:01:15] - Autosend completed

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Wed Oct 22, 2008 4:18 pm
by gbowman
Thanks for your input. I'll try and do what I can while we get these machines fixed.

Greg

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Wed Oct 22, 2008 4:50 pm
by wdanwatts
[05:35:12] + Attempting to send results
[05:35:12] - Couldn't send HTTP request to server
[05:35:12] + Could not connect to Work Server (results)
[05:35:12] (171.64.122.76:8080)
[05:35:12] Could not transmit unit 00 to Collection server; keeping in queue.

[11:35:12] Unit 0's deadline (October 22 10:14) has passed.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Wed Oct 22, 2008 8:16 pm
by hstierhoff
I am an absolute novice and have no idea what needs to be done, but I just want to say that it is very discouraging to go through a folding sequence and then have it sit there until we lose it. Folks like myself, whom you are trying to get to fold, will quickly be discouraged if this type of thing continues to happen.

Take one day at a time, and count today as a blessing.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Wed Oct 22, 2008 9:42 pm
by Trotador
It seems it is working now.

Thanks

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Thu Oct 23, 2008 5:41 pm
by TommyHicks
Could someone at Stanford please shoot servers 171.64.122.72/76?

Thanks in advance

th