
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 10:03 pm
by noorman
Beam us all up, Scotty :lol:

.71

Posted: Mon Feb 22, 2010 10:04 pm
by HaloJones
Can't get any work. Three clients are stuck on .71 and won't talk to anything else.

Re: 171.64.65.71 accepting... but

Posted: Mon Feb 22, 2010 10:09 pm
by noorman
HaloJones wrote: Can't get any work. Three clients are stuck on .71 and won't talk to anything else.

Did you try shutting down F@H, deleting the Work folder and queue.dat, and then restarting?
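For anyone doing this on several machines, the cleanup step can be scripted. This is a minimal sketch, assuming the client has already been shut down and that the work folder is named `work` and sits next to `queue.dat` in the client's launch directory (the "Launch directory" the client prints at startup). It removes those two items and nothing else; restart the client yourself afterwards.

```python
import shutil
from pathlib import Path

def reset_client_state(client_dir: str) -> list[str]:
    """Delete the work folder and queue.dat from a STOPPED client directory.

    Assumes both items live directly under client_dir. Returns the names of
    the items that were actually removed (missing items are skipped).
    """
    base = Path(client_dir)
    removed = []
    work = base / "work"
    if work.is_dir():
        shutil.rmtree(work)  # removes the folder and everything in it
        removed.append(work.name)
    queue = base / "queue.dat"
    if queue.is_file():
        queue.unlink()
        removed.append(queue.name)
    return removed
```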



Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 10:18 pm
by tobor
Something still isn't right...
Once I completed a WU, the client was unable to connect to 171.64.65.71... :|


                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[21:53:36] - Ask before connecting: No
[21:53:36] - User name: stv911 (Team 4)
[21:53:36] - User ID: BE4069840F022A4
[21:53:36] - Machine ID: 3
[21:53:36] 
[21:53:36] Loaded queue successfully.
[21:53:36] Initialization complete
[21:53:36] - Preparing to get new work unit...
[21:53:36] + Attempting to get work packet
[21:53:36] - Connecting to assignment server
[21:53:37] - Successful: assigned to (171.64.65.71).
[21:53:37] + News From Folding@Home: Welcome to Folding@Home
[21:53:37] Loaded queue successfully.
[21:53:37] - Couldn't send HTTP request to server
[21:53:37] + Could not connect to Work Server
[21:53:37] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[21:53:48] + Attempting to get work packet
[21:53:48] - Connecting to assignment server
[21:53:49] - Successful: assigned to (171.64.65.71).
[21:53:49] + News From Folding@Home: Welcome to Folding@Home
[21:53:49] Loaded queue successfully.
[21:53:49] - Couldn't send HTTP request to server
[21:53:49] + Could not connect to Work Server
[21:53:49] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[21:54:14] + Attempting to get work packet
[21:54:14] - Connecting to assignment server
[21:54:15] - Successful: assigned to (171.64.65.71).
[21:54:15] + News From Folding@Home: Welcome to Folding@Home
[21:54:15] Loaded queue successfully.
[21:54:15] - Couldn't send HTTP request to server
[21:54:15] + Could not connect to Work Server
[21:54:15] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
[21:54:44] + Attempting to get work packet
[21:54:44] - Connecting to assignment server
[21:54:45] - Successful: assigned to (171.64.65.71).
[21:54:45] + News From Folding@Home: Welcome to Folding@Home
[21:54:45] Loaded queue successfully.
[21:54:45] - Couldn't send HTTP request to server
[21:54:45] + Could not connect to Work Server
[21:54:45] - Attempt #4  to get work failed, and no other work to do.
Waiting before retry.
[21:55:27] + Attempting to get work packet
[21:55:27] - Connecting to assignment server
[21:55:27] - Successful: assigned to (171.64.65.71).
[21:55:27] + News From Folding@Home: Welcome to Folding@Home
[21:55:27] Loaded queue successfully.
[21:55:28] - Couldn't send HTTP request to server
[21:55:28] + Could not connect to Work Server
[21:55:28] - Attempt #5  to get work failed, and no other work to do.
Waiting before retry.
[21:56:51] + Attempting to get work packet
[21:56:51] - Connecting to assignment server
[21:56:51] - Successful: assigned to (171.64.65.71).
[21:56:51] + News From Folding@Home: Welcome to Folding@Home
[21:56:51] Loaded queue successfully.
[21:56:52] - Couldn't send HTTP request to server
[21:56:52] + Could not connect to Work Server
[21:56:52] - Attempt #6  to get work failed, and no other work to do.
Waiting before retry.

Folding@Home Client Shutdown.


--- Opening Log file [February 22 21:57:53 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[21:57:53] - Ask before connecting: No
[21:57:53] - User name: stv911 (Team 4)
[21:57:53] - User ID: BE4069840F022A4
[21:57:53] - Machine ID: 3
[21:57:53] 
[21:57:53] Loaded queue successfully.
[21:57:53] Initialization complete
[21:57:53] - Preparing to get new work unit...
[21:57:53] + Attempting to get work packet
[21:57:53] - Connecting to assignment server
[21:57:53] - Successful: assigned to (171.64.65.71).
[21:57:53] + News From Folding@Home: Welcome to Folding@Home
[21:57:54] Loaded queue successfully.
[21:57:54] - Couldn't send HTTP request to server
[21:57:54] + Could not connect to Work Server
[21:57:54] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[21:58:09] + Attempting to get work packet
[21:58:09] - Connecting to assignment server
[21:58:10] - Successful: assigned to (171.64.65.71).
[21:58:10] + News From Folding@Home: Welcome to Folding@Home
[21:58:10] Loaded queue successfully.
[21:58:10] - Couldn't send HTTP request to server
[21:58:10] + Could not connect to Work Server
[21:58:10] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[21:58:26] + Attempting to get work packet
[21:58:26] - Connecting to assignment server
[21:58:27] - Successful: assigned to (171.64.65.71).
[21:58:27] + News From Folding@Home: Welcome to Folding@Home
[21:58:27] Loaded queue successfully.
[21:58:27] - Couldn't send HTTP request to server
[21:58:27] + Could not connect to Work Server
[21:58:27] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
[21:58:56] + Attempting to get work packet
[21:58:56] - Connecting to assignment server
[21:58:56] - Successful: assigned to (171.64.65.71).
[21:58:56] + News From Folding@Home: Welcome to Folding@Home
[21:58:56] Loaded queue successfully.
[21:58:57] - Couldn't send HTTP request to server
[21:58:57] + Could not connect to Work Server
[21:58:57] - Attempt #4  to get work failed, and no other work to do.
Waiting before retry.
[21:59:43] + Attempting to get work packet
[21:59:43] - Connecting to assignment server
[21:59:44] - Successful: assigned to (171.64.65.71).
[21:59:44] + News From Folding@Home: Welcome to Folding@Home
[21:59:44] Loaded queue successfully.
[21:59:44] - Couldn't send HTTP request to server
[21:59:44] + Could not connect to Work Server
[21:59:44] - Attempt #5  to get work failed, and no other work to do.
Waiting before retry.
[22:01:05] + Attempting to get work packet
[22:01:05] - Connecting to assignment server
[22:01:05] - Successful: assigned to (171.64.65.71).
[22:01:05] + News From Folding@Home: Welcome to Folding@Home
[22:01:06] Loaded queue successfully.
[22:01:06] - Couldn't send HTTP request to server
[22:01:06] + Could not connect to Work Server
[22:01:06] - Attempt #6  to get work failed, and no other work to do.
Waiting before retry.
[22:03:48] + Attempting to get work packet
[22:03:48] - Connecting to assignment server
[22:03:49] - Successful: assigned to (171.64.65.71).
[22:03:49] + News From Folding@Home: Welcome to Folding@Home
[22:03:49] Loaded queue successfully.
[22:03:49] - Couldn't send HTTP request to server
[22:03:49] + Could not connect to Work Server
[22:03:49] - Attempt #7  to get work failed, and no other work to do.
Waiting before retry.

Folding@Home Client Shutdown.


--- Opening Log file [February 22 22:05:33 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[22:05:33] - Ask before connecting: No
[22:05:33] - User name: stv911 (Team 4)
[22:05:33] - User ID: BE4069840F022A4
[22:05:33] - Machine ID: 3
[22:05:33] 
[22:05:33] Loaded queue successfully.
[22:05:33] Initialization complete
[22:05:33] - Preparing to get new work unit...
[22:05:33] + Attempting to get work packet
[22:05:33] - Connecting to assignment server
[22:05:34] - Successful: assigned to (171.64.65.71).
[22:05:34] + News From Folding@Home: Welcome to Folding@Home
[22:05:34] Loaded queue successfully.
[22:05:34] - Couldn't send HTTP request to server
[22:05:34] + Could not connect to Work Server
[22:05:34] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[22:05:47] + Attempting to get work packet
[22:05:47] - Connecting to assignment server
[22:05:48] - Successful: assigned to (171.64.65.71).
[22:05:48] + News From Folding@Home: Welcome to Folding@Home
[22:05:48] Loaded queue successfully.
[22:05:48] - Couldn't send HTTP request to server
[22:05:48] + Could not connect to Work Server
[22:05:48] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[22:06:12] + Attempting to get work packet
[22:06:12] - Connecting to assignment server
[22:06:13] - Successful: assigned to (171.64.65.71).
[22:06:13] + News From Folding@Home: Welcome to Folding@Home
[22:06:13] Loaded queue successfully.
[22:06:13] - Couldn't send HTTP request to server
[22:06:13] + Could not connect to Work Server
[22:06:13] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
[22:06:37] + Attempting to get work packet
[22:06:37] - Connecting to assignment server
[22:06:37] - Successful: assigned to (171.64.65.71).
[22:06:37] + News From Folding@Home: Welcome to Folding@Home
[22:06:37] Loaded queue successfully.
[22:06:38] - Couldn't send HTTP request to server
[22:06:38] + Could not connect to Work Server
[22:06:38] - Attempt #4  to get work failed, and no other work to do.
Waiting before retry.
[22:07:21] + Attempting to get work packet
[22:07:21] - Connecting to assignment server
[22:07:21] - Successful: assigned to (171.64.65.71).
[22:07:21] + News From Folding@Home: Welcome to Folding@Home
[22:07:21] Loaded queue successfully.
[22:07:22] - Couldn't send HTTP request to server
[22:07:22] + Could not connect to Work Server
[22:07:22] - Attempt #5  to get work failed, and no other work to do.
Waiting before retry.
[22:08:43] + Attempting to get work packet
[22:08:43] - Connecting to assignment server
[22:08:43] - Successful: assigned to (171.64.65.71).
[22:08:43] + News From Folding@Home: Welcome to Folding@Home
[22:08:43] Loaded queue successfully.
[22:08:44] - Couldn't send HTTP request to server
[22:08:44] + Could not connect to Work Server
[22:08:44] - Attempt #6  to get work failed, and no other work to do.
Waiting before retry.

Folding@Home Client Shutdown.


--- Opening Log file [February 22 22:11:28 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[22:11:28] - Ask before connecting: No
[22:11:28] - User name: stv911 (Team 4)
[22:11:28] - User ID: BE4069840F022A4
[22:11:28] - Machine ID: 3
[22:11:28] 
[22:11:28] Loaded queue successfully.
[22:11:28] Initialization complete
[22:11:28] - Preparing to get new work unit...
[22:11:28] + Attempting to get work packet
[22:11:28] - Connecting to assignment server
[22:11:28] - Successful: assigned to (171.64.65.71).
[22:11:28] + News From Folding@Home: Welcome to Folding@Home
[22:11:28] Loaded queue successfully.
[22:11:29] - Couldn't send HTTP request to server
[22:11:29] + Could not connect to Work Server
[22:11:29] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[22:11:35] + Attempting to get work packet
[22:11:35] - Connecting to assignment server
[22:11:36] - Successful: assigned to (171.64.65.71).
[22:11:36] + News From Folding@Home: Welcome to Folding@Home
[22:11:36] Loaded queue successfully.
[22:11:36] - Couldn't send HTTP request to server
[22:11:36] + Could not connect to Work Server
[22:11:36] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[22:12:00] + Attempting to get work packet
[22:12:00] - Connecting to assignment server
[22:12:01] - Successful: assigned to (171.64.65.71).
[22:12:01] + News From Folding@Home: Welcome to Folding@Home
[22:12:01] Loaded queue successfully.
[22:12:01] - Couldn't send HTTP request to server
[22:12:01] + Could not connect to Work Server
[22:12:01] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
[22:12:34] + Attempting to get work packet
[22:12:34] - Connecting to assignment server
[22:12:34] - Successful: assigned to (171.64.65.71).
[22:12:34] + News From Folding@Home: Welcome to Folding@Home
[22:12:34] Loaded queue successfully.
[22:12:34] - Couldn't send HTTP request to server
[22:12:34] + Could not connect to Work Server
[22:12:34] - Attempt #4  to get work failed, and no other work to do.
Waiting before retry.
[22:13:18] + Attempting to get work packet
[22:13:18] - Connecting to assignment server
[22:13:19] - Successful: assigned to (171.64.65.71).
[22:13:19] + News From Folding@Home: Welcome to Folding@Home
[22:13:19] Loaded queue successfully.
[22:13:19] - Couldn't send HTTP request to server
[22:13:19] + Could not connect to Work Server
[22:13:19] - Attempt #5  to get work failed, and no other work to do.
Waiting before retry.

Folding@Home Client Shutdown.


--- Opening Log file [February 22 22:14:50 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[22:14:50] - Ask before connecting: No
[22:14:50] - User name: stv911 (Team 4)
[22:14:50] - User ID: BE4069840F022A4
[22:14:50] - Machine ID: 3
[22:14:50] 
[22:14:50] Loaded queue successfully.
[22:14:50] Initialization complete
[22:14:50] - Preparing to get new work unit...
[22:14:50] + Attempting to get work packet
[22:14:50] - Connecting to assignment server
[22:14:51] - Successful: assigned to (171.64.65.71).
[22:14:51] + News From Folding@Home: Welcome to Folding@Home
[22:14:51] Loaded queue successfully.
[22:14:51] - Couldn't send HTTP request to server
[22:14:51] + Could not connect to Work Server
[22:14:51] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[22:15:06] + Attempting to get work packet
[22:15:06] - Connecting to assignment server
[22:15:07] - Successful: assigned to (171.64.65.71).
[22:15:07] + News From Folding@Home: Welcome to Folding@Home
[22:15:07] Loaded queue successfully.
[22:15:07] - Couldn't send HTTP request to server
[22:15:07] + Could not connect to Work Server
[22:15:07] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[22:15:19] + Attempting to get work packet
[22:15:19] - Connecting to assignment server
[22:15:19] - Successful: assigned to (171.64.65.71).
[22:15:19] + News From Folding@Home: Welcome to Folding@Home
[22:15:19] Loaded queue successfully.
[22:15:20] - Couldn't send HTTP request to server
[22:15:20] + Could not connect to Work Server
[22:15:20] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
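The growing gap between attempts is easier to see if you pull the timestamps out of the log. A small sketch, assuming the v6 log line format shown above and a single client session (gaps spanning a restart or midnight are not handled):

```python
import re
from datetime import datetime

# Matches lines such as "[21:53:36] + Attempting to get work packet"
ATTEMPT = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] \+ Attempting to get work packet")

def retry_gaps(session_log: str) -> list[int]:
    """Seconds between consecutive work-request attempts in one client session."""
    times = []
    for line in session_log.splitlines():
        m = ATTEMPT.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]
```

Fed the first session above (21:53:36 to 21:56:51), it returns [12, 26, 30, 43, 84].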

Re: 171.64.65.71 accepting... but

Posted: Mon Feb 22, 2010 10:26 pm
by HaloJones
Yes.

It only seems to affect the system tray version. The console version works, so they're running off that for the moment.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 10:39 pm
by tobor
4 clients down...
I've restarted them all a few times.
I guess I'll just have to wait and hope for the best.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 11:24 pm
by cdb
I can't connect either.

Re: 171.64.65.71 accepting... but

Posted: Mon Feb 22, 2010 11:34 pm
by rjbelans
Same here. 6 systray versions with no work being downloaded.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 11:39 pm
by tobor
Mine just connected...

Re: 171.64.65.71 accepting... but

Posted: Mon Feb 22, 2010 11:46 pm
by bruce
It's a very, very busy server and it's dishing out a lot of work, but it's being hit by a lot of clients. From serverstat:
CPU LOAD = 14.18
NET LOAD = 182
% Ass G = 75

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 22, 2010 11:57 pm
by derrickmcc
171.64.65.71 is still not issuing WUs; 4 GPUs have had no work for 2 hours.

I have just rebooted and picked up 4 WUs from 171.64.65.20.

CPU load on 171.64.65.71 has been over 14 (currently 14.18) for days; in fact, the last time its CPU load was in single figures was 12th February.

Why doesn't the assignment server automatically switch to an alternative server instead of repeatedly trying the same overloaded server?

Or could 171.64.65.71 be set just to receive completed WUs, so that our GPUs get assigned to other servers?
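Both suggestions amount to a filter applied before the assignment server picks a work server. This is purely a hypothetical sketch: the `WorkServer` type, the `collect_only` flag, the `max_load` threshold and the load figure for 171.67.108.11 are illustrative and say nothing about how the real assignment server is implemented.

```python
from dataclasses import dataclass

@dataclass
class WorkServer:
    address: str
    weight: int
    cpu_load: float
    collect_only: bool = False  # hypothetical flag: accepts results, hands out no new work

def eligible_servers(servers: list[WorkServer], max_load: float) -> list[WorkServer]:
    """Servers the assignment step may choose from (hypothetical policy).

    Collect-only servers are always excluded. Servers whose CPU load exceeds
    max_load are skipped too, unless every remaining server is overloaded, in
    which case those are returned rather than assigning nothing at all.
    """
    assignable = [s for s in servers if not s.collect_only]
    healthy = [s for s in assignable if s.cpu_load <= max_load]
    return healthy or assignable

servers = [
    WorkServer("171.64.65.71", weight=20, cpu_load=14.18),
    WorkServer("171.67.108.11", weight=5, cpu_load=3.0),  # load value is made up
]
print([s.address for s in eligible_servers(servers, max_load=10.0)])  # ['171.67.108.11']
```

The usual weighted pick would then run only over what this returns, so clients would be steered away from an overloaded server without anyone having to restart them.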


Re: 171.64.65.71 accepting... but

Posted: Tue Feb 23, 2010 12:09 am
by rjbelans
It is odd, though. Usually my clients get switched to a different server rather quickly, but not this time. They all seem to stay on the same server no matter how many times the clients are restarted or how many times they fail to connect to it.

Re: 171.64.65.71 accepting... but

Posted: Tue Feb 23, 2010 12:36 am
by _r2w_ben
rjbelans wrote: It is odd, though. Usually my clients get switched to a different server rather quickly, but not this time. They all seem to stay on the same server no matter how many times the clients are restarted or how many times they fail to connect to it.
171.64.65.71 has a weight of 20.
The alternate, 171.67.108.11, has a weight of 5.
Despite 171.64.65.71 being busy, approximately 4 of 5 clients (20 out of a total weight of 25) are still assigned to it.

IMHO, someone should either decrease the weight of 171.64.65.71 (as was done on Fri-Sat last week) or increase the weight of 171.67.108.11 above 5 to distribute the load better.
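The "4 of 5" figure follows from each server's share being its weight divided by the total weight. A minimal sketch, assuming the assignment server picks work servers at random in proportion to weight (an assumption; the real selection logic isn't described here):

```python
import random
from collections import Counter

def expected_share(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of new assignments each server gets if picks are proportional to weight."""
    total = sum(weights.values())
    return {addr: w / total for addr, w in weights.items()}

def simulate_assignments(weights: dict[str, int], n: int, seed: int = 0) -> Counter:
    """Draw n weighted random assignments and count how many each server receives."""
    rng = random.Random(seed)
    servers = list(weights)
    picks = rng.choices(servers, weights=[weights[s] for s in servers], k=n)
    return Counter(picks)

weights = {"171.64.65.71": 20, "171.67.108.11": 5}
print(expected_share(weights))  # {'171.64.65.71': 0.8, '171.67.108.11': 0.2}
print(simulate_assignments(weights, 10_000))
```

Dropping .71 to a weight of 5 (a hypothetical value) would split new assignments evenly between the two servers.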

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Tue Feb 23, 2010 12:45 am
by ElectricVehicle
tobor wrote: Something still isn't right...
Once I completed a WU, the client was unable to connect to 171.64.65.71... :|

[21:58:09] + Attempting to get work packet
[21:58:09] - Connecting to assignment server
[21:58:10] - Successful: assigned to (171.64.65.71).
[21:58:10] + News From Folding@Home: Welcome to Folding@Home
[21:58:10] Loaded queue successfully.
[21:58:10] - Couldn't send HTTP request to server
[21:58:10] + Could not connect to Work Server
[21:58:10] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
171.64.65.71 is definitely still messed up as far as consistently handing out GPU WUs. Everyone who managed to get folding again did so by being assigned to another work server, mostly by luck. The wait before each retry grows after every failed attempt, so I think the only way to improve your luck is to restart clients that have been waiting an hour or more for new work. I did a number of quick client restarts and they didn't really help, but over the next hour I restarted again and got a different work server. The client would have done that on its own, just a little later (an hour or so), since the time between retries gets longer with each attempt.

Restarts can be helpful, but use them judiciously: restarting resets the retry period and puts more load on the servers at a time when they may already have load issues. That's why the retries have increasing periods. When something goes wrong, the servers aren't barraged by all the clients at once, which would be an excessive load they aren't designed for and that doesn't occur in normal operation.
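Increasing retry periods like these are a form of exponential backoff. A minimal sketch, assuming a simple doubling schedule; the 10 s base, factor of 2, one-hour cap and optional jitter are illustrative choices, not the v6 client's actual parameters:

```python
import random
from typing import Optional

def retry_delay(attempt: int, base: float = 10.0, factor: float = 2.0,
                cap: float = 3600.0, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based).

    Capped exponential backoff: base, base*factor, base*factor**2, ... never
    exceeding cap. If rng is given, a uniformly random wait in [0, delay] is
    returned instead ("full jitter") so clients don't retry in lockstep.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    delay = min(cap, base)
    for _ in range(attempt - 1):
        delay = min(cap, delay * factor)
    return rng.uniform(0.0, delay) if rng is not None else delay

print([retry_delay(n) for n in range(1, 8)])
# [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0]
```

A restart sets the attempt counter back to 1, so the next wait drops back to `retry_delay(1)`. Many clients restarting together therefore start hitting the server at short intervals again, which is exactly the extra load described above.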

The good news is that Pande Group has this new server code and several people actively working on it, so as it fails they are finding the bugs and fixing them. Our pain now should pay off in a month or two, when the server code has more fixes and becomes much more reliable than the previous generation. One of the ways we contribute to FAH is by identifying client, server, and other issues so Pande Group can continue to build and scale up the FAH system for even greater levels of science!

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Tue Feb 23, 2010 1:34 am
by VijayPande
Joe has been continuing to pound on this, and he thinks he's found and fixed several bugs. I'm optimistic, but history shows that this may not be all. The WS code on .71 was updated at ~5pm Pacific time.

I'm sorry for all this mess. Once we get back to normal, we will have a very careful rollout plan for new WS changes, since this should never be happening.