Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Mon Oct 27, 2008 11:48 pm
by toTOW
Is maintenance being done on these servers?

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Tue Nov 04, 2008 12:23 am
by anandhanju
mikesmusic is still unable to send queued results (viewtopic.php?p=66220#p66220). Can anyone please check whether the server is fine?

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Tue Nov 04, 2008 1:36 am
by anko1
171.64.122.72 is in reject now. Just lost two units: the 4419 that expired and the unit that got a special exit when the program tried to auto send the expired unit. Of the 6 WUs 4419-21 (or so, they're on different machines) that I've gotten, only one has been returned. I'm going to delete the others before they kill units in process too.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Tue Nov 04, 2008 7:26 am
by anko1
I see that 171.64.122.72 is back up now, but it has a really high CPU load (6.86) and a heavy net load (171). Also, the DL (days left on the tape?) is at zero, so I don't know whether that is affecting things too.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Thu Nov 06, 2008 6:52 pm
by mikesmusic
My two work units, project 4418, completed 30 Oct and 1 Nov, still won't upload.
Neither (171.67.108.17:8080) nor (171.64.122.72:8080) will respond.

[06:44:48] + Attempting to send results
[06:45:06] - Couldn't send HTTP request to server
[06:45:06] + Could not connect to Work Server (results)
[06:45:06]     (171.67.108.17:8080)
[06:45:06]   Could not transmit unit 07 to Collection server; keeping in queue.
[08:05:44] Writing local files
[08:05:44] Completed 820000 out of 2000000 steps  (41)
[10:26:44] Writing local files
[10:26:44] Completed 840000 out of 2000000 steps  (42)


[12:45:10] + Attempting to send results
[12:47:43] Writing local files
[12:47:43] Completed 860000 out of 2000000 steps  (43)
[12:52:31] - Couldn't send HTTP request to server
[12:52:31] + Could not connect to Work Server (results)
[12:52:31]     (171.64.122.72:8080)
[12:52:31] - Error: Could not transmit unit 06 (completed October 30) to work server.


[12:52:31] + Attempting to send results
[12:52:32] - Couldn't send HTTP request to server
[12:52:32] + Could not connect to Work Server (results)
[12:52:32]     (171.67.108.17:8080)
[12:52:32]   Could not transmit unit 06 to Collection server; keeping in queue.


[12:52:32] + Attempting to send results
[12:59:54] - Couldn't send HTTP request to server
[12:59:54] + Could not connect to Work Server (results)
[12:59:54]     (171.64.122.72:8080)
[12:59:54] - Error: Could not transmit unit 07 (completed November 1) to work server.


[12:59:54] + Attempting to send results
[12:59:54] - Couldn't send HTTP request to server
[12:59:54] + Could not connect to Work Server (results)
[12:59:54]     (171.67.108.17:8080)
[12:59:54]   Could not transmit unit 07 to Collection server; keeping in queue.
[15:08:41] Writing local files
[15:08:41] Completed 880000 out of 2000000 steps  (44)
[17:31:16] Writing local files
[17:31:16] Completed 900000 out of 2000000 steps  (45)
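
For anyone who wants to see at a glance which servers are refusing results, a log like the one above can be tallied with a few lines of Python. This is only a rough sketch: it assumes the client writes the server address in parentheses on the line just before each "Could not transmit unit" message, as in the excerpt, and the names `tally_failed_sends`, `ADDRESS`, `FAILED` and `SAMPLE` are made up for the illustration.

```python
import re
from collections import Counter

# An address line ends with "(a.b.c.d:port)", e.g. "(171.64.122.72:8080)".
ADDRESS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}:\d+)\)\s*$")
# Matches both the work-server and the collection-server failure messages.
FAILED = re.compile(r"Could not transmit unit (\d+)")


def tally_failed_sends(log_text):
    """Count failed uploads per server address and per unit number."""
    per_server = Counter()
    per_unit = Counter()
    last_address = None
    for line in log_text.splitlines():
        match = ADDRESS.search(line)
        if match:
            last_address = match.group(1)
            continue
        match = FAILED.search(line)
        if match and last_address is not None:
            per_server[last_address] += 1
            per_unit[match.group(1)] += 1
            last_address = None  # each address line belongs to one failure
    return per_server, per_unit


# Lines copied from the log excerpt above.
SAMPLE = """\
[12:52:31] + Could not connect to Work Server (results)
[12:52:31]     (171.64.122.72:8080)
[12:52:31] - Error: Could not transmit unit 06 (completed October 30) to work server.
[12:52:32] + Could not connect to Work Server (results)
[12:52:32]     (171.67.108.17:8080)
[12:52:32]   Could not transmit unit 06 to Collection server; keeping in queue.
"""

if __name__ == "__main__":
    servers, units = tally_failed_sends(SAMPLE)
    for address, count in servers.most_common():
        print(address, count)
```

Run against a full FAHlog, the per-server counts should make it obvious whether one address accounts for most of the failures.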

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Thu Nov 06, 2008 7:02 pm
by anko1
171.64.122.72 has a monstrously high net load of 445 and a CPU load of 3.42 with an assignment weight of 9%. Could we maybe turn off all assigning until units come home? I have some outstanding from October.

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Sat Nov 08, 2008 12:58 am
by mikesmusic
That server status page is beyond the ken of this mere mortal. Of the 30 or so servers supposedly accepting jobs for the 'classic' clients, only about two currently have "% Ass 80" in double figures: Good ol' vsp05 (171.64.122.72) and VSPMF93 (171.65.103.160). The net load of vsp05 is way beyond anything else. I cannot even ping vsp05 at present. I wonder how many more weeks this will go on for.
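
A failed ping is not quite the same thing as a failed upload, since ping uses ICMP while the client talks to port 8080 over TCP (going by the addresses in the logs earlier in this thread). A closer test is simply to try opening a TCP connection to that port. The sketch below does that with Python's standard `socket` module; the function name `can_connect` and the 3 second timeout are my own choices, not anything the client does.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    # Work server and collection server addresses quoted in this thread.
    for host in ("171.64.122.72", "171.67.108.17"):
        state = "open" if can_connect(host, 8080, timeout=3.0) else "no connection"
        print(f"{host}:8080 {state}")
```

A successful connection only shows that the port is open. An overloaded server could still accept the connection and then fail the transfer.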

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Mon Nov 10, 2008 2:08 am
by mikesmusic
Anko, where did you see an assignment weight of 9%? I'm looking at the Weight column in the server stats page. Last time I looked I thought vsp05's weight was '10000', but today it is '5000' (i.e. less). One of my jobs actually was accepted in the last day or so. :D I have just one (completed Nov 1st) left now. Not that I'm any judge :egeek: but it looks like these work packets are a bit on the small side and are overloading the servers by coming back too quickly?

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Mon Nov 10, 2008 4:01 am
by anko1
I "misspoke." I was referring to the % ASSigned column, which upon closer reading doesn't actually mean what I thought it did. <blush> I ended up losing another two units: the 4419 that expired and the unit it killed b/c autosend ran into an expired unit. I went ahead and deleted the last two I had, which were close to expiring, rather than miss the deadline and lose two more. I suspect that you're right - the units are so small that the servers get overloaded with the returns. They go back [or try to] almost as fast as they get sent. ;-)

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Mon Nov 10, 2008 3:59 pm
by toTOW
122.78 is currently in Reject mode :(

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Thu Nov 13, 2008 6:55 pm
by TommyHicks
I've got 14 computers folding and I'm considering shutting them down as far as folding is concerned. Every one of them has completed work units that won't upload. When the time limit expires, the current work unit is lost with a "corrupted core". The listing below is from a computer with 6 completed work units that it cannot upload plus it can't get any work.

Is there anyone left at Stanford that gives a damn?


[15:11:09] + Attempting to send results
[15:11:09] - Couldn't send HTTP request to server
[15:11:09] + Could not connect to Work Server (results)
[15:11:09] (171.64.122.72:8080)
[15:11:09] - Error: Could not transmit unit 00 (completed November 10) to work server.


[15:11:09] + Attempting to send results
[15:11:10] - Couldn't send HTTP request to server
[15:11:10] + Could not connect to Work Server (results)
[15:11:10] (171.67.108.17:8080)
[15:11:10] Could not transmit unit 00 to Collection server; keeping in queue.


[15:11:10] + Attempting to send results
[15:11:11] - Couldn't send HTTP request to server
[15:11:11] + Could not connect to Work Server (results)
[15:11:11] (171.64.122.72:8080)
[15:11:11] - Error: Could not transmit unit 02 (completed November 11) to work server.


[15:11:11] + Attempting to send results
[15:11:11] - Couldn't send HTTP request to server
[15:11:11] + Could not connect to Work Server (results)
[15:11:11] (171.67.108.17:8080)
[15:11:11] Could not transmit unit 02 to Collection server; keeping in queue.


[15:11:11] + Attempting to send results
[15:11:11] - Couldn't send HTTP request to server
[15:11:11] + Could not connect to Work Server (results)
[15:11:11] (171.64.65.65:8080)
[15:11:11] - Error: Could not transmit unit 03 (completed November 13) to work server.


[15:11:11] + Attempting to send results
[15:11:12] - Couldn't send HTTP request to server
[15:11:12] + Could not connect to Work Server (results)
[15:11:12] (171.67.108.25:8080)
[15:11:12] Could not transmit unit 03 to Collection server; keeping in queue.


[15:11:12] + Attempting to send results
[15:11:12] - Couldn't send HTTP request to server
[15:11:12] + Could not connect to Work Server (results)
[15:11:12] (:8080)
[15:11:12] - Error: Could not transmit unit 04 (completed November 13) to work server.


[15:11:12] + Attempting to send results
[15:11:13] - Couldn't send HTTP request to server
[15:11:13] + Could not connect to Work Server (results)
[15:11:13] (171.67.108.17:8080)
[15:11:13] Could not transmit unit 04 to Collection server; keeping in queue.


[15:11:13] + Attempting to send results
[15:11:14] - Couldn't send HTTP request to server
[15:11:14] + Could not connect to Work Server (results)
[15:11:14] (171.64.65.111:8080)
[15:11:14] - Error: Could not transmit unit 08 (completed November 10) to work server.


[15:11:14] + Attempting to send results
[15:11:14] - Couldn't send HTTP request to server
[15:11:14] + Could not connect to Work Server (results)
[15:11:14] (171.67.108.17:8080)
[15:11:14] Could not transmit unit 08 to Collection server; keeping in queue.


[15:11:14] + Attempting to send results
[15:11:15] - Couldn't send HTTP request to server
[15:11:15] + Could not connect to Work Server (results)
[15:11:15] (171.64.122.72:8080)
[15:11:15] - Error: Could not transmit unit 09 (completed November 10) to work server.


[15:11:15] + Attempting to send results
[15:11:15] - Couldn't send HTTP request to server
[15:11:15] + Could not connect to Work Server (results)
[15:11:15] (171.67.108.17:8080)
[15:11:15] Could not transmit unit 09 to Collection server; keeping in queue.
[15:28:10] + Attempting to get work packet
[15:28:10] - Connecting to assignment server
[15:28:11] - Successful: assigned to (171.64.65.65).
[15:28:11] + News From Folding@Home: Welcome to Folding@Home
[15:28:11] Loaded queue successfully.
[15:28:11] - Couldn't send HTTP request to server
[15:28:11] (Got status 503)
[15:28:11] + Could not connect to Work Server
[15:28:11] - Error: Attempt #12 to get work failed, and no other work to do.
Waiting before retry.
[16:16:18] + Attempting to get work packet
[16:16:18] - Connecting to assignment server
[16:16:18] - Successful: assigned to (171.64.65.65).
[16:16:18] + News From Folding@Home: Welcome to Folding@Home
[16:16:18] Loaded queue successfully.
[16:16:19] - Couldn't send HTTP request to server
[16:16:19] (Got status 503)
[16:16:19] + Could not connect to Work Server
[16:16:19] - Error: Attempt #13 to get work failed, and no other work to do.
Waiting before retry.
[17:04:22] + Attempting to get work packet
[17:04:22] - Connecting to assignment server
[17:04:22] - Successful: assigned to (171.64.122.72).
[17:04:22] + News From Folding@Home: Welcome to Folding@Home
[17:04:22] Loaded queue successfully.
[17:04:23] - Couldn't send HTTP request to server
[17:04:23] (Got status 503)
[17:04:23] + Could not connect to Work Server
[17:04:23] - Error: Attempt #14 to get work failed, and no other work to do.
Waiting before retry.
[17:52:36] + Attempting to get work packet
[17:52:36] - Connecting to assignment server
[17:52:36] - Successful: assigned to (171.64.65.65).
[17:52:36] + News From Folding@Home: Welcome to Folding@Home
[17:52:37] Loaded queue successfully.
[17:52:37] - Couldn't send HTTP request to server
[17:52:37] (Got status 503)
[17:52:37] + Could not connect to Work Server
[17:52:37] - Error: Attempt #15 to get work failed, and no other work to do.
Waiting before retry.
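
Going by the timestamps in the listing above, the client retries for new work roughly every 48 minutes. A small Python sketch for pulling those gaps out of a log is shown here. It assumes all attempts fall on the same day (the log only records the time of day), and `retry_gaps`, `ATTEMPT` and `SAMPLE` are names invented for the example.

```python
import re
from datetime import datetime

# Start of each "get work" attempt, e.g. "[15:28:10] + Attempting to get work packet".
ATTEMPT = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] \+ Attempting to get work packet")


def retry_gaps(log_text):
    """Return the time gaps between successive 'get work packet' attempts."""
    times = [
        datetime.strptime(match.group(1), "%H:%M:%S")
        for line in log_text.splitlines()
        if (match := ATTEMPT.match(line))
    ]
    # Assumes the attempts do not cross midnight.
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The four attempt lines from the listing above.
SAMPLE = """\
[15:28:10] + Attempting to get work packet
[16:16:18] + Attempting to get work packet
[17:04:22] + Attempting to get work packet
[17:52:36] + Attempting to get work packet
"""

if __name__ == "__main__":
    for gap in retry_gaps(SAMPLE):
        print(gap)  # gaps of 0:48:08, 0:48:04 and 0:48:14
```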

Re: vsp05, vsp11, and vsp15 (171.64.122.72/78/82) down

Posted: Mon Nov 24, 2008 6:02 pm
by mikesmusic
TommyHicks wrote: Is there anyone left at Stanford that gives a damn?
That has to be a fair question, Tommy. Here we are 11 days later and there is no response to your question.

The server stats show that vsp05, vsp11, and vsp15 are very, very busy indeed.

Here is a typical ping plotter response from vsp05. vsp11 and vsp15 look the same.

Target Name: vsp05
         IP: 171.65.122.78
  Date/Time: 24/11/2008 17:44:36

 1    1 ms  private
 2   28 ms  private
 3   26 ms  ge1-3-0-100.core1.ixn.dub.stisp.net [84.203.130.9]
 4   30 ms  ge1-3-0-98.core1.tcy.dub.stisp.net [84.203.130.2]
 5   37 ms  [195.66.224.185]
 6   49 ms  te2-7.ccr02.ams03.atlas.cogentco.com [130.117.1.169]
 7  129 ms  te7-3.mpd01.ymq02.atlas.cogentco.com [130.117.0.69]
 8  137 ms  te3-7.mpd01.yyz02.atlas.cogentco.com [154.54.7.213]
 9  141 ms  te7-8.ccr02.ord01.atlas.cogentco.com [154.54.7.73]
10  151 ms  te4-3.ccr02.mci01.atlas.cogentco.com [154.54.6.201]
11  190 ms  te8-4.ccr02.sfo01.atlas.cogentco.com [154.54.24.117]
12  189 ms  te4-4.mpd01.sjc04.atlas.cogentco.com [154.54.7.174]
13  187 ms  Stanford_University2.demarc.cogentco.com [66.250.7.138]
14  196 ms  bbrb-isp.Stanford.EDU [171.64.1.155]
15   *       [-]
The [-] on line 15 means there was no reply past hop 14, i.e. the destination is unreachable from here.
That means your work packet has no chance.
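
If you save a trace like the one above to a text file, a short Python sketch can pick out the last hop that answered and any hops that stayed silent. It assumes the "hop, milliseconds, name" layout shown above, and `summarize_trace`, `REPLY`, `SILENT` and `SAMPLE` are just names for this example. Bear in mind that a silent hop can also mean the router or host simply drops ping traffic, so this is a hint rather than proof that the server is down.

```python
import re

# A hop that replied: "14  196 ms  bbrb-isp.Stanford.EDU [171.64.1.155]".
REPLY = re.compile(r"^\s*(\d+)\s+(\d+) ms\s+(.+?)\s*$")
# A hop that did not reply: "15   *       [-]".
SILENT = re.compile(r"^\s*(\d+)\s+\*")


def summarize_trace(trace_text):
    """Return (last replying hop, list of silent hop numbers)."""
    last_reply = None
    silent_hops = []
    for line in trace_text.splitlines():
        if (match := REPLY.match(line)):
            last_reply = (int(match.group(1)), int(match.group(2)), match.group(3))
        elif (match := SILENT.match(line)):
            silent_hops.append(int(match.group(1)))
    return last_reply, silent_hops


# The last three hops from the trace above.
SAMPLE = """\
13  187 ms  Stanford_University2.demarc.cogentco.com [66.250.7.138]
14  196 ms  bbrb-isp.Stanford.EDU [171.64.1.155]
15   *       [-]
"""

if __name__ == "__main__":
    last_reply, silent_hops = summarize_trace(SAMPLE)
    print("last reply:", last_reply)
    print("silent hops:", silent_hops)
```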


Do they care? It's a mystery.