
Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 1:32 pm
by road-runner
I have several that can't send in the WUs...

Code:

[06:20:10] Folding@home Core Shutdown: FINISHED_UNIT
[06:23:23] CoreStatus = 64 (100)
[06:23:23] Sending work to server
[06:23:23] Project: 2669 (Run 11, Clone 128, Gen 166)


[06:23:23] + Attempting to send results [November 2 06:23:23 UTC]
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:8080)
[06:23:24] + Retrying using alternative port
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:80)
[06:23:24] - Error: Could not transmit unit 01 (completed November 2) to work server.
[06:23:24]   Keeping unit 01 in queue.
[06:23:24] Project: 2669 (Run 11, Clone 128, Gen 166)


[06:23:24] + Attempting to send results [November 2 06:23:24 UTC]
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:8080)
[06:23:24] + Retrying using alternative port
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:80)
[06:23:24] - Error: Could not transmit unit 01 (completed November 2) to work server.


[06:23:24] + Attempting to send results [November 2 06:23:24 UTC]
[06:23:25] - Couldn't send HTTP request to server
[06:23:25]   (Got status 503)
[06:23:25] + Could not connect to Work Server (results)
[06:23:25]     (171.67.108.25:8080)
[06:23:25] + Retrying using alternative port
[06:23:25] - Couldn't send HTTP request to server
[06:23:25]   (Got status 503)
[06:23:25] + Could not connect to Work Server (results)
[06:23:25]     (171.67.108.25:80)
[06:23:25]   Could not transmit unit 01 to Collection server; keeping in queue.
[06:23:25] - Preparing to get new work unit...
[06:23:25] Cleaning up work directory
[06:23:26] + Attempting to get work packet
[06:23:26] - Connecting to assignment server
[06:23:26] - Successful: assigned to (171.67.108.22).
[06:23:26] + News From Folding@Home: Welcome to Folding@Home
[06:23:26] Loaded queue successfully.
[06:24:41] Project: 2669 (Run 11, Clone 128, Gen 166)


[06:24:41] + Attempting to send results [November 2 06:24:41 UTC]
[06:24:41] - Couldn't send HTTP request to server
[06:24:41]   (Got status 503)
[06:24:41] + Could not connect to Work Server (results)
[06:24:41]     (171.64.65.56:8080)
[06:24:41] + Retrying using alternative port
[06:24:41] - Couldn't send HTTP request to server
[06:24:41]   (Got status 503)
[06:24:41] + Could not connect to Work Server (results)
[06:24:41]     (171.64.65.56:80)
[06:24:41] - Error: Could not transmit unit 01 (completed November 2) to work server.


[06:24:41] + Attempting to send results [November 2 06:24:41 UTC]
[06:24:41] - Couldn't send HTTP request to server
[06:24:41]   (Got status 503)
[06:24:41] + Could not connect to Work Server (results)
[06:24:41]     (171.67.108.25:8080)
[06:24:41] + Retrying using alternative port
[06:24:42] - Couldn't send HTTP request to server
[06:24:42]   (Got status 503)
[06:24:42] + Could not connect to Work Server (results)
[06:24:42]     (171.67.108.25:80)
[06:24:42]   Could not transmit unit 01 to Collection server; keeping in queue.
[06:24:42] + Closed connections
[06:24:42] 
[06:24:42] + Processing work unit
[06:24:42] Core required: FahCore_a2.exe
[06:24:42] Core found.
[06:24:42] Working on queue slot 02 [November 2 06:24:42 UTC]
[06:24:42] + Working ...
[06:24:42] 
[06:24:42] *------------------------------*
[06:24:42] Folding@Home Gromacs SMP Core
[06:24:42] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[06:24:42] 
[06:24:42] Preparing to commence simulation
[06:24:42] - Ensuring status. Please wait.
[06:24:45] Called DecompressByteArray: compressed_data_size=30331959 data_size=159726549, decompressed_data_size=159726549 diff=0
[06:24:46] - Digital signature verified
[06:24:46] 
[06:24:46] Project: 2682 (Run 2, Clone 12, Gen 2)
[06:24:46] 
[06:24:46] Assembly optimizations on if available.
[06:24:46] Entering M.D.
[06:24:57]  (Run 2, Clone 12, Gen 2)
[06:24:57] 
[06:24:57] Entering M.D.
[07:00:08] pleted 2500 out of 250000 steps  (1%)
[07:34:08] Completed 5000 out of 250000 steps  (2%)
[08:08:09] Completed 7500 out of 250000 steps  (3%)
[08:42:09] Completed 10000 out of 250000 steps  (4%)
[09:16:08] Completed 12500 out of 250000 steps  (5%)
[09:50:09] Completed 15000 out of 250000 steps  (6%)
[10:24:11] Completed 17500 out of 250000 steps  (7%)
[10:58:13] Completed 20000 out of 250000 steps  (8%)
[11:32:15] Completed 22500 out of 250000 steps  (9%)
[12:06:16] Completed 25000 out of 250000 steps  (10%)
[12:19:20] Project: 2669 (Run 11, Clone 128, Gen 166)


[12:19:24] + Attempting to send results [November 2 12:19:24 UTC]
[12:19:27] - Couldn't send HTTP request to server
[12:19:27]   (Got status 503)
[12:19:27] + Could not connect to Work Server (results)
[12:19:27]     (171.64.65.56:8080)
[12:19:27] + Retrying using alternative port
[12:19:28] - Couldn't send HTTP request to server
[12:19:28]   (Got status 503)
[12:19:28] + Could not connect to Work Server (results)
[12:19:28]     (171.64.65.56:80)
[12:19:28] - Error: Could not transmit unit 01 (completed November 2) to work server.


[12:19:28] + Attempting to send results [November 2 12:19:28 UTC]
[12:19:28] - Couldn't send HTTP request to server
[12:19:28]   (Got status 503)
[12:19:28] + Could not connect to Work Server (results)
[12:19:28]     (171.67.108.25:8080)
[12:19:28] + Retrying using alternative port
[12:19:28] - Couldn't send HTTP request to server
[12:19:28]   (Got status 503)
[12:19:28] + Could not connect to Work Server (results)
[12:19:28]     (171.67.108.25:80)
[12:19:28]   Could not transmit unit 01 to Collection server; keeping in queue.
[12:40:17] Completed 27500 out of 250000 steps  (11%)
[13:14:18] Completed 30000 out of 250000 steps  (12%)
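A log like this is easier to compare across machines once the failures are tallied per server and port. The following is a minimal sketch, not part of the FAH client: the function name `summarize_failures` and both regular expressions are assumptions based only on the line shapes visible in the log above.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the log excerpt above:
#   "[06:23:24]     (171.64.65.56:8080)"  -> the endpoint that could not be reached
#   "[06:23:24]   (Got status 503)"       -> the HTTP status that came back
ADDRESS_LINE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\((\d+\.\d+\.\d+\.\d+):(\d+)\)\s*$"
)
STATUS_LINE = re.compile(r"\(Got status (\d{3})\)")


def summarize_failures(log_text):
    """Count failed attempts per (address, port) and per HTTP status code."""
    by_server = Counter()
    by_status = Counter()
    for line in log_text.splitlines():
        address = ADDRESS_LINE.match(line)
        if address:
            by_server[(address.group(1), int(address.group(2)))] += 1
            continue
        status = STATUS_LINE.search(line)
        if status:
            by_status[int(status.group(1))] += 1
    return by_server, by_status


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:8080)
[06:23:24] + Retrying using alternative port
[06:23:24] - Couldn't send HTTP request to server
[06:23:24]   (Got status 503)
[06:23:24] + Could not connect to Work Server (results)
[06:23:24]     (171.64.65.56:80)
"""

if __name__ == "__main__":
    servers, statuses = summarize_failures(SAMPLE)
    for (addr, port), count in sorted(servers.items()):
        print(f"{addr}:{port} failed {count} time(s)")
    print(dict(statuses))
```

Pointing it at a full FAHlog.txt should show at a glance whether only 171.64.65.56 is refusing uploads or the Collection Server as well.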

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 1:39 pm
by preet.to
It seems almost random whether you get a WU or not. Some of mine did; the others remain idle.

Now my problem is that finished WUs are staying in the queue so long that they are expiring. So PPD=0. What will be done about these?

Server status has no hint of a problem.

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 1:44 pm
by rickoic
I u/l'd a 2681 from 1930 to 2011 Sunday, but have been catching the 2662-2677 WUs ever since.

Fold on
Rick

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 2:00 pm
by JadeMiner
171.64.65.56 ... reject, reject, reject.

I'm amazed that in late 2009 anybody could let a server go down like this.

Honestly makes me wonder.

Does all this folding stuff really help anybody? These guys can't even receive files.

Seriously reconsidering all of this electricity I use every month.

Has folding ever helped a single person?

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 2:09 pm
by rickoic
I'm glad all the Stanford people have thick skins. I for one appreciate everything they are doing and realize that on their budget they can't be there 24 hours a day to babysit their servers. If my PCs don't catch the WU that I want, I feel that there is an additional need for the WU I did catch, and so continue to fold it. The only exception is when one dies on me 2-3 times on what I know is a stable machine.

Not sure that anyone has been medically helped by what has been done as yet, but just pushing the envelope further along a path is an improvement, and who knows, that next work unit that you fold might be the one with the golden bullet of information that will help someone.

Big hand to Dr. Vijay and all the staff at Stanford.

Fold on
Rick

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 2:14 pm
by uncle fuzzy
Grandpa_01 wrote: What's your secret, uncle fuzzy?
Pure luck, and offering sacrifices to the SMP idol I keep in the corner shrine. 8-)

Although I got a new WU, this last completed one won't go home.

edit- I need to make another sacrifice. Trying to force the upload, I seem to have lost the completed WU and trashed the one I was working on. After 5 failed downloads, I was sent to another server and am folding again. Player 3, notfred's, 4-core on Q6600@3.4

171.64.65.56 not OK = 503 error

Posted: Mon Nov 02, 2009 4:32 pm
by stevew
Since Oct 31 my 2 Mac/Intels running SMP WUs have had problems with server 171.64.65.56. Getting status 503. Toggling one machine's PrefPane on and off managed to send a WU and get one new one. Now the 2nd machine is stuck, no upload and no work. Entering http://171.64.65.56 gets nothing, not OK.

[16:27:26] + Could not connect to Work Server (results)
[16:27:26] (171.64.65.56:8080)
[16:27:26] + Retrying using alternative port
[16:27:33] - Couldn't send HTTP request to server
[16:27:33] + Could not connect to Work Server (results)
[16:27:33] (171.64.65.56:80)
[16:27:33] - Error: Could not transmit unit 04 (completed November 2) to work server.

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 4:47 pm
by BrokenWolf
I have quite a few that are unable to return completed WUs. When browsing to the addresses I do not get an OK.

Code:

[16:39:52] Loaded queue successfully.
[16:39:52] Attempting to return result(s) to server...
[16:39:52] Trying to send all finished work units
[16:39:52] Project: 2662 (Run 1, Clone 132, Gen 46)


[16:39:52] + Attempting to send results [November 2 16:39:52 UTC]
[16:39:52] - Reading file work/wuresults_01.dat from core
[16:39:52]   (Read 26799556 bytes from disk)
[16:39:52] Connecting to http://171.64.65.56:8080/
[16:39:59] - Couldn't send HTTP request to server
[16:39:59]   (Got status 502)
[16:39:59] + Could not connect to Work Server (results)
[16:39:59]     (171.64.65.56:8080)
[16:39:59] + Retrying using alternative port
[16:39:59] Connecting to http://171.64.65.56:80/
[16:40:07] - Couldn't send HTTP request to server
[16:40:07]   (Got status 502)
[16:40:07] + Could not connect to Work Server (results)
[16:40:07]     (171.64.65.56:80)
[16:40:07] - Error: Could not transmit unit 01 (completed November 2) to work server.
[16:40:07] - 5 failed uploads of this unit.


[16:40:07] + Attempting to send results [November 2 16:40:07 UTC]
[16:40:07] - Reading file work/wuresults_01.dat from core
[16:40:07]   (Read 26799556 bytes from disk)
[16:40:07] Connecting to http://171.67.108.25:8080/
[16:40:13] - Couldn't send HTTP request to server
[16:40:13]   (Got status 502)
[16:40:13] + Could not connect to Work Server (results)
[16:40:13]     (171.67.108.25:8080)
[16:40:13] + Retrying using alternative port
[16:40:13] Connecting to http://171.67.108.25:80/
[16:40:31] - Couldn't send HTTP request to server
[16:40:31]   (Got status 502)
[16:40:31] + Could not connect to Work Server (results)
[16:40:31]     (171.67.108.25:80)
[16:40:31]   Could not transmit unit 01 to Collection server; keeping in queue.
[16:40:31] + Sent 0 of 1 completed units to the server
[16:40:31] - Failed to send all units to server
[16:40:31] ***** Got a SIGTERM signal (15)
[16:40:31] Killing all core threads

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 4:48 pm
by coolamasta
I've got 7 SMP clients all waiting for work now, and at least 5 of them have work to send back :(

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:02 pm
by preet.to
Running FAH is a large endeavour. They rely on us to donate bits and we rely on them for server support. So for that I am grateful.

What I am upset about is the lack of communications on this matter. I don't care if this takes two weeks to fix, I can wait. But tell us what the problem is and what to expect.

Meanwhile all my held units are expired. I have lost a number of days of production.

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:04 pm
by Pick2
I've got 5 out of 12 waiting for work, with a few more getting close to done. I have 8 WUs waiting to get sent up. I hope this gets straightened out soon.

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:25 pm
by bruce
theo343 wrote: Have you ever thought about setting up proxy WU servers instead of having all the network and data transfer load go to one location? You should seriously think about setting up a proxy-oriented model of the WU servers. I guess this can be accomplished by renting resources in server centers around the world. There should be a proxy on each continent for each type of client. These proxies are the ones that should send client results back to you in a timely manner, get new batches of WUs, and assign them to the clients.

Right now your model is too vulnerable and you should start thinking of branching out.

I hope Stanford can give us an updated ETA on the operational status of the new servers. (still no proxy oriented model, but at least an improvement)
FAH does have a proxy server model. It's certainly not ideal, since the proxy (known as a collection server) only accepts uploads, but the fundamental problem is that all of the hardware that's presently available is overloaded -- both the Work Server and the Collection Servers. The new hardware is critical.
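For anyone trying to follow the upload order in the logs posted in this thread, here is a rough sketch of it: Work Server on port 8080, then port 80, then the Collection Server on the same two ports, and the unit stays in the queue if all four fail. This is only an illustration of the order the logs show, not the client's actual code. The names `endpoints_in_log_order` and `send_results`, and the choice of 200 as the "accepted" status, are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port)


def endpoints_in_log_order(work_server: str, collection_server: str) -> List[Endpoint]:
    """Order seen in the posted logs: primary port, alternative port, then the Collection Server."""
    return [
        (work_server, 8080),
        (work_server, 80),
        (collection_server, 8080),
        (collection_server, 80),
    ]


def send_results(
    endpoints: List[Endpoint],
    try_send: Callable[[Endpoint], int],
) -> Optional[Endpoint]:
    """Try each endpoint in turn; return the one that accepted, or None to keep the unit queued.

    try_send is a hypothetical callback that returns an HTTP status code.
    """
    for endpoint in endpoints:
        if try_send(endpoint) == 200:  # assumption: 200 means the upload was accepted
            return endpoint
    return None


if __name__ == "__main__":
    order = endpoints_in_log_order("171.64.65.56", "171.67.108.25")

    # Every server overloaded, as in the logs: all four attempts fail.
    print(send_results(order, lambda endpoint: 503))

    # Hypothetical case where only the Collection Server has capacity.
    def only_collection(endpoint: Endpoint) -> int:
        return 200 if endpoint[0] == "171.67.108.25" else 503

    print(send_results(order, only_collection))
```

With both machines overloaded, every attempt comes back 503 and the function returns None, which matches the "keeping in queue" lines people are seeing.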

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:29 pm
by bruce
preet.to wrote: Server status has no hint of a problem.
Hint: NetLoad=200 is a problem. That's all this server can handle at one time. Everyone else is turned away.
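To picture what a hard cap like that does to clients, here is a toy model. It assumes NetLoad counts simultaneous connections and that a full server answers with status 503 (the status most of the posted logs show). Both are assumptions for illustration, and `CappedServer` is a made-up class, not the real server software.

```python
class CappedServer:
    """Toy model of a server that accepts a fixed number of simultaneous connections."""

    def __init__(self, max_connections=200):
        self.max_connections = max_connections
        self.active = 0

    def connect(self):
        """Return 200 if a slot is free, otherwise 503 (client is turned away)."""
        if self.active >= self.max_connections:
            return 503
        self.active += 1
        return 200

    def disconnect(self):
        """Free one slot when a client finishes its transfer."""
        if self.active > 0:
            self.active -= 1


if __name__ == "__main__":
    server = CappedServer(max_connections=200)
    results = [server.connect() for _ in range(250)]
    print("accepted:", results.count(200))
    print("turned away:", results.count(503))

    # Once one client finishes, the next one to ask gets in.
    server.disconnect()
    print("next client:", server.connect())
```

In this model, which clients get through depends only on who asks while a slot happens to be free, so from the outside it looks random.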

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:36 pm
by kasson
As bruce notes, the server is talking to a lot of clients at once right now. The server is functioning, but more clients want to talk to it than it has capacity. This should improve as it clears the backlog.

Re: 171.64.65.56 not responding

Posted: Mon Nov 02, 2009 5:37 pm
by rickoic
I wonder if any thought has gone into putting the current server time in a column. That would tell us for sure whether it was updating.

Fold on
Rick