
Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Thu Nov 13, 2008 6:42 pm
by [BrainBug]
I've PM'd Vince (vvoelz) and he's currently working on the issue. The hard drives are full. He hopes to have the issue resolved today.

Thanks Vince!

171.64.65.111 in REJECT since last weekend

Posted: Thu Nov 13, 2008 8:56 pm
by Mactin
Hello,

171.64.65.111 has been in REJECT since last weekend!
Do you have an ETA for its return to operation?

I have a WU with many failed uploads!
Normally this is no big deal, but the CS does not seem to accept WUs either.

Code:

[20:30:48] Trying to send all finished work units

[20:30:48] + Attempting to send results
[20:30:48] - Reading file work/wuresults_05.dat from core
[20:30:48]   (Read 3274576 bytes from disk)
[20:30:48] Connecting to http://171.64.65.111:8080/
[20:30:49] - Couldn't send HTTP request to server
[20:30:49] + Could not connect to Work Server (results)
[20:30:49]     (171.64.65.111:8080)
[20:30:49] - Error: Could not transmit unit 05 (completed November 10) to work server.
[20:30:49] - 28 failed uploads of this unit.

[20:30:49] + Attempting to send results
[20:30:49] - Reading file work/wuresults_05.dat from core
[20:30:49]   (Read 3274576 bytes from disk)
[20:30:49] Connecting to http://171.67.108.17:8080/
[20:30:57] - Couldn't send HTTP request to server
[20:30:57] + Could not connect to Work Server (results)
[20:30:57]     (171.67.108.17:8080)
[20:30:57]   Could not transmit unit 05 to Collection server; keeping in queue.
[20:30:57] + Sent 0 of 1 completed units to the server
[20:30:57] - Failed to send all units to server
Thank you

Re: 171.64.65.111 in REJECT since last weekend

Posted: Thu Nov 13, 2008 9:02 pm
by Sahkuhnder
There is a similar thread on the server issue here.
[BrainBug] wrote:I've PM'd Vince (vvoelz) and he's currently working on the issue. The hard drives are full. He hopes to have the issue resolved today.

Thanks Vince!

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Fri Nov 14, 2008 3:51 am
by jrabb1920
Thanks guys, I was starting to wonder if we had slipped through the cracks.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Sat Nov 15, 2008 7:25 pm
by whynot
I think that was fixed yesterday evening:

Code:

[18:17:04] Completed 2150000 out of 2500000 steps  (86%)
[18:17:04] Writing checkpoint files


[18:20:38] + Attempting to send results
[18:25:32] + Results successfully sent
[18:25:32] Thank you for your contribution to Folding@Home.
[18:25:32] + Number of Units Completed: 30

[18:26:02] Writing local files
[18:26:02] Completed 2175000 out of 2500000 steps  (87%)
whereas in the morning it was still failing:

Code:

[06:06:54] Completed 75000 out of 2500000 steps  (3%)
[06:12:50] Writing checkpoint files


[06:15:23] + Attempting to send results
[06:15:28] - Couldn't send HTTP request to server
[06:15:28] + Could not connect to Work Server (results)
[06:15:28]     (171.64.65.111:8080)
[06:15:28] - Error: Could not transmit unit 03 (completed November 10) to work server.


[06:15:28] + Attempting to send results
[06:15:36] - Couldn't send HTTP request to server
[06:15:36] + Could not connect to Work Server (results)
[06:15:36]     (171.67.108.17:8080)
[06:15:36]   Could not transmit unit 03 to Collection server; keeping in queue.
[06:16:04] Writing local files
[06:16:04] Completed 100000 out of 2500000 steps  (4%)

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Sun Nov 16, 2008 8:20 am
by yourpcguy
[08:08:45] + Attempting to send results [November 16 08:08:45 UTC]
[08:09:06] - Couldn't send HTTP request to server
[08:09:06] + Could not connect to Work Server (results)
[08:09:06] (171.64.122.136:8080)
[08:09:06] + Retrying using alternative port
[08:09:27] - Couldn't send HTTP request to server
[08:09:27] + Could not connect to Work Server (results)
[08:09:27] (171.64.122.136:80)
[08:09:27] - Error: Could not transmit unit 01 (completed November 13) to work server.


[08:09:27] + Attempting to send results [November 16 08:09:27 UTC]
[08:09:28] - Couldn't send HTTP request to server
[08:09:28] (Got status 503)
[08:09:28] + Could not connect to Work Server (results)
[08:09:28] (171.67.108.17:8080)
[08:09:28] + Retrying using alternative port
[08:09:29] - Couldn't send HTTP request to server
[08:09:29] (Got status 503)
[08:09:29] + Could not connect to Work Server (results)
[08:09:29] (171.67.108.17:80)
[08:09:29] Could not transmit unit 01 to Collection server; keeping in queue.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Mon Nov 17, 2008 12:29 am
by spudston
I'm having this same problem.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Mon Nov 17, 2008 5:32 am
by ppetrone
It looks like it should be accepting. Let me ask Edgar to have it checked. However, it's Sunday night, so I guess this will have to wait until tomorrow.
Sorry about the hassle.
Thanks

Paula

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Mon Nov 17, 2008 5:55 am
by anandhanju
@yourpcguy, your problem is with .136. That server is down at the moment and the staff have said they'll be looking into it. See viewtopic.php?f=18&t=6945.

.17 is a Collection server and is overloaded, showing the 503 error for you as well as the OP. In the future, please post server issues in the appropriate topic ;). Thanks.
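
For anyone who wants to sort out which server is failing and how before posting, here is a minimal sketch that pulls the failed upload attempts out of a client log. It assumes the log layout shown in the posts above (a "Couldn't send HTTP request" line, an optional "(Got status NNN)" line, then the server address in parentheses). The function name `summarize_failures` and the regular expressions are my own illustration and are not part of the client.

```python
import re
from typing import List, Optional, Tuple

# Patterns are assumptions based on the log excerpts in this thread.
TIMESTAMP = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*")
STATUS = re.compile(r"^\(Got status (\d+)\)$")
ADDRESS = re.compile(r"^\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)$")


def summarize_failures(log_text: str) -> List[Tuple[str, int, Optional[int]]]:
    """Return (host, port, http_status) for each failed send attempt.

    http_status is None when the log shows no "(Got status NNN)" line,
    i.e. the client got no HTTP answer from that server at all.
    """
    failures = []
    pending = False  # True once a "Couldn't send" line has been seen
    status = None
    for raw in log_text.splitlines():
        # Drop the leading [hh:mm:ss] stamp and surrounding whitespace.
        line = TIMESTAMP.sub("", raw.strip()).strip()
        if line.startswith("- Couldn't send HTTP request"):
            pending, status = True, None
            continue
        if not pending:
            continue
        match = STATUS.match(line)
        if match:
            status = int(match.group(1))
            continue
        match = ADDRESS.match(line)
        if match:
            failures.append((match.group(1), int(match.group(2)), status))
            pending = False
    return failures
```

Run over the log posted above, this would separate the two cases: attempts with no status at all (server not answering, like .136) and attempts that got a 503 (server answering but refusing, like .17).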

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Thu Nov 27, 2008 3:41 pm
by whynot
And now again (since Nov 26th, 19:37 UTC):

[19:36:27] Completed 1500000 out of 1500000 steps (100%)
[19:36:27] Writing final coordinates.
[19:36:28] Past main M.D. loop
[19:37:28]
[19:37:28] Finished Work Unit:
[19:37:28] - Reading up to 466440 from "work/wudata_00.arc": Read 466440
[19:37:28] - Reading up to 3346192 from "work/wudata_00.xtc": Read 3346192
[19:37:28] goefile size: 0
[19:37:28] logfile size: 14976
[19:37:28] Leaving Run
[19:37:28] - Writing 4470860 bytes of core data to disk...
[19:37:30] Done: 4470348 -> 3772804 (compressed to 84.3 percent)
[19:37:30] ... Done.
[19:37:31] - Shutting down core
[19:37:31]
[19:37:31] Folding@home Core Shutdown: FINISHED_UNIT
[19:37:31] CoreStatus = 64 (100)
[19:37:31] Updated performance fraction: 0.965867
[19:37:31] Sending work to server


[19:37:31] + Attempting to send results
[19:37:52] - Couldn't send HTTP request to server
[19:37:52] + Could not connect to Work Server (results)
[19:37:52] (171.64.65.111:8080)
[19:37:52] - Error: Could not transmit unit 00 (completed November 26) to work server.
[19:37:52] Keeping unit 00 in queue.


[19:37:52] + Attempting to send results
[19:37:52] - Couldn't send HTTP request to server
[19:37:52] + Could not connect to Work Server (results)
[19:37:52] (171.64.65.111:8080)
[19:37:52] - Error: Could not transmit unit 00 (completed November 26) to work server.


[19:37:52] + Attempting to send results
[19:45:56] - Server does not have record of this unit. Will try again later.
[19:45:56] Could not transmit unit 00 to Collection server; keeping in queue.


[19:45:56] + Attempting to send results
[19:45:57] - Couldn't send HTTP request to server
[19:45:57] + Could not connect to Work Server (results)
[19:45:57] (171.64.65.111:8080)
[19:45:57] - Error: Could not transmit unit 00 (completed November 26) to work server.


[19:45:57] + Attempting to send results
[19:55:18] - Server does not have record of this unit. Will try again later.
[19:55:18] Could not transmit unit 00 to Collection server; keeping in queue.
[19:55:18] - Preparing to get new work unit...
[19:55:18] + Attempting to get work packet
[19:55:18] - Connecting to assignment server
[19:55:22] - Successful: assigned to (171.64.122.139).
[19:55:22] + News From Folding@Home: Welcome to Folding@Home
[19:55:22] Loaded queue successfully.
[19:59:43] - Couldn't send HTTP request to server
[19:59:43] + Could not connect to Work Server
[19:59:43] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[19:59:58] + Attempting to get work packet
[19:59:58] - Connecting to assignment server
[20:00:01] - Successful: assigned to (169.230.26.30).
[20:00:01] + News From Folding@Home: Welcome to Folding@Home
[20:00:01] Loaded queue successfully.
[20:00:07] + Received work.


[20:00:07] + Attempting to send results
[20:00:08] - Couldn't send HTTP request to server
[20:00:08] + Could not connect to Work Server (results)
[20:00:08] (171.64.65.111:8080)
[20:00:08] - Error: Could not transmit unit 00 (completed November 26) to work server.


[20:00:08] + Attempting to send results
[20:00:23] - Couldn't send HTTP request to server
[20:00:23] + Could not connect to Work Server (results)
[20:00:23] (171.67.108.17:8080)
[20:00:23] Could not transmit unit 00 to Collection server; keeping in queue.


[20:00:23] + Attempting to send results
[20:00:26] - Couldn't send HTTP request to server
[20:00:26] + Could not connect to Work Server (results)
[20:00:26] (171.64.65.111:8080)
[20:00:26] - Error: Could not transmit unit 00 (completed November 26) to work server.


[20:00:26] + Attempting to send results
[20:00:42] - Couldn't send HTTP request to server
[20:00:42] + Could not connect to Work Server (results)
[20:00:42] (171.67.108.17:8080)
[20:00:42] Could not transmit unit 00 to Collection server; keeping in queue.
[20:00:42] + Closed connections
[20:00:42]
[20:00:42] + Processing work unit
[20:00:42] Core required: FahCore_82.exe
[20:00:42] Core found.
[20:00:42] Working on Unit 01 [November 26 20:00:42]
[20:00:42] + Working ...
[20:00:42]
[20:00:42] *------------------------------*
[20:00:42] Folding@Home PMD Core
[20:00:42] Version 1.03 (September 7, 2005)
[20:00:42]
[20:00:42] Preparing to commence simulation
[20:00:42] - Looking at optimizations...
[20:00:42] - Created dyn
[20:00:42] - Files status OK
[20:00:42] - Expanded 19061 -> 121149 (decompressed 635.5 percent)
[20:00:42]
[20:00:42] Project: 4622 (Run 15, Clone 12, Gen 32)
[20:00:42]
[20:00:42] Assembly optimizations on if available.
[20:00:42] Entering M.D.
[20:00:49] Protein: p4622_T0_proG-16_minout
[20:00:49]
[20:00:49] Completed 0 out of 2500000 steps (0%)

Still waiting in queue.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Fri Nov 28, 2008 1:51 am
by KingDingeling
I have the same issue with the 65.111 server, trying to upload a couple of WUs and it's giving me the "server does not know this WU" type of thing.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Sat Nov 29, 2008 7:03 am
by anko1
171.64.65.111 is in reject at the moment. I think most if not all servers are set to restart after an hour if they go into reject. Is it 171.67.108.17 that is giving you "Server does not have a record of this unit"? I would guess so, since that's a not uncommon response from a collection server if the work server hasn't told it to expect the unit.

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Sat Nov 29, 2008 9:49 am
by whynot
And now one of that pair (171.67.108.17) is back (since
2008-11-28 23:35 UTC).

[23:33:59] Completed 2500000 out of 2500000 steps (100%)
[23:33:59] Writing checkpoint files
[23:35:00]
[23:35:00] Finished Work Unit:
[23:35:00] Leaving Run
[23:35:04] - Writing 307912 bytes of core data to disk...
[23:35:04] ... Done.
[23:35:04] - Shutting down core
[23:35:04]
[23:35:04] Folding@home Core Shutdown: FINISHED_UNIT
[23:35:04] CoreStatus = 64 (100)
[23:35:04] Updated performance fraction: 0.982471
[23:35:04] Sending work to server


[23:35:04] + Attempting to send results
[23:35:42] + Results successfully sent
[23:35:42] Thank you for your contribution to Folding@Home.
[23:35:42] + Number of Units Completed: 42



[23:35:42] + Attempting to send results
[23:35:42] - Couldn't send HTTP request to server
[23:35:42] + Could not connect to Work Server (results)
[23:35:42] (171.64.65.111:8080)
[23:35:42] - Error: Could not transmit unit 00 (completed November 26) to work server.


[23:35:42] + Attempting to send results
[23:40:48] + Results successfully sent
[23:40:48] Thank you for your contribution to Folding@Home.
[23:40:48] + Number of Units Completed: 43

[23:40:48] Successfully sent unit 00 to Collection server.
[23:40:48] - Preparing to get new work unit...
[23:40:48] + Attempting to get work packet
[23:40:48] - Connecting to assignment server
[23:40:51] - Successful: assigned to (169.230.26.30).
[23:40:51] + News From Folding@Home: Welcome to Folding@Home
[23:40:51] Loaded queue successfully.
[23:40:53] + Received work.

Hence the question: I wonder, maybe all that noise about Work
servers is nothing to care about? Are people near them aware of
such?.. What? Do Work servers freeze? Or maybe just overload?
I don't understand the magick behind that. Should I reread FAQ?

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Sat Nov 29, 2008 9:28 pm
by anko1
As I understand it, assignment servers send you to different work servers depending on your client and parameters set and the weight that FAH gives each program. You can only get an assignment from a work server, so if all the work servers for your type of units are down, you're out of luck. If they're just really busy, then it may take a couple of tries. When a unit is done, it can go back to either the work server or a collection server, but the collection server needs notice from the work server to expect that unit, otherwise you get a "no record of this unit" message from the collection server. How busy the servers are, or if they're down, can also affect how quickly a unit goes back. You might want to look at the server status page (link above).

PS - It's also helpful when you're posting a log to use the code button to condense it (highlight the log and click "code", or type "[code]" at the start and "[/code]" at the end, without the quote marks).
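
The return path described above can be sketched as a toy model. This is only an illustration of my understanding, not the client's or the servers' actual code: the class names, the `expected` set and the result strings are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class WorkServer:
    """Hypothetical stand-in for a work server."""
    online: bool = True

    def accept(self, unit_id: str) -> bool:
        # A work server that is down or in reject takes nothing.
        return self.online


@dataclass
class CollectionServer:
    """Hypothetical stand-in for a collection server."""
    online: bool = True
    # Units the work server has told this collection server to expect.
    expected: Set[str] = field(default_factory=set)

    def accept(self, unit_id: str) -> str:
        if not self.online:
            return "unreachable"
        if unit_id not in self.expected:
            return "no record of this unit"
        return "accepted"


def return_unit(unit_id: str, work: WorkServer, collection: CollectionServer) -> str:
    """Try the work server first, then fall back to the collection server."""
    if work.accept(unit_id):
        return "sent to work server"
    outcome = collection.accept(unit_id)
    if outcome == "accepted":
        return "sent to collection server"
    # Neither server took the unit, so it stays queued for a later retry.
    return f"kept in queue ({outcome})"
```

In this model, a work server that went down before notifying the collection server gives exactly the "no record of this unit" outcome, and the unit simply waits in the queue until one of the two servers can take it.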

Re: 503 error on 171.67.108.17 and 171.64.65.111

Posted: Wed Dec 03, 2008 10:42 am
by bruce
whynot wrote:Hence the question: I wonder, maybe all that noise about Work
servers is nothing to care about? Are people near them aware of
such?.. What? Do Work servers freeze? Or maybe just overload?
I don't understand the magick behind that. Should I reread FAQ?
Yes, work servers get overloaded. Sometimes it is a temporary condition that resolves itself and sometimes somebody needs to adjust some priorities so that the workload is distributed to other servers.

Sometimes servers just freeze. Follow the "News" link at the top of this page to Vijay's blog. On Nov 28 he talked about this issue.

During daylight hours in California, there are generally people near the servers and they're aware of what's going on. If a problem occurs late at night or early in the morning, it may take longer to get the attention of the right person to fix it. It never hurts to report a problem here, unless others have already reported the same issue. Even with all the monitoring tools, some issues are unique enough to take longer to be noticed.