Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 12:35 am
by Wrish
Well, that lasted a short time. "Unstable machine" error at 66% of a 384-pt ATI unit. I had kind of given up hope anyway, seeing that the ATI core 11 is over 3 MB while the Nvidia one is only 1.9 MB. :) The viewer showed all the atoms jiggling around like normal... it looked like it was folding fine! Well, back to normal flags.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 12:39 am
by Sir-Les-MP
Nope, no errors at all apart from being unable to get a work unit.
Log of the last attempt to get a unit is below:
[23:28:49] Loaded queue successfully.
[23:28:49] - Preparing to get new work unit...
[23:28:49] - Autosending finished units... [February 14 23:28:49 UTC]
[23:28:49] + Attempting to get work packet
[23:28:49] Trying to send all finished work units
[23:28:49] - Will indicate memory of 4095 MB
[23:28:49] + No unsent completed units remaining.
[23:28:49] - Detect CPU.[23:28:49] - Autosend completed
Vendor: AuthenticAMD, Family: 15, Model: 4, Stepping: 2
[23:28:49] - Connecting to assignment server
[23:28:49] Connecting to http://assign-GPU.stanford.edu:8080/
[23:28:50] Posted data.
[23:28:50] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[23:28:50] + News From Folding@Home: Welcome to Folding@Home
[23:28:50] Loaded queue successfully.
[23:28:50] Connecting to http://171.67.108.21:8080/
[23:28:51] - Couldn't send HTTP request to server
[23:28:51] + Could not connect to Work Server
[23:28:51] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[23:28:59] + Attempting to get work packet
[23:28:59] - Will indicate memory of 4095 MB
[23:28:59] - Connecting to assignment server
[23:28:59] Connecting to http://assign-GPU.stanford.edu:8080/
[23:29:01] Posted data.
[23:29:01] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[23:29:01] + News From Folding@Home: Welcome to Folding@Home
[23:29:01] Loaded queue successfully.
[23:29:01] Connecting to http://171.67.108.21:8080/
[23:29:01] - Couldn't send HTTP request to server
[23:29:01] + Could not connect to Work Server
[23:29:01] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
[23:29:20] + Attempting to get work packet
[23:29:20] - Will indicate memory of 4095 MB
[23:29:20] - Connecting to assignment server
[23:29:20] Connecting to http://assign-GPU.stanford.edu:8080/
[23:29:21] Posted data.
[23:29:21] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[23:29:21] + News From Folding@Home: Welcome to Folding@Home
[23:29:21] Loaded queue successfully.
[23:29:21] Connecting to http://171.67.108.21:8080/
[23:29:22] - Couldn't send HTTP request to server
[23:29:22] + Could not connect to Work Server
[23:29:22] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
[23:29:28] ***** Got a SIGTERM signal (2)
[23:29:28] Killing all core threads

Folding@Home Client Shutdown.
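
A log like the one above can be tallied to see which work server the assignment server keeps handing out and how often the fetch then fails. The following is only a minimal sketch: it assumes the line formats shown in this log ("assigned to (IP)" and "Attempt #N to get work failed"), and the function name `summarize_fetch_failures` is made up for illustration, not part of the client.

```python
import re

# Patterns taken from the log lines quoted in this thread.
ASSIGNED = re.compile(r"assigned to \((\d+\.\d+\.\d+\.\d+)\)")
FAILED = re.compile(r"Attempt #(\d+) to get work failed")

def summarize_fetch_failures(log_text):
    """Return {work_server_ip: number_of_failed_fetch_attempts}."""
    failures = {}
    current_server = None
    for line in log_text.splitlines():
        match = ASSIGNED.search(line)
        if match:
            # Remember which server the assignment server pointed us at.
            current_server = match.group(1)
            continue
        if FAILED.search(line) and current_server is not None:
            failures[current_server] = failures.get(current_server, 0) + 1
    return failures

sample = """\
[23:28:50] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[23:28:51] + Could not connect to Work Server
[23:28:51] - Attempt #1 to get work failed, and no other work to do.
[23:29:01] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[23:29:01] - Attempt #2 to get work failed, and no other work to do.
"""
print(summarize_fetch_failures(sample))
```

Running it over a full log file would show at a glance whether every failure traces back to the same server.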

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 2:50 am
by leexgx
171.67.108.21

I am intermittently getting work from the above server now. (I think I am going to be able to finish the project before I can get more work for my other GPUs :) If I am right, is it saying that there are no work units available?)

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 3:03 am
by Marine Iguana
I have got a couple of WUs from 171.67.108.11

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 3:25 am
by DrSpalding
I would be OK with it if the servers were actually not accepting or handing out WUs, but that is not the boat I (and many others) am in. Our WUs are in a nebulous state where they were not uploaded but are marked as uploaded. I would be willing to bet that a great many WUs were overwritten: many of them get done in 90 minutes or so, which means the queue is fully cycled in about 15 hours of folding. That is why I stopped my GPU clients. I have nine WUs on two machines that are in such a state of limbo. I plan on letting them run until about 99% on their current WUs and stopping them before they complete. I think we are going to need a qfix of some sort that takes the queue items currently marked as "finished" for server 171.67.108.21 (and any others in that state) where the wuresults_0X.dat file still exists for queue item X, and marks them as "not uploaded" or whatever the state needs to be.
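
The arithmetic and the selection rule in this post can be sketched in a few lines. This is only an illustration under stated assumptions: a 10-slot queue (which is what 90 minutes per WU and about 15 hours imply), queue entries already decoded into plain dicts with made-up keys `index`, `status` and `server`, and a hypothetical function name `limbo_candidates`. It does not read or modify queue.dat and is not an actual qfix.

```python
from pathlib import Path

QUEUE_SLOTS = 10  # assumption: 10 queue entries, consistent with 90 min/WU -> ~15 h

def hours_to_cycle_queue(minutes_per_wu, slots=QUEUE_SLOTS):
    """Time until the client wraps around and starts reusing queue slots."""
    return slots * minutes_per_wu / 60.0

def limbo_candidates(entries, work_dir, server):
    """Indices marked finished for `server` whose results file is still on disk.

    `entries` is an iterable of dicts with hypothetical keys:
    index (int), status (str), server (str).
    """
    work_dir = Path(work_dir)
    hits = []
    for entry in entries:
        results = work_dir / f"wuresults_{entry['index']:02d}.dat"
        if (entry["status"] == "finished"
                and entry["server"] == server
                and results.exists()):
            hits.append(entry["index"])
    return hits

print(hours_to_cycle_queue(90))  # 15.0
```

Any entry returned by `limbo_candidates` is one whose results would be lost once the queue wraps around to that slot again.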

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 3:31 am
by weedacres
DrSpalding wrote:I would be OK with it if the servers were actually not accepting or handing out WUs, but that is not the boat I (and many others) am in. Our WUs are in a nebulous state where they were not uploaded but are marked as uploaded. I would be willing to bet that a great many WUs were overwritten: many of them get done in 90 minutes or so, which means the queue is fully cycled in about 15 hours of folding. That is why I stopped my GPU clients. I have nine WUs on two machines that are in such a state of limbo. I plan on letting them run until about 99% on their current WUs and stopping them before they complete. I think we are going to need a qfix of some sort that takes the queue items currently marked as "finished" for server 171.67.108.21 (and any others in that state) where the wuresults_0X.dat file still exists for queue item X, and marks them as "not uploaded" or whatever the state needs to be.
I have the same problem, about 90 work units sitting in limbo. I'm copying the gpu client folders to backup folders before they start overwriting themselves. Hopefully when this problem gets sorted out I'll be able to send them in and get them accepted.
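
Copying a client folder aside before the queue wraps can be scripted. The sketch below is one possible way to do it, not anything provided by the client: the function name `backup_client_folder` and the timestamped destination naming are assumptions, and it is probably safest to run it while the client is stopped so that queue.dat and the work files are not changing mid-copy.

```python
import shutil
import time
from pathlib import Path

def backup_client_folder(client_dir, backup_root):
    """Copy a whole client folder (queue.dat, work/, logs) to a timestamped backup."""
    client_dir = Path(client_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{client_dir.name}-{stamp}"
    # copytree creates dest (and missing parents) and refuses to overwrite
    # an existing backup, so earlier copies are never clobbered.
    shutil.copytree(client_dir, dest)
    return dest

# Example call (paths are placeholders):
# backup_client_folder(r"C:\FAH\gpu1", r"D:\FAH-backups")
```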

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 3:59 am
by thegrub
Here is another view
The one after the "?" is 171.64.65.71 and the result is the same for all three. But they don't seem to be at Stanford but rather in Texas.

Image

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 5:12 am
by DrSpalding
My clients are still in the same state, i.e. "can't upload" and "already received":

Code:

[05:04:37] Folding@home Core Shutdown: FINISHED_UNIT
[05:04:41] CoreStatus = 64 (100)
[05:04:41] Sending work to server
[05:04:41] Project: 5781 (Run 13, Clone 935, Gen 4)
[05:04:41] - Read packet limit of 540015616... Set to 524286976.


[05:04:41] + Attempting to send results [February 15 05:04:41 UTC]
[05:04:42] - Couldn't send HTTP request to server
[05:04:42] + Could not connect to Work Server (results)
[05:04:42]     (171.67.108.21:8080)
[05:04:42] + Retrying using alternative port
[05:05:03] - Couldn't send HTTP request to server
[05:05:03] + Could not connect to Work Server (results)
[05:05:03]     (171.67.108.21:80)
[05:05:03] - Error: Could not transmit unit 08 (completed February 15) to work server.
[05:05:03]   Keeping unit 08 in queue.
[05:05:03] Project: 5781 (Run 13, Clone 935, Gen 4)
[05:05:03] - Read packet limit of 540015616... Set to 524286976.


[05:05:03] + Attempting to send results [February 15 05:05:03 UTC]
[05:05:04] - Server has already received unit.
[05:05:34] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.
That makes 10 WUs in limbo for me. I'm leaving my GPU clients shut down at this point until we know definitively whether or not we have a chance to upload them properly at some later time.

Good night all and happy Valentine's Day.
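
For anyone counting their own limbo WUs, a log in the format above can be scanned for units that the server claimed to have "already received". This is a rough sketch that assumes the "Project: ... (Run, Clone, Gen)" line precedes each send attempt, as it does in the log quoted in this post; the function name `rejected_units` is hypothetical.

```python
import re

UNIT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
ALREADY = "Server has already received unit"

def rejected_units(log_text):
    """Return (project, run, clone, gen) tuples the server reported as already received."""
    current = None
    rejected = []
    for line in log_text.splitlines():
        match = UNIT.search(line)
        if match:
            # Most recent unit the client tried to send.
            current = tuple(int(g) for g in match.groups())
        elif ALREADY in line and current is not None and current not in rejected:
            rejected.append(current)
    return rejected

sample = """\
[05:05:03] - Error: Could not transmit unit 08 (completed February 15) to work server.
[05:05:03]   Keeping unit 08 in queue.
[05:05:03] Project: 5781 (Run 13, Clone 935, Gen 4)
[05:05:03] + Attempting to send results [February 15 05:05:03 UTC]
[05:05:04] - Server has already received unit.
"""
print(rejected_units(sample))
```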

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 5:20 am
by Leonardo
All my GPU2 Nvidia clients are loaded and folding again. ~.21 is back up/functioning properly.

And there was much rejoicing.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 5:22 am
by DrSpalding
PS: I'm beginning to think that the work server, 171.67.108.21, is not the one to blame here, although it doesn't seem to be accepting uploads of completed WUs. The queue info for the WU above that just failed to upload lists the collection server as 171.67.108.26, which the server status page shows in the "FAIL" state right now.

Code:

 Index 8: finished 783.00 pts (70.046 pt/hr) 53.7 X min speed
   server: 171.67.108.21:8080; project: 5781
   Folding: run 13, clone 935, generation 4; benchmark 0; misc: 500, 200, 11 (be)
   issue: Sun Feb 14 09:53:58 2010; begin: Sun Feb 14 09:53:59 2010
   end: Sun Feb 14 21:04:41 2010; due: Thu Mar 11 09:53:59 2010 (25 days)
   preferred: Mon Mar 01 09:53:59 2010 (15 days)
   core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah (V1.31)
   core number: 0x11; core name: GROGPU2
   CPU: 1,687 Pentium II/III; OS: 1,0 Windows
   flops: 1065242605 (1065.242605 megaflops)
   memory: 4096 MB; gpu memory: 258 MB
   client type: 3 Advmethods
   assignment info (be): Sun Feb 14 09:53:45 2010; B850DEC3
   CS: 171.67.108.26; P limit: 524286976
   work/wudata_08.dat file size: 65506; WU type: Folding@Home
And the status:

Code:

Sun Feb 14 20:55:10 PST 2010  171.67.108.26  -  vsp09a  -  FAIL  Accepting
Good night for real this time.
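
The cross-check described in this post (look up the collection server in the queue entry, then see whether the status page lists it as failing) can be sketched as follows. It assumes the text layouts quoted above: a "server: IP:port" and "CS: IP" field in the queue dump, and a status line that contains the IP and the word FAIL. The function names are made up for illustration.

```python
import re

IP = r"\d{1,3}(?:\.\d{1,3}){3}"

def parse_queue_entry(text):
    """Pull index, status, work server and collection server from one queue dump entry."""
    info = {}
    m = re.search(r"Index (\d+): (\w+)", text)
    if m:
        info["index"] = int(m.group(1))
        info["status"] = m.group(2)
    m = re.search(rf"server: ({IP}):(\d+)", text)
    if m:
        info["work_server"] = m.group(1)
        info["port"] = int(m.group(2))
    m = re.search(rf"CS: ({IP})", text)
    if m:
        info["collection_server"] = m.group(1)
    return info

def failing_servers(status_text):
    """Return the set of IPs whose status line contains the token FAIL."""
    failing = set()
    for line in status_text.splitlines():
        m = re.search(IP, line)
        if m and "FAIL" in line.split():
            failing.add(m.group(0))
    return failing

entry_text = """\
 Index 8: finished 783.00 pts (70.046 pt/hr) 53.7 X min speed
   server: 171.67.108.21:8080; project: 5781
   CS: 171.67.108.26; P limit: 524286976
"""
status_text = "Sun Feb 14 20:55:10 PST 2010  171.67.108.26  -  vsp09a  -  FAIL  Accepting"

entry = parse_queue_entry(entry_text)
print(entry["collection_server"] in failing_servers(status_text))
```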

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 5:24 am
by Tobit
Leonardo wrote:All my GPU2 Nvidia clients are loaded and folding again. ~.21 is back up/functioning properly.
Although it is handing out work again, it is not functioning properly. The original "Server has already received unit" problem still exists when sending in work.

Code:

[04:49:35] + Attempting to send results [February 15 04:49:35 UTC]
[04:49:35] - Reading file work/wuresults_00.dat from core
[04:49:35]   (Read 131063 bytes from disk)
[04:49:35] Connecting to http://171.67.108.21:8080/
[04:49:36] Posted data.
[04:49:36] Initial: 0000; - Uploaded at ~128 kB/s
[04:49:36] - Averaged speed for that direction ~118 kB/s
[04:49:36] - Server has already received unit.
[04:49:36] + Sent 0 of 1 completed units to the server
[04:49:36] - Preparing to get new work unit...

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 5:25 am
by chriskwarren
All my clients are folding now. Time to wait and see if we get credit for them.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 6:00 am
by Ravage7779
No joy for me. I wonder why the assignment servers haven't figured out that this server is borked?

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 6:13 am
by Teddy
No joy here either, all 12 GPU clients are out of work. Switched most of my farm off, who cares?
I am not sure Stanford are too fussed by the situation coz I'm not...

Teddy

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Mon Feb 15, 2010 6:14 am
by ElectricVehicle
171.67.108.21 is definitely not issuing WUs (for me anyway). I've stopped and restarted several of my clients just now, all with the same results:

Attempt to get work failed, and no other work to do.

As of [February 15 06:07:25 UTC] (10:07 pm PST)

[06:07:25] + Attempting to get work packet
[06:07:25] - Will indicate memory of 2046 MB
[06:07:25] - Connecting to assignment server
[06:07:25] Connecting to http://assign-GPU.stanford.edu:8080/
[06:07:26] Posted data.
[06:07:26] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[06:07:26] + News From Folding@Home: Welcome to Folding@Home
[06:07:26] Loaded queue successfully.
[06:07:26] Connecting to http://171.67.108.21:8080/
[06:07:26] - Couldn't send HTTP request to server
[06:07:26] + Could not connect to Work Server
[06:07:26] - Attempt #6 to get work failed, and no other work to do.
Waiting before retry.
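
A quick way to tell "the server is unreachable from here" apart from a client problem is a plain TCP connection test against the ports that appear in the logs in this thread (8080, then 80 as the alternative port). This is a minimal sketch with made-up function names; a successful connect only shows the port is open, not that the server will hand out or accept work.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_ports(host, ports=(8080, 80), timeout=5.0):
    """Try each port in turn and report which ones accept a connection."""
    return {port: can_connect(host, port, timeout) for port in ports}

# Example call (not run here, since it needs network access):
# check_ports("171.67.108.21")
```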