171.64.65.62 (vspg10c) down?

Moderators: Site Moderators, FAHC Science Team

DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

171.64.65.62 (vspg10c) down?

Post by DrSpalding »

I have three classic clients on two different networks that can neither upload finished WUs to this server nor connect to it to pick up a new WU. Server status says that it is up, but these three clients have been in limbo for about 90-120 minutes as of 11:30 PDT.
Not a real doctor, I just play one on the 'net!
Hyperlife
Posts: 192
Joined: Sun Dec 02, 2007 7:38 am

Re: 171.64.65.62 (vspg10c) down?

Post by Hyperlife »

Same here. Could someone take a look at the server?

Code:

18:41:15:Unit 02: Uploading 438.49KiB
18:41:15:Connecting to 171.64.65.62:8080
18:41:15:Sending unit results: id:01 state:SEND project:6508 run:1 clone:215 gen:93 core:0x78 unit:0x0356bdba4d8916a1005d00d70001196c
18:41:15:WARNING: Exception: Failed to send results to work server: Upload failed
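For anyone who wants to pull failures like this out of a longer client log, here is a rough Python sketch. It assumes the `HH:MM:SS:` prefix and the message wording shown in the excerpt above; `find_upload_failures` is a hypothetical helper name, not part of the client, and other client versions may log differently.

```python
import re

# Assumed log format: "HH:MM:SS:" followed by the message text.
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")


def find_upload_failures(log_text):
    """Return (timestamp, server) pairs for each 'Upload failed' line."""
    failures = []
    last_server = None  # most recent "Connecting to ..." target seen
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines without the assumed timestamp prefix
        stamp, message = match.groups()
        connecting = re.match(r"Connecting to (\S+)", message)
        if connecting:
            last_server = connecting.group(1)
        elif "Upload failed" in message:
            failures.append((stamp, last_server))
    return failures


SAMPLE_LOG = """\
18:41:15:Unit 02: Uploading 438.49KiB
18:41:15:Connecting to 171.64.65.62:8080
18:41:15:WARNING: Exception: Failed to send results to work server: Upload failed
"""

if __name__ == "__main__":
    for stamp, server in find_upload_failures(SAMPLE_LOG):
        print(f"{stamp} upload failed, last server contacted: {server}")
```

Pairing each failure with the last server contacted is a simplification; it is enough to count how often a given work server refuses uploads.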
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 171.64.65.62 (vspg10c) down?

Post by 7im »

Could someone take a look at the Server Status page? ;)
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Hyperlife
Posts: 192
Joined: Sun Dec 02, 2007 7:38 am

Re: 171.64.65.62 (vspg10c) down?

Post by Hyperlife »

7im wrote:Could someone take a look at the Server Status page? ;)
I guess you didn't read DrSpalding's post carefully enough:
DrSpalding wrote:Server status says that it is up
:roll:
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 171.64.65.62 (vspg10c) down?

Post by 7im »

Look again. CPU load is at 4+. ;)
CPULOAD is the load average: the average number of processes running or waiting to run over the past 1, 5, and 15 minutes. When this number gets above 2-3, the server is probably heavily loaded.
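To try that rule of thumb on a machine of your own, here is a minimal Python sketch. The 2.0 and 3.0 cut-offs are assumptions taken from the "above 2-3" guideline, not official Pande Group thresholds, and `classify_load` and `local_load_report` are hypothetical helper names. `os.getloadavg()` is only available on Unix-like systems, so the sketch returns `None` elsewhere.

```python
import os


def classify_load(load_avg, elevated_at=2.0, heavy_at=3.0):
    """Label a load average using assumed rule-of-thumb thresholds."""
    if load_avg >= heavy_at:
        return "heavy"
    if load_avg >= elevated_at:
        return "elevated"
    return "ok"


def local_load_report():
    """Return (value, label) for the 1-, 5-, and 15-minute averages, or None."""
    try:
        averages = os.getloadavg()  # Unix-like systems only
    except (AttributeError, OSError):
        return None  # not supported or not obtainable on this platform
    return [(avg, classify_load(avg)) for avg in averages]


if __name__ == "__main__":
    print(classify_load(4.59))  # a load figure of the size seen on the status page
    print(local_load_report())
```

A load average is not normalized per core, so the same figure means less on a multi-core server than on a single-core one. A fixed threshold is therefore only a rough signal.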
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Hyperlife
Posts: 192
Joined: Sun Dec 02, 2007 7:38 am

Re: 171.64.65.62 (vspg10c) down?

Post by Hyperlife »

7im wrote:Look again. CPU load is at 4+. ;)
So? That's not excessive. For example, 171.67.108.20 has a CPU load of 4.59 right now and is working fine -- I've been able to upload and download WUs to it with no hiccups.

I suspect that 2-3 load warning is rather ancient. I've never had problems getting/sending WUs from servers in that load range. It's not even colored yellow!
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 171.64.65.62 (vspg10c) down?

Post by 7im »

See the whole picture.

Assignment percentage on 171.67.108.20 is nothing. It's 41% on 171.64.65.62. 171.64.65.62 is getting almost half of all CPU clients routed to it. Downloads also affect server availability.

However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Hyperlife
Posts: 192
Joined: Sun Dec 02, 2007 7:38 am

Re: 171.64.65.62 (vspg10c) down?

Post by Hyperlife »

7im wrote:However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.
Glad you agree that something should be done on the PG end. That wasn't so hard to admit, now was it?
DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Re: 171.64.65.62 (vspg10c) down?

Post by DrSpalding »

The server is now (well, 11:35 PDT) at 60% of WUs assigned to it. It is fairly obvious that it is either not handing out or accepting WUs at all, or doing so at a rate that is wholly inadequate for the AS to keep assigning to it.

Edit: 12:40 PDT. It looks like many other classic WU servers are down, and that may be why this server is trying to field so many requests now. The NetLoad is high at 96; the only higher one is a PS3 WU server. In addition, one of the AS machines (vsp10v-vz00) is heavily loaded, with a NetLoad of 150.
Not a real doctor, I just play one on the 'net!
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: 171.64.65.62 (vspg10c) down?

Post by VijayPande »

Thanks for the feedback. We're looking into it right now.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
John_Weatherman
Posts: 289
Joined: Sun Dec 02, 2007 4:31 am
Location: Carrizo Plain National Monument, California

Re: 171.64.65.62 (vspg10c) down?

Post by John_Weatherman »

I've now managed to upload and download on two clients, so thanks to the person who kicked the server :)
DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Re: 171.64.65.62 (vspg10c) down?

Post by DrSpalding »

Thanks, Dr. Pande. Two of my three stuck clients have uploaded and retrieved new WUs successfully. I'm waiting on the third to wake up and try again on its own schedule.
Not a real doctor, I just play one on the 'net!
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: 171.64.65.62 (vspg10c) down?

Post by VijayPande »

That's good to hear. The server is looking loaded but well behaved right now and the load has been working its way down over the last few hours. It's now below the level where we would expect there to be any major problems, although it is a heavy load.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 171.64.65.62 (vspg10c) down?

Post by 7im »

Hyperlife wrote:
7im wrote:However, someone may want to nudge that AS% down a tad so that completed WUs can be returned more quickly.
Glad you agree that something should be done on the PG end. That wasn't so hard to admit, now was it?
Well, since you didn't include a smilie, I have to assume you were serious about what you said. So here is the serious response.

Agree something "should" be done? On a normal day, that statement is a sarcastic way of saying I've already called in the cavalry to take a look at the problem. I actually DO stuff around here. :roll: But in this case, PG noticed the problem without my help.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.64.65.62 (vspg10c) down?

Post by bruce »

I'm currently tracking a problem that appears to be only in V7, where a WU that has certain types of EUEs is uploaded for partial credit in V6 and fails to upload at all in V7. Does that describe what you're seeing?

Please find the end of processing of the WU in question and the FIRST attempt to upload. Post that much of the log here. I don't have enough data for it to be conclusive yet.

I had not discovered a problem with getting a new WU yet, but there might be a tie-in and might not be.