RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject

Moderators: Site Moderators, FAHC Science Team

noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHz w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHz w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

Post by noorman »

Both these GPU servers are in Reject, as stated in the title ...
Is anyone checking these server stats or getting any reports on failing servers?

- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010

Post by noorman »

Does anyone know why some servers go into 'Reject' mode?

7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Post by 7im »

noorman wrote: Both these GPU servers are in Reject, as stated in the title ...
Is anyone checking these server stats or getting any reports on failing servers?

You are asking very broad questions, which have simple and broad answers.

Yes, people do monitor the servers, and get reports about them. Those people include both members of the Stanford IT Staff and members of Pande Group.

Asking a more specific question might help get a more direct answer.


noorman wrote: Does anyone know why some servers go into 'Reject' mode?

Yes, someone does know. Are you asking who that someone is, or asking for a list of reasons, or both?

The common reasons are hardware failures (mostly RAID problems), networking issues, and the loading of new work units from a new Project.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.

Post by noorman »

I guess the latter will be the most common answer ...

You seem to be able enough to read between the lines, so to speak.
The answer that tells me why they do it is much more important than who knows why, and I think I made it clear that I didn't know just by asking the question.

The 'automatism' that was brought in to 're-boot' (or whatever) servers if they misbehave seems to have worked.
I doubt that 're-loading' two servers would end at almost exactly the same time. Both servers were back to normal when I checked just after I typed out this thread.
So I changed the title to reflect that, then and there.


Post by 7im »

Please note that Stanford is virtualizing servers, so multiple "servers" can go down if one hardware box goes offline. Probably not the case here, but it does happen.

Also, as these are both GPU servers, they were likely taken offline at the same time for a specific reason.

Post by noorman »

I didn't know that when a server is running in a virtual environment with another, it can't be taken offline - on its own - anymore ...
Anyway, the problem seems to be solved.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

Of course virtual servers can be taken off-line, either intentionally or due to certain specific hardware or software failures. I think the point is that multiple virtual servers can all be taken off-line simultaneously by general hardware failures, too.

No matter how reliable server hardware has become, it still goes off-line from time to time. The automatic notification systems sometimes work, the automatic reboot systems sometimes work, and posting a note here in this forum is productive in still other cases.
baz657
Posts: 55
Joined: Thu Nov 12, 2009 1:40 pm
Hardware configuration: Win 7 Ultimate x64, 2 X Nvidia Fermi cards, one 2 Core CPU and one quad and one very old laptop
Location: Chesterfield, UK

Post by baz657 »

Problems uploading results again today.

[13:07:05] + Attempting to send results [June 13 13:07:05 UTC]
[13:07:05] Gpu type=3 species=21.
[13:07:06] - Couldn't send HTTP request to server
[13:07:06] + Could not connect to Work Server (results)
[13:07:06]     (171.64.65.64:8080)
[13:07:06] + Retrying using alternative port
[13:07:08] - Couldn't send HTTP request to server
[13:07:08] + Could not connect to Work Server (results)
[13:07:08]     (171.64.65.64:80)
[13:07:08] - Error: Could not transmit unit 09 (completed June 13) to work server.
[13:07:08] - Read packet limit of 540015616... Set to 524286976.


[13:07:08] + Attempting to send results [June 13 13:07:08 UTC]
[13:07:08] Gpu type=3 species=21.
[13:12:23] - Couldn't send HTTP request to server
[13:12:23] + Could not connect to Work Server (results)
[13:12:23]     (171.67.108.26:8080)
[13:12:23] + Retrying using alternative port
[13:12:25] - Couldn't send HTTP request to server
[13:12:25] + Could not connect to Work Server (results)
[13:12:25]     (171.67.108.26:80)
[13:12:25]   Could not transmit unit 09 to Collection server; keeping in queue.
[13:12:25] + Closed connections
and

[15:47:48] Sending work to server
[15:47:48] Project: 11245 (Run 2, Clone 65, Gen 10)
[15:47:48] - Read packet limit of 540015616... Set to 524286976.


[15:47:48] + Attempting to send results [June 13 15:47:48 UTC]
[15:47:48] Gpu type=3 species=21.
[15:48:14] + Results successfully sent
[15:48:14] Thank you for your contribution to Folding@Home.
[15:48:14] + Number of Units Completed: 388

[15:48:20] Project: 6801 (Run 3210, Clone 4, Gen 19)
[15:48:20] - Read packet limit of 540015616... Set to 524286976.


[15:48:20] + Attempting to send results [June 13 15:48:20 UTC]
[15:48:20] Gpu type=3 species=21.
[15:48:22] - Couldn't send HTTP request to server
[15:48:22] + Could not connect to Work Server (results)
[15:48:22]     (171.64.65.64:8080)
[15:48:22] + Retrying using alternative port
[15:48:24] - Couldn't send HTTP request to server
[15:48:24] + Could not connect to Work Server (results)
[15:48:24]     (171.64.65.64:80)
[15:48:24] - Error: Could not transmit unit 09 (completed June 13) to work server.
[15:48:24] - Read packet limit of 540015616... Set to 524286976.
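
For anyone comparing logs like the ones above, a short script can list which work servers and ports refused the upload. This is only a minimal sketch in Python, assuming the client always prints the server address on the line right after "Could not connect to Work Server". The names `failed_endpoints` and `failures_per_server` are made up for illustration and are not part of the Folding@home client; the sample text is an excerpt of the first log above.

```python
import re
from collections import Counter

# Address line printed after a failed connection,
# e.g. "[13:07:06]     (171.64.65.64:8080)".
ADDRESS_LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)
# Assumption: this text marks every failed upload attempt in the log.
FAILURE_MARKER = "Could not connect to Work Server"


def failed_endpoints(log_text):
    """Return a list of (time, ip, port) for each failed connection attempt."""
    failures = []
    expect_address = False
    for line in log_text.splitlines():
        if FAILURE_MARKER in line:
            # The address is expected on the very next line.
            expect_address = True
            continue
        if expect_address:
            match = ADDRESS_LINE.match(line.strip())
            if match:
                when, ip, port = match.groups()
                failures.append((when, ip, int(port)))
            expect_address = False
    return failures


def failures_per_server(log_text):
    """Count failed attempts per server IP, regardless of port."""
    return Counter(ip for _, ip, _ in failed_endpoints(log_text))


# Excerpt of the first log posted above.
SAMPLE_LOG = """\
[13:07:06] - Couldn't send HTTP request to server
[13:07:06] + Could not connect to Work Server (results)
[13:07:06]     (171.64.65.64:8080)
[13:07:06] + Retrying using alternative port
[13:07:08] - Couldn't send HTTP request to server
[13:07:08] + Could not connect to Work Server (results)
[13:07:08]     (171.64.65.64:80)
[13:12:23] + Could not connect to Work Server (results)
[13:12:23]     (171.67.108.26:8080)
"""

if __name__ == "__main__":
    for when, ip, port in failed_endpoints(SAMPLE_LOG):
        print(when, ip, port)
```

Pasting the list of failing addresses and times into a post like this one makes it easier for whoever watches the servers to see which machine and which port were affected.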


Post by noorman »

171.64.65.64 and 171.67.108.26 both back up ...