RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
- Location: Belgium, near the International Sea-Port of Antwerp
RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
Both of these GPU servers are in Reject, as stated in the title ...
Is anyone checking these server stats or getting any reports on failing servers?
.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
- Location: Belgium, near the International Sea-Port of Antwerp
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
Does anyone know why some servers go into 'Reject' mode?
.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
noorman wrote: both these GPU servers are in Reject, as stated in the title ...
Is anyone checking these server stats or getting any reports on failing servers ?
.
You are asking very broad questions, which have simple and broad answers.
Yes, people do monitor the servers, and get reports about them. Those people include both members of the Stanford IT Staff and members of Pande Group.
Asking a more specific question might help get a more direct answer.
As for why some servers go into 'Reject' mode: yes, someone does know. Are you asking who that someone is, or asking for a list of reasons, or both?
The common reasons are hardware failures (mostly RAID problems), networking issues, and downtime while new work units from a new Project are being loaded.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
- Location: Belgium, near the International Sea-Port of Antwerp
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
I guess the latter will be the most common answer ...
You seem to be able enough to read between the lines, so to speak.
An answer that tells me why they do it is much more important than who knows why, and I think I made it clear, just by asking the question, that I didn't know.
The automatic mechanism that was brought in to 're-boot' (or whatever) misbehaving servers seems to have worked.
I don't think that 're-loading' two servers would finish at almost exactly the same time. Both servers were back to normal when I checked again just after I typed out this thread.
So I changed the title to reflect that, then and there.
.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
Please note that Stanford is virtualizing servers, so multiple "servers" can go down if one hardware box goes offline. Probably not the case here, but it does happen.
Also, as these are both GPU servers, they were likely taken offline at the same time for a specific reason.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
- Location: Belgium, near the International Sea-Port of Antwerp
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
I didn't know that a server running in a virtual environment alongside another can no longer be taken offline on its own ...
Anyway, the problem seems to be solved.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
Of course virtual servers can be taken off-line, either intentionally or due to certain specific hardware or software failures. I think the point is that multiple virtual servers can all be taken off-line simultaneously by general hardware failures, too.
No matter how reliable server hardware has become, it still goes off-line from time to time. The automatic notification systems sometimes work, the automatic reboot systems sometimes work, and in still other cases posting a note here in this forum is what gets results.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 55
- Joined: Thu Nov 12, 2009 1:40 pm
- Hardware configuration: Win 7 Ultimate x64, 2 X Nvidia Fermi cards, one 2 Core CPU and one quad and one very old laptop
- Location: Chesterfield, UK
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
Problems uploading results again today.
and
Code:
[13:07:05] + Attempting to send results [June 13 13:07:05 UTC]
[13:07:05] Gpu type=3 species=21.
[13:07:06] - Couldn't send HTTP request to server
[13:07:06] + Could not connect to Work Server (results)
[13:07:06] (171.64.65.64:8080)
[13:07:06] + Retrying using alternative port
[13:07:08] - Couldn't send HTTP request to server
[13:07:08] + Could not connect to Work Server (results)
[13:07:08] (171.64.65.64:80)
[13:07:08] - Error: Could not transmit unit 09 (completed June 13) to work server.
[13:07:08] - Read packet limit of 540015616... Set to 524286976.
[13:07:08] + Attempting to send results [June 13 13:07:08 UTC]
[13:07:08] Gpu type=3 species=21.
[13:12:23] - Couldn't send HTTP request to server
[13:12:23] + Could not connect to Work Server (results)
[13:12:23] (171.67.108.26:8080)
[13:12:23] + Retrying using alternative port
[13:12:25] - Couldn't send HTTP request to server
[13:12:25] + Could not connect to Work Server (results)
[13:12:25] (171.67.108.26:80)
[13:12:25] Could not transmit unit 09 to Collection server; keeping in queue.
[13:12:25] + Closed connections
Code:
[15:47:48] Sending work to server
[15:47:48] Project: 11245 (Run 2, Clone 65, Gen 10)
[15:47:48] - Read packet limit of 540015616... Set to 524286976.
[15:47:48] + Attempting to send results [June 13 15:47:48 UTC]
[15:47:48] Gpu type=3 species=21.
[15:48:14] + Results successfully sent
[15:48:14] Thank you for your contribution to Folding@Home.
[15:48:14] + Number of Units Completed: 388
[15:48:20] Project: 6801 (Run 3210, Clone 4, Gen 19)
[15:48:20] - Read packet limit of 540015616... Set to 524286976.
[15:48:20] + Attempting to send results [June 13 15:48:20 UTC]
[15:48:20] Gpu type=3 species=21.
[15:48:22] - Couldn't send HTTP request to server
[15:48:22] + Could not connect to Work Server (results)
[15:48:22] (171.64.65.64:8080)
[15:48:22] + Retrying using alternative port
[15:48:24] - Couldn't send HTTP request to server
[15:48:24] + Could not connect to Work Server (results)
[15:48:24] (171.64.65.64:80)
[15:48:24] - Error: Could not transmit unit 09 (completed June 13) to work server.
[15:48:24] - Read packet limit of 540015616... Set to 524286976.
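For anyone who wants to pull the failing addresses out of a log like the ones above instead of reading it by eye, here is a minimal sketch in Python. It assumes the layout seen in this thread, where a "Could not connect to Work Server" line is followed by a line holding the address in parentheses. The names `failed_connections`, `ADDRESS_LINE` and `FAILURE_MARKER` are made up for this example and are not part of the FAH client.

```python
import re

# Assumed layout: "[HH:MM:SS] (ip:port)" on the line after a failure message.
ADDRESS_LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)
FAILURE_MARKER = "Could not connect to Work Server"


def failed_connections(log_text):
    """Return a list of (time, ip, port) for each failed connection attempt."""
    failures = []
    previous_was_failure = False
    for line in log_text.splitlines():
        match = ADDRESS_LINE.match(line.strip())
        # Only count an address line that directly follows a failure message.
        if match and previous_was_failure:
            time, ip, port = match.groups()
            failures.append((time, ip, int(port)))
        previous_was_failure = FAILURE_MARKER in line
    return failures


# A few lines copied from the log posted above.
sample = """\
[13:07:06] - Couldn't send HTTP request to server
[13:07:06] + Could not connect to Work Server (results)
[13:07:06] (171.64.65.64:8080)
[13:07:06] + Retrying using alternative port
[13:07:08] - Couldn't send HTTP request to server
[13:07:08] + Could not connect to Work Server (results)
[13:07:08] (171.64.65.64:80)
"""

for time, ip, port in failed_connections(sample):
    print(time, ip, port)
```

Listing the address and port together makes it easier to name the exact server when reporting a problem here.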
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
- Location: Belgium, near the International Sea-Port of Antwerp
Re: RESOLVED: 171.64.65.103 & 171.64.65.64 both in Reject
171.64.65.64 and 171.67.108.26 are both back up ...
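A rough way to check reachability yourself is to try a plain TCP connection on the two ports the client log shows being attempted, 8080 first and then 80. The sketch below is only that, a sketch: `first_open_port` is a hypothetical helper name, and a successful connection only shows that the port answers, not that the server has left 'Reject' and is accepting results.

```python
import socket


def first_open_port(host, ports=(8080, 80), timeout=5.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

Calling `first_open_port("171.64.65.64")` would return 8080 or 80 if either port answers, and None if neither does.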
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010