171.67.108.25 and 171.67.108.24 Down?

Moderators: Site Moderators, FAHC Science Team

road-runner
Posts: 227
Joined: Sun Dec 02, 2007 4:01 am
Location: Willis, Texas

171.67.108.25 and 171.67.108.24 Down?

Post by road-runner »

Got a few rigs that can't send in results. Could someone kick the servers, please?

Code:

[01:10:49] Finished Work Unit:
[01:10:49] - Reading up to 21182544 from "work/wudata_09.trr": Read 21182544
[01:10:49] trr file hash check passed.
[01:10:49] - Reading up to 27672948 from "work/wudata_09.xtc": Read 27672948
[01:10:49] xtc file hash check passed.
[01:10:49] edr file hash check passed.
[01:10:49] logfile size: 178024
[01:10:49] Leaving Run
[01:10:51] - Writing 49249980 bytes of core data to disk...
[01:10:52]   ... Done.
[01:11:01] - Shutting down core
[01:11:01] 
[01:11:01] Folding@home Core Shutdown: FINISHED_UNIT
[01:14:14] CoreStatus = 64 (100)
[01:14:14] Sending work to server
[01:14:14] Project: 2671 (Run 10, Clone 71, Gen 0)


[01:14:14] + Attempting to send results [March 27 01:14:14 UTC]
[01:14:14] - Couldn't send HTTP request to server
[01:14:14]   (Got status 503)
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:8080)
[01:14:14] + Retrying using alternative port
[01:14:14] - Couldn't send HTTP request to server
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:80)
[01:14:14] - Error: Could not transmit unit 09 (completed March 27) to work server.
[01:14:14]   Keeping unit 09 in queue.
[01:14:14] Project: 2671 (Run 10, Clone 71, Gen 0)


[01:14:14] + Attempting to send results [March 27 01:14:14 UTC]
[01:14:14] - Couldn't send HTTP request to server
[01:14:14]   (Got status 503)
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:8080)
[01:14:14] + Retrying using alternative port
[01:14:14] - Couldn't send HTTP request to server
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:80)
[01:14:14] - Error: Could not transmit unit 09 (completed March 27) to work server.


[01:14:14] + Attempting to send results [March 27 01:14:14 UTC]
[01:14:15] - Couldn't send HTTP request to server
[01:14:15]   (Got status 503)
[01:14:15] + Could not connect to Work Server (results)
[01:14:15]     (171.67.108.25:8080)
[01:14:15] + Retrying using alternative port
[01:14:15] - Couldn't send HTTP request to server
[01:14:15]   (Got status 503)
[01:14:15] + Could not connect to Work Server (results)
[01:14:15]     (171.67.108.25:80)
[01:14:15]   Could not transmit unit 09 to Collection server; keeping in queue.
[01:14:15] - Preparing to get new work unit...
[01:14:15] + Attempting to get work packet
[01:14:15] - Connecting to assignment server
[01:14:15] - Successful: assigned to (171.64.65.56).
[01:14:15] + News From Folding@Home: Welcome to Folding@Home
[01:14:15] Loaded queue successfully.
[01:14:20] Project: 2671 (Run 10, Clone 71, Gen 0)


[01:14:20] + Attempting to send results [March 27 01:14:20 UTC]
[01:14:21] - Couldn't send HTTP request to server
[01:14:21]   (Got status 503)
[01:14:21] + Could not connect to Work Server (results)
[01:14:21]     (171.67.108.24:8080)
[01:14:21] + Retrying using alternative port
[01:14:21] - Couldn't send HTTP request to server
[01:14:21] + Could not connect to Work Server (results)
[01:14:21]     (171.67.108.24:80)
[01:14:21] - Error: Could not transmit unit 09 (completed March 27) to work server.


[01:14:21] + Attempting to send results [March 27 01:14:21 UTC]
[01:14:21] - Couldn't send HTTP request to server
[01:14:21]   (Got status 503)
[01:14:21] + Could not connect to Work Server (results)
[01:14:21]     (171.67.108.25:8080)
[01:14:21] + Retrying using alternative port
[01:14:21] - Couldn't send HTTP request to server
[01:14:21]   (Got status 503)
[01:14:21] + Could not connect to Work Server (results)
[01:14:21]     (171.67.108.25:80)
[01:14:21]   Could not transmit unit 09 to Collection server; keeping in queue.
[01:14:26] Project: 2671 (Run 10, Clone 71, Gen 0)


[01:14:26] + Attempting to send results [March 27 01:14:26 UTC]
[01:14:27] - Couldn't send HTTP request to server
[01:14:27]   (Got status 503)
[01:14:27] + Could not connect to Work Server (results)
[01:14:27]     (171.67.108.24:8080)
[01:14:27] + Retrying using alternative port
[01:14:27] - Couldn't send HTTP request to server
[01:14:27] + Could not connect to Work Server (results)
[01:14:27]     (171.67.108.24:80)
[01:14:27] - Error: Could not transmit unit 09 (completed March 27) to work server.


[01:14:27] + Attempting to send results [March 27 01:14:27 UTC]
[01:14:27] - Couldn't send HTTP request to server
[01:14:27]   (Got status 503)
[01:14:27] + Could not connect to Work Server (results)
[01:14:27]     (171.67.108.25:8080)
[01:14:27] + Retrying using alternative port
[01:14:27] - Couldn't send HTTP request to server
[01:14:27]   (Got status 503)
[01:14:27] + Could not connect to Work Server (results)
[01:14:27]     (171.67.108.25:80)
[01:14:27]   Could not transmit unit 09 to Collection server; keeping in queue.
[01:14:27] + Closed connections
[01:14:27] 
[01:14:27] + Processing work unit
[01:14:27] At least 4 processors must be requested.Core required: FahCore_a2.exe
[01:14:27] Core found.
[01:14:27] Working on queue slot 00 [March 27 01:14:27 UTC]
[01:14:27] + Working ...
[01:14:27] 
[01:14:27] *------------------------------*
[01:14:27] Folding@Home Gromacs SMP Core
[01:14:27] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[01:14:27] 
[01:14:27] Preparing to commence simulation
[01:14:27] - Ensuring status. Please wait.
[01:14:28] Called DecompressByteArray: compressed_data_size=4834326 data_size=24039181, decompressed_data_size=24039181 diff=0
[01:14:28] - Digital signature verified
[01:14:28] 
[01:14:28] Project: 2677 (Run 7, Clone 1, Gen 1)
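
To see at a glance which servers a client log like the one above is failing against, the failed endpoints can be tallied. This is a minimal Python sketch, not part of the Folding@home client; the function name `count_failed_endpoints` and the regular expression are assumptions for illustration, based only on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Matches the "(ip:port)" line the client prints after a failed connection.
ENDPOINT = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")


def count_failed_endpoints(log_text):
    """Count failed connection attempts per (ip, port) in a client log."""
    failures = Counter()
    expecting_endpoint = False
    for line in log_text.splitlines():
        if "Could not connect to Work Server" in line:
            # In the logs quoted here, the endpoint is on the following line.
            expecting_endpoint = True
            continue
        if expecting_endpoint:
            match = ENDPOINT.search(line)
            if match:
                failures[(match.group(1), int(match.group(2)))] += 1
            expecting_endpoint = False
    return failures


# A short excerpt copied from the log above.
sample = """\
[01:14:14] + Attempting to send results [March 27 01:14:14 UTC]
[01:14:14] - Couldn't send HTTP request to server
[01:14:14]   (Got status 503)
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:8080)
[01:14:14] + Retrying using alternative port
[01:14:14] - Couldn't send HTTP request to server
[01:14:14] + Could not connect to Work Server (results)
[01:14:14]     (171.67.108.24:80)
"""

for (ip, port), n in sorted(count_failed_endpoints(sample).items()):
    print(f"{ip}:{port} failed {n} time(s)")
```

Run against a whole FAHlog file, the same function would show whether the failures are limited to one work server or also hit the collection server.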
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by anandhanju »

Net load for .24 appears to be stuck at 300.
Pick2
Posts: 85
Joined: Fri Feb 13, 2009 12:38 pm
Hardware configuration: Linux & CPUs
Location: USA

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by Pick2 »

I've also got two WUs waiting for server 171.67.108.24 to get fixed. Net load is still at 300.
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by susato »

It's back up (or else the collection server is) - it just accepted two work units from my machine when I restarted Folding to force the upload.
road-runner
Posts: 227
Joined: Sun Dec 02, 2007 4:01 am
Location: Willis, Texas

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by road-runner »

Seems to be worse today; I have a lot of WUs trying to send in. Do we need to go ahead and shut down until they get this fixed? It doesn't seem like it's doing much good to run the WUs if they're going to expire before they get back...
Pick2
Posts: 85
Joined: Fri Feb 13, 2009 12:38 pm
Hardware configuration: Linux & CPUs
Location: USA

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by Pick2 »

Net load for .24 appears to be stuck at 300 again.
Won't take a WU at present.
road-runner
Posts: 227
Joined: Sun Dec 02, 2007 4:01 am
Location: Willis, Texas

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by road-runner »

Yeah, still having problems here too. This is 2 days now...
goodyca
Posts: 187
Joined: Sun Dec 02, 2007 12:36 pm

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by goodyca »

I also have a WU (Project 2676, Run 3, Clone 196, Gen 2) that cannot connect to this server. It was completed Mar 27 01:44:34 2009 UTC and is due Mar 29 07:10:40 2009 UTC (3 days).

Here is its last attempt to connect:

Code:

[11:05:32] - Autosending finished units... [March 28 11:05:32 UTC]
[11:05:32] Trying to send all finished work units
[11:05:32] Project: 2676 (Run 3, Clone 196, Gen 2)


[11:05:32] + Attempting to send results [March 28 11:05:32 UTC]
[11:05:32] - Reading file work/wuresults_03.dat from core
[11:05:32]   (Read 26036354 bytes from disk)
[11:05:32] Connecting to http://171.67.108.24:8080/
[11:05:32] - Couldn't send HTTP request to server
[11:05:32]   (Got status 503)
[11:05:32] + Could not connect to Work Server (results)
[11:05:32]     (171.67.108.24:8080)
[11:05:32] + Retrying using alternative port
[11:05:32] Connecting to http://171.67.108.24:80/
[11:05:32] - Couldn't send HTTP request to server
[11:05:32] + Could not connect to Work Server (results)
[11:05:32]     (171.67.108.24:80)
[11:05:32] - Error: Could not transmit unit 03 (completed March 27) to work server.
[11:05:32] - 12 failed uploads of this unit.


[11:05:32] + Attempting to send results [March 28 11:05:32 UTC]
[11:05:32] - Reading file work/wuresults_03.dat from core
[11:05:32]   (Read 26036354 bytes from disk)
[11:05:32] Connecting to http://171.67.108.25:8080/
[11:05:32] - Couldn't send HTTP request to server
[11:05:32]   (Got status 503)
[11:05:32] + Could not connect to Work Server (results)
[11:05:32]     (171.67.108.25:8080)
[11:05:32] + Retrying using alternative port
[11:05:32] Connecting to http://171.67.108.25:80/
[11:05:33] - Couldn't send HTTP request to server
[11:05:33]   (Got status 503)
[11:05:33] + Could not connect to Work Server (results)
[11:05:33]     (171.67.108.25:80)
[11:05:33]   Could not transmit unit 03 to Collection server; keeping in queue.
[11:05:33] + Sent 0 of 1 completed units to the server
[11:05:33] - Autosend completed
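
The time pressure on this unit can be worked out from the timestamps quoted above. Here is a small Python sketch, assuming all three timestamps are UTC and that the last attempt (logged as March 28 11:05:32) falls in 2009 like the other two; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# Month abbreviations are parsed with %b, which assumes an English/C locale.
FMT = "%b %d %H:%M:%S %Y"


def parse_utc(stamp):
    """Parse a timestamp such as 'Mar 27 01:44:34 2009' as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


completed = parse_utc("Mar 27 01:44:34 2009")     # from the post
deadline = parse_utc("Mar 29 07:10:40 2009")      # from the post
last_attempt = parse_utc("Mar 28 11:05:32 2009")  # from the log; year assumed

window = deadline - completed       # time between completion and deadline
remaining = deadline - last_attempt  # time left at the last failed upload

print("Completion to deadline:", window)
print("Left at last attempt:  ", remaining)
```

With these inputs the window from completion to deadline is 2 days, 5:26:06, and about 20 hours (20:05:08) were left at the last failed attempt.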
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by Flathead74 »

[12:52:26] - Autosending finished units... [March 28 12:52:26 UTC]
[12:52:26] Trying to send all finished work units
[12:52:26] Project: 2677 (Run 34, Clone 50, Gen 2)


[12:52:26] + Attempting to send results [March 28 12:52:26 UTC]
[12:52:26] - Reading file work/wuresults_05.dat from core
[12:52:26] (Read 49222561 bytes from disk)
[12:52:26] Connecting to http://171.64.65.56:8080/
[12:52:27] - Couldn't send HTTP request to server
[12:52:27] (Got status 503)
[12:52:27] + Could not connect to Work Server (results)
[12:52:27] (171.64.65.56:8080)
[12:52:27] + Retrying using alternative port
[12:52:27] Connecting to http://171.64.65.56:80/
[12:52:27] - Couldn't send HTTP request to server
[12:52:27] (Got status 503)
[12:52:27] + Could not connect to Work Server (results)
[12:52:27] (171.64.65.56:80)
[12:52:27] - Error: Could not transmit unit 05 (completed March 28) to work server.
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by toTOW »

I've just notified the Pande Group about this issue.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by VijayPande »

Thanks for the heads up. Looking at the situation, we see that both of these servers are up right now. We've made some tweaks to help them out. This may be some weird Stanford network issue, since the servers seem fine even though people are clearly having problems. We're keeping an eye on it and will post more info when we know more.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
Pick2
Posts: 85
Joined: Fri Feb 13, 2009 12:38 pm
Hardware configuration: Linux & CPUs
Location: USA

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by Pick2 »

They seem to have been working fine since some point after your post, VP :)
My clients have uploaded all "Ready For Upload" WUs to both 171.67.108.24 and 171.64.65.56, and have downloaded new WUs when assigned there.
WickedPixie
Posts: 6
Joined: Mon Jan 21, 2008 5:40 pm

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by WickedPixie »

Seems like I'm still having issues uploading...

3 in the queue on this machine, and a couple on another.

Code:

[10:01:20] Folding@home Core Shutdown: FINISHED_UNIT
Error encountered before initializing MPICH
[10:04:14] CoreStatus = 64 (100)
[10:04:14] Unit 6 finished with 79 percent of time to deadline remaining.
[10:04:14] Updated performance fraction: 0.784875
[10:04:14] Sending work to server
[10:04:14] - Already sending work
[10:04:14] Trying to send all finished work units
[10:04:14] - Already sending work
[10:04:14] - Already sending work
[10:04:14] - Already sending work
[10:04:14] + Sent 0 of 3 completed units to the server
[10:04:14] - Preparing to get new work unit...
[10:04:14] + Attempting to get work packet
[10:04:14] - Will indicate memory of 1004 MB
[10:04:14] - Connecting to assignment server
[10:04:14] Connecting to http://assign.stanford.edu:8080/
[10:04:14] Posted data.
[10:04:14] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[10:04:14] + News From Folding@Home: Welcome to Folding@Home
[10:04:15] Loaded queue successfully.
[10:04:15] Connecting to http://171.64.65.56:8080/
[10:04:20] Posted data.
[10:04:20] Initial: 0000; - Receiving payload (expected size: 4838194)
[10:04:29] - Downloaded at ~524 kB/s
[10:04:29] - Averaged speed for that direction ~511 kB/s
[10:04:29] + Received work.
[10:04:29] Trying to send all finished work units
[10:04:29] - Already sending work
[10:04:29] - Already sending work
[10:04:29] - Already sending work
[10:04:29] + Sent 0 of 3 completed units to the server
[10:04:29] + Closed connections
[10:04:29] 
[10:04:29] + Processing work unit
[10:04:29] At least 4 processors must be requested.Core required: FahCore_a2.exe
[10:04:29] Core found.
[10:04:29] Working on queue slot 07 [March 29 10:04:29 UTC]
[10:04:29] + Working ...
[10:04:29] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 07 -checkpoint 15 -verbose -lifeline 17582 -version 624'

[10:04:29] 
[10:04:29] *------------------------------*
[10:04:29] Folding@Home Gromacs SMP Core
[10:04:29] Version 2.04 (Thu Jan 29 16:43:57 PST 2009)
[10:04:29] 
[10:04:29] Preparing to commence simulation
[10:04:29] - Ensuring status. Please wait.
[10:04:30] Called DecompressByteArray: compressed_data_size=4837682 data_size=24019629, decompressed_data_size=24019629 diff=0
[10:04:30] - Digital signature verified
[10:04:30] 
[10:04:30] Project: 2677 (Run 3, Clone 25, Gen 2)
[10:04:30] 
[10:04:30] Assembly optimizations on if available.
[10:04:30] Entering M.D.
[10:04:42]  (Run 3, Clone 25, Gen 2)
bollix47
Posts: 2957
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by bollix47 »

Problems with 171.67.108.24 again.

Net load around 300.

Code:

Sun Apr 5 05:00:20 PDT 2009 	171.67.108.24 	classic 	vsp21v 	kasson 	full 	Accepting 	0.10 	300 	0 	3 	26815 	2664 	0 	1.29 	4369 	4369 	2540 	- 	- 	- 	- 	22 	0 	3 	3 	 0 	- 	171.67.108.25 	0 	 0 	LX; LX; 	9000000, 1 	6, 6 	5, 5 	10000, 10000 	64, 64 	- 	F, A, B, F, A, B 	8080, 8080 	6.27.01 	- 	- 	- 	11 	kasson 	1 	vsp21v 	
Sun Apr 5 05:10:20 PDT 2009 	171.67.108.24 	classic 	vsp21v 	kasson 	full 	Accepting 	0.04 	301 	1 	3 	26815 	2664 	0 	1.29 	4369 	4369 	2540 	- 	- 	- 	- 	22 	0 	3 	3 	0 	- 	171.67.108.25 	0 	0 	LX; LX; 	9000000, 1 	6, 6 	5, 5 	10000, 10000 	64, 64 	- 	F, A, B, F, A, B 	8080, 8080 	6.27.01 	- 	- 	- 	11 	kasson 	1 	vsp21v 	
Sun Apr 5 05:20:20 PDT 2009 	171.67.108.24 	classic 	vsp21v 	kasson 	full 	Accepting 	0.00 	300 	0 	3 	26815 	2664 	0 	1.29 	4369 	4369 	2540 	- 	- 	- 	- 	22 	0 	3 	3 	0 	- 	171.67.108.25 	0 	0 	LX; LX; 	9000000, 1 	6, 6 	5, 5 	10000, 10000 
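
Rows like the ones above can be checked for a stuck net load with a few lines of code. This is a rough Python sketch under stated assumptions: the rows are treated as tab-separated, the ninth field is taken to be the net load (a guess based on the 300/301/300 values matching what posters call "net load"), and the seventh field is taken to be the accept status. The sample rows are shortened to their first ten fields, and `parse_status_row`, `looks_stuck`, and the threshold of 300 are hypothetical names and values, not part of any Folding@home tool.

```python
def parse_status_row(row):
    """Split one tab-separated status row into a few named fields."""
    fields = [f.strip() for f in row.split("\t")]
    return {
        "timestamp": fields[0],
        "server": fields[1],
        "status": fields[6],          # assumed column position
        "net_load": int(fields[8]),   # assumed column position
    }


def looks_stuck(rows, threshold=300):
    """Return True if every sampled net load is at or above the threshold."""
    loads = [parse_status_row(r)["net_load"] for r in rows]
    return all(load >= threshold for load in loads)


# First ten fields of the three rows quoted above.
rows = [
    "Sun Apr 5 05:00:20 PDT 2009\t171.67.108.24\tclassic\tvsp21v\tkasson\tfull\tAccepting\t0.10\t300\t0",
    "Sun Apr 5 05:10:20 PDT 2009\t171.67.108.24\tclassic\tvsp21v\tkasson\tfull\tAccepting\t0.04\t301\t1",
    "Sun Apr 5 05:20:20 PDT 2009\t171.67.108.24\tclassic\tvsp21v\tkasson\tfull\tAccepting\t0.00\t300\t0",
]

for row in rows:
    parsed = parse_status_row(row)
    print(parsed["timestamp"], parsed["server"], parsed["status"], parsed["net_load"])

print("Net load stuck at or above 300:", looks_stuck(rows))
```

Note that in all three samples the server reports "Accepting" while the assumed net load column stays at about 300, which matches the symptom described in this thread.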
kiore
Posts: 921
Joined: Fri Jan 16, 2009 5:45 pm
Location: USA

Re: 171.67.108.25 and 171.67.108.24 Down?

Post by kiore »

Getting 503s on this one; the WU is stuck in the queue.
kiore
i7 7800x RTX 3070 OS= win10. AMD 3700x RTX 2080ti OS= win10 .

Team page: https://www.rationalskepticism.org/viewtopic.php?t=616