server 128.143.199.97 offline

Posted: Tue Feb 17, 2015 10:52 pm
by kasson
We've taken 128.143.199.97 offline due to a RAID failure. We're doing our best to restore function. No ETA at this time, unfortunately.

Re: server 128.143.199.97 offline

Posted: Thu Feb 19, 2015 6:43 am
by ThunderRd
OK, I got some work after I was finally assigned to a WS, but the queued-for-upload WU is still getting a 503 from the above server on both ports 80 and 8080.

Re: server 128.143.199.97 offline

Posted: Thu Feb 19, 2015 3:28 pm
by Gary480six
Is anybody else having trouble getting new work since this work server went offline?

I have several systems still running the version 6.34 SMP client on various Windows 7 systems (AMD and Intel) and they have been trying for hours to get new work. The assignment server keeps kicking back this message:
Could not authenticate Assignment Server response
Could not authenticate Assignment Server 2 response

I know this was going on a few months ago when they were working on the new assignment servers. I'm hoping it's just a short-term glitch.

Re: server 128.143.199.97 offline

Posted: Thu Feb 19, 2015 3:35 pm
by ThunderRd
Interestingly enough, I finished the WU I finally was able to get, uploaded it to .96, and now I can't get another new one to keep going.

The one that came from .97 is still in the queue to upload with no success.

Re: server 128.143.199.97 offline

Posted: Thu Feb 19, 2015 9:44 pm
by kasson
We're making some progress on this server. If all goes well, we might be up within the next day. (At one point, it was looking like a total loss, so this is very good news.)

No connection could be made because the target machine activ

Posted: Fri Feb 20, 2015 7:04 am
by 1066ad
I don't think it's on my end, but how long will it keep retrying? It ran for days, so I would hate to lose it!

(February 21, 2015 in UTC)

05:59:48:WU00:FS00:0xa3:logfile size: 804780
05:59:48:WU00:FS00:0xa3:Leaving Run
05:59:50:WU00:FS00:0xa3:- Writing 43618000 bytes of core data to disk...
06:00:31:WU00:FS00:0xa3:Done: 43617488 -> 41729446 (compressed to 95.6 percent)
06:00:32:WU00:FS00:0xa3:  ... Done.
06:00:49:WU00:FS00:0xa3:- Shutting down core
06:00:49:WU00:FS00:0xa3:
06:00:49:WU00:FS00:0xa3:Folding@home Core Shutdown: FINISHED_UNIT
06:00:52:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
06:00:52:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:00:52:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:00:52:WU01:FS00:Starting
06:00:52:WU00:FS00:Connecting to 128.143.199.97:8080
06:00:52:WU01:FS00:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" C:/Users/psg/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 704 -lifeline 4248 -checkpoint 15 -np 2
06:00:52:WU01:FS00:Started FahCore on PID 8888
06:00:53:WU01:FS00:Core PID:6680
06:00:53:WU01:FS00:FahCore 0xa4 started
06:00:53:WU01:FS00:0xa4:
06:00:53:WU01:FS00:0xa4:*------------------------------*
06:00:53:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
06:00:53:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:00:53:WU01:FS00:0xa4:
06:00:53:WU01:FS00:0xa4:Preparing to commence simulation
06:00:53:WU01:FS00:0xa4:- Looking at optimizations...
06:00:53:WU01:FS00:0xa4:- Created dyn
06:00:53:WU01:FS00:0xa4:- Files status OK
06:00:53:WU01:FS00:0xa4:- Expanded 923524 -> 1534204 (decompressed 166.1 percent)
06:00:53:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=923524 data_size=1534204, decompressed_data_size=1534204 diff=0
06:00:53:WU01:FS00:0xa4:- Digital signature verified
06:00:53:WU01:FS00:0xa4:
06:00:53:WU01:FS00:0xa4:Project: 9014 (Run 334, Clone 5, Gen 152)
06:00:53:WU01:FS00:0xa4:
06:00:53:WU01:FS00:0xa4:Assembly optimizations on if available.
06:00:53:WU01:FS00:0xa4:Entering M.D.
06:00:53:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:00:53:WU00:FS00:Connecting to 128.143.199.97:80
06:00:55:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:00:55:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:00:55:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:00:55:WU00:FS00:Connecting to 128.143.199.97:8080
06:00:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:00:56:WU00:FS00:Connecting to 128.143.199.97:80
06:00:57:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:00:59:WU01:FS00:0xa4:Mapping NT from 2 to 2 
06:01:00:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
06:01:55:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:01:55:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:01:55:WU00:FS00:Connecting to 128.143.199.97:8080
06:01:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:01:56:WU00:FS00:Connecting to 128.143.199.97:80
06:01:57:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:03:32:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:03:32:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:03:32:WU00:FS00:Connecting to 128.143.199.97:8080
06:03:33:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:03:33:WU00:FS00:Connecting to 128.143.199.97:80
06:03:34:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:06:09:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:06:09:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:06:09:WU00:FS00:Connecting to 128.143.199.97:8080
06:06:16:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:06:16:WU00:FS00:Connecting to 128.143.199.97:80
06:06:17:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:10:24:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:10:24:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:10:24:WU00:FS00:Connecting to 128.143.199.97:8080
06:10:25:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:10:25:WU00:FS00:Connecting to 128.143.199.97:80
06:10:26:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:12:45:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
06:17:15:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:17:15:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:17:15:WU00:FS00:Connecting to 128.143.199.97:8080
06:17:16:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:17:16:WU00:FS00:Connecting to 128.143.199.97:80
06:17:17:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:24:36:WU01:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
06:28:21:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:28:21:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:28:21:WU00:FS00:Connecting to 128.143.199.97:8080
06:28:22:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:28:22:WU00:FS00:Connecting to 128.143.199.97:80
06:28:23:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
06:38:54:WU01:FS00:0xa4:Completed 7500 out of 250000 steps  (3%)
06:46:17:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:7505 run:0 clone:31 gen:441 core:0xa3 unit:0x00000252fbcb017d4e29d28d72cbf625
06:46:17:WU00:FS00:Uploading 39.80MiB to 128.143.199.97
06:46:17:WU00:FS00:Connecting to 128.143.199.97:8080
06:46:18:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
06:46:18:WU00:FS00:Connecting to 128.143.199.97:80
06:46:20:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.199.97:80: No connection could be made because the target machine actively refused it.
Mod edit: Changed font type to Code tags


. . .

Re: server 128.143.199.97 offline

Posted: Fri Feb 20, 2015 7:10 am
by Joe_H
@1066ad, your report has been moved to the announcement thread about the Work Server being down and under repair. Hopefully, as Dr. Kasson has posted, the WS will be up soon. The client will keep retrying until the WU goes past its final deadline.
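
To see how the client spaces out its retries, here is a minimal Python sketch that pulls the timestamp of each "Sending unit results" line from a log excerpt like the one posted above and computes the gap between attempts. The function name retry_gaps and the abbreviated sample lines are illustrative assumptions, not part of FAHClient; only the timestamps are copied from the posted log.

```python
from datetime import datetime, timedelta


def retry_gaps(log_lines, marker="Sending unit results"):
    """Return the seconds between successive lines that contain `marker`.

    Assumes each line starts with an HH:MM:SS timestamp, as in the v7 log above.
    """
    times = []
    for line in log_lines:
        if marker not in line:
            continue  # skip core output, warnings, and so on
        times.append(datetime.strptime(line[:8], "%H:%M:%S"))

    gaps = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            delta += timedelta(days=1)  # the log rolled past midnight
        gaps.append(int(delta.total_seconds()))
    return gaps


# Timestamps of the upload attempts in the posted log; the rest of each
# line is shortened here for readability.
stamps = ["06:00:52", "06:00:55", "06:01:55", "06:03:32", "06:06:09",
          "06:10:24", "06:17:15", "06:28:21", "06:46:17"]
sample = [f"{s}:WU00:FS00:Sending unit results: id:00 state:SEND" for s in stamps]
# An unrelated line, to show that non-matching lines are ignored.
sample.insert(1, "06:00:53:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80")

print(retry_gaps(sample))  # [3, 60, 97, 157, 255, 411, 666, 1076]
```

In that log the wait grows after every failed attempt, from a few seconds up to roughly 18 minutes, so the client appears to back off rather than hammer the server while it is down.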

Re: server 128.143.199.97 offline

Posted: Fri Feb 20, 2015 3:21 pm
by 1066ad
kasson wrote:We're making some progress on this server. If all goes well, we might be up within the next day. (At one point, it was looking like a total loss, so this is very good news.)
Ah! This explains why my results aren't uploading. It has been retrying since midnight. I hope it keeps trying and doesn't quit; that's many days' worth of data.

Re: server 128.143.199.97 offline

Posted: Fri Feb 20, 2015 5:28 pm
by Ar`Kritz
Did someone say RAID isn't backup? No?

Well, RAID isn't backup.

Re: server 128.143.199.97 offline

Posted: Sat Feb 21, 2015 2:04 pm
by toTOW
Is it the only server that can serve work to v6 clients?

Since it's down, my old Linux v6 client is unable to get work:

[02:16:13] + Attempting to get work packet
[02:16:13] Passkey found
[02:16:13] - Will indicate memory of 12036 MB
[02:16:13] - Connecting to assignment server
[02:16:13] Connecting to http://assign.stanford.edu:8080/
[02:16:13] Posted data.
[02:16:13] Initial: 0000; + Could not authenticate Assignment Server response
[02:16:13] Connecting to http://assign2.stanford.edu:80/
[02:16:18] Posted data.
[02:16:18] Initial: 0000; + Could not authenticate Assignment Server 2 response
[02:16:18] + Couldn't get work instructions.
[02:16:18] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.

Re: server 128.143.199.97 offline

Posted: Sat Feb 21, 2015 2:53 pm
by 7im
toTOW wrote:Is it the only server that can serve work to v6 clients ?
Depends. What version of v6? The log didn't show it.

Re: server 128.143.199.97 offline

Posted: Sat Feb 21, 2015 3:40 pm
by sco01
Similarly on Windows:

[15:34:57] - Connecting to assignment server
[15:34:57] Connecting to http://assign.stanford.edu:8080/
[15:34:57] Posted data.
[15:34:57] Initial: 0000; + Could not authenticate Assignment Server response
[15:34:57] Connecting to http://assign2.stanford.edu:80/
[15:34:58] Posted data.
[15:34:58] Initial: 0000; + Could not authenticate Assignment Server 2 response
[15:34:58] + Couldn't get work instructions.
[15:34:58] - Attempt #58  to get work failed, and no other work to do.
Waiting before retry.

Re: server 128.143.199.97 offline

Posted: Sat Feb 21, 2015 8:01 pm
by Joe_H
toTOW wrote:Is it the only server that can serve work to v6 clients ?
A few other servers appear to be enabled for version 6 CPU folding clients; a minimum requirement of 6.34 seems common. Whether there is work to be had in those queues cannot be determined from the serverstat page. Increasingly, version 7 is the minimum required, sometimes at least 7.2.9.

In some cases deleting the machinedependent.dat file from the v6 directory and restarting the client will change the situation and result in an assignment to a different server. But that depends on what settings are being used on the client, and what is available on the servers.
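
If you would rather not delete the file outright, a small sketch like the following sets it aside under a .bak name so it can be put back later. Note that this swaps the deletion for a rename, which is a variation on the step above rather than anything the client requires. The function name, the .bak suffix, and the example directory are hypothetical, and the client should be stopped before running it.

```python
from pathlib import Path


def set_aside_machine_id(client_dir, name="machinedependent.dat"):
    """Rename the file in `client_dir` to `<name>.bak`.

    Returns the backup path, or None if the file was not there.
    """
    target = Path(client_dir) / name
    if not target.exists():
        return None
    backup = target.with_name(name + ".bak")
    # replace() overwrites an older backup on both Windows and Linux.
    target.replace(backup)
    return backup


# Example call (hypothetical path; adjust to wherever the v6 client lives):
# set_aside_machine_id("/home/user/fah6")
```

After restarting the client, the file is no longer found under its original name, which has the same effect as deleting it while keeping a copy to restore.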

Re: server 128.143.199.97 offline

Posted: Mon Feb 23, 2015 10:28 am
by toTOW
7im wrote:
toTOW wrote:Is it the only server that can serve work to v6 clients ?
Depends. What version of v6? The log didn't show it.
6.34 with the SMP flag (8 cores and 12 GB reported to the AS). I tried a few other flag combinations, but it didn't help.

Re: server 128.143.199.97 offline

Posted: Mon Feb 23, 2015 8:01 pm
by orion456
Half my SMP clients are receiving work and the others are not. Of the ones not working, half have 12 cores and the others have 4. When I try deleting machinedependent.dat as well as the work directory and queue, they still get no work. All are v6.34.