130.237.232.237 going down for maintenance

Moderators: Site Moderators, FAHC Science Team

bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

A bigadv WU just finished using v6 and again the a5 core would not shut down:

Code:

[22:25:44] Project: 6901 (Run 11, Clone 13, Gen 84)
[22:25:44] 
[22:25:44] Assembly optimizations on if available.
[22:25:44] Entering M.D.
[22:25:51] Mapping NT from 64 to 64 
[22:25:55] Completed 0 out of 250000 steps  (0%)
[22:33:19] Completed 2500 out of 250000 steps  (1%)
[22:40:55] Completed 5000 out of 250000 steps  (2%)
[22:48:51] Completed 7500 out of 250000 steps  (3%)
[22:56:36] Completed 10000 out of 250000 steps  (4%)
[23:04:14] Completed 12500 out of 250000 steps  (5%)
[23:11:56] Completed 15000 out of 250000 steps  (6%)
[23:19:37] Completed 17500 out of 250000 steps  (7%)
[23:27:21] Completed 20000 out of 250000 steps  (8%)
[23:34:57] Completed 22500 out of 250000 steps  (9%)
[23:42:31] Completed 25000 out of 250000 steps  (10%)
[23:50:11] Completed 27500 out of 250000 steps  (11%)
[23:57:49] Completed 30000 out of 250000 steps  (12%)
[00:05:21] Completed 32500 out of 250000 steps  (13%)
[00:13:00] Completed 35000 out of 250000 steps  (14%)
[00:20:32] Completed 37500 out of 250000 steps  (15%)
[00:28:10] Completed 40000 out of 250000 steps  (16%)
[00:35:37] Completed 42500 out of 250000 steps  (17%)
[00:43:15] Completed 45000 out of 250000 steps  (18%)
[00:50:40] Completed 47500 out of 250000 steps  (19%)
[00:58:17] Completed 50000 out of 250000 steps  (20%)
[01:05:57] Completed 52500 out of 250000 steps  (21%)
[01:13:28] Completed 55000 out of 250000 steps  (22%)
[01:21:08] Completed 57500 out of 250000 steps  (23%)
[01:29:02] Completed 60000 out of 250000 steps  (24%)
[01:36:34] Completed 62500 out of 250000 steps  (25%)
[01:44:13] Completed 65000 out of 250000 steps  (26%)
[01:51:56] Completed 67500 out of 250000 steps  (27%)
[01:59:37] Completed 70000 out of 250000 steps  (28%)
[02:07:13] Completed 72500 out of 250000 steps  (29%)
[02:14:52] Completed 75000 out of 250000 steps  (30%)
[02:22:39] Completed 77500 out of 250000 steps  (31%)
[02:30:24] Completed 80000 out of 250000 steps  (32%)
[02:37:59] Completed 82500 out of 250000 steps  (33%)
[02:45:35] Completed 85000 out of 250000 steps  (34%)
[02:53:13] Completed 87500 out of 250000 steps  (35%)
[03:00:57] Completed 90000 out of 250000 steps  (36%)
[03:08:36] Completed 92500 out of 250000 steps  (37%)
[03:16:18] Completed 95000 out of 250000 steps  (38%)
[03:24:03] Completed 97500 out of 250000 steps  (39%)
[03:31:41] Completed 100000 out of 250000 steps  (40%)
[03:39:19] Completed 102500 out of 250000 steps  (41%)
[03:46:54] Completed 105000 out of 250000 steps  (42%)
[03:54:35] Completed 107500 out of 250000 steps  (43%)
[04:02:15] Completed 110000 out of 250000 steps  (44%)
[04:09:52] Completed 112500 out of 250000 steps  (45%)
[04:17:32] Completed 115000 out of 250000 steps  (46%)
[04:24:43] - Autosending finished units... [April 21 04:24:43 UTC]
[04:24:43] Trying to send all finished work units
[04:24:43] + No unsent completed units remaining.
[04:24:43] - Autosend completed
[04:25:33] Completed 117500 out of 250000 steps  (47%)
[04:33:26] Completed 120000 out of 250000 steps  (48%)
[04:41:02] Completed 122500 out of 250000 steps  (49%)
[04:48:51] Completed 125000 out of 250000 steps  (50%)
[04:56:25] Completed 127500 out of 250000 steps  (51%)
[05:04:13] Completed 130000 out of 250000 steps  (52%)
[05:11:48] Completed 132500 out of 250000 steps  (53%)
[05:19:27] Completed 135000 out of 250000 steps  (54%)
[05:27:01] Completed 137500 out of 250000 steps  (55%)
[05:34:36] Completed 140000 out of 250000 steps  (56%)
[05:42:09] Completed 142500 out of 250000 steps  (57%)
[05:49:43] Completed 145000 out of 250000 steps  (58%)
[05:57:14] Completed 147500 out of 250000 steps  (59%)
[06:04:47] Completed 150000 out of 250000 steps  (60%)
[06:12:23] Completed 152500 out of 250000 steps  (61%)
[06:20:13] Completed 155000 out of 250000 steps  (62%)
[06:27:51] Completed 157500 out of 250000 steps  (63%)
[06:35:24] Completed 160000 out of 250000 steps  (64%)
[06:43:00] Completed 162500 out of 250000 steps  (65%)
[06:50:53] Completed 165000 out of 250000 steps  (66%)
[06:58:38] Completed 167500 out of 250000 steps  (67%)
[07:06:16] Completed 170000 out of 250000 steps  (68%)
[07:13:47] Completed 172500 out of 250000 steps  (69%)
[07:21:20] Completed 175000 out of 250000 steps  (70%)
[07:29:02] Completed 177500 out of 250000 steps  (71%)
[07:36:43] Completed 180000 out of 250000 steps  (72%)
[07:44:18] Completed 182500 out of 250000 steps  (73%)
[07:52:00] Completed 185000 out of 250000 steps  (74%)
[08:00:03] Completed 187500 out of 250000 steps  (75%)
[08:07:43] Completed 190000 out of 250000 steps  (76%)
[08:15:19] Completed 192500 out of 250000 steps  (77%)
[08:22:52] Completed 195000 out of 250000 steps  (78%)
[08:30:28] Completed 197500 out of 250000 steps  (79%)
[08:38:03] Completed 200000 out of 250000 steps  (80%)
[08:45:47] Completed 202500 out of 250000 steps  (81%)
[08:53:22] Completed 205000 out of 250000 steps  (82%)
[09:01:00] Completed 207500 out of 250000 steps  (83%)
[09:08:33] Completed 210000 out of 250000 steps  (84%)
[09:16:07] Completed 212500 out of 250000 steps  (85%)
[09:23:51] Completed 215000 out of 250000 steps  (86%)
[09:31:26] Completed 217500 out of 250000 steps  (87%)
[09:39:04] Completed 220000 out of 250000 steps  (88%)
[09:46:51] Completed 222500 out of 250000 steps  (89%)
[09:54:24] Completed 225000 out of 250000 steps  (90%)
[10:02:13] Completed 227500 out of 250000 steps  (91%)
[10:09:48] Completed 230000 out of 250000 steps  (92%)
[10:17:24] Completed 232500 out of 250000 steps  (93%)
[10:24:43] - Autosending finished units... [April 21 10:24:43 UTC]
[10:24:43] Trying to send all finished work units
[10:24:43] + No unsent completed units remaining.
[10:24:43] - Autosend completed
[10:25:05] Completed 235000 out of 250000 steps  (94%)
[10:32:46] Completed 237500 out of 250000 steps  (95%)
[10:40:36] Completed 240000 out of 250000 steps  (96%)
[10:48:12] Completed 242500 out of 250000 steps  (97%)
[10:55:46] Completed 245000 out of 250000 steps  (98%)
[11:03:28] Completed 247500 out of 250000 steps  (99%)
[11:11:04] Completed 250000 out of 250000 steps  (100%)
[11:11:18] DynamicWrapper: Finished Work Unit: sleep=10000
[11:11:28] 
[11:11:28] Finished Work Unit:
[11:11:28] - Reading up to 52713120 from "work/wudata_03.trr": Read 52713120
[11:11:28] trr file hash check passed.
[11:11:28] - Reading up to 47103644 from "work/wudata_03.xtc": Read 47103644
[11:11:29] xtc file hash check passed.
[11:11:29] edr file hash check passed.
[11:11:29] logfile size: 195676
[11:11:29] Leaving Run
[11:11:32] - Writing 100182388 bytes of core data to disk...
[11:11:34]   ... Done.

I had to kill the a5 core before the log would continue, and lost the WU:


[11:21:41] - Shutting down core
[11:21:41] 
[11:21:41] Folding@home Core Shutdown: FINISHED_UNIT
[11:21:53] CoreStatus = 89 (137)
[11:21:53] Client-core communications error: ERROR 0x89
[11:21:53] Deleting current work unit & continuing...
[11:22:23] ***** Got an Activate signal (2)
[11:22:24] - Warning: Could not delete all work unit files (3): Core file absent
[11:22:24] Trying to send all finished work units
[11:22:24] + No unsent completed units remaining.
[11:22:24] - Preparing to get new work unit...
[11:22:24] Cleaning up work directory
[11:22:24] + Attempting to get work packet
[11:22:24] Passkey found
[11:22:24] - Will indicate memory of 32170 MB
[11:22:24] - Connecting to assignment server
[11:22:24] Connecting to http://assign.stanford.edu:8080/
[11:22:24] Killing all core threads

Folding@Home Client Shutdown.
So my only choices at this point are to do regular SMP or go back to Ubuntu 11.10, which appears to be working fine on another bigadv PC. For the next few hours I'll do some regular units in case there's something you would like me to try. After that I'll load 11.10.

Edit: the a3 core is shutting down and the WU uploading as expected.
It would appear that Ubuntu 12.04 is not ready for prime time yet, even though the release is expected soon. I got an error that said something like "apport-gtk crashed" when it was trying to report an error concerning samba.
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

Update:

Removing Ubuntu 12.04 and installing 11.10 appears to have cleared the problem of the core not shutting down.

Code:

[11:44:30] Leaving Run
[11:44:33] - Writing 100104588 bytes of core data to disk...
[11:44:35]   ... Done.
[11:44:48] - Shutting down core
[11:44:48] 
[11:44:48] Folding@home Core Shutdown: FINISHED_UNIT
[11:44:49] CoreStatus = 64 (100)
[11:44:49] Unit 2 finished with 91 percent of time to deadline remaining.
[11:44:49] Updated performance fraction: 0.941052
[11:44:49] Sending work to server
[11:44:49] Project: 6901 (Run 5, Clone 4, Gen 191)


[11:44:49] + Attempting to send results [April 22 11:44:49 UTC]
[11:44:49] - Reading file work/wuresults_02.dat from core
[11:44:49]   (Read 100104588 bytes from disk)
[11:44:49] Connecting to http://130.237.232.237:8080/
However, after going through the motions of uploading, the client never received acknowledgement from server .237, and after almost an hour of waiting I stopped the client and switched to a regular SMP unit. Upon restart the client sent the results of the 6901 again, but after the usual time there was no completion message. The server does appear to have a higher than usual net load, so I could be the victim of coincidence. :e?:
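The behaviour described above (upload completes, the client blocks waiting for the server's acknowledgement, and the unit is re-sent later because it was never confirmed) boils down to a send-with-ack-deadline pattern. A minimal sketch in Python, assuming a plain HTTP POST with a placeholder URL and payload rather than the real F@H wire protocol:

```python
import urllib.request
import urllib.error

def send_results(url, payload, ack_timeout=60.0):
    """POST a finished unit and wait up to ack_timeout seconds for the
    server's acknowledgement. Returns True only on an HTTP 200 in time;
    otherwise the caller should keep the unit queued and retry later."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=ack_timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False  # no acknowledgement: keep the unit in the queue
```

On a server that accepts the upload but never answers, the call only returns once the timeout expires, which is exactly the "sits there waiting" symptom; the v6 client simply waits far longer before giving up.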
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: 130.237.232.237 going down for maintenance

Post by kasson »

Hmm--does your client try port 80? Does that work?
I'm letting the work server developer know about this. At the moment, I'm not finding enough information in our logs to debug on my end.
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

No, the v6 client did not try port 80 on .237, only 8080. It has now tried 3 or 4 times, always with the same result: it just sits there waiting for acknowledgement.

The a3 WUs are uploading fine to other servers, also port 8080.

My browser is working fine so no problem on port 80.
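A quick way to answer the port question from the folding box is a generic TCP probe; nothing here is F@H-specific, it just tries to open a connection on the two ports the v6 client uses (8080, then 80):

```python
import socket

def reachable(host, port, timeout=10.0):
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (run on the folding box):
#   for port in (8080, 80):
#       print(port, reachable("130.237.232.237", port))
```

If both probes succeed, the problem is above TCP, i.e. the server accepts the connection and the upload but never sends its acknowledgement, rather than a connectivity issue.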
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

There appears to be a problem with the server. I have sent 6901 (13, 7, 150) 3 times this morning and the server never acknowledges it. I know it is sending because I can watch my bandwidth usage. After the send completes, the assignment server never sends another WU, and the connection to the AS appears to be very slow, just short bursts of 92 bytes/s.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

Thanks for the confirmation, Grandpa; I have little hair left to pull out. :lol:
Grandpa_01

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

I switched the computer over to SMP and had no problem getting a WU, so the problem appears to be on the server end. I have now sent the above-mentioned WU 4 times with no acknowledgement from the server. :eo
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

I've switched over to a3 as well, but when a unit finishes I have to stop the client and send it manually using ./fah6 -send NN, otherwise the client hangs and nothing gets sent. IIRC it depends on the position of the WU in the queue.
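For anyone scripting around the same workaround: with the client stopped, the finished slots can be pushed one by one. Below is a hypothetical wrapper; the -send NN flag is the one mentioned above, while the ./fah6 path and the injectable runner are my own assumptions (the latter so the loop can be tried without a live client):

```python
import subprocess

def send_slots(slots, runner=subprocess.run):
    """Call `./fah6 -send NN` for each finished queue slot, with the client
    stopped first. `runner` defaults to subprocess.run but is injectable."""
    sent = {}
    for slot in slots:
        proc = runner(["./fah6", "-send", "%02d" % slot])  # NN is zero-padded
        sent[slot] = (proc.returncode == 0)
    return sent
```

Slot numbers come from the client's queue listing; sending in queue order avoids the hang I'm seeing when the client picks the slots itself.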
overdoze
Posts: 4
Joined: Sat May 08, 2010 3:17 pm

Re: 130.237.232.237 going down for maintenance

Post by overdoze »

I too have had 2 WUs in the sending queue for several hours.
Will try switching to a3.
Grandpa_01

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

Kasson, will you shut the server down? I have to manually shut FAH down after it sends a completed SMP WU to the SMP collection server, because FAH re-sends the bigadv WU it is holding in the queue and then just sits waiting for a response from the CS saying it has received the bigadv WU. Once I shut it down, it downloads another SMP WU upon restart.
schro
Posts: 1
Joined: Tue Feb 14, 2012 6:41 pm

Re: 130.237.232.237 going down for maintenance

Post by schro »

My 4P box has been trying to connect to .237 for a WU since early this morning. I think it at least posted its last WU back successfully, though.
mflanaga
Posts: 10
Joined: Mon Jun 06, 2011 4:35 pm

Re: 130.237.232.237 going down for maintenance

Post by mflanaga »

Same problem here... can't get any response from 130.237.232.237 all day.
Prelude514
Posts: 19
Joined: Sun Feb 05, 2012 5:19 am
Location: Montreal, Canada

Re: 130.237.232.237 going down for maintenance

Post by Prelude514 »

Same issue here. I've uploaded a 6903 weighing 227MB 3 times now, with no success. Once the upload is complete, nothing happens. I have a machine idling now.
sbinh
Posts: 14
Joined: Mon Feb 04, 2008 4:28 am

Re: 130.237.232.237 going down for maintenance

Post by sbinh »

Grandpa_01 wrote:There appears to be a problem with the server. I have sent 6901 (13, 7, 150) 3 times this morning and the server never acknowledges it. I know it is sending because I can watch my bandwidth usage. After the send completes, the assignment server never sends another WU, and the connection to the AS appears to be very slow, just short bursts of 92 bytes/s.

Same here ... having an issue with a 6901 on my 2P rig right now ...

Code:

[19:09:21] Sending work to server
[19:09:21] Project: 6901 (Run 9, Clone 3, Gen 255)


[19:09:21] + Attempting to send results [April 22 19:09:21 UTC]
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:8080)
[19:09:21] + Retrying using alternative port
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:80)
[19:09:21] - Error: Could not transmit unit 01 (completed April 22) to work server.
[19:09:21]   Keeping unit 01 in queue.
[19:09:21] Project: 6901 (Run 9, Clone 3, Gen 255)


[19:09:21] + Attempting to send results [April 22 19:09:21 UTC]
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:8080)
[19:09:21] + Retrying using alternative port
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:80)
[19:09:21] - Error: Could not transmit unit 01 (completed April 22) to work server.
[19:09:21]   Keeping unit 01 in queue.
[19:09:21] - Preparing to get new work unit...
[19:09:21] Cleaning up work directory
[19:09:21] + Attempting to get work packet
[19:09:21] Passkey found
[19:09:21] - Connecting to assignment server
[19:09:21] - Successful: assigned to (130.237.232.237).
[19:09:21] + News From Folding@Home: Welcome to Folding@Home
[19:09:21] Loaded queue successfully.

Nathan_P had an issue with 6904 WUs and I had an issue with 6903 WUs ... see here: viewtopic.php?f=19&p=213277#p213277.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: 130.237.232.237 going down for maintenance

Post by Nathan_P »

Just had issues uploading a 6904. It tried to send for 45 minutes but I got the "couldn't get a http response from server" message; then it appeared to upload, but it crashed the client because in the meantime it had managed to download a 6901. I restarted the client and it has just re-sent the same WU, spending 90 minutes uploading it only to get the following in the log:

Code:

 [17:54:11] Project: 6904 (Run 1, Clone 18, Gen 105)
[17:54:11] + Processing work unit


[17:54:11] Core required: FahCore_a5.exe
[17:54:11] + Attempting to send results [April 22 17:54:11 UTC]
[17:54:11] Core found.
[17:54:11] Working on queue slot 00 [April 22 17:54:11 UTC]
[17:54:11] + Working ...
[17:54:11] 
[17:54:11] *------------------------------*
[17:54:11] Folding@Home Gromacs SMP Core
[17:54:11] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[17:54:11] 
[17:54:11] Preparing to commence simulation
[17:54:11] - Looking at optimizations...
[17:54:11] - Created dyn
[17:54:11] - Files status OK
[17:54:13] - Expanded 24876005 -> 30796292 (decompressed 123.7 percent)
[17:54:13] Called DecompressByteArray: compressed_data_size=24876005 data_size=30796292, decompressed_data_size=30796292 diff=0
[17:54:13] - Digital signature verified
[17:54:13] 
[17:54:13] Project: 6901 (Run 16, Clone 6, Gen 195)
[17:54:13] 
[17:54:13] Assembly optimizations on if available.
[17:54:13] Entering M.D.
[17:54:20] Mapping NT from 24 to 24 
[17:54:22] Completed 0 out of 250000 steps  (0%)
[18:07:23] Completed 2500 out of 250000 steps  (1%)
[18:10:50] Mapping NT from 24 to 24 
[18:11:01] Resuming from checkpoint
[18:11:01] Verified work/wudata_00.log
[18:11:02] Verified work/wudata_00.trr
[18:11:02] Verified work/wudata_00.xtc
[18:11:02] Verified work/wudata_00.edr
[18:11:03] Completed 2885 out of 250000 steps  (1%)
[18:22:16] Completed 5000 out of 250000 steps  (2%)
[18:35:22] Completed 7500 out of 250000 steps  (3%)
[18:48:27] Completed 10000 out of 250000 steps  (4%)
[19:01:33] Completed 12500 out of 250000 steps  (5%)
[19:14:39] Completed 15000 out of 250000 steps  (6%)
[19:26:47] - Couldn't send HTTP request to server
[19:26:47] + Could not connect to Work Server (results)
[19:26:47]     (130.237.232.237:8080)
[19:26:47] + Retrying using alternative port
[19:26:50] - Couldn't send HTTP request to server
[19:26:50] + Could not connect to Work Server (results)
[19:26:50]     (130.237.232.237:80)
[19:26:50] - Error: Could not transmit unit 09 (completed April 22) to work server.
[19:26:50]   Keeping unit 09 in queue.
[19:27:44] Completed 17500 out of 250000 steps  (7%)
At least I have work for the next 24 hrs to chew on. My other rig has spent the day doing standard SMP; I was going to switch it back, but I think I will leave it for now.
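As an aside, the "decompressed 123.7 percent" figure in logs like the one above is just the expanded size divided by the compressed size; with this log's numbers the exact value is 123.799..., which suggests the core truncates rather than rounds:

```python
# Sizes taken from the DecompressByteArray line in the log above.
compressed, expanded = 24876005, 30796292
ratio = expanded / compressed * 100                          # 123.799...
print("decompressed %.1f percent" % (int(ratio * 10) / 10))  # prints "decompressed 123.7 percent"
```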