130.237.232.237 going down for maintenance

Moderators: Site Moderators, FAHC Science Team

bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

A bigadv WU just finished using v6 and again the a5 core would not shut down:

Code:

[22:25:44] Project: 6901 (Run 11, Clone 13, Gen 84)
[22:25:44] 
[22:25:44] Assembly optimizations on if available.
[22:25:44] Entering M.D.
[22:25:51] Mapping NT from 64 to 64 
[22:25:55] Completed 0 out of 250000 steps  (0%)
[22:33:19] Completed 2500 out of 250000 steps  (1%)
[22:40:55] Completed 5000 out of 250000 steps  (2%)
[22:48:51] Completed 7500 out of 250000 steps  (3%)
[22:56:36] Completed 10000 out of 250000 steps  (4%)
[23:04:14] Completed 12500 out of 250000 steps  (5%)
[23:11:56] Completed 15000 out of 250000 steps  (6%)
[23:19:37] Completed 17500 out of 250000 steps  (7%)
[23:27:21] Completed 20000 out of 250000 steps  (8%)
[23:34:57] Completed 22500 out of 250000 steps  (9%)
[23:42:31] Completed 25000 out of 250000 steps  (10%)
[23:50:11] Completed 27500 out of 250000 steps  (11%)
[23:57:49] Completed 30000 out of 250000 steps  (12%)
[00:05:21] Completed 32500 out of 250000 steps  (13%)
[00:13:00] Completed 35000 out of 250000 steps  (14%)
[00:20:32] Completed 37500 out of 250000 steps  (15%)
[00:28:10] Completed 40000 out of 250000 steps  (16%)
[00:35:37] Completed 42500 out of 250000 steps  (17%)
[00:43:15] Completed 45000 out of 250000 steps  (18%)
[00:50:40] Completed 47500 out of 250000 steps  (19%)
[00:58:17] Completed 50000 out of 250000 steps  (20%)
[01:05:57] Completed 52500 out of 250000 steps  (21%)
[01:13:28] Completed 55000 out of 250000 steps  (22%)
[01:21:08] Completed 57500 out of 250000 steps  (23%)
[01:29:02] Completed 60000 out of 250000 steps  (24%)
[01:36:34] Completed 62500 out of 250000 steps  (25%)
[01:44:13] Completed 65000 out of 250000 steps  (26%)
[01:51:56] Completed 67500 out of 250000 steps  (27%)
[01:59:37] Completed 70000 out of 250000 steps  (28%)
[02:07:13] Completed 72500 out of 250000 steps  (29%)
[02:14:52] Completed 75000 out of 250000 steps  (30%)
[02:22:39] Completed 77500 out of 250000 steps  (31%)
[02:30:24] Completed 80000 out of 250000 steps  (32%)
[02:37:59] Completed 82500 out of 250000 steps  (33%)
[02:45:35] Completed 85000 out of 250000 steps  (34%)
[02:53:13] Completed 87500 out of 250000 steps  (35%)
[03:00:57] Completed 90000 out of 250000 steps  (36%)
[03:08:36] Completed 92500 out of 250000 steps  (37%)
[03:16:18] Completed 95000 out of 250000 steps  (38%)
[03:24:03] Completed 97500 out of 250000 steps  (39%)
[03:31:41] Completed 100000 out of 250000 steps  (40%)
[03:39:19] Completed 102500 out of 250000 steps  (41%)
[03:46:54] Completed 105000 out of 250000 steps  (42%)
[03:54:35] Completed 107500 out of 250000 steps  (43%)
[04:02:15] Completed 110000 out of 250000 steps  (44%)
[04:09:52] Completed 112500 out of 250000 steps  (45%)
[04:17:32] Completed 115000 out of 250000 steps  (46%)
[04:24:43] - Autosending finished units... [April 21 04:24:43 UTC]
[04:24:43] Trying to send all finished work units
[04:24:43] + No unsent completed units remaining.
[04:24:43] - Autosend completed
[04:25:33] Completed 117500 out of 250000 steps  (47%)
[04:33:26] Completed 120000 out of 250000 steps  (48%)
[04:41:02] Completed 122500 out of 250000 steps  (49%)
[04:48:51] Completed 125000 out of 250000 steps  (50%)
[04:56:25] Completed 127500 out of 250000 steps  (51%)
[05:04:13] Completed 130000 out of 250000 steps  (52%)
[05:11:48] Completed 132500 out of 250000 steps  (53%)
[05:19:27] Completed 135000 out of 250000 steps  (54%)
[05:27:01] Completed 137500 out of 250000 steps  (55%)
[05:34:36] Completed 140000 out of 250000 steps  (56%)
[05:42:09] Completed 142500 out of 250000 steps  (57%)
[05:49:43] Completed 145000 out of 250000 steps  (58%)
[05:57:14] Completed 147500 out of 250000 steps  (59%)
[06:04:47] Completed 150000 out of 250000 steps  (60%)
[06:12:23] Completed 152500 out of 250000 steps  (61%)
[06:20:13] Completed 155000 out of 250000 steps  (62%)
[06:27:51] Completed 157500 out of 250000 steps  (63%)
[06:35:24] Completed 160000 out of 250000 steps  (64%)
[06:43:00] Completed 162500 out of 250000 steps  (65%)
[06:50:53] Completed 165000 out of 250000 steps  (66%)
[06:58:38] Completed 167500 out of 250000 steps  (67%)
[07:06:16] Completed 170000 out of 250000 steps  (68%)
[07:13:47] Completed 172500 out of 250000 steps  (69%)
[07:21:20] Completed 175000 out of 250000 steps  (70%)
[07:29:02] Completed 177500 out of 250000 steps  (71%)
[07:36:43] Completed 180000 out of 250000 steps  (72%)
[07:44:18] Completed 182500 out of 250000 steps  (73%)
[07:52:00] Completed 185000 out of 250000 steps  (74%)
[08:00:03] Completed 187500 out of 250000 steps  (75%)
[08:07:43] Completed 190000 out of 250000 steps  (76%)
[08:15:19] Completed 192500 out of 250000 steps  (77%)
[08:22:52] Completed 195000 out of 250000 steps  (78%)
[08:30:28] Completed 197500 out of 250000 steps  (79%)
[08:38:03] Completed 200000 out of 250000 steps  (80%)
[08:45:47] Completed 202500 out of 250000 steps  (81%)
[08:53:22] Completed 205000 out of 250000 steps  (82%)
[09:01:00] Completed 207500 out of 250000 steps  (83%)
[09:08:33] Completed 210000 out of 250000 steps  (84%)
[09:16:07] Completed 212500 out of 250000 steps  (85%)
[09:23:51] Completed 215000 out of 250000 steps  (86%)
[09:31:26] Completed 217500 out of 250000 steps  (87%)
[09:39:04] Completed 220000 out of 250000 steps  (88%)
[09:46:51] Completed 222500 out of 250000 steps  (89%)
[09:54:24] Completed 225000 out of 250000 steps  (90%)
[10:02:13] Completed 227500 out of 250000 steps  (91%)
[10:09:48] Completed 230000 out of 250000 steps  (92%)
[10:17:24] Completed 232500 out of 250000 steps  (93%)
[10:24:43] - Autosending finished units... [April 21 10:24:43 UTC]
[10:24:43] Trying to send all finished work units
[10:24:43] + No unsent completed units remaining.
[10:24:43] - Autosend completed
[10:25:05] Completed 235000 out of 250000 steps  (94%)
[10:32:46] Completed 237500 out of 250000 steps  (95%)
[10:40:36] Completed 240000 out of 250000 steps  (96%)
[10:48:12] Completed 242500 out of 250000 steps  (97%)
[10:55:46] Completed 245000 out of 250000 steps  (98%)
[11:03:28] Completed 247500 out of 250000 steps  (99%)
[11:11:04] Completed 250000 out of 250000 steps  (100%)
[11:11:18] DynamicWrapper: Finished Work Unit: sleep=10000
[11:11:28] 
[11:11:28] Finished Work Unit:
[11:11:28] - Reading up to 52713120 from "work/wudata_03.trr": Read 52713120
[11:11:28] trr file hash check passed.
[11:11:28] - Reading up to 47103644 from "work/wudata_03.xtc": Read 47103644
[11:11:29] xtc file hash check passed.
[11:11:29] edr file hash check passed.
[11:11:29] logfile size: 195676
[11:11:29] Leaving Run
[11:11:32] - Writing 100182388 bytes of core data to disk...
[11:11:34]   ... Done.

I had to kill the a5 core before the log would continue, and lost the WU:


[11:21:41] - Shutting down core
[11:21:41] 
[11:21:41] Folding@home Core Shutdown: FINISHED_UNIT
[11:21:53] CoreStatus = 89 (137)
[11:21:53] Client-core communications error: ERROR 0x89
[11:21:53] Deleting current work unit & continuing...
[11:22:23] ***** Got an Activate signal (2)
[11:22:24] - Warning: Could not delete all work unit files (3): Core file absent
[11:22:24] Trying to send all finished work units
[11:22:24] + No unsent completed units remaining.
[11:22:24] - Preparing to get new work unit...
[11:22:24] Cleaning up work directory
[11:22:24] + Attempting to get work packet
[11:22:24] Passkey found
[11:22:24] - Will indicate memory of 32170 MB
[11:22:24] - Connecting to assignment server
[11:22:24] Connecting to http://assign.stanford.edu:8080/
[11:22:24] Killing all core threads

Folding@Home Client Shutdown.
So my only choices at this point are to do regular SMP or go back to Ubuntu 11.10, which appears to be working fine on another bigadv PC. For the next few hours I'll do some regular units in case there's something you would like me to try. After that I'll load 11.10.

Edit: the a3 core is shutting down and the WU uploading as expected.
It would appear that Ubuntu 12.04 is not ready for prime time yet, even though the release is expected soon. I got an error that said something like "apport-gtk crashed" when it was trying to report an error concerning samba.
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

Update:

Removing Ubuntu 12.04 and installing 11.10 appears to have cleared the problem of the core not shutting down.

Code:

[11:44:30] Leaving Run
[11:44:33] - Writing 100104588 bytes of core data to disk...
[11:44:35]   ... Done.
[11:44:48] - Shutting down core
[11:44:48] 
[11:44:48] Folding@home Core Shutdown: FINISHED_UNIT
[11:44:49] CoreStatus = 64 (100)
[11:44:49] Unit 2 finished with 91 percent of time to deadline remaining.
[11:44:49] Updated performance fraction: 0.941052
[11:44:49] Sending work to server
[11:44:49] Project: 6901 (Run 5, Clone 4, Gen 191)


[11:44:49] + Attempting to send results [April 22 11:44:49 UTC]
[11:44:49] - Reading file work/wuresults_02.dat from core
[11:44:49]   (Read 100104588 bytes from disk)
[11:44:49] Connecting to http://130.237.232.237:8080/
However, after going through the motions of uploading, the client never received acknowledgement from server .237, and after almost an hour of waiting I stopped the client and switched to a regular SMP unit. Upon restart the client sent the results of the 6901 again, but after the usual time there was no completion message. The server does appear to have a higher than usual net load, so I could be the victim of coincidence. :e?:
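The behaviour described above (upload completes, the client blocks waiting for the server's acknowledgement, and the unit is re-sent later because it was never confirmed) boils down to a send-with-ack-deadline pattern. A minimal sketch in Python, assuming a plain HTTP POST with a placeholder URL and payload rather than the real F@H wire protocol:

```python
import urllib.request
import urllib.error

def send_results(url, payload, ack_timeout=60.0):
    """POST a finished unit and wait up to ack_timeout seconds for the
    server's acknowledgement. Returns True only on an HTTP 200 in time;
    otherwise the caller should keep the unit queued and retry later."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=ack_timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False  # no acknowledgement: keep the unit in the queue
```

On a server that accepts the upload but never answers, the call only returns once the timeout expires, which is exactly the "sits there waiting" symptom; the v6 client simply waits far longer before giving up.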
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: 130.237.232.237 going down for maintenance

Post by kasson »

Hmm--does your client try port 80? Does that work?
I'm letting the work server developer know about this. At the moment, I'm not finding enough information in our logs to debug on my end.
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

No, the v6 client did not try port 80 on .237, only 8080. It has now tried 3 or 4 times, always with the same result: it just sits there waiting for acknowledgement.

The a3 WUs are uploading fine to other servers, also port 8080.

My browser is working fine so no problem on port 80.
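A quick way to answer the port question from the folding box is a generic TCP probe; nothing here is F@H-specific, it just tries to open a connection on the two ports the v6 client uses (8080, then 80):

```python
import socket

def reachable(host, port, timeout=10.0):
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (run on the folding box):
#   for port in (8080, 80):
#       print(port, reachable("130.237.232.237", port))
```

If both probes succeed, the problem is above TCP, i.e. the server accepts the connection and the upload but never sends its acknowledgement, rather than a connectivity issue.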
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

There appears to be a problem with the server. I have sent 6901 (13, 7, 150) 3 times this morning and the server never acknowledges it. I know it is sending because I can watch my bandwidth usage. After the send completes, the assignment server never sends another WU, and the connection to the AS appears to be very slow, just short bursts of 92 bytes/s.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

Thanks for the confirmation, Grandpa; I have little hair left to pull out. :lol:
Grandpa_01

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

I switched the computer over to SMP and had no problem getting a WU, so the problem appears to be on the server end. I have now sent the above-mentioned WU 4 times with no acknowledgement from the server. :eo
bollix47

Re: 130.237.232.237 going down for maintenance

Post by bollix47 »

I've switched over to a3 as well, but when a unit finishes I have to stop the client and send it manually using ./fah6 -send NN, otherwise the client hangs and nothing gets sent. IIRC it depends on the position of the WU in the queue.
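For anyone scripting around the same workaround: with the client stopped, the finished slots can be pushed one by one. Below is a hypothetical wrapper; the -send NN flag is the one mentioned above, while the ./fah6 path and the injectable runner are my own assumptions (the latter so the loop can be tried without a live client):

```python
import subprocess

def send_slots(slots, runner=subprocess.run):
    """Call `./fah6 -send NN` for each finished queue slot, with the client
    stopped first. `runner` defaults to subprocess.run but is injectable."""
    sent = {}
    for slot in slots:
        proc = runner(["./fah6", "-send", "%02d" % slot])  # NN is zero-padded
        sent[slot] = (proc.returncode == 0)
    return sent
```

Slot numbers come from the client's queue listing; sending in queue order avoids the hang I'm seeing when the client picks the slots itself.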
overdoze
Posts: 4
Joined: Sat May 08, 2010 3:17 pm

Re: 130.237.232.237 going down for maintenance

Post by overdoze »

I too have had 2 WUs in the sending queue for several hours.
Will try switching to a3.
Grandpa_01

Re: 130.237.232.237 going down for maintenance

Post by Grandpa_01 »

Kasson, will you shut the server down? I have to manually shut FAH down after it sends a completed SMP WU to the SMP collection server, because FAH re-sends the bigadv WU it is holding in the queue and then just sits waiting for a response from the CS saying it has received the bigadv WU. Once I shut it down, it downloads another SMP WU upon restart.
schro
Posts: 1
Joined: Tue Feb 14, 2012 6:41 pm

Re: 130.237.232.237 going down for maintenance

Post by schro »

My 4P box has been trying to connect to .237 for a WU since early this morning. I think it at least posted its last WU back successfully, though.
mflanaga
Posts: 10
Joined: Mon Jun 06, 2011 4:35 pm

Re: 130.237.232.237 going down for maintenance

Post by mflanaga »

Same problem here... can't get any response from 130.237.232.237 all day.
Prelude514
Posts: 19
Joined: Sun Feb 05, 2012 5:19 am
Location: Montreal, Canada

Re: 130.237.232.237 going down for maintenance

Post by Prelude514 »

Same issue here. I've uploaded a 6903 weighing 227MB 3 times now, with no success. Once the upload is complete, nothing happens. I have a machine idling now.
sbinh
Posts: 14
Joined: Mon Feb 04, 2008 4:28 am

Re: 130.237.232.237 going down for maintenance

Post by sbinh »

Grandpa_01 wrote:There appears to be a problem with the server. I have sent 6901 (13, 7, 150) 3 times this morning and the server never acknowledges it. I know it is sending because I can watch my bandwidth usage. After the send completes, the assignment server never sends another WU, and the connection to the AS appears to be very slow, just short bursts of 92 bytes/s.

Same here ... having an issue with a 6901 on my 2P rig right now ...

Code:

[19:09:21] Sending work to server
[19:09:21] Project: 6901 (Run 9, Clone 3, Gen 255)


[19:09:21] + Attempting to send results [April 22 19:09:21 UTC]
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:8080)
[19:09:21] + Retrying using alternative port
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:80)
[19:09:21] - Error: Could not transmit unit 01 (completed April 22) to work server.
[19:09:21]   Keeping unit 01 in queue.
[19:09:21] Project: 6901 (Run 9, Clone 3, Gen 255)


[19:09:21] + Attempting to send results [April 22 19:09:21 UTC]
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:8080)
[19:09:21] + Retrying using alternative port
[19:09:21] - Couldn't send HTTP request to server
[19:09:21] + Could not connect to Work Server (results)
[19:09:21]     (130.237.232.237:80)
[19:09:21] - Error: Could not transmit unit 01 (completed April 22) to work server.
[19:09:21]   Keeping unit 01 in queue.
[19:09:21] - Preparing to get new work unit...
[19:09:21] Cleaning up work directory
[19:09:21] + Attempting to get work packet
[19:09:21] Passkey found
[19:09:21] - Connecting to assignment server
[19:09:21] - Successful: assigned to (130.237.232.237).
[19:09:21] + News From Folding@Home: Welcome to Folding@Home
[19:09:21] Loaded queue successfully.

Nathan_P had an issue with 6904 WUs and I had an issue with 6903 WUs ... see here: viewtopic.php?f=19&p=213277#p213277.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: 130.237.232.237 going down for maintenance

Post by Nathan_P »

Just had issues uploading a 6904. It tried to send for 45 minutes but I got the "couldn't get a http response from server" message; then it appeared to upload, but it crashed the client because in the meantime it had managed to download a 6901. I restarted the client and it has just re-sent the same WU, spending 90 minutes uploading it only to get the following in the log:

Code:

 [17:54:11] Project: 6904 (Run 1, Clone 18, Gen 105)
[17:54:11] + Processing work unit


[17:54:11] Core required: FahCore_a5.exe
[17:54:11] + Attempting to send results [April 22 17:54:11 UTC]
[17:54:11] Core found.
[17:54:11] Working on queue slot 00 [April 22 17:54:11 UTC]
[17:54:11] + Working ...
[17:54:11] 
[17:54:11] *------------------------------*
[17:54:11] Folding@Home Gromacs SMP Core
[17:54:11] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[17:54:11] 
[17:54:11] Preparing to commence simulation
[17:54:11] - Looking at optimizations...
[17:54:11] - Created dyn
[17:54:11] - Files status OK
[17:54:13] - Expanded 24876005 -> 30796292 (decompressed 123.7 percent)
[17:54:13] Called DecompressByteArray: compressed_data_size=24876005 data_size=30796292, decompressed_data_size=30796292 diff=0
[17:54:13] - Digital signature verified
[17:54:13] 
[17:54:13] Project: 6901 (Run 16, Clone 6, Gen 195)
[17:54:13] 
[17:54:13] Assembly optimizations on if available.
[17:54:13] Entering M.D.
[17:54:20] Mapping NT from 24 to 24 
[17:54:22] Completed 0 out of 250000 steps  (0%)
[18:07:23] Completed 2500 out of 250000 steps  (1%)
[18:10:50] Mapping NT from 24 to 24 
[18:11:01] Resuming from checkpoint
[18:11:01] Verified work/wudata_00.log
[18:11:02] Verified work/wudata_00.trr
[18:11:02] Verified work/wudata_00.xtc
[18:11:02] Verified work/wudata_00.edr
[18:11:03] Completed 2885 out of 250000 steps  (1%)
[18:22:16] Completed 5000 out of 250000 steps  (2%)
[18:35:22] Completed 7500 out of 250000 steps  (3%)
[18:48:27] Completed 10000 out of 250000 steps  (4%)
[19:01:33] Completed 12500 out of 250000 steps  (5%)
[19:14:39] Completed 15000 out of 250000 steps  (6%)
[19:26:47] - Couldn't send HTTP request to server
[19:26:47] + Could not connect to Work Server (results)
[19:26:47]     (130.237.232.237:8080)
[19:26:47] + Retrying using alternative port
[19:26:50] - Couldn't send HTTP request to server
[19:26:50] + Could not connect to Work Server (results)
[19:26:50]     (130.237.232.237:80)
[19:26:50] - Error: Could not transmit unit 09 (completed April 22) to work server.
[19:26:50]   Keeping unit 09 in queue.
[19:27:44] Completed 17500 out of 250000 steps  (7%)
At least I have work for the next 24 hrs to chew on. My other rig has spent the day doing standard SMP; I was going to switch it back, but I think I will leave it for now.
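As an aside, the "decompressed 123.7 percent" figure in logs like the one above is just the expanded size divided by the compressed size; with this log's numbers the exact value is 123.799..., which suggests the core truncates rather than rounds:

```python
# Sizes taken from the DecompressByteArray line in the log above.
compressed, expanded = 24876005, 30796292
ratio = expanded / compressed * 100                          # 123.799...
print("decompressed %.1f percent" % (int(ratio * 10) / 10))  # prints "decompressed 123.7 percent"
```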