
128.143.199.97 reject status

Posted: Thu Mar 01, 2012 1:04 am
by mplee73
What I had thought were work unit issues turned out to be an issue with this particular server. It looks like it has had a status of "reject" for the last 9 hours.

All of my 75xx units failed to upload to this machine and went to a collection server instead.

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 3:57 am
by bruce
I've notified the owner of the server.

Please post the segment of the log showing a WU failing to upload to that server, followed by a successful upload to the Collection Server. FAH is replacing the former non-functional Collection Server code with code that actually works, but it's a gradual process that has to be coordinated with changes to the code on the primary Work Server, too.

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 4:58 am
by mplee73
Thanks. Here's the log from one machine; the other machines have similar messages. The upload to the work server fails a couple of times, then the unit is picked up by the collection server, which reports a problem with the unit.



 [10:40:53] + Processing work unit
[10:40:53] Core required: FahCore_a3.exe
[10:40:53] Core found.
[10:40:53] Working on queue slot 05 [February 29 10:40:53 UTC]
[10:40:53] + Working ...
[10:40:53] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 8 -checkpoint 15 -service -verbose -lifeline 2004 -version 634'

[10:40:53] 
[10:40:53] *------------------------------*
[10:40:53] Folding@Home Gromacs SMP Core
[10:40:53] Version 2.27 (Dec. 15, 2010)
[10:40:53] 
[10:40:53] Preparing to commence simulation
[10:40:53] - Looking at optimizations...
[10:40:53] - Created dyn
[10:40:53] - Files status OK
[10:40:53] - Expanded 1765831 -> 2700832 (decompressed 152.9 percent)
[10:40:53] Called DecompressByteArray: compressed_data_size=1765831 data_size=2700832, decompressed_data_size=2700832 diff=0
[10:40:53] - Digital signature verified
[10:40:53] 
[10:40:53] Project: 7504 (Run 0, Clone 11, Gen 233)
[10:40:53] 
[10:40:53] Assembly optimizations on if available.
[10:40:53] Entering M.D.
[10:40:59] Mapping NT from 8 to 8 
[10:40:59] Completed 0 out of 500000 steps  (0%)
[10:44:58] Completed 5000 out of 500000 steps  (1%)
[10:48:56] Completed 10000 out of 500000 steps  (2%)
[10:52:52] Completed 15000 out of 500000 steps  (3%)
[10:56:51] Completed 20000 out of 500000 steps  (4%)
[11:00:53] Completed 25000 out of 500000 steps  (5%)
[11:04:51] Completed 30000 out of 500000 steps  (6%)
[11:08:48] Completed 35000 out of 500000 steps  (7%)
[11:12:45] Completed 40000 out of 500000 steps  (8%)
[11:16:43] Completed 45000 out of 500000 steps  (9%)
[11:20:40] Completed 50000 out of 500000 steps  (10%)
[11:24:39] Completed 55000 out of 500000 steps  (11%)
[11:25:00] - Autosending finished units... [February 29 11:25:00 UTC]
[11:25:00] Trying to send all finished work units
[11:25:00] + No unsent completed units remaining.
[11:25:00] - Autosend completed
[11:28:39] Completed 60000 out of 500000 steps  (12%)
[11:32:50] Completed 65000 out of 500000 steps  (13%)
[11:36:49] Completed 70000 out of 500000 steps  (14%)
[11:40:48] Completed 75000 out of 500000 steps  (15%)
[11:44:46] Completed 80000 out of 500000 steps  (16%)
[11:48:43] Completed 85000 out of 500000 steps  (17%)
[11:52:40] Completed 90000 out of 500000 steps  (18%)
[11:56:38] Completed 95000 out of 500000 steps  (19%)
[12:00:35] Completed 100000 out of 500000 steps  (20%)
[12:04:33] Completed 105000 out of 500000 steps  (21%)
[12:08:29] Completed 110000 out of 500000 steps  (22%)
[12:12:25] Completed 115000 out of 500000 steps  (23%)
[12:16:22] Completed 120000 out of 500000 steps  (24%)
[12:20:19] Completed 125000 out of 500000 steps  (25%)
[12:24:17] Completed 130000 out of 500000 steps  (26%)
[12:28:13] Completed 135000 out of 500000 steps  (27%)
[12:32:10] Completed 140000 out of 500000 steps  (28%)
[12:36:08] Completed 145000 out of 500000 steps  (29%)
[12:40:05] Completed 150000 out of 500000 steps  (30%)
[12:44:03] Completed 155000 out of 500000 steps  (31%)
[12:48:00] Completed 160000 out of 500000 steps  (32%)
[12:51:55] Completed 165000 out of 500000 steps  (33%)
[12:55:53] Completed 170000 out of 500000 steps  (34%)
[12:59:49] Completed 175000 out of 500000 steps  (35%)
[13:03:46] Completed 180000 out of 500000 steps  (36%)
[13:07:43] Completed 185000 out of 500000 steps  (37%)
[13:11:39] Completed 190000 out of 500000 steps  (38%)
[13:15:37] Completed 195000 out of 500000 steps  (39%)
[13:19:33] Completed 200000 out of 500000 steps  (40%)
[13:23:29] Completed 205000 out of 500000 steps  (41%)
[13:27:28] Completed 210000 out of 500000 steps  (42%)
[13:31:24] Completed 215000 out of 500000 steps  (43%)
[13:35:22] Completed 220000 out of 500000 steps  (44%)
[13:39:19] Completed 225000 out of 500000 steps  (45%)
[13:43:15] Completed 230000 out of 500000 steps  (46%)
[13:47:12] Completed 235000 out of 500000 steps  (47%)
[13:51:09] Completed 240000 out of 500000 steps  (48%)
[13:55:06] Completed 245000 out of 500000 steps  (49%)
[13:59:02] Completed 250000 out of 500000 steps  (50%)
[14:02:58] Completed 255000 out of 500000 steps  (51%)
[14:06:55] Completed 260000 out of 500000 steps  (52%)
[14:10:52] Completed 265000 out of 500000 steps  (53%)
[14:14:49] Completed 270000 out of 500000 steps  (54%)
[14:18:46] Completed 275000 out of 500000 steps  (55%)
[14:22:43] Completed 280000 out of 500000 steps  (56%)
[14:26:43] Completed 285000 out of 500000 steps  (57%)
[14:30:41] Completed 290000 out of 500000 steps  (58%)
[14:34:39] Completed 295000 out of 500000 steps  (59%)
[14:38:38] Completed 300000 out of 500000 steps  (60%)
[14:42:36] Completed 305000 out of 500000 steps  (61%)
[14:46:35] Completed 310000 out of 500000 steps  (62%)
[14:50:32] Completed 315000 out of 500000 steps  (63%)
[14:54:30] Completed 320000 out of 500000 steps  (64%)
[14:58:27] Completed 325000 out of 500000 steps  (65%)
[15:02:27] Completed 330000 out of 500000 steps  (66%)
[15:06:24] Completed 335000 out of 500000 steps  (67%)
[15:10:20] Completed 340000 out of 500000 steps  (68%)
[15:14:17] Completed 345000 out of 500000 steps  (69%)
[15:18:15] Completed 350000 out of 500000 steps  (70%)
[15:22:11] Completed 355000 out of 500000 steps  (71%)
[15:26:10] Completed 360000 out of 500000 steps  (72%)
[15:30:13] Completed 365000 out of 500000 steps  (73%)
[15:34:11] Completed 370000 out of 500000 steps  (74%)
[15:38:09] Completed 375000 out of 500000 steps  (75%)
[15:42:05] Completed 380000 out of 500000 steps  (76%)
[15:46:04] Completed 385000 out of 500000 steps  (77%)
[15:50:44] Completed 390000 out of 500000 steps  (78%)
[15:54:59] Completed 395000 out of 500000 steps  (79%)
[15:59:11] Completed 400000 out of 500000 steps  (80%)
[16:03:38] Completed 405000 out of 500000 steps  (81%)
[16:07:48] Completed 410000 out of 500000 steps  (82%)
[16:11:50] Completed 415000 out of 500000 steps  (83%)
[16:15:57] Completed 420000 out of 500000 steps  (84%)
[16:19:59] Completed 425000 out of 500000 steps  (85%)
[16:24:01] Completed 430000 out of 500000 steps  (86%)
[16:28:03] Completed 435000 out of 500000 steps  (87%)
[16:32:10] Completed 440000 out of 500000 steps  (88%)
[16:36:16] Completed 445000 out of 500000 steps  (89%)
[16:40:58] Completed 450000 out of 500000 steps  (90%)
[16:45:30] Completed 455000 out of 500000 steps  (91%)
[16:50:14] Completed 460000 out of 500000 steps  (92%)
[16:54:16] Completed 465000 out of 500000 steps  (93%)
[16:58:35] Completed 470000 out of 500000 steps  (94%)
[17:02:52] Completed 475000 out of 500000 steps  (95%)
[17:07:02] Completed 480000 out of 500000 steps  (96%)
[17:11:04] Completed 485000 out of 500000 steps  (97%)
[17:15:18] Completed 490000 out of 500000 steps  (98%)
[17:19:34] Completed 495000 out of 500000 steps  (99%)
[17:24:08] Completed 500000 out of 500000 steps  (100%)
[17:24:09] DynamicWrapper: Finished Work Unit: sleep=10000
[17:24:19] 
[17:24:19] Finished Work Unit:
[17:24:19] - Reading up to 5030784 from "work/wudata_05.trr": Read 5030784
[17:24:19] trr file hash check passed.
[17:24:19] - Reading up to 5406544 from "work/wudata_05.xtc": Read 5406544
[17:24:19] xtc file hash check passed.
[17:24:19] edr file hash check passed.
[17:24:19] logfile size: 326142
[17:24:19] Leaving Run
[17:24:20] - Writing 10795874 bytes of core data to disk...
[17:24:22] Done: 10795362 -> 10210724 (compressed to 94.5 percent)
[17:24:22]   ... Done.
[17:24:26] - Shutting down core
[17:24:26] 
[17:24:26] Folding@home Core Shutdown: FINISHED_UNIT
[17:24:28] CoreStatus = 64 (100)
[17:24:28] Unit 5 finished with 95 percent of time to deadline remaining.
[17:24:28] Updated performance fraction: 0.946056
[17:24:28] Sending work to server
[17:24:28] Project: 7504 (Run 0, Clone 11, Gen 233)


[17:24:28] + Attempting to send results [February 29 17:24:28 UTC]
[17:24:28] - Reading file work/wuresults_05.dat from core
[17:24:28]   (Read 10211236 bytes from disk)
[17:24:28] Connecting to http://128.143.199.97:8080/
[17:24:30] - Couldn't send HTTP request to server
[17:24:30] + Could not connect to Work Server (results)
[17:24:30]     (128.143.199.97:8080)
[17:24:30] + Retrying using alternative port
[17:24:30] Connecting to http://128.143.199.97:80/
[17:24:31] - Couldn't send HTTP request to server
[17:24:31] + Could not connect to Work Server (results)
[17:24:31]     (128.143.199.97:80)
[17:24:31] - Error: Could not transmit unit 05 (completed February 29) to work server.
[17:24:31] - 1 failed uploads of this unit.
[17:24:31]   Keeping unit 05 in queue.
[17:24:31] Trying to send all finished work units
[17:24:31] Project: 7504 (Run 0, Clone 11, Gen 233)


[17:24:31] + Attempting to send results [February 29 17:24:31 UTC]
[17:24:31] - Reading file work/wuresults_05.dat from core
[17:24:31]   (Read 10211236 bytes from disk)
[17:24:31] Connecting to http://128.143.199.97:8080/
[17:24:33] - Couldn't send HTTP request to server
[17:24:33] + Could not connect to Work Server (results)
[17:24:33]     (128.143.199.97:8080)
[17:24:33] + Retrying using alternative port
[17:24:33] Connecting to http://128.143.199.97:80/
[17:24:34] - Couldn't send HTTP request to server
[17:24:34] + Could not connect to Work Server (results)
[17:24:34]     (128.143.199.97:80)
[17:24:34] - Error: Could not transmit unit 05 (completed February 29) to work server.
[17:24:34] - 2 failed uploads of this unit.


[17:24:34] + Attempting to send results [February 29 17:24:34 UTC]
[17:24:34] - Reading file work/wuresults_05.dat from core
[17:24:34]   (Read 10211236 bytes from disk)
[17:24:34] Connecting to http://128.143.231.202:8080/
[17:25:00] - Autosending finished units... [February 29 17:25:00 UTC]
[17:25:00] Trying to send all finished work units
[17:25:00] - Already sending work
[17:25:00] + Sent 0 of 1 completed units to the server
[17:25:00] - Autosend completed
[17:26:36] Posted data.
[17:26:36] Initial: 0000; - Uploaded at ~81 kB/s
[17:26:36] - Averaged speed for that direction ~86 kB/s
[17:26:36] - Server reports problem with unit.
[17:26:36]   Successfully sent unit 05 to Collection server.
[17:26:36] + Sent 1 of 1 completed units to the server
[17:26:36] - Preparing to get new work unit... 
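For readers trying to follow the log, the upload sequence it shows can be modeled roughly as: try the Work Server on port 8080, retry on port 80, count one failed upload, and after two failed uploads fall back to the Collection Server (128.143.231.202:8080). The sketch below is only a reconstruction from this one log, not the v6 client's actual code; the function and class names, and the threshold of two failures, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model inferred from the log above; NOT the real client code.
WORK_SERVER = "128.143.199.97"
COLLECTION_SERVER = "128.143.231.202"
WS_PORTS = (8080, 80)  # primary port first, then the "alternative port"


@dataclass
class QueuedUnit:
    slot: int
    failed_uploads: int = 0


def send_results(unit: QueuedUnit,
                 post: Callable[[str, int], bool],
                 max_ws_failures: int = 2) -> Tuple[bool, List[str]]:
    """Return (sent, attempted_urls); post(host, port) returns True on success."""
    attempted: List[str] = []
    while unit.failed_uploads < max_ws_failures:  # threshold assumed from this log
        for port in WS_PORTS:
            attempted.append(f"http://{WORK_SERVER}:{port}/")
            if post(WORK_SERVER, port):
                return True, attempted
        unit.failed_uploads += 1  # "- N failed uploads of this unit."
    # Fall back to the Collection Server (only port 8080 appears in the log).
    attempted.append(f"http://{COLLECTION_SERVER}:8080/")
    return post(COLLECTION_SERVER, 8080), attempted


def ws_down(host: str, port: int) -> bool:
    """Simulate the Work Server refusing connections while the CS answers."""
    return host != WORK_SERVER


sent, urls = send_results(QueuedUnit(slot=5), ws_down)
print(sent, len(urls))  # True 5
```

One simplification: in the real log the second attempt comes from the client's queue pass ("Trying to send all finished work units") rather than a tight loop, but the order of servers and ports is the same.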

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 10:52 am
by ArVee
I had the same issue, with the same servers, about 12 hours ago.

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 4:39 pm
by kasson
Thanks for the heads-up; the server should be up and running now. Let us know if any of the returned WUs fail to show up (give them 3-4 hours, but then please do post).

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 5:22 pm
by mplee73
Thanks, Professor Kasson. I'll check my stats later today. I think about 20 of my units were submitted to the collection server instead.

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 11:13 pm
by mplee73
I think I'm still missing the credit for the units turned in to the collection server. Below is the list of what I think are the units turned in during the downtime, in Project (Run, Clone, Gen) form.

7500 (0,16,485)
7504 (0,11,233)
7504 (11,0,103)
7504 (0,95,152)
7504 (0,117,115)
7504 (19,2,99)
7505 (0,45,45)
7506 (0,86,150)
7506 (0,169,103)
7506 (0,138,103)
7507 (0,86,163)
7507 (0,2,181)
7507 (0,163,95)
7507 (0,138,101)
7508 (0,85,159)
7508 (0,38,159)
7508 (0,17,168)
7510 (0,91,90)
7510 (0,106,127)
7510 (0,7,215)
7510 (0,83,137)
7510 (0,9,199)
7510 (0,137,108)
7510 (0,62,153)
7511 (0,135,132)
7511 (0,167,66)
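The list above uses the shorthand Project (Run, Clone, Gen); the client log prints the same identifiers as "Project: 7504 (Run 0, Clone 11, Gen 233)". A small helper like the following could pull those identifiers out of a log so they can be checked against the stats. It is a sketch only; the helper names and the exact line formats it accepts are assumptions based on the text in this thread.

```python
import re
from typing import List, NamedTuple


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Shorthand used in the list, e.g. "7504 (0,11,233)" (hypothetical format rule).
_LIST_RE = re.compile(r"^\s*(\d+)\s*\((\d+),\s*(\d+),\s*(\d+)\)\s*$")
# Client log form, e.g. "Project: 7504 (Run 0, Clone 11, Gen 233)".
_LOG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_list(text: str) -> List[PRCG]:
    """Parse one 'Project (Run,Clone,Gen)' entry per line, skipping other lines."""
    units = []
    for line in text.splitlines():
        m = _LIST_RE.match(line)
        if m:
            units.append(PRCG(*map(int, m.groups())))
    return units


def parse_log(text: str) -> List[PRCG]:
    """Collect each distinct PRCG mentioned in a client log, in first-seen order."""
    seen: List[PRCG] = []
    for m in _LOG_RE.finditer(text):
        unit = PRCG(*map(int, m.groups()))
        if unit not in seen:  # the log repeats the PRCG on every upload attempt
            seen.append(unit)
    return seen
```

For example, running `parse_log` over the log posted earlier yields the single unit 7504 (0, 11, 233), which also appears in the list.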

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 11:34 pm
by kasson
Unfortunately, there appears to be a client / collection server discrepancy. The collection server did not in fact accept the returns; as best I can tell, it was not configured to do so. For reasons that I don't yet understand, the client apparently decided that the CS did accept the return.

Re: 128.143.199.97 reject status

Posted: Thu Mar 01, 2012 11:50 pm
by mplee73
I'm guessing that since the client thinks it sent them successfully, there's no way to force it to resend the units?

Re: 128.143.199.97 reject status

Posted: Fri Mar 02, 2012 12:24 am
by bruce
When a client successfully sends a WU, the local copy is deleted and removed from the queue. There's nothing left to send.
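The consequence of that rule can be shown with a toy queue model. In the log above, the client printed "Server reports problem with unit." and then "Successfully sent unit 05 to Collection server.", so from its point of view the send was acknowledged and the local results were dropped. The class and method names below are hypothetical and only illustrate the keep-on-failure / delete-on-success behavior described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkQueue:
    """Toy model of a client work queue (hypothetical, for illustration only)."""
    results: Dict[int, bytes] = field(default_factory=dict)

    def record_upload(self, slot: int, acknowledged: bool) -> None:
        # A failed send keeps the results for another attempt; an acknowledged
        # send deletes the local copy and frees the slot, whether or not the
        # server actually stored the results.
        if acknowledged:
            self.results.pop(slot, None)

    def pending(self) -> List[int]:
        return sorted(self.results)


q = WorkQueue({5: b"contents of work/wuresults_05.dat"})
q.record_upload(5, acknowledged=False)
print(q.pending())  # [5]: kept, as in "Keeping unit 05 in queue."
q.record_upload(5, acknowledged=True)
print(q.pending())  # []: nothing left to resend
```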