
Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Thu Aug 26, 2010 11:39 pm
by DrSpalding
On one of the remote Linux boxes I run FAH on, I finally noticed that it kept retrying the above project. It gets to 40% and quits with a "Client-core communications error: ERROR 0x0" every time, and then sometimes downloads a new core (FahCore_78) and tries again. I have seen it try 18 times now since 15 August.

Here is the first of the many failure log files:

--- Opening Log file [August 15 16:44:12]


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /var/www/html/folding
Executable: ./fah6
Arguments: -oneunit

[16:44:12] - Ask before connecting: No
[16:44:12] - User name: DrSpalding (Team 48083)
[16:44:12] - User ID: 1F23550B520F01BE
[16:44:12] - Machine ID: 1
[16:44:12]
[16:44:12] Loaded queue successfully.
[16:44:12] - Preparing to get new work unit...
[16:44:12] + Attempting to get work packet
[16:44:12] - Connecting to assignment server
[16:44:20] - Successful: assigned to (171.64.65.62).
[16:44:20] + News From Folding@Home: Welcome to Folding@Home
[16:44:20] Loaded queue successfully.
[16:47:29] - Couldn't send HTTP request to server
[16:47:29] + Could not connect to Work Server
[16:47:29] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[16:47:35] + Attempting to get work packet
[16:47:35] - Connecting to assignment server
[16:47:35] - Successful: assigned to (171.64.65.62).
[16:47:35] + News From Folding@Home: Welcome to Folding@Home
[16:47:35] Loaded queue successfully.
[16:47:44] + Closed connections
[16:47:44]
[16:47:44] + Processing work unit
[16:47:44] Core required: FahCore_78.exe
[16:47:44] Core found.
[16:47:44] Working on Unit 08 [August 15 16:47:44]
[16:47:44] + Working ...
[16:47:45]
[16:47:45] *------------------------------*
[16:47:45] Folding@Home Gromacs Core
[16:47:45] Version 1.90 (March 8, 2006)
[16:47:45]
[16:47:45] Preparing to commence simulation
[16:47:45] - Looking at optimizations...
[16:47:45] - Created dyn
[16:47:45] - Files status OK
[16:47:45] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[16:47:45] - Starting from initial work packet
[16:47:45]
[16:47:45] Project: 6503 (Run 3, Clone 254, Gen 53)
[16:47:45]
[16:47:45] Assembly optimizations on if available.
[16:47:45] Entering M.D.
[16:47:52] Protein: TR462_B_4 in water
[16:47:52]
[16:47:52] Writing local files
[16:47:52] Extra SSE boost OK.
[16:47:52] Writing local files
[16:47:53] Completed 0 out of 250000 steps  (0%)
[17:07:40] Writing local files
[17:07:40] Completed 2500 out of 250000 steps  (1%)
[17:27:27] Writing local files
[17:27:27] Completed 5000 out of 250000 steps  (2%)
[17:47:12] Writing local files
[17:47:12] Completed 7500 out of 250000 steps  (3%)
[18:06:58] Writing local files
[18:06:59] Completed 10000 out of 250000 steps  (4%)
[18:26:45] Writing local files
[18:26:45] Completed 12500 out of 250000 steps  (5%)
[18:46:32] Writing local files
[18:46:32] Completed 15000 out of 250000 steps  (6%)
[19:06:19] Writing local files
[19:06:19] Completed 17500 out of 250000 steps  (7%)
[19:26:07] Writing local files
[19:26:08] Completed 20000 out of 250000 steps  (8%)
[19:45:57] Writing local files
[19:45:57] Completed 22500 out of 250000 steps  (9%)
[20:05:46] Writing local files
[20:05:46] Completed 25000 out of 250000 steps  (10%)
[20:25:35] Writing local files
[20:25:35] Completed 27500 out of 250000 steps  (11%)
[20:45:23] Writing local files
[20:45:23] Completed 30000 out of 250000 steps  (12%)
[21:05:09] Writing local files
[21:05:09] Completed 32500 out of 250000 steps  (13%)
[21:24:55] Writing local files
[21:24:55] Completed 35000 out of 250000 steps  (14%)
[21:44:40] Writing local files
[21:44:40] Completed 37500 out of 250000 steps  (15%)
[22:04:25] Writing local files
[22:04:25] Completed 40000 out of 250000 steps  (16%)
[22:24:12] Writing local files
[22:24:12] Completed 42500 out of 250000 steps  (17%)
[22:43:59] Writing local files
[22:43:59] Completed 45000 out of 250000 steps  (18%)
[23:03:45] Writing local files
[23:03:45] Completed 47500 out of 250000 steps  (19%)
[23:23:29] Writing local files
[23:23:29] Completed 50000 out of 250000 steps  (20%)
[23:43:15] Writing local files
[23:43:15] Completed 52500 out of 250000 steps  (21%)
[00:03:02] Writing local files
[00:03:02] Completed 55000 out of 250000 steps  (22%)
[00:22:48] Writing local files
[00:22:48] Completed 57500 out of 250000 steps  (23%)
[00:42:35] Writing local files
[00:42:35] Completed 60000 out of 250000 steps  (24%)
[01:02:22] Writing local files
[01:02:22] Completed 62500 out of 250000 steps  (25%)
[01:22:10] Writing local files
[01:22:10] Completed 65000 out of 250000 steps  (26%)
[01:41:57] Writing local files
[01:41:57] Completed 67500 out of 250000 steps  (27%)
[02:01:43] Writing local files
[02:01:43] Completed 70000 out of 250000 steps  (28%)
[02:21:29] Writing local files
[02:21:29] Completed 72500 out of 250000 steps  (29%)
[02:41:14] Writing local files
[02:41:14] Completed 75000 out of 250000 steps  (30%)
[03:00:58] Writing local files
[03:00:58] Completed 77500 out of 250000 steps  (31%)
[03:20:45] Writing local files
[03:20:46] Completed 80000 out of 250000 steps  (32%)
[03:40:31] Writing local files
[03:40:31] Completed 82500 out of 250000 steps  (33%)
[04:00:18] Writing local files
[04:00:18] Completed 85000 out of 250000 steps  (34%)
[04:20:05] Writing local files
[04:20:05] Completed 87500 out of 250000 steps  (35%)
[04:39:53] Writing local files
[04:39:53] Completed 90000 out of 250000 steps  (36%)
[04:59:41] Writing local files
[04:59:41] Completed 92500 out of 250000 steps  (37%)
[05:19:27] Writing local files
[05:19:27] Completed 95000 out of 250000 steps  (38%)
[05:39:14] Writing local files
[05:39:14] Completed 97500 out of 250000 steps  (39%)
[05:59:00] Writing local files
[05:59:00] Completed 100000 out of 250000 steps  (40%)
[06:10:17] CoreStatus = 0 (0)
[06:10:17] Client-core communications error: ERROR 0x0
[06:10:17] Deleting current work unit & continuing...
[06:10:35] - Preparing to get new work unit...
[06:10:35] + Attempting to get work packet
[06:10:35] - Connecting to assignment server
[06:10:35] - Successful: assigned to (171.64.65.62).
[06:10:35] + News From Folding@Home: Welcome to Folding@Home
[06:10:35] Loaded queue successfully.
[06:10:44] + Closed connections
[06:10:49]
[06:10:49] + Processing work unit
[06:10:49] Core required: FahCore_78.exe
[06:10:49] Core found.
[06:10:49] Working on Unit 09 [August 16 06:10:49]
[06:10:49] + Working ...
[06:10:49]
[06:10:49] *------------------------------*
[06:10:49] Folding@Home Gromacs Core
[06:10:49] Version 1.90 (March 8, 2006)
[06:10:49]
[06:10:49] Preparing to commence simulation
[06:10:49] - Looking at optimizations...
[06:10:49] - Created dyn
[06:10:49] - Files status OK
[06:10:50] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[06:10:50] - Starting from initial work packet
[06:10:50]
[06:10:50] Project: 6503 (Run 3, Clone 254, Gen 53)
[06:10:50]
[06:10:50] Assembly optimizations on if available.
[06:10:50] Entering M.D.
[06:10:56] Protein: TR462_B_4 in water
[06:10:56]
[06:10:56] Writing local files
[06:10:57] Extra SSE boost OK.
[06:10:57] Writing local files
[06:10:58] Completed 0 out of 250000 steps  (0%)
[06:30:27] Writing local files
[06:30:27] Completed 2500 out of 250000 steps  (1%)
[06:49:55] Writing local files

etc., etc., etc.
I have deleted the work files and queue.dat to try to get the client restarted on a different WU. Is it a bad WU or a bad configuration on my end?
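Each failed attempt is expensive: in the log above, 0% is at 16:47:53 and 40% at 05:59:00 the next morning, about 13 hours of work per try. A minimal Python sketch for measuring the time per 1% frame from such a log follows. It assumes the "Completed N out of M steps" lines look exactly like the ones posted here; `frame_durations` is a hypothetical helper, not part of the FAH client. Because the log stores only HH:MM:SS, the sketch assumes a negative difference means the clock passed midnight.

```python
import re

# Matches progress lines such as
# "[17:07:40] Completed 2500 out of 250000 steps  (1%)"
PROGRESS = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps")


def to_seconds(h, m, s):
    """Convert HH, MM, SS strings to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def frame_durations(lines):
    """Return (step, seconds since the previous progress line) pairs.

    A line at step 0 marks the start of a new work unit, so the timer
    is reset there instead of measuring across two different WUs.
    Assumes consecutive progress lines are less than 24 h apart.
    """
    result = []
    prev = None
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue
        t = to_seconds(*m.group(1, 2, 3))
        step = int(m.group(4))
        if step == 0 or prev is None:
            prev = t  # new WU (or first line): nothing to compare against yet
            continue
        delta = t - prev
        if delta < 0:
            delta += 24 * 3600  # timestamp wrapped past midnight
        result.append((step, delta))
        prev = t
    return result
```

For example, the lines at 23:23:29, 23:43:15 and 00:03:02 give frames of 1186 s and 1187 s. At roughly 1,187 s per 1% frame, reaching 40% takes about 47,500 s, or a little over 13 hours, which matches the timestamps in the log.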

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Fri Aug 27, 2010 12:51 am
by bruce
Thank you for your report.

The WU (P6503,R3,C254,G53) has been reported as a bad WU.

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Fri Aug 27, 2010 1:57 am
by DrSpalding
Thanks Bruce. Unfortunately, it looks like it is still out there in the wild: after I cleared out queue.dat and work/*, the client picked it up again at 18:33 PDT after 12 failed attempts to contact the AS for a WU. I'll wait it out again to confirm it gets to 40% and dies before deleting queue.dat and work/* and restarting the client.
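The workaround described here (stop the client, clear out queue.dat and work/*, restart) can be scripted. A minimal Python sketch follows, assuming a stopped v6 client whose launch directory holds queue.dat and work/ as in this thread (for example /var/www/html/folding). `reset_client_queue` is a hypothetical helper. It moves the files into a backup directory instead of deleting them, so the bad WU can still be inspected or reported.

```python
import shutil
import time
from pathlib import Path


def reset_client_queue(launch_dir, backup_root=None):
    """Move queue.dat and work/ out of a stopped client's launch directory.

    On the next start the client should then report "Could not open work
    queue, generating new queue..." and ask the assignment server for work.
    The client must NOT be running while this is done.
    Returns (backup directory, list of names that were moved).
    """
    launch = Path(launch_dir)
    if backup_root is None:
        # Hypothetical naming scheme: a timestamped folder next to the client.
        backup = launch / time.strftime("reset-%Y%m%d-%H%M%S")
    else:
        backup = Path(backup_root)
    backup.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an old backup

    moved = []
    for name in ("queue.dat", "work"):
        src = launch / name
        if src.exists():
            shutil.move(str(src), str(backup / name))
            moved.append(name)
    return backup, moved
```

As this thread shows, this only clears the local copy: until the bad-WU report reaches the work servers, the assignment server may hand the same WU out again.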

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Fri Aug 27, 2010 2:06 am
by sortofageek
I looked at it and see no recent feedback about it. FWIW, I reported it again.

Could you post a log, please? Are you sure it is exactly this WU, Project: 6503 (Run 3, Clone 254, Gen 53)?

Could it possibly be a different WU with the same project number but different run, clone, or gen numbers?
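That question can be answered directly from the log, because the core prints one "Project: P (Run R, Clone C, Gen G)" line each time it starts a WU. A minimal Python sketch follows, assuming the log lines look exactly like the ones posted in this thread; `count_work_units` and `failed_work_units` are hypothetical helpers, and a failure is attributed to the most recent Project line seen before a "Client-core communications error".

```python
import re
from collections import Counter

# Matches the core's header line, e.g.
# "[16:47:45] Project: 6503 (Run 3, Clone 254, Gen 53)"
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_prcg(line):
    """Return (project, run, clone, gen) as ints, or None if absent."""
    m = PRCG.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def count_work_units(lines):
    """Count how often each (project, run, clone, gen) WU was started."""
    counts = Counter()
    for line in lines:
        prcg = parse_prcg(line)
        if prcg is not None:
            counts[prcg] += 1
    return counts


def failed_work_units(lines):
    """Count client-core errors per WU, using the last Project line seen."""
    failures = Counter()
    current = None
    for line in lines:
        prcg = parse_prcg(line)
        if prcg is not None:
            current = prcg
        elif "Client-core communications error" in line and current is not None:
            failures[current] += 1
            current = None  # the client deletes the WU after this error
    return failures
```

Run against the client log (assumed here to be FAHlog.txt in the launch directory), e.g. `with open("FAHlog.txt") as f: print(failed_work_units(f))`, a count above 1 for the same (6503, 3, 254, 53) tuple would confirm it is the identical WU rather than a sibling from the same project.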

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Fri Aug 27, 2010 3:02 am
by DrSpalding
It was the same WU. Here is the relevant log, FWIW.

--- Opening Log file [August 26 22:23:58]


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /var/www/html/folding
Executable: ./fah6
Arguments: -oneunit

[22:23:58] - Ask before connecting: No
[22:23:58] - User name: DrSpalding (Team 48083)
[22:23:58] - User ID: 1F23550B520F01BE
[22:23:58] - Machine ID: 1
[22:23:58]
[22:23:58] Could not open work queue, generating new queue...
[22:23:58] - Preparing to get new work unit...
[22:23:58] + Attempting to get work packet
[22:23:58] - Connecting to assignment server
[22:23:58] + No appropriate work server was available; will try again in a bit.
[22:23:58] + Couldn't get work instructions.
[22:23:58] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[22:24:09] + Attempting to get work packet
[22:24:09] - Connecting to assignment server
[22:27:18] - Couldn't send HTTP request to server
[22:27:18] + Could not connect to Assignment Server
[22:27:18] + No appropriate work server was available; will try again in a bit.
[22:27:18] + Couldn't get work instructions.
[22:27:18] - Attempt #2  to get work failed, and no other work to do.
             Waiting before retry.
[22:27:32] + Attempting to get work packet
[22:27:32] - Connecting to assignment server
[22:27:32] + No appropriate work server was available; will try again in a bit.
[22:27:32] + Couldn't get work instructions.
[22:27:32] - Attempt #3  to get work failed, and no other work to do.
             Waiting before retry.
[22:28:03] + Attempting to get work packet
[22:28:03] - Connecting to assignment server
[22:29:36] + No appropriate work server was available; will try again in a bit.
[22:29:36] + Couldn't get work instructions.
[22:29:36] - Attempt #4  to get work failed, and no other work to do.
             Waiting before retry.
[22:30:19] + Attempting to get work packet
[22:30:19] - Connecting to assignment server
[22:31:52] + No appropriate work server was available; will try again in a bit.
[22:31:52] + Couldn't get work instructions.
[22:31:52] - Attempt #5  to get work failed, and no other work to do.
             Waiting before retry.
[22:33:21] + Attempting to get work packet
[22:33:21] - Connecting to assignment server
[22:34:06] + No appropriate work server was available; will try again in a bit.
[22:34:06] + Couldn't get work instructions.
[22:34:06] - Attempt #6  to get work failed, and no other work to do.
             Waiting before retry.
[22:36:58] + Attempting to get work packet
[22:36:58] - Connecting to assignment server
[22:36:58] + No appropriate work server was available; will try again in a bit.
[22:36:58] + Couldn't get work instructions.
[22:36:58] - Attempt #7  to get work failed, and no other work to do.
             Waiting before retry.
[22:42:31] + Attempting to get work packet
[22:42:31] - Connecting to assignment server
[22:42:31] + No appropriate work server was available; will try again in a bit.
[22:42:31] + Couldn't get work instructions.
[22:42:31] - Attempt #8  to get work failed, and no other work to do.
             Waiting before retry.
[22:53:17] + Attempting to get work packet
[22:53:17] - Connecting to assignment server
[22:53:18] + No appropriate work server was available; will try again in a bit.
[22:53:18] + Couldn't get work instructions.
[22:53:18] - Attempt #9  to get work failed, and no other work to do.
             Waiting before retry.
[23:14:50] + Attempting to get work packet
[23:14:50] - Connecting to assignment server
[23:14:51] + No appropriate work server was available; will try again in a bit.
[23:14:51] + Couldn't get work instructions.
[23:14:51] - Attempt #10  to get work failed, and no other work to do.
             Waiting before retry.
[23:57:34] + Attempting to get work packet
[23:57:34] - Connecting to assignment server
[23:57:35] + No appropriate work server was available; will try again in a bit.
[23:57:35] + Couldn't get work instructions.
[23:57:35] - Attempt #11  to get work failed, and no other work to do.
             Waiting before retry.
[00:45:40] + Attempting to get work packet
[00:45:40] - Connecting to assignment server
[00:45:40] + No appropriate work server was available; will try again in a bit.
[00:45:40] + Couldn't get work instructions.
[00:45:40] - Attempt #12  to get work failed, and no other work to do.
             Waiting before retry.
[01:33:45] + Attempting to get work packet
[01:33:45] - Connecting to assignment server
[01:33:45] - Successful: assigned to (171.64.65.62).
[01:33:45] + News From Folding@Home: Welcome to Folding@Home
[01:33:45] Loaded queue successfully.
[01:33:55] + Closed connections
[01:33:55]
[01:33:55] + Processing work unit
[01:33:55] Core required: FahCore_78.exe
[01:33:55] Core found.
[01:33:55] Working on Unit 01 [August 27 01:33:55]
[01:33:55] + Working ...
[01:33:55]
[01:33:55] *------------------------------*
[01:33:55] Folding@Home Gromacs Core
[01:33:55] Version 1.90 (March 8, 2006)
[01:33:55]
[01:33:55] Preparing to commence simulation
[01:33:55] - Looking at optimizations...
[01:33:55] - Created dyn
[01:33:55] - Files status OK
[01:33:55] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[01:33:55] - Starting from initial work packet
[01:33:55]
[01:33:55] Project: 6503 (Run 3, Clone 254, Gen 53)
[01:33:55]
[01:33:55] Assembly optimizations on if available.
[01:33:55] Entering M.D.
[01:34:02] Protein: TR462_B_4 in water
[01:34:02]
[01:34:02] Writing local files
[01:34:02] Extra SSE boost OK.
[01:34:02] Writing local files
[01:34:03] Completed 0 out of 250000 steps  (0%)
[01:54:35] Writing local files
[01:54:35] Completed 2500 out of 250000 steps  (1%)
[02:15:06] Writing local files
[02:15:06] Completed 5000 out of 250000 steps  (2%)
[02:35:37] Writing local files
[02:35:37] Completed 7500 out of 250000 steps  (3%)


Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Posted: Fri Aug 27, 2010 3:41 pm
by 7im
It looks like, after a WU is reported as bad, it may take a while for that information to propagate to the work servers so that the WU isn't sent out again.