
Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Tue Mar 29, 2011 4:39 am
by bruce
But you're not running as a service, so it doesn't matter.

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Wed Apr 27, 2011 6:43 pm
by 7im
Let's back up a step here. A 63/99 error is usually a permissions problem, especially when it appears at client startup. The client can't start folding because it lacks access to some part of the folding setup. So let's recheck the basics. You tried various driver versions; that didn't help. You have the right client version, v6.41. Next...

Atom, please post the Target: and StartIn: settings from the shortcut you are using to launch the Systray client.

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Sun May 01, 2011 6:55 pm
by Atom
Thanks, 7im, the bit about the error being related to permissions is news to me. Here's the info you requested:

Target: "C:\Program Files (x86)\Folding@home-gpu\Folding@home.exe" -gpu 0

Start in: "C:\Users\David\AppData\Roaming\Folding@home-gpu"
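A quick way to recheck the permissions theory against these settings is to confirm that the current user can actually create files in the "Start in" directory and open the client's data files for writing. The Python below is a hypothetical helper, not part of the Folding@home client; it assumes the v6 client keeps its data (the work folder, queue.dat and unitinfo.txt mentioned later in this thread) in the Start in directory. It tries real file operations instead of relying on `os.access`, which on Windows checks little more than the read-only attribute and does not consult NTFS permissions.

```python
import tempfile
from pathlib import Path

# Items assumed to live in the client's "Start in" directory (hypothetical list,
# taken from the files named in this thread).
CLIENT_ITEMS = ("work", "queue.dat", "unitinfo.txt")


def can_create_files_in(directory):
    """Try to create (and automatically delete) a temporary file in the directory."""
    try:
        with tempfile.TemporaryFile(dir=str(directory)):
            pass
        return True
    except OSError:
        return False


def can_open_read_write(path):
    """Open an existing file for reading and writing without truncating it."""
    try:
        with open(path, "r+b"):
            pass
        return True
    except OSError:
        return False


def check_access(start_in):
    """Return a list of access problems found under the client's data directory."""
    base = Path(start_in)
    if not base.is_dir():
        return [f"{base}: missing or not a directory"]
    problems = []
    if not can_create_files_in(base):
        problems.append(f"{base}: cannot create files here")
    for name in CLIENT_ITEMS:
        path = base / name
        if path.is_dir() and not can_create_files_in(path):
            problems.append(f"{path}: cannot create files here")
        elif path.is_file() and not can_open_read_write(path):
            problems.append(f"{path}: cannot open for read/write")
    return problems
```

Called as `check_access(r"C:\Users\David\AppData\Roaming\Folding@home-gpu")`, it returns a list of problem descriptions; an empty list means nothing obvious turned up. Run it with the client stopped, since a file held open by a running core can also show up as a problem.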

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Tue May 03, 2011 8:45 pm
by Atom
I actually don't think it's permission-related. I have another three-way machine with 450s in it that has been running like a top for a while. Each GPU has completed more than 100 WUs (as many as 124), and yesterday they entered an EUE pause. Today each card started throwing "CoreStatus = 63 (99)" errors. I can't see how permissions could be fine yesterday and not today, when nobody has touched the machine at all.

Drivers didn't change overnight. Permissions didn't change overnight. The only things that changed were the work units. I was actually watching the machine as two GPUs completed work units, uploaded their results, and then downloaded new work units. THAT'S when they threw the error. After 124 completed work units, I don't think it's a generic driver issue. It seems like the core either wasn't shut down correctly or wasn't started correctly.


[16:32:39]
[16:32:39] Successful run
[16:32:39] DynamicWrapper: Finished Work Unit: sleep=10000
[16:32:49] Reserved 2474436 bytes for xtc file; Cosm status=0
[16:32:49] Allocated 2474436 bytes for xtc file
[16:32:49] - Reading up to 2474436 from "work/wudata_03.xtc": Read 2474436
[16:32:49] Read 2474436 bytes from xtc file; available packet space=783956028
[16:32:49] xtc file hash check passed.
[16:32:49] Reserved 76680 76680 783956028 bytes for arc file=<work/wudata_03.trr> Cosm status=0
[16:32:49] Allocated 76680 bytes for arc file
[16:32:49] - Reading up to 76680 from "work/wudata_03.trr": Read 76680
[16:32:49] Read 76680 bytes from arc file; available packet space=783879348
[16:32:49] trr file hash check passed.
[16:32:49] Allocated 544 bytes for edr file
[16:32:49] Read bedfile
[16:32:49] edr file hash check passed.
[16:32:49] Allocated 120122 bytes for logfile
[16:32:49] Read logfile
[16:32:49] GuardedRun: success in DynamicWrapper
[16:32:49] GuardedRun: done
[16:32:49] Run: GuardedRun completed.
[16:32:53] + Opened results file
[16:32:53] - Writing 2672294 bytes of core data to disk...
[16:32:54] Done: 2671782 -> 2514050 (compressed to 94.0 percent)
[16:32:54]   ... Done.
[16:32:54] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[16:32:54] Shutting down core
[16:32:54]
[16:32:54] Folding@home Core Shutdown: FINISHED_UNIT
[16:32:57] CoreStatus = 64 (100)
[16:32:57] Sending work to server
[16:32:57] Project: 6801 (Run 8362, Clone 2, Gen 9)
[16:32:57] - Read packet limit of 540015616... Set to 524286976.


[16:32:57] + Attempting to send results [May 3 16:32:57 UTC]
[16:32:57] Gpu type=3 species=0.
[16:33:01] + Results successfully sent
[16:33:01] Thank you for your contribution to Folding@Home.
[16:33:01] + Number of Units Completed: 362

[16:33:05] - Preparing to get new work unit...
[16:33:05] Cleaning up work directory
[16:33:05] + Attempting to get work packet
[16:33:05] Passkey found
[16:33:05] Gpu type=3 species=0.
[16:33:05] - Connecting to assignment server
[16:33:06] - Successful: assigned to (171.64.65.64).
[16:33:06] + News From Folding@Home: Welcome to Folding@Home
[16:33:06] Loaded queue successfully.
[16:33:06] Gpu type=3 species=0.
[16:33:07] + Closed connections
[16:33:07]
[16:33:07] + Processing work unit
[16:33:07] Core required: FahCore_15.exe
[16:33:07] Core found.
[16:33:07] Working on queue slot 04 [May 3 16:33:07 UTC]
[16:33:07] + Working ...
[16:33:07]
[16:33:07] *------------------------------*
[16:33:07] Folding@Home GPU Core
[16:33:07] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[16:33:07]
[16:33:07] Build host: SimbiosNvdWin7
[16:33:07] Board Type: NVIDIA/CUDA
[16:33:07] Core      : x=15
[16:33:07]  Window's signal control handler registered.
[16:33:07] Preparing to commence simulation
[16:33:07] - Looking at optimizations...
[16:33:07] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[16:33:07] - Created dyn
[16:33:07] - Files status OK
[16:33:07] sizeof(CORE_PACKET_HDR) = 512 file=<>
[16:33:07] - Expanded 43680 -> 171827 (decompressed 393.3 percent)
[16:33:07] Called DecompressByteArray: compressed_data_size=43680 data_size=171827, decompressed_data_size=171827 diff=0
[16:33:07] - Digital signature verified
[16:33:07]
[16:33:07] Project: 6801 (Run 9748, Clone 2, Gen 9)
[16:33:07]
[16:33:07] Assembly optimizations on if available.
[16:33:07] Entering M.D.
[16:33:09] Tpr hash work/wudata_04.tpr:  1433012669 2811342351 2985677414 1924824265 3721637216
[16:33:09] Working on ALZHEIMER'S DISEASE AMYLOID
[16:33:09] Client config found, loading data.
[16:33:09] Starting GUI Server
[16:33:09] Setting checkpoint frequency: 500000
[16:33:09] Setting checkpoint frequency: 500000
[16:35:28] Completed    500000 out of 50000000 steps (1%).
[16:37:48] Completed   1000000 out of 50000000 steps (2%).
[16:40:07] Completed   1500000 out of 50000000 steps (3%).
[16:42:27] Completed   2000000 out of 50000000 steps (4%).
[16:44:46] Completed   2500000 out of 50000000 steps (5%).
[16:47:05] Completed   3000000 out of 50000000 steps (6%).
[16:49:25] Completed   3500000 out of 50000000 steps (7%).
[16:51:44] Completed   4000000 out of 50000000 steps (8%).
[16:54:04] Completed   4500000 out of 50000000 steps (9%).
[16:56:23] Completed   5000000 out of 50000000 steps (10%).
[16:58:42] Completed   5500000 out of 50000000 steps (11%).
[17:01:02] Completed   6000000 out of 50000000 steps (12%).
[17:03:21] Completed   6500000 out of 50000000 steps (13%).
[17:05:41] Completed   7000000 out of 50000000 steps (14%).
[17:08:00] Completed   7500000 out of 50000000 steps (15%).
[17:10:20] Completed   8000000 out of 50000000 steps (16%).
[17:12:39] Completed   8500000 out of 50000000 steps (17%).
[17:14:59] Completed   9000000 out of 50000000 steps (18%).
[17:17:18] Completed   9500000 out of 50000000 steps (19%).
[17:19:38] Completed  10000000 out of 50000000 steps (20%).
[17:21:57] Completed  10500000 out of 50000000 steps (21%).
[17:24:16] Completed  11000000 out of 50000000 steps (22%).
[17:26:36] Completed  11500000 out of 50000000 steps (23%).
[17:28:55] Completed  12000000 out of 50000000 steps (24%).
[17:31:15] Completed  12500000 out of 50000000 steps (25%).
[17:33:34] Completed  13000000 out of 50000000 steps (26%).
[17:35:54] Completed  13500000 out of 50000000 steps (27%).
[17:38:13] Completed  14000000 out of 50000000 steps (28%).
[17:40:32] Completed  14500000 out of 50000000 steps (29%).
[17:42:52] Completed  15000000 out of 50000000 steps (30%).
[17:45:11] Completed  15500000 out of 50000000 steps (31%).
[17:47:31] Completed  16000000 out of 50000000 steps (32%).
[17:49:50] Completed  16500000 out of 50000000 steps (33%).
[17:52:10] Completed  17000000 out of 50000000 steps (34%).
[17:54:29] Completed  17500000 out of 50000000 steps (35%).
[17:56:48] Completed  18000000 out of 50000000 steps (36%).
[17:59:08] Completed  18500000 out of 50000000 steps (37%).
[18:01:27] Completed  19000000 out of 50000000 steps (38%).
[18:03:47] Completed  19500000 out of 50000000 steps (39%).
[18:06:06] Completed  20000000 out of 50000000 steps (40%).
[18:08:25] Completed  20500000 out of 50000000 steps (41%).
[18:10:45] Completed  21000000 out of 50000000 steps (42%).
[18:13:04] Completed  21500000 out of 50000000 steps (43%).
[18:15:24] Completed  22000000 out of 50000000 steps (44%).
[18:17:43] Completed  22500000 out of 50000000 steps (45%).
[18:20:02] Completed  23000000 out of 50000000 steps (46%).
[18:22:22] Completed  23500000 out of 50000000 steps (47%).
[18:24:41] Completed  24000000 out of 50000000 steps (48%).
[18:27:01] Completed  24500000 out of 50000000 steps (49%).
[18:29:20] Completed  25000000 out of 50000000 steps (50%).
[18:31:40] Completed  25500000 out of 50000000 steps (51%).
[18:33:59] Completed  26000000 out of 50000000 steps (52%).
[18:36:18] Completed  26500000 out of 50000000 steps (53%).
[18:38:38] Completed  27000000 out of 50000000 steps (54%).
[18:40:57] Completed  27500000 out of 50000000 steps (55%).
[18:43:17] Completed  28000000 out of 50000000 steps (56%).
[18:45:36] Completed  28500000 out of 50000000 steps (57%).
[18:47:55] Completed  29000000 out of 50000000 steps (58%).
[18:50:15] Completed  29500000 out of 50000000 steps (59%).
[18:52:34] Completed  30000000 out of 50000000 steps (60%).
[18:54:54] Completed  30500000 out of 50000000 steps (61%).
[18:57:13] Completed  31000000 out of 50000000 steps (62%).
[18:59:32] Completed  31500000 out of 50000000 steps (63%).
[19:01:52] Completed  32000000 out of 50000000 steps (64%).
[19:04:11] Completed  32499999 out of 50000000 steps (65%).
[19:06:31] Completed  32999999 out of 50000000 steps (66%).
[19:08:50] Completed  33499999 out of 50000000 steps (67%).
[19:11:10] Completed  33999999 out of 50000000 steps (68%).
[19:13:29] Completed  34499999 out of 50000000 steps (69%).
[19:15:48] Completed  34999999 out of 50000000 steps (70%).
[19:18:08] Completed  35499999 out of 50000000 steps (71%).
[19:20:27] Completed  35999999 out of 50000000 steps (72%).
[19:22:47] Completed  36499999 out of 50000000 steps (73%).
[19:25:07] Completed  36999999 out of 50000000 steps (74%).
[19:27:26] Completed  37499999 out of 50000000 steps (75%).
[19:29:45] Completed  37999999 out of 50000000 steps (76%).
[19:32:05] Completed  38499999 out of 50000000 steps (77%).
[19:34:24] Completed  38999999 out of 50000000 steps (78%).
[19:36:44] Completed  39499999 out of 50000000 steps (79%).
[19:39:03] Completed  39999999 out of 50000000 steps (80%).
[19:41:22] Completed  40499999 out of 50000000 steps (81%).
[19:43:42] Completed  40999999 out of 50000000 steps (82%).
[19:46:01] Completed  41499999 out of 50000000 steps (83%).
[19:48:21] Completed  41999999 out of 50000000 steps (84%).
[19:50:40] Completed  42499999 out of 50000000 steps (85%).
[19:53:00] Completed  42999999 out of 50000000 steps (86%).
[19:55:19] Completed  43499999 out of 50000000 steps (87%).
[19:57:38] Completed  43999999 out of 50000000 steps (88%).
[19:59:58] Completed  44499999 out of 50000000 steps (89%).
[20:02:17] Completed  44999999 out of 50000000 steps (90%).
[20:04:37] Completed  45499999 out of 50000000 steps (91%).
[20:06:56] Completed  45999999 out of 50000000 steps (92%).
[20:09:15] Completed  46499999 out of 50000000 steps (93%).
[20:11:35] Completed  46999999 out of 50000000 steps (94%).
[20:13:54] Completed  47499999 out of 50000000 steps (95%).
[20:16:14] Completed  47999999 out of 50000000 steps (96%).
[20:18:33] Completed  48499999 out of 50000000 steps (97%).
[20:20:53] Completed  48999999 out of 50000000 steps (98%).
[20:23:15] Completed  49499999 out of 50000000 steps (99%).
[20:25:36] Completed  49999999 out of 50000000 steps (100%).
[20:25:38] Finished fah_main
[20:25:38]
[20:25:38] Successful run
[20:25:38] DynamicWrapper: Finished Work Unit: sleep=10000
[20:25:48] Reserved 2476332 bytes for xtc file; Cosm status=0
[20:25:48] Allocated 2476332 bytes for xtc file
[20:25:48] - Reading up to 2476332 from "work/wudata_04.xtc": Read 2476332
[20:25:48] Read 2476332 bytes from xtc file; available packet space=783954132
[20:25:48] xtc file hash check passed.
[20:25:48] Reserved 76680 76680 783954132 bytes for arc file=<work/wudata_04.trr> Cosm status=0
[20:25:48] Allocated 76680 bytes for arc file
[20:25:48] - Reading up to 76680 from "work/wudata_04.trr": Read 76680
[20:25:48] Read 76680 bytes from arc file; available packet space=783877452
[20:25:48] trr file hash check passed.
[20:25:48] Allocated 544 bytes for edr file
[20:25:48] Read bedfile
[20:25:48] edr file hash check passed.
[20:25:48] Allocated 120122 bytes for logfile
[20:25:48] Read logfile
[20:25:48] GuardedRun: success in DynamicWrapper
[20:25:48] GuardedRun: done
[20:25:48] Run: GuardedRun completed.
[20:25:53] + Opened results file
[20:25:53] - Writing 2674190 bytes of core data to disk...
[20:25:54] Done: 2673678 -> 2516326 (compressed to 94.1 percent)
[20:25:54]   ... Done.
[20:25:54] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[20:25:54] Shutting down core
[20:25:54]
[20:25:54] Folding@home Core Shutdown: FINISHED_UNIT
[20:25:57] CoreStatus = 64 (100)
[20:25:57] Sending work to server
[20:25:57] Project: 6801 (Run 9748, Clone 2, Gen 9)
[20:25:57] - Read packet limit of 540015616... Set to 524286976.


[20:25:57] + Attempting to send results [May 3 20:25:57 UTC]
[20:25:57] Gpu type=3 species=0.
[20:26:03] + Results successfully sent
[20:26:03] Thank you for your contribution to Folding@Home.
[20:26:03] + Number of Units Completed: 363

[20:26:07] - Preparing to get new work unit...
[20:26:07] Cleaning up work directory
[20:26:07] + Attempting to get work packet
[20:26:07] Passkey found
[20:26:07] Gpu type=3 species=0.
[20:26:07] - Connecting to assignment server
[20:26:07] - Successful: assigned to (171.64.65.64).
[20:26:07] + News From Folding@Home: Welcome to Folding@Home
[20:26:08] Loaded queue successfully.
[20:26:08] Gpu type=3 species=0.
[20:26:09] + Closed connections
[20:26:09]
[20:26:09] + Processing work unit
[20:26:09] Core required: FahCore_15.exe
[20:26:09] Core found.
[20:26:09] Working on queue slot 05 [May 3 20:26:09 UTC]
[20:26:09] + Working ...
[20:26:09]
[20:26:09] *------------------------------*
[20:26:09] Folding@Home GPU Core
[20:26:09] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[20:26:09]
[20:26:09] Build host: SimbiosNvdWin7
[20:26:09] Board Type: NVIDIA/CUDA
[20:26:09] Core      : x=15
[20:26:09]  Window's signal control handler registered.
[20:26:09] Preparing to commence simulation
[20:26:09] - Looking at optimizations...
[20:26:09] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[20:26:09] - Created dyn
[20:26:09] - Files status OK
[20:26:09] sizeof(CORE_PACKET_HDR) = 512 file=<>
[20:26:09] - Expanded 43661 -> 171827 (decompressed 393.5 percent)
[20:26:09] Called DecompressByteArray: compressed_data_size=43661 data_size=171827, decompressed_data_size=171827 diff=0
[20:26:09] - Digital signature verified
[20:26:09]
[20:26:09] Project: 6801 (Run 6018, Clone 3, Gen 9)
[20:26:09]
[20:26:09] Assembly optimizations on if available.
[20:26:09] Entering M.D.
[20:26:11] Tpr hash work/wudata_05.tpr:  1500319053 2112754324 4111607144 1091663483 2920748839
[20:26:11] Working on ALZHEIMER'S DISEASE AMYLOID
[20:26:11] Client config found, loading data.
[20:26:15] CoreStatus = 63 (99)
[20:26:15] + Error starting Folding@home core.
[20:26:20]
[20:26:20] + Processing work unit
[20:26:20] Core required: FahCore_15.exe
[20:26:20] Core found.
[20:26:20] Working on queue slot 05 [May 3 20:26:20 UTC]
[20:26:20] + Working ...
[20:26:20]
[20:26:20] *------------------------------*
[20:26:20] Folding@Home GPU Core
[20:26:20] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[20:26:20]
[20:26:20] Build host: SimbiosNvdWin7
[20:26:20] Board Type: NVIDIA/CUDA
[20:26:20] Core      : x=15
[20:26:20]  Window's signal control handler registered.
[20:26:20] Preparing to commence simulation
[20:26:20] - Ensuring status. Please wait.
[20:26:30] - Looking at optimizations...
[20:26:30] - Working with standard loops on this execution.
[20:26:30] - Previous termination of core was improper.
[20:26:30] - Files status OK
[20:26:30] sizeof(CORE_PACKET_HDR) = 512 file=<>
[20:26:30] - Expanded 43661 -> 171827 (decompressed 393.5 percent)
[20:26:30] Called DecompressByteArray: compressed_data_size=43661 data_size=171827, decompressed_data_size=171827 diff=0
[20:26:30] - Digital signature verified
[20:26:30]
[20:26:30] Project: 6801 (Run 6018, Clone 3, Gen 9)
[20:26:30]
[20:26:30] Entering M.D.
[20:26:32] Tpr hash work/wudata_05.tpr:  1500319053 2112754324 4111607144 1091663483 2920748839
[20:26:32] Working on ALZHEIMER'S DISEASE AMYLOID
[20:26:32] Client config found, loading data.
[20:26:34] CoreStatus = 63 (99)
[20:26:34] + Error starting Folding@home core.
[20:26:39]
[20:26:39] + Processing work unit
[20:26:39] Core required: FahCore_15.exe
[20:26:39] Core found.
[20:26:39] Working on queue slot 05 [May 3 20:26:39 UTC]
[20:26:39] + Working ...
[20:26:39]
[20:26:39] *------------------------------*
[20:26:39] Folding@Home GPU Core
[20:26:39] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[20:26:39]
[20:26:39] Build host: SimbiosNvdWin7
[20:26:39] Board Type: NVIDIA/CUDA
[20:26:39] Core      : x=15
[20:26:39]  Window's signal control handler registered.
[20:26:39] Preparing to commence simulation
[20:26:39] - Ensuring status. Please wait.
[20:26:49] - Looking at optimizations...
[20:26:49] - Working with standard loops on this execution.
[20:26:49] - Previous termination of core was improper.
[20:26:49] - Going to use standard loops.
[20:26:49] - Files status OK
[20:26:49] sizeof(CORE_PACKET_HDR) = 512 file=<>
[20:26:49] - Expanded 43661 -> 171827 (decompressed 393.5 percent)
[20:26:49] Called DecompressByteArray: compressed_data_size=43661 data_size=171827, decompressed_data_size=171827 diff=0
[20:26:49] - Digital signature verified
[20:26:49]
[20:26:49] Project: 6801 (Run 6018, Clone 3, Gen 9)
[20:26:49]
[20:26:49] Entering M.D.
[20:26:51] Tpr hash work/wudata_05.tpr:  1500319053 2112754324 4111607144 1091663483 2920748839
[20:26:51] Working on ALZHEIMER'S DISEASE AMYLOID
[20:26:51] Client config found, loading data.
[20:26:54] CoreStatus = 63 (99)
[20:26:54] + Error starting Folding@home core.
[20:26:54] - Attempting to download new core...
[20:26:54] + Downloading new core: FahCore_15.exe
[20:26:54] + 10240 bytes downloaded
[20:26:54] + 20480 bytes downloaded
[20:26:54] + 30720 bytes downloaded
[20:26:54] + 40960 bytes downloaded

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Tue May 03, 2011 9:38 pm
by bruce
It's possible that Project: 6801 (Run 6018, Clone 3, Gen 9) is a bad WU, but that's difficult to establish. I see no evidence of the client making an error report to the server, and there are no other reports regarding this WU.
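One way to help establish whether a particular WU is the culprit is to scan the client log for "CoreStatus = 63 (99)" lines and note which WU was loaded when each one appeared. Incidentally, the first number looks like hexadecimal with the decimal value in parentheses (0x63 = 99, and the success line reads 64 (100), 0x64 = 100), which is why the same error gets written as both 63 and 99. The sketch below is a hypothetical helper that assumes the log format shown above; it counts failures per (Project, Run, Clone, Gen), charging each one to the most recent Project line.

```python
import re

# e.g. "[20:26:09] Project: 6801 (Run 6018, Clone 3, Gen 9)"
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# e.g. "[20:26:15] CoreStatus = 63 (99)": hex code, decimal value in parentheses
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def failing_work_units(lines):
    """Count 'CoreStatus = 63 (99)' lines per (Project, Run, Clone, Gen).

    Each failure is charged to the most recent Project line seen before it.
    """
    current = None
    failures = {}
    for line in lines:
        project = PROJECT_RE.search(line)
        if project:
            current = tuple(int(g) for g in project.groups())
            continue
        status = STATUS_RE.search(line)
        if status and int(status.group(2)) == 99 and current is not None:
            failures[current] = failures.get(current, 0) + 1
    return failures
```

Passing it the lines of the log (assumed here to be FAHlog.txt in the Start in directory, e.g. `failing_work_units(open("FAHlog.txt", errors="replace"))`) returns a dictionary; for the excerpt posted above it would report three failures, all on (6801, 6018, 3, 9). Comparing that tuple across several users' logs is one way to tell a bad WU from a bad setup.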

It's strange, though, because you did get credit for the other WU:

Hi Atom (team 37726),
Your WU (P6801 R4184 C3 G31) was added to the stats database on 2011-03-26 15:05:32 for 1348 points of credit.

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Tue May 03, 2011 10:27 pm
by 7im
Atom wrote: I actually don't think it's permission-related. I have another three-way machine with 450s in it that has been running like a top for a while. Each GPU has completed more than 100 WUs (as many as 124), and yesterday they entered an EUE pause. Today each card started throwing "CoreStatus = 63 (99)" errors. I can't see how permissions could be fine yesterday and not today, when nobody has touched the machine at all.

Drivers didn't change overnight. Permissions didn't change overnight. The only things that changed were the work units. I was actually watching the machine as two GPUs completed work units, uploaded their results, and then downloaded new work units. THAT'S when they threw the error. After 124 completed work units, I don't think it's a generic driver issue. It seems like the core either wasn't shut down correctly or wasn't started correctly.
Actually, you just answered your own question there at the end.

Permissions aren't always directly related to the user account. Many times it's file-related. As you said, fahcore didn't stop or start correctly, which likely corrupted a work unit data file. And when the client "can't read the WU data file," it can't tell the difference between a locked file and a corrupted file. The end result is the exact same error: 63/99 ;)
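The point that a locked file and a corrupted file end in the same code can be illustrated with a small, entirely hypothetical sketch; it is not FahCore's actual code. SHA-256 stands in for whatever integrity check the real core performs (the log only says "hash check passed"), and the 0x63 value is borrowed from the "CoreStatus = 63 (99)" line. Both failure paths collapse into one status, so the caller has no way to tell them apart.

```python
import hashlib
from pathlib import Path

# Illustrative only, NOT FahCore's code. Value mirrors "CoreStatus = 63 (99)".
CORE_START_FAILED = 0x63  # hex 63 == decimal 99


def start_core(wu_path, expected_sha256):
    """Return 0 on success, CORE_START_FAILED if the WU file is unusable."""
    try:
        data = Path(wu_path).read_bytes()
    except OSError:
        # Locked by another process, access denied, or missing: unreadable.
        return CORE_START_FAILED
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        # Readable but corrupted: the caller sees the very same code.
        return CORE_START_FAILED
    return 0
```

Because the status alone doesn't say which case occurred, clearing out the suspect files is a sensible first step rather than trying to pin down the exact cause.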

Stop the client; delete the work folder, the queue.dat file, and the unitinfo.txt file; then start the client again. Fixed?
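The cleanup described above can be scripted. The sketch below is a hypothetical helper (`reset_client_data` is not part of the client); it assumes the client has already been stopped and that its data lives in the Start in directory, e.g. C:\Users\David\AppData\Roaming\Folding@home-gpu from Atom's shortcut. It deliberately leaves everything else alone, including the config file (assumed to be client.cfg), so the client's settings survive.

```python
import shutil
from pathlib import Path


def reset_client_data(start_in):
    """Delete the work folder, queue.dat and unitinfo.txt. Stop the client first!

    Returns the names that were actually removed.
    """
    base = Path(start_in)
    removed = []
    work = base / "work"
    if work.is_dir():
        shutil.rmtree(work)  # removes the folder and every WU file inside it
        removed.append("work")
    for name in ("queue.dat", "unitinfo.txt"):
        path = base / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Calling it twice is harmless: the second call finds nothing to delete and returns an empty list.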

Re: GPU client: "Error starting Folding@Home core" then slee

Posted: Wed May 04, 2011 12:06 am
by Atom
Yeah, I cleaned up the offending data files, then simply restarted and it's off to the races again.

Re: Project: 6801 (Run 6018, Clone 3, Gen 9)

Posted: Wed May 04, 2011 1:14 am
by bruce
I'm going to change the title and make this about what might be a bad WU. That way if others have trouble with the same WU, they'll find this report.

Re: Project: 6801 (Run 6018, Clone 3, Gen 9)

Posted: Wed May 04, 2011 12:44 pm
by jtktam
I just ran into this problem last night.

I will get the offending WU IDs. It was causing all sorts of headaches for me until I found this post; I thought it was my setup :(

-joe

Re: Project: 6801 (Run 6018, Clone 3, Gen 9)

Posted: Wed May 04, 2011 4:17 pm
by bruce
jtktam wrote: I just ran into this problem last night.

I will get the offending WU IDs. It was causing all sorts of headaches for me until I found this post; I thought it was my setup :(

-joe
Same exact WU?
Project: 6801 (Run 6018, Clone 3, Gen 9)?