This WU seemed to hang at 64% for several hours.
Normally the GPU will process 1% of a WU in about 5 minutes and is supposed to checkpoint every 15 minutes.
After 90 minutes, I didn't see any evidence of progress or checkpoints.
I then shut down and restarted the client multiple times (each time waiting 1.5-2 hours) and saw the same lack of progress.
Am I jumping prematurely to the conclusion that it's hung? If it is truly hung, how can I tell whether it is a WU issue or a GPU issue?
GPU: ATI Radeon 4870
Catalyst Version: 09.11
Windows XP SP3
--- Opening Log file [September 11 08:47:58 UTC]
 [08:47:58]
 [08:47:58] Loaded queue successfully.
 [08:47:58] Initialization complete
 [08:47:58]
 [08:47:58] + Processing work unit
 [08:47:58] Core required: FahCore_11.exe
 [08:47:58] Core found.
 [08:47:58] Working on queue slot 04 [September 11 08:47:58 UTC]
 [08:47:58] + Working ...
 [08:47:58]
 [08:47:58] *------------------------------*
 [08:47:58] Folding@Home GPU Core - Beta
 [08:47:58] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
 [08:47:58]
 [08:47:58] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
 [08:47:58] Build host: amoeba
 [08:47:58] Board Type: AMD
 [08:47:58] Core :
 [08:47:58] Preparing to commence simulation
 [08:47:58] - Ensuring status. Please wait.
 [08:48:07] - Looking at optimizations...
 [08:48:07] - Working with standard loops on this execution.
 [08:48:07] - Previous termination of core was improper.
 [08:48:07] - Files status OK
 [08:48:07] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
 [08:48:07] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
 [08:48:07] - Digital signature verified
 [08:48:07]
 [08:48:07] Project: 5736 (Run 3, Clone 515, Gen 119)
 [08:48:07]
 [08:48:07] Entering M.D.
 [08:48:13] Will resume from checkpoint file
 [08:48:13] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
 [08:48:14] Working on Protein
 [08:48:14] Client config found, loading data.
 [08:48:14] Starting GUI Server
 [08:48:19] Resuming from checkpoint
 [08:48:19] fcCheckPointResume: retreived and current tpr file hash:
 [08:48:19] 0 1445190852 1445190852
 [08:48:19] 1 3527609112 3527609112
 [08:48:19] 2 2623324236 2623324236
 [08:48:19] 3 1655012693 1655012693
 [08:48:19] 4 199698481 199698481
 [08:48:19] Verified work/wudata_04.log
 [08:48:19] Verified work/wudata_04.edr
 [08:48:19] Verified work/wudata_04.xtc
 [08:48:19] Completed 15%
 
 ---snip---
 
 [13:25:35] Completed 63%
 [13:30:13] Completed 64%
 [14:47:58] + Working...
 
 !!! 1 hour 45 minutes, no progress
 
 Folding@Home Client Shutdown.
 
 
 --- Opening Log file [September 11 15:18:53 UTC]
 
 
 [15:18:53]
 [15:18:53] Loaded queue successfully.
 [15:18:53] Initialization complete
 [15:18:53]
 [15:18:53] + Processing work unit
 [15:18:53] Core required: FahCore_11.exe
 [15:18:53] Core found.
 [15:18:53] Working on queue slot 04 [September 11 15:18:53 UTC]
 [15:18:53] + Working ...
 [15:18:54]
 [15:18:54] *------------------------------*
 [15:18:54] Folding@Home GPU Core - Beta
 [15:18:54] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
 [15:18:54]
 [15:18:54] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
 [15:18:54] Build host: amoeba
 [15:18:54] Board Type: AMD
 [15:18:54] Core :
 [15:18:54] Preparing to commence simulation
 [15:18:54] - Looking at optimizations...
 [15:18:54] - Files status OK
 [15:18:54] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
 [15:18:54] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
 [15:18:54] - Digital signature verified
 [15:18:54]
 [15:18:54] Project: 5736 (Run 3, Clone 515, Gen 119)
 [15:18:54]
 [15:19:01] Assembly optimizations on if available.
 [15:19:01] Entering M.D.
 [15:19:15] Will resume from checkpoint file
 [15:19:15] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
 [15:19:28] Working on Protein
 [15:20:01] Client config found, loading data.
 [15:20:05] Starting GUI Server
 [15:25:59] Resuming from checkpoint
 [15:25:59] fcCheckPointResume: retreived and current tpr file hash:
 [15:25:59] 0 1445190852 1445190852
 [15:25:59] 1 3527609112 3527609112
 [15:25:59] 2 2623324236 2623324236
 [15:25:59] 3 1655012693 1655012693
 [15:25:59] 4 199698481 199698481
 [15:25:59] Verified work/wudata_04.log
 [15:25:59] Verified work/wudata_04.edr
 [15:26:01] Verified work/wudata_04.xtc
 [15:26:07] Completed 64%
 
 !!! 1 hour 20 minutes no progress
 
 Folding@Home Client Shutdown.
 
 
 --- Opening Log file [September 11 16:49:09 UTC]
 
 
 [16:49:09]
 [16:49:09] Loaded queue successfully.
 [16:49:09] Initialization complete
 [16:49:09]
 [16:49:09] + Processing work unit
 [16:49:09] Core required: FahCore_11.exe
 [16:49:09] Core found.
 [16:49:09] Working on queue slot 04 [September 11 16:49:09 UTC]
 [16:49:09] + Working ...
 [16:49:09]
 [16:49:09] *------------------------------*
 [16:49:09] Folding@Home GPU Core - Beta
 [16:49:09] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
 [16:49:09]
 [16:49:09] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
 [16:49:09] Build host: amoeba
 [16:49:09] Board Type: AMD
 [16:49:09] Core :
 [16:49:09] Preparing to commence simulation
 [16:49:09] - Looking at optimizations...
 [16:49:09] - Files status OK
 [16:49:09] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
 [16:49:09] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
 [16:49:09] - Digital signature verified
 [16:49:09]
 [16:49:09] Project: 5736 (Run 3, Clone 515, Gen 119)
 [16:49:09]
 [16:49:30] Assembly optimizations on if available.
 [16:49:30] Entering M.D.
 [16:49:36] Will resume from checkpoint file
 [16:49:41] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
 [16:50:22] Working on Protein
 [16:50:48] Client config found, loading data.
 [16:50:49] Starting GUI Server
 [16:57:34] Resuming from checkpoint
 [16:57:34] fcCheckPointResume: retreived and current tpr file hash:
 [16:57:34] 0 1445190852 1445190852
 [16:57:34] 1 3527609112 3527609112
 [16:57:34] 2 2623324236 2623324236
 [16:57:34] 3 1655012693 1655012693
 [16:57:34] 4 199698481 199698481
 [16:57:39] Verified work/wudata_04.log
 [16:57:39] Verified work/wudata_04.edr
 [16:57:39] Verified work/wudata_04.xtc
 [16:57:40] Completed 64%
 
 !!! 2 1/2 hours, no progress
 
 --- Opening Log file [September 11 19:37:08 UTC]
 
 
 [19:37:08]
 [19:37:09] Loaded queue successfully.
 [19:37:09] Initialization complete
 [19:37:09]
 [19:37:09] + Processing work unit
 [19:37:09] Core required: FahCore_11.exe
 [19:37:09] Core found.
 [19:37:09] Working on queue slot 04 [September 11 19:37:09 UTC]
 [19:37:09] + Working ...
 [19:37:10]
 [19:37:10] *------------------------------*
 [19:37:10] Folding@Home GPU Core - Beta
 [19:37:10] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
 [19:37:10]
 [19:37:10] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
 [19:37:10] Build host: amoeba
 [19:37:10] Board Type: AMD
 [19:37:10] Core :
 [19:37:10] Preparing to commence simulation
 [19:37:10] - Ensuring status. Please wait.
 [19:37:20] - Looking at optimizations...
 [19:37:20] - Working with standard loops on this execution.
 [19:37:20] - Previous termination of core was improper.
 [19:37:20] - Files status OK
 [19:37:20] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
 [19:37:20] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
 [19:37:20] - Digital signature verified
 [19:37:20]
 [19:37:20] Project: 5736 (Run 3, Clone 515, Gen 119)
 [19:37:20]
 [19:37:33] Entering M.D.
 [19:37:39] Will resume from checkpoint file
 [19:37:39] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
 [19:38:22] Working on Protein
 [19:38:41] Client config found, loading data.
 [19:38:46] Starting GUI Server
 [19:45:22] Resuming from checkpoint
 [19:45:22] fcCheckPointResume: retreived and current tpr file hash:
 [19:45:22] 0 1445190852 1445190852
 [19:45:22] 1 3527609112 3527609112
 [19:45:22] 2 2623324236 2623324236
 [19:45:22] 3 1655012693 1655012693
 [19:45:22] 4 199698481 199698481
 [19:45:28] Verified work/wudata_04.log
 [19:45:28] Verified work/wudata_04.edr
 [19:45:31] Verified work/wudata_04.xtc
 [19:45:32] Completed 64%
 [19:45:32] mdrun_gpu returned
 [19:45:32] Calculated & specified T inconsisitent
 [19:45:32]
 [19:45:32] Folding@home Core Shutdown: UNSTABLE_MACHINE
 [19:45:46] CoreStatus = 7A (122)
 [19:45:46] Sending work to server
 [19:45:46] Project: 5736 (Run 3, Clone 515, Gen 119)
 [19:45:46] - Read packet limit of 540015616... Set to 524286976.
 
 
 [19:45:46] + Attempting to send results [September 11 19:45:46 UTC]
 [19:45:46] - Error: Could not read results file work/wuresults_04.dat from disk
 [19:45:46] - Error: Could not read unit 04 file. Removing from queue.
 [19:45:46] - Preparing to get new work unit...
 [19:45:46] + Attempting to get work packet
 [19:45:46] - Connecting to assignment server
 [19:45:47] - Successful: assigned to (171.64.65.102).
 [19:45:47] + News From Folding@Home: Welcome to Folding@Home
 [19:45:47] Loaded queue successfully.
 
 ---snip---
 
 [19:45:53] - Digital signature verified
 [19:45:54]
 [19:45:54] Project: 5736 (Run 3, Clone 515, Gen 119)
 
 !!! Same WU
 
 [19:45:54]
 [19:46:02] Assembly optimizations on if available.
 [19:46:02] Entering M.D.
 [19:46:08] Tpr hash work/wudata_05.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
 [19:46:55] Working on Protein
 [19:47:15] Client config found, loading data.
 [19:47:21] Starting GUI Server
 
 !!! Over 2 hours - no progress
 
 Folding@Home Client Shutdown.
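
For anyone who wants to check a log like this for stalls without reading it by eye, here is a minimal Python sketch. It assumes only the `[HH:MM:SS]` timestamp format seen above; the function name `find_gaps`, the 15-minute threshold (taken from the expected checkpoint interval), and the treatment of `---snip---` and `Opening Log file` lines as session breaks are my own assumptions, not anything the client provides. It can only see pauses between two timestamped lines, so a session that ends with no further log output (such as the ones stuck at "Completed 64%") will not be reported.

```python
import re
from datetime import timedelta

# Matches lines such as " [13:30:13] Completed 64%".
STAMP = re.compile(r"^\s*\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")

def find_gaps(lines, threshold_minutes=15):
    """Return (text_of_line_before_gap, gap_in_minutes) for each long pause."""
    gaps = []
    prev_time = None
    prev_text = None
    for line in lines:
        # A new session or an elided region makes the previous timestamp meaningless.
        if "Opening Log file" in line or "---snip---" in line:
            prev_time = None
            continue
        m = STAMP.match(line)
        if not m:
            continue  # untimestamped lines (notes, blank lines) are ignored
        h, mi, s, text = m.groups()
        now = timedelta(hours=int(h), minutes=int(mi), seconds=int(s))
        if prev_time is not None:
            delta = now - prev_time
            if delta < timedelta(0):
                delta += timedelta(days=1)  # log rolled over midnight
            minutes = delta.total_seconds() / 60
            if minutes > threshold_minutes:
                gaps.append((prev_text, minutes))
        prev_time, prev_text = now, text.strip()
    return gaps

# Three lines copied from the log above.
sample = [
    " [13:25:35] Completed 63%",
    " [13:30:13] Completed 64%",
    " [14:47:58] + Working...",
]

for text, minutes in find_gaps(sample):
    print(f"{minutes:.1f} minutes with no log output after: {text}")
```

On a full log, the same function can be fed `open("FAHlog.txt").readlines()` (the file name is an assumption; use whatever your client writes). A gap well beyond the usual 5 minutes per 1% suggests the core has stopped making progress, but it does not by itself say whether the WU or the GPU is at fault.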

