Project: 5736 (Run 3, Clone 515, Gen 119)
Posted: Sat Sep 11, 2010 10:30 pm
I'm not sure if this is a WU issue or a GPU issue.
This WU seemed to hang at 64% for several hours.
Normally the GPU will process 1% of a WU in about 5 minutes and is supposed to checkpoint every 15 minutes.
After 90 minutes, I didn't see any evidence of progress or checkpoints.
I then shut down and restarted the client multiple times (each time waiting 1.5-2 hours) and saw the same lack of progress.
Am I jumping to conclusions prematurely in thinking it's hung? If it is truly hung, how can I tell whether it is a WU issue or a GPU issue?
GPU: ATI Radeon 4870
Catalyst Version: 09.11
Windows XP SP3
--- Opening Log file [September 11 08:47:58 UTC]
[08:47:58]
[08:47:58] Loaded queue successfully.
[08:47:58] Initialization complete
[08:47:58]
[08:47:58] + Processing work unit
[08:47:58] Core required: FahCore_11.exe
[08:47:58] Core found.
[08:47:58] Working on queue slot 04 [September 11 08:47:58 UTC]
[08:47:58] + Working ...
[08:47:58]
[08:47:58] *------------------------------*
[08:47:58] Folding@Home GPU Core - Beta
[08:47:58] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[08:47:58]
[08:47:58] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[08:47:58] Build host: amoeba
[08:47:58] Board Type: AMD
[08:47:58] Core :
[08:47:58] Preparing to commence simulation
[08:47:58] - Ensuring status. Please wait.
[08:48:07] - Looking at optimizations...
[08:48:07] - Working with standard loops on this execution.
[08:48:07] - Previous termination of core was improper.
[08:48:07] - Files status OK
[08:48:07] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
[08:48:07] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
[08:48:07] - Digital signature verified
[08:48:07]
[08:48:07] Project: 5736 (Run 3, Clone 515, Gen 119)
[08:48:07]
[08:48:07] Entering M.D.
[08:48:13] Will resume from checkpoint file
[08:48:13] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
[08:48:14] Working on Protein
[08:48:14] Client config found, loading data.
[08:48:14] Starting GUI Server
[08:48:19] Resuming from checkpoint
[08:48:19] fcCheckPointResume: retreived and current tpr file hash:
[08:48:19] 0 1445190852 1445190852
[08:48:19] 1 3527609112 3527609112
[08:48:19] 2 2623324236 2623324236
[08:48:19] 3 1655012693 1655012693
[08:48:19] 4 199698481 199698481
[08:48:19] Verified work/wudata_04.log
[08:48:19] Verified work/wudata_04.edr
[08:48:19] Verified work/wudata_04.xtc
[08:48:19] Completed 15%
---snip---
[13:25:35] Completed 63%
[13:30:13] Completed 64%
[14:47:58] + Working...
!!! 1 hour 45 minutes, no progress
Folding@Home Client Shutdown.
--- Opening Log file [September 11 15:18:53 UTC]
[15:18:53]
[15:18:53] Loaded queue successfully.
[15:18:53] Initialization complete
[15:18:53]
[15:18:53] + Processing work unit
[15:18:53] Core required: FahCore_11.exe
[15:18:53] Core found.
[15:18:53] Working on queue slot 04 [September 11 15:18:53 UTC]
[15:18:53] + Working ...
[15:18:54]
[15:18:54] *------------------------------*
[15:18:54] Folding@Home GPU Core - Beta
[15:18:54] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:18:54]
[15:18:54] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:18:54] Build host: amoeba
[15:18:54] Board Type: AMD
[15:18:54] Core :
[15:18:54] Preparing to commence simulation
[15:18:54] - Looking at optimizations...
[15:18:54] - Files status OK
[15:18:54] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
[15:18:54] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
[15:18:54] - Digital signature verified
[15:18:54]
[15:18:54] Project: 5736 (Run 3, Clone 515, Gen 119)
[15:18:54]
[15:19:01] Assembly optimizations on if available.
[15:19:01] Entering M.D.
[15:19:15] Will resume from checkpoint file
[15:19:15] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
[15:19:28] Working on Protein
[15:20:01] Client config found, loading data.
[15:20:05] Starting GUI Server
[15:25:59] Resuming from checkpoint
[15:25:59] fcCheckPointResume: retreived and current tpr file hash:
[15:25:59] 0 1445190852 1445190852
[15:25:59] 1 3527609112 3527609112
[15:25:59] 2 2623324236 2623324236
[15:25:59] 3 1655012693 1655012693
[15:25:59] 4 199698481 199698481
[15:25:59] Verified work/wudata_04.log
[15:25:59] Verified work/wudata_04.edr
[15:26:01] Verified work/wudata_04.xtc
[15:26:07] Completed 64%
!!! 1 hour 20 minutes, no progress
Folding@Home Client Shutdown.
--- Opening Log file [September 11 16:49:09 UTC]
[16:49:09]
[16:49:09] Loaded queue successfully.
[16:49:09] Initialization complete
[16:49:09]
[16:49:09] + Processing work unit
[16:49:09] Core required: FahCore_11.exe
[16:49:09] Core found.
[16:49:09] Working on queue slot 04 [September 11 16:49:09 UTC]
[16:49:09] + Working ...
[16:49:09]
[16:49:09] *------------------------------*
[16:49:09] Folding@Home GPU Core - Beta
[16:49:09] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[16:49:09]
[16:49:09] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:49:09] Build host: amoeba
[16:49:09] Board Type: AMD
[16:49:09] Core :
[16:49:09] Preparing to commence simulation
[16:49:09] - Looking at optimizations...
[16:49:09] - Files status OK
[16:49:09] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
[16:49:09] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
[16:49:09] - Digital signature verified
[16:49:09]
[16:49:09] Project: 5736 (Run 3, Clone 515, Gen 119)
[16:49:09]
[16:49:30] Assembly optimizations on if available.
[16:49:30] Entering M.D.
[16:49:36] Will resume from checkpoint file
[16:49:41] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
[16:50:22] Working on Protein
[16:50:48] Client config found, loading data.
[16:50:49] Starting GUI Server
[16:57:34] Resuming from checkpoint
[16:57:34] fcCheckPointResume: retreived and current tpr file hash:
[16:57:34] 0 1445190852 1445190852
[16:57:34] 1 3527609112 3527609112
[16:57:34] 2 2623324236 2623324236
[16:57:34] 3 1655012693 1655012693
[16:57:34] 4 199698481 199698481
[16:57:39] Verified work/wudata_04.log
[16:57:39] Verified work/wudata_04.edr
[16:57:39] Verified work/wudata_04.xtc
[16:57:40] Completed 64%
!!! 2 1/2 hours, no progress
--- Opening Log file [September 11 19:37:08 UTC]
[19:37:08]
[19:37:09] Loaded queue successfully.
[19:37:09] Initialization complete
[19:37:09]
[19:37:09] + Processing work unit
[19:37:09] Core required: FahCore_11.exe
[19:37:09] Core found.
[19:37:09] Working on queue slot 04 [September 11 19:37:09 UTC]
[19:37:09] + Working ...
[19:37:10]
[19:37:10] *------------------------------*
[19:37:10] Folding@Home GPU Core - Beta
[19:37:10] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[19:37:10]
[19:37:10] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:37:10] Build host: amoeba
[19:37:10] Board Type: AMD
[19:37:10] Core :
[19:37:10] Preparing to commence simulation
[19:37:10] - Ensuring status. Please wait.
[19:37:20] - Looking at optimizations...
[19:37:20] - Working with standard loops on this execution.
[19:37:20] - Previous termination of core was improper.
[19:37:20] - Files status OK
[19:37:20] - Expanded 96714 -> 489152 (decompressed 505.7 percent)
[19:37:20] Called DecompressByteArray: compressed_data_size=96714 data_size=489152, decompressed_data_size=489152 diff=0
[19:37:20] - Digital signature verified
[19:37:20]
[19:37:20] Project: 5736 (Run 3, Clone 515, Gen 119)
[19:37:20]
[19:37:33] Entering M.D.
[19:37:39] Will resume from checkpoint file
[19:37:39] Tpr hash work/wudata_04.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
[19:38:22] Working on Protein
[19:38:41] Client config found, loading data.
[19:38:46] Starting GUI Server
[19:45:22] Resuming from checkpoint
[19:45:22] fcCheckPointResume: retreived and current tpr file hash:
[19:45:22] 0 1445190852 1445190852
[19:45:22] 1 3527609112 3527609112
[19:45:22] 2 2623324236 2623324236
[19:45:22] 3 1655012693 1655012693
[19:45:22] 4 199698481 199698481
[19:45:28] Verified work/wudata_04.log
[19:45:28] Verified work/wudata_04.edr
[19:45:31] Verified work/wudata_04.xtc
[19:45:32] Completed 64%
[19:45:32] mdrun_gpu returned
[19:45:32] Calculated & specified T inconsisitent
[19:45:32]
[19:45:32] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:45:46] CoreStatus = 7A (122)
[19:45:46] Sending work to server
[19:45:46] Project: 5736 (Run 3, Clone 515, Gen 119)
[19:45:46] - Read packet limit of 540015616... Set to 524286976.
[19:45:46] + Attempting to send results [September 11 19:45:46 UTC]
[19:45:46] - Error: Could not read results file work/wuresults_04.dat from disk
[19:45:46] - Error: Could not read unit 04 file. Removing from queue.
[19:45:46] - Preparing to get new work unit...
[19:45:46] + Attempting to get work packet
[19:45:46] - Connecting to assignment server
[19:45:47] - Successful: assigned to (171.64.65.102).
[19:45:47] + News From Folding@Home: Welcome to Folding@Home
[19:45:47] Loaded queue successfully.
---snip---
[19:45:53] - Digital signature verified
[19:45:54]
[19:45:54] Project: 5736 (Run 3, Clone 515, Gen 119)
!!! Same WU
[19:45:54]
[19:46:02] Assembly optimizations on if available.
[19:46:02] Entering M.D.
[19:46:08] Tpr hash work/wudata_05.tpr: 1445190852 3527609112 2623324236 1655012693 199698481
[19:46:55] Working on Protein
[19:47:15] Client config found, loading data.
[19:47:21] Starting GUI Server
!!! Over 2 hours, no progress
Folding@Home Client Shutdown.
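One way to judge whether a WU is really stuck, rather than just slow, is to compare how long it has sat at its current percentage with its normal pace per 1% (about 5 minutes on this card). The Python sketch below does that from the "Completed N%" lines in a log like the one above. It is only an illustration under stated assumptions: the helper names `find_stall` and `to_seconds`, the regex `PROGRESS_RE`, and the `factor=3.0` threshold are all hypothetical choices, not anything the Folding@home client provides. It also assumes the log does not cross midnight and that `now` is read from the same clock as the log (UTC here).

```python
import re
from statistics import median

# Hypothetical pattern for progress lines such as "[13:25:35] Completed 63%".
PROGRESS_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+)%")


def to_seconds(h: str, m: str, s: str) -> int:
    """Seconds since midnight; assumes the log does not cross midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def find_stall(lines, now, factor=3.0):
    """Hypothetical helper: decide whether a WU looks stalled.

    lines  -- log lines (strings)
    now    -- current time as "HH:MM:SS", same clock as the log
    factor -- how many normal 1% intervals to wait before calling it a stall
              (3.0 is an arbitrary illustrative choice)

    Returns (percent, seconds_without_progress, pace_seconds, stalled).
    """
    points = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            points.append((to_seconds(*m.group(1, 2, 3)), int(m.group(4))))

    # Normal pace: median time between consecutive 1% steps. A percentage
    # repeated after a checkpoint resume is not a step, so it is skipped.
    steps = [t2 - t1 for (t1, p1), (t2, p2) in zip(points, points[1:]) if p2 == p1 + 1]
    if not steps:
        raise ValueError("need at least one consecutive 1% step to estimate the pace")
    pace = median(steps)

    # Measure from the first time the current percentage was reached, so
    # restarts that merely re-report it do not reset the clock.
    percent = points[-1][1]
    first_reached = min(t for t, p in points if p == percent)
    idle = to_seconds(*now.split(":")) - first_reached
    return percent, idle, pace, idle > factor * pace


if __name__ == "__main__":
    log = [
        "[08:48:19] Completed 15%",
        "[13:25:35] Completed 63%",
        "[13:30:13] Completed 64%",
        "[15:26:07] Completed 64%",  # re-reported after a restart, not new progress
    ]
    print(find_stall(log, "16:49:00"))  # (64, 11927, 278, True)
```

With these lines the pace is 278 seconds per 1%, and the WU has been at 64% for over three hours, roughly 40 times the normal interval, so a 3x threshold flags it clearly. This only shows that progress stopped; it does not by itself say whether the WU or the GPU is at fault.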