Project: 5760 (Run 2, Clone 423, Gen 1) is EVIL! EVIL! I tell you!!!
This WU blows up within a second of starting! It happens to me every time with this one WU, and then the client tosses me into the 24-hour pause... Every other WU runs fine: this very same GPU just finished Project: 5911 (Run 4, Clone 490, Gen 9), 1888 points, in 13 hours of solid computing with no errors.
Yes, I overclock, but that's not the problem. I'm fairly certain I saw this WU do the same thing on another of my GPUs. Besides, in the last few months of running 6 GPUs 24x7, this is the only WU failure I've had.
Environment:
NVIDIA GeForce 9600 GSO, 768 MB
CUDA Driver: NVIDIA driver for Windows 7 (64-bit) 185.85, the CUDA 2.2 driver from the NVIDIA CUDA Zone downloads, a.k.a. GeForce video 185.85 (8.15.11.8585)
Windows 7
[03:17:02] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:17:02]
[03:17:02] Assembly optimizations on if available.
[03:17:02] Entering M.D.
[03:17:09] Working on Protein
[03:17:11] Client config found, loading data.
[03:17:11] Starting GUI Server
[03:17:11] mdrun_gpu returned
[03:17:11] NANs detected on GPU
[03:17:11]
[03:17:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:17:15] CoreStatus = 7A (122)
[03:17:15] Sending work to server
[03:17:15] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:17:15] - Read packet limit of 540015616... Set to 524286976.
[03:17:15] - Error: Could not get length of results file work/wuresults_06.dat
[03:17:15] - Error: Could not read unit 06 file. Removing from queue.
[03:17:15] EUE limit exceeded. Pausing 24 hours.
Folding@Home Client Shutdown.
[13:45:51] + Attempting to get work packet
[13:45:51] - Connecting to assignment server
[13:45:51] - Successful: assigned to (171.64.65.20).
[13:45:51] + News From Folding@Home: Welcome to Folding@Home
[13:45:51] Loaded queue successfully.
[13:45:52] + Closed connections
[13:45:52]
[13:45:52] + Processing work unit
[13:45:52] Core required: FahCore_14.exe
[13:45:52] Core found.
[13:45:52] Working on queue slot 01 [June 30 13:45:52 UTC]
[13:45:52] + Working ...
[13:45:52]
[13:45:52] *------------------------------*
[13:45:52] Folding@Home GPU Core - Beta
[13:45:52] Version 1.25 (Mon Mar 2 19:49:32 PST 2009)
[13:45:52]
[13:45:52] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[13:45:52] Build host: vspm46
[13:45:52] Board Type: Nvidia
[13:45:52] Core :
[13:45:52] Preparing to commence simulation
[13:45:52] - Looking at optimizations...
[13:45:52] - Created dyn
[13:45:52] - Files status OK
[13:45:52] - Expanded 68552 -> 357580 (decompressed 521.6 percent)
[13:45:52] Called DecompressByteArray: compressed_data_size=68552 data_size=357580, decompressed_data_size=357580 diff=0
[13:45:52] - Digital signature verified
[13:45:52]
[13:45:52] Project: 5911 (Run 4, Clone 490, Gen 9)
[13:45:52]
[13:45:52] Assembly optimizations on if available.
[13:45:52] Entering M.D.
[13:45:58] Tpr hash work/wudata_01.tpr: 3482127431 3117879785 3848815526 733803127 1289464858
[13:45:58] Working on Protein
[13:45:59] Client config found, loading data.
[13:46:00] Starting GUI Server
[13:50:08] + Working...
[13:54:07] Completed 1%
... snip ...
[03:07:11] Completed 99%
[03:15:22] Completed 100%
[03:15:23] Successful run
[03:15:23] DynamicWrapper: Finished Work Unit: sleep=10000
[03:15:33] Reserved 33696 bytes for xtc file; Cosm status=0
[03:15:33] Allocated 33696 bytes for xtc file
[03:15:33] - Reading up to 33696 from "work/wudata_01.xtc": Read 33696
[03:15:33] Read 33696 bytes from xtc file; available packet space=786396768
[03:15:33] xtc file hash check passed.
[03:15:33] Reserved 23472 23472 786396768 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[03:15:33] Allocated 23472 bytes for arc file
[03:15:33] - Reading up to 23472 from "work/wudata_01.trr": Read 23472
[03:15:33] Read 23472 bytes from arc file; available packet space=786373296
[03:15:33] trr file hash check passed.
[03:15:33] Allocated 560 bytes for edr file
[03:15:33] Read bedfile
[03:15:33] edr file hash check passed.
[03:15:33] Allocated 0 bytes for logfile
[03:15:33] Could not open/read logfile=<work/wudata_01.log>; Cosm status=-1
[03:15:33] GuardedRun: success in DynamicWrapper
[03:15:33] GuardedRun: done
[03:15:33] Run: GuardedRun completed.
[03:15:36] - Writing 58240 bytes of core data to disk...
[03:15:36] Done: 57728 -> 57506 (compressed to 99.6 percent)
[03:15:36] ... Done.
[03:15:36] - Shutting down core
[03:15:36]
[03:15:36] Folding@home Core Shutdown: FINISHED_UNIT
[03:15:40] CoreStatus = 64 (100)
[03:15:40] Sending work to server
[03:15:40] Project: 5911 (Run 4, Clone 490, Gen 9)
[03:15:40] - Read packet limit of 540015616... Set to 524286976.
[03:15:40] + Attempting to send results [July 1 03:15:40 UTC]
[03:15:40] + Results successfully sent
[03:15:40] Thank you for your contribution to Folding@Home.
[03:15:40] + Number of Units Completed: 98
[03:15:44] - Preparing to get new work unit...
[03:15:44] + Attempting to get work packet
[03:15:44] - Connecting to assignment server
[03:15:44] - Successful: assigned to (171.64.65.106).
[03:15:44] + News From Folding@Home: Welcome to Folding@Home
[03:15:44] Loaded queue successfully.
[03:15:45] + Closed connections
[03:15:45]
[03:15:45] + Processing work unit
[03:15:45] Core required: FahCore_11.exe
[03:15:45] Core found.
[03:15:45] Working on queue slot 02 [July 1 03:15:45 UTC]
[03:15:45] + Working ...
[03:15:46]
[03:15:46] *------------------------------*
[03:15:46] Folding@Home GPU Core - Beta
[03:15:46] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[03:15:46]
[03:15:46] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:15:46] Build host: amoeba
[03:15:46] Board Type: Nvidia
[03:15:46] Core :
[03:15:46] Preparing to commence simulation
[03:15:46] - Looking at optimizations...
[03:15:46] - Created dyn
[03:15:46] - Files status OK
[03:15:46] - Expanded 68587 -> 357580 (decompressed 521.3 percent)
[03:15:46] Called DecompressByteArray: compressed_data_size=68587 data_size=357580, decompressed_data_size=357580 diff=0
[03:15:46] - Digital signature verified
[03:15:46]
[03:15:46] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:15:46]
[03:15:46] Assembly optimizations on if available.
[03:15:46] Entering M.D.
[03:15:52] Working on Protein
[03:15:54] Client config found, loading data.
[03:15:54] Starting GUI Server
[03:15:54] mdrun_gpu returned
[03:15:54] NANs detected on GPU
[03:15:54]
[03:15:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:15:58] CoreStatus = 7A (122)
[03:15:58] Sending work to server
[03:15:58] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:15:58] - Read packet limit of 540015616... Set to 524286976.
[03:15:58] - Error: Could not get length of results file work/wuresults_02.dat
[03:15:58] - Error: Could not read unit 02 file. Removing from queue.
[03:15:58] - Preparing to get new work unit...
[03:15:58] + Attempting to get work packet
[03:15:58] - Connecting to assignment server
[03:15:58] - Successful: assigned to (171.64.65.106).
[03:15:58] + News From Folding@Home: Welcome to Folding@Home
[03:15:58] Loaded queue successfully.
[03:15:59] + Closed connections
[03:16:04]
[03:16:04] + Processing work unit
[03:16:04] Core required: FahCore_11.exe
[03:16:04] Core found.
[03:16:04] Working on queue slot 03 [July 1 03:16:04 UTC]
[03:16:04] + Working ...
[03:16:04]
[03:16:04] *------------------------------*
[03:16:04] Folding@Home GPU Core - Beta
[03:16:04] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[03:16:04]
[03:16:04] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:16:04] Build host: amoeba
[03:16:04] Board Type: Nvidia
[03:16:04] Core :
[03:16:04] Preparing to commence simulation
[03:16:04] - Looking at optimizations...
[03:16:04] - Created dyn
[03:16:04] - Files status OK
[03:16:04] - Expanded 68587 -> 357580 (decompressed 521.3 percent)
[03:16:04] Called DecompressByteArray: compressed_data_size=68587 data_size=357580, decompressed_data_size=357580 diff=0
[03:16:04] - Digital signature verified
[03:16:04]
[03:16:04] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:04]
[03:16:04] Assembly optimizations on if available.
[03:16:04] Entering M.D.
[03:16:11] Working on Protein
[03:16:13] Client config found, loading data.
[03:16:13] mdrun_gpu returned
[03:16:13] NANs detected on GPU
[03:16:13]
[03:16:13] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:16:17] CoreStatus = 7A (122)
[03:16:17] Sending work to server
[03:16:17] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:17] - Read packet limit of 540015616... Set to 524286976.
[03:16:17] - Error: Could not get length of results file work/wuresults_03.dat
[03:16:17] - Error: Could not read unit 03 file. Removing from queue.
[03:16:17] - Preparing to get new work unit...
[03:16:17] + Attempting to get work packet
[03:16:17] - Connecting to assignment server
[03:16:20] - Successful: assigned to (171.64.65.106).
[03:16:20] + News From Folding@Home: Welcome to Folding@Home
[03:16:20] Loaded queue successfully.
[03:16:20] + Closed connections
[03:16:25]
[03:16:25] + Processing work unit
[03:16:25] Core required: FahCore_11.exe
[03:16:25] Core found.
[03:16:25] Working on queue slot 04 [July 1 03:16:25 UTC]
[03:16:25] + Working ...
[03:16:25]
[03:16:25] *------------------------------*
[03:16:25] Folding@Home GPU Core - Beta
[03:16:25] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[03:16:25]
[03:16:25] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:16:25] Build host: amoeba
[03:16:25] Board Type: Nvidia
[03:16:25] Core :
[03:16:25] Preparing to commence simulation
[03:16:25] - Looking at optimizations...
[03:16:25] - Created dyn
[03:16:25] - Files status OK
[03:16:25] - Expanded 68587 -> 357580 (decompressed 521.3 percent)
[03:16:25] Called DecompressByteArray: compressed_data_size=68587 data_size=357580, decompressed_data_size=357580 diff=0
[03:16:25] - Digital signature verified
[03:16:25]
[03:16:25] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:25]
[03:16:25] Assembly optimizations on if available.
[03:16:25] Entering M.D.
[03:16:32] Working on Protein
[03:16:34] Client config found, loading data.
[03:16:34] Starting GUI Server
[03:16:34] mdrun_gpu returned
[03:16:34] NANs detected on GPU
[03:16:34]
[03:16:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:16:38] CoreStatus = 7A (122)
[03:16:38] Sending work to server
[03:16:38] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:38] - Read packet limit of 540015616... Set to 524286976.
[03:16:38] - Error: Could not get length of results file work/wuresults_04.dat
[03:16:38] - Error: Could not read unit 04 file. Removing from queue.
[03:16:38] - Preparing to get new work unit...
[03:16:38] + Attempting to get work packet
[03:16:38] - Connecting to assignment server
[03:16:38] - Successful: assigned to (171.64.65.106).
[03:16:38] + News From Folding@Home: Welcome to Folding@Home
[03:16:38] Loaded queue successfully.
[03:16:39] + Closed connections
[03:16:44]
[03:16:44] + Processing work unit
[03:16:44] Core required: FahCore_11.exe
[03:16:44] Core found.
[03:16:44] Working on queue slot 05 [July 1 03:16:44 UTC]
[03:16:44] + Working ...
[03:16:44]
[03:16:44] *------------------------------*
[03:16:44] Folding@Home GPU Core - Beta
[03:16:44] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[03:16:44]
[03:16:44] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:16:44] Build host: amoeba
[03:16:44] Board Type: Nvidia
[03:16:44] Core :
[03:16:44] Preparing to commence simulation
[03:16:44] - Looking at optimizations...
[03:16:44] - Created dyn
[03:16:44] - Files status OK
[03:16:44] - Expanded 68587 -> 357580 (decompressed 521.3 percent)
[03:16:44] Called DecompressByteArray: compressed_data_size=68587 data_size=357580, decompressed_data_size=357580 diff=0
[03:16:44] - Digital signature verified
[03:16:44]
[03:16:44] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:44]
[03:16:44] Assembly optimizations on if available.
[03:16:44] Entering M.D.
[03:16:50] Working on Protein
[03:16:52] Client config found, loading data.
[03:16:52] Starting GUI Server
[03:16:52] mdrun_gpu returned
[03:16:52] NANs detected on GPU
[03:16:52]
[03:16:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:16:56] CoreStatus = 7A (122)
[03:16:56] Sending work to server
[03:16:56] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:16:56] - Read packet limit of 540015616... Set to 524286976.
[03:16:56] - Error: Could not get length of results file work/wuresults_05.dat
[03:16:56] - Error: Could not read unit 05 file. Removing from queue.
[03:16:56] - Preparing to get new work unit...
[03:16:56] + Attempting to get work packet
[03:16:56] - Connecting to assignment server
[03:16:57] - Successful: assigned to (171.64.65.106).
[03:16:57] + News From Folding@Home: Welcome to Folding@Home
[03:16:57] Loaded queue successfully.
[03:16:57] + Closed connections
[03:17:02]
[03:17:02] + Processing work unit
[03:17:02] Core required: FahCore_11.exe
[03:17:02] Core found.
[03:17:02] Working on queue slot 06 [July 1 03:17:02 UTC]
[03:17:02] + Working ...
[03:17:02]
[03:17:02] *------------------------------*
[03:17:02] Folding@Home GPU Core - Beta
[03:17:02] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[03:17:02]
[03:17:02] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:17:02] Build host: amoeba
[03:17:02] Board Type: Nvidia
[03:17:02] Core :
[03:17:02] Preparing to commence simulation
[03:17:02] - Looking at optimizations...
[03:17:02] - Created dyn
[03:17:02] - Files status OK
[03:17:02] - Expanded 68587 -> 357580 (decompressed 521.3 percent)
[03:17:02] Called DecompressByteArray: compressed_data_size=68587 data_size=357580, decompressed_data_size=357580 diff=0
[03:17:02] - Digital signature verified
[03:17:02]
[03:17:02] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:17:02]
[03:17:02] Assembly optimizations on if available.
[03:17:02] Entering M.D.
[03:17:09] Working on Protein
[03:17:11] Client config found, loading data.
[03:17:11] Starting GUI Server
[03:17:11] mdrun_gpu returned
[03:17:11] NANs detected on GPU
[03:17:11]
[03:17:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:17:15] CoreStatus = 7A (122)
[03:17:15] Sending work to server
[03:17:15] Project: 5760 (Run 2, Clone 423, Gen 1)
[03:17:15] - Read packet limit of 540015616... Set to 524286976.
[03:17:15] - Error: Could not get length of results file work/wuresults_06.dat
[03:17:15] - Error: Could not read unit 06 file. Removing from queue.
[03:17:15] EUE limit exceeded. Pausing 24 hours.
Folding@Home Client Shutdown.
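The claim that only this one WU fails, while the same card finishes others, can be checked mechanically against the log. Below is a minimal Python sketch (not part of the F@H client) that tallies the `Core Shutdown:` status per work unit. It assumes the log format shown above, where a `Project:` line always appears before the matching shutdown line; `tally_shutdowns` and the regexes are hypothetical helpers written for illustration.

```python
import re
import sys
from collections import Counter, defaultdict

# Matches lines such as "[03:16:44] Project: 5760 (Run 2, Clone 423, Gen 1)"
PROJECT_RE = re.compile(r"\]\s+Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Matches lines such as "[03:16:52] Folding@home Core Shutdown: UNSTABLE_MACHINE"
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")


def tally_shutdowns(lines):
    """Count core shutdown statuses per work unit (project, run, clone, gen).

    Assumption: each shutdown line belongs to the most recent "Project:" line.
    """
    counts = defaultdict(Counter)
    current = None
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            # Remember which work unit the core is currently on.
            current = tuple(int(g) for g in m.groups())
            continue
        m = SHUTDOWN_RE.search(line)
        if m and current is not None:
            counts[current][m.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally_shutdowns.py <path-to-saved-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for wu, statuses in tally_shutdowns(f).items():
            print(wu, dict(statuses))
```

Fed the full log above (from 13:45:51 onward), it should report one `FINISHED_UNIT` for 5911 (Run 4, Clone 490, Gen 9) and five `UNSTABLE_MACHINE` shutdowns for 5760 (Run 2, Clone 423, Gen 1), i.e. the same WU reassigned five times in a row just before the "EUE limit exceeded" pause.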