Project: 5766 (Run 0, Clone 49, Gen 406)
Posted: Thu Apr 16, 2009 4:43 pm
Hi,
Had an EUE problem with this unit: NANs were detected on the GPU within seconds on five consecutive attempts, then the client shut the core down and paused for 24 hours. Lucky I checked when I did! The log file is shown below.
Haven't seen an EUE in ages, so this is a bit out of the blue. I will delete the WU and carry on. What with Windows Update restarting the machine by itself this morning and now an EUE, today has not been a productive day!
[16:22:48] + Processing work unit
[16:22:48] Core required: FahCore_11.exe
[16:22:48] Core found.
[16:22:48] Working on queue slot 03 [April 16 16:22:48 UTC]
[16:22:48] + Working ...
[16:22:48] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 4496 -version 623'
[16:22:48]
[16:22:48] *------------------------------*
[16:22:48] Folding@Home GPU Core - Beta
[16:22:48] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[16:22:48]
[16:22:48] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:22:48] Build host: amoeba
[16:22:48] Board Type: Nvidia
[16:22:48] Core :
[16:22:48] Preparing to commence simulation
[16:22:48] - Looking at optimizations...
[16:22:48] - Created dyn
[16:22:48] - Files status OK
[16:22:48] - Expanded 46735 -> 252912 (decompressed 541.1 percent)
[16:22:48] Called DecompressByteArray: compressed_data_size=46735 data_size=252912, decompressed_data_size=252912 diff=0
[16:22:48] - Digital signature verified
[16:22:48]
[16:22:48] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:22:48]
[16:22:48] Assembly optimizations on if available.
[16:22:48] Entering M.D.
[16:22:54] Working on Protein
[16:22:55] Client config found, loading data.
[16:22:55] mdrun_gpu returned
[16:22:55] NANs detected on GPU
[16:22:55]
[16:22:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:22:58] CoreStatus = 7A (122)
[16:22:58] Sending work to server
[16:22:58] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:22:58] - Error: Could not get length of results file work/wuresults_03.dat
[16:22:58] - Error: Could not read unit 03 file. Removing from queue.
[16:22:58] Trying to send all finished work units
[16:22:58] + No unsent completed units remaining.
[16:22:58] - Preparing to get new work unit...
[16:22:58] + Attempting to get work packet
[16:22:58] - Will indicate memory of 4094 MB
[16:22:58] - Connecting to assignment server
[16:22:58] Connecting to http://assign-GPU.stanford.edu:8080/
[16:22:59] Posted data.
[16:22:59] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[16:22:59] + News From Folding@Home: GPU folding beta
[16:22:59] Loaded queue successfully.
[16:22:59] Connecting to http://171.67.108.11:8080/
[16:23:00] Posted data.
[16:23:00] Initial: 0000; - Receiving payload (expected size: 47247)
[16:23:00] Conversation time very short, giving reduced weight in bandwidth avg
[16:23:00] - Downloaded at ~92 kB/s
[16:23:00] - Averaged speed for that direction ~115 kB/s
[16:23:00] + Received work.
[16:23:00] Trying to send all finished work units
[16:23:00] + No unsent completed units remaining.
[16:23:00] + Closed connections
[16:23:05]
[16:23:05] + Processing work unit
[16:23:05] Core required: FahCore_11.exe
[16:23:05] Core found.
[16:23:05] Working on queue slot 04 [April 16 16:23:05 UTC]
[16:23:05] + Working ...
[16:23:05] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 4496 -version 623'
[16:23:05]
[16:23:05] *------------------------------*
[16:23:05] Folding@Home GPU Core - Beta
[16:23:05] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[16:23:05]
[16:23:05] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:23:05] Build host: amoeba
[16:23:05] Board Type: Nvidia
[16:23:05] Core :
[16:23:05] Preparing to commence simulation
[16:23:05] - Looking at optimizations...
[16:23:05] - Created dyn
[16:23:05] - Files status OK
[16:23:05] - Expanded 46735 -> 252912 (decompressed 541.1 percent)
[16:23:05] Called DecompressByteArray: compressed_data_size=46735 data_size=252912, decompressed_data_size=252912 diff=0
[16:23:05] - Digital signature verified
[16:23:05]
[16:23:05] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:05]
[16:23:05] Assembly optimizations on if available.
[16:23:05] Entering M.D.
[16:23:12] Working on Protein
[16:23:13] Client config found, loading data.
[16:23:13] mdrun_gpu returned
[16:23:13] NANs detected on GPU
[16:23:13]
[16:23:13] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:23:13] Starting GUI Server
[16:23:16] CoreStatus = 7A (122)
[16:23:16] Sending work to server
[16:23:16] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:16] - Error: Could not get length of results file work/wuresults_04.dat
[16:23:16] - Error: Could not read unit 04 file. Removing from queue.
[16:23:16] Trying to send all finished work units
[16:23:16] + No unsent completed units remaining.
[16:23:16] - Preparing to get new work unit...
[16:23:16] + Attempting to get work packet
[16:23:16] - Will indicate memory of 4094 MB
[16:23:16] - Connecting to assignment server
[16:23:16] Connecting to http://assign-GPU.stanford.edu:8080/
[16:23:17] Posted data.
[16:23:17] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[16:23:17] + News From Folding@Home: GPU folding beta
[16:23:17] Loaded queue successfully.
[16:23:17] Connecting to http://171.67.108.11:8080/
[16:23:18] Posted data.
[16:23:18] Initial: 0000; - Receiving payload (expected size: 47247)
[16:23:18] Conversation time very short, giving reduced weight in bandwidth avg
[16:23:18] - Downloaded at ~92 kB/s
[16:23:18] - Averaged speed for that direction ~112 kB/s
[16:23:18] + Received work.
[16:23:18] Trying to send all finished work units
[16:23:18] + No unsent completed units remaining.
[16:23:18] + Closed connections
[16:23:23]
[16:23:23] + Processing work unit
[16:23:23] Core required: FahCore_11.exe
[16:23:23] Core found.
[16:23:23] Working on queue slot 05 [April 16 16:23:23 UTC]
[16:23:23] + Working ...
[16:23:23] - Calling '.\FahCore_11.exe -dir work/ -suffix 05 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 4496 -version 623'
[16:23:24]
[16:23:24] *------------------------------*
[16:23:24] Folding@Home GPU Core - Beta
[16:23:24] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[16:23:24]
[16:23:24] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:23:24] Build host: amoeba
[16:23:24] Board Type: Nvidia
[16:23:24] Core :
[16:23:24] Preparing to commence simulation
[16:23:24] - Looking at optimizations...
[16:23:24] - Created dyn
[16:23:24] - Files status OK
[16:23:24] - Expanded 46735 -> 252912 (decompressed 541.1 percent)
[16:23:24] Called DecompressByteArray: compressed_data_size=46735 data_size=252912, decompressed_data_size=252912 diff=0
[16:23:24] - Digital signature verified
[16:23:24]
[16:23:24] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:24]
[16:23:24] Assembly optimizations on if available.
[16:23:24] Entering M.D.
[16:23:30] Working on Protein
[16:23:31] Client config found, loading data.
[16:23:31] mdrun_gpu returned
[16:23:31] NANs detected on GPU
[16:23:31]
[16:23:31] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:23:34] CoreStatus = 7A (122)
[16:23:34] Sending work to server
[16:23:34] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:34] - Error: Could not get length of results file work/wuresults_05.dat
[16:23:34] - Error: Could not read unit 05 file. Removing from queue.
[16:23:34] Trying to send all finished work units
[16:23:34] + No unsent completed units remaining.
[16:23:34] - Preparing to get new work unit...
[16:23:34] + Attempting to get work packet
[16:23:34] - Will indicate memory of 4094 MB
[16:23:34] - Connecting to assignment server
[16:23:34] Connecting to http://assign-GPU.stanford.edu:8080/
[16:23:34] Posted data.
[16:23:34] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[16:23:34] + News From Folding@Home: GPU folding beta
[16:23:35] Loaded queue successfully.
[16:23:35] Connecting to http://171.67.108.11:8080/
[16:23:36] Posted data.
[16:23:36] Initial: 0000; - Receiving payload (expected size: 47247)
[16:23:36] Conversation time very short, giving reduced weight in bandwidth avg
[16:23:36] - Downloaded at ~92 kB/s
[16:23:36] - Averaged speed for that direction ~110 kB/s
[16:23:36] + Received work.
[16:23:36] Trying to send all finished work units
[16:23:36] + No unsent completed units remaining.
[16:23:36] + Closed connections
[16:23:41]
[16:23:41] + Processing work unit
[16:23:41] Core required: FahCore_11.exe
[16:23:41] Core found.
[16:23:41] Working on queue slot 06 [April 16 16:23:41 UTC]
[16:23:41] + Working ...
[16:23:41] - Calling '.\FahCore_11.exe -dir work/ -suffix 06 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 4496 -version 623'
[16:23:41]
[16:23:41] *------------------------------*
[16:23:41] Folding@Home GPU Core - Beta
[16:23:41] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[16:23:41]
[16:23:41] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:23:41] Build host: amoeba
[16:23:41] Board Type: Nvidia
[16:23:41] Core :
[16:23:41] Preparing to commence simulation
[16:23:41] - Looking at optimizations...
[16:23:41] - Created dyn
[16:23:41] - Files status OK
[16:23:41] - Expanded 46735 -> 252912 (decompressed 541.1 percent)
[16:23:41] Called DecompressByteArray: compressed_data_size=46735 data_size=252912, decompressed_data_size=252912 diff=0
[16:23:41] - Digital signature verified
[16:23:41]
[16:23:41] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:41]
[16:23:41] Assembly optimizations on if available.
[16:23:41] Entering M.D.
[16:23:48] Working on Protein
[16:23:49] Client config found, loading data.
[16:23:49] mdrun_gpu returned
[16:23:49] NANs detected on GPU
[16:23:49]
[16:23:49] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:23:49] Starting GUI Server
[16:23:51] CoreStatus = 7A (122)
[16:23:51] Sending work to server
[16:23:51] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:23:51] - Error: Could not get length of results file work/wuresults_06.dat
[16:23:51] - Error: Could not read unit 06 file. Removing from queue.
[16:23:51] Trying to send all finished work units
[16:23:51] + No unsent completed units remaining.
[16:23:51] - Preparing to get new work unit...
[16:23:51] + Attempting to get work packet
[16:23:51] - Will indicate memory of 4094 MB
[16:23:51] - Connecting to assignment server
[16:23:51] Connecting to http://assign-GPU.stanford.edu:8080/
[16:23:52] Posted data.
[16:23:52] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[16:23:52] + News From Folding@Home: GPU folding beta
[16:23:53] Loaded queue successfully.
[16:23:53] Connecting to http://171.67.108.11:8080/
[16:23:54] Posted data.
[16:23:54] Initial: 0000; - Receiving payload (expected size: 47247)
[16:23:54] Conversation time very short, giving reduced weight in bandwidth avg
[16:23:54] - Downloaded at ~92 kB/s
[16:23:54] - Averaged speed for that direction ~108 kB/s
[16:23:54] + Received work.
[16:23:54] Trying to send all finished work units
[16:23:54] + No unsent completed units remaining.
[16:23:54] + Closed connections
[16:23:59]
[16:23:59] + Processing work unit
[16:23:59] Core required: FahCore_11.exe
[16:23:59] Core found.
[16:23:59] Working on queue slot 07 [April 16 16:23:59 UTC]
[16:23:59] + Working ...
[16:23:59] - Calling '.\FahCore_11.exe -dir work/ -suffix 07 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 4496 -version 623'
[16:23:59]
[16:23:59] *------------------------------*
[16:23:59] Folding@Home GPU Core - Beta
[16:23:59] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[16:23:59]
[16:23:59] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:23:59] Build host: amoeba
[16:23:59] Board Type: Nvidia
[16:23:59] Core :
[16:23:59] Preparing to commence simulation
[16:23:59] - Looking at optimizations...
[16:23:59] - Created dyn
[16:23:59] - Files status OK
[16:24:00] - Expanded 46735 -> 252912 (decompressed 541.1 percent)
[16:24:00] Called DecompressByteArray: compressed_data_size=46735 data_size=252912, decompressed_data_size=252912 diff=0
[16:24:00] - Digital signature verified
[16:24:00]
[16:24:00] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:24:00]
[16:24:00] Assembly optimizations on if available.
[16:24:00] Entering M.D.
[16:24:06] Working on Protein
[16:24:07] Client config found, loading data.
[16:24:07] mdrun_gpu returned
[16:24:07] NANs detected on GPU
[16:24:07]
[16:24:07] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:24:07] Starting GUI Server
[16:24:09] CoreStatus = 7A (122)
[16:24:09] Sending work to server
[16:24:09] Project: 5766 (Run 0, Clone 49, Gen 406)
[16:24:09] - Error: Could not get length of results file work/wuresults_07.dat
[16:24:09] - Error: Could not read unit 07 file. Removing from queue.
[16:24:09] EUE limit exceeded. Pausing 24 hours.
[16:38:25] ***** Got a SIGTERM signal (2)
[16:38:25] Killing all core threads
Folding@Home Client Shutdown.
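In case it helps anyone checking their own log for the same pattern (repeated UNSTABLE_MACHINE shutdowns on one WU followed by the 24-hour pause), here is a rough Python sketch. It assumes the log lines look like the ones above; the function names are just made up for illustration and none of this is part of the client.

```python
import re
from collections import Counter

# Assumed line shapes, copied from the log above:
#   [16:22:48] Project: 5766 (Run 0, Clone 49, Gen 406)
#   [16:22:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def count_unstable_shutdowns(lines):
    """Count UNSTABLE_MACHINE shutdowns per (project, run, clone, gen)."""
    counts = Counter()
    current = None  # most recent work unit seen in the log
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "Core Shutdown: UNSTABLE_MACHINE" in line and current is not None:
            counts[current] += 1
    return counts


def hit_eue_pause(lines):
    """Return True if the client reported the EUE limit pause."""
    return any("EUE limit exceeded" in line for line in lines)


# Small sample built from lines in the log above.
sample = [
    "[16:24:00] Project: 5766 (Run 0, Clone 49, Gen 406)",
    "[16:24:07] NANs detected on GPU",
    "[16:24:07] Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "[16:24:09] EUE limit exceeded. Pausing 24 hours.",
]
print(count_unstable_shutdowns(sample))
print(hit_eue_pause(sample))
```

To run it on a real log, read the file into a list of lines and pass that in. If the same project/run/clone/gen shows up with several shutdowns in a row, the WU is the likely suspect rather than the card, though that is only a rule of thumb.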
Cheers,
Chris
EDIT: Forgot to mention system specs. GTX 280 with a 50 MHz overclock on the shaders (never been a problem), Q6600 @ 3.3 GHz, Vista 64, 181.20 CUDA driver