Project: 6604 (Run 8, Clone 395, Gen 42) - Multiple EUE

Moderators: Site Moderators, FAHC Science Team

Senture
Posts: 8
Joined: Fri Sep 12, 2008 12:38 am

Post by Senture

Just had this WU EUE several times, each time at a different point.

Code:

[10:39:05] + Attempting to send results [April 28 10:39:05 UTC]
[10:39:08] + Results successfully sent
[10:39:08] Thank you for your contribution to Folding@Home.
[10:39:08] + Number of Units Completed: 50

[10:39:12] - Preparing to get new work unit...
[10:39:12] + Attempting to get work packet
[10:39:12] - Connecting to assignment server
[10:39:13] - Successful: assigned to (171.64.65.61).
[10:39:13] + News From Folding@Home: Welcome to Folding@Home
[10:39:13] Loaded queue successfully.
[10:39:15] + Closed connections
[10:39:15] 
[10:39:15] + Processing work unit
[10:39:15] Core required: FahCore_11.exe
[10:39:15] Core found.
[10:39:15] Working on queue slot 02 [April 28 10:39:15 UTC]
[10:39:15] + Working ...
[10:39:15] 
[10:39:15] *------------------------------*
[10:39:15] Folding@Home GPU Core
[10:39:15] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[10:39:15] 
[10:39:15] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[10:39:15] Build host: amoeba
[10:39:15] Board Type: Nvidia
[10:39:15] Core      : 
[10:39:15] Preparing to commence simulation
[10:39:15] - Looking at optimizations...
[10:39:15] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[10:39:15] - Created dyn
[10:39:15] - Files status OK
[10:39:15] - Expanded 73689 -> 383588 (decompressed 520.5 percent)
[10:39:15] Called DecompressByteArray: compressed_data_size=73689 data_size=383588, decompressed_data_size=383588 diff=0
[10:39:15] - Digital signature verified
[10:39:15] 
[10:39:15] Project: 6604 (Run 8, Clone 395, Gen 42)
[10:39:15] 
[10:39:15] Assembly optimizations on if available.
[10:39:15] Entering M.D.
[10:39:21] Tpr hash work/wudata_02.tpr:  2997436786 4124991434 1373664617 2016349556 1061161968
[10:39:21] 
[10:39:21] Calling fah_main args: 14 usage=100
[10:39:21] 
[10:39:22] Working on Protein
[10:39:25] Client config found, loading data.
[10:39:25] Starting GUI Server
[10:40:38] Completed 1%
[10:41:51] Completed 2%
[10:43:04] Completed 3%
[10:44:17] Completed 4%
[10:45:30] Completed 5%
[10:46:43] Completed 6%
[10:47:55] Completed 7%
[10:49:08] Completed 8%
[10:50:21] Completed 9%
[10:51:34] Completed 10%
[10:52:47] Completed 11%
[10:54:00] Completed 12%
[10:55:13] Completed 13%
[10:56:26] Completed 14%
[10:57:38] Completed 15%
[10:58:51] Completed 16%
[11:00:04] Completed 17%
[11:01:17] Completed 18%
[11:02:31] Completed 19%
[11:03:43] Completed 20%
[11:04:56] Completed 21%
[11:06:09] Completed 22%
[11:07:22] Completed 23%
[11:08:35] Completed 24%
[11:09:47] Completed 25%
[11:11:00] Completed 26%
[11:12:14] Completed 27%
[11:13:27] Completed 28%
[11:14:40] Completed 29%
[11:15:53] Completed 30%
[11:17:05] Completed 31%
[11:18:18] Completed 32%
[11:19:31] Completed 33%
[11:20:43] Completed 34%
[11:21:56] Completed 35%
[11:23:10] Completed 36%
[11:24:23] Completed 37%
[11:25:36] Completed 38%
[11:26:40] Completed 39%
[11:26:41] mdrun_gpu returned 
[11:26:41] NANs detected on GPU
[11:26:41] 
[11:26:41] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:26:43] CoreStatus = 7A (122)
[11:26:43] Sending work to server
[11:26:43] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:26:43] - Read packet limit of 540015616... Set to 524286976.
[11:26:43] - Error: Could not get length of results file work/wuresults_02.dat
[11:26:43] - Error: Could not read unit 02 file. Removing from queue.
[11:26:43] - Preparing to get new work unit...
[11:26:43] + Attempting to get work packet
[11:26:43] - Connecting to assignment server
[11:26:45] - Successful: assigned to (171.64.65.61).
[11:26:45] + News From Folding@Home: Welcome to Folding@Home
[11:26:45] Loaded queue successfully.
[11:26:46] + Closed connections
[11:26:51] 
[11:26:51] + Processing work unit
[11:26:51] Core required: FahCore_11.exe
[11:26:51] Core found.
[11:26:51] Working on queue slot 03 [April 28 11:26:51 UTC]
[11:26:51] + Working ...
[11:26:52] 
[11:26:52] *------------------------------*
[11:26:52] Folding@Home GPU Core
[11:26:52] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:26:52] 
[11:26:52] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:26:52] Build host: amoeba
[11:26:52] Board Type: Nvidia
[11:26:52] Core      : 
[11:26:52] Preparing to commence simulation
[11:26:52] - Looking at optimizations...
[11:26:52] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[11:26:52] - Created dyn
[11:26:52] - Files status OK
[11:26:52] - Expanded 73689 -> 383588 (decompressed 520.5 percent)
[11:26:52] Called DecompressByteArray: compressed_data_size=73689 data_size=383588, decompressed_data_size=383588 diff=0
[11:26:52] - Digital signature verified
[11:26:52] 
[11:26:52] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:26:52] 
[11:26:52] Assembly optimizations on if available.
[11:26:52] Entering M.D.
[11:26:58] Tpr hash work/wudata_03.tpr:  2997436786 4124991434 1373664617 2016349556 1061161968
[11:26:58] 
[11:26:58] Calling fah_main args: 14 usage=100
[11:26:58] 
[11:26:58] Working on Protein
[11:27:00] Client config found, loading data.
[11:27:00] Starting GUI Server
[11:28:13] Completed 1%
[11:29:25] Completed 2%
[11:29:25] mdrun_gpu returned 
[11:29:25] NANs detected on GPU
[11:29:25] 
[11:29:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:29:29] CoreStatus = 7A (122)
[11:29:29] Sending work to server
[11:29:29] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:29:29] - Read packet limit of 540015616... Set to 524286976.
[11:29:29] - Error: Could not get length of results file work/wuresults_03.dat
[11:29:29] - Error: Could not read unit 03 file. Removing from queue.
[11:29:29] - Preparing to get new work unit...
[11:29:29] + Attempting to get work packet
[11:29:29] - Connecting to assignment server
[11:29:29] - Successful: assigned to (171.64.65.61).
[11:29:29] + News From Folding@Home: Welcome to Folding@Home
[11:29:30] Loaded queue successfully.
[11:29:31] + Closed connections
[11:29:36] 
[11:29:36] + Processing work unit
[11:29:36] Core required: FahCore_11.exe
[11:29:36] Core found.
[11:29:36] Working on queue slot 04 [April 28 11:29:36 UTC]
[11:29:36] + Working ...
[11:29:36] 
[11:29:36] *------------------------------*
[11:29:36] Folding@Home GPU Core
[11:29:36] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:29:36] 
[11:29:36] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:29:36] Build host: amoeba
[11:29:36] Board Type: Nvidia
[11:29:36] Core      : 
[11:29:36] Preparing to commence simulation
[11:29:36] - Looking at optimizations...
[11:29:36] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[11:29:36] - Created dyn
[11:29:36] - Files status OK
[11:29:36] - Expanded 73689 -> 383588 (decompressed 520.5 percent)
[11:29:36] Called DecompressByteArray: compressed_data_size=73689 data_size=383588, decompressed_data_size=383588 diff=0
[11:29:36] - Digital signature verified
[11:29:36] 
[11:29:36] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:29:36] 
[11:29:36] Assembly optimizations on if available.
[11:29:36] Entering M.D.
[11:29:43] Tpr hash work/wudata_04.tpr:  2997436786 4124991434 1373664617 2016349556 1061161968
[11:29:43] 
[11:29:43] Calling fah_main args: 14 usage=100
[11:29:43] 
[11:29:43] Working on Protein
[11:29:45] Client config found, loading data.
[11:29:45] Starting GUI Server
[11:30:58] Completed 1%
[11:32:11] Completed 2%
[11:33:23] Completed 3%
[11:34:36] Completed 4%
[11:35:49] Completed 5%
[11:37:02] Completed 6%
[11:38:15] Completed 7%
[11:39:28] Completed 8%
[11:40:35] Completed 9%
[11:40:35] mdrun_gpu returned 
[11:40:35] NANs detected on GPU
[11:40:35] 
[11:40:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:40:38] CoreStatus = 7A (122)
[11:40:38] Sending work to server
[11:40:38] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:40:38] - Read packet limit of 540015616... Set to 524286976.
[11:40:38] - Error: Could not get length of results file work/wuresults_04.dat
[11:40:38] - Error: Could not read unit 04 file. Removing from queue.
[11:40:38] - Preparing to get new work unit...
[11:40:38] + Attempting to get work packet
[11:40:38] - Connecting to assignment server
[11:40:39] - Successful: assigned to (171.64.65.61).
[11:40:39] + News From Folding@Home: Welcome to Folding@Home
[11:40:39] Loaded queue successfully.
[11:40:40] + Closed connections
[11:40:45] 
[11:40:45] + Processing work unit
[11:40:45] Core required: FahCore_11.exe
[11:40:45] Core found.
[11:40:45] Working on queue slot 05 [April 28 11:40:45 UTC]
[11:40:45] + Working ...
[11:40:46] 
[11:40:46] *------------------------------*
[11:40:46] Folding@Home GPU Core
[11:40:46] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:40:46] 
[11:40:46] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:40:46] Build host: amoeba
[11:40:46] Board Type: Nvidia
[11:40:46] Core      : 
[11:40:46] Preparing to commence simulation
[11:40:46] - Looking at optimizations...
[11:40:46] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[11:40:46] - Created dyn
[11:40:46] - Files status OK
[11:40:46] - Expanded 73689 -> 383588 (decompressed 520.5 percent)
[11:40:46] Called DecompressByteArray: compressed_data_size=73689 data_size=383588, decompressed_data_size=383588 diff=0
[11:40:46] - Digital signature verified
[11:40:46] 
[11:40:46] Project: 6604 (Run 8, Clone 395, Gen 42)
[11:40:46] 
[11:40:46] Assembly optimizations on if available.
[11:40:46] Entering M.D.
[11:40:52] Tpr hash work/wudata_05.tpr:  2997436786 4124991434 1373664617 2016349556 1061161968
[11:40:52] 
[11:40:52] Calling fah_main args: 14 usage=100
[11:40:52] 
[11:40:52] Working on Protein
[11:40:54] Client config found, loading data.
[11:40:54] Starting GUI Server
[11:42:07] Completed 1%
[11:43:20] Completed 2%
[11:44:33] Completed 3%
[11:45:46] Completed 4%
[11:46:58] Completed 5%
[11:48:11] Completed 6%
[11:49:24] Completed 7%
[11:50:36] Completed 8%
[11:51:50] Completed 9%
[11:53:03] Completed 10%
[11:54:19] Completed 11%
[11:55:31] Completed 12%
[11:56:44] Completed 13%
[11:57:57] Completed 14%
[11:59:09] Completed 15%
[12:00:22] Completed 16%
[12:01:35] Completed 17%
[12:02:48] Completed 18%
[12:04:01] Completed 19%
[12:05:14] Completed 20%
[12:06:26] Completed 21%
[12:07:39] Completed 22%
[12:08:52] Completed 23%
[12:10:05] Completed 24%
[12:11:17] Completed 25%
[12:12:31] Completed 26%
[12:13:44] Completed 27%
[12:14:57] Completed 28%
[12:16:10] Completed 29%
[12:17:22] Completed 30%
[12:18:35] Completed 31%
[12:19:48] Completed 32%
[12:21:01] Completed 33%
[12:22:14] Completed 34%
[12:23:27] Completed 35%
[12:24:42] Completed 36%
[12:25:55] Completed 37%
[12:27:08] Completed 38%
[12:28:22] Completed 39%
[12:29:19] Completed 40%
[12:29:19] mdrun_gpu returned 
[12:29:19] NANs detected on GPU
[12:29:19] 
[12:29:19] Folding@home Core Shutdown: UNSTABLE_MACHINE
[12:29:23] CoreStatus = 7A (122)
[12:29:23] Sending work to server
[12:29:23] Project: 6604 (Run 8, Clone 395, Gen 42)
[12:29:23] - Read packet limit of 540015616... Set to 524286976.
[12:29:23] - Error: Could not get length of results file work/wuresults_05.dat
[12:29:23] - Error: Could not read unit 05 file. Removing from queue.
[12:29:23] - Preparing to get new work unit...
[12:29:23] + Attempting to get work packet
[12:29:23] - Connecting to assignment server
[12:29:24] - Successful: assigned to (171.64.65.61).
[12:29:24] + News From Folding@Home: Welcome to Folding@Home
[12:29:25] Loaded queue successfully.
[12:29:26] + Closed connections
[12:29:31] 
[12:29:31] + Processing work unit
[12:29:31] Core required: FahCore_11.exe
[12:29:31] Core found.
[12:29:31] Working on queue slot 06 [April 28 12:29:31 UTC]
[12:29:31] + Working ...
[12:29:31] 
[12:29:31] *------------------------------*
[12:29:31] Folding@Home GPU Core
[12:29:31] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[12:29:31] 
[12:29:31] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[12:29:31] Build host: amoeba
[12:29:31] Board Type: Nvidia
[12:29:31] Core      : 
[12:29:31] Preparing to commence simulation
[12:29:31] - Looking at optimizations...
[12:29:31] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[12:29:31] - Created dyn
[12:29:31] - Files status OK
[12:29:31] - Expanded 73689 -> 383588 (decompressed 520.5 percent)
[12:29:31] Called DecompressByteArray: compressed_data_size=73689 data_size=383588, decompressed_data_size=383588 diff=0
[12:29:31] - Digital signature verified
[12:29:31] 
[12:29:31] Project: 6604 (Run 8, Clone 395, Gen 42)
[12:29:31] 
[12:29:31] Assembly optimizations on if available.
[12:29:31] Entering M.D.
[12:29:38] Tpr hash work/wudata_06.tpr:  2997436786 4124991434 1373664617 2016349556 1061161968
[12:29:38] 
[12:29:38] Calling fah_main args: 14 usage=100
[12:29:38] 
[12:29:38] Working on Protein
[12:29:40] Client config found, loading data.
[12:29:40] Starting GUI Server
[12:30:53] Completed 1%
[12:32:06] Completed 2%
[12:33:22] Completed 3%
[12:34:35] Completed 4%
[12:35:28] Completed 5%
[12:35:28] mdrun_gpu returned 
[12:35:28] NANs detected on GPU
[12:35:28] 
[12:35:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
[12:35:31] CoreStatus = 7A (122)
[12:35:31] Sending work to server
[12:35:31] Project: 6604 (Run 8, Clone 395, Gen 42)
[12:35:31] - Read packet limit of 540015616... Set to 524286976.
[12:35:31] - Error: Could not get length of results file work/wuresults_06.dat
[12:35:31] - Error: Could not read unit 06 file. Removing from queue.
[12:35:31] EUE limit exceeded. Pausing 24 hours.
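
To make the "different point every time" pattern easier to see, here is a quick sketch that scans FAHlog-style text and reports the last "Completed N%" value before each "NANs detected on GPU" line. It relies only on the two line formats visible in the log above; the helper name `failure_points` is my own, not anything from the Folding@home client.

```python
import re
from typing import List

# Matches progress lines such as "[10:40:38] Completed 1%".
# "Number of Units Completed: 50" does not match because of the colon.
PROGRESS_RE = re.compile(r"Completed (\d+)%")
# Line the GPU core prints when it aborts a WU.
NAN_MARKER = "NANs detected on GPU"


def failure_points(log_text: str) -> List[int]:
    """Hypothetical helper: last reported percentage before each NaN abort."""
    points = []
    last_pct = 0  # an attempt that dies before its first 1% reports 0
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last_pct = int(match.group(1))
        elif NAN_MARKER in line:
            points.append(last_pct)
            last_pct = 0  # the next attempt starts from scratch
    return points


sample = """\
[11:28:13] Completed 1%
[11:29:25] Completed 2%
[11:29:25] mdrun_gpu returned
[11:29:25] NANs detected on GPU
[11:30:58] Completed 1%
[11:40:35] Completed 9%
[11:40:35] NANs detected on GPU
"""
print(failure_points(sample))  # [2, 9]
```

Run against the full log above, it should give [39, 2, 9, 40, 5], i.e. five attempts that died at unrelated points rather than at one fixed frame.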

I'd had 39 consecutive successful units since my reinstall, then Project: 6605 (Run 8, Clone 541, Gen 44) decided to EUE at 100%:

Code:

[08:46:30] Project: 6605 (Run 8, Clone 541, Gen 44)
[08:46:30] 
[08:46:30] Assembly optimizations on if available.
[08:46:30] Entering M.D.
[08:46:36] Tpr hash work/wudata_00.tpr:  3098053179 3578126811 1545620632 1859153881 2734468322
[08:46:36] 
[08:46:36] Calling fah_main args: 14 usage=100
[08:46:36] 
[08:46:36] Working on Protein
[08:46:38] Client config found, loading data.
[08:46:38] Starting GUI Server
[08:47:51] Completed 1%
[08:49:05] Completed 2%
[08:50:19] Completed 3%
[08:51:35] Completed 4%
[08:52:48] Completed 5%
[08:54:02] Completed 6%
[08:55:15] Completed 7%
[08:56:29] Completed 8%
[08:57:42] Completed 9%
[08:58:55] Completed 10%
[09:00:08] Completed 11%
[09:01:21] Completed 12%
[09:02:34] Completed 13%
[09:03:46] Completed 14%
[09:04:59] Completed 15%
[09:06:12] Completed 16%
[09:07:25] Completed 17%
[09:08:38] Completed 18%
[09:09:51] Completed 19%
[09:11:04] Completed 20%
[09:12:18] Completed 21%
[09:13:32] Completed 22%
[09:14:45] Completed 23%
[09:15:59] Completed 24%
[09:17:12] Completed 25%
[09:18:25] Completed 26%
[09:19:37] Completed 27%
[09:20:50] Completed 28%
[09:22:03] Completed 29%
[09:23:16] Completed 30%
[09:24:29] Completed 31%
[09:25:42] Completed 32%
[09:26:55] Completed 33%
[09:28:08] Completed 34%
[09:29:23] Completed 35%
[09:30:39] Completed 36%
[09:31:53] Completed 37%
[09:33:06] Completed 38%
[09:34:20] Completed 39%
[09:35:33] Completed 40%
[09:36:47] Completed 41%
[09:38:00] Completed 42%
[09:39:13] Completed 43%
[09:40:30] Completed 44%
[09:41:48] Completed 45%
[09:43:05] Completed 46%
[09:44:22] Completed 47%
[09:45:40] Completed 48%
[09:46:58] Completed 49%
[09:48:15] Completed 50%
[09:49:33] Completed 51%
[09:50:50] Completed 52%
[09:52:08] Completed 53%
[09:53:25] Completed 54%
[09:54:42] Completed 55%
[09:56:00] Completed 56%
[09:57:17] Completed 57%
[09:58:35] Completed 58%
[09:59:52] Completed 59%
[10:01:09] Completed 60%
[10:02:27] Completed 61%
[10:03:45] Completed 62%
[10:05:02] Completed 63%
[10:06:19] Completed 64%
[10:07:37] Completed 65%
[10:08:54] Completed 66%
[10:10:12] Completed 67%
[10:11:29] Completed 68%
[10:12:46] Completed 69%
[10:14:04] Completed 70%
[10:15:21] Completed 71%
[10:16:40] Completed 72%
[10:17:57] Completed 73%
[10:19:15] Completed 74%
[10:20:32] Completed 75%
[10:21:49] Completed 76%
[10:23:07] Completed 77%
[10:24:25] Completed 78%
[10:25:42] Completed 79%
[10:27:00] Completed 80%
[10:28:17] Completed 81%
[10:29:35] Completed 82%
[10:30:52] Completed 83%
[10:32:11] Completed 84%
[10:33:27] Completed 85%
[10:34:41] Completed 86%
[10:35:53] Completed 87%
[10:37:06] Completed 88%
[10:38:19] Completed 89%
[10:39:32] Completed 90%
[10:40:45] Completed 91%
[10:41:58] Completed 92%
[10:43:11] Completed 93%
[10:44:25] Completed 94%
[10:45:42] Completed 95%
[10:46:59] Completed 96%
[10:48:14] Completed 97%
[10:49:27] Completed 98%
[10:50:39] Completed 99%
[10:51:51] Completed 100%
[10:51:51] mdrun_gpu returned 
[10:51:51] NANs detected on GPU
[10:51:51] 
[10:51:51] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:51:54] CoreStatus = 7A (122)
[10:51:54] Sending work to server
[10:51:54] Project: 6605 (Run 8, Clone 541, Gen 44)
[10:51:54] - Read packet limit of 540015616... Set to 524286976.
[10:51:54] - Error: Could not get length of results file work/wuresults_00.dat
[10:51:54] - Error: Could not read unit 00 file. Removing from queue.
[10:51:54] - Preparing to get new work unit...
[10:51:54] + Attempting to get work packet

Then five failures in a row on 6604 (Run 8, Clone 395, Gen 42).

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6604 (Run 8, Clone 395, Gen 42) - Multiple EUE

Post by bruce

I suppose that after each of your failures, the WU was reassigned to someone else. Three of those people have completed it successfully, which suggests some sort of instability in your system. When was the last time you cleaned out the dust?
Senture
Posts: 8
Joined: Fri Sep 12, 2008 12:38 am

Re: Project: 6604 (Run 8, Clone 395, Gen 42) - Mutiple EUE

Post by Senture

In a literal sense? About a week ago. I had to do some RAM testing in the machine, so I took the time to disassemble the GPU and sort out its thermal issues. Temps dropped from 94 °C to 79 °C. I also removed Windows Server 2008 R2 because of incompatibility issues and reinstalled Windows 7. I've set the client going again and it has picked up the same WU once more, so we'll see how it goes, I guess.

System information:
HP XW8400
Dual Xeon 5160 3 GHz
8 GB (8x1 GB) FB-DIMM ECC 667 MHz
EVGA 9800 GX2 (single client running)
197.45 drivers

Not sure if that helps.