P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 12:36 am
by larryb63
My NV 9800GT is stuck on this WU and displays the "Self-test failure" message. It tries to get another WU, ends up with the same unit, and crashes again.
It has already hit the max EUEs twice and paused. I ran memtestg80 and it found no errors.
A bad WU??
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 1:50 am
by bruce
Please post FAHlog.txt
What version of drivers are you using?
Nobody has returned that WU yet.
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 3:11 am
by larryb63
Here's the dump of the log file... It includes some new WU's since I restarted again. Then it EUE'd again.
Code:
8, Gen 1430)
[00:25:40] - Read packet limit of 540015616... Set to 524286976.
[00:25:40] - Error: Could not get length of results file work/wuresults_03.dat
[00:25:40] - Error: Could not read unit 03 file. Removing from queue.
[00:25:40] - Preparing to get new work unit...
[00:25:40] + Attempting to get work packet
[00:25:40] - Connecting to assignment server
[00:25:41] - Successful: assigned to (171.67.108.11).
[00:25:41] + News From Folding@Home: Welcome to Folding@Home
[00:25:41] Loaded queue successfully.
[00:25:42] + Closed connections
[00:25:47]
[00:25:47] + Processing work unit
[00:25:47] Core required: FahCore_11.exe
[00:25:47] Core found.
[00:25:47] Working on queue slot 04 [March 6 00:25:47 UTC]
[00:25:47] + Working ...
[00:25:47]
[00:25:47] *------------------------------*
[00:25:47] Folding@Home GPU Core
[00:25:47] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[00:25:47]
[00:25:47] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[00:25:47] Build host: amoeba
[00:25:47] Board Type: Nvidia
[00:25:47] Core :
[00:25:47] Preparing to commence simulation
[00:25:47] - Looking at optimizations...
[00:25:47] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[00:25:47] - Created dyn
[00:25:47] - Files status OK
[00:25:47] - Expanded 46728 -> 252912 (decompressed 541.2 percent)
[00:25:47] Called DecompressByteArray: compressed_data_size=46728 data_size=252912, decompressed_data_size=252912 diff=0
[00:25:47] - Digital signature verified
[00:25:47]
[00:25:47] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:25:47]
[00:25:47] Assembly optimizations on if available.
[00:25:47] Entering M.D.
[00:25:53] Tpr hash work/wudata_04.tpr: 2095785337 3751819365 4071811089 1625471365 807181723
[00:25:53]
[00:25:53] Calling fah_main args: 14 usage=100
[00:25:53]
[00:25:54] Working on Protein
[00:25:54] mdrun_gpu returned
[00:25:54] Self-test failure
[00:25:54]
[00:25:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:25:57] CoreStatus = 7A (122)
[00:25:57] Sending work to server
[00:25:57] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:25:57] - Read packet limit of 540015616... Set to 524286976.
[00:25:57] - Error: Could not get length of results file work/wuresults_04.dat
[00:25:57] - Error: Could not read unit 04 file. Removing from queue.
[00:25:57] - Preparing to get new work unit...
[00:25:57] + Attempting to get work packet
[00:25:57] - Connecting to assignment server
[00:25:58] - Successful: assigned to (171.67.108.11).
[00:25:58] + News From Folding@Home: Welcome to Folding@Home
[00:25:58] Loaded queue successfully.
[00:25:59] + Closed connections
[00:26:04]
[00:26:04] + Processing work unit
[00:26:04] Core required: FahCore_11.exe
[00:26:04] Core found.
[00:26:04] Working on queue slot 05 [March 6 00:26:04 UTC]
[00:26:04] + Working ...
[00:26:04]
[00:26:04] *------------------------------*
[00:26:04] Folding@Home GPU Core
[00:26:04] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[00:26:04]
[00:26:04] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[00:26:04] Build host: amoeba
[00:26:04] Board Type: Nvidia
[00:26:04] Core :
[00:26:04] Preparing to commence simulation
[00:26:04] - Looking at optimizations...
[00:26:04] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[00:26:04] - Created dyn
[00:26:04] - Files status OK
[00:26:04] - Expanded 46728 -> 252912 (decompressed 541.2 percent)
[00:26:04] Called DecompressByteArray: compressed_data_size=46728 data_size=252912, decompressed_data_size=252912 diff=0
[00:26:04] - Digital signature verified
[00:26:04]
[00:26:04] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:26:04]
[00:26:04] Assembly optimizations on if available.
[00:26:04] Entering M.D.
[00:26:10] Tpr hash work/wudata_05.tpr: 2095785337 3751819365 4071811089 1625471365 807181723
[00:26:10]
[00:26:10] Calling fah_main args: 14 usage=100
[00:26:10]
[00:26:11] Working on Protein
[00:26:11] mdrun_gpu returned
[00:26:11] Self-test failure
[00:26:11]
[00:26:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:26:15] CoreStatus = 7A (122)
[00:26:15] Sending work to server
[00:26:15] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:26:15] - Read packet limit of 540015616... Set to 524286976.
[00:26:15] - Error: Could not get length of results file work/wuresults_05.dat
[00:26:15] - Error: Could not read unit 05 file. Removing from queue.
[00:26:15] EUE limit exceeded. Pausing 24 hours.
Folding@Home Client Shutdown.
--- Opening Log file [March 6 03:01:33 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Documents and Settings\User\Application Data\Folding@home-gpu
Arguments: -gpu 0
[03:01:33] - Ask before connecting: No
[03:01:33] - User name: larryb_home (Team 39340)
[03:01:33] - User ID: 1A9967B4ABFEF73
[03:01:33] - Machine ID: 2
[03:01:33]
[03:01:33] Loaded queue successfully.
[03:01:33] Initialization complete
[03:01:33] - Preparing to get new work unit...
[03:01:33] + Attempting to get work packet
[03:01:33] - Connecting to assignment server
[03:01:34] - Successful: assigned to (171.64.65.71).
[03:01:34] + News From Folding@Home: Welcome to Folding@Home
[03:01:34] Loaded queue successfully.
[03:01:35] + Closed connections
[03:01:35]
[03:01:35] + Processing work unit
[03:01:35] Core required: FahCore_11.exe
[03:01:35] Core found.
[03:01:35] Working on queue slot 06 [March 6 03:01:35 UTC]
[03:01:35] + Working ...
[03:01:35]
[03:01:35] *------------------------------*
[03:01:35] Folding@Home GPU Core
[03:01:35] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[03:01:35]
[03:01:35] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:01:35] Build host: amoeba
[03:01:35] Board Type: Nvidia
[03:01:35] Core :
[03:01:35] Preparing to commence simulation
[03:01:35] - Looking at optimizations...
[03:01:35] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[03:01:35] - Created dyn
[03:01:35] - Files status OK
[03:01:35] - Expanded 88717 -> 447307 (decompressed 504.1 percent)
[03:01:35] Called DecompressByteArray: compressed_data_size=88717 data_size=447307, decompressed_data_size=447307 diff=0
[03:01:35] - Digital signature verified
[03:01:35]
[03:01:35] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:01:35]
[03:01:36] Assembly optimizations on if available.
[03:01:36] Entering M.D.
[03:01:42] Tpr hash work/wudata_06.tpr: 749661574 677877775 1568515253 260840342 3969723686
[03:01:42]
[03:01:42] Calling fah_main args: 14 usage=100
[03:01:42]
[03:01:42] Working on p10105_lambda_370K
[03:01:42] mdrun_gpu returned
[03:01:42] Self-test failure
[03:01:42]
[03:01:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:01:45] CoreStatus = 7A (122)
[03:01:45] Sending work to server
[03:01:45] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:01:45] - Read packet limit of 540015616... Set to 524286976.
[03:01:45] - Error: Could not get length of results file work/wuresults_06.dat
[03:01:45] - Error: Could not read unit 06 file. Removing from queue.
[03:01:45] - Preparing to get new work unit...
[03:01:45] + Attempting to get work packet
[03:01:45] - Connecting to assignment server
[03:01:46] - Successful: assigned to (171.64.65.71).
[03:01:46] + News From Folding@Home: Welcome to Folding@Home
[03:01:46] Loaded queue successfully.
[03:01:48] + Closed connections
[03:01:53]
[03:01:53] + Processing work unit
[03:01:53] Core required: FahCore_11.exe
[03:01:53] Core found.
[03:01:53] Working on queue slot 07 [March 6 03:01:53 UTC]
[03:01:53] + Working ...
[03:01:53]
[03:01:53] *------------------------------*
[03:01:53] Folding@Home GPU Core
[03:01:53] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[03:01:53]
[03:01:53] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:01:53] Build host: amoeba
[03:01:53] Board Type: Nvidia
[03:01:53] Core :
[03:01:53] Preparing to commence simulation
[03:01:53] - Looking at optimizations...
[03:01:53] DeleteFrameFiles: successfully deleted file=work/wudata_07.ckp
[03:01:53] - Created dyn
[03:01:53] - Files status OK
[03:01:53] - Expanded 88648 -> 447307 (decompressed 504.5 percent)
[03:01:53] Called DecompressByteArray: compressed_data_size=88648 data_size=447307, decompressed_data_size=447307 diff=0
[03:01:53] - Digital signature verified
[03:01:53]
[03:01:53] Project: 10105 (Run 44, Clone 9, Gen 14)
[03:01:53]
[03:01:53] Assembly optimizations on if available.
[03:01:53] Entering M.D.
[03:01:59] Tpr hash work/wudata_07.tpr: 557207243 3098624540 4075087713 1779113968 2998535250
[03:01:59]
[03:01:59] Calling fah_main args: 14 usage=100
[03:01:59]
[03:01:59] Working on p10105_lambda_370K
[03:02:00] mdrun_gpu returned
[03:02:00] Self-test failure
[03:02:00]
[03:02:00] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:03] CoreStatus = 7A (122)
[03:02:03] Sending work to server
[03:02:03] Project: 10105 (Run 44, Clone 9, Gen 14)
[03:02:03] - Read packet limit of 540015616... Set to 524286976.
[03:02:03] - Error: Could not get length of results file work/wuresults_07.dat
[03:02:03] - Error: Could not read unit 07 file. Removing from queue.
[03:02:03] - Preparing to get new work unit...
[03:02:03] + Attempting to get work packet
[03:02:03] - Connecting to assignment server
[03:02:03] - Successful: assigned to (171.64.65.71).
[03:02:03] + News From Folding@Home: Welcome to Folding@Home
[03:02:04] Loaded queue successfully.
[03:02:05] + Closed connections
[03:02:10]
[03:02:10] + Processing work unit
[03:02:10] Core required: FahCore_11.exe
[03:02:10] Core found.
[03:02:10] Working on queue slot 08 [March 6 03:02:10 UTC]
[03:02:10] + Working ...
[03:02:10]
[03:02:10] *------------------------------*
[03:02:10] Folding@Home GPU Core
[03:02:10] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[03:02:10]
[03:02:10] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:02:10] Build host: amoeba
[03:02:10] Board Type: Nvidia
[03:02:10] Core :
[03:02:10] Preparing to commence simulation
[03:02:10] - Looking at optimizations...
[03:02:10] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[03:02:10] - Created dyn
[03:02:10] - Files status OK
[03:02:10] - Expanded 88631 -> 447307 (decompressed 504.6 percent)
[03:02:10] Called DecompressByteArray: compressed_data_size=88631 data_size=447307, decompressed_data_size=447307 diff=0
[03:02:10] - Digital signature verified
[03:02:10]
[03:02:10] Project: 10105 (Run 207, Clone 6, Gen 24)
[03:02:10]
[03:02:10] Assembly optimizations on if available.
[03:02:10] Entering M.D.
[03:02:16] Tpr hash work/wudata_08.tpr: 3862008505 1055726078 2712716885 661417887 1484102152
[03:02:16]
[03:02:16] Calling fah_main args: 14 usage=100
[03:02:16]
[03:02:17] Working on p10105_lambda_370K
[03:02:17] mdrun_gpu returned
[03:02:17] Self-test failure
[03:02:17]
[03:02:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:20] CoreStatus = 7A (122)
[03:02:20] Sending work to server
[03:02:20] Project: 10105 (Run 207, Clone 6, Gen 24)
[03:02:20] - Read packet limit of 540015616... Set to 524286976.
[03:02:20] - Error: Could not get length of results file work/wuresults_08.dat
[03:02:20] - Error: Could not read unit 08 file. Removing from queue.
[03:02:20] - Preparing to get new work unit...
[03:02:20] + Attempting to get work packet
[03:02:20] - Connecting to assignment server
[03:02:21] - Successful: assigned to (171.64.65.71).
[03:02:21] + News From Folding@Home: Welcome to Folding@Home
[03:02:21] Loaded queue successfully.
[03:02:23] + Closed connections
[03:02:28]
[03:02:28] + Processing work unit
[03:02:28] Core required: FahCore_11.exe
[03:02:28] Core found.
[03:02:28] Working on queue slot 09 [March 6 03:02:28 UTC]
[03:02:28] + Working ...
[03:02:28]
[03:02:28] *------------------------------*
[03:02:28] Folding@Home GPU Core
[03:02:28] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[03:02:28]
[03:02:28] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:02:28] Build host: amoeba
[03:02:28] Board Type: Nvidia
[03:02:28] Core :
[03:02:28] Preparing to commence simulation
[03:02:28] - Looking at optimizations...
[03:02:28] DeleteFrameFiles: successfully deleted file=work/wudata_09.ckp
[03:02:28] - Created dyn
[03:02:28] - Files status OK
[03:02:28] - Expanded 88631 -> 447307 (decompressed 504.6 percent)
[03:02:28] Called DecompressByteArray: compressed_data_size=88631 data_size=447307, decompressed_data_size=447307 diff=0
[03:02:28] - Digital signature verified
[03:02:28]
[03:02:28] Project: 10105 (Run 114, Clone 3, Gen 18)
[03:02:28]
[03:02:28] Assembly optimizations on if available.
[03:02:28] Entering M.D.
[03:02:34] Tpr hash work/wudata_09.tpr: 729998965 2964310235 3023001811 1329131895 2432733201
[03:02:34]
[03:02:34] Calling fah_main args: 14 usage=100
[03:02:34]
[03:02:34] Working on p10105_lambda_370K
[03:02:35] mdrun_gpu returned
[03:02:35] Self-test failure
[03:02:35]
[03:02:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:38] CoreStatus = 7A (122)
[03:02:38] Sending work to server
[03:02:38] Project: 10105 (Run 114, Clone 3, Gen 18)
[03:02:38] - Read packet limit of 540015616... Set to 524286976.
[03:02:38] - Error: Could not get length of results file work/wuresults_09.dat
[03:02:38] - Error: Could not read unit 09 file. Removing from queue.
[03:02:38] - Preparing to get new work unit...
[03:02:38] + Attempting to get work packet
[03:02:38] - Connecting to assignment server
[03:02:39] - Successful: assigned to (171.64.65.71).
[03:02:39] + News From Folding@Home: Welcome to Folding@Home
[03:02:39] Loaded queue successfully.
[03:02:40] + Closed connections
[03:02:45]
[03:02:45] + Processing work unit
[03:02:45] Core required: FahCore_11.exe
[03:02:45] Core found.
[03:02:45] Working on queue slot 00 [March 6 03:02:45 UTC]
[03:02:45] + Working ...
[03:02:45]
[03:02:45] *------------------------------*
[03:02:45] Folding@Home GPU Core
[03:02:45] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[03:02:45]
[03:02:45] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[03:02:45] Build host: amoeba
[03:02:45] Board Type: Nvidia
[03:02:45] Core :
[03:02:45] Preparing to commence simulation
[03:02:45] - Looking at optimizations...
[03:02:45] DeleteFrameFiles: successfully deleted file=work/wudata_00.ckp
[03:02:45] - Created dyn
[03:02:45] - Files status OK
[03:02:45] - Expanded 88717 -> 447307 (decompressed 504.1 percent)
[03:02:45] Called DecompressByteArray: compressed_data_size=88717 data_size=447307, decompressed_data_size=447307 diff=0
[03:02:45] - Digital signature verified
[03:02:45]
[03:02:45] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:02:45]
[03:02:45] Assembly optimizations on if available.
[03:02:45] Entering M.D.
[03:02:52] Tpr hash work/wudata_00.tpr: 749661574 677877775 1568515253 260840342 3969723686
[03:02:52]
[03:02:52] Calling fah_main args: 14 usage=100
[03:02:52]
[03:02:52] Working on p10105_lambda_370K
[03:02:52] mdrun_gpu returned
[03:02:52] Self-test failure
[03:02:52]
[03:02:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:55] CoreStatus = 7A (122)
[03:02:55] Sending work to server
[03:02:55] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:02:55] - Read packet limit of 540015616... Set to 524286976.
[03:02:55] - Error: Could not get length of results file work/wuresults_00.dat
[03:02:55] - Error: Could not read unit 00 file. Removing from queue.
[03:02:55] EUE limit exceeded. Pausing 24 hours.
XP Nvidia driver version: 196.21
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 5:44 am
by bruce
The message is pretty clear . . . it's not a specific WU, it's anything they send you. That's at least 7 consecutive failures on at least 5 different WUs.
You do have a serious problem of some sort. Perhaps your hardware has failed, or you don't have enough stable power or cooling, or you've got a bad set of drivers or unsustainable clock speeds, or ... FAH expects you to have dependable hardware, and the 24-hour pause is supposed to give you time to find and fix the problem before you discard another series of WUs.
Code:
[00:25:47] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:25:54] Self-test failure
[00:25:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:25:57] CoreStatus = 7A (122)
[00:26:04] *------------------------------*
[00:26:04] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:26:11] Self-test failure
[00:26:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:26:15] CoreStatus = 7A (122)
[03:01:35] *------------------------------*
[03:01:35] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:01:42] Self-test failure
[03:01:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:01:45] CoreStatus = 7A (122)
[03:01:53] *------------------------------*
[03:01:53] Project: 10105 (Run 44, Clone 9, Gen 14)
[03:02:00] Self-test failure
[03:02:00] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:03] CoreStatus = 7A (122)
[03:02:10] *------------------------------*
[03:02:10] Project: 10105 (Run 207, Clone 6, Gen 24)
[03:02:17] Self-test failure
[03:02:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:20] CoreStatus = 7A (122)
[03:02:28] *------------------------------*
[03:02:28] Project: 10105 (Run 114, Clone 3, Gen 18)
[03:02:35] Self-test failure
[03:02:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:38] CoreStatus = 7A (122)
[03:02:45] *------------------------------*
[03:02:45] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:02:52] Self-test failure
[03:02:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:02:55] CoreStatus = 7A (122)
[03:02:55] EUE limit exceeded. Pausing 24 hours.
Four of the five have been reassigned and successfully completed by someone else.
WU (P10105 R55 C5 G12) was added to the stats database on 2010-03-05 13:02:30 for 548 points of credit.
WU (P10105 R44 C9 G14) was added to the stats database on 2010-03-05 14:02:39 for 548 points of credit.
WU (P10105 R207 C6 G24) was added to the stats database on 2010-03-05 15:02:46 for 548 points of credit.
WU (P10105 R114 C3 G18) was added to the stats database on 2010-03-05 15:02:46 for 548 points of credit.
EDIT:
It doesn't surprise me to report that the other WU has been successfully completed by someone else.
It just took longer to be returned.
WU (P5765 R2 C108 G1430) was added to the stats database on 2010-03-06 06:03:22 for 353 points of credit.
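For anyone who wants to produce the same kind of tally from their own FAHlog.txt, here is a minimal Python sketch. It assumes the log uses the "Project: N (Run R, Clone C, Gen G)" and "Self-test failure" lines exactly as they appear in the log posted in this thread. The function name and the sample text are made up for illustration and are not part of the FAH client.

```python
import re
from collections import Counter

# Matches lines such as "[00:25:47] Project: 5765 (Run 2, Clone 108, Gen 1430)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# Hypothetical sample, trimmed from the log format shown in this thread.
SAMPLE = """\
[00:25:47] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:25:54] Self-test failure
[00:25:57] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:26:04] Project: 5765 (Run 2, Clone 108, Gen 1430)
[00:26:11] Self-test failure
[03:01:35] Project: 10105 (Run 55, Clone 5, Gen 12)
[03:01:42] Self-test failure
"""


def tally_self_test_failures(log_text):
    """Count 'Self-test failure' lines per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # the WU named by the most recent "Project:" line
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "Self-test failure" in line and current is not None:
            failures[current] += 1
            # The client repeats the "Project:" line when it tries to send
            # the result, so forget the WU until it is named again.
            current = None
    return failures


if __name__ == "__main__":
    counts = tally_self_test_failures(SAMPLE)
    for (project, run, clone, gen), n in counts.items():
        print(f"P{project} R{run} C{clone} G{gen}: {n} failure(s)")
    print(f"{sum(counts.values())} failures on {len(counts)} different WUs")
```

If the failures are spread across several different WUs, as they are here, that points at the machine rather than at one bad WU.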
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 11:09 am
by toTOW
Running MemtestG80 on this board is a good idea ...
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 11:46 am
by larryb63
bruce: if memtestg80 is reporting no errors...WHAT else can I use to test the card? I do not run anything on the system but F@H, and another GPU in the box is running just fine. The GPU in question here is the main display card, and it's giving me no indication of any problems.
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 1:14 pm
by bollix47
Any difference if you add the following to your arguments?
Re: P5765 Run 2, Clone 108, Gen 1430 EUE's
Posted: Sat Mar 06, 2010 4:29 pm
by Wrish
The constant immediate failures suggest a driver or configuration problem. Has the system ever run two GPU units at once without failing? If so, look at changes to the Nvidia control panel; possibly do a system rollback if you don't remember what changed.