Project: 5768 (Run 7, Clone 18, Gen 529)

Posted: Mon Nov 02, 2009 10:18 pm
by paulb39
I'm running the latest GPU client. Usually when I get an Unstable Machine error, I quit Folding@Home and open it again, and it works. This time it won't get past the error. How can I fix this?

# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Paul\AppData\Roaming\Folding@home-gpu


[21:57:50] - Ask before connecting: No
[21:57:50] - User name: louieb39 (Team 9999)
[21:57:50] - User ID: 73AAB0F25E19FDA9
[21:57:50] - Machine ID: 2
[21:57:50] 
[21:57:50] Loaded queue successfully.
[21:57:50] Initialization complete
[21:57:50] - Preparing to get new work unit...
[21:57:50] + Attempting to get work packet
[21:57:50] - Connecting to assignment server
[21:57:51] - Successful: assigned to (171.67.108.11).
[21:57:51] + News From Folding@Home: Welcome to Folding@Home
[21:57:51] Loaded queue successfully.
[21:57:52] + Closed connections
[21:57:52] 
[21:57:52] + Processing work unit
[21:57:52] Core required: FahCore_11.exe
[21:57:52] Core found.
[21:57:52] Working on queue slot 03 [November 2 21:57:52 UTC]
[21:57:52] + Working ...
[21:57:52] 
[21:57:52] *------------------------------*
[21:57:52] Folding@Home GPU Core - Beta
[21:57:52] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:57:52] 
[21:57:52] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:57:52] Build host: amoeba
[21:57:52] Board Type: Nvidia
[21:57:52] Core      : 
[21:57:52] Preparing to commence simulation
[21:57:52] - Looking at optimizations...
[21:57:52] - Created dyn
[21:57:52] - Files status OK
[21:57:52] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:57:52] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:57:52] - Digital signature verified
[21:57:52] 
[21:57:52] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:57:52] 
[21:57:53] Assembly optimizations on if available.
[21:57:53] Entering M.D.
[21:58:00] Working on Protein
[21:58:04] Client config found, loading data.
[21:58:04] Starting GUI Server
[21:58:05] mdrun_gpu returned 
[21:58:05] NANs detected on GPU
[21:58:05] 
[21:58:05] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:08] CoreStatus = 7A (122)
[21:58:08] Sending work to server
[21:58:08] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:08] - Error: Could not get length of results file work/wuresults_03.dat
[21:58:08] - Error: Could not read unit 03 file. Removing from queue.
[21:58:08] - Preparing to get new work unit...
[21:58:08] + Attempting to get work packet
[21:58:08] - Connecting to assignment server
[21:58:09] - Successful: assigned to (171.67.108.11).
[21:58:09] + News From Folding@Home: Welcome to Folding@Home
[21:58:09] Loaded queue successfully.
[21:58:10] + Closed connections
[21:58:15] 
[21:58:15] + Processing work unit
[21:58:15] Core required: FahCore_11.exe
[21:58:15] Core found.
[21:58:15] Working on queue slot 04 [November 2 21:58:15 UTC]
[21:58:15] + Working ...
[21:58:15] 
[21:58:15] *------------------------------*
[21:58:15] Folding@Home GPU Core - Beta
[21:58:15] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:58:15] 
[21:58:15] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:58:15] Build host: amoeba
[21:58:15] Board Type: Nvidia
[21:58:15] Core      : 
[21:58:15] Preparing to commence simulation
[21:58:15] - Looking at optimizations...
[21:58:15] - Created dyn
[21:58:15] - Files status OK
[21:58:15] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:58:15] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:58:15] - Digital signature verified
[21:58:15] 
[21:58:15] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:15] 
[21:58:16] Assembly optimizations on if available.
[21:58:16] Entering M.D.
[21:58:23] Working on Protein
[21:58:27] Client config found, loading data.
[21:58:27] Starting GUI Server
[21:58:28] mdrun_gpu returned 
[21:58:28] NANs detected on GPU
[21:58:28] 
[21:58:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:31] CoreStatus = 7A (122)
[21:58:31] Sending work to server
[21:58:31] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:31] - Error: Could not get length of results file work/wuresults_04.dat
[21:58:31] - Error: Could not read unit 04 file. Removing from queue.
[21:58:31] - Preparing to get new work unit...
[21:58:31] + Attempting to get work packet
[21:58:31] - Connecting to assignment server
[21:58:32] - Successful: assigned to (171.67.108.11).
[21:58:32] + News From Folding@Home: Welcome to Folding@Home
[21:58:32] Loaded queue successfully.
[21:58:33] + Closed connections
[21:58:38] 
[21:58:38] + Processing work unit
[21:58:38] Core required: FahCore_11.exe
[21:58:38] Core found.
[21:58:38] Working on queue slot 05 [November 2 21:58:38 UTC]
[21:58:38] + Working ...
[21:58:38] 
[21:58:38] *------------------------------*
[21:58:38] Folding@Home GPU Core - Beta
[21:58:38] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:58:38] 
[21:58:38] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:58:38] Build host: amoeba
[21:58:38] Board Type: Nvidia
[21:58:38] Core      : 
[21:58:38] Preparing to commence simulation
[21:58:38] - Looking at optimizations...
[21:58:38] - Created dyn
[21:58:38] - Files status OK
[21:58:38] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:58:38] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:58:38] - Digital signature verified
[21:58:38] 
[21:58:38] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:38] 
[21:58:39] Assembly optimizations on if available.
[21:58:39] Entering M.D.
[21:58:46] Working on Protein
[21:58:50] Client config found, loading data.
[21:58:50] Starting GUI Server
[21:58:51] mdrun_gpu returned 
[21:58:51] NANs detected on GPU
[21:58:51] 
[21:58:51] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:54] CoreStatus = 7A (122)
[21:58:54] Sending work to server
[21:58:54] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:54] - Error: Could not get length of results file work/wuresults_05.dat
[21:58:54] - Error: Could not read unit 05 file. Removing from queue.
[21:58:54] - Preparing to get new work unit...
[21:58:54] + Attempting to get work packet
[21:58:54] - Connecting to assignment server
[21:58:54] - Successful: assigned to (171.67.108.11).
[21:58:54] + News From Folding@Home: Welcome to Folding@Home
[21:58:55] Loaded queue successfully.
[21:58:56] + Closed connections
[21:59:01] 
[21:59:01] + Processing work unit
[21:59:01] Core required: FahCore_11.exe
[21:59:01] Core found.
[21:59:01] Working on queue slot 06 [November 2 21:59:01 UTC]
[21:59:01] + Working ...
[21:59:01] 
[21:59:01] *------------------------------*
[21:59:01] Folding@Home GPU Core - Beta
[21:59:01] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:59:01] 
[21:59:01] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:59:01] Build host: amoeba
[21:59:01] Board Type: Nvidia
[21:59:01] Core      : 
[21:59:01] Preparing to commence simulation
[21:59:01] - Looking at optimizations...
[21:59:01] - Created dyn
[21:59:01] - Files status OK
[21:59:01] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:59:01] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:59:01] - Digital signature verified
[21:59:01] 
[21:59:01] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:01] 
[21:59:02] Assembly optimizations on if available.
[21:59:02] Entering M.D.
[21:59:09] Working on Protein
[21:59:14] Client config found, loading data.
[21:59:14] mdrun_gpu returned 
[21:59:14] NANs detected on GPU
[21:59:14] 
[21:59:14] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:59:17] CoreStatus = 7A (122)
[21:59:17] Sending work to server
[21:59:17] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:17] - Error: Could not get length of results file work/wuresults_06.dat
[21:59:17] - Error: Could not read unit 06 file. Removing from queue.
[21:59:17] - Preparing to get new work unit...
[21:59:17] + Attempting to get work packet
[21:59:17] - Connecting to assignment server
[21:59:17] - Successful: assigned to (171.67.108.11).
[21:59:17] + News From Folding@Home: Welcome to Folding@Home
[21:59:17] Loaded queue successfully.
[21:59:18] + Closed connections
[21:59:23] 
[21:59:23] + Processing work unit
[21:59:23] Core required: FahCore_11.exe
[21:59:23] Core found.
[21:59:23] Working on queue slot 07 [November 2 21:59:23 UTC]
[21:59:23] + Working ...
[21:59:23] 
[21:59:23] *------------------------------*
[21:59:23] Folding@Home GPU Core - Beta
[21:59:23] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:59:23] 
[21:59:23] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:59:23] Build host: amoeba
[21:59:23] Board Type: Nvidia
[21:59:23] Core      : 
[21:59:23] Preparing to commence simulation
[21:59:23] - Looking at optimizations...
[21:59:23] - Created dyn
[21:59:23] - Files status OK
[21:59:23] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:59:23] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:59:23] - Digital signature verified
[21:59:23] 
[21:59:23] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:23] 
[21:59:24] Assembly optimizations on if available.
[21:59:24] Entering M.D.
[21:59:31] Working on Protein
[21:59:35] Client config found, loading data.
[21:59:35] Starting GUI Server
[21:59:36] mdrun_gpu returned 
[21:59:36] NANs detected on GPU
[21:59:36] 
[21:59:36] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:59:40] CoreStatus = 7A (122)
[21:59:40] Sending work to server
[21:59:40] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:40] - Error: Could not get length of results file work/wuresults_07.dat
[21:59:40] - Error: Could not read unit 07 file. Removing from queue.
[21:59:40] EUE limit exceeded. Pausing 24 hours.

Re: Unstable Machine Loop

Posted: Tue Nov 03, 2009 12:30 am
by OlivierZ
Stop the GPU client.
In the directory where your GPU client is installed:

Delete:
queue.dat
unitinfo.txt
the work directory

Edit client.cfg (if possible with an editor like UltraEdit):
change the value of machineid
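
The steps above could also be scripted. Here is a minimal Python sketch, not an official tool. The function name reset_client and its parameters are made up for illustration, and it assumes client.cfg is plain text containing a line of the form machineid=<number>. Only run something like this while the client is stopped.

```python
import re
import shutil
from pathlib import Path


def reset_client(client_dir, new_machine_id):
    """Clear the client's queue state and change machineid.

    Hypothetical helper: run it only while the GPU client is stopped.
    """
    root = Path(client_dir)

    # Remove the queue and unit-info files if they exist.
    for name in ("queue.dat", "unitinfo.txt"):
        (root / name).unlink(missing_ok=True)

    # Remove the work directory and everything inside it.
    shutil.rmtree(root / "work", ignore_errors=True)

    # Assumption: client.cfg contains a line like "machineid=2".
    cfg = root / "client.cfg"
    text = cfg.read_text()
    new_text, count = re.subn(
        r"(?im)^([ \t]*machineid[ \t]*=[ \t]*)\d+",
        lambda m: f"{m.group(1)}{new_machine_id}",
        text,
    )
    if count == 0:
        raise ValueError("no machineid line found in client.cfg")
    cfg.write_text(new_text)
```

For example, reset_client(r"C:\Users\Paul\AppData\Roaming\Folding@home-gpu", 3) would apply it to the launch directory shown in the log above. The new machine ID of 3 is only an example value.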

Re: Unstable Machine Loop

Posted: Tue Nov 03, 2009 12:45 am
by paulb39
Thank you, it's working now. Can anyone explain why I needed to do that? What made it stop working? And does this mean all the work I did is invalid now?

Re: Unstable Machine Loop

Posted: Tue Nov 03, 2009 3:27 am
by bruce
The "unstable machine" error is designed to recognize hardware that is either defective or sufficiently overclocked or overheated that it can no longer make accurate calculations. Unfortunately I have seen situations where the WU itself is defective and it triggers the same error message. I have no way of knowing whether it's the WU or it's your hardware.

This error SHOULD stop your machine for 24 hours, and it's not doing that. The Work Server assumes (incorrectly) that the WU was corrupted during the download, so it just reassigns the same WU. Hopefully this problem will be fixed either in the new client code or in the new server code.
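
To see how often the same WU comes back, the log can be tallied with a short script. This is a rough Python sketch, not part of the client. The function name count_unstable is made up, and the two patterns are taken from the "Project:" and "Core Shutdown:" lines in the log posted above. Failures confined to one WU may point more toward the WU and failures spread over many WUs more toward the hardware, but that is only a heuristic.

```python
import re
from collections import Counter

# Patterns based on the log lines posted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def count_unstable(log_lines):
    """Count UNSTABLE_MACHINE shutdowns per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # most recent WU seen in the log
    for line in log_lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
            continue
        match = SHUTDOWN_RE.search(line)
        if match and match.group(1) == "UNSTABLE_MACHINE" and current is not None:
            failures[current] += 1
    return failures
```

It can be run on a log file with something like count_unstable(open("FAHlog.txt", errors="replace")), since it only needs an iterable of lines.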

In the portion of the log that you posted, no work was done, so the question about whether it was invalid or not doesn't really have an answer. If you want to dig up a log where some work was done, we can comment on what that log says.

You did successfully complete p5770 r13 c319 g1091 at 2009-11-01 18:18:43 (Stanford time)

Re: Unstable Machine Loop

Posted: Tue Nov 03, 2009 8:40 pm
by paulb39
Thanks for the info. Sadly, the log (status/log) doesn't go back far enough to show where work was successfully done.

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Posted: Tue Nov 03, 2009 9:57 pm
by OlivierZ
Try reading the following files directly in your GPU client's directory:
FAHlog.txt
FAHlog-Prev.txt

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Posted: Wed Nov 04, 2009 2:16 am
by paulb39
Both only go back to Nov 2 and don't show any work that was successfully done.

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Posted: Sun Nov 08, 2009 3:14 pm
by paulb39
Bumping because after a couple of days it's doing the same thing.

Can someone tell me why this is happening?

I posted my log in a Google Doc since the forum says it's too many characters:

http://docs.google.com/View?id=dfp79536_27hgf7qwm4