Project: 4743 (Run 9, Clone 902, Gen 11)

Posted: Fri Jan 30, 2009 2:05 am
by blisk
I believe it's a problem with the work unit I'm being assigned. I had just successfully completed and sent a work unit, then I received this work unit: p4743_lam5w_300K

Ever since then, my log has been filled with the client fetching the p4743_lam5w_300K work unit and then hitting the unstable machine error. Here's my log:

[01:38:39] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:38:39] 
[01:38:39] Assembly optimizations on if available.
[01:38:39] Entering M.D.
[01:38:45] Working on p4743_lam5w_300K
[01:38:46] Client config found, loading data.
[01:38:46] Starting GUI Server
[01:38:50] mdrun_gpu returned 
[01:38:50] NANs detected on GPU
[01:38:50] 
[01:38:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:38:53] CoreStatus = 7A (122)
[01:38:53] Sending work to server
[01:38:53] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:38:53] - Read packet limit of 540015616... Set to 524286976.
[01:38:53] - Error: Could not get length of results file work/wuresults_04.dat
[01:38:53] - Error: Could not read unit 04 file. Removing from queue.
[01:38:53] - Preparing to get new work unit...
[01:38:53] + Attempting to get work packet
[01:38:53] - Connecting to assignment server
[01:38:53] - Successful: assigned to (171.64.65.103).
[01:38:53] + News From Folding@Home: GPU folding beta
[01:38:54] Loaded queue successfully.
[01:38:55] + Closed connections
[01:39:00] 
[01:39:00] + Processing work unit
[01:39:00] Core required: FahCore_11.exe
[01:39:00] Core found.
[01:39:00] Working on queue slot 05 [January 30 01:39:00 UTC]
[01:39:00] + Working ...
[01:39:00] 
[01:39:00] *------------------------------*
[01:39:00] Folding@Home GPU Core - Beta
[01:39:00] Version 1.22 (Mon Dec 8 12:57:56 PST 2008)
[01:39:00] 
[01:39:00] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[01:39:00] Build host: amoeba
[01:39:00] Board Type: AMD
[01:39:00] Core      : 
[01:39:00] Preparing to commence simulation
[01:39:00] - Looking at optimizations...
[01:39:00] - Created dyn
[01:39:00] - Files status OK
[01:39:00] - Expanded 88298 -> 447304 (decompressed 506.5 percent)
[01:39:00] Called DecompressByteArray: compressed_data_size=88298 data_size=447304, decompressed_data_size=447304 diff=0
[01:39:00] - Digital signature verified
[01:39:00] 
[01:39:00] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:39:00] 
[01:39:00] Assembly optimizations on if available.
[01:39:00] Entering M.D.
[01:39:06] Working on p4743_lam5w_300K
[01:39:06] Client config found, loading data.
[01:39:07] Starting GUI Server
[01:39:11] mdrun_gpu returned 
[01:39:11] NANs detected on GPU
[01:39:11] 
[01:39:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:39:16] CoreStatus = 7A (122)
[01:39:16] Sending work to server
[01:39:16] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:39:16] - Read packet limit of 540015616... Set to 524286976.
[01:39:16] - Error: Could not get length of results file work/wuresults_05.dat
[01:39:16] - Error: Could not read unit 05 file. Removing from queue.
[01:39:16] - Preparing to get new work unit...
[01:39:16] + Attempting to get work packet
[01:39:16] - Connecting to assignment server
[01:39:16] - Successful: assigned to (171.64.65.103).
[01:39:16] + News From Folding@Home: GPU folding beta
[01:39:17] Loaded queue successfully.
[01:39:18] + Closed connections

EDIT: Just noticed this in another post:
"If your GPU has never given you problems running FAH, it could simply be a bad WU. Delete the work files, queue.dat, and unitinfo.txt until you get a different WU. If you continue to get problems, please post your log file."

Unfortunately, it doesn't work.

Re: Issue with "UNSTABLE_MACHINE"

Posted: Fri Jan 30, 2009 2:48 am
by DanGe
I noticed the server is assigning you the same WU (same project, run, clone, and gen numbers). The assignment servers normally reassign the same WU 3-4 times when it fails. Since this *might* be a bad WU, you pretty much have to delete the work folder, queue.dat, and unitinfo.txt a few times before you get a different WU.
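
For anyone who has to repeat that cleanup several times, here is a minimal sketch of it in Python. It is not an official tool: the function name `reset_fah_work` is made up for this post, and the client directory you pass in is an assumption about where your client keeps its files. Stop the client before running it.

```python
import shutil
from pathlib import Path


def reset_fah_work(client_dir):
    """Delete the work folder, queue.dat and unitinfo.txt from client_dir.

    client_dir is assumed to be the folder the client runs from
    (hypothetical; adjust to your own install). Returns the names
    of the entries that were actually removed.
    """
    base = Path(client_dir)
    removed = []
    for name in ("work", "queue.dat", "unitinfo.txt"):
        target = base / name
        if target.is_dir():
            # The work folder holds the per-slot WU files.
            shutil.rmtree(target)
            removed.append(name)
        elif target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

Calling `reset_fah_work(...)` with your client folder and then restarting the client does the same thing as deleting the three items by hand. Entries that are already gone are simply skipped.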

If your GPU continues to fail on different WUs, we will need to have more specifics, such as hardware specs, whether you overclocked your GPU, and OS.

Re: Issue with "UNSTABLE_MACHINE"

Posted: Fri Jan 30, 2009 4:15 am
by blisk
It finally ended up giving me a good one. I had given up, and when I checked just now after a couple of hours, it's a new one and working fine.

How often do these bad WUs come up? How is it finally determined that it's a bad work unit?

Re: "unstable_machine" in log file

Posted: Fri Jan 30, 2009 5:26 am
by DanGe
Glad to hear it's working again :)

Bad WUs do not come up too often. With WUs for beta clients like GPU, though, they do come up at a *slightly* higher frequency, I suppose. We usually conclude that a WU is a bad one if we find that it stops at nearly the same place for many people.
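
A single log can't show what happens for other people, but it can show whether every failure on your machine is tied to one WU or spread across several. A rough sketch, assuming the log lines look like the ones posted earlier in this thread ("Project: N (Run N, Clone N, Gen N)" and "Core Shutdown: UNSTABLE_MACHINE"); the function name `count_unstable_failures` is made up for illustration:

```python
import re
from collections import Counter

# Pattern matches the "Project: 4743 (Run 9, Clone 902, Gen 11)" log lines.
WU_LINE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def count_unstable_failures(log_text):
    """Count UNSTABLE_MACHINE shutdowns per (project, run, clone, gen).

    Each shutdown is attributed to the most recent Project line
    seen before it in the log.
    """
    counts = Counter()
    current = None
    for line in log_text.splitlines():
        match = WU_LINE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "Core Shutdown: UNSTABLE_MACHINE" in line and current is not None:
            counts[current] += 1
    return counts
```

If all the failures land on one key, it points toward that WU (or at least that WU on your hardware). If they are spread over many different WUs, the GPU or its settings are the more likely suspect.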

If you suspect your WU is a bad one, post in the Issues with a Specific WU section of the forum (viewforum.php?f=19).

To the mods: I think this thread should be moved to the aforementioned forum since this might be a bad WU.

Re: Project: 4743 (Run 9, Clone 902, Gen 11)

Posted: Fri Jan 30, 2009 10:48 am
by bruce
Thread moved and title changed.

Project: 4743 (Run 9, Clone 902, Gen 11) has been successfully completed by someone, so it's not really a bad WU.