Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Posted: Thu Mar 24, 2011 6:27 pm
by iancook221188
I'm having problems with this work unit; I think it has gone bad. My GTX 460 keeps getting this work unit back, and it EUEs at 1% with CoreStatus 7A.

[09:25:13] *------------------------------*
[09:25:13] Folding@Home GPU Core
[09:25:13] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[09:25:13] 
[09:25:13] Build host: SimbiosNvdWin7
[09:25:13] Board Type: NVIDIA/CUDA
[09:25:13] Core      : x=15
[09:25:13]  Window's signal control handler registered.
[09:25:13] Preparing to commence simulation
[09:25:13] - Looking at optimizations...
[09:25:13] DeleteFrameFiles: successfully deleted file=work/wudata_07.ckp
[09:25:13] - Created dyn
[09:25:13] - Files status OK
[09:25:13] sizeof(CORE_PACKET_HDR) = 512 file=<>
[09:25:13] - Expanded 39020 -> 169787 (decompressed 435.1 percent)
[09:25:13] Called DecompressByteArray: compressed_data_size=39020 data_size=169787, decompressed_data_size=169787 diff=0
[09:25:13] - Digital signature verified
[09:25:13] 
[09:25:13] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:25:13] 
[09:25:13] Assembly optimizations on if available.
[09:25:13] Entering M.D.
[09:25:15] Tpr hash work/wudata_07.tpr:  3290305804 2565796240 2960336733 638709811 1462519941
[09:25:15] Working on PEPTIDE (1-42)
[09:25:15] Client config found, loading data.
[09:25:15] Starting GUI Server
[09:25:15] Setting checkpoint frequency: 500000
[09:25:15] Setting checkpoint frequency: 500000
[09:26:54] Completed    500000 out of 50000000 steps (1%).
[09:26:54] mdrun_gpu returned 52
[09:26:54] NANs detected on GPU
[09:26:54] 
[09:26:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:26:58] CoreStatus = 7A (122)
[09:26:58] Sending work to server
[09:26:58] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:26:58] - Error: Could not get length of results file work/wuresults_07.dat
[09:26:58] - Error: Could not read unit 07 file. Removing from queue.
[09:26:58] - Preparing to get new work unit...
[09:26:58] Cleaning up work directory
[09:26:58] + Attempting to get work packet
[09:26:58] Passkey found
[09:26:58] Gpu type=3 species=30.
[09:26:58] - Connecting to assignment server
[09:26:59] - Successful: assigned to (171.64.65.64).
[09:26:59] + News From Folding@Home: Welcome to Folding@Home
[09:26:59] Loaded queue successfully.
[09:26:59] Gpu type=3 species=30.
[09:27:01] + Closed connections
[09:27:06] 
[09:27:06] + Processing work unit
[09:27:06] Core required: FahCore_15.exe
[09:27:06] Core found.
[09:27:06] Working on queue slot 08 [March 24 09:27:06 UTC]
[09:27:06] + Working ...
[09:27:06] 
[09:27:06] *------------------------------*
[09:27:06] Folding@Home GPU Core
[09:27:06] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[09:27:06] 
[09:27:06] Build host: SimbiosNvdWin7
[09:27:06] Board Type: NVIDIA/CUDA
[09:27:06] Core      : x=15
[09:27:06]  Window's signal control handler registered.
[09:27:06] Preparing to commence simulation
[09:27:06] - Looking at optimizations...
[09:27:06] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[09:27:06] - Created dyn
[09:27:06] - Files status OK
[09:27:06] sizeof(CORE_PACKET_HDR) = 512 file=<>
[09:27:06] - Expanded 39020 -> 169787 (decompressed 435.1 percent)
[09:27:06] Called DecompressByteArray: compressed_data_size=39020 data_size=169787, decompressed_data_size=169787 diff=0
[09:27:06] - Digital signature verified
[09:27:06] 
[09:27:06] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:27:06] 
[09:27:06] Assembly optimizations on if available.
[09:27:06] Entering M.D.
[09:27:08] Tpr hash work/wudata_08.tpr:  3290305804 2565796240 2960336733 638709811 1462519941
[09:27:08] Working on PEPTIDE (1-42)
[09:27:08] Client config found, loading data.
[09:27:08] Starting GUI Server
[09:27:08] Setting checkpoint frequency: 500000
[09:27:08] Setting checkpoint frequency: 500000
[09:28:47] Completed    500000 out of 50000000 steps (1%).
[09:28:48] mdrun_gpu returned 52
[09:28:48] NANs detected on GPU
[09:28:48] 
[09:28:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:28:51] CoreStatus = 7A (122)
[09:28:51] Sending work to server
[09:28:51] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:28:51] - Error: Could not get length of results file work/wuresults_08.dat
[09:28:51] - Error: Could not read unit 08 file. Removing from queue.
[09:28:51] - Preparing to get new work unit...
[09:28:51] Cleaning up work directory
[09:28:51] + Attempting to get work packet
[09:28:51] Passkey found
[09:28:51] Gpu type=3 species=30.
[09:28:51] - Connecting to assignment server
[09:28:51] - Successful: assigned to (171.64.65.64).
[09:28:51] + News From Folding@Home: Welcome to Folding@Home
[09:28:52] Loaded queue successfully.
[09:28:52] Gpu type=3 species=30.
[09:28:53] + Closed connections
[09:28:58] 
[09:28:58] + Processing work unit
[09:28:58] Core required: FahCore_15.exe
[09:28:58] Core found.
[09:28:58] Working on queue slot 09 [March 24 09:28:58 UTC]
[09:28:58] + Working ...
[09:28:58] 
[09:28:58] *------------------------------*
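Both attempts in the log above fail the same way: same Project/Run/Clone/Gen, NaNs right after step 500000, and CoreStatus 7A. The core prints that status in hex followed by its decimal value (0x7A = 7*16 + 10 = 122). Here is a minimal Python sketch for spotting that loop in a longer log. It assumes only the line formats quoted above, and `status_counts` is a hypothetical helper, not part of the client.

```python
import re
from collections import Counter

# Line formats copied from the v6 GPU core log above (assumed stable).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def status_counts(log_lines):
    """Hypothetical helper: count CoreStatus codes per (project, run, clone, gen)."""
    counts = Counter()
    current_wu = None
    for line in log_lines:
        project = PROJECT_RE.search(line)
        if project:
            # Remember which WU the following status lines belong to.
            current_wu = tuple(int(g) for g in project.groups())
            continue
        status = STATUS_RE.search(line)
        if status and current_wu is not None:
            hex_code, dec_code = status.groups()
            code = int(hex_code, 16)  # "7A" -> 122
            if code != int(dec_code):
                raise ValueError(f"inconsistent status line: {line!r}")
            counts[(current_wu, code)] += 1
    return counts


sample = [
    "[09:25:13] Project: 6800 (Run 7915, Clone 0, Gen 37)",
    "[09:26:58] CoreStatus = 7A (122)",
    "[09:27:06] Project: 6800 (Run 7915, Clone 0, Gen 37)",
    "[09:28:51] CoreStatus = 7A (122)",
]
print(status_counts(sample))  # Counter({((6800, 7915, 0, 37), 122): 2})
```

A count greater than 1 for one WU and status means the client was handed the same unit again after it failed.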

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Posted: Fri Mar 25, 2011 12:52 am
by bruce
A known limitation of V6 is that when you get an EUE followed by the message "...Removing from queue.", the server will reassign the same WU. [There's no word on whether this problem is fixed in V7 or not.]

When the same WU fails repeatedly in the same way, it's logical to assume it might be a bad WU, but we actually have no way of knowing whether your GPU is failing instead.

To get rid of that WU:
* Stop the client
* Delete queue.dat
* Reconfigure the client to use a different MachineID
* Restart
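For anyone who wants to script the middle two steps, here is a minimal Python sketch. It assumes the v6 client directory holds `queue.dat` and a `client.cfg` with a `machineid=` entry under `[settings]`; check your own file before relying on that, back it up, and stop the client first. `drop_stuck_work_unit` is a hypothetical name, and because `configparser` rewrites the whole file, layout details such as blank lines may change.

```python
import configparser
from pathlib import Path


def drop_stuck_work_unit(client_dir, new_machine_id):
    """Hypothetical helper; run only while the client is stopped.

    Assumes a v6-style client directory containing queue.dat and a
    client.cfg with `machineid=` under [settings].
    """
    client_dir = Path(client_dir)

    # Delete queue.dat so the client forgets the stuck WU.
    (client_dir / "queue.dat").unlink(missing_ok=True)

    # Switch to a different MachineID, per the advice above, so the
    # same WU is not simply reassigned.
    cfg_path = client_dir / "client.cfg"
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key spelling exactly as written
    if not cfg.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    if "settings" not in cfg:
        raise KeyError("client.cfg has no [settings] section")
    cfg["settings"]["machineid"] = str(new_machine_id)
    with open(cfg_path, "w") as f:
        cfg.write(f, space_around_delimiters=False)
```

Pick a MachineID that no other client on the same computer is already using.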

If the next few WUs and the previous several WUs all completed, it was a bad WU. If you have trouble with several DIFFERENT WUs, it's your hardware.
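The rule of thumb above can be written down as a small Python function. This is only a rough sketch of the reasoning, not an official Folding@home diagnostic: `diagnose`, its input format (a list of `(wu_id, completed)` pairs) and the `several` threshold are all hypothetical choices.

```python
def diagnose(history, several=2):
    """Hypothetical rule-of-thumb check.

    history: (wu_id, completed) pairs, one per attempt, in the order run.
    several: how many distinct failing WUs count as "several" (arbitrary).
    """
    failed = {wu for wu, completed in history if not completed}
    any_completed = any(completed for _, completed in history)
    if not failed:
        return "no failures"
    if len(failed) >= several:
        # Trouble with several DIFFERENT WUs points at the hardware.
        return "hardware"
    if len(failed) == 1 and any_completed:
        # One WU keeps failing while the other WUs complete: likely a bad WU.
        return "bad WU"
    # Only the one failing WU has been seen: no way to tell yet.
    return "unknown"


bad = (6800, 7915, 0, 37)
history = [((6801, 1, 2, 3), True), (bad, False), (bad, False), ((6802, 4, 5, 6), True)]
print(diagnose(history))  # bad WU
```

The "unknown" branch mirrors the point above: a single WU failing over and over, with nothing else to compare against, cannot by itself tell a bad WU from a failing GPU.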

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Posted: Fri Mar 25, 2011 11:09 am
by iancook221188
Thanks, bruce. It's been a while since I've had to dump a work unit; I nearly forgot. 8-)

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Posted: Sat Apr 09, 2011 10:21 pm
by Eno
I recently picked up this work unit and was also getting constant errors on it. Thanks for the tip, bruce; that was the step I was missing, because the client kept reloading the same one.

I think it's safe to assume there's something wrong with the WU.

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Posted: Sun Apr 10, 2011 4:53 am
by bruce
I've reported the WU (P6800,R7915,C0,G37) as a bad WU.