Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit


iancook221188
Posts: 16
Joined: Tue Dec 15, 2009 5:29 pm

Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Post by iancook221188 »

I'm having problems with this work unit; I think it has gone bad. My GTX 460 keeps getting this work unit back, and it EUEs at 1% with a 7A CoreStatus.

Code:

[09:25:13] *------------------------------*
[09:25:13] Folding@Home GPU Core
[09:25:13] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[09:25:13] 
[09:25:13] Build host: SimbiosNvdWin7
[09:25:13] Board Type: NVIDIA/CUDA
[09:25:13] Core      : x=15
[09:25:13]  Window's signal control handler registered.
[09:25:13] Preparing to commence simulation
[09:25:13] - Looking at optimizations...
[09:25:13] DeleteFrameFiles: successfully deleted file=work/wudata_07.ckp
[09:25:13] - Created dyn
[09:25:13] - Files status OK
[09:25:13] sizeof(CORE_PACKET_HDR) = 512 file=<>
[09:25:13] - Expanded 39020 -> 169787 (decompressed 435.1 percent)
[09:25:13] Called DecompressByteArray: compressed_data_size=39020 data_size=169787, decompressed_data_size=169787 diff=0
[09:25:13] - Digital signature verified
[09:25:13] 
[09:25:13] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:25:13] 
[09:25:13] Assembly optimizations on if available.
[09:25:13] Entering M.D.
[09:25:15] Tpr hash work/wudata_07.tpr:  3290305804 2565796240 2960336733 638709811 1462519941
[09:25:15] Working on PEPTIDE (1-42)
[09:25:15] Client config found, loading data.
[09:25:15] Starting GUI Server
[09:25:15] Setting checkpoint frequency: 500000
[09:25:15] Setting checkpoint frequency: 500000
[09:26:54] Completed    500000 out of 50000000 steps (1%).
[09:26:54] mdrun_gpu returned 52
[09:26:54] NANs detected on GPU
[09:26:54] 
[09:26:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:26:58] CoreStatus = 7A (122)
[09:26:58] Sending work to server
[09:26:58] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:26:58] - Error: Could not get length of results file work/wuresults_07.dat
[09:26:58] - Error: Could not read unit 07 file. Removing from queue.
[09:26:58] - Preparing to get new work unit...
[09:26:58] Cleaning up work directory
[09:26:58] + Attempting to get work packet
[09:26:58] Passkey found
[09:26:58] Gpu type=3 species=30.
[09:26:58] - Connecting to assignment server
[09:26:59] - Successful: assigned to (171.64.65.64).
[09:26:59] + News From Folding@Home: Welcome to Folding@Home
[09:26:59] Loaded queue successfully.
[09:26:59] Gpu type=3 species=30.
[09:27:01] + Closed connections
[09:27:06] 
[09:27:06] + Processing work unit
[09:27:06] Core required: FahCore_15.exe
[09:27:06] Core found.
[09:27:06] Working on queue slot 08 [March 24 09:27:06 UTC]
[09:27:06] + Working ...
[09:27:06] 
[09:27:06] *------------------------------*
[09:27:06] Folding@Home GPU Core
[09:27:06] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[09:27:06] 
[09:27:06] Build host: SimbiosNvdWin7
[09:27:06] Board Type: NVIDIA/CUDA
[09:27:06] Core      : x=15
[09:27:06]  Window's signal control handler registered.
[09:27:06] Preparing to commence simulation
[09:27:06] - Looking at optimizations...
[09:27:06] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[09:27:06] - Created dyn
[09:27:06] - Files status OK
[09:27:06] sizeof(CORE_PACKET_HDR) = 512 file=<>
[09:27:06] - Expanded 39020 -> 169787 (decompressed 435.1 percent)
[09:27:06] Called DecompressByteArray: compressed_data_size=39020 data_size=169787, decompressed_data_size=169787 diff=0
[09:27:06] - Digital signature verified
[09:27:06] 
[09:27:06] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:27:06] 
[09:27:06] Assembly optimizations on if available.
[09:27:06] Entering M.D.
[09:27:08] Tpr hash work/wudata_08.tpr:  3290305804 2565796240 2960336733 638709811 1462519941
[09:27:08] Working on PEPTIDE (1-42)
[09:27:08] Client config found, loading data.
[09:27:08] Starting GUI Server
[09:27:08] Setting checkpoint frequency: 500000
[09:27:08] Setting checkpoint frequency: 500000
[09:28:47] Completed    500000 out of 50000000 steps (1%).
[09:28:48] mdrun_gpu returned 52
[09:28:48] NANs detected on GPU
[09:28:48] 
[09:28:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:28:51] CoreStatus = 7A (122)
[09:28:51] Sending work to server
[09:28:51] Project: 6800 (Run 7915, Clone 0, Gen 37)
[09:28:51] - Error: Could not get length of results file work/wuresults_08.dat
[09:28:51] - Error: Could not read unit 08 file. Removing from queue.
[09:28:51] - Preparing to get new work unit...
[09:28:51] Cleaning up work directory
[09:28:51] + Attempting to get work packet
[09:28:51] Passkey found
[09:28:51] Gpu type=3 species=30.
[09:28:51] - Connecting to assignment server
[09:28:51] - Successful: assigned to (171.64.65.64).
[09:28:51] + News From Folding@Home: Welcome to Folding@Home
[09:28:52] Loaded queue successfully.
[09:28:52] Gpu type=3 species=30.
[09:28:53] + Closed connections
[09:28:58] 
[09:28:58] + Processing work unit
[09:28:58] Core required: FahCore_15.exe
[09:28:58] Core found.
[09:28:58] Working on queue slot 09 [March 24 09:28:58 UTC]
[09:28:58] + Working ...
[09:28:58] 
[09:28:58] *------------------------------*
Last edited by bruce on Fri Mar 25, 2011 12:54 am, edited 1 time in total.
Reason: Added [code] tags
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Post by bruce »

A known limitation of V6 is that when you get an EUE that's followed by a message: "...Removing from queue." the server will reassign the same WU. [There's no word on whether this problem is fixed in V7 or not.]

When the same WU fails repeatedly in the same way, it's logical to assume it might be a bad WU, but we actually have no way of knowing whether your GPU is failing instead.

To get rid of that WU,
* Stop the client
* Delete queue.dat
* Reconfigure the client to use a different MachineID
* Restart

If the next few WUs and the previous several WUs are all completed, it was a bad WU. If you have troubles with several DIFFERENT WUs, it's your hardware.
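
As a rough illustration of that rule of thumb, here is a minimal Python sketch that scans a client log and counts failed core shutdowns per WU. The line patterns are assumed from the log posted earlier in this thread, and treating `FINISHED_UNIT` as the only successful shutdown status is also an assumption. The function names `failed_work_units` and `diagnose` are made up for this example and are not part of any Folding@home tool.

```python
import re
from collections import Counter

# Patterns assumed from the log lines quoted earlier in this thread.
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def failed_work_units(log_text):
    """Count failed core shutdowns per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # most recent WU named in the log
    for line in log_text.splitlines():
        wu = WU_RE.search(line)
        if wu:
            current = tuple(int(g) for g in wu.groups())
            continue
        shutdown = SHUTDOWN_RE.search(line)
        # Assumption: any status other than FINISHED_UNIT is a failure.
        if shutdown and current and shutdown.group(1) != "FINISHED_UNIT":
            failures[current] += 1
    return failures


def diagnose(failures):
    """Apply the rule of thumb: one WU failing vs. several different WUs."""
    if not failures:
        return "no failures logged"
    if len(failures) == 1:
        return "same WU failing repeatedly: possibly a bad WU"
    return "several different WUs failing: suspect hardware"
```

Feeding it the contents of the client log, for example `diagnose(failed_work_units(open("FAHlog.txt").read()))` (the file name is an assumption about your install), gives only a hint. A single WU failing over and over still doesn't rule out the GPU until other WUs complete cleanly.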
iancook221188
Posts: 16
Joined: Tue Dec 15, 2009 5:29 pm

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Post by iancook221188 »

Thanks bruce, it's been a while since I've had to dump a work unit. I nearly forgot how 8-)
Eno
Posts: 13
Joined: Sat Jan 01, 2011 3:27 am
Hardware configuration: 980X / 920 / 930.
4.4 / 4.2 / 3.8

133 x 33 / 200 x 21 / 180 x 21

P6TDv2 / P6T / Rampage 2G
Location: Sidney, BC / Fort McMurray, AB
Contact:

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Post by Eno »

I recently took on this work unit and was also getting constant errors on it. Thanks for the tip, bruce; that was the step I was missing, because it kept reloading the same one.

I think it's safe to assume there's something wrong with the WU.
Ian "Eno" McLeod
Folding for team 11108
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6800 (Run 7915, Clone 0, Gen 37) EUE limit

Post by bruce »

I've reported the WU (P6800,R7915,C0,G37) as a bad WU.