Project 5767 Multiple run clone gen's
Posted: Wed Dec 24, 2008 3:34 pm
by MtM
Originally Posted by Wesleynator
I got one of the new WU's but it EUE'd after 16%. Anyone else have an EUE problem with one of these? My 9800GT had been running the 511 point WU's with no problem.
[01:43:13] Project: 5766 (Run 4, Clone 4, Gen 0)
[01:43:13]
[01:43:13] Assembly optimizations on if available.
[01:43:13] Entering M.D.
[01:43:20] Working on Protein
[01:43:20] Client config found, loading data.
[01:43:21] Starting GUI Server
[01:44:14] Completed 1%
[01:45:06] Completed 2%
[01:45:57] Completed 3%
[01:46:49] Completed 4%
[01:47:41] Completed 5%
[01:48:32] Completed 6%
[01:49:24] Completed 7%
[01:50:16] Completed 8%
[01:51:07] Completed 9%
[01:51:59] Completed 10%
[01:52:51] Completed 11%
[01:53:42] Completed 12%
[01:54:34] Completed 13%
[01:55:26] Completed 14%
[01:56:18] Completed 15%
[01:57:09] Completed 16%
[01:57:17] Run: exception thrown during GuardedRun
[01:57:17] Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
[01:57:17] Going to send back what have done -- stepsTotalG=15000000
[01:57:17] Work fraction=0.1616 steps=15000000.
[01:57:22] logfile size=0 infoLength=0 edr=0 trr=23
[01:57:22] - Writing 642 bytes of core data to disk...
[01:57:22] Done: 130 -> 128 (compressed to 98.4 percent)
[01:57:22] ... Done.
[01:57:22]
[01:57:22] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:57:25] CoreStatus = 7A (122)
[01:57:25] Sending work to server
[01:57:25] Project: 5766 (Run 4, Clone 4, Gen 0)
Same on one of my 8800GT's
Quote:
[11:47:18] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:47:20] CoreStatus = 7A (122)
[11:47:20] Sending work to server
[11:47:20] Project: 5767 (Run 11, Clone 29, Gen 5)
[11:47:20] - Error: Could not get length of results file work/wuresults_06.dat
[11:47:20] - Error: Could not read unit 06 file. Removing from queue.
[11:47:20] EUE limit exceeded. Pausing 24 hours.
As posted on the EOC forum. I asked the last poster which other EUEs he had, since the 24-hour pause isn't triggered by a single EUE.
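To see which work units failed before the pause kicked in, the log can be scanned for every UNSTABLE_MACHINE shutdown and the Project line that follows it. The sketch below is only an illustration based on the log lines quoted in this thread; `find_eues` and `PROJECT_RE` are hypothetical names, not part of the client, and it assumes each shutdown is followed by a "Project:" line as in the excerpts above.

```python
import re

# Matches lines such as "[11:47:20] Project: 5767 (Run 11, Clone 29, Gen 5)".
PROJECT_RE = re.compile(
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)

def find_eues(log_text):
    """Return one (project, run, clone, gen) tuple per UNSTABLE_MACHINE
    shutdown, using the first Project line that follows the shutdown."""
    failures = []
    pending = False  # True once a shutdown was seen but not yet attributed
    for line in log_text.splitlines():
        if "Core Shutdown: UNSTABLE_MACHINE" in line:
            pending = True
            continue
        match = PROJECT_RE.search(line)
        if match and pending:
            failures.append(tuple(int(g) for g in match.groups()))
            pending = False
    return failures

if __name__ == "__main__":
    # Assumed log file name; adjust to wherever the client writes its log.
    with open("FAHlog.txt", encoding="utf-8", errors="replace") as handle:
        for project, run, clone, gen in find_eues(handle.read()):
            print(f"EUE: Project {project} (Run {run}, Clone {clone}, Gen {gen})")
```

Run against a log that still contains the earlier failures, this lists every failed unit, so they can all be posted together.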
Re: Project 5767 Multiple run clone gen's
Posted: Wed Dec 24, 2008 4:21 pm
by Razor_FX_II
After the error, the GPU2 client just stops, so I exit, delete the logs and queue files, and restart it to get it running again. So at the moment I don't have the others.
The next batch I get I'll copy up to the forum.
This last error was on Vista32, 178.24 drivers, 8800 GTS (G92) 512 MB, a card that has folded over 2k work units with no problems.
I had one error this morning on a 353-point work unit on my main folding rig: Vista64, 181.00 drivers, GTX 260, which has folded over 2k work units with no problems.
Code:
[13:26:06] Folding@Home GPU Core - Beta
[13:26:06] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[13:26:06]
[13:26:06] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[13:26:06] Build host: amoeba
[13:26:06] Board Type: Nvidia
[13:26:06] Core :
[13:26:06] Preparing to commence simulation
[13:26:06] - Looking at optimizations...
[13:26:06] - Created dyn
[13:26:06] - Files status OK
[13:26:06] - Expanded 43942 -> 252912 (decompressed 575.5 percent)
[13:26:06] Called DecompressByteArray: compressed_data_size=43942 data_size=252912, decompressed_data_size=252912 diff=0
[13:26:06] - Digital signature verified
[13:26:06]
[13:26:06] Project: 5766 (Run 2, Clone 63, Gen 0)
[13:26:06]
[13:26:06] Assembly optimizations on if available.
[13:26:06] Entering M.D.
[13:26:13] Working on Protein
[13:26:13] Client config found, loading data.
[13:26:13] mdrun_gpu returned
[13:26:13] NANs detected on GPU
[13:26:13]
[13:26:13] Folding@home Core Shutdown: UNSTABLE_MACHINE
[13:26:17] CoreStatus = 7A (122)
[13:26:17] Sending work to server
[13:26:17] Project: 5766 (Run 2, Clone 63, Gen 0)
[13:26:17] - Read packet limit of 540015616... Set to 524286976.
[13:26:17] - Error: Could not get length of results file work/wuresults_03.dat
[13:26:17] - Error: Could not read unit 03 file. Removing from queue.
[13:26:17] EUE limit exceeded. Pausing 24 hours.
[13:56:05] - Autosending finished units... [December 24 13:56:05 UTC]
[13:56:05] Trying to send all finished work units
[13:56:05] + No unsent completed units remaining.
[13:56:05] - Autosend completed
What is "[13:26:13] NANs detected on GPU"?
Re: Project 5767 Multiple run clone gen's
Posted: Wed Dec 24, 2008 5:46 pm
by MtM
NaN = Not a Number. What exactly it means depends on the programming language in use; Wikipedia has an article about it:
http://en.wikipedia.org/wiki/NaN
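As a rough illustration of how NaN behaves in floating-point arithmetic, here is a small Python sketch. It is not what the GPU core does internally (the thread doesn't show that); `has_nan` is a hypothetical helper that only shows the kind of check a "NANs detected" message could come from.

```python
import math

nan = float("nan")
inf = float("inf")

# NaN compares unequal to everything, including itself.
print(nan == nan)               # False

# Arithmetic involving NaN yields NaN, so one bad value spreads
# through every later calculation that uses it.
print(math.isnan(nan + 1.0))    # True

# inf - inf has no defined value, so the result is NaN.
print(math.isnan(inf - inf))    # True

def has_nan(values):
    """Return True if any value in the iterable is NaN."""
    return any(math.isnan(v) for v in values)

print(has_nan([0.5, 1.25, nan]))  # True
print(has_nan([0.5, 1.25, 2.0]))  # False
```

Because NaN propagates, a simulation that produces a single NaN can't produce meaningful results from that point on, which fits with the core giving up rather than continuing.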
Re: Project 5767 Multiple run clone gen's
Posted: Wed Dec 24, 2008 9:59 pm
by Razor_FX_II
Just had this on Vista64 with 181.00 drivers and a GTX 260.
Code:
[21:27:39]
[21:27:39] *------------------------------*
[21:27:39] Folding@Home GPU Core - Beta
[21:27:39] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:27:39]
[21:27:39] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[21:27:39] Build host: amoeba
[21:27:39] Board Type: Nvidia
[21:27:39] Core :
[21:27:39] Preparing to commence simulation
[21:27:39] - Looking at optimizations...
[21:27:39] - Created dyn
[21:27:39] - Files status OK
[21:27:39] - Expanded 46678 -> 252912 (decompressed 541.8 percent)
[21:27:39] Called DecompressByteArray: compressed_data_size=46678 data_size=252912, decompressed_data_size=252912 diff=0
[21:27:39] - Digital signature verified
[21:27:39]
[21:27:39] Project: 5768 (Run 8, Clone 47, Gen 8)
[21:27:39]
[21:27:39] Assembly optimizations on if available.
[21:27:39] Entering M.D.
[21:27:46] Working on Protein
[21:27:46] Client config found, loading data.
[21:27:46] mdrun_gpu returned
[21:27:46] NANs detected on GPU
[21:27:46]
[21:27:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:27:49] CoreStatus = 7A (122)
[21:27:49] Sending work to server
[21:27:49] Project: 5768 (Run 8, Clone 47, Gen 8)
[21:27:49] - Read packet limit of 540015616... Set to 524286976.
[21:27:49] - Error: Could not get length of results file work/wuresults_04.dat
[21:27:49] - Error: Could not read unit 04 file. Removing from queue.
[21:27:49] EUE limit exceeded. Pausing 24 hours.
Re: Project 5767 Multiple run clone gen's
Posted: Thu Dec 25, 2008 12:47 am
by Razor_FX_II
EUE on Project: 5768 (Run 8, Clone 9, Gen 9) running Vista32, 181.00 drivers, 8800 GTS (G92) 512 MB (very stable).
Code:
[00:38:40] *------------------------------*
[00:38:40] Folding@Home GPU Core - Beta
[00:38:40] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[00:38:40]
[00:38:40] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[00:38:40] Build host: amoeba
[00:38:40] Board Type: Nvidia
[00:38:40] Core :
[00:38:40] Preparing to commence simulation
[00:38:40] - Looking at optimizations...
[00:38:40] - Created dyn
[00:38:40] - Files status OK
[00:38:40] - Expanded 46616 -> 252912 (decompressed 542.5 percent)
[00:38:40] Called DecompressByteArray: compressed_data_size=46616 data_size=252912, decompressed_data_size=252912 diff=0
[00:38:40] - Digital signature verified
[00:38:40]
[00:38:40] Project: 5768 (Run 8, Clone 9, Gen 9)
[00:38:40]
[00:38:40] Assembly optimizations on if available.
[00:38:40] Entering M.D.
[00:38:46] Working on Protein
[00:38:47] Client config found, loading data.
[00:38:47] mdrun_gpu returned
[00:38:47] NANs detected on GPU
[00:38:47]
[00:38:47] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:38:51] CoreStatus = 7A (122)
[00:38:51] Sending work to server
[00:38:51] Project: 5768 (Run 8, Clone 9, Gen 9)
[00:38:51] - Read packet limit of 540015616... Set to 524286976.
[00:38:51] - Error: Could not get length of results file work/wuresults_01.dat
[00:38:51] - Error: Could not read unit 01 file. Removing from queue.
[00:38:51] EUE limit exceeded. Pausing 24 hours.