Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Sat Nov 15, 2008 4:09 pm
by Xilikon
This is the 2nd time I've been assigned this WU, which fails so often that I'm stuck with the client being shut down for 24 hours. This is on a GTX 260 Core 216, which is otherwise rock stable.

Code:

[15:42:04] Project: 5506 (Run 1, Clone 781, Gen 174)
[15:42:04] 
[15:42:04] Assembly optimizations on if available.
[15:42:04] Entering M.D.
[15:42:11] Working on p5506_supervillin_e1
[15:42:11] Client config found, loading data.
[15:42:12] mdrun_gpu returned 
[15:42:12] NANs detected on GPU
[15:42:12] 
[15:42:12] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:42:14] CoreStatus = 7A (122)
[15:42:14] Sending work to server
[15:42:14] Project: 5506 (Run 1, Clone 781, Gen 174)
[15:42:14] - Read packet limit of 540015616... Set to 524286976.
[15:42:14] - Error: Could not get length of results file work/wuresults_04.dat
[15:42:14] - Error: Could not read unit 04 file. Removing from queue.
[15:42:14] Trying to send all finished work units
[15:42:14] + No unsent completed units remaining.
[15:42:14] - Preparing to get new work unit...
[15:42:14] + Attempting to get work packet
[15:42:14] - Will indicate memory of 4094 MB

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Sat Nov 15, 2008 5:18 pm
by toTOW
This one is a bad WU: there are many reports with 0 credit, and no one has been able to get any points for it.

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Sun Nov 16, 2008 8:21 am
by Bobby-Uschi
[01:13:31] Project: 5506 (Run 1, Clone 781, Gen 174)
[01:13:31] - Read packet limit of 540015616... Set to 524286976.
[01:13:31] - Error: Could not get length of results file work/wuresults_08.dat
[01:13:31] - Error: Could not read unit 08 file. Removing from queue.
[01:13:31] EUE limit exceeded. Pausing 24 hours.
What makes this work unit so terrible? Please eliminate it.
Ursula

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Mon Nov 17, 2008 7:46 pm
by jevans64
I just wanted to chime in that I have not been able to complete this unit either. I found it stalled / looped on two different clients.

Code:

[21:12:41] 
[21:12:41] *------------------------------*
[21:12:41] Folding@Home GPU Core - Beta
[21:12:41] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:12:41] 
[21:12:41] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:12:41] Build host: amoeba
[21:12:41] Board Type: Nvidia
[21:12:41] Core      : 
[21:12:41] Preparing to commence simulation
[21:12:41] - Looking at optimizations...
[21:12:41] - Created dyn
[21:12:41] - Files status OK
[21:12:41] - Expanded 45606 -> 246249 (decompressed 539.9 percent)
[21:12:41] Called DecompressByteArray: compressed_data_size=45606 data_size=246249, decompressed_data_size=246249 diff=0
[21:12:41] - Digital signature verified
[21:12:41] 
[21:12:41] Project: 5506 (Run 1, Clone 781, Gen 174)
[21:12:41] 
[21:12:41] Assembly optimizations on if available.
[21:12:41] Entering M.D.
[21:12:48] Working on p5506_supervillin_e1
[21:12:48] Client config found, loading data.
[21:12:48] mdrun_gpu returned 
[21:12:48] NANs detected on GPU
[21:12:48] 
[21:12:48] Folding@home Core Shutdown: UNSTABLE_MACHINE

Code:

[15:30:44] 
[15:30:44] *------------------------------*
[15:30:44] Folding@Home GPU Core - Beta
[15:30:44] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[15:30:44] 
[15:30:44] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[15:30:44] Build host: amoeba
[15:30:44] Board Type: Nvidia
[15:30:44] Core      : 
[15:30:44] Preparing to commence simulation
[15:30:44] - Looking at optimizations...
[15:30:44] - Created dyn
[15:30:44] - Files status OK
[15:30:44] - Expanded 45606 -> 246249 (decompressed 539.9 percent)
[15:30:44] Called DecompressByteArray: compressed_data_size=45606 data_size=246249, decompressed_data_size=246249 diff=0
[15:30:44] - Digital signature verified
[15:30:44] 
[15:30:44] Project: 5506 (Run 1, Clone 781, Gen 174)
[15:30:44] 
[15:30:44] Assembly optimizations on if available.
[15:30:44] Entering M.D.
[15:30:50] Working on p5506_supervillin_e1
[15:30:51] Client config found, loading data.
[15:30:51] mdrun_gpu returned 
[15:30:51] NANs detected on GPU
[15:30:51] 
[15:30:51] Folding@home Core Shutdown: UNSTABLE_MACHINE 

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Tue Nov 18, 2008 1:22 am
by Teddy
Yep I got it too!!!

[20:24:51]
[20:24:51] + Processing work unit
[20:24:51] Core required: FahCore_11.exe
[20:24:51] Core found.
[20:24:51] Working on queue slot 08 [November 17 20:24:51 UTC]
[20:24:51] + Working ...
[20:24:51] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -priority 96 -checkpoint 3 -verbose -lifeline 6032 -version 620'

[20:24:51]
[20:24:51] *------------------------------*
[20:24:51] Folding@Home GPU Core - Beta
[20:24:51] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[20:24:51]
[20:24:51] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[20:24:51] Build host: amoeba
[20:24:51] Board Type: Nvidia
[20:24:51] Core :
[20:24:51] Preparing to commence simulation
[20:24:51] - Looking at optimizations...
[20:24:51] - Created dyn
[20:24:51] - Files status OK
[20:24:51] - Expanded 45606 -> 246249 (decompressed 539.9 percent)
[20:24:51] Called DecompressByteArray: compressed_data_size=45606 data_size=246249, decompressed_data_size=246249 diff=0
[20:24:51] - Digital signature verified
[20:24:51]
[20:24:51] Project: 5506 (Run 1, Clone 781, Gen 174)
[20:24:51]
[20:24:51] Assembly optimizations on if available.
[20:24:51] Entering M.D.
[20:24:57] Working on p5506_supervillin_e1
[20:24:58] Client config found, loading data.
[20:24:58] mdrun_gpu returned
[20:24:58] NANs detected on GPU
[20:24:58]
[20:24:58] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:25:01] CoreStatus = 7A (122)
[20:25:01] Sending work to server
[20:25:01] Project: 5506 (Run 1, Clone 781, Gen 174)
[20:25:01] - Read packet limit of 540015616... Set to 524286976.
[20:25:01] - Error: Could not get length of results file work/wuresults_08.dat
[20:25:01] - Error: Could not read unit 08 file. Removing from queue.
[20:25:01] EUE limit exceeded. Pausing 24 hours.
[21:00:07] - Autosending finished units... [November 17 21:00:07 UTC]
[21:00:07] Trying to send all finished work units
[21:00:07] + No unsent completed units remaining.
[21:00:07] - Autosend completed
[01:12:37] ***** Got a SIGTERM signal (2)
[01:12:37] Killing all core threads

Folding@Home Client Shutdown.


I only noticed it because I came home from work for lunch; otherwise it would have been stuck for a few more hours.
I wonder how many other people got this work unit?

Teddy
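
For anyone who wants to catch this kind of stall without being at the machine, here is a minimal sketch of a log scanner. It is not part of the Folding@home client; the function name `scan_log` and the choice of failure markers are my own assumptions, and the line patterns are simply copied from the logs pasted in this thread. It remembers the most recent "Project:" line and records which failure messages follow it.

```python
import re

# Line shapes copied from the logs in this thread; other cores or client
# versions may word these messages differently (assumption).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FAILURE_MARKERS = (
    "NANs detected on GPU",
    "Core Shutdown: UNSTABLE_MACHINE",
    "EUE limit exceeded. Pausing 24 hours.",
)


def scan_log(lines):
    """Map (project, run, clone, gen) to the failure markers logged after it."""
    failures = {}
    current = None  # most recent work unit announced in the log
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
            continue
        if current is None:
            continue  # failure text before any Project line cannot be attributed
        for marker in FAILURE_MARKERS:
            if marker in line:
                seen = failures.setdefault(current, [])
                if marker not in seen:
                    seen.append(marker)
    return failures


# A few lines taken from the logs above, used as sample input.
SAMPLE = """\
[01:55:47] Project: 5506 (Run 1, Clone 781, Gen 174)
[01:55:54] NANs detected on GPU
[01:55:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:55:57] Project: 5506 (Run 1, Clone 781, Gen 174)
[01:55:57] EUE limit exceeded. Pausing 24 hours.
""".splitlines()

for wu, markers in scan_log(SAMPLE).items():
    print("Project %d (Run %d, Clone %d, Gen %d): " % wu + "; ".join(markers))
```

To use it on a real log, pass the lines of the client's log file to `scan_log` instead of `SAMPLE`. Seeing the "EUE limit exceeded" marker means the client has paused itself and needs a restart to pick up new work.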

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Tue Nov 18, 2008 11:58 am
by Torin3
Sorry, I didn't see this thread first (though I did a quick scan), but I got this one too:
viewtopic.php?f=19&t=6990

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Tue Nov 18, 2008 12:08 pm
by Drugless
Got it too! Client asleep for 10 hours!

Code:

[01:55:47] Project: 5506 (Run 1, Clone 781, Gen 174)
[01:55:47] 
[01:55:47] Assembly optimizations on if available.
[01:55:47] Entering M.D.
[01:55:53] Working on p5506_supervillin_e1
[01:55:54] Client config found, loading data.
[01:55:54] mdrun_gpu returned 
[01:55:54] NANs detected on GPU
[01:55:54] 
[01:55:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:55:57] CoreStatus = 7A (122)
[01:55:57] Sending work to server
[01:55:57] Project: 5506 (Run 1, Clone 781, Gen 174)
[01:55:57] - Read packet limit of 540015616... Set to 524286976.
[01:55:57] - Error: Could not get length of results file work/wuresults_01.dat
[01:55:57] - Error: Could not read unit 01 file. Removing from queue.
[01:55:57] EUE limit exceeded. Pausing 24 hours.

Re: Project: 5506 (Run 1, Clone 781, Gen 174)

Posted: Tue Nov 18, 2008 3:39 pm
by VijayPande
We've manually stopped this WU. We're also looking into new client and/or server code to better handle these situations.