Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Mar 05, 2011 5:07 pm
by sfield
This work unit faulted almost immediately, several times in a row, before the client pulled a 6900. This is a stable machine.

[16:01:20] Project: 6901 (Run 14, Clone 20, Gen 2)
[16:01:20]
[16:01:21] Assembly optimizations on if available.
[16:01:21] Entering M.D.
[16:01:27] Mapping NT from 24 to 24
[16:01:30] Completed 0 out of 250000 steps (0%)
[16:03:24] CoreStatus = C0000005 (-1073741819)
[16:03:25] Client-core communications error: ERROR 0xc0000005
[16:03:25] Deleting current work unit & continuing...
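For anyone wondering about the two numbers on the CoreStatus line: they are the same 32-bit value, printed once in hex and once as a signed decimal. 0xC0000005 is the Windows status code for an access violation. A minimal sketch of the conversion follows; the function name `to_signed32` is only illustrative and is not part of the client or core.

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return value - 0x100000000 if value & 0x80000000 else value


core_status = 0xC0000005  # value taken from the log above
print(to_signed32(core_status))  # -1073741819
```

This only shows that the hex and decimal forms agree; it says nothing about why the core hit the access violation.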

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Mar 05, 2011 6:23 pm
by toTOW
There's no other report for this WU in the DB yet ...

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Mar 05, 2011 6:36 pm
by sfield
Can you see all fault reports in the database? Is there a threshold of fault counts such that units will get pulled out of rotation (for investigation) automatically?

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Mar 05, 2011 7:02 pm
by toTOW
I can see something if the client sends some results (even partial).

But with the error you got, nothing is reported since it deletes the WU (and partial results if it has been able to write some) : "Deleting current work unit & continuing..."

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Mar 05, 2011 8:43 pm
by sfield
I'd say 99% of the faults I have seen were at 0%. I had a few in the early days of setting up an SR-2 system, but those are long gone.
toTOW wrote:I can see something if the client sends some results (even partial).

But with the error you got, nothing is reported since it deletes the WU (and partial results if it has been able to write some) : "Deleting current work unit & continuing..."

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sun Mar 06, 2011 9:46 pm
by sfield
This unit is still out there and faulting immediately -- can you mark it as bad?

[21:43:42] Working on queue slot 07 [March 6 21:43:42 UTC]
[21:43:42] + Working ...
[21:43:43]
[21:43:43] *------------------------------*
[21:43:43] Folding@Home Gromacs SMP Core
[21:43:43] Version 2.27 (Mar 12, 2010)
[21:43:43]
[21:43:43] Preparing to commence simulation
[21:43:43] - Assembly optimizations manually forced on.
[21:43:43] - Not checking prior termination.
[21:43:47] - Expanded 24858880 -> 30796292 (decompressed 123.8 percent)
[21:43:47] Called DecompressByteArray: compressed_data_size=24858880 data_size=30796292, decompressed_data_size=30796292 diff=0
[21:43:47] - Digital signature verified
[21:43:47]
[21:43:47] Project: 6901 (Run 14, Clone 20, Gen 2)
[21:43:47]
[21:43:47] Assembly optimizations on if available.
[21:43:47] Entering M.D.
[21:43:54] Mapping NT from 24 to 24
[21:43:56] Completed 0 out of 250000 steps  (0%)
[21:44:15] CoreStatus = C0000005 (-1073741819)
[21:44:15] Client-core communications error: ERROR 0xc0000005
[21:44:15] Deleting current work unit & continuing...

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Mon Mar 07, 2011 2:45 am
by bruce
Is the machine overclocked? Have you run extensive memory diagnostics recently?

By policy, we either wait for multiple uploads or for reports from multiple people. That's why we often say "marked for followup."

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Mon Mar 07, 2011 4:38 am
by sfield
Not applicable -- see PM as of a couple minutes ago for much more detail.
bruce wrote:Is the machine overclocked? Have you run extensive memory diagnostics recently?

By policy, we either wait for multiple uploads or for reports from multiple people. That's why we often say "marked for followup."

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sun Mar 13, 2011 11:31 pm
by PantherX
No data in the WU database yet.

Re: Project: 6901 (Run 14, Clone 20, Gen 2) faulting

Posted: Sat Apr 30, 2011 1:03 am
by sortofageek
The WU (P6901,R14,C20,G2) has been reported as a bad WU. Note that the list of reported WUs is stopped daily at 8am Pacific time.
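
The shorthand (P6901,R14,C20,G2) is the same identifier the client log prints as "Project: 6901 (Run 14, Clone 20, Gen 2)". Here is a small sketch for turning the log form into the shorthand when filing a report. The regular expression and the function name `prcg_shorthand` are assumptions for illustration, based only on the log lines quoted in this thread.

```python
import re

# Matches the "Project: N (Run N, Clone N, Gen N)" form seen in the logs above.
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def prcg_shorthand(log_line):
    """Return 'P..,R..,C..,G..' for a log line, or None if it has no PRCG."""
    match = PRCG_RE.search(log_line)
    if match is None:
        return None
    project, run, clone, gen = match.groups()
    return f"P{project},R{run},C{clone},G{gen}"


print(prcg_shorthand("[16:01:20] Project: 6901 (Run 14, Clone 20, Gen 2)"))
# P6901,R14,C20,G2
```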