Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Sat Dec 15, 2007 9:41 pm
by Ivoshiee
Whenever the same WU fails multiple times at the same point with an UNKNOWN error, 0x0, 0x1, or anything other than an EUE, you should back up the WU data by stopping the client shortly before the point where it errors out, then try to run it on another computer. You can also post it for someone else to test. This makes it possible to improve the FAH core files so that they detect those errors and classify them as EUEs.

Why is it needed? If for no other reason, then for the points: WUs with UNKNOWN, 0x0, or 0x1 errors will get you no points, but an EUE will get partial credit.


For example:
http://foldingforum.org/viewtopic.php?t=258

Also:
http://fahwiki.net/index.php/Common_Error_Messages
http://fahwiki.net/index.php/Error_0x0_and_0x1

Note: Whenever you have an excessive number of WU failures, you should test your computer for errors: memory (http://www.memtest.org/), temperatures, ...

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Thu Dec 27, 2007 6:19 am
by klasseng
Ivoshiee:

I've just had a WU failure (multiple times at the same point) and came across this post . . . but I need more information about:
a) "you should back up the WU" . . . just how is that done?
b) "run it on some other computer" . . . how does that get done?
c) "you can post it for someone else to test" . . . post what and where?

peace,
klasseng

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Thu Dec 27, 2007 8:00 am
by codysluder
a) Copy the entire installation directory somewhere else (or you can be more selective and copy less data if you know what you're doing; see the WIKI instructions for "sneakernetting").

b) If you have more than one computer running the same OS, see the WIKI for instructions about "sneakernetting".

c) If you have only one computer, contact someone else with the same OS and see if they can process your backup. "What" is the same thing backed up in step a. "Where" depends on whether you have your own website or need to upload the data to one of the advertising-supported hosts. In some cases the data can be emailed, but limits on the size of email attachments often rule this method out.
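
For step a, here is a minimal Python sketch of the "copy the entire installation directory" approach. It is only an illustration, not an official tool: the function name and the timestamped folder naming are my own assumptions, and you should stop the client before copying so the files are not changing underneath you.

```python
import shutil
import time
from pathlib import Path


def backup_client_dir(client_dir, backup_root):
    """Copy the whole client directory into a new timestamped folder.

    client_dir and backup_root are whatever paths apply on your machine;
    nothing here is specific to a particular client version.
    """
    src = Path(client_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"client directory not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Calling backup_client_dir with your client folder and a backup location returns the path of the new copy, which is the thing you would hand to someone else in step c.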

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Mon Jan 14, 2008 9:32 pm
by MacBozo
I keep getting the following on Protein: p3045_FORMIN BINDING PROTEIN WUs:
[09:00:39] Completed 585000 out of 1500000 steps (39)
Warning: 1-4 interaction at distance larger than 3.24
These are ignored for the rest of the simulation
turn on -debug for more information
[09:08:46] CoreStatus = 0 (0)
[09:08:46] Client-core communications error: ERROR 0x0
[09:08:46] Deleting current work unit & continuing...
I've successfully completed other WUs without problems, but these 3045s keep cutting out at the same point with the same error. Is there a way to block them from being downloaded? Mac OS X 10.5.1, Client v6.0 text (terminal)

Thanks,
Michael

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Mon Jan 14, 2008 10:59 pm
by bruce
MacBozo wrote:Is there a way to block them from being downloaded?
There's nothing that YOU can do to block them except to make the kind of report you just made (preferably with a title like "Project 3404 Run xxxx Clone xxx Gen XX" indicating the specific WU you're having trouble with). The Pande Group has already taken a few Run/Clone combinations off-line when it was clear that something was wrong with that WU.
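
When putting such a report together, it can help to pull the failure point out of the log rather than eyeballing it. Below is a rough Python sketch that assumes the log lines look like the ones quoted earlier in this thread ("Completed N out of M steps" followed by "Client-core communications error: ERROR 0x0"). The function names are made up for illustration, and other client versions may word these lines differently.

```python
import re

# Patterns based only on the log excerpts posted in this thread.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_RE = re.compile(r"Client-core communications error: ERROR (0x[0-9a-fA-F]+)")


def find_failure_points(log_lines):
    """Return one (last_step, total_steps, error_code) tuple per failure."""
    failures = []
    last_step = None
    for line in log_lines:
        m = STEP_RE.search(line)
        if m:
            # Remember the most recent progress line seen before an error.
            last_step = (int(m.group(1)), int(m.group(2)))
            continue
        m = ERROR_RE.search(line)
        if m:
            step, total = last_step if last_step else (None, None)
            failures.append((step, total, m.group(1)))
            last_step = None
    return failures


def fails_at_same_point(failures):
    """True if there are at least two failures and all are identical."""
    return len(failures) >= 2 and len(set(failures)) == 1
```

Feeding it the lines of your FAHlog (for example, the result of reading the file and splitting it into lines) gives a list you can paste into the report. If fails_at_same_point returns True, that is the "same WU, same point" situation this topic is about.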

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Mon Feb 25, 2008 11:14 am
by Oldhat
Ivoshiee wrote:Whenever the same WU fails multiple times at the same point with an UNKNOWN error, 0x0, 0x1, or anything other than an EUE, you should back up the WU data by stopping the client shortly before the point where it errors out, then try to run it on another computer. You can also post it for someone else to test. This makes it possible to improve the FAH core files so that they detect those errors and classify them as EUEs.
You mention stopping the client prior to the error and then trying it on a different computer.

With the Linux client I have found that merely stopping it at any point prior to the error and restarting is normally sufficient to allow successful completion of the WU.

Only a few times has this been unsuccessful.

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Tue Apr 01, 2008 7:10 pm
by LookN2Find
So, what's going on, exactly? I have not been able to complete a WU on my laptop or GPU clients for a couple of weeks now. The GPU client was running great, and then all of a sudden it started tossing out blank WUs over and over, and now gives everything a sense of false measurement. My PS3 is folding fine, and a friend's PS3 is folding fine, but our Core 2 Duos will not complete a WU in the Console Client to save anyone's life (hm, literally). I have not tried running the graphical client for our CPUs yet.

I am running an ATI X1950 Pro GPU/video card. I am running a 1.66 GHz Core 2 Duo that I have been folding with 24/7 for almost a solid year and a half. I am also having problems on a Pentium D unit and a Celeron D unit. All of them failed within the same 24-hour time frame, and none of them will complete a work unit. I have reinstalled clients. I have tried betas and standard releases. I have read forums. I have changed settings, rolled the Catalyst video drivers back to recommended versions, etc. I think I have done everything that can possibly be done. I am about to attempt to fire up another GPU and an E6850 Core 2, but is F@H having software issues on both clients now??

None of my units will run. I would very much appreciate it if someone would let me know whether we're waiting for the release of another "soon to come" client...? I am just so exhausted with reinstalling 4 different versions on 5 individual clients; multiply that by changing settings 3-4 different times per unit, plus the downloading, etc., and you get what I would estimate at a minimum of 60 and a maximum of 80 attempts/failures (not to mention how many WUs failed within each test of settings and clients). I'm seriously exhausted from trying to fold :shock: .

Someone please help so I can help!

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Tue Apr 01, 2008 7:12 pm
by Ivoshiee
Have you checked the GPU temperatures?
I had over 5000 WUs EUE over the course of 2 days because of a failed GPU cooler.

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Thu Apr 17, 2008 6:50 pm
by spazzcat
I think I'm having the same issue? The WU was ambda5_99sb

Extra SSE boost OK.

[17:37:50] Warning: long 1-4 interactions
[17:37:54] CoreStatus = 0 (0)
[17:37:54] Client-core communications error: ERROR 0x0
[17:37:54] Deleting current work unit & continuing...
[17:42:22] - Warning: Could not delete all work unit files (1): Core returned invalid code
[17:42:22] Trying to send all finished work units
[17:42:22] + No unsent completed units remaining.
[17:42:22] - Preparing to get new work unit...
[17:42:22] + Attempting to get work packet
[17:42:22] - Will indicate memory of 3894 MB
[17:42:22] - Connecting to assignment server
[17:42:22] Connecting to http://assign.stanford.edu:8080/
[17:42:23] Posted data.
[17:42:23] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:42:23] + News From Folding@Home: Welcome to Folding@Home
[17:42:23] Loaded queue successfully.
[17:42:23] Connecting to http://171.64.65.63:8080/
[17:42:23] Posted data.
[17:42:23] Initial: 0000; + Could not connect to Work Server
[17:42:23] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[17:42:34] + Attempting to get work packet
[17:42:34] - Will indicate memory of 3894 MB
[17:42:34] - Connecting to assignment server
[17:42:34] Connecting to http://assign.stanford.edu:8080/
[17:42:34] Posted data.
[17:42:34] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:42:34] + News From Folding@Home: Welcome to Folding@Home
[17:42:34] Loaded queue successfully.
[17:42:34] Connecting to http://171.64.65.63:8080/

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Mon Apr 28, 2008 7:24 am
by jrweiss
I've been away from home for a week. I noticed by monitoring my stats that my SMP setup is not producing. I was finally able to check it remotely from Amsterdam (I've been WAY more remote than that!) and have found that it keeps downloading 3062 5/6/93 and gets EUEs at 44%. I cannot run it on another machine.

I'll open a new topic or look for a current one, and post the logs.

How do I ensure this particular WU doesn't just re-appear yet again?

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Mon Apr 28, 2008 7:58 am
by Ivoshiee
Since the 0x0 will not get reported, the FAH DC system should assign the WU to you again about 3-5 times before moving on. If you insist on not having the WU again, then all you can do is keep dumping the WU until you get another one.

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Thu May 08, 2008 4:07 am
by jrweiss
Well, some of us don't have the luxury of moving a WU to another machine, and some of us can't dump a WU by remote control from half way around the world...

Maybe the EUE re-assignment process should be rethunk, so it doesn't go back to the same computer...

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Thu Jul 24, 2008 8:12 pm
by Sunin
The way it works now, I believe, is that until the WU expires you will be given that same failed WU again... I've gotten numerous identical WUs to rechug that had EUEs... and of course everything before them worked great and everything after has worked flawlessly... but it takes a few days and maybe a set # of failures before it reassigns them.

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Sat Aug 16, 2008 1:15 pm
by rada
you can be more selective and copy less data if you know what you're doing. (See the WIKI instructions for "sneakernetting.") <to figure out what to back up and what to post to help troubleshooters>

Well, I missed it on the first few reads, but it seems to say that queue.dat and the whole work/ directory are enough.

Unfortunately, that barely cut a couple % from the compressed size of the whole folding directory on my problem unit. So I will archive the whole folding dir with tar + bzip or bzip2 unless it creates problems. Anyway, I figured I'd post that here since it was buried deep in the wiki text.
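
For anyone who does want the selective version, here is a rough Python sketch that packs just queue.dat and the work/ directory into a bzip2-compressed tar file, going by the reading of the wiki described above. The function name is made up, and whether those two items are really sufficient for every client version is an assumption, so keep a full copy as well if you are unsure.

```python
import tarfile
from pathlib import Path


def archive_work_unit(client_dir, archive_path):
    """Pack queue.dat and work/ from client_dir into a .tar.bz2 archive."""
    src = Path(client_dir)
    members = [src / "queue.dat", src / "work"]
    missing = [str(p) for p in members if not p.exists()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    # "w:bz2" writes a tar archive compressed with bzip2.
    with tarfile.open(str(archive_path), "w:bz2") as tar:
        for p in members:
            # arcname stores paths relative to the client directory,
            # so the archive unpacks as queue.dat and work/...
            tar.add(str(p), arcname=p.name)
    return Path(archive_path)
```

As with any backup of a running WU, stop the client first so the files are in a consistent state when they are archived.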

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Posted: Sun Sep 14, 2008 9:01 pm
by jayrex
Hi there,

I'm not sure if it's multiple errors I'm getting. But it certainly is one big error.

I had just finished my second work unit when the software went to download the next one. It attempts to download bytes and says

'Core download error (#1), waiting before retry'

The attempt then repeats: #2, #3, #4 and so on.

What should I do?