Suggestion for handling ERROR 0x0
Posted: Mon May 03, 2010 11:20 am
Hello.
Some of us (including me) who encounter this error sometimes need a workaround (deleting work/, queue.dat, unitlist.txt, even machinedependent.dat) so that the client won't get the same (bad?) WU again. Now, if the machine doing the WU is headless (no monitor) and is left running without user intervention for quite a while, it may run the same (bad?) WU over and over again. That means wasted power.
Here's my suggestion.
- When client A encounters this error (or any other error that hints at a bad WU) at some checkpoint, the client reports this WU to the server.
- The server marks the WU.
- Client A gets a different WU.
- The server sends the marked WU to other clients, say clients B and C.
- If clients B and C report the same error at the same checkpoint, the WU is most likely a bad WU.
- Otherwise, if either client B or C returns the completed WU, there's a possibility that client A's machine is not very stable.
This way, client A won't waste its resources on the same WU over and over again, and the server gets a good way to test whether a WU is a bad one.
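
To make the idea a bit more concrete, here is a rough sketch in Python of the bookkeeping the server would need. I don't know how the real assignment server works, so every name here (WorkServer, report_error, report_success, may_assign, the verdict strings, and the choice of two verifying clients) is made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkUnit:
    """Server-side record for one WU (hypothetical structure)."""
    wu_id: str
    marked: bool = False                    # set once a client reports an error
    first_reporter: Optional[str] = None    # "client A" in the proposal
    first_checkpoint: Optional[int] = None  # checkpoint where client A failed
    confirming: set = field(default_factory=set)  # other clients with the same failure
    verdict: str = "unknown"                # "unknown", "bad_wu" or "client_suspect"


class WorkServer:
    VERIFIERS_NEEDED = 2  # assumption: clients B and C must both confirm

    def __init__(self):
        self.units = {}

    def _unit(self, wu_id):
        return self.units.setdefault(wu_id, WorkUnit(wu_id))

    def report_error(self, client, wu_id, checkpoint):
        """A client reports an error on a WU at a given checkpoint."""
        wu = self._unit(wu_id)
        if not wu.marked:
            # First report: mark the WU and remember who failed and where.
            wu.marked = True
            wu.first_reporter = client
            wu.first_checkpoint = checkpoint
        elif client != wu.first_reporter and checkpoint == wu.first_checkpoint:
            # Another client failed at the same checkpoint.
            wu.confirming.add(client)
            if wu.verdict == "unknown" and len(wu.confirming) >= self.VERIFIERS_NEEDED:
                wu.verdict = "bad_wu"
        return wu.verdict

    def report_success(self, client, wu_id):
        """A client returns the completed WU."""
        wu = self._unit(wu_id)
        if wu.marked and client != wu.first_reporter and wu.verdict == "unknown":
            # Someone else finished it, so client A's machine may be unstable.
            wu.verdict = "client_suspect"
        return wu.verdict

    def may_assign(self, client, wu_id):
        """Never resend a marked WU to its first reporter, or a bad WU to anyone."""
        wu = self._unit(wu_id)
        if wu.verdict == "bad_wu":
            return False
        return not (wu.marked and client == wu.first_reporter)
```

With something like this, after client A calls report_error, may_assign returns False for client A on that WU, so a headless machine would simply be handed different work instead of looping on the same unit.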
Just my 2 cents.