feature request: optional EUE verification by the client
Posted: Tue Oct 21, 2008 12:48 pm
by sdack
Hello,
I am not sure where to make this request, but I think it is best to make it here. I would like to have a feature in the FAH client that enables me to automatically verify an EUE for myself. It is not meant to replace the verification that is done by the projects themselves but to assist in solving problems on my side.
An option like "-verify-eue" would rerun the simulation after an EUE has occurred, before the client downloads a new WU. If the rerun also ends in an EUE, the client would compare both results and report whether they match or differ. If the rerun turns out to be successful (no EUE occurred), the new result would be uploaded too (for no points, or just a few). It is not meant as a permanent verification, only as a help in solving problems locally.
The advantage over asking here on the forum is that one sees the result immediately and can solve local problems faster and more conveniently. For the Pande Group, the advantage is that they would eventually have to verify fewer EUEs and would have more time for other things.
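The comparison step proposed above could look roughly like the following Python sketch. Every name in it (`RunResult`, `classify_rerun`, the `error_signature` field) is hypothetical and made up for illustration; it does not reflect how the FAH client actually stores or compares results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    """Hypothetical outcome of one simulation run (not a real FAH structure)."""
    eue: bool                               # True if the run ended in an EUE
    error_signature: Optional[str] = None   # assumed summary of how it failed


def classify_rerun(first: RunResult, rerun: RunResult) -> str:
    """Compare the original EUE with the result of the single rerun."""
    if not first.eue:
        raise ValueError("verification only applies after an EUE")
    if not rerun.eue:
        return "rerun-succeeded"            # no EUE the second time
    if rerun.error_signature == first.error_signature:
        return "eue-match"                  # both runs failed the same way
    return "eue-differ"                     # both failed, but differently
```

Under these assumptions, a matching EUE would hint that the work unit itself is the cause, while a differing EUE or a successful rerun would point toward the local machine.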
Re: feature request: optional EUE verification by the client
Posted: Tue Oct 21, 2008 2:29 pm
by osgorth
Interesting idea, I second this request!
Re: feature request: optional EUE verification by the client
Posted: Wed Oct 22, 2008 10:35 pm
by al2
This made me wonder: if EUEs are now handled automatically, is it implied that in these cases the hardware is not at fault? Or perhaps that possibility is factored in. When it was done manually, IMO it was a good way to test hardware "in practice/in the field" and contribute to the project at the same time (as opposed to just testing with stresscpu2).
Anyway, I imagine the new system has better potential for the project overall.
Re: feature request: optional EUE verification by the client
Posted: Thu Oct 23, 2008 10:52 am
by sdack
EUEs can be caused by unstable hardware or by the simulation running out of bounds. The Pande Group always knows whether an EUE is caused by unstable hardware or just by the simulation itself, but those who run the hardware only see the error message. It would be nice to have such an option and run the client with it for a week or even an entire month. Most people have the patience to fix instability issues when they occur within 24h or 48h. Long-term tests require every single EUE to be verified, and most people, including myself, do not want to monitor their clients for that long and ask for confirmation of each EUE.
This would help those who build their own machines, have bought a new one, have added a new piece of hardware, or overclock. I believe that many would profit from an optional client-side verification, including the Pande Group themselves.
Re: feature request: optional EUE verification by the client
Posted: Thu Oct 23, 2008 4:57 pm
by rpmouton
sdack wrote: EUEs can be caused by unstable hardware or by the simulation running out of bounds. The Pande Group always knows whether an EUE is caused by unstable hardware or just by the simulation itself, but those who run the hardware only see the error message. It would be nice to have such an option and run the client with it for a week or even an entire month. Most people have the patience to fix instability issues when they occur within 24h or 48h. Long-term tests require every single EUE to be verified, and most people, including myself, do not want to monitor their clients for that long and ask for confirmation of each EUE.
This would help those who build their own machines, have bought a new one, have added a new piece of hardware, or overclock. I believe that many would profit from an optional client-side verification, including the Pande Group themselves.
Hey sdack,
This is a good idea in general, although I am not sure that the Pande Group always knows, before we return WUs, whether an EUE is due to the simulation going out of bounds or not.
I am also not sure that keeping WUs for weeks or a month to test is good for the system either.
However, I think your desire to test cards, configurations, or whatnot without affecting the science is a good one. My thought was that it would be cool to have a command switch that lets us run known good WUs as well as known out-of-bounds WUs, without increasing server load or delaying science results.
I sure could have used that option while I was trying fruitlessly to get my 8600 GTs to run WUs with the 1.15 core.
regards,
Re: feature request: optional EUE verification by the client
Posted: Thu Oct 23, 2008 6:38 pm
by sdack
rpmouton wrote: This is a good idea in general, although I am not sure that the Pande Group always knows, before we return WUs, whether an EUE is due to the simulation going out of bounds or not.
I am also not sure that keeping WUs for weeks or a month to test is good for the system either.
To clarify: the idea implies a minimal change in operation. Only when an EUE occurs, and after the client has sent back the result, shall the client restart the simulation, and only a single time. The client shall then continue regardless of the outcome of the verification. It must not run the same WU over and over again. In addition, should this second run succeed, the result shall not be discarded but returned to the server.
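To make the ordering concrete, here is a minimal Python sketch of that sequence. It is only an illustration: `process_work_unit`, `run_simulation`, `send_result` and the `verify_eue` flag are hypothetical names, and a run is modelled as a simple `(eue, payload)` pair, which is an assumption and not the client's real data.

```python
from typing import Callable, List, Tuple

# A run is modelled as (eue, payload); both fields are assumptions for illustration.
Run = Tuple[bool, str]


def process_work_unit(
    run_simulation: Callable[[], Run],
    send_result: Callable[[Run], None],
    verify_eue: bool = False,
) -> List[str]:
    """Run one WU and, if requested, verify an EUE with a single rerun."""
    log: List[str] = []
    first = run_simulation()
    send_result(first)               # the original result is always sent back first
    if not (verify_eue and first[0]):
        return log                   # no EUE, or option switched off
    second = run_simulation()        # exactly one rerun, never a loop
    if second[0]:
        log.append("match" if second[1] == first[1] else "differ")
    else:
        log.append("rerun succeeded")
        send_result(second)          # a successful rerun is returned, not discarded
    return log                       # the caller moves on to the next WU regardless
```

The function has no loop around the rerun, so the same WU cannot be run over and over, and it returns in every case so that the client would carry on with a new WU whatever the verification shows.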