feature request: optional EUE verification by the client

Moderators: Site Moderators, FAHC Science Team

sdack

feature request: optional EUE verification by the client

Post by sdack »

Hello,

I am not sure where to make this request, but I think it is best to make it here. I would like to have a feature in the FAH client that enables me to automatically verify an EUE for myself. It is not meant to replace the verification that is done by the projects themselves but to assist in solving problems on my side.

An option like "-verify-eue" would rerun the simulation after an EUE has occurred and before the client downloads a new WU. If the rerun also ends in an EUE, the client would compare both results and report whether they match or differ. If the rerun turns out to be successful (no EUE occurred), the new result would be uploaded as well (for no points, or just a few). It is not meant as a permanent verification, only as a help in solving problems locally.
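
To make the proposal concrete, here is a rough Python sketch of the comparison step. Everything in it is an assumption made for illustration: the names `RunResult`, `Verdict` and `verify_eue` are hypothetical, and how two results would really be compared is not specified, so a plain string fingerprint stands in for it. It does not reflect the actual FAH client.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    MATCHING_EUE = "rerun also ended in an EUE; both results match"
    DIFFERING_EUE = "rerun also ended in an EUE; the results differ"
    RERUN_OK = "rerun completed without an EUE"


@dataclass(frozen=True)
class RunResult:
    eue: bool     # True if the run ended in an EUE
    digest: str   # hypothetical fingerprint of the result data


def verify_eue(first: RunResult, rerun: RunResult) -> Verdict:
    """Compare the original EUE with the outcome of a single rerun."""
    if not first.eue:
        raise ValueError("verification only applies after an EUE")
    if not rerun.eue:
        return Verdict.RERUN_OK
    if rerun.digest == first.digest:
        return Verdict.MATCHING_EUE
    return Verdict.DIFFERING_EUE


def should_upload_rerun(verdict: Verdict) -> bool:
    """Only a successful rerun produces a new result worth uploading."""
    return verdict is Verdict.RERUN_OK
```

The client would print `verdict.value` to its log, so the user sees the outcome right away instead of having to ask about each EUE.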

The advantage over asking here on the forum is that one sees the result immediately and can solve local problems faster and more conveniently. For the Pande Group the advantage is that they would eventually have to verify fewer EUEs and have more time for other things.
Last edited by sdack on Wed Oct 22, 2008 7:07 am, edited 4 times in total.
osgorth
Posts: 72
Joined: Fri Sep 12, 2008 10:46 am

Re: feature request: optional EUE verification by the client

Post by osgorth »

Interesting idea, I second this request! :)
al2
Posts: 10
Joined: Tue Jan 01, 2008 3:48 pm
Location: U.K.

Re: feature request: optional EUE verification by the client

Post by al2 »

This made me wonder: if EUEs are now handled automatically, is it implied that in these cases the hardware is not at fault? Or perhaps that possibility is factored in. When it was done manually, IMO it was a good way to test hardware "in practice/in the field" and contribute to the project at the same time (as opposed to just testing with stresscpu2).

Anyway, I imagine the new system has better potential for the project overall.
Folding on XPMce 32-bit in Dell9200 machine (stock) with;

C2D E6600

2GB DDR 533Mhz (Kingston)
sdack

Re: feature request: optional EUE verification by the client

Post by sdack »

EUEs can be caused by unstable hardware or by the simulation running out of bounds. The Pande Group always knows if an EUE is caused by unstable hardware or just by the simulation itself, but those who run the hardware just see the error message. It would be nice if one could have such an option and run the client for a week or even an entire month with it. Most people have the patience to fix instability issues when they occur within 24h or 48h. Long-term tests require every single EUE to be verified, and most people, including myself, do not want to monitor their clients for that long and ask for confirmation of each EUE.

This would help those who build their own machines, have bought a new one, have added a new piece of hardware, or overclock. I believe that many would profit from an optional client-side verification, including the Pande Group themselves.
rpmouton
Posts: 40
Joined: Mon Jun 23, 2008 1:09 pm
Hardware configuration: 1-MSI 990FXA-GD65V2 AM3+, AMD FX-8120 8-Core Black Edition-3.1 GHz, Mushkin Enhanced Blackline 8GB (2 x 4GB) 1600 MHz and ASUS GeForce GTX 550 Ti. Win 7 64x, 7.1x client with SMP and GPU slots ~14k ppd

2-ASUS M2NE-SLI AM2, AMD Phenom 4 @ 2.3 GHZ, 4 GB @ 800 MHz and ASUS GeForce GTX 550 Ti. Win Vista 64x, 7.1x client with SMP and GPU slots ~10k ppd

3-MSI 785GTM-E45 AM2+, AMD Phenom 4 Propus @ 3 GHZ, 4GB @ 800 MHZ, Win 7 64x, 7.1x client with SMP slots ~4k ppd

4-DELL 2950 Gen III, 2 Xeon E5405 Quad core @ 2GHz, 8 GB @ 669MHz, Ubuntu 12.04, 7.1 client with one SMP slot (bigadv) ~12k ppd
Location: Orlando, Florida

Re: feature request: optional EUE verification by the client

Post by rpmouton »

sdack wrote: EUEs can be caused by unstable hardware or by the simulation running out of bounds. The Pande Group always knows if an EUE is caused by unstable hardware or just by the simulation itself, but those who run the hardware just see the error message. It would be nice if one could have such an option and run the client for a week or even an entire month with it. Most people have the patience to fix instability issues when they occur within 24h or 48h. Long-term tests require every single EUE to be verified, and most people, including myself, do not want to monitor their clients for that long and ask for confirmation of each EUE.

This would help those who build their own machines, have bought a new one, have added a new piece of hardware, or overclock. I believe that many would profit from an optional client-side verification, including the Pande Group themselves.
Hey sdack,

This is a good idea in general, although I am not sure that the Pande Group always knows, before we return WUs, whether an EUE is due to running out of bounds or not.

I am also not sure that keeping WUs for weeks or a month to test is good for the system either.

However, I think that your desire to test cards, configurations or whatnot without affecting the science is a good one. My thought was that it would be cool to have a command switch that lets us run known-good WUs as well as known out-of-bounds WUs without increasing server load or delaying science results.
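
Roughly what I have in mind, as a minimal Python sketch. All names here are hypothetical (`self_test`, the reference WU labels, the `run_wu` callback); no such switch or reference set exists in the client as far as this thread says. The idea is only that each reference WU has a known outcome, and the machine is suspect whenever the local outcome disagrees.

```python
from typing import Callable, Dict, List


def self_test(run_wu: Callable[[str], bool],
              expected_eue: Dict[str, bool]) -> List[str]:
    """Run each reference WU locally and list those with an unexpected outcome.

    run_wu(name) returns True if the run ended in an EUE.
    expected_eue maps a reference WU name to its known outcome
    (False for a known-good WU, True for a known out-of-bounds WU).
    """
    return [name for name, want in expected_eue.items()
            if run_wu(name) != want]
```

An empty list means the hardware reproduced every reference outcome; anything else points at the local setup, and no server has to be contacted to find that out.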

I sure could have used that option while I was trying fruitlessly to get my 8600 GTs to run WUs with the 1.15 core.

regards,
Roger
sdack

Re: feature request: optional EUE verification by the client

Post by sdack »

rpmouton wrote: This is a good idea in general, although I am not sure that the Pande Group always knows, before we return WUs, whether an EUE is due to running out of bounds or not.
I am also not sure that keeping WUs for weeks or a month to test is good for the system either.
To clarify: the idea implies a minimal change in operation. Only when an EUE occurs, and after the client has sent back the result, would the client restart the simulation, and only a single time. The client then continues regardless of the outcome of the verification. It must not run the same WU over and over again. In addition, should this second run succeed, the result would not be discarded but returned to the server.
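
The order of operations could look like the following Python sketch. It is only an illustration under assumptions: `process_wu`, the `run` and `upload` callbacks and the `(ended_in_eue, result_data)` tuple are all made up here and say nothing about how the real client is built.

```python
from typing import Callable, List, Tuple

# Hypothetical result shape: (ended_in_eue, result_data)
Result = Tuple[bool, str]


def process_wu(run: Callable[[], Result],
               upload: Callable[[Result], None],
               verify_eue: bool) -> List[str]:
    """Run one WU; after an EUE, optionally rerun it exactly once."""
    log: List[str] = []
    first = run()
    upload(first)                  # the first result is always sent back
    if not (verify_eue and first[0]):
        return log                 # normal path: nothing to verify
    second = run()                 # a single rerun, never a loop
    if second[0]:
        log.append("match" if second == first else "differ")
    else:
        log.append("rerun succeeded")
        upload(second)             # a successful rerun is returned too
    return log                     # the caller fetches a new WU either way
```

There is deliberately no retry loop: whatever the rerun reports, the function returns and the client moves on to the next WU.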