Re: 18027 Bad Work Unit
Posted: Thu Dec 30, 2021 10:08 pm
by v00d00
Workunits fail. That's part of being in beta, and it's why running beta is opt-in. The chance of things going awry is generally quite high with beta workunits. If you want more stable workunits, remove the beta or advanced flag from your client. There is no shame attached to it. Most people who do beta just accept that things can go wrong. Obviously, the more workunits you successfully complete, the lower the likelihood of losing QRB, so you could just opt out of beta for a while and build up a large number of completed workunits to lessen the chance of hitting the 80% cut-off point. 20% of 1000 is a considerably bigger buffer than 20% of 100.
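To put rough numbers on that buffer, here is a minimal Python sketch. It assumes the cut-off is simply "successful units divided by total units must stay at or above 80%"; the function names are made up for illustration and the real QRB eligibility check may have other conditions.

```python
def failure_buffer(total_units: int, min_success_pct: int = 80) -> int:
    """Most failed units out of total_units that still keeps the
    success rate at or above min_success_pct (assumed simple ratio rule)."""
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return total_units * (100 - min_success_pct) // 100


def success_rate_pct(completed: int, failed: int) -> float:
    """Success rate in percent for a given completed/failed history."""
    total = completed + failed
    return 100.0 * completed / total if total else 0.0


if __name__ == "__main__":
    for total in (100, 1000):
        print(total, "units ->", failure_buffer(total), "failures allowed")
```

So under that assumption a history of 100 units leaves room for 20 failures, while 1000 units leaves room for 200, which is why a handful of bad beta units hurts far less on a long history.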
I have removed beta a few times in the last couple of months due to unstable workunits from certain projects. I then fold regular workunits for a couple of days and then go back on to beta. For whatever reason some of these workunits hate my RTX 2080 Ti. I have lost a handful now on these 182xx units, but have also folded many more without issue, so I am leaning more towards an issue with the projects. If it was hardware related, would they not all fail?
Re: 18027 Bad Work Unit
Posted: Thu Dec 30, 2021 10:37 pm
by Neil-B
I believe the project in question may actually have been released to full FAH, hence folders' concerns ... and I believe representations were made to the researcher about this.
I have run some 750+ p18201s and some 50+ p18202s and not had a single failure on my RTX 3070 Win 11 setup, so for me they have been stable ... different projects work GPUs in different ways, and even within projects there can be some slight variations in workload on GPUs ... p18201s, being decently large (from an atom count perspective), utilise my RTX 3070 to the max (unlike some smaller atom count WUs) and push to max power usage (running 2025MHz clocks), so it might be that they push your GPU right to the borderline on stability, with the occasional one pushing it just too far? ... Have the errors been NAN ones?
Re: 18027 Bad Work Unit
Posted: Thu Dec 30, 2021 10:57 pm
by Joe_H
Yes, Project 18027 was released by the researcher to full F@h with no beta testing, little internal testing, and with no advance notice. There have been messages sent to that researcher.
Re: 18027 Bad Work Unit
Posted: Fri Dec 31, 2021 12:06 am
by psaam0001
Like WT*.... Seems like someone was in a little bit of a hurry to get going on this, but forgot to institute the proper testing protocols (which may make the process of getting valid information from the data more difficult).
FWIW: I'm running what gets assigned to my system.
Paul
Re: 18027 Bad Work Unit
Posted: Fri Dec 31, 2021 3:51 pm
by v00d00
Oh ok, ignore my post.
This has happened before and it sucks. Probably a researcher who either doesn't know that new workunits are supposed to be tested by the beta team for a week or two to make sure these mishaps don't happen, or maybe someone who doesn't care about following protocol. Either way, someone from the mod team should shoot the problem up the food chain to whoever owns the project and let them know it needs tweaking. With any luck the project will get pulled from public and pushed back into beta for a bit until the problem is ironed out.
psaam0001 wrote:FWIW: I'm running what gets assigned to my system.
Same here. Not really bothered what I fold, as long as I fold.
Neil-B wrote:... Have the errors been NAN ones?
My errors were a mixture of NaN and CUDA/random weird error codes, plus Core_18 has dumped a few times for no reason. I've done all the usual checks on my 2080, cleaned out the cooling system, checked the RM850 is giving correct voltages, etc., and it doesn't appear to be the card, PSU or OS. I even tried underclocking the card by 100MHz to see if it would work, but Core_18 still dumped. So I reset back to stock and just started keeping an eye on it for when it dies. I do wish Windows would kill and restart it automatically instead of telling me it died.