WU on GPU randomly stops? This might help.



FrancescoSciumè
Posts: 10
Joined: Fri Mar 20, 2020 5:41 pm
Hardware configuration: CPU: i7-4790k
MOBO: ASrock z97 extreme4
GPU: Palit Gtx 780ti jetstream
RAM: G.Skill Ares 8GB (4x2GB) 2133MHz C9
SSD: Crucial MX 500Gb
HDD: WD Blue 1TB
PSU: XFX TS 650W


Post by FrancescoSciumè »

I have an update on my previous issue with WUs randomly stopping on my GPU, which I discussed in this post: viewtopic.php?f=61&t=33053.
I decided to open a new post because I think this could help others, since I've spent days on the issue.

To recap the problem: I noticed that longer WUs assigned to my GPU randomly stopped without any sign whatsoever. I rolled back all overclocking on the GPU to be sure it was stable, leaving it at stock. Remember this; it will be important.
In the previous post it was suggested that I run memtest64, because memory errors can sometimes cause this problem. The test went fine, but partway through it the video driver reset. So I reinstalled the driver (uninstalling it with DDU first), but the problem didn't go away. Then I looked for the problem directly in the client's log files, where I found that every time the WU on the GPU stopped, these lines appeared:
08:01:16:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:01:16:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
08:01:20:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:01:20:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
08:01:24:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:01:24:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
08:01:24:WU00:FS01:0x22:ERROR:114: Max Retries Reached
08:01:24:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
08:01:24:WU00:FS01:0x22:Saving result file badstate-0.xml
08:01:24:WU00:FS01:0x22:Saving result file badstate-1.xml
08:01:24:WU00:FS01:0x22:Saving result file badstate-2.xml
08:01:24:WU00:FS01:0x22:Saving result file checkpointState.xml
08:01:24:WU00:FS01:0x22:Saving result file checkpt.crc
08:01:24:WU00:FS01:0x22:Saving result file positions.xtc
08:01:24:WU00:FS01:0x22:Saving result file science.log
08:01:24:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
08:01:25:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
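For anyone who wants to run the same check on their own log, here is a minimal sketch in Python (my own illustration, not a Folding@home tool). It assumes the line format shown in the excerpt above (timestamp, then WUxx:FSxx:, optionally with WARNING: after the timestamp) and simply counts the "Bad State detected", "Particle coordinate is nan" and BAD_WORK_UNIT lines per work unit and slot. The function name scan_log and the script usage are hypothetical.

```python
import re
import sys
from collections import Counter

# Marker strings copied from the client log excerpt in this post.
BAD_STATE = "Bad State detected"
NAN_COORD = "Particle coordinate is nan"
BAD_WU = "BAD_WORK_UNIT"

# Assumed line shape: "08:01:16:WU00:FS01:0x22:..." or
# "08:01:25:WARNING:WU00:FS01:...". Captures the WU id, slot id and the rest.
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def scan_log(text):
    """Count instability markers per (work unit, folding slot) pair."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't carry a WU/slot prefix
        wu, fs, rest = m.groups()
        c = counts.setdefault((wu, fs), Counter())
        if BAD_STATE in rest:
            c["bad_state"] += 1
        if NAN_COORD in rest:
            c["nan"] += 1
        if BAD_WU in rest:
            c["bad_work_unit"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_log.py path/to/your/client/log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (wu, fs), c in scan_log(f.read()).items():
            print(wu, fs, dict(c))
```

Repeated "bad_state" and "nan" hits on the same slot are the symptom described in this post, and the client's own message asks whether the system is overclocked, so that's the first thing worth checking.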
This drove me crazy because I had rolled back all overclocking. So I tried many configurations in Afterburner, and the problem kept showing up.
So I did some research online and found an old post here on the forum where people were discussing stability with the new Core22. One post in particular caught my attention: someone said that aftermarket GPUs in fact come already overclocked.
It instantly struck me. My GPU is in fact an aftermarket one, a Palit 780 Ti JetStream (I'm from Italy; Palit is very common here). At that point I configured the GPU in Afterburner to match a reference GTX 780 Ti, and, lo and behold, I haven't had a problem since.
I very much face-palmed myself, I know; I should have thought of it from the start. Still, I decided to write this post hoping it will help some F****r like me XD

So, make sure your GPU is absolutely stable, or roll it back to its true stock (reference) configuration. You don't want to be 80% of the way through an 8-hour WU and watch it fail; that doesn't help anyone.

Have a nice day. Be safe, stay at home and wash your hands. We are doing a good thing.
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington


Post by Jesse_V »

Yes, good find and really good work diagnosing this!

F@h uses the hardware very thoroughly and really can't tolerate much overclocking at all. I'm really glad you were able to roll it back and get it stable again.

Good luck over there in Italy; it seems really locked down at this point. Stay safe!
F@h is now the top computing platform on the planet, and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
FrancescoSciumè


Post by FrancescoSciumè »

Jesse_V wrote: Good luck over there in Italy, It seems really locked down at this point. Stay safe!
It really is, unfortunately. But it is working: yesterday and today the statistics on the spread started to go down. People very much needed this news, but it is not over yet.
So please, follow our example. Remember, statistics always lag behind reality. You don't need the government to force you; it is on us to protect our loved ones. We learned that at our own expense.
Rel25917
Posts: 303
Joined: Wed Aug 15, 2012 2:31 am


Post by Rel25917 »

Something else worth mentioning: on NVIDIA cards, reducing the memory clock can sometimes let you run a higher core clock, and the lost memory speed has very little impact on F@H performance.