Multiple Issues with AMD GPU Processing?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.


mwroggenbuck
Posts: 127
Joined: Tue Mar 24, 2020 12:47 pm

Re: Multiple Issues with AMD GPU Processing?

Post by mwroggenbuck »

A quick update.

I think I am seeing something different from other people. This morning, Einstein at home caused a Radeon control crash and reset. After that, the GPU did no more work (no power draw or utilization).

So apparently, my issue is not isolated to FAH.

This is a new card. It is possible something is wrong with it, but I have not had any problems outside of OpenCL software. All stress tests that I run leave the GPU below 75 degrees. I had been running Einstein at home by itself for several days before this without issues, but it does tend to have lower utilization than FAH.

In any event, I am going to discontinue OpenCL processing until I have had more time to think about this.

If I do run FAH any more, I will leave the log level at 3 and make sure to save my log file.
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Post by kwthom »

An update...

Appreciate all of the pointers to other threads here where I could learn a bit more about the AMD issues - ugh.

Anyhow...a magic button somewhere must have been pressed. Every time I'm in here looking at my Client Advanced Control screen, I've had a WU crunching away on my GPU.

Nice!
mwroggenbuck
Posts: 127
Joined: Tue Mar 24, 2020 12:47 pm

Re: Multiple Issues with AMD GPU Processing?

Post by mwroggenbuck »

I actually have another thread about this, but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. I have an RMA and will get a new card. It will be interesting to see if it acts differently.
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Post by kwthom »

mwroggenbuck wrote:<...>but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. <...>
I read this, then really started mucking about with settings...

I now have a stable (I think...) set of settings, but it is underclocked and undervolted by a bit.

A solid day or two of running will confirm this; then I can start tweaking, as WUs are coming a bit more regularly these days.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple Issues with AMD GPU Processing?

Post by bruce »

mwroggenbuck wrote:This morning, Einstein at home caused a Radeon control crash and reset.
FAH has absolutely no connection with Einstein@home. We can't provide any kind of support for their projects. They may or may not use sortshortlist, so any connection you draw between your problem and the information provided on the previous pages is entirely your responsibility.

Changing your clock rate won't bypass that problem, but it certainly could bypass some other problems.
mwroggenbuck
Posts: 127
Joined: Tue Mar 24, 2020 12:47 pm

Re: Multiple Issues with AMD GPU Processing?

Post by mwroggenbuck »

Update: my new card works fine. There was a definite stability problem at the clock rate the old card was supposed to be able to use.

I realize that FAH and Einstein@home are different programs, but they both exercise OpenCL and the GPU. I was not seeing the shortlist problem that initiated this thread. The fact that my error finally occurred outside of FAH made me believe the problem was not FAH.

I apologize if I sent anyone down the wrong path.

Ultimately, I was fighting two different issues: 1) unstable hardware, and 2) antivirus software that locked a file FAH needed to rename.
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Post by kwthom »

I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a publicly accessible repository of the WUs my system(s) have crunched?
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Multiple Issues with AMD GPU Processing?

Post by PantherX »

kwthom wrote:...Is there a publicly accessible repository of the WUs my system(s) have crunched?
Not officially. However, you can either:
1) Save your log files and get the PRCG (Project, Run, Clone, Gen) details from them.
2) Use HFM.NET to maintain a WU database across your clients (https://github.com/harlam357/hfm-net)
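If you go the saved-log route, a small script can pull the PRCG out of each log and tally WUs per project, which is roughly what a hand-built spreadsheet does. This is a minimal sketch, assuming the log contains lines of the form `Project: 16435 (Run 3, Clone 7, Gen 12)` as FAHClient v7 logs generally do; adjust the pattern if your log differs. The sample lines below are hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Assumed log line format (FAHClient v7 style); this is not an official spec.
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def extract_prcg(log_text):
    """Return unique (project, run, clone, gen) tuples in order of first appearance."""
    seen = set()
    result = []
    for match in PRCG_RE.finditer(log_text):
        prcg = tuple(int(group) for group in match.groups())
        if prcg not in seen:  # the same WU is usually logged more than once
            seen.add(prcg)
            result.append(prcg)
    return result


def count_by_project(prcgs):
    """Count how many distinct WUs came from each project number."""
    return Counter(project for project, _run, _clone, _gen in prcgs)


if __name__ == "__main__":
    # Hypothetical log excerpt for illustration only.
    sample = (
        "12:00:01:WU01:FS01:0x22:Project: 16435 (Run 3, Clone 7, Gen 12)\n"
        "12:00:01:WU01:FS01:0x22:Project: 16435 (Run 3, Clone 7, Gen 12)\n"
        "13:10:44:WU00:FS00:0xa7:Project: 14576 (Run 0, Clone 2175, Gen 201)\n"
    )
    prcgs = extract_prcg(sample)
    print(prcgs)
    print(count_by_project(prcgs))
```

To use it on real logs, read each saved `log.txt` into a string and pass it to `extract_prcg`; combining the results across files gives a per-project count you can compare against your crash times.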
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Multiple Issues with AMD GPU Processing?

Post by Crawdaddy79 »

kwthom wrote:I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a publicly accessible repository of the WUs my system(s) have crunched?
I painstakingly built a spreadsheet over the course of two weeks to try to find a pattern in my crashes. I found that Project 16435, for whatever reason, was by a wide margin the project that failed on my system (causing a crash). Often, if the CPU was folding at the time, its work unit would come back with Guru Meditation errors and get dumped, while the GPU would pick up where it left off. But if the GPU WU crashes once, it inevitably crashes again and again until it fails; I have about a 60% success rate finishing these. It happens often enough that when I notice I've picked up 16435, I pause the CPU slot just to preserve its work. I have never had a crash when the CPU is folding by itself.

I hope at least some portion of my post is helpful.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple Issues with AMD GPU Processing?

Post by bruce »

kwthom wrote:Is there a publicly accessible repository of the WUs my system(s) have crunched?
It's not what you're looking for, but the last WU from each of your slots can be found here:
https://apps.foldingathome.org/cpu

If you've reinstalled FAH, you will find the WUs that have been processed both by the old and the new installation. If you have several machines running FAH, you'll find all of them that use the name that you enter in the User field at the top.

I see five slots all reporting that the last WU was successfully completed and got bonus points.