Multiple Issues with AMD GPU Processing?
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
Re: Multiple Issues with AMD GPU Processing?
A quick update.
I think I am seeing something different from what other people are seeing. This morning, Einstein@home caused a Radeon control crash and reset. After that, the GPU did no more work (no power draw or utilization).
So apparently, my issue is not isolated to FAH.
This is a new card. It is possible something is wrong with it, but I have not had any problems outside of OpenCL software. All stress tests that I run leave the GPU below 75 degrees. I had been running Einstein@home by itself for several days before this without issues, but it does tend to show lower GPU utilization than FAH.
In any event, I am going to discontinue OpenCL processing until I have had more time to think about this.
If I do run FAH any more, I will leave the log level at 3 and make sure to save my log file.
Re: Multiple Issues with AMD GPU Processing?
An update...
Appreciate all of the pointers to other threads here where I could learn a bit more about the AMD issues - ugh.
Anyhow...a magic button somewhere must have been pressed. Every time I'm in here looking at my Client Advanced Control screen, I've had a WU crunching away on my GPU.
Nice!
Re: Multiple Issues with AMD GPU Processing?
I actually have another thread about this, but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. I have an RMA and will get a new card. It will be interesting to see if it acts differently.
Re: Multiple Issues with AMD GPU Processing?
mwroggenbuck wrote: <...>but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. <...>
I read this, then really started mucking about with settings...
I now have a stable (I think...) set of settings, but it is underclocked and undervolted by a bit.
A solid day or two of running should confirm this; then I can start tweaking, as the WUs are now coming a bit more regularly these days.
Re: Multiple Issues with AMD GPU Processing?
mwroggenbuck wrote: This morning, Einstein at home caused a Radeon control crash and reset.
FAH has absolutely no connection with Einstein@home. We can't provide any kind of support for their projects. They may or may not use the sortshortlist, so any connection you draw to the information provided on the previous pages is entirely your responsibility.
Changing your clock rate won't bypass that problem, but it certainly could bypass some other problems.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Multiple Issues with AMD GPU Processing?
Update: my new card works fine. There was a definite stability problem with the old card at the clock rate it was supposed to be able to use.
I realize that FAH and Einstein@home are different programs, but they both exercise OpenCL and the GPU. I was not seeing the shortlist problem that initiated this thread. The fact that my error finally occurred outside of FAH made me believe the problem was not FAH.
I apologize if I sent anyone down the wrong path.
Ultimately, I was fighting two different issues: 1) unstable hardware, 2) Anti-virus that locked a file FAH needed to rename.
Re: Multiple Issues with AMD GPU Processing?
I've tweaked a few more things; things are running a bit better.
Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.
No, I've not saved anything from my last 'unplanned termination event', but a general question...
Is there a publicly accessible repository of the WUs my system(s) have crunched?
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
Re: Multiple Issues with AMD GPU Processing?
kwthom wrote: ...Is there a public accessible repository of the WU's my system(s) have crunched?
Not officially. However, you can either:
1) Save your log files and get the PRCG details from it.
2) Use HFM.NET to maintain a WU database across your clients (https://github.com/harlam357/hfm-net)
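For option 1, a minimal Python sketch of pulling the PRCG (Project, Run, Clone, Gen) values out of saved log text might look like the following. It assumes each work unit is announced on a line containing `Project: N (Run N, Clone N, Gen N)`; that line shape, the sample lines, and the function name `extract_prcg` are illustrative assumptions, so check them against your own log before relying on this.

```python
import re

# Assumed log line shape (verify against your own log file):
#   12:34:56:WU01:FS01:0x22:Project: 16435 (Run 3, Clone 17, Gen 42)
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def extract_prcg(log_text):
    """Return unique (project, run, clone, gen) tuples in first-seen order."""
    seen = []
    for match in PRCG_RE.finditer(log_text):
        prcg = tuple(int(group) for group in match.groups())
        if prcg not in seen:
            seen.append(prcg)
    return seen


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = (
        "12:34:56:WU01:FS01:0x22:Project: 16435 (Run 3, Clone 17, Gen 42)\n"
        "12:35:10:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)\n"
        "13:02:44:WU00:FS00:0xa7:Project: 16435 (Run 3, Clone 17, Gen 42)\n"
    )
    for prcg in extract_prcg(sample):
        print("Project %d (Run %d, Clone %d, Gen %d)" % prcg)
```

To use it on a real file, read the saved log into a string and pass it to `extract_prcg`.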
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Multiple Issues with AMD GPU Processing?
kwthom wrote: I've tweaked a few more things; things are running a bit better.
Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.
No, I've not saved anything from my last 'unplanned termination event', but a general question...
Is there a public accessible repository of the WU's my system(s) have crunched?
I painstakingly built out a spreadsheet over the course of two weeks to try to find a pattern to my crashes. I found that Project 16435, for whatever reason, was the project that failed on my system (causing a crash) by a wide margin. Often, if the CPU was folding at the time, its work unit would come back with Guru Meditation errors and get dumped, while the GPU would pick up where it left off. But if it crashes once, it inevitably crashes again and again until it fails; I have about a 60% success rate of finishing these. It's enough that when I notice I've picked up 16435, I pause the CPU slot just to preserve the work. I have never had a crash when the CPU is folding by itself.
I hope at least some portion of my post is helpful.
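For anyone who wants to do the same kind of tally without a spreadsheet, here is a rough Python sketch of counting failures per project. The record format (a project number plus a completed/failed flag), the function name `failure_rates`, and every sample value below are assumptions made up for illustration; they are not the actual spreadsheet data.

```python
from collections import defaultdict


def failure_rates(records):
    """Summarise (project, completed) pairs as {project: (failed, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])  # project -> [failed, total]
    for project, completed in records:
        counts[project][1] += 1
        if not completed:
            counts[project][0] += 1
    return {
        project: (failed, total, failed / total)
        for project, (failed, total) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical records: (project number, True if the WU finished).
    records = [
        (16435, True),
        (16435, False),
        (16435, False),
        (11111, True),  # placeholder project number
        (11111, True),
    ]
    summary = failure_rates(records)
    # List the projects with the highest failure rate first.
    for project, (failed, total, rate) in sorted(
        summary.items(), key=lambda item: item[1][2], reverse=True
    ):
        print("Project %d: %d of %d failed (%.0f%%)" % (project, failed, total, rate * 100))
```

Sorting by failure rate puts the worst project at the top, which is the pattern a tally like this is meant to surface.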
Re: Multiple Issues with AMD GPU Processing?
kwthom wrote: Is there a public accessible repository of the WU's my system(s) have crunched?
It's not what you're looking for, but the last WU from each of your slots can be found here:
https://apps.foldingathome.org/cpu
If you've reinstalled FAH, you will find the WUs that have been processed both by the old and the new installation. If you have several machines running FAH, you'll find all of them that use the name that you enter in the User field at the top.
I see five slots all reporting that the last WU was successfully completed and got bonus points.