
Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Wed Apr 01, 2020 3:24 am
by m1geo
Hey, thanks for the confirmation. I've been doing some digging, and I noticed something weird. As the GPU steps up through the power levels, the FAHBench https://fahbench.github.io/ benchmark falls over as soon as the GPU enters power level 3 (P3). I have never overclocked the GPU (GTX 1070), and the PSU is more than big enough to handle the 150W dissipation.
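A minimal sketch of how the power-level correlation described above could be logged, assuming `nvidia-smi` is on the PATH. The choice of fields and the helper names (`build_query_command`, `parse_row`, `watch`) are illustrative, not something used in this thread:

```python
import csv
import io
import subprocess

# Fields understood by `nvidia-smi --query-gpu`; this selection is illustrative.
QUERY_FIELDS = ["timestamp", "pstate", "power.draw", "clocks.sm", "clocks.mem", "fan.speed"]


def build_query_command(gpu_index=0, interval_s=1):
    """Return the nvidia-smi command line that emits one CSV row per interval."""
    return [
        "nvidia-smi",
        "-i", str(gpu_index),
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
        "-l", str(interval_s),
    ]


def parse_row(line):
    """Split one CSV row into a dict keyed by QUERY_FIELDS."""
    values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    return dict(zip(QUERY_FIELDS, values))


def watch(gpu_index=0):
    """Print every sample until nvidia-smi exits or the user interrupts."""
    proc = subprocess.Popen(
        build_query_command(gpu_index), stdout=subprocess.PIPE, text=True
    )
    for line in proc.stdout:
        if line.strip():
            print(parse_row(line.strip()))
```

Running `watch()` in a second terminal while the benchmark runs would leave the last samples before a crash on screen, including the reported P-state, power draw and fan speed.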

The GPU will happily sit running a hacked-about version of the matrix multiply (just the multiply operation in an infinite loop, recompiled):
Image

Again, thanks for the help.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Wed Apr 01, 2020 6:58 am
by Joe_H
The GPU folding core does not use CUDA, it uses OpenCL.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Wed Apr 01, 2020 1:20 pm
by m1geo
Yeah, I realise this. It was more a proof of concept that the GPU is there, will talk to the PC, and will run something without crashing.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Wed Apr 01, 2020 5:46 pm
by m1geo
Some more debugging with FAHBench https://fahbench.github.io/...

The GPU benchmark runs to about 10% before falling over with either a "NaN" error or some random exception (usually from clEnqueueMapBuffer).

Image

Image

I'm not too sure what to do with this information, or how to debug further...
Thanks...

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Wed Apr 01, 2020 11:18 pm
by toTOW
That's not a sign that your GPU is in good shape ... :(

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Thu Apr 02, 2020 1:01 am
by m1geo
That's what I thought.

I'm new to GPUs. I'm an electronic engineer, so I don't know the details of GPUs, Nvidia settings, parameters, etc., but I have a pragmatic, considered approach.

What I have learned this evening is that reducing the power limit makes things behave, and I can complete the test.

Image

My current working theory is that there is either a power issue or a clock speed issue in a state that the lower power limit prevents the GPU from entering. The GPU came from a friend, but maybe he dabbled with overclocking it in the past and didn't remember. Thanks all.
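For anyone wanting to repeat the power-limit experiment, here is a rough sketch. It assumes the limit is set through `nvidia-smi -pl` (the post does not say which tool or value was used), and the helper names and the wattage in the usage note are hypothetical:

```python
import subprocess


def power_limit_command(watts, gpu_index=0):
    """Build the nvidia-smi call that caps board power for one GPU."""
    if watts <= 0:
        raise ValueError("power limit must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts, gpu_index=0):
    """Run the command; this usually needs root, and nvidia-smi
    rejects values outside the range the card allows."""
    subprocess.run(power_limit_command(watts, gpu_index), check=True)
```

Something like `apply_power_limit(120)` would then cap GPU 0 at 120 W until the limit is changed again or the driver is reloaded.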

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Thu Apr 02, 2020 2:55 am
by ipkh
I've never managed to get a bad GPU to work via downclocking. And I have sent at least 4 cards back for warranty service in the past 5 years. 24/7 folding just has a habit of revealing faults with graphics cards.
You should definitely contact the manufacturer about warranty status on that card.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Thu Apr 02, 2020 12:50 pm
by m1geo
Thanks for the heads up.

Using Unigine Heaven https://benchmark.unigine.com/heaven looping, I am able to keep the GPU stable at these offsets:
* Graphics: -70
* Memory: +500

However, a 10-minute FAHBench session doesn't like that. For FAHBench, I need to run:
* Graphics: -90
* Memory: +300

The card is an Asus ROG Strix GTX 1070 O8G with a heavy factory overclock, so I guess I'm just reducing that default a little. If I had a Windows key, I'd check to see how it performed on Windows. Maybe worth a shot.
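As a sketch only: on Linux, offsets like these are commonly applied with `nvidia-settings`, which requires the Coolbits option to be enabled in the X configuration. The code below assumes that route; the performance level index `3` and the helper name `clock_offset_command` are assumptions and may differ per card and driver:

```python
def clock_offset_command(graphics_offset, memory_offset, gpu_index=0, perf_level=3):
    """Build an nvidia-settings call that applies both clock offsets.

    perf_level is the performance level the offsets apply to; 3 is an
    assumption here and should be checked against the card's own levels.
    """
    target = f"[gpu:{gpu_index}]"
    return [
        "nvidia-settings",
        "-a", f"{target}/GPUGraphicsClockOffset[{perf_level}]={graphics_offset}",
        "-a", f"{target}/GPUMemoryTransferRateOffset[{perf_level}]={memory_offset}",
    ]


# The FAHBench-stable values quoted above:
fahbench_cmd = clock_offset_command(-90, 300)
```

The resulting list can be passed to `subprocess.run(fahbench_cmd, check=True)` from a session that has access to the X display.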

Thanks for the help/advice.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Thu Apr 02, 2020 3:05 pm
by kevinjos
m1geo - I do not see the same compile error reported by florinandrei. I think we may be dealing with separate issues. Do you consistently see the BAD_WORK_UNIT warning?

florinandrei - have you tried to compile the CUDA samples as shown by m1geo above?

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Thu Apr 02, 2020 11:59 pm
by toTOW
Rule number one of overclocking with FAH: don't touch the VRAM clocks! It creates more issues than it adds performance.

However, factory overclocks should work ... if they don't, then the card needs an RMA.

Does the card run fine in FurMark with manufacturer default clocks?

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Fri Apr 03, 2020 1:16 pm
by m1geo
I finally found my issue. It's bizarre! One of the fan bearings has failed. When the controller tried to spin the fans up, the fan would spin a bit, then jam, then drag the 12V rail down on the GPU. That caused all kinds of weirdness. Simply unplugging the one fan and the card works fine. I have ordered 3 new fans. Thanks for the patience and the advice!

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Fri Apr 03, 2020 1:44 pm
by Neil-B
Wow … damn good spot/catch … at least fans are (I believe) cheaper than a new GPU card !!

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Sat Apr 04, 2020 12:35 pm
by toTOW
That was a nasty one ... :(

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Sat Apr 04, 2020 6:54 pm
by kevinjos
Right on, good catch! I'm curious if florinandrei was able to sort out their compile error. Is there a set of standard programs to test the system's ability to compile OpenCL code?
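Not a standard test suite, but a minimal sketch of the kind of check meant here, assuming the third-party `pyopencl` package is installed. The kernel and the function name `try_build_on_all_devices` are made up for illustration:

```python
# A trivial kernel; any driver that can compile OpenCL should accept it.
KERNEL_SRC = """
__kernel void scale(__global float *data, const float factor) {
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


def try_build_on_all_devices(source=KERNEL_SRC):
    """Compile `source` for every OpenCL device found.

    Returns a dict mapping device name to None on success,
    or to the error text if the build failed.
    """
    import pyopencl as cl  # third-party package, assumed to be installed

    results = {}
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            ctx = cl.Context(devices=[device])
            try:
                cl.Program(ctx, source).build()
                results[device.name] = None
            except cl.Error as exc:
                results[device.name] = str(exc)
    return results
```

Calling `try_build_on_all_devices()` and printing the result would show whether the driver's OpenCL compiler works at all, independent of the folding core.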

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Posted: Sun Apr 05, 2020 1:02 pm
by Roadpower
Nice catch indeed.