RTX folding problem - Sometimes it doesn't work
Posted: Tue Apr 09, 2019 7:00 am
by goldfries
Hi all,
I wonder if any of you could assist me with this matter, as troubleshooting it is taking too much of my time.
As the title says, folding on RTX cards has been hit and miss so far.
[case 1]
A machine that I had been folding on with the RTX 2080 worked flawlessly for a whole month; you can refer to my record of some 30 mil points acquired:
https://folding.extremeoverclocking.com ... =&u=452964
I had an event on 31st March and needed the RTX 2080, so I swapped it with a Radeon Vega 56; again no problem, it folded on as it should. Today, when I put the RTX 2080 back in place: nope, it's a no go.
Same machine, same everything AND YET it refuses to run.
The GPU is seen as READY and it keeps downloading projects, yet once a download is done it doesn't move at all.
[case 2]
Newly set up machine; again, it doesn't work. Latest drivers and software and all, yet it doesn't work.
-------------------------------
INFO : No, I'm not new to folding at home. I've been doing it for a super long time and I'm the 2nd highest folder in the country.
Highest if you count daily rate. I'm just having tremendous problems getting RTX cards to run.
My hope right now is that it miraculously runs after posting this.
I have other RTX cards (2070, 2060, ...) but no, I've not had the time to try them out yet. Hope you guys can shed some light on this matter if you've solved the same problem.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Tue Apr 09, 2019 12:22 pm
by Theodore
Linux or Windows?
Re: RTX folding problem - Sometimes it doesn't work
Posted: Tue Apr 09, 2019 2:42 pm
by JimboPalmer
If you post logs so we can see what your set up is and what the program thinks happened, I would be happy to post what I think happened.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Tue Apr 09, 2019 5:35 pm
by bruce
Case 1: Do a clean install of the drivers for the Vega. (remove the drivers for the nVidia). Then reinstall FAHClient, removing DATA.
Case 2: Post the log.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Tue Apr 09, 2019 10:09 pm
by toTOW
Yes, we need logs showing what the client detects at startup ...
Quick guesses :
Case 1 : there are probably too many OpenCL platforms installed (NV, AMD and possibly also Intel) and the client doesn't know which one to use.
Case 2 : either you installed the client as a service or you let Windows use its own driver which doesn't include OpenCL ... install drivers from a package downloaded from NV.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Wed Apr 10, 2019 2:56 am
by goldfries
Theodore wrote:Linux or Windows?
Windows 10. 1809.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Wed Apr 10, 2019 3:10 am
by goldfries
Ignore Case 2 for now, it's because of Case 2 that I'm now encountering it on Case 1.
Code:
02:37:11:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:13816 run:0 clone:56 gen:170 core:0x21 unit:0x000000e280fccb045b36d01a34e00550
02:37:11:WU01:FS01:Connecting to 128.252.203.4:8080
02:37:11:WU00:FS01:Connecting to 65.254.110.245:8080
02:37:12:WU01:FS01:Server responded WORK_ACK (400)
02:37:12:WU01:FS01:Cleaning up
02:37:12:WU00:FS01:Assigned to work server 155.247.166.220
02:37:12:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:TU104 [GeForce RTX 2080] from 155.247.166.220
02:37:12:WU00:FS01:Connecting to 155.247.166.220:8080
02:37:14:WU00:FS01:Downloading 557.15KiB
02:37:16:WU00:FS01:Download complete
02:37:16:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14164 run:0 clone:88 gen:44 core:0x21 unit:0x000000470002894c5c38bfc630a57c10
02:37:16:WU00:FS01:Starting
02:37:16:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:37:16:WU00:FS01:Starting
02:37:16:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:38:17:WU00:FS01:Starting
02:38:17:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:39:54:WU00:FS01:Starting
02:39:54:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:42:31:WU00:FS01:Starting
02:42:31:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:46:45:WU00:FS01:Starting
02:46:45:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:46:45:WU00:FS01:Sending unit results: id:00 state:SEND error:FAILED project:14164 run:0 clone:88 gen:44 core:0x21 unit:0x000000470002894c5c38bfc630a57c10
02:46:45:WU00:FS01:Connecting to 155.247.166.220:8080
02:46:46:WU01:FS01:Connecting to 65.254.110.245:8080
02:46:46:WU00:FS01:Server responded WORK_ACK (400)
02:46:46:WU00:FS01:Cleaning up
02:46:47:WU01:FS01:Assigned to work server 128.252.203.4
02:46:47:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:TU104 [GeForce RTX 2080] from 128.252.203.4
02:46:47:WU01:FS01:Connecting to 128.252.203.4:8080
02:46:48:WU01:FS01:Downloading 11.66MiB
02:46:54:WU01:FS01:Download 35.91%
02:47:00:WU01:FS01:Download 93.80%
02:47:00:WU01:FS01:Download complete
02:47:00:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13816 run:0 clone:644 gen:164 core:0x21 unit:0x000000d580fccb045b36d01c7cfff2f4
02:47:00:WU01:FS01:Starting
02:47:00:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:47:01:WU01:FS01:Starting
02:47:01:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:48:01:WU01:FS01:Starting
02:48:01:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:49:38:WU01:FS01:Starting
02:49:38:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:52:15:WU01:FS01:Starting
02:52:15:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:56:29:WU01:FS01:Starting
02:56:29:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:56:29:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:13816 run:0 clone:644 gen:164 core:0x21 unit:0x000000d580fccb045b36d01c7cfff2f4
02:56:29:WU01:FS01:Connecting to 128.252.203.4:8080
02:56:30:WU00:FS01:Connecting to 65.254.110.245:8080
02:56:30:WU01:FS01:Server responded WORK_ACK (400)
02:56:30:WU01:FS01:Cleaning up
02:56:31:WU00:FS01:Assigned to work server 155.247.166.220
02:56:31:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:TU104 [GeForce RTX 2080] from 155.247.166.220
02:56:31:WU00:FS01:Connecting to 155.247.166.220:8080
02:56:33:WU00:FS01:Downloading 504.85KiB
02:56:35:WU00:FS01:Download complete
02:56:35:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14163 run:14 clone:23 gen:41 core:0x21 unit:0x000000430002894c5c38be82d27f768a
02:56:35:WU00:FS01:Starting
02:56:35:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:56:36:WU00:FS01:Starting
02:56:36:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:57:36:WU00:FS01:Starting
02:57:36:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:59:13:WU00:FS01:Starting
02:59:13:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
So it looks like FAH keeps trying to run OpenCL when it should be running CUDA, right?
I've removed the GPU from the slot and re-added it, and even resorted to reinstalling the software with data cleared.
The FAHClient will download new WU / Update core, so it never crossed my mind to check this.
How do I go about this now?
*just done DDU, reinstalling latest drivers for Nvidia*
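Editor's note: the log above shows each work unit being retried several times and then returned as FAILED. A minimal sketch for counting those "Failed to start core" errors per work unit in a pasted log follows. The line layout it assumes (time, optional ERROR: tag, WU id, slot id, message) is inferred only from the lines in this post, and the names `LINE_RE` and `summarize` are made up for illustration; this is not an official FAH tool.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log lines in this thread:
#   HH:MM:SS:[ERROR:]WUnn:FSnn:message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<error>ERROR:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)


def summarize(log_text):
    """Count 'Failed to start core' errors per (work unit, slot)."""
    failures = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("error") and "Failed to start core" in m.group("msg"):
            failures[(m.group("wu"), m.group("slot"))] += 1
    return failures


# A few lines copied from the log in this post.
SAMPLE = """\
02:37:16:WU00:FS01:Starting
02:37:16:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:37:16:WU00:FS01:Starting
02:37:16:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
02:47:00:WU01:FS01:Starting
02:47:00:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
"""

if __name__ == "__main__":
    for (wu, slot), count in sorted(summarize(SAMPLE).items()):
        print(f"{wu} {slot}: {count} failed start(s)")
```

Run against a full log, a tally like this makes it easier to see that every unit fails with the same OpenCL message rather than with a unit-specific error.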
Re: RTX folding problem - Sometimes it doesn't work
Posted: Wed Apr 10, 2019 3:28 am
by goldfries
btw thanks all. Did a clean up and now it's working.
My mistake to assume that FAHClient is able to detect and use the right hashing method.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Wed Apr 10, 2019 3:33 am
by Joe_H
goldfries wrote:So it looks like FAH keeps trying to run OpenCL when it should be running CUDA, right?
No, because no folding cores for GPU folding use CUDA currently; they all use OpenCL. At some point in the future they might release a CUDA version of Core 22, which is currently in beta testing, but that has not happened yet and has not been determined to be definitely happening.
What can be happening is that the wrong OpenCL library is being accessed, potentially there could be ones for AMD, nVidia, Intel, or some other hardware.
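Editor's note: one way to see which OpenCL platforms a machine exposes is to list them and look for the card by name. The sketch below is only an illustration. `list_opencl_platforms` relies on the third-party `pyopencl` package (assumed to be installed), `find_gpu` and `EXAMPLE_PLATFORMS` are hypothetical names, and the example data is made up to resemble a system with a leftover AMD platform. Whether these indices correspond to the value FAHClient expects for 'opencl-index' is not confirmed here.

```python
def find_gpu(platforms, wanted):
    """Return (platform_index, device_index, platform_name) for each device
    whose name contains `wanted` (case-insensitive).

    `platforms` is a list of (platform_name, [device_name, ...]) tuples.
    """
    matches = []
    for p_idx, (p_name, devices) in enumerate(platforms):
        for d_idx, d_name in enumerate(devices):
            if wanted.lower() in d_name.lower():
                matches.append((p_idx, d_idx, p_name))
    return matches


def list_opencl_platforms():
    """Query the installed OpenCL platforms via pyopencl (assumed installed)."""
    import pyopencl as cl  # third-party; not part of the standard library

    return [(p.name, [d.name for d in p.get_devices()])
            for p in cl.get_platforms()]


# Illustrative data only: a leftover AMD platform with no devices listed
# ahead of the NVIDIA platform that actually holds the card.
EXAMPLE_PLATFORMS = [
    ("AMD Accelerated Parallel Processing", []),
    ("NVIDIA CUDA", ["GeForce RTX 2080"]),
]

if __name__ == "__main__":
    try:
        platforms = list_opencl_platforms()
    except Exception:
        # pyopencl missing or no OpenCL runtime found: use the example data.
        platforms = EXAMPLE_PLATFORMS
    for p_idx, (p_name, devices) in enumerate(platforms):
        print(p_idx, p_name, devices)
    print(find_gpu(platforms, "RTX 2080"))
```

If more than one vendor platform shows up, or the card is not on the first one, that is consistent with the "wrong OpenCL library" explanation, and a clean driver reinstall (as was done earlier in this thread) is the simpler fix.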
Re: RTX folding problem - Sometimes it doesn't work
Posted: Wed Apr 10, 2019 5:39 am
by bruce
When running OpenCL for nVidia, the CUDA drivers are also required. Apparently much of their hardware support is inside their drivers and somehow CUDA makes that work.
The OpenCL drivers for Intel have never worked very well, even on Intel hardware and they seem to recognize ONLY the Intel hardware. Again, there's some mixing of low-level hardware drivers and higher level OpenCL drivers. I suspect that only ATI/AMD have managed to keep hardware drivers and OpenCL independent of each other, but that may not be true either.
NVidia does support CUDA without OpenCL (it's their proprietary language) but as Joe said, FAH is not currently supporting any FAHCores that interface directly with CUDA.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Fri Apr 12, 2019 10:26 pm
by Theodore
It's been a while since I ran FAH on Windows, but if I'm not mistaken, last time I checked, NVidia drivers can choose CUDA over OpenGl, when you set which card you want to use for CUDA in their Nvidia-windows Control panel.
Anyway, I noticed one of my cards getting nearly twice the PPD rating, while the other card was getting very low ratings.
I assumed that resources from one card were being used for the other's WU, possibly resulting in more PCIE bandwidth (not tested), and lower overall PPD (unless Cuda was using unused resources of the secondary card, in a case where the WU the card ran, didn't use all the resources of the GPU, and they were relocated to the other WU).
I'm speculating here, but I did see NVidia cards using CUDA for folding in the past, and an unrealistic PPD count for one card (something like 2.2M PPD for a 2070, while the other card, a 2080, ran at 500k PPD).
I'm not sure if CUDA in Windows is working well at this point yet...
There may be a benefit using CUDA, on multi GPU systems that don't fold continuously.
When you're finishing the WUs to shut down your PC, there is a possibility that once one WU is finished, CUDA can speed up the second WU, by using two GPUs working on the remaining WU.
I'm just speculating here, as I haven't taken the time to test these theories yet.
Re: RTX folding problem - Sometimes it doesn't work
Posted: Sat Apr 13, 2019 3:44 am
by bruce
Theodore wrote:, NVidia drivers can choose CUDA over OpenGl, when you set which card you want to use for CUDA in their Nvidia-windows Control panel.
Did you intend to say OpenGL or OpenCL? Except for FAHViewer, FAH uses OpenCL.
There would be an advantage to re-writing the FAHCore to use CUDA instead of OpenCL -- for those who have NVidia hardware, of course. The DISadvantage is that FAH would have to support both an OpenCL version and a CUDA version (Yes, some of our Donors have ATI/AMD GPUs). FAH has to weigh the cost of porting a functioning FAHCore to CUDA vs. the increase to FAH's global throughput. That will never happen until the OpenCL version is bug-free ... and it may never happen (as in the case of Core_21).
FAH's Development resources are scarce and there are many competing ways to increase global FAH throughput. Development costs are expended judiciously.