ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 4:43 am
by Dravor
Hello all,

I've been folding pretty much just on CPU up to this point, and today tried to set up GPU folding on my Ubuntu 18.04 VM, with an Nvidia P2000 passed through. I can perform hardware transcoding, and generally the GPU seems to work outside of folding.

I do see a lot of "No WUs available for this configuration" messages, which seems to be normal with so many folks folding.

When I do get a WU this happens:

Code:

04:38:34:WU01:FS01:Connecting to 18.218.241.186:80
04:38:34:WU01:FS01:Assigned to work server 13.82.98.119
04:38:34:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106GL [Quadro P2000] [MED-XN71]  3935 from 13.82.98.119
04:38:34:WU01:FS01:Connecting to 13.82.98.119:8080
04:38:54:WU00:FS01:Upload complete
04:38:54:WU00:FS01:Server responded WORK_ACK (400)
04:38:54:WU00:FS01:Cleaning up
04:39:14:WU01:FS01:Downloading 161.50MiB
04:39:19:WU01:FS01:Download complete
04:39:19:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13874 run:0 clone:1723 gen:48 core:0x22 unit:0x0000003b0d5262775e7ade27d2d17acb
04:39:19:WU01:FS01:Starting
04:39:19:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 22008 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
04:39:19:WU01:FS01:Started FahCore on PID 22496
04:39:19:WU01:FS01:Core PID:22500
04:39:19:WU01:FS01:FahCore 0x22 started
04:39:19:WU01:FS01:0x22:*********************** Log Started 2020-04-23T04:39:19Z ***********************
04:39:19:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
04:39:19:WU01:FS01:0x22:       Type: 0x22
04:39:19:WU01:FS01:0x22:       Core: Core22
04:39:19:WU01:FS01:0x22:    Website: https://foldingathome.org/
04:39:19:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
04:39:19:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
04:39:19:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
04:39:19:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22496 -checkpoint 15
04:39:19:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
04:39:19:WU01:FS01:0x22:             0 -gpu 0
04:39:19:WU01:FS01:0x22:     Config: <none>
04:39:19:WU01:FS01:0x22:************************************ Build *************************************
04:39:19:WU01:FS01:0x22:    Version: 0.0.2
04:39:19:WU01:FS01:0x22:       Date: Dec 6 2019
04:39:19:WU01:FS01:0x22:       Time: 21:20:17
04:39:19:WU01:FS01:0x22: Repository: Git
04:39:19:WU01:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
04:39:19:WU01:FS01:0x22:     Branch: core22
04:39:19:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
04:39:19:WU01:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
04:39:19:WU01:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
04:39:19:WU01:FS01:0x22:       Bits: 64
04:39:19:WU01:FS01:0x22:       Mode: Release
04:39:19:WU01:FS01:0x22:************************************ System ************************************
04:39:19:WU01:FS01:0x22:        CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
04:39:19:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 37 Stepping 1
04:39:19:WU01:FS01:0x22:       CPUs: 24
04:39:19:WU01:FS01:0x22:     Memory: 137.72GiB
04:39:19:WU01:FS01:0x22:Free Memory: 12.13GiB
04:39:19:WU01:FS01:0x22:    Threads: POSIX_THREADS
04:39:19:WU01:FS01:0x22: OS Version: 4.15
04:39:19:WU01:FS01:0x22:Has Battery: false
04:39:19:WU01:FS01:0x22: On Battery: false
04:39:19:WU01:FS01:0x22: UTC Offset: -4
04:39:19:WU01:FS01:0x22:        PID: 22500
04:39:19:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
04:39:19:WU01:FS01:0x22:         OS: Linux 4.15.0-72-generic x86_64
04:39:19:WU01:FS01:0x22:    OS Arch: AMD64
04:39:19:WU01:FS01:0x22:********************************************************************************
04:39:19:WU01:FS01:0x22:Project: 13874 (Run 0, Clone 1723, Gen 48)
04:39:19:WU01:FS01:0x22:Unit: 0x0000003b0d5262775e7ade27d2d17acb
04:39:19:WU01:FS01:0x22:Reading tar file core.xml
04:39:19:WU01:FS01:0x22:Reading tar file integrator.xml
04:39:19:WU01:FS01:0x22:Reading tar file state.xml
04:39:20:WU01:FS01:0x22:Reading tar file system.xml
04:39:21:WU01:FS01:0x22:Digital signatures verified
04:39:21:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:39:21:WU01:FS01:0x22:Version 0.0.2
04:39:47:WU01:FS01:0x22:ERROR:exception: Error compiling kernel: 
04:39:47:WU01:FS01:0x22:Saving result file ../logfile_01.txt
04:39:47:WU01:FS01:0x22:Saving result file science.log
04:39:47:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:39:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13874 run:0 clone:1723 gen:48 core:0x22 unit:0x0000003b0d5262775e7ade27d2d17acb
04:39:47:WU01:FS01:Uploading 7.50KiB to 13.82.98.119
04:39:47:WU01:FS01:Connecting to 13.82.98.119:8080
04:39:47:WU00:FS01:Connecting to 65.254.110.245:80
04:39:47:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:47:WU00:FS01:Connecting to 18.218.241.186:80
04:39:47:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:47:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 18.218.241.186:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:48:ERROR:WU00:FS01:Exception: Could not get an assignment
04:39:48:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 18.218.241.186:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WU00:FS01:Assigned to work server 3.133.76.19
04:39:48:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106GL [Quadro P2000] [MED-XN71]  3935 from 3.133.76.19
04:39:48:WU00:FS01:Connecting to 3.133.76.19:8080
It happens pretty regularly, which leads me to believe I have an issue with my config.
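
To put a number on "pretty regularly", the return codes in a log like the one above can be tallied. This is only a minimal Python sketch: the helper name `count_core_results` is hypothetical (it is not part of FAHClient), and the regular expression simply assumes the "FahCore returned:" line format shown in the log.

```python
import re
from collections import Counter

# Matches the client's summary line for a finished or failed core run, e.g.
# "04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RETURNED = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")


def count_core_results(log_lines):
    """Tally FahCore return codes by name (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = RETURNED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log in this post.
sample = [
    "04:39:47:WU01:FS01:0x22:ERROR:exception: Error compiling kernel: ",
    "04:39:47:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT",
    "04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

print(count_core_results(sample))
```

Feeding it the lines of the full client log instead of `sample` would show whether every assigned unit ends in BAD_WORK_UNIT or only some of them.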

Here is my config; the CUDA and NVIDIA drivers are both found.

I also installed OpenCL, since initially I got an OpenCL error.

Code:

*********************** Log Started 2020-04-23T04:24:36Z ***********************
04:24:36:****************************** FAHClient ******************************
04:24:36:        Version: 7.6.9
04:24:36:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:24:36:      Copyright: 2020 foldingathome.org
04:24:36:       Homepage: https://foldingathome.org/
04:24:36:           Date: Apr 17 2020
04:24:36:           Time: 18:11:26
04:24:36:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:           Args: --child /etc/fahclient/config.xml --run-as fahclient
04:24:36:                 --pid-file=/var/run/fahclient.pid --daemon
04:24:36:         Config: /etc/fahclient/config.xml
04:24:36:******************************** CBang ********************************
04:24:36:           Date: Apr 17 2020
04:24:36:           Time: 18:10:13
04:24:36:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie -fPIC
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:******************************* System ********************************
04:24:36:            CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
04:24:36:         CPU ID: GenuineIntel Family 6 Model 37 Stepping 1
04:24:36:           CPUs: 24
04:24:36:         Memory: 137.72GiB
04:24:36:    Free Memory: 15.48GiB
04:24:36:        Threads: POSIX_THREADS
04:24:36:     OS Version: 4.15
04:24:36:    Has Battery: false
04:24:36:     On Battery: false
04:24:36:     UTC Offset: -4
04:24:36:            PID: 22008
04:24:36:            CWD: /var/lib/fahclient
04:24:36:             OS: Linux 4.15.0-72-generic x86_64
04:24:36:        OS Arch: AMD64
04:24:36:           GPUs: 1
04:24:36:          GPU 0: Bus:19 Slot:0 Func:0 NVIDIA:7 GP106GL [Quadro P2000] [MED-XN71]
04:24:36:                 3935
04:24:36:  CUDA Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:6.1 Driver:10.1
04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64
04:24:36:******************************* libFAH ********************************
04:24:36:           Date: Apr 15 2020
04:24:36:           Time: 21:43:24
04:24:36:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:***********************************************************************
04:24:36:<config>
04:24:36:  <!-- Client Control -->
04:24:36:  <fold-anon v='true'/>
04:24:36:
04:24:36:  <!-- HTTP Server -->
04:24:36:  <allow v='192.168.1.0/24'/>
04:24:36:
04:24:36:  <!-- Slot Control -->
04:24:36:  <power v='MEDIUM'/>
04:24:36:
04:24:36:  <!-- User Information -->
04:24:36:  <team v='227802'/>
04:24:36:  <user v='Dravor'/>
04:24:36:
04:24:36:  <!-- Web Server -->
04:24:36:  <web-allow v='192.168.1.0/24'/>
04:24:36:
04:24:36:  <!-- Folding Slots -->
04:24:36:  <slot id='1' type='GPU'/>
04:24:36:</config>
Any help would be totally appreciated!

Thanks!

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 8:41 am
by PantherX
Welcome to the F@H Forum Dravor,

What temperatures do you see when folding starts on your GPU?

Also, I noticed that you're not using a passkey. Using one is recommended, both for security reasons and for bonus points. Here's the link if you would like to read more about it: https://foldingathome.org/support/faq/points/passkey/

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 2:04 pm
by Dravor
From the logs it doesn't look like it ever starts folding, it just throws that error.

Temps this morning have been between 40 and 44 degrees Celsius. The card is inside a Dell R720, so cooling has never been an issue up to this point.

Thanks!

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 5:27 pm
by Joe_H
Did you install the drivers, CUDA or OpenCL after installing the client? If so, it is possible the GPU slot is not configured with the right indexes and is trying to run the folding core on the wrong device. Could you try deleting and recreating the GPU folding slot?

The installer normally does a better job of setting this up, but the 7.6.9 client does not always detect GPUs properly at installation time. There is a beta out as well if you want to try that - https://foldingathome.org/beta/.
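
One way to check for the index mismatch described above is to compare the devices listed in the client's startup banner with the arguments passed to the core. The following is a minimal Python sketch that assumes a single GPU and the log formats posted earlier in this thread; the function names are made up for illustration and are not part of FAHClient.

```python
import re

# Banner lines look like:
# "CUDA Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:6.1 Driver:10.1"
DETECTED = re.compile(r"(CUDA|OpenCL) Device (\d+): Platform:(\d+) Device:(\d+)")
# Core arguments look like: "-opencl-platform 0 -opencl-device 0 -cuda-device 0"
CORE_ARG = re.compile(r"-(opencl-platform|opencl-device|cuda-device) (\d+)")


def detected_devices(startup_text):
    """Map API name to (platform, device); assumes one GPU per API."""
    return {m.group(1): (int(m.group(3)), int(m.group(4)))
            for m in DETECTED.finditer(startup_text)}


def core_args(core_command):
    """Extract the device-selection arguments given to the FahCore."""
    return {m.group(1): int(m.group(2)) for m in CORE_ARG.finditer(core_command)}


def indexes_match(startup_text, core_command):
    """True if the core was pointed at the devices the client detected."""
    det = detected_devices(startup_text)
    args = core_args(core_command)
    if "OpenCL" not in det or "CUDA" not in det:
        return False
    opencl_ok = det["OpenCL"] == (args.get("opencl-platform"),
                                  args.get("opencl-device"))
    cuda_ok = det["CUDA"][1] == args.get("cuda-device")
    return opencl_ok and cuda_ok


# Lines copied from the logs in the first post.
startup = (
    "04:24:36:  CUDA Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:6.1 Driver:10.1\n"
    "04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64\n"
)
core_line = "-gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0"

print(indexes_match(startup, core_line))
```

For the log in the first post, the detected indexes and the core arguments are all 0, so this check passes.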

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 6:30 pm
by Dravor
I believe I installed OpenCL after, but what's odd is that I even stopped FAHClient and grabbed a Docker install as well, and saw the same error.

I've gone ahead and installed the beta, removed the GPU folding slot, and let fahclient add it back. Waiting to get a workload to see what happens.

I have no CPU slot enabled so it should keep the logs pretty clean.

Thanks again!

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 6:32 pm
by Dravor
And same thing...

Is it just --verbose to turn on all the verbose logging?


18:30:54:WU00:FS00:0x22:Project: 14432 (Run 0, Clone 939, Gen 31)
18:30:54:WU00:FS00:0x22:Unit: 0x000000270d5262775e8b4d5dea98d64c
18:30:54:WU00:FS00:0x22:Reading tar file core.xml
18:30:54:WU00:FS00:0x22:Reading tar file integrator.xml
18:30:54:WU00:FS00:0x22:Reading tar file state.xml
18:30:54:WU00:FS00:0x22:Reading tar file system.xml
18:30:55:WU00:FS00:0x22:Digital signatures verified
18:30:55:WU00:FS00:0x22:Folding@home GPU Core22 Folding@home Core
18:30:55:WU00:FS00:0x22:Version 0.0.2
18:31:05:WU00:FS00:0x22:ERROR:exception: Error compiling kernel:
18:31:05:WU00:FS00:0x22:Saving result file ../logfile_01.txt
18:31:05:WU00:FS00:0x22:Saving result file science.log
18:31:05:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:31:06:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:31:06:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14432 run:0 clone:939 gen:31 core:0x22 unit:0x000000270d5262775e8b4d5dea98d64c
18:31:06:WU00:FS00:Uploading 8.00KiB to 13.82.98.119
18:31:06:WU00:FS00:Connecting to 13.82.98.119:8080

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 6:34 pm
by HaloJones
You say this is on an Ubuntu VM. What is the actual host running? Generally speaking, VMs don't necessarily have the same level of access to the underlying hardware as a native OS.

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 6:45 pm
by Dravor
It's running on top of ESXi 6.5, with the GPU passed through. While I agree with you, there are a ton of unRAID users running P2000s in VMs via the passthrough method. From a functional standpoint, I can use the GPU for video transcoding with zero issues.

So I turned verbose logging on but I still get limited info on the failure:

18:40:07:WU01:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:40:07:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:40:07:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:13873 run:0 clone:959 gen:58 core:0x22 unit:0x000000490d5262775e791a4799e3aaf9


Is there any way for me to check the WUs I am getting against a known list of WUs that are having issues? Is it possible that WUs are so sparse that I'm only picking up ones which have issues?

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 6:58 pm
by HaloJones
No way that all the units could be bad. All units that make it to normal have been through multiple phases of testing. 13873 (your most recent failed unit) has been in normal running for nearly a month.

This is something else, not the units themselves.

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Thu Apr 23, 2020 7:04 pm
by Neil-B
If you put the WUs' PRCGs (Project, Run, Clone, Gen) into the WU status app, you will be able to check whether anyone else has received any of them and returned them OK.
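
Collecting the PRCGs by hand gets tedious, so here is a minimal Python sketch that pulls them out of log lines. It assumes the two line formats that appear in the logs posted in this thread; `extract_prcgs` is a hypothetical helper name, and the sketch does not query the WU status app itself.

```python
import re

# Core banner format: "Project: 13874 (Run 0, Clone 1723, Gen 48)"
BANNER = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Client result format: "project:13873 run:0 clone:959 gen:58"
RESULT = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")


def extract_prcgs(log_lines):
    """Return unique (project, run, clone, gen) tuples in order of appearance."""
    seen = []
    for line in log_lines:
        match = BANNER.search(line) or RESULT.search(line)
        if match:
            prcg = tuple(int(group) for group in match.groups())
            if prcg not in seen:
                seen.append(prcg)
    return seen


# Lines copied from the logs earlier in this thread.
sample = [
    "04:39:19:WU01:FS01:0x22:Project: 13874 (Run 0, Clone 1723, Gen 48)",
    "18:30:54:WU00:FS00:0x22:Project: 14432 (Run 0, Clone 939, Gen 31)",
    "18:40:07:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY "
    "project:13873 run:0 clone:959 gen:58 core:0x22 "
    "unit:0x000000490d5262775e791a4799e3aaf9",
]

for project, run, clone, gen in extract_prcgs(sample):
    print(f"Project {project}, Run {run}, Clone {clone}, Gen {gen}")
```

Each printed line gives the four numbers to enter into the WU status app.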

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Fri Apr 24, 2020 4:41 am
by Dravor
It was the 430.64 driver. Updated to 435.21 and I'm folding.

Thanks!


| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1
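
For anyone hitting the same "Error compiling kernel" message, the driver version can be read from an nvidia-smi header line like the one above and compared with the version that worked here. This is a minimal Python sketch; `KNOWN_GOOD` reflects only this one report on a Quadro P2000 and is an assumption, not an official minimum, and the function name is made up for illustration.

```python
import re

# Version that worked in this thread only; not an official requirement.
KNOWN_GOOD = (435, 21)


def parse_driver_version(smi_header):
    """Return the driver version as a tuple of ints, or None if not found."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", smi_header)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


header = "| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1"
version = parse_driver_version(header)

if version is None:
    print("Could not find a driver version in the header")
elif version < KNOWN_GOOD:
    print("Driver is older than the version reported working in this thread")
else:
    print("Driver is at least the version reported working in this thread")
```

Tuples compare element by element, so the failing 430.64 driver, parsed as (430, 64), sorts below (435, 21).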

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Posted: Fri Apr 24, 2020 7:29 am
by PantherX
Dravor wrote:...I turned verbose logging on but I still get limited info on the failure...
Please note that for the client, we recommend using the default verbosity of 3 even while troubleshooting. Higher values make the log harder to read and aren't recommended unless specifically requested by the developers.