ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Moderators: Site Moderators, FAHC Science Team

Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

Hello all,

I've been folding pretty much CPU-only up to this point, and today I tried to set up GPU folding on my Ubuntu 18.04 VM, with an Nvidia Quadro P2000 passed through. I can perform hardware transcoding, and generally the GPU seems to work outside of folding.

I do see a lot of "No WUs available for this configuration" messages, which seems to be normal with so many folks folding.

When I do get a WU this happens:

04:38:34:WU01:FS01:Connecting to 18.218.241.186:80
04:38:34:WU01:FS01:Assigned to work server 13.82.98.119
04:38:34:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106GL [Quadro P2000] [MED-XN71]  3935 from 13.82.98.119
04:38:34:WU01:FS01:Connecting to 13.82.98.119:8080
04:38:54:WU00:FS01:Upload complete
04:38:54:WU00:FS01:Server responded WORK_ACK (400)
04:38:54:WU00:FS01:Cleaning up
04:39:14:WU01:FS01:Downloading 161.50MiB
04:39:19:WU01:FS01:Download complete
04:39:19:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13874 run:0 clone:1723 gen:48 core:0x22 unit:0x0000003b0d5262775e7ade27d2d17acb
04:39:19:WU01:FS01:Starting
04:39:19:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 22008 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
04:39:19:WU01:FS01:Started FahCore on PID 22496
04:39:19:WU01:FS01:Core PID:22500
04:39:19:WU01:FS01:FahCore 0x22 started
04:39:19:WU01:FS01:0x22:*********************** Log Started 2020-04-23T04:39:19Z ***********************
04:39:19:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
04:39:19:WU01:FS01:0x22:       Type: 0x22
04:39:19:WU01:FS01:0x22:       Core: Core22
04:39:19:WU01:FS01:0x22:    Website: https://foldingathome.org/
04:39:19:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
04:39:19:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
04:39:19:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
04:39:19:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22496 -checkpoint 15
04:39:19:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
04:39:19:WU01:FS01:0x22:             0 -gpu 0
04:39:19:WU01:FS01:0x22:     Config: <none>
04:39:19:WU01:FS01:0x22:************************************ Build *************************************
04:39:19:WU01:FS01:0x22:    Version: 0.0.2
04:39:19:WU01:FS01:0x22:       Date: Dec 6 2019
04:39:19:WU01:FS01:0x22:       Time: 21:20:17
04:39:19:WU01:FS01:0x22: Repository: Git
04:39:19:WU01:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
04:39:19:WU01:FS01:0x22:     Branch: core22
04:39:19:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
04:39:19:WU01:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
04:39:19:WU01:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
04:39:19:WU01:FS01:0x22:       Bits: 64
04:39:19:WU01:FS01:0x22:       Mode: Release
04:39:19:WU01:FS01:0x22:************************************ System ************************************
04:39:19:WU01:FS01:0x22:        CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
04:39:19:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 37 Stepping 1
04:39:19:WU01:FS01:0x22:       CPUs: 24
04:39:19:WU01:FS01:0x22:     Memory: 137.72GiB
04:39:19:WU01:FS01:0x22:Free Memory: 12.13GiB
04:39:19:WU01:FS01:0x22:    Threads: POSIX_THREADS
04:39:19:WU01:FS01:0x22: OS Version: 4.15
04:39:19:WU01:FS01:0x22:Has Battery: false
04:39:19:WU01:FS01:0x22: On Battery: false
04:39:19:WU01:FS01:0x22: UTC Offset: -4
04:39:19:WU01:FS01:0x22:        PID: 22500
04:39:19:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
04:39:19:WU01:FS01:0x22:         OS: Linux 4.15.0-72-generic x86_64
04:39:19:WU01:FS01:0x22:    OS Arch: AMD64
04:39:19:WU01:FS01:0x22:********************************************************************************
04:39:19:WU01:FS01:0x22:Project: 13874 (Run 0, Clone 1723, Gen 48)
04:39:19:WU01:FS01:0x22:Unit: 0x0000003b0d5262775e7ade27d2d17acb
04:39:19:WU01:FS01:0x22:Reading tar file core.xml
04:39:19:WU01:FS01:0x22:Reading tar file integrator.xml
04:39:19:WU01:FS01:0x22:Reading tar file state.xml
04:39:20:WU01:FS01:0x22:Reading tar file system.xml
04:39:21:WU01:FS01:0x22:Digital signatures verified
04:39:21:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:39:21:WU01:FS01:0x22:Version 0.0.2
04:39:47:WU01:FS01:0x22:ERROR:exception: Error compiling kernel: 
04:39:47:WU01:FS01:0x22:Saving result file ../logfile_01.txt
04:39:47:WU01:FS01:0x22:Saving result file science.log
04:39:47:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:39:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13874 run:0 clone:1723 gen:48 core:0x22 unit:0x0000003b0d5262775e7ade27d2d17acb
04:39:47:WU01:FS01:Uploading 7.50KiB to 13.82.98.119
04:39:47:WU01:FS01:Connecting to 13.82.98.119:8080
04:39:47:WU00:FS01:Connecting to 65.254.110.245:80
04:39:47:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:47:WU00:FS01:Connecting to 18.218.241.186:80
04:39:47:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:47:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 18.218.241.186:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:48:ERROR:WU00:FS01:Exception: Could not get an assignment
04:39:48:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 18.218.241.186:80
04:39:48:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
04:39:48:WU00:FS01:Connecting to 65.254.110.245:80
04:39:48:WU00:FS01:Assigned to work server 3.133.76.19
04:39:48:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106GL [Quadro P2000] [MED-XN71]  3935 from 3.133.76.19
04:39:48:WU00:FS01:Connecting to 3.133.76.19:8080
It happens pretty regularly, which leads me to believe I have an issue with my config.
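
To put a number on "pretty regularly", the client log can be scanned for core return statuses and core exceptions. The following is a minimal sketch, assuming the log lines keep the format shown above; the function name `summarize_failures` and the two regular expressions are illustrative and not part of FAHClient.

```python
import re
from collections import Counter

# Matches client lines of the form seen in this thread, e.g.
#   04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")
# Matches core lines such as "...:0x22:ERROR:exception: Error compiling kernel:"
EXCEPTION_RE = re.compile(r"ERROR:exception: (.*)")


def summarize_failures(log_lines):
    """Return (status counts, exception message counts) for the given log lines."""
    statuses = Counter()
    exceptions = Counter()
    for line in log_lines:
        returned = RETURN_RE.search(line)
        if returned:
            statuses[returned.group(1)] += 1
        raised = EXCEPTION_RE.search(line)
        if raised:
            exceptions[raised.group(1).strip()] += 1
    return statuses, exceptions


# Lines copied from the logs posted in this thread.
sample = [
    "04:39:47:WU01:FS01:0x22:ERROR:exception: Error compiling kernel: ",
    "04:39:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "04:39:48:ERROR:WU00:FS01:Exception: Could not get an assignment",
    "18:31:05:WU00:FS00:0x22:ERROR:exception: Error compiling kernel:",
    "18:31:06:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

statuses, exceptions = summarize_failures(sample)
print(statuses)
print(exceptions)
```

If every failure carries the same exception text, as here, that points at the local setup rather than at individual work units.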

Here is my config; the CUDA and Nvidia drivers are both found.

I also installed OpenCL, since initially I got an OpenCL error.

*********************** Log Started 2020-04-23T04:24:36Z ***********************
04:24:36:****************************** FAHClient ******************************
04:24:36:        Version: 7.6.9
04:24:36:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:24:36:      Copyright: 2020 foldingathome.org
04:24:36:       Homepage: https://foldingathome.org/
04:24:36:           Date: Apr 17 2020
04:24:36:           Time: 18:11:26
04:24:36:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:           Args: --child /etc/fahclient/config.xml --run-as fahclient
04:24:36:                 --pid-file=/var/run/fahclient.pid --daemon
04:24:36:         Config: /etc/fahclient/config.xml
04:24:36:******************************** CBang ********************************
04:24:36:           Date: Apr 17 2020
04:24:36:           Time: 18:10:13
04:24:36:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie -fPIC
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:******************************* System ********************************
04:24:36:            CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
04:24:36:         CPU ID: GenuineIntel Family 6 Model 37 Stepping 1
04:24:36:           CPUs: 24
04:24:36:         Memory: 137.72GiB
04:24:36:    Free Memory: 15.48GiB
04:24:36:        Threads: POSIX_THREADS
04:24:36:     OS Version: 4.15
04:24:36:    Has Battery: false
04:24:36:     On Battery: false
04:24:36:     UTC Offset: -4
04:24:36:            PID: 22008
04:24:36:            CWD: /var/lib/fahclient
04:24:36:             OS: Linux 4.15.0-72-generic x86_64
04:24:36:        OS Arch: AMD64
04:24:36:           GPUs: 1
04:24:36:          GPU 0: Bus:19 Slot:0 Func:0 NVIDIA:7 GP106GL [Quadro P2000] [MED-XN71]
04:24:36:                 3935
04:24:36:  CUDA Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:6.1 Driver:10.1
04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64
04:24:36:******************************* libFAH ********************************
04:24:36:           Date: Apr 15 2020
04:24:36:           Time: 21:43:24
04:24:36:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
04:24:36:         Branch: master
04:24:36:       Compiler: GNU 8.3.0
04:24:36:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
04:24:36:                 -funroll-loops -fno-pie
04:24:36:       Platform: linux2 4.19.0-5-amd64
04:24:36:           Bits: 64
04:24:36:           Mode: Release
04:24:36:***********************************************************************
04:24:36:<config>
04:24:36:  <!-- Client Control -->
04:24:36:  <fold-anon v='true'/>
04:24:36:
04:24:36:  <!-- HTTP Server -->
04:24:36:  <allow v='192.168.1.0/24'/>
04:24:36:
04:24:36:  <!-- Slot Control -->
04:24:36:  <power v='MEDIUM'/>
04:24:36:
04:24:36:  <!-- User Information -->
04:24:36:  <team v='227802'/>
04:24:36:  <user v='Dravor'/>
04:24:36:
04:24:36:  <!-- Web Server -->
04:24:36:  <web-allow v='192.168.1.0/24'/>
04:24:36:
04:24:36:  <!-- Folding Slots -->
04:24:36:  <slot id='1' type='GPU'/>
04:24:36:</config>
Any help would be totally appreciated!

Thanks!
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by PantherX »

Welcome to the F@H Forum Dravor,

What temperatures do you see when folding starts on your GPU?

Also, I noticed that you're not using a passkey. Using one is recommended, both for security reasons and for bonus points. Here's the link if you would like to read more about it: https://foldingathome.org/support/faq/points/passkey/
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

PantherX wrote:Welcome to the F@H Forum Dravor,

What temperatures do you see when folding starts on your GPU?

Also, I noticed that you're not using a passkey. Using one is recommended, both for security reasons and for bonus points. Here's the link if you would like to read more about it: https://foldingathome.org/support/faq/points/passkey/
From the logs it doesn't look like it ever starts folding, it just throws that error.

Temps this morning have been between 40 and 44 degrees Celsius. The card is inside a Dell R720, so cooling has never been an issue up to this point.

Thanks!
Joe_H
Site Admin
Posts: 7993
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Joe_H »

Did you install the drivers, CUDA or OpenCL after installing the client? If so, it is possible the GPU slot is not configured with the right indexes and is trying to run the folding core on the wrong device. Could you try deleting and recreating the GPU folding slot?

The installer normally does a better job of setting this up, but the 7.6.9 client does not always detect GPUs properly at installation time. There is also a beta out if you want to try that: https://foldingathome.org/beta/.
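
One way to check the index question from the log alone is to compare the OpenCL platform/device the client detected at startup with the indexes it passes to the core. This is a minimal sketch, assuming the two line formats shown in the logs earlier in this thread; `opencl_indexes_match` is a hypothetical helper, not a client feature.

```python
import re

# Startup line, e.g.
#   04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64
DETECTED_RE = re.compile(r"OpenCL Device \d+: Platform:(\d+) Device:(\d+)")
# Core launch arguments, e.g. "... -opencl-platform 0 -opencl-device 0 ..."
LAUNCHED_RE = re.compile(r"-opencl-platform (\d+) -opencl-device (\d+)")


def opencl_indexes_match(log_lines):
    """Return True/False if both values were found, or None if either is missing."""
    detected = None
    launched = None
    for line in log_lines:
        found = DETECTED_RE.search(line)
        if found:
            detected = (int(found.group(1)), int(found.group(2)))
        used = LAUNCHED_RE.search(line)
        if used:
            launched = (int(used.group(1)), int(used.group(2)))
    if detected is None or launched is None:
        return None
    return detected == launched


# Lines copied from the logs posted in this thread.
sample = [
    "04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64",
    ("04:39:19:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper "
     "/var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 "
     "-dir 01 -suffix 01 -version 706 -lifeline 22008 -checkpoint 15 -gpu-vendor nvidia "
     "-opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0"),
]

print(opencl_indexes_match(sample))
```

In the log posted above, both the detected and the launched platform/device are 0, so a mismatch would not show up in this particular check. The sketch only looks at the last detected device, so it is not meant for multi-GPU machines.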
Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

Joe_H wrote: Did you install the drivers, CUDA or OpenCL after installing the client? If so, it is possible the GPU slot is not configured with the right indexes and is trying to run the folding core on the wrong device. Could you try deleting and recreating the GPU folding slot?

The installer normally does a better job of setting this up, but the 7.6.9 client does not always detect GPUs properly at installation time. There is also a beta out if you want to try that: https://foldingathome.org/beta/.
I believe I installed OpenCL after, but what's odd is that I even stopped FAHClient and tried a Docker install as well, and saw the same error.

I've gone ahead and installed the beta, removed the GPU folding slot, and let fahclient add it back. Waiting to get a workload to see what happens.

I have no CPU slot enabled so it should keep the logs pretty clean.

Thanks again!
Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

And same thing...

Is it just --verbose to turn on all the verbose logging?


18:30:54:WU00:FS00:0x22:Project: 14432 (Run 0, Clone 939, Gen 31)
18:30:54:WU00:FS00:0x22:Unit: 0x000000270d5262775e8b4d5dea98d64c
18:30:54:WU00:FS00:0x22:Reading tar file core.xml
18:30:54:WU00:FS00:0x22:Reading tar file integrator.xml
18:30:54:WU00:FS00:0x22:Reading tar file state.xml
18:30:54:WU00:FS00:0x22:Reading tar file system.xml
18:30:55:WU00:FS00:0x22:Digital signatures verified
18:30:55:WU00:FS00:0x22:Folding@home GPU Core22 Folding@home Core
18:30:55:WU00:FS00:0x22:Version 0.0.2
18:31:05:WU00:FS00:0x22:ERROR:exception: Error compiling kernel:
18:31:05:WU00:FS00:0x22:Saving result file ../logfile_01.txt
18:31:05:WU00:FS00:0x22:Saving result file science.log
18:31:05:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:31:06:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:31:06:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14432 run:0 clone:939 gen:31 core:0x22 unit:0x000000270d5262775e8b4d5dea98d64c
18:31:06:WU00:FS00:Uploading 8.00KiB to 13.82.98.119
18:31:06:WU00:FS00:Connecting to 13.82.98.119:8080
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by HaloJones »

You say this is on an Ubuntu VM. What is the actual host running? Generally speaking, VMs don't necessarily have the same level of access to the underlying hardware as a native OS.
single 1070

Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

HaloJones wrote: You say this is on an Ubuntu VM. What is the actual host running? Generally speaking, VMs don't necessarily have the same level of access to the underlying hardware as a native OS.

It's running on top of ESXi 6.5, with the GPU passed through. While I agree with you, there are a ton of Unraid users running P2000s in VMs via the passthrough method. From a functional standpoint, I can use the GPU for video transcoding with zero issues.

So I turned verbose logging on but I still get limited info on the failure:

18:40:07:WU01:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:40:07:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:40:07:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:13873 run:0 clone:959 gen:58 core:0x22 unit:0x000000490d5262775e791a4799e3aaf9


Is there any way for me to check the WUs I am getting against a known list of WUs that are having issues? Is it possible that WUs are so sparse that I'm only picking up ones which have issues?
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by HaloJones »

No way that all the units could be bad. All units that make it to normal have been through multiple phases of testing. Project 13873 (your most recent failed unit) has been in normal running for nearly a month.

This is something else, not the units themselves.
single 1070

Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Neil-B »

If you put the WUs' PRCGs (Project, Run, Clone, Gen) into the WU Status app, you will be able to check whether anyone else has received any of them and returned them OK.
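
The PRCG values can be pulled straight from the "Sending unit results" lines in the client log. This is a minimal sketch, assuming the line format shown in this thread; the `PRCG` tuple and both helper functions are illustrative names only, and the lookup itself still has to be done by hand.

```python
import re
from typing import List, NamedTuple, Optional


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Matches the "project:13874 run:0 clone:1723 gen:48" part of a client log line.
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")


def extract_prcg(line: str) -> Optional[PRCG]:
    """Return the PRCG found in a log line, or None if the line has none."""
    match = PRCG_RE.search(line)
    if match is None:
        return None
    return PRCG(*(int(value) for value in match.groups()))


def faulty_prcgs(log_lines) -> List[PRCG]:
    """Collect the PRCG of every unit the client reported as FAULTY."""
    results = []
    for line in log_lines:
        if "error:FAULTY" not in line:
            continue
        prcg = extract_prcg(line)
        if prcg is not None:
            results.append(prcg)
    return results


# Lines copied from the logs posted in this thread.
sample = [
    "04:39:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13874 run:0 clone:1723 gen:48 core:0x22 unit:0x0000003b0d5262775e7ade27d2d17acb",
    "18:31:06:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14432 run:0 clone:939 gen:31 core:0x22 unit:0x000000270d5262775e8b4d5dea98d64c",
    "18:40:07:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:13873 run:0 clone:959 gen:58 core:0x22 unit:0x000000490d5262775e791a4799e3aaf9",
]

for prcg in faulty_prcgs(sample):
    print(prcg.project, prcg.run, prcg.clone, prcg.gen)
```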
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Dravor
Posts: 10
Joined: Thu Apr 23, 2020 3:55 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by Dravor »

It was the 430.64 driver. I updated to 435.21 and I'm folding.

Thanks!


NVIDIA-SMI 435.21    Driver Version: 435.21    CUDA Version: 10.1
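
For anyone comparing their own setup, the driver version the client sees is on the "OpenCL Device" line of the startup log. This is a minimal sketch that reads it and compares it against the version that worked here. The 435.21 threshold is only the single data point from this thread (430.64 failed, 435.21 worked), not an official minimum, and the helper names are made up for illustration.

```python
import re

# Startup line, e.g.
#   04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64
DRIVER_RE = re.compile(r"OpenCL Device \d+:.*Driver:(\d+(?:\.\d+)*)")

# Assumption: taken from this thread only, not from any official requirement.
WORKED_IN_THIS_THREAD = (435, 21)


def parse_driver_version(line):
    """Return the driver version as a tuple of ints, or None if the line has none."""
    match = DRIVER_RE.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version, minimum):
    """Compare two version tuples component by component."""
    return version >= minimum


# Line copied from the client log posted earlier in this thread.
line = "04:24:36:OpenCL Device 0: Platform:0 Device:0 Bus:19 Slot:0 Compute:1.2 Driver:430.64"
version = parse_driver_version(line)
print(version, is_at_least(version, WORKED_IN_THIS_THREAD))
```

The CUDA line in the same log reports `Driver:10.1`, which is the CUDA version rather than the Nvidia driver version, so the pattern deliberately matches only the OpenCL line.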
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am

Re: ubuntu 18.04, GPU Folding,Nothing but bad WU's.

Post by PantherX »

Dravor wrote:...I turned verbose logging on but I still get limited info on the failure...
Please note that for the client, we recommend keeping the default verbosity value of 3 even while troubleshooting. Higher values make the log harder to read and aren't recommended unless the developers specifically ask for them.