
Re: Waiting for work on 192.0.2.1 ?

Posted: Wed Aug 05, 2020 2:25 am
by JimboPalmer
JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.

"Every company has a test environment. Some very lucky ones are fortunate enough to have a completely separate one they can dedicate to Production." - KD5MDK

Re: Waiting for work on 192.0.2.1 ?

Posted: Wed Aug 05, 2020 2:44 am
by anandhanju
"Every company has a test environment. Some very lucky ones are fortunate enough to have a completely separate one they can dedicate to Production." - KD5MDK

Gold! :D

Re: Waiting for work on 192.0.2.1 ?

Posted: Wed Aug 05, 2020 4:12 pm
by TurboAsterix
JimboPalmer wrote:JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.
At FAH's scale, at minimum a three- to four-tier approach ...
Development System -> Quality System -> Public Quality System (I am sure you will find some volunteers) -> Production

But maybe this or something similar is already the case, and sometimes errors just slip through :mrgreen: That's just the way it is.

Best regards,
TurboAsterix

Re: Waiting for work on 192.0.2.1 ?

Posted: Wed Aug 05, 2020 4:15 pm
by DocJonz
@JohnChodera - I appreciate the candour, as I'm sure other folks will.

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 07, 2020 7:35 pm
by JohnChodera
> JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.

@JimboPalmer : We hear you! We actually have four stages of quality control that correspond to various levels of test (INTERNAL, BETA, ADVANCED) and production (FAH), but the issues you bring up impact the one central point of all of these systems: The assignment server and the GPUs.txt file shared between these systems. As you surmise, this is a liability, and we're looking into how we might better partition these systems without creating too much additional complexity that creates even more potential for failure.

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 07, 2020 7:47 pm
by JohnChodera
Just wanted to clarify in a bit more technical detail what occurred with clients being redirected to 192.0.2.1:
1. On Fri 31 July, we noted that a lot of core22 projects were erroring out with reports of misconfigured OpenCL platforms:

Code: Select all

ERROR:There is no registered Platform called "OpenCL"
2. Perceiving this as an issue that was saturating the work servers, we made a change to the AS on Fri night that redirected clients reporting misconfigured OpenCL platforms to 192.0.2.1
3. This surprisingly took out a lot of volunteers who had been happily folding with GPUs
4. @Bruce noticed this and alerted us about the problem
5. FAH devs met Mon 3 Aug and determined this was due to a flaw in client logic: If a client had more than one OpenCL driver installed and one of them did not have any devices configured, the client would mistakenly believe no OpenCL devices were configured on any platform, even though core22 could run just fine. As an immediate fix, we updated AS constraints to allow any configured OpenCL or CUDA platforms to get work, which should hopefully restore most affected volunteers. A more permanent fix will correct this logic error in the next client update, which should be out very soon.
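
The flaw described in step 5 can be illustrated with a small sketch. This is not the actual FAHClient code: `Platform`, `detect_buggy`, and `detect_fixed` are hypothetical names, and the early return is only an assumption about the kind of mistake involved, chosen because it reproduces the described symptom (one empty OpenCL platform hiding devices on another).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Platform:
    """Hypothetical stand-in for one installed OpenCL platform (driver)."""
    name: str
    devices: List[str] = field(default_factory=list)


def detect_buggy(platforms: List[Platform]) -> List[str]:
    """Assumed faulty logic: give up as soon as any platform has no devices."""
    found: List[str] = []
    for platform in platforms:
        if not platform.devices:
            # One empty platform makes the whole scan report "no devices".
            return []
        found.extend(platform.devices)
    return found


def detect_fixed(platforms: List[Platform]) -> List[str]:
    """Corrected logic: skip empty platforms and keep scanning the rest."""
    found: List[str] = []
    for platform in platforms:
        found.extend(platform.devices)
    return found


# Example host: one driver with no devices, one with two working GPUs.
installed = [
    Platform("empty-driver"),
    Platform("AMD Accelerated Parallel Processing", ["gfx900", "gfx900"]),
]
```

With this `installed` list, `detect_buggy` reports no devices at all, while `detect_fixed` finds both GPUs.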

Again, the FAH dev team working on GPU cores is working to improve the overall experience for everyone. Thanks so much for sticking with us!

~ John Chodera // MSKCC

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 07, 2020 8:25 pm
by JohnChodera
@JimboPalmer: You'll be pleased to hear that the issue of a completely separate test FAH network again came up in our FAH operations chat today. We definitely want to make this happen---we're just still short on developer resources.
We have a supplement proposal into the NIH to help out with developer resources that we should hear about soon---please send good vibes our way!

~ John Chodera // MSKCC

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 07, 2020 9:41 pm
by JimboPalmer
JohnChodera: I fought my bosses for a Development system, a Test/Training system and a Production system. I know the ability of bosses to not understand why Production has to be pure. I wish you well!

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 14, 2020 7:38 pm
by Whompithian
I have been unable to get the workaround (reinstalling/reconfiguring OpenCL) to work on CentOS 8 with AMD ROCm 3.5.1 in a configuration that previously succeeded in finishing GPU work units:

Code: Select all

*********************** Log Started 2020-08-14T00:44:34Z ***********************
2020-08-14:00:44:34:Trying to access database...
2020-08-14:00:44:34:Successfully acquired database lock
2020-08-14:00:44:34:Read GPUs.txt
2020-08-14:00:44:34:Enabled folding slot 00: READY cpu:64
2020-08-14:00:44:34:************************ FAHClient *************************
2020-08-14:00:44:34:    Version: 7.6.13
2020-08-14:00:44:34:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-14:00:44:34:  Copyright: 2020 foldingathome.org
2020-08-14:00:44:34:   Homepage: https://foldingathome.org/
2020-08-14:00:44:34:       Date: Apr 28 2020
2020-08-14:00:44:34:       Time: 04:20:27
2020-08-14:00:44:34:   Revision: 5a652817f46116b6e135503af97f18e094414e3b
2020-08-14:00:44:34:     Branch: master
2020-08-14:00:44:34:   Compiler: GNU 4.9.4
2020-08-14:00:44:34:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34:   Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34:       Bits: 64
2020-08-14:00:44:34:       Mode: Release
2020-08-14:00:44:34:       Args: --config=/etc/fahclient/config.xml --chdir=/var/lib/fahclient/
2020-08-14:00:44:34:             --child
2020-08-14:00:44:34:     Config: /etc/fahclient/config.xml
2020-08-14:00:44:34:************************** CBang ***************************
2020-08-14:00:44:34:       Date: Apr 25 2020
2020-08-14:00:44:34:       Time: 00:07:55
2020-08-14:00:44:34:   Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
2020-08-14:00:44:34:     Branch: master
2020-08-14:00:44:34:   Compiler: GNU 4.9.4
2020-08-14:00:44:34:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34:             -fPIC
2020-08-14:00:44:34:   Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34:       Bits: 64
2020-08-14:00:44:34:       Mode: Release
2020-08-14:00:44:34:************************** System **************************
2020-08-14:00:44:34:        CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-14:00:44:34:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-14:00:44:34:       CPUs: 64
2020-08-14:00:44:34:     Memory: 125.52GiB
2020-08-14:00:44:34:Free Memory: 124.65GiB
2020-08-14:00:44:34:    Threads: POSIX_THREADS
2020-08-14:00:44:34: OS Version: 4.18
2020-08-14:00:44:34:Has Battery: false
2020-08-14:00:44:34: On Battery: false
2020-08-14:00:44:34: UTC Offset: -7
2020-08-14:00:44:34:        PID: 1575
2020-08-14:00:44:34:        CWD: /var/lib/fahclient
2020-08-14:00:44:34:         OS: Linux 4.18.0-147.el8.x86_64 x86_64
2020-08-14:00:44:34:    OS Arch: AMD64
2020-08-14:00:44:34:       GPUs: 2
2020-08-14:00:44:34:      GPU 0: Bus:10 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-14:00:44:34:      GPU 1: Bus:69 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-14:00:44:34:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
2020-08-14:00:44:34:             libcuda.so: cannot open shared object file: No such file or
2020-08-14:00:44:34:             directory
2020-08-14:00:44:34:     OpenCL: Not detected: clGetDeviceIDs() returned -1
2020-08-14:00:44:34:************************** libFAH **************************
2020-08-14:00:44:34:       Date: Apr 15 2020
2020-08-14:00:44:34:       Time: 21:43:27
2020-08-14:00:44:34:   Revision: 216968bc7025029c841ed6e36e81a03a316890d3
2020-08-14:00:44:34:     Branch: master
2020-08-14:00:44:34:   Compiler: GNU 4.9.4
2020-08-14:00:44:34:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34:   Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34:       Bits: 64
2020-08-14:00:44:34:       Mode: Release
2020-08-14:00:44:34:************************************************************
2020-08-14:00:44:34:<config>
2020-08-14:00:44:34:  <!-- Folding Core -->
2020-08-14:00:44:34:  <checkpoint v='5'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Folding Slot Configuration -->
2020-08-14:00:44:34:  <client-type v='advanced'/>
2020-08-14:00:44:34:  <cpus v='64'/>
2020-08-14:00:44:34:  <disable-viz v='true'/>
2020-08-14:00:44:34:  <max-packet-size v='big'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- GUI -->
2020-08-14:00:44:34:  <gui-enabled v='false'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- HTTP Server -->
2020-08-14:00:44:34:  <max-connect-time v='604800'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Logging -->
2020-08-14:00:44:34:  <log-date v='true'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Remote Command Server -->
2020-08-14:00:44:34:  <command-address v='127.0.0.1'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Slot Control -->
2020-08-14:00:44:34:  <power v='FULL'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- User Information -->
2020-08-14:00:44:34:  <passkey v='*****'/>
2020-08-14:00:44:34:  <team v='40524'/>
2020-08-14:00:44:34:  <user v='Whompithian'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Web Server Sessions -->
2020-08-14:00:44:34:  <session-lifetime v='0'/>
2020-08-14:00:44:34:  <session-timeout v='0'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Work Unit Control -->
2020-08-14:00:44:34:  <stall-detection-enabled v='true'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34:  <!-- Folding Slots -->
2020-08-14:00:44:34:  <slot id='0' type='CPU'/>
2020-08-14:00:44:34:</config>
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign1.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign2.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign3.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign4.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Failed to find any IP addresses for assignment servers
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign1.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign2.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign3.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign4.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Failed to find any IP addresses for assignment servers
2020-08-14:00:45:34:WU00:FS00:Connecting to assign1.foldingathome.org:80
2020-08-14:00:45:34:WU00:FS00:Assigned to work server 3.21.157.11
2020-08-14:00:45:34:WU00:FS00:Requesting new work unit for slot 00: READY cpu:64 from 3.21.157.11
2020-08-14:00:45:34:WU00:FS00:Connecting to 3.21.157.11:8080
2020-08-14:00:45:34:16:127.0.0.1:New Web session
2020-08-14:00:45:35:WU00:FS00:Downloading 2.83MiB
2020-08-14:00:45:36:WU00:FS00:Download complete
2020-08-14:00:45:36:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14802 run:1655 clone:0 gen:178 core:0xa7 unit:0x000000c603159d0b5eb180b5645c5592
2020-08-14:00:45:36:WU00:FS00:Starting
2020-08-14:00:45:36:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1575 -checkpoint 5 -np 64
2020-08-14:00:45:36:WU00:FS00:Started FahCore on PID 2397
2020-08-14:00:45:36:WU00:FS00:Core PID:2401
2020-08-14:00:45:36:WU00:FS00:FahCore 0xa7 started
2020-08-14:00:45:37:WU00:FS00:0xa7:*********************** Log Started 2020-08-14T00:45:36Z ***********************
2020-08-14:00:45:37:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
2020-08-14:00:45:37:WU00:FS00:0xa7:       Type: 0xa7
2020-08-14:00:45:37:WU00:FS00:0xa7:       Core: Gromacs
2020-08-14:00:45:37:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2397 -checkpoint 5 -np 64
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ CBang *************************************
2020-08-14:00:45:37:WU00:FS00:0xa7:       Date: Nov 27 2019
2020-08-14:00:45:37:WU00:FS00:0xa7:       Time: 11:26:54
2020-08-14:00:45:37:WU00:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
2020-08-14:00:45:37:WU00:FS00:0xa7:     Branch: master
2020-08-14:00:45:37:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
2020-08-14:00:45:37:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:45:37:WU00:FS00:0xa7:             -fno-pie -fPIC
2020-08-14:00:45:37:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
2020-08-14:00:45:37:WU00:FS00:0xa7:       Bits: 64
2020-08-14:00:45:37:WU00:FS00:0xa7:       Mode: Release
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ System ************************************
2020-08-14:00:45:37:WU00:FS00:0xa7:        CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-14:00:45:37:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-14:00:45:37:WU00:FS00:0xa7:       CPUs: 64
2020-08-14:00:45:37:WU00:FS00:0xa7:     Memory: 125.52GiB
2020-08-14:00:45:37:WU00:FS00:0xa7:Free Memory: 124.31GiB
2020-08-14:00:45:37:WU00:FS00:0xa7:    Threads: POSIX_THREADS
2020-08-14:00:45:37:WU00:FS00:0xa7: OS Version: 4.18
2020-08-14:00:45:37:WU00:FS00:0xa7:Has Battery: false
2020-08-14:00:45:37:WU00:FS00:0xa7: On Battery: false
2020-08-14:00:45:37:WU00:FS00:0xa7: UTC Offset: -7
2020-08-14:00:45:37:WU00:FS00:0xa7:        PID: 2401
2020-08-14:00:45:37:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
2020-08-14:00:45:37:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
2020-08-14:00:45:37:WU00:FS00:0xa7:    Version: 0.0.19
2020-08-14:00:45:37:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-14:00:45:37:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
2020-08-14:00:45:37:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
2020-08-14:00:45:37:WU00:FS00:0xa7:       Date: Nov 26 2019
2020-08-14:00:45:37:WU00:FS00:0xa7:       Time: 00:41:42
2020-08-14:00:45:37:WU00:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
2020-08-14:00:45:37:WU00:FS00:0xa7:     Branch: master
2020-08-14:00:45:37:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
2020-08-14:00:45:37:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:45:37:WU00:FS00:0xa7:             -fno-pie
2020-08-14:00:45:37:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
2020-08-14:00:45:37:WU00:FS00:0xa7:       Bits: 64
2020-08-14:00:45:37:WU00:FS00:0xa7:       Mode: Release
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ Build *************************************
2020-08-14:00:45:37:WU00:FS00:0xa7:       SIMD: avx_256
2020-08-14:00:45:37:WU00:FS00:0xa7:********************************************************************************
2020-08-14:00:45:37:WU00:FS00:0xa7:Project: 14802 (Run 1655, Clone 0, Gen 178)
2020-08-14:00:45:37:WU00:FS00:0xa7:Unit: 0x000000c603159d0b5eb180b5645c5592
2020-08-14:00:45:37:WU00:FS00:0xa7:Reading tar file core.xml
2020-08-14:00:45:37:WU00:FS00:0xa7:Reading tar file frame178.tpr
2020-08-14:00:45:37:WU00:FS00:0xa7:Digital signatures verified
2020-08-14:00:45:37:WU00:FS00:0xa7:Calling: mdrun -s frame178.tpr -o frame178.trr -cpt 5 -nt 64
2020-08-14:00:45:37:WU00:FS00:0xa7:Steps: first=0 total=250000
2020-08-14:00:45:38:WU00:FS00:0xa7:Completed 1 out of 250000 steps (0%)
2020-08-14:00:45:52:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
...
Still getting "OpenCL: Not detected: clGetDeviceIDs() returned -1". Note that, in the above log, the assignment server errors are due to network interfaces not being initialized before FAHClient starts. I removed the GPU configuration blocks and increased the CPU core limit to make CPU folding more efficient while this GPU work server black hole issue remains. With GPU slots configured with "gpu-index" and "opencl-index" I get assigned to work server 192.0.2.1 which fails with "Network unreachable."
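
For anyone wanting to check their own log for the same two symptoms, here is a minimal sketch. It assumes the log line formats seen in this thread (`OpenCL: Not detected: ...` and `Assigned to work server <ip>`); `summarize_log` is a hypothetical helper, not part of FAHClient. The address 192.0.2.1 lies in 192.0.2.0/24, which RFC 5737 reserves for documentation (TEST-NET-1), so it is never routable.

```python
import ipaddress
import re

# 192.0.2.0/24 is TEST-NET-1 (RFC 5737): reserved for documentation, not routable.
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")

# Patterns assume the log formats quoted in this thread.
ASSIGNED_RE = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")
OPENCL_RE = re.compile(r"OpenCL: Not detected: (.*)")


def summarize_log(lines):
    """Collect the OpenCL detection error and work server assignments."""
    summary = {"opencl_error": None, "work_servers": [], "blackholed": False}
    for line in lines:
        match = OPENCL_RE.search(line)
        if match:
            # For clGetDeviceIDs(), -1 is CL_DEVICE_NOT_FOUND in the OpenCL headers.
            summary["opencl_error"] = match.group(1).strip()
        match = ASSIGNED_RE.search(line)
        if match:
            try:
                address = ipaddress.ip_address(match.group(1))
            except ValueError:
                continue  # not a valid IPv4 address, ignore the line
            summary["work_servers"].append(str(address))
            if address in TEST_NET_1:
                summary["blackholed"] = True
    return summary


sample = [
    "2020-08-14:00:44:34:     OpenCL: Not detected: clGetDeviceIDs() returned -1",
    "2020-08-14:00:45:34:WU00:FS00:Assigned to work server 3.21.157.11",
]
result = summarize_log(sample)
```

If a GPU slot's assignment to 192.0.2.1 is logged in the same `Assigned to work server` format (an assumption, since that line is not quoted above), `blackholed` would come back `True` for that log.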

My concern at this point is that, with the bogus work server assignment, I can't even tell if my configuration still works. I made multiple changes in an attempt to fix the issue, without success, and I may not have reverted everything completely. Is there a timeline on the release of the updated client that is expected to resolve this issue?

Re: Waiting for work on 192.0.2.1 ?

Posted: Fri Aug 14, 2020 8:44 pm
by bruce
The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured. A GPU cannot fold unless OpenCL can be connected to it. The client attempted to assign an index value associating the GPU with OpenCL, and that attempt failed. (That's a common reason, but not the only one.) Under certain conditions, a failure of the GPU association process has resulted in the client going on to download a new assignment that the hardware was then unable to process. What we do NOT want is an endless loop of downloading a fresh assignment and then dumping it because the hardware configuration didn't allow it to be processed. We don't want you to dump an endless series of WUs.

Installing the correct drivers and a functional OpenCL support package must be done by hand. If you need help, ask for it and we'll do what we can to solve the configuration setup.

Re: Waiting for work on 192.0.2.1 ?

Posted: Sat Aug 15, 2020 2:55 am
by Whompithian
> The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured.
This is clearly not the only reason for the IP assignment since, like most others who have posted to this topic, I was folding on GPUs with very few issues until 2020-07-31. As mentioned in my previous post, I had to set "gpu-index" and "opencl-index" explicitly in the "config.xml" in order for it to work, but it did work. A more accurate way to describe the current situation is that the client is unable to detect a working OpenCL configuration, even in many circumstances where one is present. My previous question was, is there a timeline for a client that supports proper detection of the host's OpenCL configuration?
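
One way to check, independently of the client, whether the host itself exposes OpenCL devices is to summarize `clinfo` output. The sketch below assumes the `Number of platforms:` and `Number of devices:` line format printed by the ROCm `clinfo` build in this post; other `clinfo` builds may format these lines differently. `count_opencl_devices` is a hypothetical helper name.

```python
import re

PLATFORMS_RE = re.compile(r"\s*Number of platforms:\s*(\d+)")
DEVICES_RE = re.compile(r"\s*Number of devices:\s*(\d+)")


def count_opencl_devices(clinfo_text):
    """Return (platform_count, total_device_count) parsed from clinfo text."""
    platforms = 0
    devices = 0
    for line in clinfo_text.splitlines():
        match = PLATFORMS_RE.match(line)
        if match:
            platforms = int(match.group(1))
            continue
        match = DEVICES_RE.match(line)
        if match:
            # clinfo prints one device count per platform, so sum them.
            devices += int(match.group(1))
    return platforms, devices


sample_output = """Number of platforms:                             1
  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
"""
counts = count_opencl_devices(sample_output)
```

A nonzero device count from `clinfo`, combined with `OpenCL: Not detected` in the client log, points at client-side detection rather than at the driver installation.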

If you still believe this is a configuration issue on my end, here is what I have:

Code: Select all

[root@folding ~]# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)

Code: Select all

[root@folding ~]# dnf list installed | grep @ROCm
comgr3.5.0.x86_64                             1.6.0.143_rocm_rel_3.5_30_e24e8c1-1            @ROCm
hsa-amd-aqlprofile3.5.0.x86_64                1.0.0-1                                        @ROCm
hsa-ext-rocr-dev3.5.0.x86_64                  1.1.30500.0_rocm_rel_3.5_30_def83d8a-1         @ROCm
hsa-rocr-dev3.5.0.x86_64                      1.1.30500.0_rocm_rel_3.5_30_def83d8a-1         @ROCm
hsakmt-roct.x86_64                            1.0.9_347_gd4b224f-1                           @ROCm
rocm-opencl-dev3.5.0.x86_64                   2.0.20191-1                                    @ROCm
rocm-opencl3.5.0.x86_64                       2.0.20191-1                                    @ROCm

Code: Select all

[root@folding ~]# dnf info ocl-icd
Last metadata expiration check: 0:09:12 ago on Fri 14 Aug 2020 07:07:16 PM PDT.
Installed Packages
Name         : ocl-icd
Version      : 2.2.12
Release      : 1.el8
Architecture : x86_64
Size         : 145 k
Source       : ocl-icd-2.2.12-1.el8.src.rpm
Repository   : @System
From repo    : AppStream
Summary      : OpenCL ICD Bindings
URL          : https://forge.imag.fr/projects/ocl-icd/
License      : BSD
Description  : OpenCL ICD Bindings.

Code: Select all

[root@folding ~]# tree /opt/
/opt/
├── rocm -> rocm-3.5.0
└── rocm-3.5.0
    ├── hsa
    │   ├── include
    │   │   └── hsa
    │   │       ├── Brig.h
    │   │       ├── amd_hsa_common.h
    │   │       ├── amd_hsa_elf.h
    │   │       ├── amd_hsa_kernel_code.h
    │   │       ├── amd_hsa_queue.h
    │   │       ├── amd_hsa_signal.h
    │   │       ├── hsa.h
    │   │       ├── hsa_api_trace.h
    │   │       ├── hsa_ext_amd.h
    │   │       ├── hsa_ext_finalize.h
    │   │       ├── hsa_ext_image.h
    │   │       ├── hsa_ven_amd_aqlprofile.h
    │   │       └── hsa_ven_amd_loader.h
    │   └── lib
    │       ├── libhsa-ext-image64.so -> libhsa-ext-image64.so.1
    │       ├── libhsa-ext-image64.so.1 -> libhsa-ext-image64.so.1.1.30500
    │       ├── libhsa-ext-image64.so.1.1.30500
    │       ├── libhsa-runtime64.so -> libhsa-runtime64.so.1
    │       ├── libhsa-runtime64.so.1 -> libhsa-runtime64.so.1.1.30500
    │       └── libhsa-runtime64.so.1.1.30500
    ├── hsa-amd-aqlprofile
    │   └── lib
    │       ├── libhsa-amd-aqlprofile64.so -> libhsa-amd-aqlprofile64.so.1
    │       ├── libhsa-amd-aqlprofile64.so.1 -> libhsa-amd-aqlprofile64.so.1.0.30500
    │       └── libhsa-amd-aqlprofile64.so.1.0.30500
    ├── include
    │   ├── amd_comgr.h
    │   ├── hsa -> ../hsa/include/hsa
    │   ├── opencl1.2-c.pch
    │   └── opencl2.0-c.pch
    ├── lib
    │   ├── cmake
    │   │   └── amd_comgr
    │   │       ├── amd_comgr-config-version.cmake
    │   │       ├── amd_comgr-config.cmake
    │   │       ├── amd_comgr-targets-release.cmake
    │   │       └── amd_comgr-targets.cmake
    │   ├── libOpenCL.so -> ../opencl/lib/libOpenCL.so.1.2
    │   ├── libOpenCL.so.1 -> ../opencl/lib/libOpenCL.so.1.2
    │   ├── libOpenCL.so.1.2 -> ../opencl/lib/libOpenCL.so.1.2
    │   ├── libamd_comgr.so -> libamd_comgr.so.1
    │   ├── libamd_comgr.so.1 -> libamd_comgr.so.1.6.30500
    │   ├── libamd_comgr.so.1.6.30500
    │   ├── libhsa-ext-image64.so -> ../hsa/lib/libhsa-ext-image64.so
    │   ├── libhsa-runtime64.so -> ../hsa/lib/libhsa-runtime64.so
    │   └── libhsa-runtime64.so.1 -> ../hsa/lib/libhsa-runtime64.so.1
    ├── lib64
    │   ├── libhsakmt.so -> libhsakmt.so.1
    │   ├── libhsakmt.so.1 -> libhsakmt.so.1.0.30500
    │   └── libhsakmt.so.1.0.30500
    ├── opencl
    │   ├── bin
    │   │   └── clinfo
    │   ├── include
    │   │   └── CL
    │   │       ├── cl.h
    │   │       ├── cl.hpp
    │   │       ├── cl2.hpp
    │   │       ├── cl_dx9_media_sharing_intel.h
    │   │       ├── cl_ext.h
    │   │       ├── cl_ext_intel.h
    │   │       ├── cl_gl.h
    │   │       ├── cl_gl_ext.h
    │   │       ├── cl_icd.h
    │   │       ├── cl_platform.h
    │   │       ├── cl_va_api_media_sharing_intel.h
    │   │       ├── cl_version.h
    │   │       └── opencl.h
    │   └── lib
    │       ├── libOpenCL.so -> libOpenCL.so.1
    │       ├── libOpenCL.so.1 -> libOpenCL.so.1.2
    │       ├── libOpenCL.so.1.2
    │       └── libamdocl64.so
    └── share
        ├── amd_comgr
        │   ├── LICENSE.txt
        │   ├── NOTICES.txt
        │   └── README.md
        └── doc
            └── hsakmt
                └── LICENSE.md

23 directories, 63 files

Code: Select all

[root@folding ~]# /opt/rocm/opencl/bin/clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 AMD-APP (3137.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:                               PCI[ B#10, D#0, F#0 ]
  Max compute units:                             64
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1750Mhz
  Address bits:                                  64
  Max memory allocation:                         7287183769
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26751
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            8573157376
  Constant buffer size:                          7287183769
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          2992216473
  Max global variable size:                      7287183769
  Max global variable preferred total size:      8573157376
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x7ff0a4ef9cf0
  Name:                                          gfx900
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                3137.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:                               PCI[ B#69, D#0, F#0 ]
  Max compute units:                             64
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1750Mhz
  Address bits:                                  64
  Max memory allocation:                         7287183769
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26751
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            8573157376
  Constant buffer size:                          7287183769
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          2992216473
  Max global variable size:                      7287183769
  Max global variable preferred total size:      8573157376
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x7ff0a4ef9cf0
  Name:                                          gfx900
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                3137.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

Code:

[root@folding ~]# sudo -u fahclient /opt/rocm/opencl/bin/clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 AMD-APP (3137.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:                               PCI[ B#10, D#0, F#0 ]
  Max compute units:                             64
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1750Mhz
  Address bits:                                  64
  Max memory allocation:                         7287183769
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26751
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            8573157376
  Constant buffer size:                          7287183769
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          2992216473
  Max global variable size:                      7287183769
  Max global variable preferred total size:      8573157376
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x7f2733913cf0
  Name:                                          gfx900
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                3137.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:                               PCI[ B#69, D#0, F#0 ]
  Max compute units:                             64
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1750Mhz
  Address bits:                                  64
  Max memory allocation:                         7287183769
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26751
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            8573157376
  Constant buffer size:                          7287183769
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          2992216473
  Max global variable size:                      7287183769
  Max global variable preferred total size:      8573157376
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x7f2733913cf0
  Name:                                          gfx900
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                3137.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

Code:

[root@folding ~]# ls -lha /etc/OpenCL/vendors/
total 12K
drwxr-xr-x. 2 root root 4.0K Aug 10 16:53 .
drwxr-xr-x. 3 root root 4.0K Jun 15 19:29 ..
-rw-r--r--. 1 root root   15 Aug 10 16:53 amdocl64.icd

Code:

[root@folding ~]# cat /etc/OpenCL/vendors/amdocl64.icd
libamdocl64.so

Code:

[root@folding ~]# ls -lh /dev/kfd
crw-rw-rw-. 1 root render 237, 0 Aug 14 18:56 /dev/kfd

Code:

[root@folding ~]# ls -lha /dev/dri/
total 0
drwxr-xr-x.  3 root root        140 Aug 14 18:56 .
drwxr-xr-x. 19 root root       3.6K Aug 14 18:56 ..
drwxr-xr-x.  2 root root        120 Aug 14 18:56 by-path
crw-rw----.  1 root video  226,   0 Aug 14 18:56 card0
crw-rw----.  1 root video  226,   1 Aug 14 18:56 card1
crw-rw-rw-.  1 root render 226, 128 Aug 14 18:56 renderD128
crw-rw-rw-.  1 root render 226, 129 Aug 14 18:56 renderD129

Code:

[root@folding ~]# groups fahclient
fahclient : fahclient video render systemd-journal

Code:

[root@folding fold]# /usr/bin/FAHClient --config=./config.xml --chdir=./
2020-08-15:02:43:41:Removing old file 'logs/log-20200806-043504.txt'
2020-08-15:02:43:41:Trying to access database...
2020-08-15:02:43:41:Successfully acquired database lock
2020-08-15:02:43:41:Read GPUs.txt
2020-08-15:02:43:41:Enabled folding slot 00: PAUSED cpu:54 (by user)
2020-08-15:02:43:41:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:Enabled folding slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:ERROR:No compute devices matched GPU #0 {
2020-08-15:02:43:41:ERROR:  "vendor": 4098,
2020-08-15:02:43:41:ERROR:  "device": 26751,
2020-08-15:02:43:41:ERROR:  "type": 1,
2020-08-15:02:43:41:ERROR:  "species": 5,
2020-08-15:02:43:41:ERROR:  "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
2020-08-15:02:43:41:ERROR:}.  You may need to update your graphics drivers.
2020-08-15:02:43:41:ERROR:No compute devices matched GPU #1 {
2020-08-15:02:43:41:ERROR:  "vendor": 4098,
2020-08-15:02:43:41:ERROR:  "device": 26751,
2020-08-15:02:43:41:ERROR:  "type": 1,
2020-08-15:02:43:41:ERROR:  "species": 5,
2020-08-15:02:43:41:ERROR:  "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
2020-08-15:02:43:41:ERROR:}.  You may need to update your graphics drivers.
2020-08-15:02:43:41:************************ FAHClient *************************
2020-08-15:02:43:41:    Version: 7.6.13
2020-08-15:02:43:41:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-15:02:43:41:  Copyright: 2020 foldingathome.org
2020-08-15:02:43:41:   Homepage: https://foldingathome.org/
2020-08-15:02:43:41:       Date: Apr 28 2020
2020-08-15:02:43:41:       Time: 04:20:27
2020-08-15:02:43:41:   Revision: 5a652817f46116b6e135503af97f18e094414e3b
2020-08-15:02:43:41:     Branch: master
2020-08-15:02:43:41:   Compiler: GNU 4.9.4
2020-08-15:02:43:41:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41:   Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41:       Bits: 64
2020-08-15:02:43:41:       Mode: Release
2020-08-15:02:43:41:       Args: --config=./config.xml --chdir=./
2020-08-15:02:43:41:     Config: /root/fold/./config.xml
2020-08-15:02:43:41:************************** CBang ***************************
2020-08-15:02:43:41:       Date: Apr 25 2020
2020-08-15:02:43:41:       Time: 00:07:55
2020-08-15:02:43:41:   Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
2020-08-15:02:43:41:     Branch: master
2020-08-15:02:43:41:   Compiler: GNU 4.9.4
2020-08-15:02:43:41:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41:             -fPIC
2020-08-15:02:43:41:   Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41:       Bits: 64
2020-08-15:02:43:41:       Mode: Release
2020-08-15:02:43:41:************************** System **************************
2020-08-15:02:43:41:        CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-15:02:43:41:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-15:02:43:41:       CPUs: 64
2020-08-15:02:43:41:     Memory: 125.52GiB
2020-08-15:02:43:41:Free Memory: 123.61GiB
2020-08-15:02:43:41:    Threads: POSIX_THREADS
2020-08-15:02:43:41: OS Version: 4.18
2020-08-15:02:43:41:Has Battery: false
2020-08-15:02:43:41: On Battery: false
2020-08-15:02:43:41: UTC Offset: -7
2020-08-15:02:43:41:        PID: 2809
2020-08-15:02:43:41:        CWD: /root/fold
2020-08-15:02:43:41:         OS: Linux 4.18.0-147.el8.x86_64 x86_64
2020-08-15:02:43:41:    OS Arch: AMD64
2020-08-15:02:43:41:       GPUs: 2
2020-08-15:02:43:41:      GPU 0: Bus:10 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:      GPU 1: Bus:69 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
2020-08-15:02:43:41:             libcuda.so: cannot open shared object file: No such file or
2020-08-15:02:43:41:             directory
2020-08-15:02:43:41:     OpenCL: Not detected: clGetDeviceIDs() returned -1
2020-08-15:02:43:41:************************** libFAH **************************
2020-08-15:02:43:41:       Date: Apr 15 2020
2020-08-15:02:43:41:       Time: 21:43:27
2020-08-15:02:43:41:   Revision: 216968bc7025029c841ed6e36e81a03a316890d3
2020-08-15:02:43:41:     Branch: master
2020-08-15:02:43:41:   Compiler: GNU 4.9.4
2020-08-15:02:43:41:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41:   Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41:       Bits: 64
2020-08-15:02:43:41:       Mode: Release
2020-08-15:02:43:41:************************************************************
2020-08-15:02:43:41:<config>
2020-08-15:02:43:41:  <!-- Folding Core -->
2020-08-15:02:43:41:  <checkpoint v='5'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Folding Slot Configuration -->
2020-08-15:02:43:41:  <client-type v='advanced'/>
2020-08-15:02:43:41:  <cpus v='54'/>
2020-08-15:02:43:41:  <disable-viz v='true'/>
2020-08-15:02:43:41:  <max-packet-size v='big'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- GUI -->
2020-08-15:02:43:41:  <gui-enabled v='false'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- HTTP Server -->
2020-08-15:02:43:41:  <http-addresses v='127.0.0.1:7397'/>
2020-08-15:02:43:41:  <max-connect-time v='604800'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Logging -->
2020-08-15:02:43:41:  <log-date v='true'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Remote Command Server -->
2020-08-15:02:43:41:  <command-address v='127.0.0.1'/>
2020-08-15:02:43:41:  <command-port v='36331'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Slot Control -->
2020-08-15:02:43:41:  <power v='FULL'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- User Information -->
2020-08-15:02:43:41:  <passkey v='*****'/>
2020-08-15:02:43:41:  <team v='40524'/>
2020-08-15:02:43:41:  <user v='Whompithian'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Web Server Sessions -->
2020-08-15:02:43:41:  <session-lifetime v='0'/>
2020-08-15:02:43:41:  <session-timeout v='0'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Work Unit Control -->
2020-08-15:02:43:41:  <stall-detection-enabled v='true'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41:  <!-- Folding Slots -->
2020-08-15:02:43:41:  <slot id='0' type='CPU'>
2020-08-15:02:43:41:    <paused v='true'/>
2020-08-15:02:43:41:  </slot>
2020-08-15:02:43:41:  <slot id='1' type='GPU'>
2020-08-15:02:43:41:    <gpu-index v='0'/>
2020-08-15:02:43:41:    <opencl-index v='0'/>
2020-08-15:02:43:41:  </slot>
2020-08-15:02:43:41:  <slot id='2' type='GPU'>
2020-08-15:02:43:41:    <gpu-index v='1'/>
2020-08-15:02:43:41:    <opencl-index v='1'/>
2020-08-15:02:43:41:  </slot>
2020-08-15:02:43:41:</config>
2020-08-15:02:43:41:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:41:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:42:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:42:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:43:WU00:FS01:Assigned to work server 192.0.2.1
2020-08-15:02:43:43:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:43:WU00:FS01:Connecting to 192.0.2.1:8080
2020-08-15:02:43:43:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:43:WU00:FS01:Connecting to 192.0.2.1:80
2020-08-15:02:43:43:WU01:FS02:Assigned to work server 192.0.2.1
2020-08-15:02:43:43:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:43:WU01:FS02:Connecting to 192.0.2.1:8080
2020-08-15:02:43:43:ERROR:WU00:FS01:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:43:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:43:WU01:FS02:Connecting to 192.0.2.1:80
2020-08-15:02:43:43:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:44:ERROR:WU01:FS02:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:44:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:44:WU00:FS01:Assigned to work server 192.0.2.1
2020-08-15:02:43:44:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:44:WU00:FS01:Connecting to 192.0.2.1:8080
2020-08-15:02:43:44:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:44:WU00:FS01:Connecting to 192.0.2.1:80
2020-08-15:02:43:44:WU01:FS02:Assigned to work server 192.0.2.1
2020-08-15:02:43:44:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:44:WU01:FS02:Connecting to 192.0.2.1:8080
2020-08-15:02:43:44:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:44:WU01:FS02:Connecting to 192.0.2.1:80
2020-08-15:02:43:44:ERROR:WU00:FS01:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:45:ERROR:WU01:FS02:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
^C2020-08-15:02:43:46:Caught signal SIGINT(2) on PID 2809
2020-08-15:02:43:46:Exiting, please wait. . .
2020-08-15:02:43:47:Clean exit

I also have "strace" output from "/opt/rocm/opencl/bin/clinfo" and "/usr/bin/FAHClient", but those are quite large and I do not see a way to attach files to this post. I do not have any logs old enough to demonstrate success at GPU folding. Please let me know if there is any more information I can provide that would be of use. Also, please address my question concerning a timeline on a client update.
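
For anyone comparing their own logs against the one above, here is a minimal sketch that pulls out the three symptoms visible in it: the OpenCL detection failure, the "No compute devices matched" errors, and the assignments to 192.0.2.1. The patterns are copied from the lines quoted in this post; the function name `summarize_log` is made up for illustration, and other client versions may word these lines differently. For reference, -1 is the value of `CL_DEVICE_NOT_FOUND` in the OpenCL headers.

```python
import re

# Patterns taken from the FAHClient 7.6.13 log quoted in this thread.
# Assumption: other client versions may phrase these lines differently.
OPENCL_FAIL = re.compile(r"OpenCL: Not detected: (.+)$")
NO_MATCH = re.compile(r"ERROR:No compute devices matched GPU #(\d+)")
BOGUS_ASSIGN = re.compile(r"Assigned to work server 192\.0\.2\.1\b")


def summarize_log(lines):
    """Collect the GPU-detection symptoms found in an iterable of log lines."""
    summary = {"opencl_error": None, "unmatched_gpus": [], "bogus_assignments": 0}
    for line in lines:
        found = OPENCL_FAIL.search(line)
        if found:
            summary["opencl_error"] = found.group(1).strip()
        found = NO_MATCH.search(line)
        if found:
            summary["unmatched_gpus"].append(int(found.group(1)))
        if BOGUS_ASSIGN.search(line):
            summary["bogus_assignments"] += 1
    return summary


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "2020-08-15:02:43:41:ERROR:No compute devices matched GPU #0 {",
        "2020-08-15:02:43:41:     OpenCL: Not detected: clGetDeviceIDs() returned -1",
        "2020-08-15:02:43:43:WU00:FS01:Assigned to work server 192.0.2.1",
    ]
    print(summarize_log(sample))
```

To run it on a real file, pass an open file object, for example `summarize_log(open("log.txt"))`, with the path adjusted to wherever the client writes its log.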

Re: Waiting for work on 192.0.2.1 ?

Posted: Sat Aug 15, 2020 8:54 am
by bruce
The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured.
If you still believe this is a configuration issue on my end, here is what I have ...
Where in that first quote does the word "you" appear? It wasn't a configuration error that you made. The client did it on its own, and then it encountered a GPU in a state that could not be configured.
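
As a side note on why that address can never connect: 192.0.2.1 sits inside one of the ranges that RFC 5737 reserves for documentation, so it is not routed on the public internet. That fits with it being a placeholder rather than a real work server, though that reading is an inference from the logs. A small sketch using Python's standard `ipaddress` module (the helper name `is_documentation_address` is made up for illustration):

```python
import ipaddress

# The three IPv4 documentation ranges reserved by RFC 5737.
DOC_NETS = [
    ipaddress.ip_network(net)
    for net in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]


def is_documentation_address(addr):
    """Return True if addr lies in an RFC 5737 documentation range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DOC_NETS)


if __name__ == "__main__":
    print(is_documentation_address("192.0.2.1"))
```

So a "Network is unreachable" error for that address says nothing about your network connection.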

The problem is a result of the GPU being configured based on three values of "*index" settings. The latest beta version of FAHClient has eliminated the concept of "*index" and replaced it with a new concept. I suspect this will resolve future cases of the same problem, but since that beta just came out today, I'm not 100% confident that it fixes all the corner cases. At first glance it is looking good.
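
To see which "*index" values a given client has stored, the slot entries in config.xml can be listed with a short script. This is only a sketch: the XML below is trimmed from the config dump earlier in this thread, which shows two of the index options (`gpu-index` and `opencl-index`), and the function name `gpu_slot_indexes` is made up for illustration. It reads the file and does not change anything.

```python
import xml.etree.ElementTree as ET

# Slot section trimmed from the config dump posted earlier in this thread.
CONFIG = """<config>
  <slot id='0' type='CPU'>
    <paused v='true'/>
  </slot>
  <slot id='1' type='GPU'>
    <gpu-index v='0'/>
    <opencl-index v='0'/>
  </slot>
  <slot id='2' type='GPU'>
    <gpu-index v='1'/>
    <opencl-index v='1'/>
  </slot>
</config>"""


def gpu_slot_indexes(xml_text):
    """Map each GPU slot id to the '*-index' options set on that slot."""
    root = ET.fromstring(xml_text)
    result = {}
    for slot in root.findall("slot"):
        if slot.get("type") != "GPU":
            continue  # CPU slots carry no index options
        result[slot.get("id")] = {
            child.tag: child.get("v")
            for child in slot
            if child.tag.endswith("-index")
        }
    return result


if __name__ == "__main__":
    for slot_id, indexes in gpu_slot_indexes(CONFIG).items():
        print(slot_id, indexes)
```

If the values listed for a slot do not line up with the device order that clinfo reports, that mismatch is worth mentioning when posting logs.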

Re: Waiting for work on 192.0.2.1 ?

Posted: Mon Aug 17, 2020 5:46 am
by PantherX
Whompithian wrote:...Is there a timeline on the release of the updated client that is expected to resolve this issue?
Welcome to the F@H Forum Whompithian,

F@H does not provide typical ETAs, as in, a month or a date. Instead, they use this timeline:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Now: It's when the release is available
Very Soon: It's when active development is happening
Soon: It's when it is on the table
Soon-ish: It's when there's discussion happening
Not Soon: It's in the backlog
End Of Time: It's on the (ever growing) wishlist

The new client is in the Very Soon state as it does bring some new GPU detection code that will help make this a much better experience for donors :)

Re: Waiting for work on 192.0.2.1 ?

Posted: Mon Aug 17, 2020 9:32 am
by HaloJones
JohnChodera wrote:@JimboPalmer: You'll be pleased to hear that the issue of a completely separate test FAH network again came up in our FAH operations chat today. We definitely want to make this happen---we're just still short developer resources to make this happen.
We have a supplement proposal into the NIH to help out with developer resources that we should hear about soon---please send good vibes our way!

~ John Chodera // MSKCC
Critical error here was to make a change just before everyone disappears for two days...

I've built multiple $bn websites, and none make changes, even emergency ones, without having support people ready to regress them at a moment's notice.

Just sayin'.

Re: Waiting for work on 192.0.2.1 ?

Posted: Sun Sep 06, 2020 1:05 pm
by wdanwatts
Well!
I allowed my AMD Fedora system to upgrade, and now I get the dreaded assigning to 192.0.2.1 (I tried the "dnf remove mesa-libOpenCL" step, but the system didn't have that library to remove).
Any new hints?

Dan Watts