Waiting for work on 192.0.2.1 ?
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Waiting for work on 192.0.2.1 ?
JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.
"Every company has a test environment. Some very lucky ones are fortunate enough to have a completely separate one they can dedicate to Production." - KD5MDK
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 522
- Joined: Mon Dec 03, 2007 4:33 am
- Location: Australia
Re: Waiting for work on 192.0.2.1 ?
"Every company has a test environment. Some very lucky ones are fortunate enough to have a completely separate one they can dedicate to Production." - KD5MDK
Gold!
-
- Posts: 9
- Joined: Sun Aug 02, 2020 12:46 pm
Re: Waiting for work on 192.0.2.1 ?
> JimboPalmer wrote: JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.
At FAH's scale, I would expect at minimum a three- to four-tier approach:
Development System -> Quality System -> Public Quality System (I am sure you will find some volunteers) -> Production
But maybe this or something similar is already in place, and sometimes errors just slip through. That's just the way it is.
Best regards,
TurboAsterix
-
- Posts: 244
- Joined: Thu Dec 06, 2007 6:31 pm
- Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 4080 Super
- Location: United Kingdom
Re: Waiting for work on 192.0.2.1 ?
@JohnChodera - I appreciate the candour, as I'm sure other folks will.
Folding Stats (HFM.NET): DocJonz Folding Farm Stats
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Waiting for work on 192.0.2.1 ?
> JohnChodera, I think this instance and the entirety of the Intel white list/black list fiasco strongly hint why you need a test system and a production system. Making changes without knowing the effects is unacceptable in Production.
@JimboPalmer : We hear you! We actually have four stages of quality control that correspond to various levels of test (INTERNAL, BETA, ADVANCED) and production (FAH), but the issues you bring up impact the one central point of all of these systems: The assignment server and the GPUs.txt file shared between these systems. As you surmise, this is a liability, and we're looking into how we might better partition these systems without creating too much additional complexity that creates even more potential for failure.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Waiting for work on 192.0.2.1 ?
Just wanted to clarify in a bit more technical detail what occurred with clients being redirected to 192.0.2.1:
1. On Fri 31 July, we noted that a lot of core22 projects were erroring out with reports of misconfigured OpenCL platforms:
Code: Select all
ERROR:There is no registered Platform called "OpenCL"
2. Perceiving this as an issue that was saturating the work servers, we made a change to the AS (assignment server) on Fri night that redirected clients reporting misconfigured OpenCL platforms to 192.0.2.1
3. This unexpectedly took out a lot of volunteers who had been happily folding with GPUs
4. @Bruce noticed this and alerted us about the problem
5. FAH devs met Mon 3 Aug and determined this was due to a flaw in client logic: If a client had more than one OpenCL driver installed and one of them did not have any devices configured, the client would mistakenly believe no OpenCL devices were configured on any platform, even though core22 could run just fine. As an immediate fix, we updated AS constraints to allow any client with a configured OpenCL or CUDA platform to get work, which should hopefully restore most affected volunteers. A more permanent fix will correct this logic error in the next client update, which should be out very soon.
Again, the FAH dev team working on GPU cores is working to improve the overall experience for everyone. Thanks so much for sticking with us!
~ John Chodera // MSKCC
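For readers trying to picture the flaw described in step 5, here is a minimal sketch of that kind of detection bug. It is not the FAHClient source: the `Platform` class and the `detect_buggy`/`detect_fixed` functions are hypothetical names invented for illustration, and the real client logic may differ in detail. The only OpenCL fact relied on is that `clGetDeviceIDs()` reports status -1 (`CL_DEVICE_NOT_FOUND`) for a platform with no matching devices.

```python
from dataclasses import dataclass, field
from typing import List

CL_DEVICE_NOT_FOUND = -1  # OpenCL status for "this platform has no matching devices"


@dataclass
class Platform:
    """Hypothetical stand-in for one installed OpenCL driver (platform)."""
    name: str
    devices: List[str] = field(default_factory=list)


def get_device_ids(platform):
    """Mimic clGetDeviceIDs(): return (status, devices) for one platform."""
    if not platform.devices:
        return CL_DEVICE_NOT_FOUND, []
    return 0, list(platform.devices)


def detect_buggy(platforms):
    """Flawed logic: the first empty platform aborts detection entirely."""
    found = []
    for p in platforms:
        status, devices = get_device_ids(p)
        if status != 0:
            return []  # one empty platform hides every other platform's GPUs
        found.extend(devices)
    return found


def detect_fixed(platforms):
    """Corrected logic: skip empty platforms and keep looking."""
    found = []
    for p in platforms:
        status, devices = get_device_ids(p)
        if status == 0:
            found.extend(devices)
    return found


platforms = [
    Platform("driver-without-devices"),
    Platform("driver-with-gpu", ["GPU 0"]),
]
print(detect_buggy(platforms))  # []
print(detect_fixed(platforms))  # ['GPU 0']
```

With two drivers installed and only the second one owning a GPU, the flawed version reports nothing while the corrected version still finds the device, which matches the symptom volunteers saw.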
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Waiting for work on 192.0.2.1 ?
@JimboPalmer: You'll be pleased to hear that the issue of a completely separate test FAH network again came up in our FAH operations chat today. We definitely want to make this happen; we're just still short on the developer resources to do it.
We have a supplement proposal into the NIH to help out with developer resources that we should hear about soon---please send good vibes our way!
~ John Chodera // MSKCC
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Waiting for work on 192.0.2.1 ?
JohnChodera: I fought my bosses for a Development system, a Test/Training system and a Production system. I know the ability of bosses to not understand why Production has to be pure. I wish you well!
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 39
- Joined: Thu Jun 25, 2020 12:40 am
Re: Waiting for work on 192.0.2.1 ?
I have been unable to get the workaround (reinstalling/reconfiguring OpenCL) to work on CentOS 8 with AMD ROCm 3.5.1 in a configuration that previously succeeded in finishing GPU work units. The log is posted below.
Still getting "OpenCL: Not detected: clGetDeviceIDs() returned -1". Note that, in the log below, the assignment server errors are due to network interfaces not being initialized before FAHClient starts. I removed the GPU configuration blocks and increased the CPU core limit to make CPU folding more efficient while this GPU work server black hole issue remains. With GPU slots configured with "gpu-index" and "opencl-index", I get assigned to work server 192.0.2.1, which fails with "Network unreachable."
My concern at this point is that, with the bogus work server assignment, I can't even tell if my configuration still works. I made multiple changes in an attempt to fix the issue, without success, and I may not have reverted everything completely. Is there a timeline on the release of the updated client that is expected to resolve this issue?
Code: Select all
*********************** Log Started 2020-08-14T00:44:34Z ***********************
2020-08-14:00:44:34:Trying to access database...
2020-08-14:00:44:34:Successfully acquired database lock
2020-08-14:00:44:34:Read GPUs.txt
2020-08-14:00:44:34:Enabled folding slot 00: READY cpu:64
2020-08-14:00:44:34:************************ FAHClient *************************
2020-08-14:00:44:34: Version: 7.6.13
2020-08-14:00:44:34: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-14:00:44:34: Copyright: 2020 foldingathome.org
2020-08-14:00:44:34: Homepage: https://foldingathome.org/
2020-08-14:00:44:34: Date: Apr 28 2020
2020-08-14:00:44:34: Time: 04:20:27
2020-08-14:00:44:34: Revision: 5a652817f46116b6e135503af97f18e094414e3b
2020-08-14:00:44:34: Branch: master
2020-08-14:00:44:34: Compiler: GNU 4.9.4
2020-08-14:00:44:34: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34: Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34: Bits: 64
2020-08-14:00:44:34: Mode: Release
2020-08-14:00:44:34: Args: --config=/etc/fahclient/config.xml --chdir=/var/lib/fahclient/
2020-08-14:00:44:34: --child
2020-08-14:00:44:34: Config: /etc/fahclient/config.xml
2020-08-14:00:44:34:************************** CBang ***************************
2020-08-14:00:44:34: Date: Apr 25 2020
2020-08-14:00:44:34: Time: 00:07:55
2020-08-14:00:44:34: Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
2020-08-14:00:44:34: Branch: master
2020-08-14:00:44:34: Compiler: GNU 4.9.4
2020-08-14:00:44:34: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34: -fPIC
2020-08-14:00:44:34: Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34: Bits: 64
2020-08-14:00:44:34: Mode: Release
2020-08-14:00:44:34:************************** System **************************
2020-08-14:00:44:34: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-14:00:44:34: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-14:00:44:34: CPUs: 64
2020-08-14:00:44:34: Memory: 125.52GiB
2020-08-14:00:44:34:Free Memory: 124.65GiB
2020-08-14:00:44:34: Threads: POSIX_THREADS
2020-08-14:00:44:34: OS Version: 4.18
2020-08-14:00:44:34:Has Battery: false
2020-08-14:00:44:34: On Battery: false
2020-08-14:00:44:34: UTC Offset: -7
2020-08-14:00:44:34: PID: 1575
2020-08-14:00:44:34: CWD: /var/lib/fahclient
2020-08-14:00:44:34: OS: Linux 4.18.0-147.el8.x86_64 x86_64
2020-08-14:00:44:34: OS Arch: AMD64
2020-08-14:00:44:34: GPUs: 2
2020-08-14:00:44:34: GPU 0: Bus:10 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-14:00:44:34: GPU 1: Bus:69 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-14:00:44:34: CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
2020-08-14:00:44:34: libcuda.so: cannot open shared object file: No such file or
2020-08-14:00:44:34: directory
2020-08-14:00:44:34: OpenCL: Not detected: clGetDeviceIDs() returned -1
2020-08-14:00:44:34:************************** libFAH **************************
2020-08-14:00:44:34: Date: Apr 15 2020
2020-08-14:00:44:34: Time: 21:43:27
2020-08-14:00:44:34: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
2020-08-14:00:44:34: Branch: master
2020-08-14:00:44:34: Compiler: GNU 4.9.4
2020-08-14:00:44:34: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:44:34: Platform: linux2 4.19.0-5-amd64
2020-08-14:00:44:34: Bits: 64
2020-08-14:00:44:34: Mode: Release
2020-08-14:00:44:34:************************************************************
2020-08-14:00:44:34:<config>
2020-08-14:00:44:34: <!-- Folding Core -->
2020-08-14:00:44:34: <checkpoint v='5'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Folding Slot Configuration -->
2020-08-14:00:44:34: <client-type v='advanced'/>
2020-08-14:00:44:34: <cpus v='64'/>
2020-08-14:00:44:34: <disable-viz v='true'/>
2020-08-14:00:44:34: <max-packet-size v='big'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- GUI -->
2020-08-14:00:44:34: <gui-enabled v='false'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- HTTP Server -->
2020-08-14:00:44:34: <max-connect-time v='604800'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Logging -->
2020-08-14:00:44:34: <log-date v='true'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Remote Command Server -->
2020-08-14:00:44:34: <command-address v='127.0.0.1'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Slot Control -->
2020-08-14:00:44:34: <power v='FULL'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- User Information -->
2020-08-14:00:44:34: <passkey v='*****'/>
2020-08-14:00:44:34: <team v='40524'/>
2020-08-14:00:44:34: <user v='Whompithian'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Web Server Sessions -->
2020-08-14:00:44:34: <session-lifetime v='0'/>
2020-08-14:00:44:34: <session-timeout v='0'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Work Unit Control -->
2020-08-14:00:44:34: <stall-detection-enabled v='true'/>
2020-08-14:00:44:34:
2020-08-14:00:44:34: <!-- Folding Slots -->
2020-08-14:00:44:34: <slot id='0' type='CPU'/>
2020-08-14:00:44:34:</config>
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign1.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign2.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign3.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign4.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Failed to find any IP addresses for assignment servers
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign1.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign2.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign3.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Could not get IP address for assign4.foldingathome.org: Name or service not known
2020-08-14:00:44:34:ERROR:WU00:FS00:Exception: Failed to find any IP addresses for assignment servers
2020-08-14:00:45:34:WU00:FS00:Connecting to assign1.foldingathome.org:80
2020-08-14:00:45:34:WU00:FS00:Assigned to work server 3.21.157.11
2020-08-14:00:45:34:WU00:FS00:Requesting new work unit for slot 00: READY cpu:64 from 3.21.157.11
2020-08-14:00:45:34:WU00:FS00:Connecting to 3.21.157.11:8080
2020-08-14:00:45:34:16:127.0.0.1:New Web session
2020-08-14:00:45:35:WU00:FS00:Downloading 2.83MiB
2020-08-14:00:45:36:WU00:FS00:Download complete
2020-08-14:00:45:36:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14802 run:1655 clone:0 gen:178 core:0xa7 unit:0x000000c603159d0b5eb180b5645c5592
2020-08-14:00:45:36:WU00:FS00:Starting
2020-08-14:00:45:36:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1575 -checkpoint 5 -np 64
2020-08-14:00:45:36:WU00:FS00:Started FahCore on PID 2397
2020-08-14:00:45:36:WU00:FS00:Core PID:2401
2020-08-14:00:45:36:WU00:FS00:FahCore 0xa7 started
2020-08-14:00:45:37:WU00:FS00:0xa7:*********************** Log Started 2020-08-14T00:45:36Z ***********************
2020-08-14:00:45:37:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
2020-08-14:00:45:37:WU00:FS00:0xa7: Type: 0xa7
2020-08-14:00:45:37:WU00:FS00:0xa7: Core: Gromacs
2020-08-14:00:45:37:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 706 -lifeline 2397 -checkpoint 5 -np 64
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ CBang *************************************
2020-08-14:00:45:37:WU00:FS00:0xa7: Date: Nov 27 2019
2020-08-14:00:45:37:WU00:FS00:0xa7: Time: 11:26:54
2020-08-14:00:45:37:WU00:FS00:0xa7: Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
2020-08-14:00:45:37:WU00:FS00:0xa7: Branch: master
2020-08-14:00:45:37:WU00:FS00:0xa7: Compiler: GNU 8.3.0
2020-08-14:00:45:37:WU00:FS00:0xa7: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:45:37:WU00:FS00:0xa7: -fno-pie -fPIC
2020-08-14:00:45:37:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
2020-08-14:00:45:37:WU00:FS00:0xa7: Bits: 64
2020-08-14:00:45:37:WU00:FS00:0xa7: Mode: Release
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ System ************************************
2020-08-14:00:45:37:WU00:FS00:0xa7: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-14:00:45:37:WU00:FS00:0xa7: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-14:00:45:37:WU00:FS00:0xa7: CPUs: 64
2020-08-14:00:45:37:WU00:FS00:0xa7: Memory: 125.52GiB
2020-08-14:00:45:37:WU00:FS00:0xa7:Free Memory: 124.31GiB
2020-08-14:00:45:37:WU00:FS00:0xa7: Threads: POSIX_THREADS
2020-08-14:00:45:37:WU00:FS00:0xa7: OS Version: 4.18
2020-08-14:00:45:37:WU00:FS00:0xa7:Has Battery: false
2020-08-14:00:45:37:WU00:FS00:0xa7: On Battery: false
2020-08-14:00:45:37:WU00:FS00:0xa7: UTC Offset: -7
2020-08-14:00:45:37:WU00:FS00:0xa7: PID: 2401
2020-08-14:00:45:37:WU00:FS00:0xa7: CWD: /var/lib/fahclient/work
2020-08-14:00:45:37:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
2020-08-14:00:45:37:WU00:FS00:0xa7: Version: 0.0.19
2020-08-14:00:45:37:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-14:00:45:37:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
2020-08-14:00:45:37:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
2020-08-14:00:45:37:WU00:FS00:0xa7: Date: Nov 26 2019
2020-08-14:00:45:37:WU00:FS00:0xa7: Time: 00:41:42
2020-08-14:00:45:37:WU00:FS00:0xa7: Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
2020-08-14:00:45:37:WU00:FS00:0xa7: Branch: master
2020-08-14:00:45:37:WU00:FS00:0xa7: Compiler: GNU 8.3.0
2020-08-14:00:45:37:WU00:FS00:0xa7: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-14:00:45:37:WU00:FS00:0xa7: -fno-pie
2020-08-14:00:45:37:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
2020-08-14:00:45:37:WU00:FS00:0xa7: Bits: 64
2020-08-14:00:45:37:WU00:FS00:0xa7: Mode: Release
2020-08-14:00:45:37:WU00:FS00:0xa7:************************************ Build *************************************
2020-08-14:00:45:37:WU00:FS00:0xa7: SIMD: avx_256
2020-08-14:00:45:37:WU00:FS00:0xa7:********************************************************************************
2020-08-14:00:45:37:WU00:FS00:0xa7:Project: 14802 (Run 1655, Clone 0, Gen 178)
2020-08-14:00:45:37:WU00:FS00:0xa7:Unit: 0x000000c603159d0b5eb180b5645c5592
2020-08-14:00:45:37:WU00:FS00:0xa7:Reading tar file core.xml
2020-08-14:00:45:37:WU00:FS00:0xa7:Reading tar file frame178.tpr
2020-08-14:00:45:37:WU00:FS00:0xa7:Digital signatures verified
2020-08-14:00:45:37:WU00:FS00:0xa7:Calling: mdrun -s frame178.tpr -o frame178.trr -cpt 5 -nt 64
2020-08-14:00:45:37:WU00:FS00:0xa7:Steps: first=0 total=250000
2020-08-14:00:45:38:WU00:FS00:0xa7:Completed 1 out of 250000 steps (0%)
2020-08-14:00:45:52:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
...
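For anyone else trying to tell from a log whether they have hit the same placeholder assignment, the following is a rough triage sketch, not an official tool. The function name `triage` and the report keys are made up for illustration. The line patterns are taken from the log format shown above, and the last sample line is constructed from the description in this post rather than copied from a real log. 192.0.2.1 falls in 192.0.2.0/24 (TEST-NET-1), a range reserved for documentation, so it is never a reachable work server.

```python
import ipaddress
import re

# 192.0.2.0/24 is TEST-NET-1, reserved for documentation and never routable.
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")

ASSIGNED = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")
DETECTION = re.compile(r"(CUDA|OpenCL): (.*)$")


def triage(log_text):
    """Summarize GPU detection status and work-server assignments in a log."""
    report = {"detection": {}, "servers": [], "placeholder": []}
    for line in log_text.splitlines():
        det = DETECTION.search(line)
        if det:
            # Keep the first status line seen for each API.
            report["detection"].setdefault(det.group(1), det.group(2).strip())
        srv = ASSIGNED.search(line)
        if srv:
            try:
                addr = ipaddress.ip_address(srv.group(1))
            except ValueError:
                continue  # not a valid IPv4 address, ignore the line
            key = "placeholder" if addr in TEST_NET_1 else "servers"
            report[key].append(str(addr))
    return report


sample = """\
2020-08-14:00:44:34:        CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
2020-08-14:00:44:34:      OpenCL: Not detected: clGetDeviceIDs() returned -1
2020-08-14:00:45:34:WU00:FS00:Assigned to work server 3.21.157.11
2020-08-14:00:46:00:WU01:FS01:Assigned to work server 192.0.2.1
"""

result = triage(sample)
print(result["detection"]["OpenCL"])  # Not detected: clGetDeviceIDs() returned -1
print(result["servers"])              # ['3.21.157.11']
print(result["placeholder"])          # ['192.0.2.1']
```

An entry under `placeholder` together with a "Not detected" status suggests the slot was deliberately parked by the assignment server, not that a work server is down.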
Re: Waiting for work on 192.0.2.1 ?
The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured. A GPU cannot fold unless OpenCL can be connected to it. The client attempted to assign an index value associating the GPU with OpenCL, and that attempt failed. (That's a common reason, but not the only one.) Under certain conditions, a failed GPU association has resulted in the client downloading a new assignment that the hardware was then unable to process. What we do NOT want is an endless loop of downloading a fresh WU and then dumping it because the hardware configuration didn't allow it to be processed.
Installing the correct drivers and a functional OpenCL support package must be done by hand. If you need help, ask for it and we'll do what we can to solve the configuration setup.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 39
- Joined: Thu Jun 25, 2020 12:40 am
Re: Waiting for work on 192.0.2.1 ?
> The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured.
This is clearly not the only reason for the IP assignment since, like most others who have posted to this topic, I was folding on GPUs with very few issues until 2020-07-31. As mentioned in my previous post, I had to set "gpu-index" and "opencl-index" explicitly in the "config.xml" in order for it to work, but it did work. A more accurate way to describe the current situation is that the client is unable to detect a working OpenCL configuration, even in many circumstances where one is present. My previous question was, is there a timeline for a client that supports proper detection of the host's OpenCL configuration?
If you still believe this is a configuration issue on my end, here is what I have:
Code: Select all
[root@folding ~]# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)
Code: Select all
[root@folding ~]# dnf list installed | grep @ROCm
comgr3.5.0.x86_64 1.6.0.143_rocm_rel_3.5_30_e24e8c1-1 @ROCm
hsa-amd-aqlprofile3.5.0.x86_64 1.0.0-1 @ROCm
hsa-ext-rocr-dev3.5.0.x86_64 1.1.30500.0_rocm_rel_3.5_30_def83d8a-1 @ROCm
hsa-rocr-dev3.5.0.x86_64 1.1.30500.0_rocm_rel_3.5_30_def83d8a-1 @ROCm
hsakmt-roct.x86_64 1.0.9_347_gd4b224f-1 @ROCm
rocm-opencl-dev3.5.0.x86_64 2.0.20191-1 @ROCm
rocm-opencl3.5.0.x86_64 2.0.20191-1 @ROCm
Code: Select all
[root@folding ~]# dnf info ocl-icd
Last metadata expiration check: 0:09:12 ago on Fri 14 Aug 2020 07:07:16 PM PDT.
Installed Packages
Name : ocl-icd
Version : 2.2.12
Release : 1.el8
Architecture : x86_64
Size : 145 k
Source : ocl-icd-2.2.12-1.el8.src.rpm
Repository : @System
From repo : AppStream
Summary : OpenCL ICD Bindings
URL : https://forge.imag.fr/projects/ocl-icd/
License : BSD
Description : OpenCL ICD Bindings.
Code: Select all
[root@folding ~]# tree /opt/
/opt/
├── rocm -> rocm-3.5.0
└── rocm-3.5.0
├── hsa
│ ├── include
│ │ └── hsa
│ │ ├── Brig.h
│ │ ├── amd_hsa_common.h
│ │ ├── amd_hsa_elf.h
│ │ ├── amd_hsa_kernel_code.h
│ │ ├── amd_hsa_queue.h
│ │ ├── amd_hsa_signal.h
│ │ ├── hsa.h
│ │ ├── hsa_api_trace.h
│ │ ├── hsa_ext_amd.h
│ │ ├── hsa_ext_finalize.h
│ │ ├── hsa_ext_image.h
│ │ ├── hsa_ven_amd_aqlprofile.h
│ │ └── hsa_ven_amd_loader.h
│ └── lib
│ ├── libhsa-ext-image64.so -> libhsa-ext-image64.so.1
│ ├── libhsa-ext-image64.so.1 -> libhsa-ext-image64.so.1.1.30500
│ ├── libhsa-ext-image64.so.1.1.30500
│ ├── libhsa-runtime64.so -> libhsa-runtime64.so.1
│ ├── libhsa-runtime64.so.1 -> libhsa-runtime64.so.1.1.30500
│ └── libhsa-runtime64.so.1.1.30500
├── hsa-amd-aqlprofile
│ └── lib
│ ├── libhsa-amd-aqlprofile64.so -> libhsa-amd-aqlprofile64.so.1
│ ├── libhsa-amd-aqlprofile64.so.1 -> libhsa-amd-aqlprofile64.so.1.0.30500
│ └── libhsa-amd-aqlprofile64.so.1.0.30500
├── include
│ ├── amd_comgr.h
│ ├── hsa -> ../hsa/include/hsa
│ ├── opencl1.2-c.pch
│ └── opencl2.0-c.pch
├── lib
│ ├── cmake
│ │ └── amd_comgr
│ │ ├── amd_comgr-config-version.cmake
│ │ ├── amd_comgr-config.cmake
│ │ ├── amd_comgr-targets-release.cmake
│ │ └── amd_comgr-targets.cmake
│ ├── libOpenCL.so -> ../opencl/lib/libOpenCL.so.1.2
│ ├── libOpenCL.so.1 -> ../opencl/lib/libOpenCL.so.1.2
│ ├── libOpenCL.so.1.2 -> ../opencl/lib/libOpenCL.so.1.2
│ ├── libamd_comgr.so -> libamd_comgr.so.1
│ ├── libamd_comgr.so.1 -> libamd_comgr.so.1.6.30500
│ ├── libamd_comgr.so.1.6.30500
│ ├── libhsa-ext-image64.so -> ../hsa/lib/libhsa-ext-image64.so
│ ├── libhsa-runtime64.so -> ../hsa/lib/libhsa-runtime64.so
│ └── libhsa-runtime64.so.1 -> ../hsa/lib/libhsa-runtime64.so.1
├── lib64
│ ├── libhsakmt.so -> libhsakmt.so.1
│ ├── libhsakmt.so.1 -> libhsakmt.so.1.0.30500
│ └── libhsakmt.so.1.0.30500
├── opencl
│ ├── bin
│ │ └── clinfo
│ ├── include
│ │ └── CL
│ │ ├── cl.h
│ │ ├── cl.hpp
│ │ ├── cl2.hpp
│ │ ├── cl_dx9_media_sharing_intel.h
│ │ ├── cl_ext.h
│ │ ├── cl_ext_intel.h
│ │ ├── cl_gl.h
│ │ ├── cl_gl_ext.h
│ │ ├── cl_icd.h
│ │ ├── cl_platform.h
│ │ ├── cl_va_api_media_sharing_intel.h
│ │ ├── cl_version.h
│ │ └── opencl.h
│ └── lib
│ ├── libOpenCL.so -> libOpenCL.so.1
│ ├── libOpenCL.so.1 -> libOpenCL.so.1.2
│ ├── libOpenCL.so.1.2
│ └── libamdocl64.so
└── share
├── amd_comgr
│ ├── LICENSE.txt
│ ├── NOTICES.txt
│ └── README.md
└── doc
└── hsakmt
└── LICENSE.md
23 directories, 63 files
Code: Select all
[root@folding ~]# /opt/rocm/opencl/bin/clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3137.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Device Topology: PCI[ B#10, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1750Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7ff0a4ef9cf0
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3137.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Device Topology: PCI[ B#69, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1750Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7ff0a4ef9cf0
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3137.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
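The long listing above boils down to one platform exposing two GPUs. As a convenience for comparing that against what the client reports, here is a small sketch that summarizes `clinfo` text. The function name `summarize_clinfo` is hypothetical, and the parser assumes the field labels appear exactly as in the output above ("Number of platforms:", "Number of devices:", "Board name:"); other `clinfo` builds may label fields differently.

```python
import re

COUNTS = {
    "platforms": re.compile(r"^Number of platforms:\s*(\d+)"),
    "devices": re.compile(r"^Number of devices:\s*(\d+)"),
}
BOARD = re.compile(r"^Board name:\s*(.+)$")


def summarize_clinfo(text):
    """Count platforms and devices and collect board names from clinfo text."""
    summary = {"platforms": 0, "devices": 0, "boards": []}
    for raw in text.splitlines():
        line = raw.strip()
        for key, pattern in COUNTS.items():
            match = pattern.match(line)
            if match:
                # Device counts are listed per platform, so add them up.
                summary[key] += int(match.group(1))
        board = BOARD.match(line)
        if board:
            summary["boards"].append(board.group(1).strip())
    return summary


sample = """\
Number of platforms: 1
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
"""

info = summarize_clinfo(sample)
print(info["platforms"], info["devices"], len(info["boards"]))  # 1 2 2
```

If this reports devices while the client log still says "OpenCL: Not detected", the OpenCL installation itself is probably not the problem.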
Code: Select all
[root@folding ~]# sudo -u fahclient /opt/rocm/opencl/bin/clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3137.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Device Topology: PCI[ B#10, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1750Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f2733913cf0
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3137.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Device Topology: PCI[ B#69, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1750Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f2733913cf0
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3137.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Code: Select all
[root@folding ~]# ls -lha /etc/OpenCL/vendors/
total 12K
drwxr-xr-x. 2 root root 4.0K Aug 10 16:53 .
drwxr-xr-x. 3 root root 4.0K Jun 15 19:29 ..
-rw-r--r--. 1 root root 15 Aug 10 16:53 amdocl64.icd
Code: Select all
[root@folding ~]# cat /etc/OpenCL/vendors/amdocl64.icd
libamdocl64.so
Code: Select all
[root@folding ~]# ls -lh /dev/kfd
crw-rw-rw-. 1 root render 237, 0 Aug 14 18:56 /dev/kfd
Code: Select all
[root@folding ~]# ls -lha /dev/dri/
total 0
drwxr-xr-x. 3 root root 140 Aug 14 18:56 .
drwxr-xr-x. 19 root root 3.6K Aug 14 18:56 ..
drwxr-xr-x. 2 root root 120 Aug 14 18:56 by-path
crw-rw----. 1 root video 226, 0 Aug 14 18:56 card0
crw-rw----. 1 root video 226, 1 Aug 14 18:56 card1
crw-rw-rw-. 1 root render 226, 128 Aug 14 18:56 renderD128
crw-rw-rw-. 1 root render 226, 129 Aug 14 18:56 renderD129
Code: Select all
[root@folding ~]# groups fahclient
fahclient : fahclient video render systemd-journal
Code: Select all
[root@folding fold]# /usr/bin/FAHClient --config=./config.xml --chdir=./
2020-08-15:02:43:41:Removing old file 'logs/log-20200806-043504.txt'
2020-08-15:02:43:41:Trying to access database...
2020-08-15:02:43:41:Successfully acquired database lock
2020-08-15:02:43:41:Read GPUs.txt
2020-08-15:02:43:41:Enabled folding slot 00: PAUSED cpu:54 (by user)
2020-08-15:02:43:41:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:Enabled folding slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41:ERROR:No compute devices matched GPU #0 {
2020-08-15:02:43:41:ERROR: "vendor": 4098,
2020-08-15:02:43:41:ERROR: "device": 26751,
2020-08-15:02:43:41:ERROR: "type": 1,
2020-08-15:02:43:41:ERROR: "species": 5,
2020-08-15:02:43:41:ERROR: "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
2020-08-15:02:43:41:ERROR:}. You may need to update your graphics drivers.
2020-08-15:02:43:41:ERROR:No compute devices matched GPU #1 {
2020-08-15:02:43:41:ERROR: "vendor": 4098,
2020-08-15:02:43:41:ERROR: "device": 26751,
2020-08-15:02:43:41:ERROR: "type": 1,
2020-08-15:02:43:41:ERROR: "species": 5,
2020-08-15:02:43:41:ERROR: "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
2020-08-15:02:43:41:ERROR:}. You may need to update your graphics drivers.
2020-08-15:02:43:41:************************ FAHClient *************************
2020-08-15:02:43:41: Version: 7.6.13
2020-08-15:02:43:41: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
2020-08-15:02:43:41: Copyright: 2020 foldingathome.org
2020-08-15:02:43:41: Homepage: https://foldingathome.org/
2020-08-15:02:43:41: Date: Apr 28 2020
2020-08-15:02:43:41: Time: 04:20:27
2020-08-15:02:43:41: Revision: 5a652817f46116b6e135503af97f18e094414e3b
2020-08-15:02:43:41: Branch: master
2020-08-15:02:43:41: Compiler: GNU 4.9.4
2020-08-15:02:43:41: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41: Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41: Bits: 64
2020-08-15:02:43:41: Mode: Release
2020-08-15:02:43:41: Args: --config=./config.xml --chdir=./
2020-08-15:02:43:41: Config: /root/fold/./config.xml
2020-08-15:02:43:41:************************** CBang ***************************
2020-08-15:02:43:41: Date: Apr 25 2020
2020-08-15:02:43:41: Time: 00:07:55
2020-08-15:02:43:41: Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
2020-08-15:02:43:41: Branch: master
2020-08-15:02:43:41: Compiler: GNU 4.9.4
2020-08-15:02:43:41: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41: -fPIC
2020-08-15:02:43:41: Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41: Bits: 64
2020-08-15:02:43:41: Mode: Release
2020-08-15:02:43:41:************************** System **************************
2020-08-15:02:43:41: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
2020-08-15:02:43:41: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
2020-08-15:02:43:41: CPUs: 64
2020-08-15:02:43:41: Memory: 125.52GiB
2020-08-15:02:43:41:Free Memory: 123.61GiB
2020-08-15:02:43:41: Threads: POSIX_THREADS
2020-08-15:02:43:41: OS Version: 4.18
2020-08-15:02:43:41:Has Battery: false
2020-08-15:02:43:41: On Battery: false
2020-08-15:02:43:41: UTC Offset: -7
2020-08-15:02:43:41: PID: 2809
2020-08-15:02:43:41: CWD: /root/fold
2020-08-15:02:43:41: OS: Linux 4.18.0-147.el8.x86_64 x86_64
2020-08-15:02:43:41: OS Arch: AMD64
2020-08-15:02:43:41: GPUs: 2
2020-08-15:02:43:41: GPU 0: Bus:10 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41: GPU 1: Bus:69 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
2020-08-15:02:43:41: CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
2020-08-15:02:43:41: libcuda.so: cannot open shared object file: No such file or
2020-08-15:02:43:41: directory
2020-08-15:02:43:41: OpenCL: Not detected: clGetDeviceIDs() returned -1
2020-08-15:02:43:41:************************** libFAH **************************
2020-08-15:02:43:41: Date: Apr 15 2020
2020-08-15:02:43:41: Time: 21:43:27
2020-08-15:02:43:41: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
2020-08-15:02:43:41: Branch: master
2020-08-15:02:43:41: Compiler: GNU 4.9.4
2020-08-15:02:43:41: Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
2020-08-15:02:43:41: Platform: linux2 4.19.0-5-amd64
2020-08-15:02:43:41: Bits: 64
2020-08-15:02:43:41: Mode: Release
2020-08-15:02:43:41:************************************************************
2020-08-15:02:43:41:<config>
2020-08-15:02:43:41: <!-- Folding Core -->
2020-08-15:02:43:41: <checkpoint v='5'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Folding Slot Configuration -->
2020-08-15:02:43:41: <client-type v='advanced'/>
2020-08-15:02:43:41: <cpus v='54'/>
2020-08-15:02:43:41: <disable-viz v='true'/>
2020-08-15:02:43:41: <max-packet-size v='big'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- GUI -->
2020-08-15:02:43:41: <gui-enabled v='false'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- HTTP Server -->
2020-08-15:02:43:41: <http-addresses v='127.0.0.1:7397'/>
2020-08-15:02:43:41: <max-connect-time v='604800'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Logging -->
2020-08-15:02:43:41: <log-date v='true'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Remote Command Server -->
2020-08-15:02:43:41: <command-address v='127.0.0.1'/>
2020-08-15:02:43:41: <command-port v='36331'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Slot Control -->
2020-08-15:02:43:41: <power v='FULL'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- User Information -->
2020-08-15:02:43:41: <passkey v='*****'/>
2020-08-15:02:43:41: <team v='40524'/>
2020-08-15:02:43:41: <user v='Whompithian'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Web Server Sessions -->
2020-08-15:02:43:41: <session-lifetime v='0'/>
2020-08-15:02:43:41: <session-timeout v='0'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Work Unit Control -->
2020-08-15:02:43:41: <stall-detection-enabled v='true'/>
2020-08-15:02:43:41:
2020-08-15:02:43:41: <!-- Folding Slots -->
2020-08-15:02:43:41: <slot id='0' type='CPU'>
2020-08-15:02:43:41: <paused v='true'/>
2020-08-15:02:43:41: </slot>
2020-08-15:02:43:41: <slot id='1' type='GPU'>
2020-08-15:02:43:41: <gpu-index v='0'/>
2020-08-15:02:43:41: <opencl-index v='0'/>
2020-08-15:02:43:41: </slot>
2020-08-15:02:43:41: <slot id='2' type='GPU'>
2020-08-15:02:43:41: <gpu-index v='1'/>
2020-08-15:02:43:41: <opencl-index v='1'/>
2020-08-15:02:43:41: </slot>
2020-08-15:02:43:41:</config>
2020-08-15:02:43:41:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:41:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:42:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:42:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:43:WU00:FS01:Assigned to work server 192.0.2.1
2020-08-15:02:43:43:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:43:WU00:FS01:Connecting to 192.0.2.1:8080
2020-08-15:02:43:43:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:43:WU00:FS01:Connecting to 192.0.2.1:80
2020-08-15:02:43:43:WU01:FS02:Assigned to work server 192.0.2.1
2020-08-15:02:43:43:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:43:WU01:FS02:Connecting to 192.0.2.1:8080
2020-08-15:02:43:43:ERROR:WU00:FS01:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:43:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:43:WU01:FS02:Connecting to 192.0.2.1:80
2020-08-15:02:43:43:WU00:FS01:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:44:ERROR:WU01:FS02:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:44:WU01:FS02:Connecting to assign1.foldingathome.org:80
2020-08-15:02:43:44:WU00:FS01:Assigned to work server 192.0.2.1
2020-08-15:02:43:44:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:44:WU00:FS01:Connecting to 192.0.2.1:8080
2020-08-15:02:43:44:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:44:WU00:FS01:Connecting to 192.0.2.1:80
2020-08-15:02:43:44:WU01:FS02:Assigned to work server 192.0.2.1
2020-08-15:02:43:44:WU01:FS02:Requesting new work unit for slot 02: READY gpu:1:Vega 10 XL/XT [Radeon RX Vega 56/64] from 192.0.2.1
2020-08-15:02:43:44:WU01:FS02:Connecting to 192.0.2.1:8080
2020-08-15:02:43:44:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
2020-08-15:02:43:44:WU01:FS02:Connecting to 192.0.2.1:80
2020-08-15:02:43:44:ERROR:WU00:FS01:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
2020-08-15:02:43:45:ERROR:WU01:FS02:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable
^C2020-08-15:02:43:46:Caught signal SIGINT(2) on PID 2809
2020-08-15:02:43:46:Exiting, please wait. . .
2020-08-15:02:43:47:Clean exit
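For anyone wanting to spot this pattern in their own log: 192.0.2.1 falls inside 192.0.2.0/24, which RFC 5737 reserves for documentation (TEST-NET-1), so it is never a reachable work server. Below is a minimal sketch in Python that scans log lines for such assignments. The function name and the regular expression are my own assumptions based on the log format above, not anything that ships with FAHClient.

```python
import ipaddress
import re

# RFC 5737 reserves 192.0.2.0/24 (TEST-NET-1) for documentation; it is not routed.
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")

# Assumed pattern: matches the "Assigned to work server <ip>" lines seen in the log above.
ASSIGNED = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")


def placeholder_assignments(log_lines):
    """Return work-server IPs from the log that fall inside TEST-NET-1."""
    hits = []
    for line in log_lines:
        match = ASSIGNED.search(line)
        if not match:
            continue
        try:
            address = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # digits that do not form a valid IPv4 address
        if address in TEST_NET_1:
            hits.append(match.group(1))
    return hits


sample = [
    "2020-08-15:02:43:43:WU00:FS01:Assigned to work server 192.0.2.1",
    "2020-08-15:02:43:43:WU00:FS01:Connecting to 192.0.2.1:8080",
]
print(placeholder_assignments(sample))
```

Only the "Assigned to work server" lines are matched, so the follow-up "Connecting to" lines for the same address are not counted twice.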
Re: Waiting for work on 192.0.2.1 ?
"The bogus IP address means that the Client has encountered a GPU in a state that cannot be configured."
"If you still believe this is a configuration issue on my end, here is what I have ..."
Where in that first quote does the word "you" appear? It wasn't a configuration error that you made. The client did it on its own and then it encountered a GPU in a state that could not be configured.
The problem is a result of the GPU being configured based on three values of "*index" settings. The latest beta version of FAHClient has eliminated the concept of "*index" and replaced it with a new concept. I suspect this will resolve future cases of the same problem, but since that beta just came out today, I'm not 100% confident that it fixed all the corner cases. At first glance it is looking good.
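To see which "*index" settings a given client is carrying, the slot section of config.xml can be read directly. The following is a rough sketch in Python, assuming the slot layout shown in the config dump posted above (which shows gpu-index and opencl-index). The function name is hypothetical and the XML is copied from that dump with the timestamps stripped.

```python
import xml.etree.ElementTree as ET

# Slot section copied from the config dump in the posted log (timestamps stripped).
CONFIG = """
<config>
  <slot id='0' type='CPU'>
    <paused v='true'/>
  </slot>
  <slot id='1' type='GPU'>
    <gpu-index v='0'/>
    <opencl-index v='0'/>
  </slot>
  <slot id='2' type='GPU'>
    <gpu-index v='1'/>
    <opencl-index v='1'/>
  </slot>
</config>
"""


def gpu_slot_indexes(config_xml):
    """Map each GPU slot id to its '*-index' child settings."""
    root = ET.fromstring(config_xml)
    result = {}
    for slot in root.findall("slot"):
        if slot.get("type") != "GPU":
            continue  # CPU slots carry no index settings
        result[slot.get("id")] = {
            child.tag: child.get("v")
            for child in slot
            if child.tag.endswith("-index")
        }
    return result


for slot_id, indexes in gpu_slot_indexes(CONFIG).items():
    print(slot_id, indexes)
```

This only reports what is written in the file. It does not tell you whether the client managed to match those indexes to a real device, which is the part that fails in the log.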
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
Re: Waiting for work on 192.0.2.1 ?
Whompithian wrote: ...Is there a timeline on the release of the updated client that is expected to resolve this issue?
Welcome to the F@H Forum Whompithian,
F@H does not provide typical ETAs, as in, a month or a date. Instead, they use this timeline:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Now: It's when the release is available
Very Soon: It's when active development is happening
Soon: It's when it is on the table
Soon-ish: It's when there's discussion happening
Not Soon: It's in the backlog
End Of Time: It's on the (ever growing) wishlist
The new client is in the Very Soon state, as it brings some new GPU detection code that will help make this a much better experience for donors.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: Waiting for work on 192.0.2.1 ?
JohnChodera wrote: @JimboPalmer: You'll be pleased to hear that the issue of a completely separate test FAH network again came up in our FAH operations chat today. We definitely want to make this happen---we're just still short developer resources to make this happen.
We have a supplement proposal into the NIH to help out with developer resources that we should hear about soon---please send good vibes our way!
~ John Chodera // MSKCC
Critical error here was to make a change just before everyone disappears for two days...
I've built multiple $bn websites and none make changes, even emergency ones, without having support people ready to regress it at a moment's notice.
Just sayin'.
single 1070
Re: Waiting for work on 192.0.2.1 ?
Well!
I allowed my AMD Fedora system to upgrade, and now I get the dreaded assigning to 192.0.2.1 (I tried the "dnf remove mesa-libOpenCL" step, but the system didn't have that library to remove).
Any new hints?
Dan Watts