unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sat Mar 21, 2020 2:18 pm
by tessa
Hello,
I've been racking my brain for the last few days over a problem on my Linux machine with an AMD 7970 graphics card. I'm getting the following error:

Code: Select all

WU02:FS01:0x21:ERROR:exception: Error initializing context: clCreateCommandQueue (-6)
My system runs Manjaro Linux. On an almost identical system with the same graphics card, folding under Windows 10 works just fine.

My system:

Code: Select all

15:05:57:******************************** Build ********************************
15:05:57:        Version: 7.5.1
15:05:57:           Date: May 11 2018
15:05:57:           Time: 19:59:04
15:05:57:     Repository: Git
15:05:57:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
15:05:57:         Branch: master
15:05:57:       Compiler: GNU 6.3.0 20170516
15:05:57:        Options: -std=gnu++98 -O3 -funroll-loops
15:05:57:       Platform: linux2 4.14.0-3-amd64
15:05:57:           Bits: 64
15:05:57:           Mode: Release
15:05:57:******************************* System ********************************
15:05:57:            CPU: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
15:05:57:         CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
15:05:57:           CPUs: 4
15:05:57:         Memory: 15.62GiB
15:05:57:    Free Memory: 13.05GiB
15:05:57:        Threads: POSIX_THREADS
15:05:57:     OS Version: 5.4
15:05:57:    Has Battery: false
15:05:57:     On Battery: false
15:05:57:     UTC Offset: 1
15:05:57:            PID: 5199
15:05:57:            CWD: /opt/fah
15:05:57:             OS: Linux 5.4.26-1-MANJARO x86_64
15:05:57:        OS Arch: AMD64
15:05:57:           GPUs: 1
15:05:57:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Tahiti PRO [Radeon R9 280/HD
15:05:57:                 7900/8950]
15:05:57:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
15:05:57:                 libcuda.so: cannot open shared object file: No such file or
15:05:57:                 directory
15:05:57:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:2906.7
15:05:57:***********************************************************************
The relevant log part:

Code: Select all

15:10:21:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:Tahiti PRO [Radeon R9 280/HD 7900/8950] from 155.247.166.220
15:10:21:WU02:FS01:Connecting to 155.247.166.220:8080
15:10:21:WU02:FS01:Downloading 5.81MiB
15:10:22:WU02:FS01:Download complete
15:10:22:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14310 run:52 clone:79 gen:2 core:0x21 unit:0x000000030002894c5e6cf6930906b0f6
15:10:22:WU02:FS01:Starting
15:10:22:WU02:FS01:Running FahCore: /opt/fah/FAHCoreWrapper /opt/fah/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 705 -lifeline 5199 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
15:10:22:WU02:FS01:Started FahCore on PID 5257
15:10:22:WU02:FS01:Core PID:5261
15:10:22:WU02:FS01:FahCore 0x21 started
15:10:23:WU02:FS01:0x21:*********************** Log Started 2020-03-21T15:10:22Z ***********************
15:10:23:WU02:FS01:0x21:Project: 14310 (Run 52, Clone 79, Gen 2)
15:10:23:WU02:FS01:0x21:Unit: 0x000000030002894c5e6cf6930906b0f6
15:10:23:WU02:FS01:0x21:CPU: 0x00000000000000000000000000000000
15:10:23:WU02:FS01:0x21:Machine: 1
15:10:23:WU02:FS01:0x21:Reading tar file core.xml
15:10:23:WU02:FS01:0x21:Reading tar file integrator.xml
15:10:23:WU02:FS01:0x21:Reading tar file state.xml
15:10:23:WU02:FS01:0x21:Reading tar file system.xml
15:10:23:WU02:FS01:0x21:Digital signatures verified
15:10:23:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
15:10:23:WU02:FS01:0x21:Version 0.0.20
15:10:24:WU02:FS01:0x21:ERROR:exception: Error initializing context: clCreateCommandQueue (-6)
15:10:24:WU02:FS01:0x21:Saving result file logfile_01.txt
15:10:24:WU02:FS01:0x21:Saving result file log.txt
15:10:24:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:10:24:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:10:24:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:14310 run:52 clone:79 gen:2 core:0x21 unit:0x000000030002894c5e6cf6930906b0f6
15:10:24:WU02:FS01:Uploading 7.00KiB to 155.247.166.220
15:10:24:WU02:FS01:Connecting to 155.247.166.220:8080
15:10:24:WU01:FS01:Connecting to 65.254.110.245:8080
15:10:24:WU02:FS01:Upload complete
15:10:24:WU02:FS01:Server responded WORK_ACK (400)
15:10:24:WU02:FS01:Cleaning up
Output of clinfo:

Code: Select all

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2906.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Tahiti
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2906.7)
  Driver Version                                  2906.7
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon HD 7900 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               14
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             925MHz
  Graphics IP (AMD)                               6.0
  Device Partition                                (core)
    Max number of sub-devices                     14
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    32, Little-Endian
  Global memory size                              2852286464 (2.656GiB)
  Global free memory (AMD)                        <printDeviceInfo:78: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
  Global memory channels (AMD)                    12
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           2223443763 (2.071GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1584798319085495106ns (Sat Mar 21 14:45:19 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  AMD Accelerated Parallel Processing
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [AMD]
  clCreateContext(NULL, ...) [default]            Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Tahiti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Tahiti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Tahiti

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
I've tried opencl-mesa and opencl-amd-amdgpu-pro-orca as alternative OpenCL drivers, to no avail. Please help.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sat Mar 21, 2020 4:32 pm
by Joe_H
Some people have also needed to install the OpenCL dev package on Linux to get things going; someone mentioned that it fixes up some links. As I recall, you also need the AMD Pro install of OpenCL for the runtime code.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sat Mar 21, 2020 6:38 pm
by tessa
Okay, I'll check whether the developer packages are installed. I'm running the amdgpu drivers and the OpenCL driver from the AMDGPU-pro-orca package (AUR).

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sat Mar 21, 2020 9:55 pm
by tessa
After a foray into the OpenCL development packages (I checked and tried all the packages for my system listed here: https://wiki.archlinux.org/index.php/GPGPU#OpenCL), the error still persists.
Any other solutions?

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sun Mar 29, 2020 3:21 pm
by lovett1991
Same issue here with an R7 250X; I have opencl-amd + amdgpu installed.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Fri Apr 03, 2020 5:15 pm
by lovett1991
Someone replied to me on GitHub. It looks like that particular method is deprecated and AMD removed it: https://github.com/FoldingAtHome/fah-issues/issues/1342

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Tue Apr 07, 2020 8:14 pm
by wezh
Hi! I am a Manjaro Linux user, but I only have on-die (integrated) graphics, so I fold on the CPU only.

So if the old clCreateCommandQueue method was removed from the current OpenCL API, and it was added at some earlier point (Joe_H mentioned that the OpenCL dev package was required some time ago, maybe years ago), would it be worth trying to install a version of the OpenCL package whose API still contains clCreateCommandQueue?

At the end of the first link in that post there is an "OpenCL Specification" link to a PDF of the specification titled "The OpenCL Specification. Version: 1.2".

So the method exists in that version at least. It may be a good idea to use the specification documents to work out which versions contain the method, then search the available packages and install, for example, the newest package version that still contains it.
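
One way to check this on a given machine, rather than from the specification PDFs, is to ask the installed OpenCL loader whether it still exports the symbol. As far as the Khronos headers go, clCreateCommandQueue was deprecated in OpenCL 2.0 in favour of clCreateCommandQueueWithProperties rather than deleted, so a loader may well still export it. The following is only a minimal Python sketch: the library name libOpenCL.so.1 is an assumption (the usual ICD loader soname on Linux), and exports_symbol is a hypothetical helper name.

```python
import ctypes


def exports_symbol(library_name, symbol):
    """Return True/False if the library loads, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(library_name)
    except OSError:
        # Library not installed or not loadable on this system.
        return None
    # ctypes raises AttributeError for a missing symbol, which hasattr turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumed soname of the OpenCL ICD loader; adjust if your distribution differs.
    loader = "libOpenCL.so.1"
    for name in ("clCreateCommandQueue", "clCreateCommandQueueWithProperties"):
        print(name, exports_symbol(loader, name))
```

An exported symbol only shows that the loader still offers the entry point. It says nothing about whether the vendor driver behind it can actually create a queue on a particular GPU.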

Joe_H, it looks like FAH should have a system requirements section containing this detailed information, so that it is worked out once by an expert instead of by every user who hits this error. Maybe such a requirements section could be planned for future FAH user software, once COVID-19 has been beaten?

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sun Oct 04, 2020 7:57 pm
by wrothran
I know this is like 6 months old, but I'm posting just in case anyone finds it useful, even though it didn't work for me.
Some people seem to have had success setting the following environment variables:

export GPU_FORCE_64BIT_PTR=1
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100
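
Note that `export` only affects programs started from the same shell, so these settings would not reach an FAHClient that is started separately (for example as a service). One way to try the variables on a single command is to pass them explicitly. The Python sketch below does that; the variable names and values are taken from the list above, build_env is a hypothetical helper name, and clinfo is only run if it is found on PATH.

```python
import os
import shutil
import subprocess

# Variables quoted from the list above; environment values are always strings.
AMD_OPENCL_ENV = {
    "GPU_FORCE_64BIT_PTR": "1",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
    "GPU_MAX_HEAP_SIZE": "100",
}


def build_env(base=None):
    """Return a copy of the environment with the AMD OpenCL variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(AMD_OPENCL_ENV)
    return env


if __name__ == "__main__":
    clinfo = shutil.which("clinfo")
    if clinfo is None:
        print("clinfo not found on PATH")
    else:
        # Run clinfo once with the extra variables, leaving the shell untouched.
        subprocess.run([clinfo], env=build_env(), check=False)
```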

Also, the -6 in clCreateCommandQueue (-6) is CL_OUT_OF_HOST_MEMORY, which is returned if there is a failure to allocate resources required by the OpenCL implementation on the host.
In my case clinfo previously failed on the Global free memory (AMD) check, but it passes after setting the above variables.
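
For reference, the numeric codes can be mapped back to names with a small lookup table. The values below are a subset of the error constants in the Khronos cl.h header, and describe is a hypothetical helper name. By the same table, the "error -33" that clinfo printed for Global free memory (AMD) in the first post would correspond to CL_INVALID_DEVICE.

```python
# Subset of the OpenCL error constants from the Khronos cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
}


def describe(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error ({code})")


if __name__ == "__main__":
    for code in (-6, -33):
        print(code, describe(code))
```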

Unfortunately this didn't fix FAH's error for me, but maybe it could help others with the issue.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Mon Oct 05, 2020 4:25 am
by PantherX
Welcome to the F@H Forum wrothran,

Can you please share the details of your failure?

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Wed Oct 07, 2020 1:27 pm
by gunnarre
This is a GCN 1.0 GPU. If setting some environment variables alone fixes it, that would be great, but support for OpenCL on GCN 1.0 in the official AMD Linux drivers was dropped sometime after the Catalyst 15.7 drivers for Linux.

The only way I got a Radeon HD 7770 folding under Linux was by downgrading to an earlier kernel, installing old drivers and installing a single file from the AMDGPU Pro's toolkit files (on top of the regular driver), with some help from Joe_H. More details here: viewtopic.php?f=81&t=35320&start=30#p340362
gunnarre wrote:
Joe_H wrote:The HD 7700 series cards are based on the GCN 1st generation Verde chips. OpenCL support for those was dropped by AMD for Linux several years ago; you would have to load older versions of the drivers and may be capped at Ubuntu 16.04 or earlier.
Yes, I looked closer at the AMD Linux driver release notes, and although version 18.20 of the AMD Radeon Pro drivers explicitly supports the Radeon HD 7700 series cards, the release notes also list among its components: "OpenCL™ 1.2 (not supported for 1st generation GCN products)", as you say.
gunnarre wrote:I made a virtual machine instance for the Radeon 7770 HD based on Ubuntu 14.04 and the Catalyst 15.7 drivers from AMD. I downgraded the kernel to 3.19.0-25 and installed the fglrx_core .deb file and its dependencies. I had to copy the file /etc/ati/amdpcsdb.default from the fglrx_ (non-core) .deb file, otherwise clinfo would only work once and then segfault. I also installed fglrx-dev and ocl-icd-opencl-dev, but I'm not sure if that was necessary. The CPU usage seems to be similar to that under Windows 10, depending on the work unit. Guest machine memory usage is less than half, down from over 2GB to between 380 and 760 MB when folding.
The only way I would run this is in a virtual machine that is locked down and not used for anything else. Running it on Windows 10 is easier, because GCN 1.0 chips still have OpenCL 1.2 support there, but the performance was actually better under the Linux virtual machine.
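
To see quickly which cards in a machine fall into this category, the clinfo output can be scanned for the "Graphics IP (AMD)" field. This is only a rough Python sketch under one stated assumption: that a Graphics IP of 6.x marks a first-generation GCN part. That matches the outputs posted in this thread (Tahiti and Pitcairn report 6.0, Ellesmere reports 8.0), but it is not taken from AMD documentation. The function names are hypothetical.

```python
import re

DEVICE_RE = re.compile(r"\s*Device Name\s{2,}(\S.*)$")
GFX_IP_RE = re.compile(r"\s*Graphics IP \(AMD\)\s+(\d+(?:\.\d+)?)")


def graphics_ip_by_device(clinfo_text):
    """Map each 'Device Name' in clinfo output to its 'Graphics IP (AMD)' value."""
    result = {}
    current = None
    for line in clinfo_text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = m.group(1).strip()
            continue
        m = GFX_IP_RE.match(line)
        if m and current is not None:
            result[current] = float(m.group(1))
    return result


def likely_gcn1(ip_map):
    """Assumption: Graphics IP 6.x means first-generation GCN."""
    return [name for name, ip in ip_map.items() if 6.0 <= ip < 7.0]


if __name__ == "__main__":
    sample = (
        "  Device Name                                     Pitcairn\n"
        "  Graphics IP (AMD)                               6.0\n"
        "  Device Name                                     Ellesmere\n"
        "  Graphics IP (AMD)                               8.0\n"
    )
    print(likely_gcn1(graphics_ip_by_device(sample)))
```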

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sun Oct 11, 2020 8:33 pm
by wrothran
PantherX wrote:Welcome to the F@H Forum wrothran,

Can you please share the details of your failure?
I mean that I get a BAD_WORK_UNIT error from FAH: it couldn't create a proper OpenCL context because clCreateCommandQueue failed, reporting that the host was out of memory.

Code: Select all

20:12:31:WU02:FS02:0x22:Project: 17418 (Run 0, Clone 197, Gen 33)
20:12:31:WU02:FS02:0x22:Unit: 0x0000002780fccb095f6ac18c95946d6d
20:12:31:WU02:FS02:0x22:Reading tar file core.xml
20:12:31:WU02:FS02:0x22:Reading tar file integrator.xml.bz2
20:12:31:WU02:FS02:0x22:Reading tar file state.xml.bz2
20:12:31:WU02:FS02:0x22:Reading tar file system.xml.bz2
20:12:31:WU02:FS02:0x22:Digital signatures verified
20:12:31:WU02:FS02:0x22:Folding@home GPU Core22 Folding@home Core
20:12:31:WU02:FS02:0x22:Version 0.0.13
20:12:31:WU02:FS02:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
20:12:31:WU02:FS02:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
20:12:31:WU02:FS02:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
20:12:31:WU02:FS02:0x22:  Global context and integrator variables write interval: disabled
20:12:31:WU02:FS02:0x22:There are 3 platforms available.
20:12:31:WU02:FS02:0x22:Platform 0: Reference
20:12:31:WU02:FS02:0x22:Platform 1: CPU
20:12:31:WU02:FS02:0x22:Platform 2: OpenCL
20:12:31:WU02:FS02:0x22:  opencl-device 0 specified
20:12:31:FS02:Finishing
20:12:39:WU02:FS02:0x22:Attempting to create OpenCL context:
20:12:39:WU02:FS02:0x22:  Configuring platform OpenCL
20:12:39:WU02:FS02:0x22:Failed to create OpenCL context:
20:12:39:WU02:FS02:0x22:Error initializing context: clCreateCommandQueue (-6)
20:12:39:WU02:FS02:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
20:12:39:WU02:FS02:0x22:Saving result file ../logfile_01.txt
20:12:39:WU02:FS02:0x22:Saving result file science.log
20:12:39:WU02:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
20:12:39:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:12:39:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:17418 run:0 clone:197 gen:33 core:0x22 unit:0x0000002780fccb095f6ac18c95946d6d
20:12:39:WU02:FS02:Uploading 11.50KiB to 128.252.203.9
20:12:39:WU02:FS02:Connecting to 128.252.203.9:8080
20:12:39:WU02:FS02:Upload complete
20:12:39:WU02:FS02:Server responded WORK_ACK (400)
20:12:39:WU02:FS02:Cleaning up
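
The failure chain in a log like the one above (project, failing OpenCL call, core exit status) can be pulled out with a few regular expressions. This is a minimal Python sketch that assumes the line formats shown in the logs in this thread; the function name summarize and the dictionary keys are hypothetical.

```python
import re

PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
CL_FAIL_RE = re.compile(r"Error initializing context: (\w+) \((-?\d+)\)")
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")


def summarize(log_text):
    """Collect the project, the failing OpenCL call and the core status from a log."""
    summary = {"project": None, "failed_call": None, "cl_error": None, "core_status": None}
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m and summary["project"] is None:
            # (project, run, clone, gen) as integers
            summary["project"] = tuple(int(g) for g in m.groups())
        m = CL_FAIL_RE.search(line)
        if m:
            summary["failed_call"] = m.group(1)
            summary["cl_error"] = int(m.group(2))
        m = RETURN_RE.search(line)
        if m:
            summary["core_status"] = m.group(1)
    return summary


if __name__ == "__main__":
    log = "\n".join([
        "20:12:31:WU02:FS02:0x22:Project: 17418 (Run 0, Clone 197, Gen 33)",
        "20:12:39:WU02:FS02:0x22:Error initializing context: clCreateCommandQueue (-6)",
        "20:12:39:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    ])
    print(summarize(log))
```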
Output of clinfo follows. Note that I have two cards and FAH works fine with the RX 580. Also, Global free memory (AMD) was previously showing an error for the other GPU, before I set the environment variables I mentioned in my previous comment.

Code: Select all

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3180.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Pitcairn
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (3180.7)
  Driver Version                                  3180.7
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon(TM) HD 8800 Series
  Device Topology (AMD)                           PCI-E, 05:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               10
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1050MHz
  Graphics IP (AMD)                               6.0
  Device Partition                                (core)
    Max number of sub-devices                     10
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4266119168 (3.973GiB)
  Global free memory (AMD)                        4147244 (3.955GiB)
  Global memory channels (AMD)                    8
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           4003987456 (3.729GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1602441340144608318ns (Sun Oct 11 12:35:40 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_amd_bus_addressable_memory cl_khr_spir cl_khr_gl_event 

  Device Name                                     Ellesmere
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (3180.7)
  Driver Version                                  3180.7
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon RX 580 Series
  Device Topology (AMD)                           PCI-E, 0b:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               36
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1360MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     36
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              7843704832 (7.305GiB)
  Global free memory (AMD)                        7629996 (7.277GiB)
  Global memory channels (AMD)                    8
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           7617212416 (7.094GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        7617212416 (7.094GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1602441340144608318ns (Sun Oct 11 12:35:40 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             2036429426
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_amd_bus_addressable_memory cl_khr_spir cl_khr_gl_event 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Pitcairn
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Pitcairn
    Device Name                                   Ellesmere
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Pitcairn
    Device Name                                   Ellesmere
        NOTE:   your OpenCL library only supports OpenCL 2.0,
                but some installed platforms support OpenCL 2.1.
                Programs using 2.1 features may crash
                or behave unexpectedly
clinfo is slightly misleading because I actually have an R9 270x, not an HD 8800, but they are both Pitcairn and GCN 1st Gen.

Code: Select all

  *-display                 
       description: VGA compatible controller
       product: Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:05:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:119 memory:e0000000-efffffff memory:fc700000-fc73ffff ioport:e000(size=256) memory:fc740000-fc75ffff
  *-display
       description: VGA compatible controller
       product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: e7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:121 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:f000(size=256) memory:fce00000-fce3ffff memory:fce40000-fce5ffff
I put my stuff here for the sake of a response, but I think the virtual machine solution by u/gunnarre is probably the best anyone's gonna get at the moment.

edit: also, if anyone was curious what the clinfo error was before I set the environment variables, it was:

Code: Select all

Global free memory (AMD) <printDeviceInfo:75: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
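
For anyone decoding these numbers: they are the standard OpenCL status codes defined in the CL/cl.h header. A small lookup sketch follows; cl_err_name is just a helper name made up here (it is not part of clinfo or FAHClient), and only a handful of codes are listed.

```shell
#!/bin/sh
# Map a few OpenCL status codes (values as defined in CL/cl.h) to their names.
# cl_err_name is a hypothetical helper for illustration only.
cl_err_name() {
  case "$1" in
    0)   echo "CL_SUCCESS" ;;
    -1)  echo "CL_DEVICE_NOT_FOUND" ;;
    -4)  echo "CL_MEM_OBJECT_ALLOCATION_FAILURE" ;;
    -5)  echo "CL_OUT_OF_RESOURCES" ;;
    -6)  echo "CL_OUT_OF_HOST_MEMORY" ;;
    -33) echo "CL_INVALID_DEVICE" ;;
    *)   echo "unknown ($1)" ;;
  esac
}

cl_err_name -33   # the clinfo failure quoted above
cl_err_name -6    # the clCreateCommandQueue failure from the folding log
```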

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Sun Oct 11, 2020 10:47 pm
by Joe_H
wrothran wrote:clinfo is slightly misleading because I actually have an R9 270x, not an HD 8800, but they are both Pitcairn and GCN 1st Gen.
The R9 270x is a rebadged HD 8870 which was just the OEM rebadging of the HD 7870. AMD gives them the same PCI device ID number, and other than some minor tweaks the cards are essentially the same hardware.

Basically any GCN 1st gen card from AMD will have to be used under Windows; AMD dropped Linux support for them years ago. The drivers required to provide OpenCL support do exist, but they are years old and may have other issues with the folding cores created in the last few months. OS support is limited to Ubuntu 16.04, and even that can take some kernel tweaking to support the older drivers.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Thu Jan 07, 2021 6:18 am
by FalconFour
Well, fancy that! I inherited a MacPro1,1 (late 2006) room heater (2 sockets x 4-core Xeon = 8 cores) that has 32GB RAM and an R9 280x GPU that lets it run modern Mac OS. Since Mac OS doesn't support GPU folding at all (groan), I shoehorned Ubuntu 20.04.1 LTS onto it (in 64-bit mode with a tweaked install ISO). From there, I locked it onto kernel 5.4.0-53 with some apt-get install and apt-get remove magic: for things like linux-image-... and linux-modules-extra-..., I removed the latest and installed the -53 version, then removed "unattended-upgrades" so it doesn't break itself automatically like another system of mine did (that GPU driver is kernel-version-locked, so upgrades will break it). Then from THERE I installed the amdgpu-pro-20.45-1164792-ubuntu-20.04 package (Google around for "amdgpu 20.45"; it was only by shotgunning different version numbers in search that I was able to discover that this recent version exists).

After installing that (and *not* ROCm - though I installed ROCm first, it didn't detect any GPU compute devices for OpenCL so I uninstalled it after encountering a file conflict issue with the GPU driver), I was able to see that the GPU was available for OpenCL. But it still gave me this weird error about "clCreateCommandQueue".

Until... I mixed in this advice:
wrothran wrote:I know this is like 6 months old, but I'm posting it in case anyone finds it useful, even though it didn't work for me.
Some people seem to have found success setting the following environment variables:

export GPU_FORCE_64BIT_PTR=1
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100

Also, clCreateCommandQueue (-6) is CL_OUT_OF_HOST_MEMORY, which is returned if there is a failure to allocate resources required by the OpenCL implementation on the host,
and clinfo previously failed on the Global Free Memory (AMD) check in my case, but passes after setting the above variables.

This unfortunately didn't work to fix fah's error for me. But maybe this could help some others with the issue.
And boom, it's actually crunching, now generating a whopping (*checks FAHControl*)... ha-ha, ahem (alt-tabs away) ... we're just gonna check that PPD a little later after it gets its head right. It's a system that's closer to voting age than not... so... I must forgive its pokey progress.

Still, a functional room heater, thanks to all the tips and tweaks and hacks and nonsense I've pulled together here from various corners of the internet and hours of keyboard mashing.

edit: 231,729 PPD estimate. Not too shabby.
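
For anyone trying to repeat this, here is a rough sketch of the environment part. It exports the five variables from the quoted post and then, if clinfo happens to be installed, re-runs the "Global free memory" query that failed earlier in the thread. It assumes a POSIX shell. The variables only matter if they are present in the environment of whatever process starts the folding client, and how to arrange that depends on how the client is launched on your system.

```shell
#!/bin/sh
# Variables exactly as listed in the quoted post.
export GPU_FORCE_64BIT_PTR=1
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100

# Optional check: clinfo reportedly failed on this query before the variables were set.
if command -v clinfo >/dev/null 2>&1; then
  clinfo 2>&1 | grep -i "global free memory" \
    || echo "no 'Global free memory' line reported"
else
  echo "clinfo not installed, skipping check"
fi
```

Whether this clears the clCreateCommandQueue error seems to vary: it did in this case, and it did not for the original poster of the variables.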

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Thu Jan 07, 2021 8:08 am
by Joe_H
As I wrote above, AMD stopped providing full video drivers for older generation cards to be used on Linux several years ago. Actual kernel support for those drivers dates to about Ubuntu 16.04 or 18.04; as you have written, it takes going to older kernels to get this to work. Congratulations on getting there.

AMD has only supported recent GPUs with ROCm; unless somebody backports it to these older models, that is not going to be a workable approach.

Re: unable to fold on 7970 under Linux, Windows10 runs fine

Posted: Thu Jan 07, 2021 8:15 am
by FalconFour
Joe_H wrote:As I wrote above, AMD stopped providing full video drivers for older generation cards to be used on Linux several years ago. Actual kernel support for those drivers dates to about Ubuntu 16.04 or 18.04; as you have written, it takes going to older kernels to get this to work. Congratulations on getting there.

AMD has only supported recent GPUs with ROCm; unless somebody backports it to these older models, that is not going to be a workable approach.
No, it's not an older kernel - it's actually newer than Ubuntu 20.04 itself. It worked out-of-the-box at first on a PC I set up before (ugh, as far as out-of-the-box means all the weird AMD driver installation and additional package patching and such, but not with any weird broken installation issues), but it broke with an automatic update to the kernel subversion (where it would no longer uninstall/reinstall, it'd just fail to install, on the same system, with nothing evident changed). There's some dynamic module compiling going on, and it breaks with the newer -56 patch of the same kernel version - which would be the latest available for any system. The driver pack is only October 2020, so ... something broke since then, and damnit I'm tired of hunting down new versions, so I just whacked unattended-upgrades over the head and told it to stop breaking things.

So IOW, this thing is running perfectly on bleeding edge, newest Ubuntu and second-newest patch version of the kernel.

Linux is a perpetual headache for me any time I try to deal with it. :( Even on relatively new cards, it's always a frustrating mess to get there. But I don't have many newer cards to play with, so who knows.
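
A sketch of the kernel pinning described earlier in the thread, written as a dry run that only prints the commands so they can be reviewed before running anything. The 5.4.0-53 version comes from the post. The "generic" flavour suffix, the exact package list (including the headers package), and the function name print_pin_cmds are assumptions for illustration, so check them against the kernel packages actually installed on your system.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# KVER is from the post; FLAVOR and the package list are assumptions.
KVER="5.4.0-53"
FLAVOR="generic"

print_pin_cmds() {
  pkgs="linux-image-$1-$2 linux-modules-extra-$1-$2 linux-headers-$1-$2"
  echo "sudo apt-get install $pkgs"
  # apt-mark hold keeps apt from automatically upgrading or removing these packages.
  # Newer kernels arrive as separately named packages, which the post removed by hand.
  echo "sudo apt-mark hold $pkgs"
  # The post removed unattended-upgrades so the kernel is not bumped automatically.
  echo "sudo apt-get remove unattended-upgrades"
}

print_pin_cmds "$KVER" "$FLAVOR"

# The driver module is built against one kernel, so confirm which one is running.
echo "running kernel: $(uname -r)"
```

Printing rather than executing is deliberate: kernel package changes are hard to undo on a machine that only boots with one specific driver and kernel pairing.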