AMD GPU Error sortShortList on some projects

Posted: Sat Mar 21, 2020 11:57 am
by sam6861
Some specific projects always fail on both of my AMD GPUs with this error: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
Project 11759 failed 3 times: twice on the AMD RX 580 and once on the AMD Vega 11.
Project 14533 failed on the Vega 11 (FS03) but passed on the NVidia GT 1030 (FS02).
Projects 11747, 11752, 11758, 11759, 11764, 11776, 14533, 14551: none of these projects was able to complete on my AMD GPUs. I believe these projects may work fine on NVidia GPUs even where the AMD GPUs error out.

Most other projects work fine on both of my AMD GPUs with no errors.
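
For reference, the (-5) in that message is an OpenCL status code. A minimal sketch of decoding it from a log line, assuming only the handful of codes listed here (the values come from the standard OpenCL header; the function name and the regular expression are my own and not part of FAHClient):

```python
import re

# Small subset of OpenCL status codes as defined in the standard cl.h header.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

# Matches e.g. "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)"
KERNEL_ERROR = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def decode_kernel_error(line):
    """Return (kernel, api_call, status_name), or None if the line has no kernel error."""
    match = KERNEL_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group(3))
    name = OPENCL_STATUS.get(code, "unknown status {}".format(code))
    return match.group(1), match.group(2), name


line = ("14:25:58:WU00:FS01:0x22:ERROR:exception: "
        "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)")
print(decode_kernel_error(line))
# ('sortShortList', 'clEnqueueNDRangeKernel', 'CL_OUT_OF_RESOURCES')
```

CL_OUT_OF_RESOURCES is a fairly generic status, so on its own it does not say which resource the kernel launch ran out of.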

Windows 10 64-bit 1909, previously AMD driver 20.2.2, NVidia driver 442.50.
I recently updated to AMD driver 20.3.1, but the sortShortList error continues on some projects.

FS01: AMD RX 580
FS02: NVidia GT 1030
FS03: AMD Vega 11 (integrated graphics)

Code:

*********************** Log Started 2020-03-20T07:18:34Z ***********************
07:18:34:************************* Folding@home Client *************************
07:18:34:      Copyright: (c) 2009-2018 foldingathome.org
07:18:34:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:18:34:           Args: --open-web-control
07:18:34:         Config: C:\Users\sam86\AppData\Roaming\FAHClient\config.xml
07:18:34:******************************** Build ********************************
07:18:34:        Version: 7.5.1
07:18:34:           Date: May 11 2018
07:18:34:           Time: 13:06:32
07:18:34:     Repository: Git
07:18:34:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:18:34:         Branch: master
07:18:34:       Compiler: Visual C++ 2008
07:18:34:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:18:34:       Platform: win32 10
07:18:34:           Bits: 32
07:18:34:           Mode: Release
07:18:34:******************************* System ********************************
07:18:34:            CPU: AMD Ryzen 5 2400G with Radeon Vega Graphics
07:18:34:         CPU ID: AuthenticAMD Family 23 Model 17 Stepping 0
07:18:34:           CPUs: 8
07:18:34:         Memory: 31.81GiB
07:18:34:    Free Memory: 28.77GiB
07:18:34:        Threads: WINDOWS_THREADS
07:18:34:     OS Version: 6.2
07:18:34:    Has Battery: false
07:18:34:     On Battery: false
07:18:34:     UTC Offset: -5
07:18:34:            PID: 3128
07:18:34:            CWD: C:\Users\sam86\AppData\Roaming\FAHClient
07:18:34:             OS: Windows 10 Home
07:18:34:        OS Arch: AMD64
07:18:34:           GPUs: 3
07:18:34:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
07:18:34:                 470/480/570/580]
07:18:34:          GPU 1: Bus:6 Slot:0 Func:0 NVIDIA:5 GP108 [GeForce GT 1030]
07:18:34:          GPU 2: Bus:8 Slot:0 Func:0 AMD:5 Raven [Ryzen vega 8 moble]
07:18:34:  CUDA Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:6.1 Driver:10.2
07:18:34:OpenCL Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:1.2 Driver:442.50
07:18:34:OpenCL Device 1: Platform:1 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3004.8
07:18:34:OpenCL Device 2: Platform:1 Device:1 Bus:8 Slot:0 Compute:1.2 Driver:3004.8
07:18:34:  Win32 Service: false
07:18:34:***********************************************************************
07:18:34:<config>
07:18:34:  <!-- Network -->
07:18:34:  <proxy v=':8080'/>
07:18:34:
07:18:34:  <!-- Slot Control -->
07:18:34:  <pause-on-start v='true'/>
07:18:34:  <power v='full'/>
07:18:34:
07:18:34:  <!-- User Information -->
07:18:34:  <passkey v='********************************'/>
07:18:34:  <user v='sam6861'/>
07:18:34:
07:18:34:  <!-- Folding Slots -->
07:18:34:  <slot id='0' type='CPU'>
07:18:34:    <cpus v='6'/>
07:18:34:  </slot>
07:18:34:  <slot id='1' type='GPU'/>
07:18:34:  <slot id='2' type='GPU'/>
07:18:34:  <slot id='3' type='GPU'/>
07:18:34:</config>
07:18:34:Trying to access database...
07:18:34:Successfully acquired database lock
07:18:34:Enabled folding slot 00: PAUSED cpu:6 (by user)
07:18:34:Enabled folding slot 01: PAUSED gpu:0:Ellesmere XT [Radeon RX 470/480/570/580] (by user)
07:18:34:Enabled folding slot 02: PAUSED gpu:1:GP108 [GeForce GT 1030] (by user)
07:18:34:Enabled folding slot 03: PAUSED gpu:2:Raven [Ryzen vega 8 moble] (by user)


C:\Users\sam86\AppData\Roaming\FAHClient>findstr "state:SEND" logs\*.txt log.txt | findstr "project:14533"
log.txt:16:14:28:WU04:FS03:Sending unit results: id:04 state:SEND error:FAULTY project:14533 run:0 clone:13508 gen:3 core:0x22 unit:0x0000000380fccb025e72f2f0d793be66
log.txt:01:44:29:WU01:FS02:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:4088 gen:1 core:0x22 unit:0x0000000280fccb025e72f225d8c17c1e

... removed repeated lines from findstr "state:SEND" logs\*.txt log.txt, and saved as TEST1\1.txt
C:\Users\sam86\AppData\Roaming\FAHClient>findstr "FAULTY" TEST1\1.txt
logs\log-20200317-105142.txt:14:25:58:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11752 run:0 clone:2778 gen:0 core:0x22 unit:0x000000018ca304e75e6a806a9aceef36
logs\log-20200317-105142.txt:04:34:51:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:11747 run:0 clone:735 gen:1 core:0x22 unit:0x000000038ca304e75e6a7fc84de528cd
logs\log-20200317-105142.txt:21:15:57:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:11759 run:0 clone:10460 gen:1 core:0x22 unit:0x0000000180fccb0a5e6eb02ca0f7c315
logs\log-20200317-105142.txt:22:16:24:WU01:FS03:Sending unit results: id:01 state:SEND error:FAULTY project:11759 run:0 clone:5218 gen:1 core:0x22 unit:0x0000000280fccb0a5e6e863ce7701d4f
logs\log-20200318-031735.txt:02:40:40:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14551 run:0 clone:125 gen:0 core:0x22 unit:0x0000000280fccb025e71637d587dab23
logs\log-20200320-071834.txt:05:16:17:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:5180 gen:1 core:0x22 unit:0x0000000380fccb0a5e71130f33021b57
logs\log-20200320-114942.txt:11:44:49:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11759 run:0 clone:6793 gen:7 core:0x22 unit:0x0000000c80fccb0a5e6e96092ccc76d1
log.txt:16:14:28:WU04:FS03:Sending unit results: id:04 state:SEND error:FAULTY project:14533 run:0 clone:13508 gen:3 core:0x22 unit:0x0000000380fccb025e72f2f0d793be66
log.txt:16:16:46:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:11776 run:0 clone:1911 gen:2 core:0x22 unit:0x00000003287234c95e73c47a4958b245
log.txt:05:41:54:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:2229 gen:11 core:0x22 unit:0x0000001280fccb0a5e6d81a6378f1eb4
log.txt:10:12:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1538 gen:4 core:0x22 unit:0x0000000c287234c95e73c47d47fd7729
log.txt:10:13:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11758 run:0 clone:833 gen:0 core:0x22 unit:0x000000099bf7a4d55e6d77116aaf791f

C:\Users\sam86\AppData\Roaming\FAHClient>findstr "ERROR:exception" logs\*.txt log.txt
logs\log-20200317-105142.txt:14:25:58:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200317-105142.txt:04:34:50:WU03:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200317-105142.txt:21:15:56:WU03:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200317-105142.txt:22:16:23:WU01:FS03:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200318-031735.txt:02:39:17:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200320-071834.txt:05:16:17:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200320-114942.txt:11:44:48:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
log.txt:16:06:35:WU02:FS03:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
log.txt:16:14:27:WU04:FS03:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
log.txt:05:41:53:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
log.txt:10:12:31:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
log.txt:10:13:07:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
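
The same tally can be done without hand-editing the findstr output. This is a rough sketch, assuming the "Sending unit results" line format shown above; the function name and regular expression are my own, and repeated lines are dropped by unit ID.

```python
import re
from collections import Counter

# Pulls the slot, error state, project and unit ID out of a
# "Sending unit results" line from the FAHClient log.
RESULT = re.compile(
    r"(FS\d+):Sending unit results: id:\d+ state:SEND "
    r"error:(\w+) project:(\d+) .*unit:(0x[0-9a-fA-F]+)"
)


def tally_results(lines):
    """Count results per (project, slot, error), ignoring repeated units."""
    seen_units = set()
    counts = Counter()
    for line in lines:
        match = RESULT.search(line)
        if match is None:
            continue
        slot, error, project, unit = match.groups()
        if unit in seen_units:
            continue  # the same unit can show up in more than one log line
        seen_units.add(unit)
        counts[(int(project), slot, error)] += 1
    return counts


sample = [
    r"log.txt:16:14:28:WU04:FS03:Sending unit results: id:04 state:SEND error:FAULTY project:14533 run:0 clone:13508 gen:3 core:0x22 unit:0x0000000380fccb025e72f2f0d793be66",
    r"log.txt:01:44:29:WU01:FS02:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:4088 gen:1 core:0x22 unit:0x0000000280fccb025e72f225d8c17c1e",
    r"logs\log-20200317-105142.txt:21:15:57:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:11759 run:0 clone:10460 gen:1 core:0x22 unit:0x0000000180fccb0a5e6eb02ca0f7c315",
    r"logs\log-20200317-105142.txt:22:16:24:WU01:FS03:Sending unit results: id:01 state:SEND error:FAULTY project:11759 run:0 clone:5218 gen:1 core:0x22 unit:0x0000000280fccb0a5e6e863ce7701d4f",
    r"logs\log-20200320-114942.txt:11:44:49:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11759 run:0 clone:6793 gen:7 core:0x22 unit:0x0000000c80fccb0a5e6e96092ccc76d1",
    # Repeat of the first line, as happens when log.txt is searched twice.
    r"log.txt:16:14:28:WU04:FS03:Sending unit results: id:04 state:SEND error:FAULTY project:14533 run:0 clone:13508 gen:3 core:0x22 unit:0x0000000380fccb025e72f2f0d793be66",
]

for (project, slot, error), count in sorted(tally_results(sample).items()):
    print(project, slot, error, count)
```

With FS01 being the RX 580, FS02 the GT 1030 and FS03 the Vega 11, the sample lines give two failures of project 11759 on FS01 and one on FS03, and project 14533 failing on FS03 but passing on FS02.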

Re: AMD GPU Error sortShortList on some projects

Posted: Sat Mar 21, 2020 8:00 pm
by sam6861
I found something more specific. All errors on my AMD GPUs happen on projects with 165550 atoms or more. Could projects with this many atoms be limited to NVidia GPUs only? My NVidia GT 1030 works fine with this many atoms.

The largest project my AMD GPUs have successfully completed is project 11763, with 110370 atoms.
logs\log-20200320-071834.txt:16:54:49:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11763 run:0 clone:692 gen:2 core:0x22 unit:0x0000000680fccb0a5e6d81254462ee3b

Re: AMD GPU Error sortShortList on some projects

Posted: Sat Mar 21, 2020 8:41 pm
by _r2w_ben
I had two similar failures on my RX 460 2GB on Windows 8.1 with Radeon driver 16.12.2. Both match your observation of being projects with 165550 or more atoms.

Code:

13:50:55:WU00:FS01:0x22:*********************** Log Started 2020-03-16T13:50:55Z ***********************
13:50:55:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:50:55:WU00:FS01:0x22:       Type: 0x22
13:50:55:WU00:FS01:0x22:       Core: Core22
13:50:55:WU00:FS01:0x22:    Website: https://foldingathome.org/
13:50:55:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
13:50:55:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
13:50:55:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
13:50:55:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 5332 -checkpoint 15
13:50:55:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
13:50:55:WU00:FS01:0x22:     Config: <none>
13:50:55:WU00:FS01:0x22:************************************ Build *************************************
13:50:55:WU00:FS01:0x22:    Version: 0.0.2
13:50:55:WU00:FS01:0x22:       Date: Dec 6 2019
13:50:55:WU00:FS01:0x22:       Time: 21:30:31
13:50:55:WU00:FS01:0x22: Repository: Git
13:50:55:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
13:50:55:WU00:FS01:0x22:     Branch: HEAD
13:50:55:WU00:FS01:0x22:   Compiler: Visual C++ 2008
13:50:55:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
13:50:55:WU00:FS01:0x22:   Platform: win32 10
13:50:55:WU00:FS01:0x22:       Bits: 64
13:50:55:WU00:FS01:0x22:       Mode: Release
13:50:55:WU00:FS01:0x22:************************************ System ************************************
13:50:55:WU00:FS01:0x22:        CPU: AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
13:50:55:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 21 Model 48 Stepping 1
13:50:55:WU00:FS01:0x22:       CPUs: 4
13:50:55:WU00:FS01:0x22:     Memory: 6.94GiB
13:50:55:WU00:FS01:0x22:Free Memory: 4.85GiB
13:50:55:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
13:50:55:WU00:FS01:0x22: OS Version: 6.2
13:50:55:WU00:FS01:0x22:Has Battery: false
13:50:55:WU00:FS01:0x22: On Battery: false
13:50:55:WU00:FS01:0x22: UTC Offset: -4
13:50:55:WU00:FS01:0x22:        PID: 4640
13:50:55:WU00:FS01:0x22:        CWD: C:\Users\Ben\AppData\Roaming\FAHClient\work
13:50:55:WU00:FS01:0x22:         OS: Windows 8.1
13:50:55:WU00:FS01:0x22:    OS Arch: AMD64
13:50:55:WU00:FS01:0x22:********************************************************************************
13:50:55:WU00:FS01:0x22:Project: 11758 (Run 0, Clone 2190, Gen 0)
13:50:55:WU00:FS01:0x22:Unit: 0x000000019bf7a4d55e6d77154a8b6c7e
13:50:55:WU00:FS01:0x22:Reading tar file core.xml
13:50:55:WU00:FS01:0x22:Reading tar file integrator.xml
13:50:55:WU00:FS01:0x22:Reading tar file state.xml
13:50:55:WU00:FS01:0x22:Reading tar file system.xml
13:50:56:WU00:FS01:0x22:Digital signatures verified
13:50:56:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:50:56:WU00:FS01:0x22:Version 0.0.2
13:51:16:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
13:51:16:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
13:51:16:WU00:FS01:0x22:Saving result file science.log
13:51:16:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
13:51:16:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
13:51:16:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11758 run:0 clone:2190 gen:0 core:0x22 unit:0x000000019bf7a4d55e6d77154a8b6c7e

Code:

01:43:18:WU00:FS01:0x22:*********************** Log Started 2020-03-18T01:43:18Z ***********************
01:43:18:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
01:43:18:WU00:FS01:0x22:       Type: 0x22
01:43:18:WU00:FS01:0x22:       Core: Core22
01:43:18:WU00:FS01:0x22:    Website: https://foldingathome.org/
01:43:18:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
01:43:18:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
01:43:18:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
01:43:18:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1068 -checkpoint 15
01:43:18:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
01:43:18:WU00:FS01:0x22:     Config: <none>
01:43:18:WU00:FS01:0x22:************************************ Build *************************************
01:43:18:WU00:FS01:0x22:    Version: 0.0.2
01:43:18:WU00:FS01:0x22:       Date: Dec 6 2019
01:43:18:WU00:FS01:0x22:       Time: 21:30:31
01:43:18:WU00:FS01:0x22: Repository: Git
01:43:18:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
01:43:18:WU00:FS01:0x22:     Branch: HEAD
01:43:18:WU00:FS01:0x22:   Compiler: Visual C++ 2008
01:43:18:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
01:43:18:WU00:FS01:0x22:   Platform: win32 10
01:43:18:WU00:FS01:0x22:       Bits: 64
01:43:18:WU00:FS01:0x22:       Mode: Release
01:43:18:WU00:FS01:0x22:************************************ System ************************************
01:43:18:WU00:FS01:0x22:        CPU: AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
01:43:18:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 21 Model 48 Stepping 1
01:43:18:WU00:FS01:0x22:       CPUs: 4
01:43:18:WU00:FS01:0x22:     Memory: 6.94GiB
01:43:18:WU00:FS01:0x22:Free Memory: 5.58GiB
01:43:18:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
01:43:18:WU00:FS01:0x22: OS Version: 6.2
01:43:18:WU00:FS01:0x22:Has Battery: false
01:43:18:WU00:FS01:0x22: On Battery: false
01:43:18:WU00:FS01:0x22: UTC Offset: -4
01:43:18:WU00:FS01:0x22:        PID: 4416
01:43:18:WU00:FS01:0x22:        CWD: C:\Users\Ben\AppData\Roaming\FAHClient\work
01:43:18:WU00:FS01:0x22:         OS: Windows 8.1
01:43:18:WU00:FS01:0x22:    OS Arch: AMD64
01:43:18:WU00:FS01:0x22:********************************************************************************
01:43:18:WU00:FS01:0x22:Project: 11741 (Run 0, Clone 2624, Gen 6)
01:43:18:WU00:FS01:0x22:Unit: 0x0000000c8ca304f15e693a7373f6bad6
01:43:18:WU00:FS01:0x22:Reading tar file core.xml
01:43:18:WU00:FS01:0x22:Reading tar file integrator.xml
01:43:18:WU00:FS01:0x22:Reading tar file state.xml
01:43:20:WU00:FS01:0x22:Reading tar file system.xml
01:43:21:WU00:FS01:0x22:Digital signatures verified
01:43:21:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
01:43:21:WU00:FS01:0x22:Version 0.0.2
01:43:39:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
01:43:39:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
01:43:39:WU00:FS01:0x22:Saving result file science.log
01:43:39:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
01:43:39:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:43:39:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11741 run:0 clone:2624 gen:6 core:0x22 unit:0x0000000c8ca304f15e693a7373f6bad6

There's an interesting comment in the OpenMM source code at the point where it decides how long a list can be and still be considered short.

Code:

unsigned int maxShortList = min(8192, max(maxLocalBuffer, (int) OpenCLContext::ThreadBlockSize*context.getNumThreadBlocks()));
// On Qualcomm's OpenCL, it's essential to check against CL_KERNEL_WORK_GROUP_SIZE.  Otherwise you get a crash.
// But AMD's OpenCL returns an inappropriately small value for it that is much shorter than the actual
// maximum, so including the check hurts performance.  For the moment I'm going to just comment it out.
// If we officially support Qualcomm in the future, we'll need to do something better.
//maxShortList = min(maxShortList, shortListKernel.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(context.getDevice()));
isShortList = (length <= maxShortList);

Perhaps the list is slightly too large to fit in the maximum CL_KERNEL_WORK_GROUP_SIZE.
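
To make the effect of the commented-out check easier to see, here is a small sketch that mirrors the arithmetic of the snippet above. All input values are made up for illustration (I have not read the real maxLocalBuffer, thread block count or CL_KERNEL_WORK_GROUP_SIZE from any device), and the function names are my own.

```python
def max_short_list(max_local_buffer, thread_block_size, num_thread_blocks,
                   kernel_work_group_size=None):
    """Mirror of the maxShortList computation quoted from OpenMM.

    kernel_work_group_size stands in for CL_KERNEL_WORK_GROUP_SIZE; passing
    None corresponds to the check being commented out, as in the quoted code.
    """
    limit = min(8192, max(max_local_buffer, thread_block_size * num_thread_blocks))
    if kernel_work_group_size is not None:
        limit = min(limit, kernel_work_group_size)
    return limit


def is_short_list(length, limit):
    return length <= limit


# Hypothetical inputs, chosen only to show the two code paths.
without_check = max_short_list(2048, 64, 56)
with_check = max_short_list(2048, 64, 56, kernel_work_group_size=256)

print(without_check, is_short_list(3000, without_check))  # 3584 True
print(with_check, is_short_list(3000, with_check))        # 256 False
```

With the check disabled, a list that is longer than the kernel's reported work group size can still be treated as short, which is the situation the guess above is about. Whether that is what actually triggers the (-5) error here is not something this sketch can confirm.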

In case this becomes a bug report, here's the output from GPU Caps Viewer.

Code:

[ OpenCL Capabilities ]
- Num OpenCL platforms: 1
- CL_PLATFORM_NAME: AMD Accelerated Parallel Processing
- CL_PLATFORM_VENDOR: Advanced Micro Devices, Inc.
- CL_PLATFORM_VERSION: OpenCL 2.0 AMD-APP (2348.4)
- CL_PLATFORM_PROFILE: FULL_PROFILE
- Num devices: 3

  - CL_DEVICE_NAME: Baffin
  - CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
  - CL_DRIVER_VERSION: 2348.4
  - CL_DEVICE_PROFILE: FULL_PROFILE
  - CL_DEVICE_VERSION: OpenCL 1.2 AMD-APP (2348.4)
  - CL_DEVICE_TYPE: GPU
  - CL_DEVICE_VENDOR_ID: 0x1002
  - CL_DEVICE_MAX_COMPUTE_UNITS: 14
  - CL_DEVICE_MAX_CLOCK_FREQUENCY: 1210MHz
  - CL_DEVICE_ADDRESS_BITS: 32
  - CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1376256KB
  - CL_DEVICE_GLOBAL_MEM_SIZE: 2048MB
  - CL_DEVICE_MAX_PARAMETER_SIZE: 1024
  - CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64 Bytes
  - CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 16KB
  - CL_DEVICE_ERROR_CORRECTION_SUPPORT: NO
  - CL_DEVICE_LOCAL_MEM_TYPE: Local (scratchpad)
  - CL_DEVICE_LOCAL_MEM_SIZE: 32KB
  - CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 1376256KB
  - CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  - CL_DEVICE_MAX_WORK_ITEM_SIZES: [256 ; 256 ; 256]
  - CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
  - CL_EXEC_NATIVE_KERNEL: 10920428
  - CL_DEVICE_IMAGE_SUPPORT: YES
  - CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  - CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  - CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
  - CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
  - CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
  - CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
  - CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
  - CL_DEVICE_MAX_SAMPLERS: 16
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 4
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 2
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
  - CL_DEVICE_EXTENSIONS: 25
  - Extensions:
    - cl_khr_fp64
    - cl_amd_fp64
    - cl_khr_global_int32_base_atomics
    - cl_khr_global_int32_extended_atomics
    - cl_khr_local_int32_base_atomics
    - cl_khr_local_int32_extended_atomics
    - cl_khr_int64_base_atomics
    - cl_khr_int64_extended_atomics
    - cl_khr_3d_image_writes
    - cl_khr_byte_addressable_store
    - cl_khr_fp16
    - cl_khr_gl_sharing
    - cl_amd_device_attribute_query
    - cl_amd_vec3
    - cl_amd_printf
    - cl_amd_media_ops
    - cl_amd_media_ops2
    - cl_amd_popcnt
    - cl_khr_d3d10_sharing
    - cl_khr_d3d11_sharing
    - cl_khr_dx9_media_sharing
    - cl_khr_image2d_from_buffer
    - cl_khr_spir
    - cl_khr_gl_event
    - cl_amd_liquid_flash

Re: AMD GPU Error sortShortList on some projects

Posted: Sat Mar 21, 2020 8:53 pm
by alxbelu
I've observed the same issue with my R9 290X (4 GB) on Win10, specifically for projects 11759, 11746, 11747 and 14533, which are indeed all projects with more than 165550 atoms. I had struggled to see a pattern in why they failed, but they all fail with the same error immediately after starting, e.g.:

Code:

12:26:54:WU03:FS01:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:11759 run:0 clone:2210 gen:14 core:0x22 unit:0x0000001480fccb0a5e6d7c9833f7cd06
12:26:54:WU03:FS01:Starting
12:26:54:WU03:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\alxbelu\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 705 -lifeline 15036 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
12:26:54:WU03:FS01:Started FahCore on PID 18348
12:26:54:WU03:FS01:Core PID:21604
12:26:54:WU03:FS01:FahCore 0x22 started
12:26:55:WU03:FS01:0x22:*********************** Log Started 2020-03-21T12:26:54Z ***********************
12:26:55:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
12:26:55:WU03:FS01:0x22:       Type: 0x22
12:26:55:WU03:FS01:0x22:       Core: Core22
12:26:55:WU03:FS01:0x22:    Website: https://foldingathome.org/
12:26:55:WU03:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
12:26:55:WU03:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
12:26:55:WU03:FS01:0x22:             <rafal.wiewiora@choderalab.org>
12:26:55:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 705 -lifeline 18348 -checkpoint 15
12:26:55:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
12:26:55:WU03:FS01:0x22:     Config: <none>
12:26:55:WU03:FS01:0x22:************************************ Build *************************************
12:26:55:WU03:FS01:0x22:    Version: 0.0.2
12:26:55:WU03:FS01:0x22:       Date: Dec 6 2019
12:26:55:WU03:FS01:0x22:       Time: 21:30:31
12:26:55:WU03:FS01:0x22: Repository: Git
12:26:55:WU03:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
12:26:55:WU03:FS01:0x22:     Branch: HEAD
12:26:55:WU03:FS01:0x22:   Compiler: Visual C++ 2008
12:26:55:WU03:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
12:26:55:WU03:FS01:0x22:   Platform: win32 10
12:26:55:WU03:FS01:0x22:       Bits: 64
12:26:55:WU03:FS01:0x22:       Mode: Release
12:26:55:WU03:FS01:0x22:************************************ System ************************************
12:26:55:WU03:FS01:0x22:        CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
12:26:55:WU03:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
12:26:55:WU03:FS01:0x22:       CPUs: 4
12:26:55:WU03:FS01:0x22:     Memory: 23.88GiB
12:26:55:WU03:FS01:0x22:Free Memory: 16.84GiB
12:26:55:WU03:FS01:0x22:    Threads: WINDOWS_THREADS
12:26:55:WU03:FS01:0x22: OS Version: 6.2
12:26:55:WU03:FS01:0x22:Has Battery: false
12:26:55:WU03:FS01:0x22: On Battery: false
12:26:55:WU03:FS01:0x22: UTC Offset: 1
12:26:55:WU03:FS01:0x22:        PID: 21604
12:26:55:WU03:FS01:0x22:        CWD: C:\Users\alxbelu\AppData\Roaming\FAHClient\work
12:26:55:WU03:FS01:0x22:         OS: Windows 10 Pro
12:26:55:WU03:FS01:0x22:    OS Arch: AMD64
12:26:55:WU03:FS01:0x22:********************************************************************************
12:26:55:WU03:FS01:0x22:Project: 11759 (Run 0, Clone 2210, Gen 14)
12:26:55:WU03:FS01:0x22:Unit: 0x0000001480fccb0a5e6d7c9833f7cd06
12:26:55:WU03:FS01:0x22:Reading tar file core.xml
12:26:55:WU03:FS01:0x22:Reading tar file integrator.xml
12:26:55:WU03:FS01:0x22:Reading tar file state.xml
12:26:55:WU03:FS01:0x22:Reading tar file system.xml
12:26:56:WU03:FS01:0x22:Digital signatures verified
12:26:56:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
12:26:56:WU03:FS01:0x22:Version 0.0.2
12:27:09:WU03:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)

I also tried updating my AMD drivers to the latest version (20.3.1), as someone recommended here on the forum, without any effect. On the other hand, the driver version reported to FAH doesn't seem to have changed since the update; it still reports "OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3004.8", so I'm assuming the OpenCL driver simply didn't change from my previous version.
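
One way to check this is to pull the reported OpenCL driver string out of the client log before and after a driver update and compare the two. A minimal sketch, assuming the "OpenCL Device" line format that FAHClient prints at startup (the function name and regular expression are my own; the sample lines are taken from the first post in this thread):

```python
import re

# Matches the "OpenCL Device N: ... Driver:X" lines in the FAHClient startup log.
OPENCL_DEVICE = re.compile(
    r"OpenCL Device (\d+): Platform:(\d+) Device:(\d+) "
    r"Bus:(\d+) Slot:(\d+) Compute:([\d.]+) Driver:([\d.]+)"
)


def opencl_drivers(lines):
    """Return {device_index: driver_version} for every OpenCL device line."""
    drivers = {}
    for line in lines:
        match = OPENCL_DEVICE.search(line)
        if match is not None:
            drivers[int(match.group(1))] = match.group(7)
    return drivers


log_lines = [
    "07:18:34:OpenCL Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:1.2 Driver:442.50",
    "07:18:34:OpenCL Device 1: Platform:1 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3004.8",
    "07:18:34:OpenCL Device 2: Platform:1 Device:1 Bus:8 Slot:0 Compute:1.2 Driver:3004.8",
]
print(opencl_drivers(log_lines))
# {0: '442.50', 1: '3004.8', 2: '3004.8'}
```

If the value is identical across two logs taken before and after the update, the OpenCL runtime version most likely did not change, even though the overall driver package version did.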

edit: also attaching GPU Caps OpenCL info:

Code:

===================================[ OpenCL Capabilities ]
- Num OpenCL platforms: 2
- CL_PLATFORM_NAME: AMD Accelerated Parallel Processing
- CL_PLATFORM_VENDOR: Advanced Micro Devices, Inc.
- CL_PLATFORM_VERSION: OpenCL 2.1 AMD-APP (3004.8)
- CL_PLATFORM_PROFILE: FULL_PROFILE
- Num devices: 1

  - CL_DEVICE_NAME: Hawaii
  - CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
  - CL_DRIVER_VERSION: 3004.8
  - CL_DEVICE_PROFILE: FULL_PROFILE
  - CL_DEVICE_VERSION: OpenCL 1.2 AMD-APP (3004.8)
  - CL_DEVICE_TYPE: GPU
  - CL_DEVICE_VENDOR_ID: 0x1002
  - CL_DEVICE_MAX_COMPUTE_UNITS: 44
  - CL_DEVICE_MAX_CLOCK_FREQUENCY: 1500MHz
  - CL_DEVICE_ADDRESS_BITS: 32
  - CL_DEVICE_MAX_MEM_ALLOC_SIZE: 3145728KB
  - CL_DEVICE_GLOBAL_MEM_SIZE: 3072MB
  - CL_DEVICE_MAX_PARAMETER_SIZE: 1024
  - CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64 Bytes
  - CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 16KB
  - CL_DEVICE_ERROR_CORRECTION_SUPPORT: NO
  - CL_DEVICE_LOCAL_MEM_TYPE: Local (scratchpad)
  - CL_DEVICE_LOCAL_MEM_SIZE: 32KB
  - CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 3145728KB
  - CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  - CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 ; 1024 ; 1024]
  - CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
  - CL_EXEC_NATIVE_KERNEL: 12886508
  - CL_DEVICE_IMAGE_SUPPORT: YES
  - CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  - CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  - CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
  - CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
  - CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
  - CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
  - CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
  - CL_DEVICE_MAX_SAMPLERS: 16
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 4
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 2
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
  - CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
  - CL_DEVICE_EXTENSIONS: 24
  - Extensions:
    - cl_khr_fp64
    - cl_amd_fp64
    - cl_khr_global_int32_base_atomics
    - cl_khr_global_int32_extended_atomics
    - cl_khr_local_int32_base_atomics
    - cl_khr_local_int32_extended_atomics
    - cl_khr_int64_base_atomics
    - cl_khr_int64_extended_atomics
    - cl_khr_3d_image_writes
    - cl_khr_byte_addressable_store
    - cl_khr_gl_sharing
    - cl_amd_device_attribute_query
    - cl_amd_vec3
    - cl_amd_printf
    - cl_amd_media_ops
    - cl_amd_media_ops2
    - cl_amd_popcnt
    - cl_khr_d3d10_sharing
    - cl_khr_d3d11_sharing
    - cl_khr_dx9_media_sharing
    - cl_khr_image2d_from_buffer
    - cl_khr_spir
    - cl_khr_gl_event
    - cl_amd_liquid_flash

Re: AMD GPU Error sortShortList on some projects

Posted: Sat Mar 21, 2020 11:49 pm
by _r2w_ben
I was reading through more source code and came across references to values from AMD OpenCL extensions that were not included in the output from standard clinfo. Here is the output from another clinfo tool that tries to pull all possible values.

Code:

  Device Name                                     Baffin
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (2348.4)
  Driver Version                                  2348.4
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon(TM) RX 460 Graphics 
  Device Topology (AMD)                           PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               14
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1210MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     14
    Supported partition types                     (n/a)
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size (AMD)                 <printDeviceInfo:40: get CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD : error -30>
  Max work group size (AMD)                       <printDeviceInfo:41: get CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD : error -30>
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Global free memory (AMD)                        2045385 (1.951GiB)
  Global memory channels (AMD)                    4
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           1409286144 (1.313GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    1268357376 (1.181GiB)
  Preferred total size of global vars             2147483648 (2GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    16
  Max pipe packet size                            1409286144 (1.313GiB)
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        1409286144 (1.313GiB)
  Preferred constant buffer size (AMD)            <printDeviceInfo:133: get CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD : error -30>
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                262144 (256KiB)
    Max size                                      8388608 (8MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1584799148102833223ns (Sat Mar 21 09:59:08 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash
Values of constants for RX 460:

OpenCLSort.cpp
CL_DEVICE_MAX_WORK_GROUP_SIZE = 256
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_KERNEL_WORK_GROUP_SIZE = ?

OpenCLContext.cpp
CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD = 4
CL_DEVICE_WAVEFRONT_WIDTH_AMD = 64

Re: AMD GPU Error sortShortList on some projects

Posted: Sun Mar 22, 2020 8:58 am
by somata
Nice detective work _r2w_ben! Been having the same issue on my RX 480, and made the connection to larger simulations but didn't go so far as to look at the OpenMM code. Perhaps this insight can help Pande Labs/OpenMM/AMD figure out a solution.

Re: AMD GPU Error sortShortList on some projects

Posted: Sun Mar 22, 2020 6:16 pm
by bruce
I'd say this is one of the disadvantages of supporting a wealth of different GPU models with a single hardware device code.
Surely the ID 0x1002:0x67df (also known as the Ellesmere XT [Radeon RX 470/480/570/580]) represents some variation in hardware capabilities.

FAH treats them as identical hardware.

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 10:34 am
by _r2w_ben
bruce wrote:I'd say this is one of the disadvantages of supporting a wealth of different GPU models with a single hardware device code.
Surely the ID 0x1002:0x67df (also known as the Ellesmere XT [Radeon RX 470/480/570/580]) represents some variation in hardware capabilities.

FAH treats them as identical hardware.
I don't think that's the problem here. These GPUs are all very similar. 470 is a cut down 480. 570 and 580 are minor tweaks to power and clock speeds of the same GPU launched the following year.
https://www.anandtech.com/show/11278/am ... 570-review

OpenMM appears to make an incorrect guess of the memory size available within a CU on Polaris 10 and Polaris 11. The researchers tend to have access to higher-end hardware so this is understandable.
peastman would probably be the most able to confirm this.
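
To make the local-memory numbers easier to compare, here is a minimal Python sketch that pulls the numeric fields out of a clinfo dump. The sample lines are copied from the dumps in this thread; `parse_clinfo` is a hypothetical helper, not part of clinfo or OpenMM, and the snippet only shows the two figures side by side. Whether OpenMM actually sizes its sort buffer from one or the other is the hypothesis above, not something this confirms.

```python
import re

# Lines copied from a clinfo dump in this thread ("syze" is clinfo's own spelling).
SAMPLE = """
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
"""

def parse_clinfo(text):
    """Return {field name: leading integer value} for 'name   value' lines."""
    fields = {}
    for line in text.splitlines():
        # clinfo pads the name column with spaces; split on a run of 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue
        name, value = parts
        match = re.match(r"(\d+)", value)
        if match:  # non-numeric values such as "Local" are skipped
            fields[name] = int(match.group(1))
    return fields

info = parse_clinfo(SAMPLE)
per_group = info["Local memory size"]
per_cu = info["Local memory syze per CU (AMD)"]
print(f"local memory reported per work-group: {per_group} bytes")
print(f"local memory per CU (AMD extension):  {per_cu} bytes")
print(f"per-CU figure is {per_cu // per_group}x the per-work-group figure")
```

Pasting a full clinfo dump into `SAMPLE` should work the same way, since any line without a leading integer value is ignored.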

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 1:52 pm
by alxbelu
_r2w_ben wrote:
bruce wrote:I'd say this is one of the disadvantages of supporting a wealth of different GPU models with a single hardware device code.
Surely the ID 0x1002:0x67df (also known as the Ellesmere XT [Radeon RX 470/480/570/580]) represents some variation in hardware capabilities.

FAH treats them as identical hardware.
I don't think that's the problem here. These GPUs are all very similar. 470 is a cut down 480. 570 and 580 are minor tweaks to power and clock speeds of the same GPU launched the following year.
https://www.anandtech.com/show/11278/am ... 570-review

OpenMM appears to make an incorrect guess of the memory size available within a CU on Polaris 10 and Polaris 11. The researchers tend to have access to higher-end hardware so this is understandable.
peastman would probably be the most able to confirm this.
As mentioned I seem to be experiencing the same issue on an R9 290x as well, though I guess technically they are all GCN-based, albeit different generations.

edit: adding clinfo for comparison's sake

Code: Select all

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3004.8)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  100ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Hawaii
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (3004.8)
  Driver Version                                  3004.8
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon R9 200 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               44
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1000MHz
  Graphics IP (AMD)                               7.2
  Device Partition                                (core)
    Max number of sub-devices                     44
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Global free memory (AMD)                        4142762 (3.951GiB)
  Global memory channels (AMD)                    16
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3422552064 (3.188GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    3080296704 (2.869GiB)
  Preferred total size of global vars             4294967296 (4GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    16
  Max pipe packet size                            3422552064 (3.188GiB)
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3422552064 (3.188GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                262144 (256KiB)
    Max size                                      8388608 (8MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1584955033209549400ns (Mon Mar 23 10:17:13 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            2
    Max real-time compute units (AMD)             12
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 7:49 pm
by muziqaz
These were failing on Radeon 7 and Vega64 as well. This is some sort of issue with the OpenCL config on these projects. The author of the projects is aware of the issue, but with everything that is going on in the FAH world right now, it's kind of hard to concentrate on finding it.
Simulation size is not the cause, as AMD GPUs are known to work better in large atom count simulations. AMD GPUs worked flawlessly with atom counts up to 260k. One of those projects was quite recent and is still circulating; I can't remember the number.
Also, those failing projects have been tested on a very old AMD APU with an integrated HD8250-class GPU and an ancient OpenCL version (1800), and what do you know, it is folding.
This is an AMD-only issue, unfortunately.
Nonetheless, nice investigative work there :)

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 8:05 pm
by alxbelu
muziqaz wrote:These were failing on Radeon 7 and Vega64 as well. This is some sort of issue with the OpenCL config on these projects. The author of the projects is aware of the issue, but with everything that is going on in the FAH world right now, it's kind of hard to concentrate on finding it.
Simulation size is not the cause, as AMD GPUs are known to work better in large atom count simulations. AMD GPUs worked flawlessly with atom counts up to 260k. One of those projects was quite recent and is still circulating; I can't remember the number.
Also, those failing projects have been tested on a very old AMD APU with an integrated HD8250-class GPU and an ancient OpenCL version (1800), and what do you know, it is folding.
This is an AMD-only issue, unfortunately.
Nonetheless, nice investigative work there :)
Was the 260k atom count also with Core22? (I'm new, but got the impression that Core22 was fairly new as well?)
It's worth mentioning that there seem to be at least six different owners of the projects that have failed here: rafal.wiewiora, voelz, vithanin, totowfr, joseph and jrporter.
(I personally have logs of 11746, 11747, 11752, 11759, 11764, 11776, 11781 and 14533, and the OP has additionally noted 11758 and 14551.)

edit: bruce mentioned it had been escalated to the OpenMM team in this thread: viewtopic.php?f=81&t=32771

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 8:19 pm
by muziqaz
alxbelu wrote:
muziqaz wrote:These were failing on Radeon 7 and Vega64 as well. This is some sort of issue with the OpenCL config on these projects. The author of the projects is aware of the issue, but with everything that is going on in the FAH world right now, it's kind of hard to concentrate on finding it.
Simulation size is not the cause, as AMD GPUs are known to work better in large atom count simulations. AMD GPUs worked flawlessly with atom counts up to 260k. One of those projects was quite recent and is still circulating; I can't remember the number.
Also, those failing projects have been tested on a very old AMD APU with an integrated HD8250-class GPU and an ancient OpenCL version (1800), and what do you know, it is folding.
This is an AMD-only issue, unfortunately.
Nonetheless, nice investigative work there :)
Was the 260k atom count also with Core22? (I'm new, but got the impression that Core22 was fairly new as well?)
It's worth mentioning that there seem to be at least six different owners of the projects that have failed here: rafal.wiewiora, voelz, vithanin, totowfr, joseph and jrporter.
(I personally have logs of 11746, 11747, 11752, 11759, 11764, 11776, 11781 and 14533, and the OP has additionally noted 11758 and 14551.)

edit: bruce mentioned it had been escalated to the OpenMM team in this thread: viewtopic.php?f=81&t=32771
p11738 Atom count 287229. OpenMM core_22 :) Worked flawlessly on my Radeon 7 with 2.2m PPD :)

All the listed failing projects have the same roots, and the owners share things with each other. If they didn't, they would be stepping on each other's toes :) And yes, researchers are aware of these :)
Thank you

Re: AMD GPU Error sortShortList on some projects

Posted: Mon Mar 23, 2020 8:37 pm
by alxbelu
muziqaz wrote: All the listed failing projects have the same roots, and the owners share things with each other. If they didn't, they would be stepping on each other's toes :) And yes, researchers are aware of these :)
Alright, obviously makes sense ;) Thanks for clarifying!

Re: AMD GPU Error sortShortList on some projects

Posted: Tue Mar 24, 2020 9:42 am
by sam6861
9 more errors, now with projects 11747, 11758, 11759, 11776, 11781, 14533

My hardware IDs as shown in Device Manager:
(FS01) AMD RX 580: PCI\VEN_1002&DEV_67DF&SUBSYS_C5801682&REV_E7
(FS03) Vega 11: PCI\VEN_1002&DEV_15DD&SUBSYS_876B1043&REV_C6

My other computer (Windows 10, AMD driver 19.2.3) has a somewhat slow GPU and slow work unit downloads. It has completed just 5 work units and hasn't downloaded any of the faulty project numbers, so it has been fine so far.
AMD RX 550: PCI\VEN_1002&DEV_67FF&SUBSYS_E3671DA2&REV_FF
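
For comparison with the 0x1002:0x67df notation used earlier in the thread, here is a small Python sketch that reduces these Device Manager strings to vendor:device form. `pci_id` is a hypothetical helper name, and the regex assumes the `VEN_xxxx&DEV_xxxx` layout seen in the three IDs above.

```python
import re

# Matches the vendor and device fields of a Windows PCI hardware ID string.
HWID_RE = re.compile(r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})", re.IGNORECASE)

def pci_id(hardware_id):
    """Return '0xVVVV:0xDDDD' for a Device Manager PCI hardware ID."""
    match = HWID_RE.search(hardware_id)
    if match is None:
        raise ValueError(f"no VEN/DEV fields in {hardware_id!r}")
    vendor, device = match.groups()
    return f"0x{vendor.lower()}:0x{device.lower()}"

# IDs copied from this post; raw strings keep the backslash literal.
DEVICES = [
    ("AMD RX 580", r"PCI\VEN_1002&DEV_67DF&SUBSYS_C5801682&REV_E7"),
    ("Vega 11", r"PCI\VEN_1002&DEV_15DD&SUBSYS_876B1043&REV_C6"),
    ("AMD RX 550", r"PCI\VEN_1002&DEV_67FF&SUBSYS_E3671DA2&REV_FF"),
]

for name, hwid in DEVICES:
    print(f"{name}: {pci_id(hwid)}")
```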

Going further, I copied a failed work unit and ran FahCore_22 manually from the command line. This specific work unit fails every time on all 3 of my AMD GPUs (RX 580, RX 550, Vega 11) with the sortShortList error. With -gpu-vendor nvidia on the computer with the NVidia GPU (my GT 1030), it appears to work fine using NVidia CUDA processing, with the log showing percent complete until I quit manually with Ctrl+C.

Maybe this sortShortList error on the affected project numbers happens on all AMD GCN GPUs?

Code: Select all

logs\log-20200321-112208.txt:10:12:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1538 gen:4 core:0x22 unit:0x0000000c287234c95e73c47d47fd7729
logs\log-20200321-112208.txt:10:13:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11758 run:0 clone:833 gen:0 core:0x22 unit:0x000000099bf7a4d55e6d77116aaf791f
logs\log-20200324-084817.txt:14:28:13:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14533 run:0 clone:3956 gen:2 core:0x22 unit:0x0000000680fccb025e72f222517f50f6
logs\log-20200324-084817.txt:09:18:55:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11747 run:0 clone:5750 gen:4 core:0x22 unit:0x000000098ca304e75e6bab8cb89c5ccb
logs\log-20200324-084817.txt:05:48:26:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11759 run:0 clone:7605 gen:11 core:0x22 unit:0x0000001180fccb0a5e6ea0d3aa6b008e
logs\log-20200324-084817.txt:06:34:47:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11781 run:0 clone:2504 gen:7 core:0x22 unit:0x000000090d5a98395e73c51775cd22d4
logs\log-20200324-084817.txt:12:28:20:WU04:FS01:Sending unit results: id:04 state:SEND error:FAULTY project:11747 run:0 clone:515 gen:9 core:0x22 unit:0x000000118ca304e75e6a7fc710aa491e
logs\log-20200324-084817.txt:18:01:21:WU03:FS03:Sending unit results: id:03 state:SEND error:FAULTY project:11758 run:0 clone:1241 gen:0 core:0x22 unit:0x000000099bf7a4d55e6d77129c1e48c9
logs\log-20200324-084817.txt:08:12:14:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11776 run:0 clone:7964 gen:2 core:0x22 unit:0x00000006287234c95e74337cabc82607

C:\Users\sam86\AppData\Roaming\FAHClient>findstr "ERROR:exception" logs\*.txt log.txt
logs\log-20200321-112208.txt:10:12:31:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200321-112208.txt:10:13:07:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:14:28:12:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:09:18:54:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:05:48:26:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:06:34:46:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:12:28:19:WU04:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:18:01:19:WU03:FS03:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:08:12:14:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)

C:\Users\sam86\AppData\Roaming\FAHClient\TEST1\99>C:\Users\sam86\AppData\Roaming\FAHClient\cores\cores.foldingathome.org\v7\win\64bit\Core_22.fah\FahCore_22.exe -dir "C:\Users\sam86\AppData\Roaming\FAHClient\TEST1\99" -suffix 01 -version 705 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
C:\Users\sam86\AppData\Roaming\FAHClient\TEST1\99>type logfile_01.txt
*********************** Log Started 2020-03-24T09:07:46Z ***********************
*************************** Core22 Folding@home Core ***************************
       Type: 0x22
       Core: Core22
  Copyright: (c) 2009-2018 foldingathome.org
     Author: John Chodera <john.chodera ...> and Rafal Wiewiora
             <rafal.wiewiora ...>
       Args: -dir C:\Users\sam86\AppData\Roaming\FAHClient\TEST1\99 -suffix 01
             -version 705 -checkpoint 15 -gpu-vendor amd -opencl-platform 0
             -opencl-device 0 -gpu 0
     Config: <none>
************************************ Build *************************************
    Version: 0.0.2
       Date: Dec 6 2019
       Time: 21:30:31
 Repository: Git
   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
     Branch: HEAD
   Compiler: Visual C++ 2008
    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
   Platform: win32 10
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD Ryzen 5 2400G with Radeon Vega Graphics
     CPU ID: AuthenticAMD Family 23 Model 17 Stepping 0
       CPUs: 8
     Memory: 31.81GiB
Free Memory: 26.16GiB
    Threads: WINDOWS_THREADS
 OS Version: 6.2
Has Battery: false
 On Battery: false
 UTC Offset: -5
        PID: 4740
        CWD: C:\Users\sam86\AppData\Roaming\FAHClient\TEST1\99
         OS: Windows 10 Home
    OS Arch: AMD64
********************************************************************************
Project: 11776 (Run 0, Clone 1911, Gen 2)
Unit: 0x00000003287234c95e73c47a4958b245
Reading tar file core.xml
Reading tar file integrator.xml
Reading tar file state.xml
Reading tar file system.xml
Digital signatures verified
Folding@home GPU Core22 Folding@home Core
Version 0.0.2
ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
Saving result file ..\logfile_01.txt
Saving result file science.log
Folding@home Core Shutdown: BAD_WORK_UNIT
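
The findstr output above can also be tallied by script. Below is a minimal Python sketch, assuming the log line layout shown in this post; the regexes and the `tally` helper are illustrative and not part of FAHClient. The error names are the standard OpenCL constants for those codes (-5 is `CL_OUT_OF_RESOURCES`), and only a handful of codes are included.

```python
import re
from collections import Counter

# Standard OpenCL error codes (subset).
CL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
}

FAULTY_RE = re.compile(r":(FS\d+):Sending unit results:.*error:FAULTY project:(\d+)")
EXC_RE = re.compile(r"ERROR:exception: Error invoking kernel (\w+): (\w+) \((-?\d+)\)")

# A few lines copied from the log excerpt in this post.
LOG = r"""
logs\log-20200321-112208.txt:10:12:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1538 gen:4 core:0x22 unit:0x0000000c287234c95e73c47d47fd7729
logs\log-20200321-112208.txt:10:13:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11758 run:0 clone:833 gen:0 core:0x22 unit:0x000000099bf7a4d55e6d77116aaf791f
logs\log-20200324-084817.txt:18:01:21:WU03:FS03:Sending unit results: id:03 state:SEND error:FAULTY project:11758 run:0 clone:1241 gen:0 core:0x22 unit:0x000000099bf7a4d55e6d77129c1e48c9
logs\log-20200321-112208.txt:10:12:31:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
logs\log-20200324-084817.txt:18:01:19:WU03:FS03:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
"""

def tally(log_text):
    """Count FAULTY results per (slot, project) and exceptions per (kernel, call, error)."""
    faulty = Counter()
    errors = Counter()
    for line in log_text.splitlines():
        m = FAULTY_RE.search(line)
        if m:
            faulty[(m.group(1), int(m.group(2)))] += 1
        m = EXC_RE.search(line)
        if m:
            code = int(m.group(3))
            # Fall back to the raw number for codes not in the table.
            errors[(m.group(1), m.group(2), CL_ERRORS.get(code, str(code)))] += 1
    return faulty, errors

faulty, errors = tally(LOG)
for (slot, project), count in sorted(faulty.items()):
    print(f"{slot} project {project}: {count} faulty")
for (kernel, call, name), count in errors.items():
    print(f"{kernel} via {call}: {name} x{count}")
```

Reading the real log files into `LOG` instead of the pasted sample would give per-slot, per-project totals for the whole history.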


Re: AMD GPU Error sortShortList on some projects

Posted: Tue Mar 24, 2020 12:51 pm
by muziqaz
Researchers are looking into disabling those projects on AMD GPUs until a fix has been found.
Thank you for understanding