muziqaz wrote: Researchers are looking into disabling those projects on AMD GPUs until a fix has been found.
Just to add to this discussion, I have a 5700 XT and have had no failures on any of the projects mentioned in this thread. Perhaps the source of the error isn't present on Navi cards?
Successful projects (tracked in the spreadsheet in my sig): 11741-11752, 11755, 11759, 11762-11764, 11776-11778, 11780, 11781
Thank you for the information. It seems that GCN-based cards are the ones affected.
Big Navi can't come quick enough
Yep, and yep! (I was planning on upgrading my desktop this year; my 290X just turned 6 and deserves retirement. But I guess we'll see if launches actually happen as planned this year.)
_r2w_ben wrote:
The restriction needs to be added to p14533. One was assigned at 2020-03-25T23:29:18Z.
Thanks for the info. It was passed to researchers.
On the 5700 XT, I processed the only work unit I received from project 14533 to 100% and uploaded the results, only to have the server dump them. This is a different outcome from the kernel error message reported earlier, so I think the distinction is worth pointing out. Whatever the kernel message is about, it does not affect all AMD cards.
As pointed out in an earlier post, I can process all of the COVID-19 core22 related projects just fine; not one has errored out for any reason other than me messing with my overclock. See the spreadsheet in my sig: I have tracked 95 successful COVID-19 core22 projects (85 are shown). If any of the devs/researchers need more information, I can provide PRCG numbers for all projects with timestamps, or even the full logs (I archive all of them before the client can clean them out).
I would suggest not blocking all AMD cards on these projects, and allowing species 6 to continue folding.
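For anyone who wants to pull PRCG numbers out of their own archived logs, here is a minimal sketch. It assumes the log lines carry `project:… run:… clone:… gen:…` fields in the form the client prints for "Received Unit" and "Sending unit results"; the function name `extract_prcg` is made up for illustration and is not part of any FAH tool.

```python
import re

# Matches the project/run/clone/gen fields printed when a unit is received or sent.
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")

def extract_prcg(log_lines):
    """Return unique (project, run, clone, gen) tuples in order of first appearance."""
    seen = []
    for line in log_lines:
        m = PRCG_RE.search(line)
        if m:
            prcg = tuple(int(g) for g in m.groups())
            if prcg not in seen:
                seen.append(prcg)
    return seen

sample = [
    "01:54:19:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR "
    "project:11776 run:0 clone:31304 gen:7 core:0x22",
    "01:54:37:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:11776 run:0 clone:31304 gen:7 core:0x22",
]
print(extract_prcg(sample))  # [(11776, 0, 31304, 7)]
```

The same unit appears on both the receive and send lines, so the sketch de-duplicates before reporting.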
I also see this problem.
AMD R9 280X 3GB (ID: 6798 SUB: 3001)
Project: 11776
It often seems to get stuck after the "...0x22:Version 0.0.2" log line. If I leave it alone, it stays there for hours. If I pause and unpause it, it either gets stuck there again or finishes with the error. At least in that case it will try to fetch another WU.
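A rough way to measure how long the core sits after that line is to compare timestamps in the client log. The sketch below assumes every log line starts with an `HH:MM:SS:` time-of-day prefix and that the marker text is `0x22:Version 0.0.2`; `gap_after_marker` is a hypothetical helper, not something the client provides.

```python
import re
from datetime import datetime

# Client log lines are assumed to start with a time-of-day stamp such as "01:54:21:".
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):")

def gap_after_marker(log_lines, marker="0x22:Version 0.0.2"):
    """Seconds between the marker line and the next timestamped line.

    Returns None if the marker is missing or nothing was logged after it.
    """
    marker_time = None
    for line in log_lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip date separators and other unstamped lines
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if marker_time is not None:
            delta = (t - marker_time).total_seconds()
            if delta < 0:
                delta += 24 * 60 * 60  # the log rolled past midnight
            return delta
        if marker in line:
            marker_time = t
    return None

sample = [
    "01:54:21:WU02:FS01:0x22:Version 0.0.2",
    "01:54:37:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList",
]
print(gap_after_marker(sample))  # 16.0
```

A result of `None` while the slot is still running means nothing has been logged since the marker, which matches the "stuck" symptom. Since the stamps carry no date, a gap longer than 24 hours would be under-reported.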
Just an update: some people at AMD are aware of this issue and are looking into it.
Hopefully we will have it solved sooner rather than later.
Thank you for your patience.
First, a temporary solution from FAH: those projects will not be assigned to that group of GPUs.
Second, a permanent solution: new AMD drivers or a new FAHCore from FAH will be prepared to fix the original problem. (The temporary solution will then be removed.)
It appears that the AMD GPUs are still getting this family of projects.
Project 11776 just failed on my RX 580, and I've had a few work units end in a status "Failure 2" as reported on the stats page.
I've seen a few instances where a ~20 credit job was submitted, and when I caught it and examined it, it turned out to be a failed unit.
20:05:04:WU02:FS01:Connecting to 65.254.110.245:8080
20:05:04:WU02:FS01:Assigned to work server 140.163.4.231
20:05:04:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 140.163.4.231
20:05:04:WU02:FS01:Connecting to 140.163.4.231:8080
20:05:25:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
20:05:25:WU02:FS01:Connecting to 140.163.4.231:80
20:05:46:ERROR:WU02:FS01:Exception: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2020-04-06 *******************************
01:27:04:WU02:FS01:Connecting to 65.254.110.245:8080
01:27:04:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
01:27:04:WU02:FS01:Connecting to 18.218.241.186:80
01:27:04:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
01:27:04:ERROR:WU02:FS01:Exception: Could not get an assignment
01:51:27:WU02:FS01:Connecting to 65.254.110.245:8080
01:51:28:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
01:51:28:WU02:FS01:Connecting to 18.218.241.186:80
01:51:29:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
01:51:29:ERROR:WU02:FS01:Exception: Could not get an assignment
01:53:04:WU02:FS01:Connecting to 65.254.110.245:8080
01:53:04:WU02:FS01:Assigned to work server 40.114.52.201
01:53:04:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 40.114.52.201
01:53:04:WU02:FS01:Connecting to 40.114.52.201:8080
01:53:32:WU02:FS01:Downloading 79.12MiB
01:53:38:WU02:FS01:Download 7.74%
01:53:44:WU02:FS01:Download 19.59%
01:53:50:WU02:FS01:Download 30.10%
01:53:56:WU02:FS01:Download 43.84%
01:54:02:WU02:FS01:Download 57.90%
01:54:08:WU02:FS01:Download 71.96%
01:54:14:WU02:FS01:Download 86.57%
01:54:19:WU02:FS01:Download complete
01:54:19:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11776 run:0 clone:31304 gen:7 core:0x22 unit:0x0000000b287234c95e7931c2b282407f
01:54:20:WU02:FS01:Starting
01:54:20:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Josh\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 705 -lifeline 14036 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
01:54:20:WU02:FS01:Started FahCore on PID 2444
01:54:20:WU02:FS01:Core PID:13684
01:54:20:WU02:FS01:FahCore 0x22 started
01:54:20:WU02:FS01:0x22:*********************** Log Started 2020-04-07T01:54:20Z ***********************
01:54:20:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
01:54:20:WU02:FS01:0x22: Type: 0x22
01:54:20:WU02:FS01:0x22: Core: Core22
01:54:20:WU02:FS01:0x22: Website: https://foldingathome.org/
01:54:20:WU02:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
01:54:20:WU02:FS01:0x22: Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
01:54:20:WU02:FS01:0x22: <rafal.wiewiora@choderalab.org>
01:54:20:WU02:FS01:0x22: Args: -dir 02 -suffix 01 -version 705 -lifeline 2444 -checkpoint 15
01:54:20:WU02:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
01:54:20:WU02:FS01:0x22: Config: <none>
01:54:20:WU02:FS01:0x22:************************************ Build *************************************
01:54:20:WU02:FS01:0x22: Version: 0.0.2
01:54:20:WU02:FS01:0x22: Date: Dec 6 2019
01:54:20:WU02:FS01:0x22: Time: 21:30:31
01:54:20:WU02:FS01:0x22: Repository: Git
01:54:20:WU02:FS01:0x22: Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
01:54:20:WU02:FS01:0x22: Branch: HEAD
01:54:20:WU02:FS01:0x22: Compiler: Visual C++ 2008
01:54:20:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
01:54:20:WU02:FS01:0x22: Platform: win32 10
01:54:20:WU02:FS01:0x22: Bits: 64
01:54:20:WU02:FS01:0x22: Mode: Release
01:54:20:WU02:FS01:0x22:************************************ System ************************************
01:54:20:WU02:FS01:0x22: CPU: AMD Ryzen 5 3600 6-Core Processor
01:54:20:WU02:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
01:54:20:WU02:FS01:0x22: CPUs: 12
01:54:20:WU02:FS01:0x22: Memory: 31.94GiB
01:54:20:WU02:FS01:0x22:Free Memory: 25.86GiB
01:54:20:WU02:FS01:0x22: Threads: WINDOWS_THREADS
01:54:20:WU02:FS01:0x22: OS Version: 6.2
01:54:20:WU02:FS01:0x22:Has Battery: false
01:54:20:WU02:FS01:0x22: On Battery: false
01:54:20:WU02:FS01:0x22: UTC Offset: -7
01:54:20:WU02:FS01:0x22: PID: 13684
01:54:20:WU02:FS01:0x22: CWD: C:\Users\Josh\AppData\Roaming\FAHClient\work
01:54:20:WU02:FS01:0x22: OS: Windows 10 Pro
01:54:20:WU02:FS01:0x22: OS Arch: AMD64
01:54:20:WU02:FS01:0x22:********************************************************************************
01:54:20:WU02:FS01:0x22:Project: 11776 (Run 0, Clone 31304, Gen 7)
01:54:20:WU02:FS01:0x22:Unit: 0x0000000b287234c95e7931c2b282407f
01:54:20:WU02:FS01:0x22:Reading tar file core.xml
01:54:20:WU02:FS01:0x22:Reading tar file integrator.xml
01:54:20:WU02:FS01:0x22:Reading tar file state.xml
01:54:21:WU02:FS01:0x22:Reading tar file system.xml
01:54:21:WU02:FS01:0x22:Digital signatures verified
01:54:21:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
01:54:21:WU02:FS01:0x22:Version 0.0.2
01:54:37:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
01:54:37:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
01:54:37:WU02:FS01:0x22:Saving result file science.log
01:54:37:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
01:54:37:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:54:37:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11776 run:0 clone:31304 gen:7 core:0x22 unit:0x0000000b287234c95e7931c2b282407f
01:54:37:WU02:FS01:Uploading 8.00KiB to 40.114.52.201
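For reference, the `(-5)` in the core's error line is an OpenCL status code, and in the standard OpenCL headers -5 is `CL_OUT_OF_RESOURCES`. That names the status returned by `clEnqueueNDRangeKernel`; it does not by itself say why the driver ran out of resources. Here is a small sketch for pulling the kernel name and code out of such a line. The regular expression is an assumption based on the single error line in this log, and `describe_kernel_error` is a hypothetical helper.

```python
import re

# A few status codes from the OpenCL headers (CL/cl.h); not a complete list.
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

# Pattern inferred from the one error line in the log above (an assumption).
ERR_RE = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")

def describe_kernel_error(line):
    """Return kernel name, OpenCL call, code, and symbolic name, or None if no match."""
    m = ERR_RE.search(line)
    if not m:
        return None
    code = int(m.group(3))
    return {
        "kernel": m.group(1),
        "call": m.group(2),
        "code": code,
        "name": OPENCL_ERRORS.get(code, "unknown"),
    }

line = ("01:54:37:WU02:FS01:0x22:ERROR:exception: Error invoking kernel "
        "sortShortList: clEnqueueNDRangeKernel (-5)")
print(describe_kernel_error(line))
```

Run over a set of archived logs, this gives a quick tally of which kernel and status code each failure reported, which may help when comparing results across GPU families.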