Re: GPU CORE22 0.0.2 coming to FAH - p11737-9 feedback threa
Posted: Fri Feb 28, 2020 9:08 am
by MeeLee
Small units can still be done by the CPU if need be, or by small GPUs (like AMD's iGPUs).
n_w95482 wrote:
One other thing that I've noticed is that both with these core 22 WUs and the core 21 ones in advanced, my 1080 Ti's coil whine has been greatly reduced.
That's a byproduct of the capacitors being under near-full load, as opposed to switching on and off all the time.
Core 22 lowers GPU frequency more than core 21.
Posted: Sat Mar 07, 2020 9:23 pm
by jima13
I had to shut down one of my 1080 Ti systems due to constant crashes. I watched the FaH log as it reported a bad work unit (114 = 0x72) 4-5 times, followed by "too many errors failing". As soon as 'failing' hit, the system rebooted. This is with 0x22 11767.
I had been having trouble with Windows 8.1 failing updates, so I installed Windows 10; same problem. I'll try to post from the system log itself, if I can get back in, since it's been rebooting before I can pause FaH. Oh, and the Windows error log shows a kernel error at that time.
Posted: Sat Mar 07, 2020 9:33 pm
by toTOW
This sounds like a hardware issue ...
I hope you'll be able to post some logs.
Posted: Sat Mar 07, 2020 9:58 pm
by jima13
I thought the same thing too; however, I can't recall this ever happening with 0x21. I ran a stress test on the 1080, but I will do another. If all else fails, I do have an AMD board ready to install. I'll try to get those logs posted if the system lets me in.
Posted: Sun Mar 08, 2020 8:45 am
by foldy
0x22 is more demanding on hardware than 0x21, so downclocking or power-limiting the failing GTX 1080 Ti might help.
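For anyone who wants to try a power limit on an Nvidia card, here is a rough sketch that builds the nvidia-smi call from Python. The 200 W figure and the function names are only placeholders of mine, not a recommendation for the 1080 Ti; nvidia-smi will refuse values outside the range the card allows, and setting the limit normally needs administrator/root rights.

```python
import subprocess


def power_limit_command(gpu_index, watts):
    """Build the nvidia-smi call that sets a power limit (in watts) on one GPU."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    # -i selects the GPU by index, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(gpu_index, watts):
    """Run the command and return nvidia-smi's exit code (0 means success)."""
    # Needs elevated rights; nvidia-smi rejects limits outside the card's range.
    completed = subprocess.run(power_limit_command(gpu_index, watts), check=False)
    return completed.returncode


# Placeholder value: 200 W on GPU 0. Only prints the command, does not run it.
print(" ".join(power_limit_command(0, 200)))
```

Running `nvidia-smi -q -d POWER` first shows the card's current, default, and min/max limits, which is a safer starting point than guessing a number.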
Posted: Tue Mar 10, 2020 3:18 am
by cainscou
Just thought I'd share my experience so far with 0x22. I'm folding on a dedicated machine (ancient AMD Phenom II X4 925 processor; the CPU slot is turned off) with a GTX 1070 under Linux Mint 18 "Sarah". Folding is stable. GPU temps are around 82 °C with the fan running at about 70% capacity. No OC. So far, so good. Folding at about 700k PPD. I did have it hang, waiting on a WU, after an apt-get update, but a reboot fixed that.
Posted: Wed Mar 11, 2020 1:01 am
by jpalpant
I've been having some trouble with this core, or possibly with my setup. I'm a new folder; I only really started last weekend after I got sick of my GPU sitting around. My setup is maybe unconventional, but I'm happy to debug however I can. I'm just not too familiar with tracking down issues like this yet.
Running on Ubuntu 19.10, with an RTX 2080 Super. 440.59 drivers and CUDA 10.2. I'm running with Docker as well (nvidia-container-runtime) which might be complicating things. I've had pretty good luck running other GPU cores for the past few days, but this afternoon Core22 started erroring out. Every time this work unit starts I see an immediate "FahCore returned: INTERRUPTED (102 = 0x66)". Pausing and restarting, restarting the container, rebooting the PC to be sure, and clearing out the `cores/` and `work/` folders for the client and restarting did not help.
Any advice on how to track down what that error code means and where it's coming from?
Full logs for that core, in isolation:
Code:
00:51:18:WU01:FS01:Starting
00:51:18:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/opt/folding/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 705 -lifeline 8 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
00:51:18:WU01:FS01:Started FahCore on PID 70
00:51:18:WU01:FS01:Core PID:74
00:51:18:WU01:FS01:FahCore 0x22 started
00:51:18:WU01:FS01:0x22:*********************** Log Started 2020-03-11T00:51:18Z ***********************
00:51:18:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:51:18:WU01:FS01:0x22: Type: 0x22
00:51:18:WU01:FS01:0x22: Core: Core22
00:51:18:WU01:FS01:0x22: Website: https://foldingathome.org/
00:51:18:WU01:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
00:51:18:WU01:FS01:0x22: Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
00:51:18:WU01:FS01:0x22: <rafal.wiewiora@choderalab.org>
00:51:18:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 705 -lifeline 70 -checkpoint 15
00:51:18:WU01:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
00:51:18:WU01:FS01:0x22: 0 -gpu 0
00:51:18:WU01:FS01:0x22: Config: <none>
00:51:18:WU01:FS01:0x22:************************************ Build *************************************
00:51:18:WU01:FS01:0x22: Version: 0.0.2
00:51:18:WU01:FS01:0x22: Date: Dec 6 2019
00:51:18:WU01:FS01:0x22: Time: 21:20:17
00:51:18:WU01:FS01:0x22: Repository: Git
00:51:18:WU01:FS01:0x22: Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
00:51:18:WU01:FS01:0x22: Branch: core22
00:51:18:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
00:51:18:WU01:FS01:0x22: Options: -std=gnu++98 -O3 -funroll-loops
00:51:18:WU01:FS01:0x22: Platform: linux2 4.9.87-linuxkit-aufs
00:51:18:WU01:FS01:0x22: Bits: 64
00:51:18:WU01:FS01:0x22: Mode: Release
00:51:18:WU01:FS01:0x22:************************************ System ************************************
00:51:18:WU01:FS01:0x22: CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
00:51:18:WU01:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
00:51:18:WU01:FS01:0x22: CPUs: 16
00:51:18:WU01:FS01:0x22: Memory: 31.30GiB
00:51:18:WU01:FS01:0x22:Free Memory: 15.37GiB
00:51:18:WU01:FS01:0x22: Threads: POSIX_THREADS
00:51:18:WU01:FS01:0x22: OS Version: 5.5
00:51:18:WU01:FS01:0x22:Has Battery: false
00:51:18:WU01:FS01:0x22: On Battery: false
00:51:18:WU01:FS01:0x22: UTC Offset: 0
00:51:18:WU01:FS01:0x22: PID: 74
00:51:18:WU01:FS01:0x22: CWD: /var/opt/folding/work
00:51:18:WU01:FS01:0x22: OS: Linux 5.5.5-050505-generic x86_64
00:51:18:WU01:FS01:0x22: OS Arch: AMD64
00:51:18:WU01:FS01:0x22:********************************************************************************
00:51:18:WU01:FS01:0x22:Project: 11741 (Run 0, Clone 2360, Gen 1)
00:51:18:WU01:FS01:0x22:Unit: 0x000000018ca304f15e67d8cb67bdf2b9
00:51:18:WU01:FS01:0x22:Reading tar file core.xml
00:51:18:WU01:FS01:0x22:Reading tar file integrator.xml
00:51:18:WU01:FS01:0x22:Reading tar file state.xml
00:51:18:WU01:FS01:0x22:Reading tar file system.xml
00:51:19:WU01:FS01:0x22:Digital signatures verified
00:51:19:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:51:19:WU01:FS01:0x22:Version 0.0.2
00:51:19:85:127.0.0.1:New Web connection
00:51:21:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
Plus the logs from startup:
Code:
00:50:15:INFO(1):Read GPUs.txt
00:50:15:Removing old file 'logs/log-20200306-023030.txt'
00:50:15:************************* Folding@home Client *************************
00:50:15: Website: https://foldingathome.org/
00:50:15: Copyright: (c) 2009-2018 foldingathome.org
00:50:15: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:50:15: Args: --web-allow=0/0 --allow=0/0 --cpu-usage=35 --session-lifetime=0
00:50:15: --session-timeout=0 --command-enable=true
00:50:15: --command-address=0.0.0.0 --command-allow-no-pass=0/0
00:50:15: --command-port=36330
00:50:15: Config: /var/opt/folding/config.xml
00:50:15:******************************** Build ********************************
00:50:15: Version: 7.5.1
00:50:15: Date: May 11 2018
00:50:15: Time: 19:59:04
00:50:15: Repository: Git
00:50:15: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
00:50:15: Branch: master
00:50:15: Compiler: GNU 6.3.0 20170516
00:50:15: Options: -std=gnu++98 -O3 -funroll-loops
00:50:15: Platform: linux2 4.14.0-3-amd64
00:50:15: Bits: 64
00:50:15: Mode: Release
00:50:15:******************************* System ********************************
00:50:15: CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
00:50:15: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
00:50:15: CPUs: 16
00:50:15: Memory: 31.30GiB
00:50:15: Free Memory: 19.51GiB
00:50:15: Threads: POSIX_THREADS
00:50:15: OS Version: 5.5
00:50:15: Has Battery: false
00:50:15: On Battery: false
00:50:15: UTC Offset: 0
00:50:15: PID: 8
00:50:15: CWD: /var/opt/folding
00:50:15: OS: Linux 5.5.5-050505-generic x86_64
00:50:15: OS Arch: AMD64
00:50:15: GPUs: 1
00:50:15: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2080 Super]
00:50:15: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.2
00:50:15:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.59
00:50:15:***********************************************************************
00:50:15:<config>
00:50:15: <!-- Slot Control -->
00:50:15: <power v='MEDIUM'/>
00:50:15:
00:50:15: <!-- User Information -->
00:50:15: <passkey v='********************************'/>
00:50:15: <team v='224497'/>
00:50:15: <user v='iavas_ALL_1HGuzc3yMQT2gABNc8Q6B1eaSWY934J55i'/>
00:50:15:
00:50:15: <!-- Folding Slots -->
00:50:15: <slot id='0' type='CPU'/>
00:50:15: <slot id='1' type='GPU'/>
00:50:15:</config>
00:50:15:Trying to access database...
00:50:15:Successfully acquired database lock
00:50:15:Enabled folding slot 00: READY cpu:14
00:50:15:Enabled folding slot 01: READY gpu:0:TU104 [GeForce RTX 2080 Super]
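This doesn't explain what 102 means, but when a log gets long it can help to pull out every "FahCore returned" line in one go. Below is a minimal sketch that assumes the line format seen in the log above; the regex and the function name are my own, not part of FAHClient.

```python
import re

# Matches client log lines such as:
#   00:51:21:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<name>\w+) "
    r"\((?P<dec>\d+) = 0x(?P<hex>[0-9a-fA-F]+)\)"
)


def find_core_returns(log_lines):
    """Return one dict per 'FahCore returned' line found in the log."""
    results = []
    for line in log_lines:
        match = RETURN_RE.match(line.strip())
        if match is None:
            continue  # not a return line, skip it
        code = int(match.group("dec"))
        results.append({
            "time": match.group("time"),
            "wu": match.group("wu"),
            "slot": match.group("slot"),
            "name": match.group("name"),
            "code": code,
            # The log prints the code in decimal and hex; check they agree.
            "consistent": code == int(match.group("hex"), 16),
        })
    return results


sample = [
    "00:51:19:85:127.0.0.1:New Web connection",
    "00:51:21:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)",
]
for entry in find_core_returns(sample):
    print(entry["time"], entry["wu"], entry["slot"], entry["name"], entry["code"])
```

Feeding it the lines of a whole log file (for example `open("log.txt").read().splitlines()`, with the path being whatever your client uses) gives a quick list of which work units and slots failed, and with which code.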
Posted: Wed Mar 11, 2020 3:24 am
by rafwiewiora
Thanks a lot for the report. The problem is that I don't think I have a way for you to run only core21 until we sort this out. Let me think about this overnight; maybe there is something we can do. I suspect there might be more people like you coming.
Posted: Wed Mar 11, 2020 5:05 am
by jpalpant
rafwiewiora wrote:
Thanks a lot for the report. The problem is that I don't think I have a way for you to run only core21 until we sort this out. Let me think about this overnight; maybe there is something we can do. I suspect there might be more people like you coming.
No worries, if there's any other info I can get you I'm happy to.
Posted: Wed Mar 11, 2020 7:04 pm
by bruce
I don't use the Cause preference setting, so I don't know much about it, but you might be able to choose an option other than "any" and increase the probability of NOT getting Core_22 assignments (or at least of getting projects that don't exhibit the INTERRUPTED error).
Posted: Thu Mar 12, 2020 9:04 pm
by jpalpant
For completeness' sake, I should update and say that I think the issue is likely in my Docker/nvidia-container-runtime/Kubernetes environment and how it interacts with the hardware, rather than in the core. I don't know why I didn't think of this sooner, but I quickly tried running FAHClient from the native OS, and the core has no trouble there. It proceeds like this:
Code:
21:00:31:WU01:FS01:0x22:*********************** Log Started 2020-03-12T21:00:31Z ***********************
21:00:31:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
21:00:31:WU01:FS01:0x22: Type: 0x22
21:00:31:WU01:FS01:0x22: Core: Core22
21:00:31:WU01:FS01:0x22: Website: https://foldingathome.org/
21:00:31:WU01:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
21:00:31:WU01:FS01:0x22: Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
21:00:31:WU01:FS01:0x22: <rafal.wiewiora@choderalab.org>
21:00:31:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 705 -lifeline 2593542 -checkpoint 15
21:00:31:WU01:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
21:00:31:WU01:FS01:0x22: 0 -gpu 0
21:00:31:WU01:FS01:0x22: Config: <none>
21:00:31:WU01:FS01:0x22:************************************ Build *************************************
21:00:31:WU01:FS01:0x22: Version: 0.0.2
21:00:31:WU01:FS01:0x22: Date: Dec 6 2019
21:00:31:WU01:FS01:0x22: Time: 21:20:17
21:00:31:WU01:FS01:0x22: Repository: Git
21:00:31:WU01:FS01:0x22: Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
21:00:31:WU01:FS01:0x22: Branch: core22
21:00:31:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
21:00:31:WU01:FS01:0x22: Options: -std=gnu++98 -O3 -funroll-loops
21:00:31:WU01:FS01:0x22: Platform: linux2 4.9.87-linuxkit-aufs
21:00:31:WU01:FS01:0x22: Bits: 64
21:00:31:WU01:FS01:0x22: Mode: Release
21:00:31:WU01:FS01:0x22:************************************ System ************************************
21:00:31:WU01:FS01:0x22: CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
21:00:31:WU01:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
21:00:31:WU01:FS01:0x22: CPUs: 16
21:00:31:WU01:FS01:0x22: Memory: 31.30GiB
21:00:31:WU01:FS01:0x22:Free Memory: 1.89GiB
21:00:31:WU01:FS01:0x22: Threads: POSIX_THREADS
21:00:31:WU01:FS01:0x22: OS Version: 5.5
21:00:31:WU01:FS01:0x22:Has Battery: false
21:00:31:WU01:FS01:0x22: On Battery: false
21:00:31:WU01:FS01:0x22: UTC Offset: -7
21:00:31:WU01:FS01:0x22: PID: 2593546
21:00:31:WU01:FS01:0x22: CWD: /home/justin/Downloads/work
21:00:31:WU01:FS01:0x22: OS: Linux 5.5.5-050505-generic x86_64
21:00:31:WU01:FS01:0x22: OS Arch: AMD64
21:00:31:WU01:FS01:0x22:********************************************************************************
21:00:31:WU01:FS01:0x22:Project: 11752 (Run 0, Clone 607, Gen 0)
21:00:31:WU01:FS01:0x22:Unit: 0x000000008ca304e75e6a805cc30d94fd
21:00:31:WU01:FS01:0x22:Digital signatures verified
21:00:31:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
21:00:31:WU01:FS01:0x22:Version 0.0.2
21:00:51:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
21:00:51:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:02:00:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
I'm fairly interested in getting this to work in my environment, so I'll play around with it and let you know if I find anything interesting. I do notice that I get assigned a different project/run/clone/gen number here, but I don't know what that means. Could anyone let me know if that means I'm not making a valid comparison?
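To make the difference concrete, here is a small sketch that pulls the project/run/clone/gen numbers out of the two logs. The regex is based only on the "Project:" lines shown above, and the function name is made up for illustration. It only shows that the two runs got different work units; it doesn't say whether that matters for the comparison.

```python
import re

# Matches the core's own summary line, e.g.
#   00:51:18:WU01:FS01:0x22:Project: 11741 (Run 0, Clone 2360, Gen 1)
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_prcg(log_line):
    """Return (project, run, clone, gen) as ints, or None if the line has none."""
    match = PRCG_RE.search(log_line)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


container = parse_prcg(
    "00:51:18:WU01:FS01:0x22:Project: 11741 (Run 0, Clone 2360, Gen 1)"
)
native = parse_prcg(
    "21:00:31:WU01:FS01:0x22:Project: 11752 (Run 0, Clone 607, Gen 0)"
)

print("container:", container)
print("native:   ", native)
print("same work unit:", container == native)  # False for these two lines
```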
Posted: Thu Mar 12, 2020 9:26 pm
by bruce
If containers are anything like virtual machines, note that most VMs don't expose all the features of the GPU, so FAH can't use them. You need a dedicated GPU driver.
Posted: Thu Mar 12, 2020 9:28 pm
by foldy
@jpalpant: the nvidia opencl-devel Docker image on Ubuntu 18.04 is a good base for running FAH on an Nvidia GPU:
https://hub.docker.com/r/nvidia/opencl
Posted: Thu Mar 12, 2020 9:33 pm
by jpalpant
@foldy yep, I'm using
Code:
FROM nvidia/opencl:runtime-ubuntu18.04
as my base. This image has worked well on other GPU tasks previously, but since I was assigned Core22 it hasn't been able to succeed.
Posted: Thu Mar 12, 2020 9:36 pm
by foldy
sudo apt-get install ocl-icd-libopencl1
sudo apt-get install ocl-icd-opencl-dev
Maybe nvidia/opencl:devel-ubuntu18.04 is better? It works for me on vast.ai cloud machines.
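One quick thing to check inside the container is whether an OpenCL vendor entry is registered at all. The sketch below assumes the image uses the usual Linux ICD loader location, /etc/OpenCL/vendors, where each .icd file names a driver library; the function name is mine. An entry being present doesn't prove the GPU is usable, but a missing one would explain OpenCL not finding the card.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the driver library it points at."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}  # no vendor directory at all: nothing is registered
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file is a short text file holding a library name or path.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print("No OpenCL ICD entries found")
    for name, library in found.items():
        print(f"{name} -> {library}")
```

Running it both on the native OS and inside the container, then comparing the output, is one way to see whether the two environments register the same OpenCL driver.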