Ever since core 22 version 0.0.13 landed on my machine, some WUs (mostly from project 13426) have been hitting some kind of weird bug. The GPUs (GTX 1060 6GB) go off into la-la land and the entire machine freezes. It _usually_ comes back with messages similar to:
kernel: [3036338.798684] watchdog: BUG: soft lockup - CPU#3 stuck for 92s! [FahCore_22:11492]
WARNING:WU02:FS00:Detected clock skew (25 mins 00 secs), I/O delay, laptop hibernation or other slowdown noted, adjusting time estimates
ERROR:Receive error: 110: Connection timed out
Dunno what's going on with these WUs, but it's a mess. This machine doesn't have any power management crap enabled and _was_ working pretty well until the 26th.
Is there anything I can do here to debug? (Also, the machine is not running any CPU slots, just the two GPU slots. It's an otherwise wimpy machine, and it's all it can do to keep the GPUs fed.)
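One possible starting point is to pull the lockup and clock-skew lines out of the kernel log and the client log so their timing can be compared. A rough Python sketch, assuming the messages keep the exact wording quoted above; the function names `parse_soft_lockup` and `parse_clock_skew` are made up for illustration and are not part of FAHClient:

```python
import re

# Patterns are based only on the two messages quoted in this post;
# other kernels or client versions may word them differently.
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(\S+?):(\d+)\]"
)
CLOCK_SKEW = re.compile(r"Detected clock skew \((\d+) mins (\d+) secs\)")


def parse_soft_lockup(line):
    """Return CPU number, stall length, process name and PID, or None."""
    m = SOFT_LOCKUP.search(line)
    if not m:
        return None
    cpu, secs, proc, pid = m.groups()
    return {"cpu": int(cpu), "stuck_s": int(secs), "process": proc, "pid": int(pid)}


def parse_clock_skew(line):
    """Return the reported clock skew in seconds, or None."""
    m = CLOCK_SKEW.search(line)
    if not m:
        return None
    mins, secs = m.groups()
    return int(mins) * 60 + int(secs)
```

Each line of `journalctl -k` (or `dmesg`) output could be passed through `parse_soft_lockup`, and each line of the client log through `parse_clock_skew`, to see whether the stalls always name `FahCore_22` and how long the machine was gone each time.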
System info:
12:37:45:****************************** FAHClient ******************************
12:37:45: Version: 7.6.9
12:37:45: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:37:45: Copyright: 2020 foldingathome.org
12:37:45: Homepage: https://foldingathome.org/
12:37:45: Date: Apr 17 2020
12:37:45: Time: 18:11:26
12:37:45: Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45: Args: --child /etc/fahclient/config.xml --run-as fahclient
12:37:45: --pid-file=/var/run/fahclient.pid --daemon
12:37:45: Config: /etc/fahclient/config.xml
12:37:45:******************************** CBang ********************************
12:37:45: Date: Apr 17 2020
12:37:45: Time: 18:10:13
12:37:45: Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie -fPIC
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45:******************************* System ********************************
12:37:45: CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
12:37:45: CPU ID: GenuineIntel Family 6 Model 23 Stepping 6
12:37:45: CPUs: 4
12:37:45: Memory: 31.41GiB
12:37:45: Free Memory: 30.70GiB
12:37:45: Threads: POSIX_THREADS
12:37:45: OS Version: 4.15
12:37:45: Has Battery: false
12:37:45: On Battery: false
12:37:45: UTC Offset: 0
12:37:45: PID: 1551
12:37:45: CWD: /var/lib/fahclient
12:37:45: OS: Linux 4.15.0-109-generic x86_64
12:37:45: OS Arch: AMD64
12:37:45: GPUs: 2
12:37:45: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
12:37:45: GPU 1: Bus:5 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
12:37:45: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.2
12:37:45: CUDA Device 1: Platform:0 Device:1 Bus:5 Slot:0 Compute:6.1 Driver:10.2
12:37:45:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.100
12:37:45:OpenCL Device 1: Platform:0 Device:1 Bus:5 Slot:0 Compute:1.2 Driver:440.100
12:37:45:******************************* libFAH ********************************
12:37:45: Date: Apr 15 2020
12:37:45: Time: 21:43:24
12:37:45: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45:***********************************************************************