Linux CPU stall with corresponding "Clock Skew" warning.
Posted: Wed Sep 30, 2020 1:24 pm
Hey folks,
Ever since core 22 0.0.13 dropped on my machine, some WUs (mostly project 13426) have been hitting some kind of weird bug. The GPUs (GTX 1060 6GB) go off into la-la land and the entire machine freezes. It _usually_ comes back with a message similar to:
Code: Select all
kernel: [3036338.798684] watchdog: BUG: soft lockup - CPU#3 stuck for 92s! [FahCore_22:11492]
and a corresponding message in the fahclient log like:
Code: Select all
WARNING:WU02:FS00:Detected clock skew (25 mins 00 secs), I/O delay, laptop hibernation or other slowdown noted, adjusting time estimates
Sometimes, if it's a short stall, it only prints a
Code: Select all
ERROR:Receive error: 110: Connection timed out
from losing the fahcontrol app socket from a different system.
Dunno what's going on with these WUs, but it's a mess. This machine doesn't have any power management crap and _was_ working pretty well until the 26th.
Is there anything I can do here to debug? (Also, the machine is not running any CPU slots, just the two GPU slots, as it's an otherwise wimpy machine and it's all it can do to keep the GPUs fed.)
System info:
Code: Select all
12:37:45:****************************** FAHClient ******************************
12:37:45: Version: 7.6.9
12:37:45: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:37:45: Copyright: 2020 foldingathome.org
12:37:45: Homepage: https://foldingathome.org/
12:37:45: Date: Apr 17 2020
12:37:45: Time: 18:11:26
12:37:45: Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45: Args: --child /etc/fahclient/config.xml --run-as fahclient
12:37:45: --pid-file=/var/run/fahclient.pid --daemon
12:37:45: Config: /etc/fahclient/config.xml
12:37:45:******************************** CBang ********************************
12:37:45: Date: Apr 17 2020
12:37:45: Time: 18:10:13
12:37:45: Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie -fPIC
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45:******************************* System ********************************
12:37:45: CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
12:37:45: CPU ID: GenuineIntel Family 6 Model 23 Stepping 6
12:37:45: CPUs: 4
12:37:45: Memory: 31.41GiB
12:37:45: Free Memory: 30.70GiB
12:37:45: Threads: POSIX_THREADS
12:37:45: OS Version: 4.15
12:37:45: Has Battery: false
12:37:45: On Battery: false
12:37:45: UTC Offset: 0
12:37:45: PID: 1551
12:37:45: CWD: /var/lib/fahclient
12:37:45: OS: Linux 4.15.0-109-generic x86_64
12:37:45: OS Arch: AMD64
12:37:45: GPUs: 2
12:37:45: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
12:37:45: GPU 1: Bus:5 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
12:37:45: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.2
12:37:45: CUDA Device 1: Platform:0 Device:1 Bus:5 Slot:0 Compute:6.1 Driver:10.2
12:37:45:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.100
12:37:45:OpenCL Device 1: Platform:0 Device:1 Bus:5 Slot:0 Compute:1.2 Driver:440.100
12:37:45:******************************* libFAH ********************************
12:37:45: Date: Apr 15 2020
12:37:45: Time: 21:43:24
12:37:45: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
12:37:45: Branch: master
12:37:45: Compiler: GNU 8.3.0
12:37:45: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
12:37:45: -funroll-loops -fno-pie
12:37:45: Platform: linux2 4.19.0-5-amd64
12:37:45: Bits: 64
12:37:45: Mode: Release
12:37:45:***********************************************************************
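In case it helps anyone line up the kernel stalls with the client warnings, here's a rough Python sketch that pulls the stall length out of the two message formats quoted above. The regexes are modelled only on those sample lines, the function names (`scan`, `skew_to_seconds`) are made up for illustration, and the "hours" unit is a guess since only "mins" and "secs" show up in my log.

```python
import re

# Patterns modelled on the sample messages quoted in this post (assumption:
# other kernel or client versions may word these lines differently).
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<proc>[^:\]]+):(?P<pid>\d+)\]"
)
CLOCK_SKEW = re.compile(
    r"WARNING:WU(?P<wu>\d+):FS(?P<slot>\d+):Detected clock skew \((?P<skew>[^)]*)\)"
)
# "hours" is an assumption; only "mins" and "secs" appear in the sample line.
UNIT = re.compile(r"(\d+)\s*(hours?|mins?|secs?)")
UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def skew_to_seconds(text):
    """Convert a string like '25 mins 00 secs' to a number of seconds."""
    total = 0
    for value, unit in UNIT.findall(text):
        total += int(value) * UNIT_SECONDS[unit.rstrip("s")]
    return total


def scan(lines):
    """Return (lockups, skews) found in an iterable of log lines."""
    lockups, skews = [], []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            lockups.append({
                "cpu": int(m["cpu"]),
                "secs": int(m["secs"]),
                "proc": m["proc"],
                "pid": int(m["pid"]),
            })
            continue
        m = CLOCK_SKEW.search(line)
        if m:
            skews.append({
                "wu": m["wu"],
                "slot": m["slot"],
                "secs": skew_to_seconds(m["skew"]),
            })
    return lockups, skews


# The two sample lines from this post.
sample = [
    "kernel: [3036338.798684] watchdog: BUG: soft lockup - CPU#3 stuck for 92s! [FahCore_22:11492]",
    "WARNING:WU02:FS00:Detected clock skew (25 mins 00 secs), I/O delay, "
    "laptop hibernation or other slowdown noted, adjusting time estimates",
]
lockups, skews = scan(sample)
print(lockups)
print(skews)
```

Feeding it the lines from the kernel log and the fahclient log should give a list of stalls per log, which at least makes it easier to see which WU and slot were running when each freeze happened.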