Linux Mint 19.2/.3 Freezes During CPU or GPU Folding
Posted: Mon Dec 23, 2019 11:29 pm
I have been at my wits' end debugging this since last Thursday. Across two complete sets of hardware and two point releases of Linux Mint 19 (.2 and .3), I cannot fold on CPU or GPU. The system freezes within a minute or two of un-pausing the folding client. It's a hard crash: no Linux crash logs and no error messages in the folding log. Sometimes, but not always, FAHClient is not recoverable afterwards: stopping and restarting it fails on the start, so the application has to be removed and reinstalled.
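Since a hard freeze leaves nothing in the folding log, the kernel messages from the boot that froze are one more place to look. Below is a minimal sketch, assuming systemd-journald keeps a persistent journal on this install (without one, `journalctl -b -1` has nothing to show). The marker list and the function names are my own illustrative choices, not anything the FAH client provides.

```python
import subprocess

# Substrings that often accompany hardware or driver trouble in kernel logs.
# This list is an illustrative assumption, not an exhaustive catalogue.
SUSPECT_MARKERS = ("mce:", "hardware error", "nvrm", "xid", "watchdog", "lockup")


def suspicious_lines(log_text, markers=SUSPECT_MARKERS):
    """Return the lines of log_text that contain any marker (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append(line)
    return hits


def previous_boot_kernel_log():
    """Fetch kernel messages from the boot before the current one.

    Returns an empty string if journalctl is missing or reports an error,
    for example when no journal from the previous boot was kept.
    """
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    for line in suspicious_lines(previous_boot_kernel_log()):
        print(line)
```

An empty result does not prove the hardware is fine, because a freeze can stop the machine before anything reaches disk. A hit on any of these markers would at least narrow the search.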
Right now, I am on Mint 19.2 with NVIDIA driver 435 -- and I have a third box reliably running this specific combination. Since I cannot get past CPU folding, I have only done sporadic GPU testing on 19.2; I gave up on 19.3 after two days, thinking I had a hardware problem.
By "two complete sets of hardware" I mean a different motherboard, processor, memory, power supply (850 watt), and storage. The other set used a Core i3-8350K (4 cores -- not sure about AVX). It's not an overheating problem.
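The AVX question on the other box can be settled by reading the `flags` line of `/proc/cpuinfo`, which lists the feature flags the kernel sees on x86 Linux. This matters because the client log for the i5-9400F shows it running the `avx` build of FahCore_a7. A minimal sketch follows; the helper names are my own.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" entry is enough; every core reports the same set.
            return set(value.split())
    return set()


def simd_support(flags):
    """Map the SIMD levels of interest to whether the CPU reports them."""
    return {name: name in flags for name in ("sse2", "sse4_1", "avx", "avx2")}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(simd_support(cpu_flags(handle.read())))
    except OSError as error:
        print("could not read /proc/cpuinfo:", error)
```

Note that the short flags list in the inxi output is an abbreviated summary, so the absence of `avx` there does not mean the CPU lacks it.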
A freeze means: mouse frozen, keyboard dead, screen drawing stopped (such as the System Monitor CPU graphs). I have to power off to recover.
Verbosity 7 did not reveal anything to me:
Work Log is also ambiguous:
Hardware Details:
*********************** Log Started 2019-12-23T22:28:26Z ***********************
22:28:26:************************* Folding@home Client *************************
22:28:26: Website: https://foldingathome.org/
22:28:26: Copyright: (c) 2009-2018 foldingathome.org
22:28:26: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:28:26: Args: --child --lifeline 1577 /etc/fahclient/config.xml --run-as
22:28:26: fahclient --pid-file=/var/run/fahclient.pid --daemon
22:28:26: Config: /etc/fahclient/config.xml
22:28:26:******************************** Build ********************************
22:28:26: Version: 7.5.1
22:28:26: Date: May 11 2018
22:28:26: Time: 19:59:04
22:28:26: Repository: Git
22:28:26: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
22:28:26: Branch: master
22:28:26: Compiler: GNU 6.3.0 20170516
22:28:26: Options: -std=gnu++98 -O3 -funroll-loops
22:28:26: Platform: linux2 4.14.0-3-amd64
22:28:26: Bits: 64
22:28:26: Mode: Release
22:28:26:******************************* System ********************************
22:28:26: CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:28:26: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:28:26: CPUs: 6
22:28:26: Memory: 15.58GiB
22:28:26: Free Memory: 14.79GiB
22:28:26: Threads: POSIX_THREADS
22:28:26: OS Version: 4.15
22:28:26: Has Battery: false
22:28:26: On Battery: false
22:28:26: UTC Offset: -5
22:28:26: PID: 1579
22:28:26: CWD: /var/lib/fahclient
22:28:26: OS: Linux 4.15.0-72-generic x86_64
22:28:26: OS Arch: AMD64
22:28:26: GPUs: 2
22:28:26: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
22:28:26: 13448
22:28:26: GPU 1: Bus:2 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
22:28:26: 13448
22:28:26: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.1
22:28:26: CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:7.5 Driver:10.1
22:28:26:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:435.21
22:28:26:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:435.21
22:28:26:***********************************************************************
22:28:26:<config>
22:28:26: <!-- Client Control -->
22:28:26: <fold-anon v='true'/>
22:28:26:
22:28:26: <!-- HTTP Server -->
22:28:26: <allow v='127.0.0.1 192.168.1.0/24'/>
22:28:26:
22:28:26: <!-- Network -->
22:28:26: <proxy v=':8080'/>
22:28:26:
22:28:26: <!-- Remote Command Server -->
22:28:26: <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
22:28:26:
22:28:26: <!-- User Information -->
22:28:26: <passkey v='********************************'/>
22:28:26: <team v='224497'/>
22:28:26: <user v='Catalina588_ALL_1EMQiByPxuaffjHVyb4RDLXChMkwgWmYUn'/>
22:28:26:
22:28:26: <!-- Folding Slots -->
22:28:26: <slot id='0' type='CPU'>
22:28:26: <paused v='true'/>
22:28:26: </slot>
22:28:26:</config>
22:28:26:Switching to user fahclient
22:28:26:Trying to access database...
22:28:26:Successfully acquired database lock
22:28:26:Enabled folding slot 00: PAUSED cpu:5 (by user)
22:30:57:FS00:Unpaused
22:30:57:WU00:FS00:Starting
22:30:57:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1579 -checkpoint 15 -np 5
22:30:57:WU00:FS00:Started FahCore on PID 2128
22:30:57:WU00:FS00:Core PID:2132
22:30:57:WU00:FS00:FahCore 0xa7 started
22:30:58:WU00:FS00:0xa7:*********************** Log Started 2019-12-23T22:30:57Z ***********************
22:30:58:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
22:30:58:WU00:FS00:0xa7: Type: 0xa7
22:30:58:WU00:FS00:0xa7: Core: Gromacs
22:30:58:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 705 -lifeline 2128 -checkpoint 15 -np 5
22:30:58:WU00:FS00:0xa7:************************************ CBang *************************************
22:30:58:WU00:FS00:0xa7: Date: Nov 5 2019
22:30:58:WU00:FS00:0xa7: Time: 06:06:57
22:30:58:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
22:30:58:WU00:FS00:0xa7: Branch: master
22:30:58:WU00:FS00:0xa7: Compiler: GNU 8.3.0
22:30:58:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
22:30:58:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
22:30:58:WU00:FS00:0xa7: Bits: 64
22:30:58:WU00:FS00:0xa7: Mode: Release
22:30:58:WU00:FS00:0xa7:************************************ System ************************************
22:30:58:WU00:FS00:0xa7: CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:30:58:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:30:58:WU00:FS00:0xa7: CPUs: 6
22:30:58:WU00:FS00:0xa7: Memory: 15.58GiB
22:30:58:WU00:FS00:0xa7:Free Memory: 14.25GiB
22:30:58:WU00:FS00:0xa7: Threads: POSIX_THREADS
22:30:58:WU00:FS00:0xa7: OS Version: 4.15
22:30:58:WU00:FS00:0xa7:Has Battery: false
22:30:58:WU00:FS00:0xa7: On Battery: false
22:30:58:WU00:FS00:0xa7: UTC Offset: -5
22:30:58:WU00:FS00:0xa7: PID: 2132
22:30:58:WU00:FS00:0xa7: CWD: /var/lib/fahclient/work
22:30:58:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
22:30:58:WU00:FS00:0xa7: Version: 0.0.18
22:30:58:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:30:58:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
22:30:58:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
22:30:58:WU00:FS00:0xa7: Date: Nov 5 2019
22:30:58:WU00:FS00:0xa7: Time: 06:13:26
22:30:58:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
22:30:58:WU00:FS00:0xa7: Branch: master
22:30:58:WU00:FS00:0xa7: Compiler: GNU 8.3.0
22:30:58:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
22:30:58:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
22:30:58:WU00:FS00:0xa7: Bits: 64
22:30:58:WU00:FS00:0xa7: Mode: Release
22:30:58:WU00:FS00:0xa7:************************************ Build *************************************
22:30:58:WU00:FS00:0xa7: SIMD: avx_256
22:30:58:WU00:FS00:0xa7:********************************************************************************
22:30:58:WU00:FS00:0xa7:Project: 14244 (Run 0, Clone 8, Gen 200)
22:30:58:WU00:FS00:0xa7:Unit: 0x000000e580fccb0a5d6ee30e3dd1631a
22:30:58:WU00:FS00:0xa7:Digital signatures verified
22:30:58:WU00:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
22:30:58:WU00:FS00:0xa7:Calling: mdrun -s frame200.tpr -o frame200.trr -x frame200.xtc -cpi state.cpt -cpt 15 -nt 4
22:30:58:WU00:FS00:0xa7:Steps: first=50000000 total=250000
22:30:59:WU00:FS00:0xa7:Completed 514 out of 250000 steps (0%)
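The `Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3` line in the log above explains why the slot was started with `-np 5` but mdrun was called with `-nt 4`. One plausible reading of that message is sketched below: step the count down until none of its prime factors exceeds 3. This is only my interpretation of the log text. The actual rule inside FahCore_a7 may differ, and both function names are hypothetical.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (expects n >= 2)."""
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        # Whatever remains after dividing out the small factors is prime.
        largest = n
    return largest


def reduce_thread_count(requested, max_prime=3):
    """Step down from requested until no prime factor exceeds max_prime.

    Assumed rule, inferred from the log message only.
    """
    count = requested
    while count > 1 and largest_prime_factor(count) > max_prime:
        count -= 1
    return count


if __name__ == "__main__":
    print(reduce_thread_count(5))  # 4, matching the log line above
```

Under this reading the reduction is routine and not a sign of trouble in itself; it only means that one of the five assigned threads goes unused.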
19:30:35:WU01:FS00:0xa7:************************************ Build *************************************
19:30:35:WU01:FS00:0xa7: SIMD: avx_256
19:30:35:WU01:FS00:0xa7:********************************************************************************
19:30:35:WU01:FS00:0xa7:Project: 14182 (Run 6, Clone 97, Gen 31)
19:30:35:WU01:FS00:0xa7:Unit: 0x000000240002894b5cf684c40d09b6c2
19:30:35:WU01:FS00:0xa7:Digital signatures verified
19:30:35:WU01:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
19:30:35:WU01:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -cpt 15 -nt 4
19:30:35:WU01:FS00:0xa7:Steps: first=77500000 total=2500000
19:30:36:WU01:FS00:0xa7:Completed 1 out of 2500000 steps (0%)
19:30:45:FS00:Paused
19:30:45:FS00:Shutting core down
19:30:45:WU01:FS00:0xa7:Caught signal SIGINT(2) on PID 1599
19:30:45:WU01:FS00:0xa7:Exiting, please wait. . .
19:30:45:WU01:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
19:30:45:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
19:31:36:Saving configuration to /etc/fahclient/config.xml
21:48:29:WU00:FS00:0xa7:************************************ Build *************************************
21:48:29:WU00:FS00:0xa7: SIMD: avx_256
21:48:29:WU00:FS00:0xa7:********************************************************************************
21:48:29:WU00:FS00:0xa7:Project: 13831 (Run 654, Clone 1, Gen 87)
21:48:29:WU00:FS00:0xa7:Unit: 0x0000006880fccb095d693ab077870ab3
21:48:29:WU00:FS00:0xa7:Digital signatures verified
21:48:29:WU00:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
21:48:29:WU00:FS00:0xa7:Calling: mdrun -s frame87.tpr -o frame87.trr -x frame87.xtc -cpt 15 -nt 4
21:48:29:WU00:FS00:0xa7:Steps: first=10875000 total=125000
21:48:32:WU00:FS00:0xa7:Completed 1 out of 125000 steps (0%)
21:48:42:Started thread 12 on PID 1642
21:48:42:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
21:48:46:FS00:Paused
21:48:46:FS00:Shutting core down
21:48:47:WU00:FS00:0xa7:Caught signal SIGINT(2) on PID 1656
21:48:47:WU00:FS00:0xa7:Exiting, please wait. . .
21:48:49:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
21:48:50:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
System: Host: white Kernel: 4.15.0-72-generic x86_64 bits: 64 compiler: gcc v: 7.4.0
Desktop: Cinnamon 4.2.4 wm: muffin dm: LightDM Distro: Linux Mint 19.2 Tina
base: Ubuntu 18.04 bionic
Machine: Type: Desktop Mobo: ASRock model: Z390 Phantom Gaming 4 serial: <filter>
UEFI: American Megatrends v: P4.30 date: 08/07/2019
CPU: Topology: 6-Core model: Intel Core i5-9400F bits: 64 type: MCP arch: Kaby Lake rev: A
L2 cache: 9216 KiB
flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 34848
Speed: 801 MHz min/max: 800/4100 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800
5: 800 6: 800
Graphics: Device-1: NVIDIA driver: nvidia v: 435.21 bus ID: 01:00.0 chip ID: 10de:1e04
Device-2: NVIDIA vendor: ASUSTeK driver: nvidia v: 435.21 bus ID: 02:00.0
chip ID: 10de:1e04
Display: x11 server: X.Org 1.19.6 driver: modesetting,nouveau,nvidia
unloaded: fbdev,vesa resolution: 1920x1080~60Hz
OpenGL: renderer: GeForce RTX 2080 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 435.21
direct render: Yes
Audio: Device-1: Intel Cannon Lake PCH cAVS vendor: ASRock driver: snd_hda_intel v: kernel
bus ID: 00:1f.3 chip ID: 8086:a348
Device-2: NVIDIA driver: snd_hda_intel v: kernel bus ID: 01:00.1 chip ID: 10de:10f7
Device-3: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 02:00.1
chip ID: 10de:10f7
Sound Server: ALSA v: k4.15.0-72-generic
Network: Device-1: Intel Ethernet I219-V vendor: ASRock driver: e1000e v: 3.2.6-k port: efa0
bus ID: 00:1f.6 chip ID: 8086:15bc
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Micro Star type: USB driver: rt2800usb bus ID: 1-9:4 chip ID: 0db0:3871
IF: wlx0008ca315cfd state: up mac: <filter>
Drives: Local Storage: total: 223.57 GiB used: 20.71 GiB (9.3%)
ID-1: /dev/sda vendor: Kingston model: SA400M8240G size: 223.57 GiB speed: 6.0 Gb/s
serial: <filter>
Partition: ID-1: / size: 217.61 GiB used: 20.70 GiB (9.5%) fs: ext4 dev: /dev/dm-0
ID-2: swap-1 size: 976.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-1
Sensors: System Temperatures: cpu: 42.0 C mobo: N/A gpu: nvidia temp: 31 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 30%
Repos: No active apt repos in: /etc/apt/sources.list
Active apt repos in: /etc/apt/sources.list.d/chrome-remote-desktop.list
Info: Processes: 272 Uptime: 18m Memory: 15.58 GiB used: 1.42 GiB (9.1%) Init: systemd v: 237
runlevel: 5 Compilers: gcc: 7.4.0 alt: 7 Client: Unknown python3.6 client inxi: 3.0.32