Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Catalina588 »

I am at my wits' end after debugging since last Thursday. With two complete sets of hardware and two point releases of Linux Mint 19 (.2 and .3), I cannot fold on CPU or GPU. The system freezes within a minute or two of un-pausing the folding client. It's a hard crash: no Linux crash logs and no error messages in the folding log. Sometimes, but not always, FAHClient is not recoverable afterwards: stopping and starting it fails on the start, so the application has to be removed and reinstalled.

Right now, I am on Mint 19.2 with nVidia driver 435 -- and I have a third box reliably running this specific combination. Since I cannot get past CPU folding, I have only done sporadic GPU testing on 19.2; I gave up on 19.3 after two days, thinking I had a hardware problem.

By "two complete sets of hardware" I mean different motherboard, processor, memory, power supply (850 watt), storage. The other set used a Core i3-8350K (4 cores -- not sure about AVX). It's not an overheating problem.

Freeze means: mouse frozen, no keyboard, screen drawing stops (such as system monitor cpu graphs). Power-off to recover.
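A hard freeze can leave the FAH log empty while the kernel log from the previous boot still holds the last warning before the hang. Below is a minimal sketch for scanning it, assuming the systemd journal is persistent on the machine (if it is not, /var/log/kern.log is the place to look). The keyword list and both function names are illustrative assumptions, not part of FAHClient or any distro tool.

```python
import re
import subprocess

# Keywords that often accompany hardware trouble in kernel logs.
# This list is an assumption for illustration, not an exhaustive catalogue.
SUSPECT = re.compile(
    r"\bmce\b|machine check|thermal|throttl|NVRM|Xid|watchdog|lockup",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that match any suspect keyword."""
    return [line for line in lines if SUSPECT.search(line)]


def previous_boot_kernel_log():
    """Read kernel messages from the boot before this one via journalctl."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text mode, works on Python 3.6
        )
    except FileNotFoundError:
        return []  # journalctl is not installed on this system
    return result.stdout.splitlines()

# Usage after rebooting from a freeze:
#   for line in suspicious_lines(previous_boot_kernel_log()):
#       print(line)
```

An empty result does not prove the hardware is healthy; a freeze can hit before the kernel manages to write anything to disk.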

Code:

*********************** Log Started 2019-12-23T22:28:26Z ***********************
22:28:26:************************* Folding@home Client *************************
22:28:26:        Website: https://foldingathome.org/
22:28:26:      Copyright: (c) 2009-2018 foldingathome.org
22:28:26:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:28:26:           Args: --child --lifeline 1577 /etc/fahclient/config.xml --run-as
22:28:26:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
22:28:26:         Config: /etc/fahclient/config.xml
22:28:26:******************************** Build ********************************
22:28:26:        Version: 7.5.1
22:28:26:           Date: May 11 2018
22:28:26:           Time: 19:59:04
22:28:26:     Repository: Git
22:28:26:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
22:28:26:         Branch: master
22:28:26:       Compiler: GNU 6.3.0 20170516
22:28:26:        Options: -std=gnu++98 -O3 -funroll-loops
22:28:26:       Platform: linux2 4.14.0-3-amd64
22:28:26:           Bits: 64
22:28:26:           Mode: Release
22:28:26:******************************* System ********************************
22:28:26:            CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:28:26:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:28:26:           CPUs: 6
22:28:26:         Memory: 15.58GiB
22:28:26:    Free Memory: 14.79GiB
22:28:26:        Threads: POSIX_THREADS
22:28:26:     OS Version: 4.15
22:28:26:    Has Battery: false
22:28:26:     On Battery: false
22:28:26:     UTC Offset: -5
22:28:26:            PID: 1579
22:28:26:            CWD: /var/lib/fahclient
22:28:26:             OS: Linux 4.15.0-72-generic x86_64
22:28:26:        OS Arch: AMD64
22:28:26:           GPUs: 2
22:28:26:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
22:28:26:                 13448
22:28:26:          GPU 1: Bus:2 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
22:28:26:                 13448
22:28:26:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.1
22:28:26:  CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:7.5 Driver:10.1
22:28:26:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:435.21
22:28:26:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:435.21
22:28:26:***********************************************************************
22:28:26:<config>
22:28:26:  <!-- Client Control -->
22:28:26:  <fold-anon v='true'/>
22:28:26:
22:28:26:  <!-- HTTP Server -->
22:28:26:  <allow v='127.0.0.1 192.168.1.0/24'/>
22:28:26:
22:28:26:  <!-- Network -->
22:28:26:  <proxy v=':8080'/>
22:28:26:
22:28:26:  <!-- Remote Command Server -->
22:28:26:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
22:28:26:
22:28:26:  <!-- User Information -->
22:28:26:  <passkey v='********************************'/>
22:28:26:  <team v='224497'/>
22:28:26:  <user v='Catalina588_ALL_1EMQiByPxuaffjHVyb4RDLXChMkwgWmYUn'/>
22:28:26:
22:28:26:  <!-- Folding Slots -->
22:28:26:  <slot id='0' type='CPU'>
22:28:26:    <paused v='true'/>
22:28:26:  </slot>
22:28:26:</config>
22:28:26:Switching to user fahclient
22:28:26:Trying to access database...
22:28:26:Successfully acquired database lock
22:28:26:Enabled folding slot 00: PAUSED cpu:5 (by user)
22:30:57:FS00:Unpaused
22:30:57:WU00:FS00:Starting
22:30:57:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1579 -checkpoint 15 -np 5
22:30:57:WU00:FS00:Started FahCore on PID 2128
22:30:57:WU00:FS00:Core PID:2132
22:30:57:WU00:FS00:FahCore 0xa7 started
22:30:58:WU00:FS00:0xa7:*********************** Log Started 2019-12-23T22:30:57Z ***********************
22:30:58:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
22:30:58:WU00:FS00:0xa7:       Type: 0xa7
22:30:58:WU00:FS00:0xa7:       Core: Gromacs
22:30:58:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 2128 -checkpoint 15 -np 5
22:30:58:WU00:FS00:0xa7:************************************ CBang *************************************
22:30:58:WU00:FS00:0xa7:       Date: Nov 5 2019
22:30:58:WU00:FS00:0xa7:       Time: 06:06:57
22:30:58:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
22:30:58:WU00:FS00:0xa7:     Branch: master
22:30:58:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:30:58:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
22:30:58:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:30:58:WU00:FS00:0xa7:       Bits: 64
22:30:58:WU00:FS00:0xa7:       Mode: Release
22:30:58:WU00:FS00:0xa7:************************************ System ************************************
22:30:58:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:30:58:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:30:58:WU00:FS00:0xa7:       CPUs: 6
22:30:58:WU00:FS00:0xa7:     Memory: 15.58GiB
22:30:58:WU00:FS00:0xa7:Free Memory: 14.25GiB
22:30:58:WU00:FS00:0xa7:    Threads: POSIX_THREADS
22:30:58:WU00:FS00:0xa7: OS Version: 4.15
22:30:58:WU00:FS00:0xa7:Has Battery: false
22:30:58:WU00:FS00:0xa7: On Battery: false
22:30:58:WU00:FS00:0xa7: UTC Offset: -5
22:30:58:WU00:FS00:0xa7:        PID: 2132
22:30:58:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
22:30:58:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
22:30:58:WU00:FS00:0xa7:    Version: 0.0.18
22:30:58:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:30:58:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
22:30:58:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
22:30:58:WU00:FS00:0xa7:       Date: Nov 5 2019
22:30:58:WU00:FS00:0xa7:       Time: 06:13:26
22:30:58:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
22:30:58:WU00:FS00:0xa7:     Branch: master
22:30:58:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:30:58:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
22:30:58:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:30:58:WU00:FS00:0xa7:       Bits: 64
22:30:58:WU00:FS00:0xa7:       Mode: Release
22:30:58:WU00:FS00:0xa7:************************************ Build *************************************
22:30:58:WU00:FS00:0xa7:       SIMD: avx_256
22:30:58:WU00:FS00:0xa7:********************************************************************************
22:30:58:WU00:FS00:0xa7:Project: 14244 (Run 0, Clone 8, Gen 200)
22:30:58:WU00:FS00:0xa7:Unit: 0x000000e580fccb0a5d6ee30e3dd1631a
22:30:58:WU00:FS00:0xa7:Digital signatures verified
22:30:58:WU00:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
22:30:58:WU00:FS00:0xa7:Calling: mdrun -s frame200.tpr -o frame200.trr -x frame200.xtc -cpi state.cpt -cpt 15 -nt 4
22:30:58:WU00:FS00:0xa7:Steps: first=50000000 total=250000
22:30:59:WU00:FS00:0xa7:Completed 514 out of 250000 steps (0%)

Verbosity 7 did not reveal anything to me:

Code:

19:30:35:WU01:FS00:0xa7:************************************ Build *************************************
19:30:35:WU01:FS00:0xa7:       SIMD: avx_256
19:30:35:WU01:FS00:0xa7:********************************************************************************
19:30:35:WU01:FS00:0xa7:Project: 14182 (Run 6, Clone 97, Gen 31)
19:30:35:WU01:FS00:0xa7:Unit: 0x000000240002894b5cf684c40d09b6c2
19:30:35:WU01:FS00:0xa7:Digital signatures verified
19:30:35:WU01:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
19:30:35:WU01:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -cpt 15 -nt 4
19:30:35:WU01:FS00:0xa7:Steps: first=77500000 total=2500000
19:30:36:WU01:FS00:0xa7:Completed 1 out of 2500000 steps (0%)
19:30:45:FS00:Paused
19:30:45:FS00:Shutting core down
19:30:45:WU01:FS00:0xa7:Caught signal SIGINT(2) on PID 1599
19:30:45:WU01:FS00:0xa7:Exiting, please wait. . .
19:30:45:WU01:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
19:30:45:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
19:31:36:Saving configuration to /etc/fahclient/config.xml

Work Log is also ambiguous:

Code:

21:48:29:WU00:FS00:0xa7:************************************ Build *************************************
21:48:29:WU00:FS00:0xa7:       SIMD: avx_256
21:48:29:WU00:FS00:0xa7:********************************************************************************
21:48:29:WU00:FS00:0xa7:Project: 13831 (Run 654, Clone 1, Gen 87)
21:48:29:WU00:FS00:0xa7:Unit: 0x0000006880fccb095d693ab077870ab3
21:48:29:WU00:FS00:0xa7:Digital signatures verified
21:48:29:WU00:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
21:48:29:WU00:FS00:0xa7:Calling: mdrun -s frame87.tpr -o frame87.trr -x frame87.xtc -cpt 15 -nt 4
21:48:29:WU00:FS00:0xa7:Steps: first=10875000 total=125000
21:48:32:WU00:FS00:0xa7:Completed 1 out of 125000 steps (0%)
21:48:42:Started thread 12 on PID 1642
21:48:42:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
21:48:46:FS00:Paused
21:48:46:FS00:Shutting core down
21:48:47:WU00:FS00:0xa7:Caught signal SIGINT(2) on PID 1656
21:48:47:WU00:FS00:0xa7:Exiting, please wait. . .
21:48:49:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
21:48:50:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Hardware Details:

Code:

System:    Host: white Kernel: 4.15.0-72-generic x86_64 bits: 64 compiler: gcc v: 7.4.0 
           Desktop: Cinnamon 4.2.4 wm: muffin dm: LightDM Distro: Linux Mint 19.2 Tina 
           base: Ubuntu 18.04 bionic 
Machine:   Type: Desktop Mobo: ASRock model: Z390 Phantom Gaming 4 serial: <filter> 
           UEFI: American Megatrends v: P4.30 date: 08/07/2019 
CPU:       Topology: 6-Core model: Intel Core i5-9400F bits: 64 type: MCP arch: Kaby Lake rev: A 
           L2 cache: 9216 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 34848 
           Speed: 801 MHz min/max: 800/4100 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 
           5: 800 6: 800 
Graphics:  Device-1: NVIDIA driver: nvidia v: 435.21 bus ID: 01:00.0 chip ID: 10de:1e04 
           Device-2: NVIDIA vendor: ASUSTeK driver: nvidia v: 435.21 bus ID: 02:00.0 
           chip ID: 10de:1e04 
           Display: x11 server: X.Org 1.19.6 driver: modesetting,nouveau,nvidia 
           unloaded: fbdev,vesa resolution: 1920x1080~60Hz 
           OpenGL: renderer: GeForce RTX 2080 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 435.21 
           direct render: Yes 
Audio:     Device-1: Intel Cannon Lake PCH cAVS vendor: ASRock driver: snd_hda_intel v: kernel 
           bus ID: 00:1f.3 chip ID: 8086:a348 
           Device-2: NVIDIA driver: snd_hda_intel v: kernel bus ID: 01:00.1 chip ID: 10de:10f7 
           Device-3: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 02:00.1 
           chip ID: 10de:10f7 
           Sound Server: ALSA v: k4.15.0-72-generic 
Network:   Device-1: Intel Ethernet I219-V vendor: ASRock driver: e1000e v: 3.2.6-k port: efa0 
           bus ID: 00:1f.6 chip ID: 8086:15bc 
           IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-2: Micro Star type: USB driver: rt2800usb bus ID: 1-9:4 chip ID: 0db0:3871 
           IF: wlx0008ca315cfd state: up mac: <filter> 
Drives:    Local Storage: total: 223.57 GiB used: 20.71 GiB (9.3%) 
           ID-1: /dev/sda vendor: Kingston model: SA400M8240G size: 223.57 GiB speed: 6.0 Gb/s 
           serial: <filter> 
Partition: ID-1: / size: 217.61 GiB used: 20.70 GiB (9.5%) fs: ext4 dev: /dev/dm-0 
           ID-2: swap-1 size: 976.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-1 
Sensors:   System Temperatures: cpu: 42.0 C mobo: N/A gpu: nvidia temp: 31 C 
           Fan Speeds (RPM): N/A gpu: nvidia fan: 30% 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/chrome-remote-desktop.list 
           
Info:      Processes: 272 Uptime: 18m Memory: 15.58 GiB used: 1.42 GiB (9.1%) Init: systemd v: 237 
           runlevel: 5 Compilers: gcc: 7.4.0 alt: 7 Client: Unknown python3.6 client inxi: 3.0.32 
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by bruce »

Let's simplify the process. First, pause both GPUs and un-pause the CPU slot. I expect it will work fine. The log shows it did start and had already made a little progress.

Changing Verbosity just makes debugging more difficult. We much prefer the default setting.

Second, the fact that it runs for a minute or two after un-pausing before freezing suggests that maybe it's an issue with heat or power on the GPUs. After you're certain that the system is stable with just the CPU slot folding, try un-pausing only 1 of the GPUs. If that's successful, pause it and try the other one.

What power supply do you have?
Are the fans working on the GPUs?

(To manage individual slots, on the "Status" screen of FAHControl, right-click on the device in "Folding Slots")
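The same one-slot-at-a-time test can also be scripted against the client's command port instead of clicking through FAHControl. This is a rough sketch under several assumptions: the command server listens on 127.0.0.1:36330 (the port that shows up in the log earlier in the thread), no password is required from localhost, and the `pause` and `unpause` commands accept a slot id. The helper names are made up for the example.

```python
import socket


def isolate_commands(active_slot, all_slots):
    """Build commands that pause every slot except the one under test."""
    commands = ["pause {}".format(s) for s in all_slots if s != active_slot]
    commands.append("unpause {}".format(active_slot))
    return commands


def send_commands(commands, host="127.0.0.1", port=36330, timeout=5.0):
    """Send each command, newline-terminated, to the FAHClient command port.

    Host and port are assumptions; adjust them to match the local config.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for command in commands:
            conn.sendall((command + "\n").encode("ascii"))

# Usage, testing only the CPU slot (id 0) on a box with slots 0, 1 and 2:
#   send_commands(isolate_commands(0, [0, 1, 2]))
```

Keeping the command list separate from the socket code makes it easy to print the commands and check them before anything is sent to the client.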
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Catalina588 »

Problem #1 is I cannot cpu fold. Once we get past that, we can look at gpu folding. What you looked at last night was enabled for gpu but no gpu slots were added or running.

Today, cpu simplified

Changes since last night's post:
- Removed FAHClient. It had become non-responsive.
- Removed one gpu. One remains, as I need it to boot with the Intel 9400F (no integrated gpu).
- Reinstalled FAHClient with an absolutely stock configuration (e.g., user=anonymous), which defaults to cpu folding only.
- Switched the Linux Mint 19.2 desktop from Cinnamon (3D, uses the gpu) to MATE, a light 2D desktop.
- Note: the latest ASRock Z390 BIOS (4.30) has the latest Intel microcode and ME changes. No overclocking.

Log

Code:

*********************** Log Started 2019-12-24T18:55:38Z ***********************

18:55:38:************************* Folding@home Client *************************
18:55:38:        Website: https://foldingathome.org/
18:55:38:      Copyright: (c) 2009-2018 foldingathome.org
18:55:38:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:55:38:           Args: --child --lifeline 1954 /etc/fahclient/config.xml --run-as
18:55:38:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
18:55:38:         Config: /etc/fahclient/config.xml
18:55:38:******************************** Build ********************************
18:55:38:        Version: 7.5.1
18:55:38:           Date: May 11 2018
18:55:38:           Time: 19:59:04
18:55:38:     Repository: Git
18:55:38:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
18:55:38:         Branch: master
18:55:38:       Compiler: GNU 6.3.0 20170516
18:55:38:        Options: -std=gnu++98 -O3 -funroll-loops
18:55:38:       Platform: linux2 4.14.0-3-amd64
18:55:38:           Bits: 64
18:55:38:           Mode: Release
18:55:38:******************************* System ********************************
18:55:38:            CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
18:55:38:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:55:38:           CPUs: 6
18:55:38:         Memory: 15.58GiB
18:55:38:    Free Memory: 14.55GiB
18:55:38:        Threads: POSIX_THREADS
18:55:38:     OS Version: 4.15
18:55:38:    Has Battery: false
18:55:38:     On Battery: false
18:55:38:     UTC Offset: -5
18:55:38:            PID: 1956
18:55:38:            CWD: /var/lib/fahclient
18:55:38:             OS: Linux 4.15.0-72-generic x86_64
18:55:38:        OS Arch: AMD64
18:55:38:           GPUs: 0
18:55:38:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.1
18:55:38:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:435.21
18:55:38:***********************************************************************
18:55:38:<config>
18:55:38:  <!-- Client Control -->
18:55:38:  <fold-anon v='true'/>
18:55:38:
18:55:38:  <!-- Folding Slot Configuration -->
18:55:38:  <gpu v='false'/>
18:55:38:
18:55:38:  <!-- Folding Slots -->
18:55:38:  <slot id='0' type='CPU'/>
18:55:38:</config>
18:55:38:Switching to user fahclient
18:55:38:Trying to access database...
18:55:38:Successfully acquired database lock
18:55:38:Enabled folding slot 00: READY cpu:5
18:55:38:WU00:FS00:Starting
18:55:38:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1956 -checkpoint 15 -np 5
18:55:38:WU00:FS00:Started FahCore on PID 1966
18:55:38:WU00:FS00:Core PID:1970
18:55:38:WU00:FS00:FahCore 0xa7 started
18:55:39:WU00:FS00:0xa7:*********************** Log Started 2019-12-24T18:55:38Z ***********************
18:55:39:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
18:55:39:WU00:FS00:0xa7:       Type: 0xa7
18:55:39:WU00:FS00:0xa7:       Core: Gromacs
18:55:39:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1966 -checkpoint 15 -np 5
18:55:39:WU00:FS00:0xa7:************************************ CBang *************************************
18:55:39:WU00:FS00:0xa7:       Date: Nov 5 2019
18:55:39:WU00:FS00:0xa7:       Time: 06:06:57
18:55:39:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
18:55:39:WU00:FS00:0xa7:     Branch: master
18:55:39:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
18:55:39:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
18:55:39:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
18:55:39:WU00:FS00:0xa7:       Bits: 64
18:55:39:WU00:FS00:0xa7:       Mode: Release
18:55:39:WU00:FS00:0xa7:************************************ System ************************************
18:55:39:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
18:55:39:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:55:39:WU00:FS00:0xa7:       CPUs: 6
18:55:39:WU00:FS00:0xa7:     Memory: 15.58GiB
18:55:39:WU00:FS00:0xa7:Free Memory: 14.53GiB
18:55:39:WU00:FS00:0xa7:    Threads: POSIX_THREADS
18:55:39:WU00:FS00:0xa7: OS Version: 4.15
18:55:39:WU00:FS00:0xa7:Has Battery: false
18:55:39:WU00:FS00:0xa7: On Battery: false
18:55:39:WU00:FS00:0xa7: UTC Offset: -5
18:55:39:WU00:FS00:0xa7:        PID: 1970
18:55:39:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
18:55:39:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
18:55:39:WU00:FS00:0xa7:    Version: 0.0.18
18:55:39:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:55:39:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
18:55:39:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
18:55:39:WU00:FS00:0xa7:       Date: Nov 5 2019
18:55:39:WU00:FS00:0xa7:       Time: 06:13:26
18:55:39:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
18:55:39:WU00:FS00:0xa7:     Branch: master
18:55:39:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
18:55:39:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
18:55:39:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
18:55:39:WU00:FS00:0xa7:       Bits: 64
18:55:39:WU00:FS00:0xa7:       Mode: Release
18:55:39:WU00:FS00:0xa7:************************************ Build *************************************
18:55:39:WU00:FS00:0xa7:       SIMD: avx_256
18:55:39:WU00:FS00:0xa7:********************************************************************************
18:55:39:WU00:FS00:0xa7:Project: 14308 (Run 1, Clone 3, Gen 29)
18:55:39:WU00:FS00:0xa7:Unit: 0x000000230002894b5dd6cbdc87c025bc
18:55:39:WU00:FS00:0xa7:Digital signatures verified
18:55:39:WU00:FS00:0xa7:Reducing thread count from 5 to 4 to avoid domain decomposition by a prime number > 3
18:55:39:WU00:FS00:0xa7:Calling: mdrun -s frame29.tpr -o frame29.trr -cpt 15 -nt 4
18:55:39:WU00:FS00:0xa7:Steps: first=14500000 total=500000
18:55:39:WU00:FS00:0xa7:Completed 1 out of 500000 steps (0%)
18:55:44:FS00:Paused
18:55:44:FS00:Shutting core down
18:55:44:WU00:FS00:0xa7:Caught signal SIGINT(2) on PID 1970
18:55:44:WU00:FS00:0xa7:Exiting, please wait. . .
18:55:45:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
18:55:45:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:56:39:Saving configuration to /etc/fahclient/config.xml
18:56:39:<config>
18:56:39:  <!-- Client Control -->
18:56:39:  <fold-anon v='true'/>
18:56:39:
18:56:39:  <!-- Folding Slot Configuration -->
18:56:39:  <gpu v='false'/>
18:56:39:
18:56:39:  <!-- Folding Slots -->
18:56:39:  <slot id='0' type='CPU'>
18:56:39:    <paused v='true'/>
18:56:39:  </slot>
18:56:39:</config>

Results
- Runs for hours with FAHClient paused
- About 100 seconds after the client is unpaused, the desktop freezes with no Alt-F2 recovery. Mouse and keyboard frozen. Must reboot.
- Psensor reports 4 cores ramp to 100C but no higher; 2 cores ~90C. Using stock cooler on 65-watt 9400F cpu. [I admit 100C is eyebrow-raising, but Intel should control/throttle thermals to its liking.]
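Since the box dies hard, a temperature trace that survives the freeze would show how fast the cores climb in those 100 seconds. Here is a minimal sketch, assuming the kernel exposes sensors under /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius. Which zones correspond to the CPU package varies by board, and the per-core readings Psensor shows may live under /sys/class/hwmon instead. The function names and the log file path are placeholders.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                temps[zone] = int(handle.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def log_temps(logfile, seconds, interval=1.0, base="/sys/class/thermal"):
    """Append one timestamped reading per interval, forcing each to disk."""
    deadline = time.time() + seconds
    with open(logfile, "a") as out:
        while time.time() < deadline:
            stamp = time.strftime("%H:%M:%S")
            out.write("{} {}\n".format(stamp, read_zone_temps(base)))
            out.flush()
            os.fsync(out.fileno())  # so the last line survives a hard freeze
            time.sleep(interval)

# Usage: start this, then unpause the slot.
#   log_temps("/tmp/temps.log", seconds=300)
```

The `os.fsync` call after every line is the point of the exercise: without it, the final readings before the freeze could still be sitting in a buffer when the machine locks up.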

Next Steps
While waiting for a reply, I'm going to remove cpu folding and try gpu folding on one known-good card.

PS Anyone with a known-good distro and driver version is welcome to chime in. My other rigs are running Mint 19.1, 19.2 and nVidia 430 and 435 (and Windows 10).

toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by toTOW »

Are you sure that your motherboard accelerates the CPU fan? What happens if you try to run something else on the CPU (benchmark or stress test)?
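For a quick load test that takes FAH out of the picture, something like the following would do. It is only a sketch: a pure-Python busy loop with one process per core. That loads the CPU, but it does not exercise AVX the way the a7 core does, so a dedicated stress tool is a harsher test. The function names and the 120-second default are arbitrary choices.

```python
import multiprocessing
import time


def burn(seconds):
    """Spin on floating-point math for the given time; return loop count."""
    deadline = time.time() + seconds
    count = 0
    x = 1.0001
    while time.time() < deadline:
        x = (x * x) % 1000.0 + 1.0001  # keep the value bounded
        count += 1
    return count


def burn_all_cores(seconds=120, workers=None):
    """Run burn() in one process per core and return each loop count."""
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)

# Usage: watch the temperatures and fan speed while this runs.
#   burn_all_cores(seconds=120)
```

If the machine survives this but still freezes under folding, that points away from a plain cooling failure and toward something specific to the folding workload.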

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by bruce »

1) Apparently you didn't install FAHControl. You shouldn't have to remove the CPU slots.
2) If CPU folding causes a hardware crash within a few minutes, your stock cooler isn't doing the job it was intended to do. My guess is that it needs a re-application of thermal grease. I don't know how comfortable you are working inside a computer. Many people would do that themselves, but if you're unsure, you probably need technical help. Please confirm that the CPU fan is turning.
3) Any new installation will attempt to find an nVidia and/or AMD GPU (excluding integrated GPUs) with OpenCL plus the associated proprietary video drivers. If none are found, you can only fold with the CPU, or you can configure such device(s) and drivers. If you do have that GPU hardware, it won't work with the default Linux drivers. You have to upgrade them.

As you say, the 9400F does not have an iGPU. How are you displaying your video?

Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Catalina588 »

bruce wrote: "As you say, the 9400F does not have an iGPU. How are you displaying your video?"
With a gpu, an nVidia 2080Ti and 435.21 drivers. While cpu folding, the gpu loafs along according to Psensor.

bruce wrote: "Apparently you didn't install FAHControl. You shouldn't have to remove the CPU slots."
Yes, I always use FAHControl. The only way I know of to stop a gpu client is to pause it or remove it in FAHControl configuration.

bruce wrote: "My guess is that it needs a re-application of thermal grease."
Possible, and I can do that. It's a new build with factory paste that has only a few hours of burn-in time. It dies with gpu-only folding as well, see below.

bruce wrote: "Any new installation will attempt to find a nVidia and/or AMD GPU (excluding integrated GPUs) with OpenCL plus the associated proprietary video drivers."
That conflicts with my experience that 7.3.5 Linux installs with gpu=false even if nVidia drivers and ocl-icd-opencl-dev are already installed. I've routinely had to remove gpu=false to fold gpus on Linux. What am I doing wrong?

Power Supply
Right now, 850 watt Thermaltake bronze. I swapped that in for a Corsair 850 watt.

toTOW wrote: "Are you sure that your motherboard accelerates the CPU fan? What happens if you try to run something else on the CPU (benchmark or stress test)?"
Yes, I can see it accelerate and hear it as well. A cpu benchmark is a good idea.
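On the gpu=false point: the setting is a single element in config.xml, as the config dumps in the logs show. Here is a small sketch for checking and flipping it with Python's standard XML parser, assuming the file looks like those dumps. Whether setting the value to true behaves the same as deleting the element is an assumption. FAHClient should be stopped and the file backed up first, and the parser drops XML comments when it rewrites the text. The function name is made up.

```python
import xml.etree.ElementTree as ET


def set_gpu_enabled(config_xml, enabled=True):
    """Return config XML text with <gpu v='...'/> set to the requested value."""
    root = ET.fromstring(config_xml)
    gpu = root.find("gpu")
    if gpu is None:
        # No gpu element yet; add one directly under <config>.
        gpu = ET.SubElement(root, "gpu")
    gpu.set("v", "true" if enabled else "false")
    return ET.tostring(root, encoding="unicode")

# Usage on a copy of the file, never the live one:
#   with open("config.xml.bak") as f:
#       print(set_gpu_enabled(f.read(), enabled=True))
```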
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Catalina588 »

No CPU, One GPU

Got to 23% (about 23 minutes) on a gpu work unit from project 11714 before a black-screen freeze. By comparison, a cpu freeze leaves a frozen desktop on screen.

CPU temps show five cores in the 80C range and one at 93C. Yikes!

Log

Code:

*********************** Log Started 2019-12-24T19:36:07Z ***********************
19:36:07:************************* Folding@home Client *************************
19:36:07:        Website: https://foldingathome.org/
19:36:07:      Copyright: (c) 2009-2018 foldingathome.org
19:36:07:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:36:07:           Args: --child --lifeline 1957 /etc/fahclient/config.xml --run-as
19:36:07:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
19:36:07:         Config: /etc/fahclient/config.xml
19:36:07:******************************** Build ********************************
19:36:07:        Version: 7.5.1
19:36:07:           Date: May 11 2018
19:36:07:           Time: 19:59:04
19:36:07:     Repository: Git
19:36:07:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
19:36:07:         Branch: master
19:36:07:       Compiler: GNU 6.3.0 20170516
19:36:07:        Options: -std=gnu++98 -O3 -funroll-loops
19:36:07:       Platform: linux2 4.14.0-3-amd64
19:36:07:           Bits: 64
19:36:07:           Mode: Release
19:36:07:******************************* System ********************************
19:36:07:            CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
19:36:07:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:36:07:           CPUs: 6
19:36:07:         Memory: 15.58GiB
19:36:07:    Free Memory: 14.27GiB
19:36:07:        Threads: POSIX_THREADS
19:36:07:     OS Version: 4.15
19:36:07:    Has Battery: false
19:36:07:     On Battery: false
19:36:07:     UTC Offset: -5
19:36:07:            PID: 1959
19:36:07:            CWD: /var/lib/fahclient
19:36:07:             OS: Linux 4.15.0-72-generic x86_64
19:36:07:        OS Arch: AMD64
19:36:07:           GPUs: 1
19:36:07:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
19:36:07:                 13448
19:36:07:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:10.1
19:36:07:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:435.21
19:36:07:***********************************************************************
19:36:07:<config>
19:36:07:  <!-- Client Control -->
19:36:07:  <fold-anon v='true'/>
19:36:07:
19:36:07:  <!-- HTTP Server -->
19:36:07:  <allow v='127.0.0.1 192.168.1.0/24'/>
19:36:07:
19:36:07:  <!-- Network -->
19:36:07:  <proxy v=':8080'/>
19:36:07:
19:36:07:  <!-- Remote Command Server -->
19:36:07:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
19:36:07:
19:36:07:  <!-- User Information -->
19:36:07:  <passkey v='********************************'/>
19:36:07:  <team v='224497'/>
19:36:07:  <user v='Catalina588_'/>
19:36:07:
19:36:07:  <!-- Folding Slots -->
19:36:07:  <slot id='0' type='CPU'>
19:36:07:    <paused v='true'/>
19:36:07:  </slot>
19:36:07:</config>
19:36:07:Switching to user fahclient
19:36:07:Trying to access database...
19:36:07:Successfully acquired database lock
19:36:07:Enabled folding slot 00: PAUSED cpu:5 (by user)
19:36:31:Adding folding slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti] M 13448
19:36:31:Saving configuration to /etc/fahclient/config.xml
19:36:31:<config>

19:36:31:</config>
19:36:31:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
19:36:31:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:14308 run:1 clone:3 gen:29 core:0xa7 unit:0x000000230002894b5dd6cbdc87c025bc
19:36:31:WU00:Connecting to 155.247.166.219:8080
19:36:32:WU00:Server responded WORK_ACK (400)
19:36:32:WU00:Cleaning up
19:36:32:WU00:FS01:Connecting to 65.254.110.245:8080
19:36:32:WU00:FS01:Assigned to work server 140.163.4.241
19:36:32:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti] M 13448 from 140.163.4.241
19:36:32:WU00:FS01:Connecting to 140.163.4.241:8080
19:36:32:WU00:FS01:Downloading 16.53MiB
19:36:36:WU00:FS01:Download complete
19:36:36:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11714 run:0 clone:10123 gen:36 core:0x21 unit:0x000000358ca304f15de0497365a13c21
19:36:36:WU00:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_21.fah
19:36:36:WU00:FS01:Connecting to cores.foldingathome.org:80
19:36:36:WU00:FS01:FahCore 21: Downloading 3.23MiB
19:36:37:WU00:FS01:FahCore 21: Download complete
19:36:37:WU00:FS01:Valid core signature
19:36:37:WU00:FS01:Unpacked 7.94MiB to cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21
19:36:37:WU00:FS01:Starting
19:36:37:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 705 -lifeline 1959 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
19:36:37:WU00:FS01:Started FahCore on PID 2119
19:36:37:WU00:FS01:Core PID:2123
19:36:37:WU00:FS01:FahCore 0x21 started
19:36:37:WU00:FS01:0x21:*********************** Log Started 2019-12-24T19:36:37Z ***********************
19:36:37:WU00:FS01:0x21:Project: 11714 (Run 0, Clone 10123, Gen 36)
19:36:37:WU00:FS01:0x21:Unit: 0x000000358ca304f15de0497365a13c21
19:36:37:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
19:36:37:WU00:FS01:0x21:Machine: 1
19:36:37:WU00:FS01:0x21:Reading tar file core.xml
19:36:37:WU00:FS01:0x21:Reading tar file integrator.xml
19:36:37:WU00:FS01:0x21:Reading tar file state.xml
19:36:37:WU00:FS01:0x21:Reading tar file system.xml
19:36:37:WU00:FS01:0x21:Digital signatures verified
19:36:37:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
19:36:37:WU00:FS01:0x21:Version 0.0.20
19:36:41:WU00:FS01:0x21:Completed 0 out of 7500000 steps (0%)
19:36:41:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:37:08:Saving configuration to /etc/fahclient/config.xml

19:37:08:  <!-- User Information -->
19:37:08:  <passkey v='********************************'/>
19:37:08:  <team v='224497'/>
19:37:08:  <user v='Catalina588_'/>
19:37:08:
19:37:08:  <!-- Folding Slots -->
19:37:08:  <slot id='1' type='GPU'/>
19:37:08:</config>
19:37:42:WU00:FS01:0x21:Completed 75000 out of 7500000 steps (1%)
19:38:44:WU00:FS01:0x21:Completed 150000 out of 7500000 steps (2%)
19:39:47:WU00:FS01:0x21:Completed 225000 out of 7500000 steps (3%)
19:40:50:WU00:FS01:0x21:Completed 300000 out of 7500000 steps (4%)
19:41:53:WU00:FS01:0x21:Completed 375000 out of 7500000 steps (5%)
19:42:56:WU00:FS01:0x21:Completed 450000 out of 7500000 steps (6%)
19:43:59:WU00:FS01:0x21:Completed 525000 out of 7500000 steps (7%)
19:45:02:WU00:FS01:0x21:Completed 600000 out of 7500000 steps (8%)
19:46:05:WU00:FS01:0x21:Completed 675000 out of 7500000 steps (9%)
19:47:08:WU00:FS01:0x21:Completed 750000 out of 7500000 steps (10%)
19:48:11:WU00:FS01:0x21:Completed 825000 out of 7500000 steps (11%)
19:49:14:WU00:FS01:0x21:Completed 900000 out of 7500000 steps (12%)
19:50:17:WU00:FS01:0x21:Completed 975000 out of 7500000 steps (13%)
19:51:21:WU00:FS01:0x21:Completed 1050000 out of 7500000 steps (14%)
19:52:24:WU00:FS01:0x21:Completed 1125000 out of 7500000 steps (15%)
19:52:47:FS01:Finishing
19:53:26:WU00:FS01:0x21:Completed 1200000 out of 7500000 steps (16%)
19:54:30:WU00:FS01:0x21:Completed 1275000 out of 7500000 steps (17%)
19:55:32:WU00:FS01:0x21:Completed 1350000 out of 7500000 steps (18%)
19:56:35:WU00:FS01:0x21:Completed 1425000 out of 7500000 steps (19%)
19:57:38:WU00:FS01:0x21:Completed 1500000 out of 7500000 steps (20%)
19:58:41:WU00:FS01:0x21:Completed 1575000 out of 7500000 steps (21%)
19:59:44:WU00:FS01:0x21:Completed 1650000 out of 7500000 steps (22%)
20:00:47:WU00:FS01:0x21:Completed 1725000 out of 7500000 steps (23%)
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Catalina588 »

Probably Solved

Factory cooler had one of four legs only partly engaged, so cooler not completely flush on chip. So embarrassing!

Gonna re-enable CPU folding overnight to burn it in.

GPU folding is running at 51C on one core, as expected.

Merry Christmas to all!
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by toTOW »

Glad you got it running.

Merry Christmas and happy folding !

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Betroz
Posts: 7
Joined: Tue Oct 25, 2016 2:56 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Betroz »

I have a similar problem, except that it happens at random, often after many hours of GPU-only folding on my 2080Ti. The system just freezes and I must do a hard reboot. Under Windows 10 there are no problems at all with folding. No issues there. The problem exists only in Linux. Linux Mint 19.2, 19.3, or KDE Neon, it doesn't matter (all are based on Ubuntu 18.04). All are updated and use the newest Nvidia 440 drivers. I have tried the 4.15, 5.0 and 5.3 kernels. I even tried older Nvidia drivers. The same problem happens. I love running Linux, but if my system is not entirely stable there, I'm better off just folding in Windows 10.

My own theory is that there is a problem/bug related to 2080Ti folding under Ubuntu 18.04 and derivatives. Maybe Ubuntu 20.04 will fix it.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by bruce »

The other possibility is that it has nothing to do with Linux Mint 19.x. In fact, it might be a hardware issue.

Depending on which GPU you have, which over/underclocking settings you may be using, and which WUs you've been assigned, keep in mind that improvements in the FAHCore tend to make it run hotter, especially on some newer projects. Windows tends to run less efficiently than Linux, so you may be seeing a double increase in heat/power. Consider increasing your fan speed or reducing your overclock and see what happens.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by HaloJones »

Agree with Bruce here. I have two dedicated folding rigs running Mint; both have two GPUs and both run for months without issue. Under Linux you will have higher GPU usage, and with no ability to control GPU voltages it's possible that your cards are hanging.
single 1070

Betroz
Posts: 7
Joined: Tue Oct 25, 2016 2:56 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Betroz »

bruce wrote: The other possibility is that it has nothing to do with Linux Mint 19.x. In fact, it might be a hardware issue.

Depending on which GPU you have, which over/underclocking settings you may be using, and which WUs you've been assigned, keep in mind that improvements in the FAHCore tend to make it run hotter, especially on some newer projects. Windows tends to run less efficiently than Linux, so you may be seeing a double increase in heat/power. Consider increasing your fan speed or reducing your overclock and see what happens.
I have the Asus Strix OC 2080Ti, and it runs with no manual overclock. Load temp with Core22 is about 65C...
I did try to play Shadow of The Tomb Raider game under Linux too, and no crash there - with high load on the GPU.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by bruce »

Unfortunately, gaming with a high load on a GPU isn't equivalent to a stream-computing load. Gaming depends on getting high performance out of the video-generation components of a GPU (producing a high frame rate), and FAH doesn't use those components at all. All the heat generated by FAH is concentrated in the compute components (shaders), and in a big chip that makes quite a difference.

I would try manually underclocking for long enough to prove or disprove the theory that something is overheating.
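
One way to gather evidence for or against that theory is to record GPU readings until the moment of the freeze. The following is only a rough sketch, assuming `nvidia-smi` is on the PATH and Python 3.7+ is available. The function names, the log file name `gpu_samples.log`, and the 5-second interval are arbitrary choices for illustration, not anything from FAH itself. Each sample is forced to disk so the last line should survive a hard power-off.

```python
import os
import subprocess
import time

# Properties requested from nvidia-smi via --query-gpu.
FIELDS = ["timestamp", "temperature.gpu", "clocks.sm", "power.draw"]


def parse_sample(line):
    """Split one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_gpu(index=0):
    """Query one GPU once and return the parsed sample."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_samples.log", interval=5):
    """Append a sample every `interval` seconds (hypothetical defaults)."""
    with open(path, "a") as f:
        while True:
            sample = read_gpu()
            f.write(",".join(sample[k] for k in FIELDS) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the line to disk before a possible freeze
            time.sleep(interval)
```

Calling `log_forever()` in a terminal before un-pausing the slot, then reading the tail of the file after the reboot, shows the last temperature, shader clock, and power draw that were recorded. Note that this only covers the sensors `nvidia-smi` reports, so a clean log would not by itself rule out some other component overheating.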
Betroz
Posts: 7
Joined: Tue Oct 25, 2016 2:56 pm

Re: Linux Mint 19.2/.3 Freezes During CPU or GPU Folding

Post by Betroz »

bruce wrote: Unfortunately, gaming with a high load on a GPU isn't equivalent to a stream-computing load. Gaming depends on getting high performance out of the video-generation components of a GPU (producing a high frame rate), and FAH doesn't use those components at all. All the heat generated by FAH is concentrated in the compute components (shaders), and in a big chip that makes quite a difference.

I would try manually underclocking for long enough to prove or disprove the theory that something is overheating.
65C with Core22 load on the GPU is hardly overheating :wink: