INTERRUPTED Problem, just started recently.

Moderators: Site Moderators, FAHC Science Team

markfw
Posts: 147
Joined: Mon Feb 04, 2008 3:32 pm

INTERRUPTED Problem, just started recently.

Post by markfw »

I get the errors below on multiple computers, from a 5950X running Linux to a 7B12 EPYC, all with at least one free processor. This only started recently; nothing has changed, and these machines run for months at a time before a reboot. Below is a sample of the log:

Code:

21:17:07:WU00:FS01:0x22:Please consider upgrading your client version.
21:17:07:WU00:FS01:0x22:There are 4 platforms available.
21:17:07:WU00:FS01:0x22:Platform 0: Reference
21:17:07:WU00:FS01:0x22:Platform 1: CPU
21:17:07:WU00:FS01:0x22:Platform 2: OpenCL
21:17:07:WU00:FS01:0x22:  opencl-device -1 specified
21:17:07:WU00:FS01:0x22:Platform 3: CUDA
21:17:07:WU00:FS01:0x22:  cuda-device 0 specified
21:17:16:WU00:FS01:0x22:Attempting to create CUDA context:
21:17:16:WU00:FS01:0x22:  Configuring platform CUDA
21:17:32:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
21:18:07:WU00:FS01:Starting
21:18:07:WU00:FS01:Removing old file 'work/00/logfile_01-20220429-201306.txt'
21:18:07:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1805 -checkpoint 15 -cuda-device 0 -gpu-vendor nvidia -gpu -1 -gpu-usage 100
21:18:07:WU00:FS01:Started FahCore on PID 61122
21:18:07:WU00:FS01:Core PID:61126
21:18:07:WU00:FS01:FahCore 0x22 started
21:18:07:WU00:FS01:0x22:*********************** Log Started 2022-04-29T21:18:07Z ***********************
21:18:07:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
21:18:07:WU00:FS01:0x22:       Core: Core22
21:18:07:WU00:FS01:0x22:       Type: 0x22
21:18:07:WU00:FS01:0x22:    Version: 0.0.20
21:18:07:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:18:07:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
21:18:07:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
21:18:07:WU00:FS01:0x22:       Date: Jan 20 2022
21:18:07:WU00:FS01:0x22:       Time: 00:57:52
21:18:07:WU00:FS01:0x22:   Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
21:18:07:WU00:FS01:0x22:     Branch: HEAD
21:18:07:WU00:FS01:0x22:   Compiler: GNU 9.4.0
21:18:07:WU00:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
21:18:07:WU00:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie
21:18:07:WU00:FS01:0x22:             -DOPENMM_VERSION="\"7.7.0\""
21:18:07:WU00:FS01:0x22:   Platform: linux 5.11.0-1025-azure
21:18:07:WU00:FS01:0x22:       Bits: 64
21:18:07:WU00:FS01:0x22:       Mode: Release
21:18:07:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
21:18:07:WU00:FS01:0x22:             <peastman@stanford.edu>
21:18:07:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 61122 -checkpoint 15
21:18:07:WU00:FS01:0x22:             -cuda-device 0 -gpu-vendor nvidia -gpu -1 -gpu-usage 100
21:18:07:WU00:FS01:0x22:************************************ libFAH ************************************
21:18:07:WU00:FS01:0x22:       Date: Jan 20 2022
21:18:07:WU00:FS01:0x22:       Time: 00:57:22
21:18:07:WU00:FS01:0x22:   Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
21:18:07:WU00:FS01:0x22:     Branch: HEAD
21:18:07:WU00:FS01:0x22:   Compiler: GNU 9.4.0
21:18:07:WU00:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
21:18:07:WU00:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie
21:18:07:WU00:FS01:0x22:   Platform: linux 5.11.0-1025-azure
21:18:07:WU00:FS01:0x22:       Bits: 64
21:18:07:WU00:FS01:0x22:       Mode: Release
21:18:07:WU00:FS01:0x22:************************************ CBang *************************************
21:18:07:WU00:FS01:0x22:       Date: Jan 20 2022
21:18:07:WU00:FS01:0x22:       Time: 00:57:00
21:18:07:WU00:FS01:0x22:   Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
21:18:07:WU00:FS01:0x22:     Branch: HEAD
21:18:07:WU00:FS01:0x22:   Compiler: GNU 9.4.0
21:18:07:WU00:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
21:18:07:WU00:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
21:18:07:WU00:FS01:0x22:   Platform: linux 5.11.0-1025-azure
21:18:07:WU00:FS01:0x22:       Bits: 64
21:18:07:WU00:FS01:0x22:       Mode: Release
21:18:07:WU00:FS01:0x22:************************************ System ************************************
21:18:07:WU00:FS01:0x22:        CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
21:18:07:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
21:18:07:WU00:FS01:0x22:       CPUs: 32
21:18:07:WU00:FS01:0x22:     Memory: 31.33GiB
21:18:07:WU00:FS01:0x22:Free Memory: 1.91GiB
21:18:07:WU00:FS01:0x22:    Threads: POSIX_THREADS
21:18:07:WU00:FS01:0x22: OS Version: 4.15
21:18:07:WU00:FS01:0x22:Has Battery: false
21:18:07:WU00:FS01:0x22: On Battery: false
21:18:07:WU00:FS01:0x22: UTC Offset: -7
21:18:07:WU00:FS01:0x22:        PID: 61126
21:18:07:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
21:18:07:WU00:FS01:0x22:************************************ OpenMM ************************************
21:18:07:WU00:FS01:0x22:    Version: 7.7.0
21:18:07:WU00:FS01:0x22:********************************************************************************
21:18:07:WU00:FS01:0x22:Project: 18201 (Run 7172, Clone 2, Gen 1)
21:18:07:WU00:FS01:0x22:Digital signatures verified
21:18:07:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
21:18:07:WU00:FS01:0x22:Version 0.0.20
21:18:07:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
21:18:07:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
21:18:07:WU00:FS01:0x22:  XTC frame write interval: 20000 steps (1.6%) [62 total]
21:18:07:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
21:18:07:WU00:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
21:18:07:WU00:FS01:0x22:Please consider upgrading your client version.
21:18:07:WU00:FS01:0x22:There are 4 platforms available.
21:18:07:WU00:FS01:0x22:Platform 0: Reference
21:18:07:WU00:FS01:0x22:Platform 1: CPU
21:18:07:WU00:FS01:0x22:Platform 2: OpenCL
21:18:07:WU00:FS01:0x22:  opencl-device -1 specified
21:18:07:WU00:FS01:0x22:Platform 3: CUDA
21:18:07:WU00:FS01:0x22:  cuda-device 0 specified
21:18:16:WU00:FS01:0x22:Attempting to create CUDA context:
21:18:16:WU00:FS01:0x22:  Configuring platform CUDA
21:18:33:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
Last edited by Joe_H on Fri Apr 29, 2022 11:37 pm, edited 1 time in total.
Reason: added code tags to log
markfw
Posts: 147
Joined: Mon Feb 04, 2008 3:32 pm

Re: INTERRUPTED Problem, just started recently.

Post by markfw »

I can't find the edit button. All machines are running Linux Mint 19.2 Cinnamon with 2080 Ti video cards and 510 drivers.
Joe_H
Site Admin
Posts: 8224
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: INTERRUPTED Problem, just started recently.

Post by Joe_H »

markfw wrote: Fri Apr 29, 2022 9:21 pm I can't find the edit button. All are running linux cinnamon mint 19.2 all 2080TI video cards, 510 drivers.
It should show up as the leftmost button in the group at the top right of the post; the symbol is supposed to be a pencil, I guess.
markfw
Posts: 147
Joined: Mon Feb 04, 2008 3:32 pm

Re: INTERRUPTED Problem, just started recently.

Post by markfw »

I found it. And I may have found the problem. My research pointed to "slow CPUs" as a cause. Well, a 5950X is not slow, BUT allowing only one thread out of 32 to service the GPU is NOT good when the other 31 are running Rosetta@home. I have suspended all Rosetta work until I verify it's the problem, but then I have to find out how many threads the GPU needs (it's a 2080 Ti on all 6 boxes I have had an issue on).
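
For anyone wanting to check the same thing on their own box, here is a minimal sketch that lists the folding core next to the Rosetta tasks with their nice values and CPU usage. It assumes the Rosetta task binaries have "rosetta" somewhere in their command name, which may not hold on every install; the function name filter_tasks is just made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: show PID, nice value, %CPU and command name for the folding core
# and Rosetta tasks, so their priorities can be compared side by side.
# Assumption: the Rosetta task binaries have "rosetta" in their command name.

filter_tasks() {
    # Reads "PID NI %CPU COMMAND" lines on stdin and keeps matching commands.
    awk 'tolower($4) ~ /fahcore|rosetta/ {
        printf "%s nice=%s cpu=%s%% %s\n", $1, $2, $3, $4
    }'
}

# The "=" after each column name suppresses the ps header line.
ps -eo pid=,ni=,pcpu=,comm= | filter_tasks
```

If FahCore_22 shows a much lower CPU percentage than the Rosetta tasks while a work unit is starting, that would at least be consistent with the GPU thread being starved.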

I am number 28 worldwide on F@H!

And thanks, Joe! I don't know how I missed it.
PaulTV
Posts: 236
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: INTERRUPTED Problem, just started recently.

Post by PaulTV »

The FahCore_22 process is single-threaded, afaik, and in top it should show 100% CPU usage (so take a single thread). Aside from that, it'd be good to have a thread reserved for the OS itself. The FahCore_22 process will show a nice value of 19, which is the lowest priority on Linux. If the Rosetta processes have a lower nice value (higher priority), that may explain the behavior. On my Linux folding rig, I have a script in cron to re-nice the process.

In /etc/crontab:

Code:

*/5 * * * *   root    /usr/local/bin/renice_fah.sh > /dev/null 2>&1
The script /usr/local/bin/renice_fah.sh itself (don't forget to make it executable). This gives FahCore_22 a higher priority than standard user processes, but not as high as critical OS processes:

Code:

#!/usr/bin/env bash

set -ue

nice_to="-10"

### Get running core 22 process including nice
psline="$(ps -le | grep -e FahCore_22 | grep -v grep | tail -1)"
if [ -z "${psline}" ]
then
        echo "GPU core process not found"
        exit 0
fi
currentnice="$(echo "${psline}" | awk '{ print $8 }')"
if [ "${currentnice}" = "${nice_to}" ]
then
        echo "GPU core process already reniced to ${nice_to}"
        exit 0
fi
currentpid="$(echo "${psline}" | awk '{ print $4 }')"
echo "Re-nicing GPU Core process ${currentpid}"
renice "${nice_to}" -p "${currentpid}"
(edited because of a rookie mistake: a nice value of 19 is the lowest priority and -20 is the highest, not the other way around)
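
On a box with more than one GPU slot there can be several FahCore_22 processes, and the tail -1 above only catches the last one. A possible variant, as a rough sketch only: it asks ps for just the PID and nice columns and renices every core that is not yet at the target. The helper name pids_to_renice is made up for this example, and it needs to run as root (as the cron entry does) because the target nice value is negative.

```shell
#!/usr/bin/env bash
# Sketch: renice every running FahCore_22 process, not only the last one.
set -u

target_nice="-10"   # same target as the script above

pids_to_renice() {
    # Reads "PID NI" lines on stdin; prints the PIDs whose nice value
    # differs from the wanted value given as $1.
    awk -v want="$1" '$2 != want { print $1 }'
}

# -C selects by command name; "pid=,ni=" prints both columns without a header.
for pid in $(ps -C FahCore_22 -o pid=,ni= | pids_to_renice "${target_nice}"); do
    echo "Re-nicing GPU core process ${pid}"
    renice -n "${target_nice}" -p "${pid}"
done
```

If no core is running, ps prints nothing and the loop simply does not execute.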

Ryzen 9800X3D / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 22.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
toTOW
Site Moderator
Posts: 6497
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: INTERRUPTED Problem, just started recently.

Post by toTOW »

PaulTV wrote: Sat Apr 30, 2022 8:46 am The FahCore_22 process is single-threaded, afaik, and in top it should show 100% CPU usage (so take a single thread).
That's not entirely true: the code feeding the GPU is single-threaded, because there's nothing else it could do at the same time, but some parts of the core (the sanity checks and checkpoint writes) are multi-threaded.
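
If you want to see this on your own machine, here is a small sketch that reads the kernel's thread count for each running core from /proc. It is Linux-only, and the helper name thread_count is made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: print how many threads each running FahCore_22 process has.

thread_count() {
    # $1: path to a /proc/<pid>/status file; prints its "Threads:" field.
    awk '/^Threads:/ { print $2 }' "$1"
}

# pgrep -x matches the process name exactly.
for pid in $(pgrep -x FahCore_22); do
    echo "PID ${pid}: $(thread_count "/proc/${pid}/status") threads"
done
```

For per-thread CPU usage, top -H -p followed by the PID shows each thread on its own line.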

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
gunnarre
Posts: 560
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: INTERRUPTED Problem, just started recently.

Post by gunnarre »

In any case, the core shouldn't crash just because the CPU is loaded with higher-priority tasks. A performance impact would be expected in that case, but not a complete crash. This sounds to me like a bug in either the folding core or the OS, assuming hardware instability has been ruled out as the cause.
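
To get a feel for how often this happens compared with normal completions, something like the following sketch could tally the core return values in the client log. The log path /var/lib/fahclient/log.txt is an assumption based on the working directory shown in the log above (pass a different path as the first argument if yours differs), and the function name count_returns is made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: tally how often each core return value appears in the client log.
# Assumption: the log lives at /var/lib/fahclient/log.txt; pass another
# path as the first argument if it is somewhere else.

count_returns() {
    # Reads log lines on stdin; prints "<count> <result>" per distinct result.
    sed -n 's/.*FahCore returned: \([A-Z_]*\).*/\1/p' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

log="${1:-/var/lib/fahclient/log.txt}"
if [ -r "${log}" ]; then
    count_returns < "${log}"
fi
```

Comparing the counts with and without Rosetta running would help tell a scheduling problem apart from a core bug or flaky hardware.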
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580