ERROR:Guru Meditation

arisu
Posts: 182
Joined: Mon Feb 24, 2025 11:11 pm

Post by arisu »

I shut down FAH with "systemctl stop fah-client.service". I think systemd must have killed both the client and the cores too quickly, because when I started it up again, there was an error processing dhdl.xvg and the core failed. Luckily my network was off and I had a backup from before stopping the service so I didn't lose the WU.

This error is easily reproducible by running a plain a8 core from the command line and then killing it, or by backing up the work directory while the core is running and then restoring it. Some portion of the time, the dhdl.xvg file gets corrupted (it's always the xvg file). I'm not sure why, because the modifications that FAH has made to the GROMACS checkpoint code are intended to stop just that, by truncating files that are appended to (like xvg files) back to their state at the last checkpoint.
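For illustration, the truncate-on-restart idea can be sketched in a few lines of Python. This is a minimal sketch, not the actual FAH or GROMACS checkpoint code: the function names `record_checkpoint` and `restore_to_checkpoint` are made up here, and it assumes a checkpoint simply stores each append-only file's size in bytes.

```python
import os


def record_checkpoint(paths):
    """Return {path: size_in_bytes} for each append-only output file.

    Hypothetical helper: stands in for whatever the real checkpoint stores.
    """
    return {path: os.path.getsize(path) for path in paths}


def restore_to_checkpoint(offsets):
    """Truncate each file back to the size recorded at the last checkpoint.

    Raises ValueError if a file is shorter than its recorded size, because
    truncation cannot bring back bytes that never reached the disk.
    """
    for path, size in offsets.items():
        current = os.path.getsize(path)
        if current < size:
            raise ValueError(
                f"{path}: {current} bytes on disk, checkpoint expects {size}"
            )
        os.truncate(path, size)
```

Truncation can only discard data written after the checkpoint. If a file on disk ends up shorter than the checkpoint expects (for example because the process was killed before its writes were flushed, or a backup copied the files at slightly different moments), there is nothing to truncate back to. That would be one possible explanation for the failure, but it is only a guess.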

Code:

06:44:50:I1:WU269:*********************** Log Started 2025-04-04T06:44:49Z ***********************
06:44:50:I1:WU269:************************** Gromacs Folding@home Core ***************************
06:44:50:I1:WU269:       Core: Gromacs
06:44:50:I1:WU269:       Type: 0xa8
06:44:50:I1:WU269:    Version: 0.0.12
06:44:50:I1:WU269:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
06:44:50:I1:WU269:  Copyright: 2020 foldingathome.org
06:44:50:I1:WU269:   Homepage: https://foldingathome.org/
06:44:50:I1:WU269:       Date: Jan 16 2021
06:44:50:I1:WU269:       Time: 19:24:44
06:44:50:I1:WU269:   Compiler: GNU 8.3.0
06:44:50:I1:WU269:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
06:44:50:I1:WU269:             -fdata-sections -O3 -funroll-loops -fno-pie
06:44:50:I1:WU269:   Platform: linux2 4.15.0-128-generic
06:44:50:I1:WU269:       Bits: 64
06:44:50:I1:WU269:       Mode: Release
06:44:50:I1:WU269:       SIMD: avx2_256
06:44:50:I1:WU269:     OpenMP: ON
06:44:50:I1:WU269:       CUDA: OFF
06:44:50:I1:WU269:       Args: -dir aqglezsapduPAa44rC_rXPzXVe-6QMe9WP0YFIHrzrQ -suffix 01
06:44:50:I1:WU269:             -version 8.4.10 -lifeline 18990 -np 7
06:44:50:I1:WU269:************************************ libFAH ************************************
06:44:50:I1:WU269:       Date: Jan 16 2021
06:44:50:I1:WU269:       Time: 19:21:38
06:44:50:I1:WU269:   Compiler: GNU 8.3.0
06:44:50:I1:WU269:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
06:44:50:I1:WU269:             -fdata-sections -O3 -funroll-loops -fno-pie
06:44:50:I1:WU269:   Platform: linux2 4.15.0-128-generic
06:44:50:I1:WU269:       Bits: 64
06:44:50:I1:WU269:       Mode: Release
06:44:50:I1:WU269:************************************ CBang *************************************
06:44:50:I1:WU269:       Date: Jan 16 2021
06:44:50:I1:WU269:       Time: 19:21:24
06:44:50:I1:WU269:   Compiler: GNU 8.3.0
06:44:50:I1:WU269:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
06:44:50:I1:WU269:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
06:44:50:I1:WU269:   Platform: linux2 4.15.0-128-generic
06:44:50:I1:WU269:       Bits: 64
06:44:50:I1:WU269:       Mode: Release
06:44:50:I1:WU269:************************************ System ************************************
06:44:50:I1:WU269:        CPU: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
06:44:50:I1:WU269:     CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
06:44:50:I1:WU269:       CPUs: 8
06:44:50:I1:WU269:     Memory: 30.58GiB
06:44:50:I1:WU269:Free Memory: 23.49GiB
06:44:50:I1:WU269:    Threads: POSIX_THREADS
06:44:50:I1:WU269: OS Version: 6.1
06:44:50:I1:WU269:Has Battery: true
06:44:50:I1:WU269: On Battery: false
06:44:50:I1:WU269: UTC Offset: 5
06:44:50:I1:WU269:        PID: 18995
06:44:50:I1:WU269:        CWD: /var/lib/fah-client/work
06:44:50:I1:WU269:********************************************************************************
06:44:50:I1:WU269:Project: 19228 (Run 6060, Clone 7, Gen 3)
06:44:50:I1:WU269:Unit: 0x00000000000000000000000000000000
06:44:50:I1:WU269:Digital signatures verified
06:44:50:I1:WU269:Calling: mdrun -c md3.gro -s md3.tpr -x md3.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
06:44:50:I1:WU269:ERROR:Guru Meditation #6a6fa21db879dcae.47018a7a424ec6e2 (14621.17230) 'aqglezsapduPAa44rC_rXPzXVe-6QMe9WP0YFIHrzrQ/01/dhdl.xvg'
06:44:50:I4:REQ2:> HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
06:44:50:E :WU269:Core returned BAD_FRAME_CHECKSUM (112)
06:44:50:E :WU269:Run did not produce any results. Dumping WU
N.B. The Guru Meditation error is being printed at info log level 1. It should probably be logged at an error level instead.
muziqaz
Posts: 1429
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Post by muziqaz »

When the guru meditates, it is usually due to hardware instability. But since you are playing around with killing the service manually, if you want a technical discussion (or probably some scolding from the dev), post your findings on GitHub ;)
The Fahcore_a8 dev moved out of our region of space a while ago, unfortunately, though.
FAH Omega tester

Post by arisu »

It was shut down using the standard way that services are shut down (actually, the only way short of crashing the process on purpose). This is also how it would be shut down on any systemd Linux system simply by clicking reboot. It requests that the client stops and, if it doesn't stop in time, kills it.
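To make the "ask, wait, then kill" behaviour concrete, here is a minimal Python sketch of the same pattern. It is not systemd's implementation, and `stop_with_grace` is a made-up name. The only assumption is the general behaviour described above: send a polite stop request (SIGTERM on POSIX), wait up to a grace period, then force-kill.

```python
import subprocess


def stop_with_grace(proc, grace_seconds):
    """Ask a child process to stop, then force-kill it if it takes too long.

    Returns "stopped" if the process exited within the grace period,
    or "killed" if it had to be killed by force.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced kill (SIGKILL on POSIX), no chance to clean up
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A process that ends up in the "killed" branch gets no chance to finish writing its files, which is why the length of the grace period matters for something like a folding core.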

Maybe a tweak in the .service file is needed to make systemd wait for longer before killing it? Not sure, because I'm just assuming that that is the cause.

No GitHub for the cores that I can find, sadly. Just the client. And this is the core being unhappy.

Post by muziqaz »

arisu wrote: Fri Apr 04, 2025 10:01 am It was shut down using the standard way that services are shut down. This is also how it would be shut down on any systemd Linux system simply by clicking reboot. It requests that the client stops and, if it doesn't stop in time, kills it.

Maybe a tweak in the .service file is needed to make systemd wait for longer before killing it? Not sure, because I'm just assuming that that is the cause.

Look, let me break it down to the basics.
When fahclient is shut down by the Linux system, it does that without any gurus meditating.
Now you've done it manually, and suddenly the guru decided to meditate.
Either your system is unstable (software or hardware), or your killing process is not the same as the one the client uses.
Fahclient V8 has no issues or bugs when it comes to closing or restarting the service while a WU is being folded. The Windows client has this issue, but not the Linux one.

Post by arisu »

Clicking "shutdown" on a Linux computer does the same thing as running "systemctl stop fah-client.service". This gracefully stops the client, or at least tries to. There is no other way to shut down the service on Linux: because the service file has "Restart=always", it is impossible to stop the service without going through systemd.

It might be that the client took a little too long to shut down, and systemd killed it by force (like Windows does when shutting down).

Post by muziqaz »

So the conclusion is that your system is unstable ;)

Post by arisu »

I found the issue. For some reason I had "DefaultTimeoutStopSec=6s" set in the global systemd config. This didn't give fah-client enough time to shut down (the default is supposed to be 90s). I fixed the issue by adding "TimeoutStopSec=90s" to the .service file, which is more than enough.

This tweak must have been from years ago, I don't even remember making it! A tweak that makes Linux systems shut down faster so they behave more like Windows caused a Linux system to catch a Windows bug. Who would have thought? :lol:

Technically, any service that needs time to shut down or will malfunction if killed is supposed to set TimeoutStopSec in its service file. The FAH one is pretty barebones and does not have that. It's something else that should be changed (I'll add it to my list of GitHub issues that I'll create when I sign up). Some systemd systems lower DefaultTimeoutStopSec to speed up rebooting, so this may help people who use lower timeouts.
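For anyone who wants to check their own setup, here is a rough Python sketch of how the per-unit setting overrides the global default. It is a simplification, not how systemd parses units: `find_setting` and `effective_stop_timeout` are made-up names, the parser only handles plain `Key=Value` lines, and the unit text in the usage note is a stand-in rather than the real fah-client.service. `DROP_IN` holds the two lines that would go in the unit file or in a drop-in (for example one created with `systemctl edit fah-client.service`).

```python
def find_setting(unit_text, section, key):
    """Return the last value assigned to `key` inside `[section]`, or None.

    Simplified: ignores line continuations and other unit-file syntax.
    """
    current_section = None
    value = None
    for raw_line in unit_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current_section = line[1:-1]
            continue
        if current_section == section and "=" in line:
            name, _, val = line.partition("=")
            if name.strip() == key:
                value = val.strip()  # a later assignment wins
    return value


def effective_stop_timeout(unit_text, manager_default):
    """Per-unit TimeoutStopSec if set, otherwise the global default."""
    unit_value = find_setting(unit_text, "Service", "TimeoutStopSec")
    return unit_value if unit_value is not None else manager_default


# The two lines that set an explicit stop timeout for the service.
DROP_IN = "[Service]\nTimeoutStopSec=90s\n"
```

With a stand-in unit that only sets `Restart=always` and a global default of `6s`, `effective_stop_timeout` returns `6s`. Appending `DROP_IN` to the same text makes it return `90s`, which is the situation after the fix.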

Post by muziqaz »

Make sure you file each issue separately. Don't bunch them together in one issue thread ;)
daiko
Posts: 24
Joined: Tue Jun 08, 2021 1:14 pm
Hardware configuration: Mac Studio M1 Max
Mac Mini M4 Pro
Folding since 2005
Location: Atlantic County, NJ

Post by daiko »

Guru Meditation Error is something we used to see on the Amiga when it had a system crash back in the day. I didn’t realize it found its way into Linux systems as well! :D

Post by muziqaz »

Gurus need to meditate regardless of the time period or hardware

Post by daiko »

muziqaz wrote: Fri Apr 04, 2025 12:40 pm Gurus need to meditate regardless of the time period or hardware
Dude. Aummm... :lol:

Post by muziqaz »

P.S. It is not exclusive to Linux these days.
It shows up on FAH Windows systems as well.