Project: 12419 (Run 77, Clone 0, Gen 257) crashes and restarts endlessly

Moderators: Site Moderators, FAHC Science Team

YInMn
Posts: 5
Joined: Sat Jun 17, 2023 12:49 pm

Post by YInMn »

Hello there,

I noticed an issue with the WU Project: 12419 (Run 77, Clone 0, Gen 257).
The WU crashes (signal 11/SEGV) and restarts endlessly (possibly an error similar to the one mentioned in this thread).
According to the WU status page, this WU has already failed on a different system.

Excerpt from the system log:

Aug 27 17:33:10 ****** systemd-coredump[36046]: Process 36039 (FahCore_a8) of user 62464 terminated abnormally with signal 11/SEGV, processing...
Aug 27 17:33:10 ****** systemd[1]: Started Process Core Dump (PID 36046/UID 0).
Aug 27 17:33:10 ****** FAHClient[2416]: 15:33:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
Aug 27 17:33:11 ****** systemd-coredump[36047]: Process 36039 (FahCore_a8) of user 62464 dumped core.
                                                 
                                                 Stack trace of thread 36044:
                                                 #0  0x000000000071b329 n/a (/var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 + 0x31b329)
                                                 ELF object binary architecture: AMD x86-64
                                                 
Aug 27 17:34:10 ****** systemd-coredump[36130]: Process 36122 (FahCore_a8) of user 62464 terminated abnormally with signal 11/SEGV, processing...
Aug 27 17:34:10 ****** systemd[1]: Started Process Core Dump (PID 36130/UID 0).
Aug 27 17:34:10 ****** FAHClient[2416]: 15:34:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
Aug 27 17:34:11 ****** systemd-coredump[36131]: Process 36122 (FahCore_a8) of user 62464 dumped core.
                                                 
                                                 Stack trace of thread 36128:
                                                 #0  0x000000000071b329 n/a (/var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 + 0x31b329)
                                                 ELF object binary architecture: AMD x86-64
                                                 
Aug 27 17:35:10 ****** systemd-coredump[36200]: Process 36192 (FahCore_a8) of user 62464 terminated abnormally with signal 11/SEGV, processing...
Aug 27 17:35:10 ****** systemd[1]: Started Process Core Dump (PID 36200/UID 0).
Aug 27 17:35:10 ****** FAHClient[2416]: 15:35:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
Aug 27 17:35:11 ****** systemd-coredump[36201]: Process 36192 (FahCore_a8) of user 62464 dumped core.
                                                 
                                                 Stack trace of thread 36197:
                                                 #0  0x000000000071b329 n/a (/var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 + 0x31b329)
                                                 ELF object binary architecture: AMD x86-64

Aug 27 17:36:10 ****** systemd-coredump[36282]: Process 36274 (FahCore_a8) of user 62464 terminated abnormally with signal 11/SEGV, processing...
Aug 27 17:36:10 ****** systemd[1]: Started Process Core Dump (PID 36282/UID 0).
Aug 27 17:36:11 ****** FAHClient[2416]: 15:36:11:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
Aug 27 17:36:11 ****** systemd-coredump[36283]: Process 36274 (FahCore_a8) of user 62464 dumped core.
                                                 
                                                 Stack trace of thread 36280:
                                                 #0  0x000000000071b329 n/a (/var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 + 0x31b329)
                                                 ELF object binary architecture: AMD x86-64
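
A quick way to confirm that every core dump dies at the same place is to count the top stack frame across the systemd-coredump entries. The following is only a minimal sketch: the regular expression is an assumption based on the frame lines shown above, and `top_frames` is a hypothetical helper name, not part of any FAH or systemd tooling.

```python
import re
from collections import Counter

# Assumed layout of the top frame line, taken from the excerpt above:
#   #0  0x000000000071b329 n/a (/path/to/FahCore_a8 + 0x31b329)
FRAME0 = re.compile(
    r"#0\s+(0x[0-9a-fA-F]+)\s+\S+\s+\((\S+)\s+\+\s+(0x[0-9a-fA-F]+)\)"
)

def top_frames(journal_text):
    """Count (binary name, offset) pairs of frame #0 across all dumps."""
    counts = Counter()
    for line in journal_text.splitlines():
        m = FRAME0.search(line)
        if m:
            binary = m.group(2).rsplit("/", 1)[-1]  # keep only the file name
            counts[(binary, m.group(3))] += 1
    return counts

# The frame line as it appears in the system log excerpt, repeated four times.
FRAME_LINE = (
    "#0  0x000000000071b329 n/a (/var/lib/private/fah/cores/"
    "cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/"
    "Core_a8.fah/FahCore_a8 + 0x31b329)"
)
sample = "\n".join([FRAME_LINE] * 4)

for (binary, offset), n in top_frames(sample).items():
    print(f"{binary} + {offset}: {n} crash(es)")
```

If all crashes share one binary and offset, as they do in the excerpt, that is consistent with a deterministic failure on this WU rather than random hardware instability, although it does not prove it.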

Excerpt from the FAHClient log:

15:32:10:WU04:FS04:0xa8:Project: 12419 (Run 77, Clone 0, Gen 257)
15:32:10:WU04:FS04:0xa8:Unit: 0x00000000000000000000000000000000
15:32:10:WU04:FS04:0xa8:Digital signatures verified
15:32:10:WU04:FS04:0xa8:Calling: mdrun -c frame257.gro -s frame257.tpr -x frame257.xtc -cpt 3 -nt 4 -ntmpi 1
15:32:10:WU04:FS04:0xa8:Steps: first=1280000000 total=1285000000
15:32:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:32:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:33:09:WU04:FS04:Starting
15:33:09:WU04:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 04 -suffix 01 -version 706 -lifeline 2416 -checkpoint 3 -np 4
15:33:09:WU04:FS04:Started FahCore on PID 36035
15:33:09:WU04:FS04:Core PID:36039
15:33:09:WU04:FS04:FahCore 0xa8 started
15:33:10:WU04:FS04:0xa8:*********************** Log Started 2024-08-27T15:33:09Z ***********************
15:33:10:WU04:FS04:0xa8:************************** Gromacs Folding@home Core ***************************
15:33:10:WU04:FS04:0xa8:       Core: Gromacs
15:33:10:WU04:FS04:0xa8:       Type: 0xa8
15:33:10:WU04:FS04:0xa8:    Version: 0.0.12
15:33:10:WU04:FS04:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:33:10:WU04:FS04:0xa8:  Copyright: 2020 foldingathome.org
15:33:10:WU04:FS04:0xa8:   Homepage: https://foldingathome.org/
15:33:10:WU04:FS04:0xa8:       Date: Jan 16 2021
15:33:10:WU04:FS04:0xa8:       Time: 19:24:44
15:33:10:WU04:FS04:0xa8:   Compiler: GNU 8.3.0
15:33:10:WU04:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:33:10:WU04:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
15:33:10:WU04:FS04:0xa8:   Platform: linux2 4.15.0-128-generic
15:33:10:WU04:FS04:0xa8:       Bits: 64
15:33:10:WU04:FS04:0xa8:       Mode: Release
15:33:10:WU04:FS04:0xa8:       SIMD: avx2_256
15:33:10:WU04:FS04:0xa8:     OpenMP: ON
15:33:10:WU04:FS04:0xa8:       CUDA: OFF
15:33:10:WU04:FS04:0xa8:       Args: -dir 04 -suffix 01 -version 706 -lifeline 36035 -checkpoint 3 -np 4
15:33:10:WU04:FS04:0xa8:************************************ libFAH ************************************
15:33:10:WU04:FS04:0xa8:       Date: Jan 16 2021
15:33:10:WU04:FS04:0xa8:       Time: 19:21:38
15:33:10:WU04:FS04:0xa8:   Compiler: GNU 8.3.0
15:33:10:WU04:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:33:10:WU04:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
15:33:10:WU04:FS04:0xa8:   Platform: linux2 4.15.0-128-generic
15:33:10:WU04:FS04:0xa8:       Bits: 64
15:33:10:WU04:FS04:0xa8:       Mode: Release
15:33:10:WU04:FS04:0xa8:************************************ CBang *************************************
15:33:10:WU04:FS04:0xa8:       Date: Jan 16 2021
15:33:10:WU04:FS04:0xa8:       Time: 19:21:24
15:33:10:WU04:FS04:0xa8:   Compiler: GNU 8.3.0
15:33:10:WU04:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:33:10:WU04:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
15:33:10:WU04:FS04:0xa8:   Platform: linux2 4.15.0-128-generic
15:33:10:WU04:FS04:0xa8:       Bits: 64
15:33:10:WU04:FS04:0xa8:       Mode: Release
15:33:10:WU04:FS04:0xa8:************************************ System ************************************
15:33:10:WU04:FS04:0xa8:        CPU: AMD Ryzen 9 5950X 16-Core Processor
15:33:10:WU04:FS04:0xa8:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 2
15:33:10:WU04:FS04:0xa8:       CPUs: 32
15:33:10:WU04:FS04:0xa8:     Memory: 31.26GiB
15:33:10:WU04:FS04:0xa8:Free Memory: 20.67GiB
15:33:10:WU04:FS04:0xa8:    Threads: POSIX_THREADS
15:33:10:WU04:FS04:0xa8: OS Version: 6.10
15:33:10:WU04:FS04:0xa8:Has Battery: false
15:33:10:WU04:FS04:0xa8: On Battery: false
15:33:10:WU04:FS04:0xa8: UTC Offset: 2
15:33:10:WU04:FS04:0xa8:        PID: 36039
15:33:10:WU04:FS04:0xa8:        CWD: /var/lib/private/fah/work
15:33:10:WU04:FS04:0xa8:********************************************************************************
15:33:10:WU04:FS04:0xa8:Project: 12419 (Run 77, Clone 0, Gen 257)
15:33:10:WU04:FS04:0xa8:Unit: 0x00000000000000000000000000000000
15:33:10:WU04:FS04:0xa8:Digital signatures verified
15:33:10:WU04:FS04:0xa8:Calling: mdrun -c frame257.gro -s frame257.tpr -x frame257.xtc -cpt 3 -nt 4 -ntmpi 1
15:33:10:WU04:FS04:0xa8:Steps: first=1280000000 total=1285000000
15:33:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:33:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:34:09:WU04:FS04:Starting
15:34:09:WU04:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 04 -suffix 01 -version 706 -lifeline 2416 -checkpoint 3 -np 4
15:34:09:WU04:FS04:Started FahCore on PID 36118
15:34:09:WU04:FS04:Core PID:36122
15:34:09:WU04:FS04:FahCore 0xa8 started

[... log started message ...]

15:34:10:WU04:FS04:0xa8:Project: 12419 (Run 77, Clone 0, Gen 257)
15:34:10:WU04:FS04:0xa8:Unit: 0x00000000000000000000000000000000
15:34:10:WU04:FS04:0xa8:Digital signatures verified
15:34:10:WU04:FS04:0xa8:Calling: mdrun -c frame257.gro -s frame257.tpr -x frame257.xtc -cpt 3 -nt 4 -ntmpi 1
15:34:10:WU04:FS04:0xa8:Steps: first=1280000000 total=1285000000
15:34:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:34:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:35:09:WU04:FS04:Starting
15:35:09:WU04:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 04 -suffix 01 -version 706 -lifeline 2416 -checkpoint 3 -np 4
15:35:09:WU04:FS04:Started FahCore on PID 36188
15:35:09:WU04:FS04:Core PID:36192
15:35:09:WU04:FS04:FahCore 0xa8 started

[... log started message ...]

15:35:10:WU04:FS04:0xa8:Project: 12419 (Run 77, Clone 0, Gen 257)
15:35:10:WU04:FS04:0xa8:Unit: 0x00000000000000000000000000000000
15:35:10:WU04:FS04:0xa8:Digital signatures verified
15:35:10:WU04:FS04:0xa8:Calling: mdrun -c frame257.gro -s frame257.tpr -x frame257.xtc -cpt 3 -nt 4 -ntmpi 1
15:35:10:WU04:FS04:0xa8:Steps: first=1280000000 total=1285000000
15:35:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:35:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:36:09:WU04:FS04:Starting
15:36:09:WU04:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 04 -suffix 01 -version 706 -lifeline 2416 -checkpoint 3 -np 4
15:36:09:WU04:FS04:Started FahCore on PID 36270
15:36:09:WU04:FS04:Core PID:36274
15:36:09:WU04:FS04:FahCore 0xa8 started

[... log started message ...]

15:36:10:WU04:FS04:0xa8:Project: 12419 (Run 77, Clone 0, Gen 257)
15:36:10:WU04:FS04:0xa8:Unit: 0x00000000000000000000000000000000
15:36:10:WU04:FS04:0xa8:Digital signatures verified
15:36:10:WU04:FS04:0xa8:Calling: mdrun -c frame257.gro -s frame257.tpr -x frame257.xtc -cpt 3 -nt 4 -ntmpi 1
15:36:10:WU04:FS04:0xa8:Steps: first=1280000000 total=1285000000
15:36:11:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:36:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)

Best regards,
YInMn
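
For anyone who wants to spot this kind of restart loop without reading the whole log, here is a minimal sketch that counts consecutive core exits where the last reported step did not change. It assumes the v7 FAHClient log line layout shown in the excerpt above; `stalled_restarts` is a hypothetical helper name, and what count should be treated as "stuck" is left to the reader.

```python
import re

# Line layouts assumed from the FAHClient log excerpt above.
COMPLETED = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x\w+:Completed (\d+) out of (\d+) steps"
)
RETURNED = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):FahCore returned: (\w+)")

def stalled_restarts(log_text):
    """Return {(wu, slot): n}, where n is the current run of consecutive
    core exits that all happened at the same reported step count."""
    last_step = {}  # most recent "Completed N" per (wu, slot)
    exit_step = {}  # step count recorded at the previous core exit
    streak = {}
    for line in log_text.splitlines():
        m = COMPLETED.match(line)
        if m:
            last_step[(m.group(1), m.group(2))] = int(m.group(3))
            continue
        m = RETURNED.match(line)
        if m:
            key = (m.group(1), m.group(2))
            step = last_step.get(key)
            if key in exit_step and exit_step[key] == step:
                streak[key] += 1  # exited again without any progress
            else:
                streak[key] = 1   # first exit, or progress was made
            exit_step[key] = step
    return streak

# Three cycles condensed from the log in this post.
sample = """\
15:33:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:33:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:34:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:34:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
15:35:10:WU04:FS04:0xa8:Completed 1 out of 5000000 steps (0%)
15:35:11:WU04:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
"""

for (wu, slot), n in stalled_restarts(sample).items():
    print(f"{wu}/{slot}: {n} consecutive exits without progress")
```

Note that a WU that is paused and resumed by hand can also return INTERRUPTED, so a short streak on its own is not evidence of a bad WU; a long streak at step 1, as in this log, is the suspicious pattern.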
toTOW
Site Moderator
Posts: 6334
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 12419 (Run 77, Clone 0, Gen 257) crashes and restarts endlessly

Post by toTOW »

Yes, it looks like a bad WU: it has already failed 3 times for 3 different users. Feel free to dump it.

I reported it to the researcher.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.