Project: 14531 (Run 0, Clone 1305, Gen 17)
Posted: Thu Apr 16, 2020 8:39 am
This WU appears to have stalled on one of my Linux servers. You have a lot of dead links in your "Troubleshooting WUs" sticky (such as links to a non-existent wiki and to your FAH-specific CPU stress-testing software), so I'm not sure where to start troubleshooting on my end.
This machine has been chewing through plenty of WUs for the past few weeks, so either there's an issue with the WU or my machine is giving up the ghost (it's not overclocked). It runs a minimal, GUI-less install of Debian 9 and has been running F@H exclusively.
Here's the log:
08:14:31:WU00:FS00:Starting
08:14:31:WU00:FS00:Removing old file './work/00/logfile_01-20200416-074250.txt'
08:14:31:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 704 -lifeline 513 -checkpoint 15 -np 8
08:14:31:WU00:FS00:Started FahCore on PID 855
08:14:31:WU00:FS00:Core PID:859
08:14:31:WU00:FS00:FahCore 0xa7 started
08:14:31:WU00:FS00:0xa7:*********************** Log Started 2020-04-16T08:14:31Z ***********************
08:14:31:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
08:14:31:WU00:FS00:0xa7: Type: 0xa7
08:14:31:WU00:FS00:0xa7: Core: Gromacs
08:14:31:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 704 -lifeline 855 -checkpoint 15 -np 8
08:14:31:WU00:FS00:0xa7:************************************ CBang *************************************
08:14:31:WU00:FS00:0xa7: Date: Nov 5 2019
08:14:31:WU00:FS00:0xa7: Time: 06:06:57
08:14:31:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
08:14:31:WU00:FS00:0xa7: Branch: master
08:14:31:WU00:FS00:0xa7: Compiler: GNU 8.3.0
08:14:31:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
08:14:31:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
08:14:31:WU00:FS00:0xa7: Bits: 64
08:14:31:WU00:FS00:0xa7: Mode: Release
08:14:31:WU00:FS00:0xa7:************************************ System ************************************
08:14:31:WU00:FS00:0xa7: CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
08:14:31:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
08:14:31:WU00:FS00:0xa7: CPUs: 8
08:14:31:WU00:FS00:0xa7: Memory: 15.63GiB
08:14:31:WU00:FS00:0xa7:Free Memory: 15.40GiB
08:14:31:WU00:FS00:0xa7: Threads: POSIX_THREADS
08:14:31:WU00:FS00:0xa7: OS Version: 4.9
08:14:31:WU00:FS00:0xa7:Has Battery: false
08:14:31:WU00:FS00:0xa7: On Battery: false
08:14:31:WU00:FS00:0xa7: UTC Offset: -7
08:14:31:WU00:FS00:0xa7: PID: 859
08:14:31:WU00:FS00:0xa7: CWD: /var/lib/fahclient/work
08:14:31:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
08:14:31:WU00:FS00:0xa7: Version: 0.0.18
08:14:31:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:14:31:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
08:14:31:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
08:14:31:WU00:FS00:0xa7: Date: Nov 5 2019
08:14:31:WU00:FS00:0xa7: Time: 06:13:26
08:14:31:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
08:14:31:WU00:FS00:0xa7: Branch: master
08:14:31:WU00:FS00:0xa7: Compiler: GNU 8.3.0
08:14:31:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
08:14:31:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
08:14:31:WU00:FS00:0xa7: Bits: 64
08:14:31:WU00:FS00:0xa7: Mode: Release
08:14:31:WU00:FS00:0xa7:************************************ Build *************************************
08:14:31:WU00:FS00:0xa7: SIMD: avx_256
08:14:31:WU00:FS00:0xa7:********************************************************************************
08:14:31:WU00:FS00:0xa7:Project: 14531 (Run 0, Clone 1305, Gen 17)
08:14:31:WU00:FS00:0xa7:Unit: 0x0000001a80fccb0a5e6978bc26ce4efd
08:14:31:WU00:FS00:0xa7:Digital signatures verified
08:14:31:WU00:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -cpi state.cpt -cpt 15 -nt 8
08:14:31:WU00:FS00:0xa7:Steps: first=4250000 total=250000
08:14:34:WU00:FS00:0xa7:Completed 223571 out of 250000 steps (89%)
08:14:36:WU00:FS00:0xa7:ERROR:
08:14:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
08:14:36:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
08:14:36:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
08:14:36:WU00:FS00:0xa7:ERROR:
08:14:36:WU00:FS00:0xa7:ERROR:Fatal error:
08:14:36:WU00:FS00:0xa7:ERROR:7 particles communicated to PME rank 0 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
08:14:36:WU00:FS00:0xa7:ERROR:This usually means that your system is not well equilibrated.
08:14:36:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
08:14:36:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
08:14:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
08:14:36:WU00:FS00:0xa7:ERROR:
08:14:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
08:14:36:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
08:14:36:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
08:14:36:WU00:FS00:0xa7:ERROR:
08:14:36:WU00:FS00:0xa7:ERROR:Fatal error:
08:14:36:WU00:FS00:0xa7:ERROR:2 particles communicated to PME rank 7 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
08:14:36:WU00:FS00:0xa7:ERROR:This usually means that your system is not well equilibrated.
08:14:36:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
08:14:36:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
08:14:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
08:14:41:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
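For anyone who wants to triage a log like this without reading every line, here is a rough Python sketch that pulls out the distinct ERROR messages and the final FahCore return status. It is not an official F@H tool. The line layout (`HH:MM:SS:WUxx:FSxx:message`) is an assumption inferred from the log above, and the `summarize` helper and the `sample` text are made up for illustration.

```python
import re

# Assumed line shape, inferred from the log above:
#   HH:MM:SS:WUxx:FSxx:<message>
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")
# Matches e.g. "FahCore returned: INTERRUPTED (102 = 0x66)"
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")
# Matches core output such as "0xa7:ERROR:Fatal error:"
ERROR_RE = re.compile(r"^0x[0-9a-fA-F]+:ERROR:(.*)$")


def summarize(log_text):
    """Collect distinct non-empty ERROR messages and the core's return status."""
    errors = []
    status = None
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a client log line in the assumed format
        message = m.group(4)
        err = ERROR_RE.match(message)
        if err:
            text = err.group(1).strip()
            # Skip blank lines and dashed separators; keep first occurrence only.
            if text and set(text) != {"-"} and text not in errors:
                errors.append(text)
            continue
        ret = RETURN_RE.search(message)
        if ret:
            status = (ret.group(1), int(ret.group(2)))
    return errors, status


# Shortened sample in the same shape as the log above.
sample = """\
08:14:34:WU00:FS00:0xa7:Completed 223571 out of 250000 steps (89%)
08:14:36:WU00:FS00:0xa7:ERROR:
08:14:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
08:14:36:WU00:FS00:0xa7:ERROR:Fatal error:
08:14:36:WU00:FS00:0xa7:ERROR:This usually means that your system is not well equilibrated.
08:14:41:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
"""

errors, status = summarize(sample)
print(errors)
print(status)
```

Repeated messages are collapsed on purpose: in the log above the same GROMACS error block is printed once per PME rank, so only the first copy of each line is kept.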