Domain decomposition error in project 14524
Posted: Tue Jun 09, 2020 11:20 pm
While checking on my machine, I noticed the client was stuck in a loop: it kept trying to start the work unit below and failed every time. I have pasted the log below. The CPU is a Ryzen 5 1600 and the machine also runs a GPU slot, so the client automatically configures the CPU slot with 11 threads (12 minus 1 reserved for the GPU). I did not set the thread count manually; it is left at -1 so the client decides. The core then apparently always reduces 11 to 10. However, this log seems to show that 10 threads is incompatible with this project.
Code:
23:11:46:WU02:FS00:Starting
23:11:46:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 02 -suffix 01 -version 706 -lifeline 2669 -checkpoint 15 -np 11
23:11:46:WU02:FS00:Started FahCore on PID 26940
23:11:46:WU02:FS00:Core PID:26944
23:11:46:WU02:FS00:FahCore 0xa7 started
23:11:46:WU02:FS00:0xa7:*********************** Log Started 2020-06-09T23:11:46Z ***********************
23:11:46:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
23:11:46:WU02:FS00:0xa7: Type: 0xa7
23:11:46:WU02:FS00:0xa7: Core: Gromacs
23:11:46:WU02:FS00:0xa7: Args: -dir 02 -suffix 01 -version 706 -lifeline 26940 -checkpoint 15 -np
23:11:46:WU02:FS00:0xa7: 11
23:11:46:WU02:FS00:0xa7:************************************ CBang *************************************
23:11:46:WU02:FS00:0xa7: Date: Nov 5 2019
23:11:46:WU02:FS00:0xa7: Time: 06:06:57
23:11:46:WU02:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
23:11:46:WU02:FS00:0xa7: Branch: master
23:11:46:WU02:FS00:0xa7: Compiler: GNU 8.3.0
23:11:46:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
23:11:46:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
23:11:46:WU02:FS00:0xa7: Bits: 64
23:11:46:WU02:FS00:0xa7: Mode: Release
23:11:46:WU02:FS00:0xa7:************************************ System ************************************
23:11:46:WU02:FS00:0xa7: CPU: AMD Ryzen 5 1600 Six-Core Processor
23:11:46:WU02:FS00:0xa7: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
23:11:46:WU02:FS00:0xa7: CPUs: 12
23:11:46:WU02:FS00:0xa7: Memory: 15.66GiB
23:11:46:WU02:FS00:0xa7:Free Memory: 1.27GiB
23:11:46:WU02:FS00:0xa7: Threads: POSIX_THREADS
23:11:46:WU02:FS00:0xa7: OS Version: 4.15
23:11:46:WU02:FS00:0xa7:Has Battery: false
23:11:46:WU02:FS00:0xa7: On Battery: false
23:11:46:WU02:FS00:0xa7: UTC Offset: -4
23:11:46:WU02:FS00:0xa7: PID: 26944
23:11:46:WU02:FS00:0xa7: CWD: /var/lib/fahclient/work
23:11:46:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
23:11:46:WU02:FS00:0xa7: Version: 0.0.18
23:11:46:WU02:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:11:46:WU02:FS00:0xa7: Copyright: 2019 foldingathome.org
23:11:46:WU02:FS00:0xa7: Homepage: https://foldingathome.org/
23:11:46:WU02:FS00:0xa7: Date: Nov 5 2019
23:11:46:WU02:FS00:0xa7: Time: 06:13:26
23:11:46:WU02:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
23:11:46:WU02:FS00:0xa7: Branch: master
23:11:46:WU02:FS00:0xa7: Compiler: GNU 8.3.0
23:11:46:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
23:11:46:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
23:11:46:WU02:FS00:0xa7: Bits: 64
23:11:46:WU02:FS00:0xa7: Mode: Release
23:11:46:WU02:FS00:0xa7:************************************ Build *************************************
23:11:46:WU02:FS00:0xa7: SIMD: avx_256
23:11:46:WU02:FS00:0xa7:********************************************************************************
23:11:46:WU02:FS00:0xa7:Project: 14524 (Run 916, Clone 1, Gen 20)
23:11:46:WU02:FS00:0xa7:Unit: 0x0000002180fccb0a5e459b90f96c8f13
23:11:46:WU02:FS00:0xa7:Reading tar file core.xml
23:11:46:WU02:FS00:0xa7:Reading tar file frame20.tpr
23:11:46:WU02:FS00:0xa7:Digital signatures verified
23:11:46:WU02:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
23:11:46:WU02:FS00:0xa7:Calling: mdrun -s frame20.tpr -o frame20.trr -x frame20.xtc -cpt 15 -nt 10
23:11:46:WU02:FS00:0xa7:Steps: first=5000000 total=250000
23:11:46:WU02:FS00:0xa7:ERROR:
23:11:46:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
23:11:46:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
23:11:46:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
23:11:46:WU02:FS00:0xa7:ERROR:
23:11:46:WU02:FS00:0xa7:ERROR:Fatal error:
23:11:46:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
23:11:46:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
23:11:46:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
23:11:46:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
23:11:46:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
23:11:46:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
23:11:51:WU02:FS00:0xa7:WARNING:Unexpected exit() call
23:11:51:WU02:FS00:0xa7:WARNING:Unexpected exit from science code
23:11:51:WU02:FS00:0xa7:Saving result file ../logfile_01.txt
23:11:51:WU02:FS00:0xa7:Saving result file md.log
23:11:51:WU02:FS00:0xa7:Saving result file science.log
23:11:51:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
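
To make the failure easier to reason about, here is a rough Python sketch of why a thread count such as 10 can have no valid domain decomposition. It is only a simplified model, not the GROMACS algorithm: it assumes a rectangular box, ignores separate PME ranks and the -dds margin, and the box size is a made-up value because the log does not report the real one. Only the minimum cell size of 1.4227 nm comes from the error message. The function names are mine.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with repeats."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def grids(ranks):
    """All (nx, ny, nz) cell grids whose product equals the rank count."""
    return [
        (nx, ny, ranks // (nx * ny))
        for nx in range(1, ranks + 1) if ranks % nx == 0
        for ny in range(1, ranks // nx + 1) if (ranks // nx) % ny == 0
    ]


def feasible_grids(ranks, box_nm, min_cell_nm):
    """Keep only grids where every cell edge is at least min_cell_nm long."""
    return [
        g for g in grids(ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, g))
    ]


MIN_CELL_NM = 1.4227                    # taken from the error message in the log
HYPOTHETICAL_BOX_NM = (6.0, 6.0, 6.0)   # assumption: the real box is not in the log

for threads in range(11, 0, -1):
    ok = feasible_grids(threads, HYPOTHETICAL_BOX_NM, MIN_CELL_NM)
    status = "ok" if ok else "no decomposition"
    print(threads, prime_factors(threads), status)
```

With this hypothetical box at most 4 cells fit along each axis, so any thread count with a prime factor of 5 or more (5, 7, 10, 11) has no usable grid, while 8 and 9 do. If the real box behaves similarly, that would explain why reducing 11 to 10 does not help, and manually setting the CPU slot to a count like 8 might.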