Decomposition problem - continuous loop
Posted: Wed Jun 17, 2020 2:11 pm
Yesterday I opened FAHControl and it appeared that nothing was being done. The log file showed a fatal domain decomposition error that repeats every time the work unit starts. I've only been using FAH for a couple of months (and it's been great up to now), so I was unsure what to do. I started digging and came across troubleshooting information indicating that I should change the CPU value. I changed it from -1 to 32, but that appeared to have no effect. I also exited and restarted FAH (and even rebooted), but it simply picked up again where it left off. My question at this point is how to terminate the loop the current work unit seems to be stuck in. My system details and one repeating section of the log are shown below. Thanks for any help.
Dave
System Information:
O.S.: Linux Mint v19.3
Hardware: Home Brew
Processor: Intel Core i9-7900X (10 cores, 20 threads)
Motherboard: ASUS PRIME X299-DELUXE
Memory: 64GB
Storage: 6T (RAID 10)
Code:
08:19:59:WU02:FS00:Starting
08:19:59:WU02:FS00:Removing old file 'work/02/logfile_01-20200617-074758.txt'
08:19:59:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 02 -suffix 01 -version 706 -lifeline 2969 -checkpoint 15 -np 20
08:19:59:WU02:FS00:Started FahCore on PID 13055
08:19:59:WU02:FS00:Core PID:13059
08:19:59:WU02:FS00:FahCore 0xa7 started
08:20:00:WU02:FS00:0xa7:*********************** Log Started 2020-06-17T08:19:59Z ***********************
08:20:00:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
08:20:00:WU02:FS00:0xa7: Type: 0xa7
08:20:00:WU02:FS00:0xa7: Core: Gromacs
08:20:00:WU02:FS00:0xa7: Args: -dir 02 -suffix 01 -version 706 -lifeline 13055 -checkpoint 15 -np
08:20:00:WU02:FS00:0xa7: 20
08:20:00:WU02:FS00:0xa7:************************************ CBang *************************************
08:20:00:WU02:FS00:0xa7: Date: Nov 5 2019
08:20:00:WU02:FS00:0xa7: Time: 06:06:57
08:20:00:WU02:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
08:20:00:WU02:FS00:0xa7: Branch: master
08:20:00:WU02:FS00:0xa7: Compiler: GNU 8.3.0
08:20:00:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
08:20:00:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
08:20:00:WU02:FS00:0xa7: Bits: 64
08:20:00:WU02:FS00:0xa7: Mode: Release
08:20:00:WU02:FS00:0xa7:************************************ System ************************************
08:20:00:WU02:FS00:0xa7: CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
08:20:00:WU02:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 85 Stepping 4
08:20:00:WU02:FS00:0xa7: CPUs: 20
08:20:00:WU02:FS00:0xa7: Memory: 62.59GiB
08:20:00:WU02:FS00:0xa7:Free Memory: 24.07GiB
08:20:00:WU02:FS00:0xa7: Threads: POSIX_THREADS
08:20:00:WU02:FS00:0xa7: OS Version: 4.15
08:20:00:WU02:FS00:0xa7:Has Battery: false
08:20:00:WU02:FS00:0xa7: On Battery: false
08:20:00:WU02:FS00:0xa7: UTC Offset: -5
08:20:00:WU02:FS00:0xa7: PID: 13059
08:20:00:WU02:FS00:0xa7: CWD: /var/lib/fahclient/work
08:20:00:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
08:20:00:WU02:FS00:0xa7: Version: 0.0.18
08:20:00:WU02:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:20:00:WU02:FS00:0xa7: Copyright: 2019 foldingathome.org
08:20:00:WU02:FS00:0xa7: Homepage: https://foldingathome.org/
08:20:00:WU02:FS00:0xa7: Date: Nov 5 2019
08:20:00:WU02:FS00:0xa7: Time: 06:13:26
08:20:00:WU02:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
08:20:00:WU02:FS00:0xa7: Branch: master
08:20:00:WU02:FS00:0xa7: Compiler: GNU 8.3.0
08:20:00:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
08:20:00:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
08:20:00:WU02:FS00:0xa7: Bits: 64
08:20:00:WU02:FS00:0xa7: Mode: Release
08:20:00:WU02:FS00:0xa7:************************************ Build *************************************
08:20:00:WU02:FS00:0xa7: SIMD: avx_256
08:20:00:WU02:FS00:0xa7:********************************************************************************
08:20:00:WU02:FS00:0xa7:Project: 14524 (Run 589, Clone 2, Gen 34)
08:20:00:WU02:FS00:0xa7:Unit: 0x0000003580fccb0a5e459b9f0ada19d2
08:20:00:WU02:FS00:0xa7:Reading tar file core.xml
08:20:00:WU02:FS00:0xa7:Reading tar file frame34.tpr
08:20:00:WU02:FS00:0xa7:Digital signatures verified
08:20:00:WU02:FS00:0xa7:Calling: mdrun -s frame34.tpr -o frame34.trr -x frame34.xtc -cpt 15 -nt 20
08:20:00:WU02:FS00:0xa7:Steps: first=8500000 total=250000
08:20:00:WU02:FS00:0xa7:ERROR:
08:20:00:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
08:20:00:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
08:20:00:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
08:20:00:WU02:FS00:0xa7:ERROR:
08:20:00:WU02:FS00:0xa7:ERROR:Fatal error:
08:20:00:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
08:20:00:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
08:20:00:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
08:20:00:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
08:20:00:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
08:20:00:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
08:20:04:WU02:FS00:0xa7:WARNING:Unexpected exit() call
08:20:04:WU02:FS00:0xa7:WARNING:Unexpected exit from science code
08:20:04:WU02:FS00:0xa7:Saving result file ../logfile_01.txt
08:20:04:WU02:FS00:0xa7:Saving result file md.log
08:20:05:WU02:FS00:0xa7:Saving result file science.log
08:20:05:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
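To make the error message above a little more concrete, here is a rough Python sketch of the kind of check it describes. It pulls the rank count and minimum cell size out of the GROMACS error line, then lists the ways that many ranks could be arranged as an nx x ny x nz grid of cells and keeps only the grids where every cell is at least the minimum size. This is a simplified illustration, not what GROMACS actually runs. The function names are made up for this example, and the 5 nm cubic box is purely a placeholder assumption, since the real box dimensions do not appear in the log.

```python
import re

# Matches the GROMACS error line as it appears in the log above.
ERROR_RE = re.compile(
    r"no domain decomposition for (\d+) ranks .* minimum cell size of ([\d.]+) nm"
)


def parse_decomposition_error(line):
    """Return (ranks, min_cell_nm) from the error line, or None if absent."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def grids(ranks):
    """All ordered (nx, ny, nz) triples of positive integers with product == ranks."""
    result = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                result.append((nx, ny, rest // ny))
    return result


def feasible_grids(ranks, box_nm, min_cell_nm):
    """Grids whose cells are at least min_cell_nm long along every box axis.

    Simplified: assumes a rectangular box and ignores everything else the
    real decomposition code takes into account.
    """
    return [
        grid
        for grid in grids(ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, grid))
    ]


if __name__ == "__main__":
    line = (
        "08:20:00:WU02:FS00:0xa7:ERROR:There is no domain decomposition for "
        "16 ranks that is compatible with the given box and a minimum cell "
        "size of 1.4227 nm"
    )
    ranks, min_cell = parse_decomposition_error(line)
    assumed_box = (5.0, 5.0, 5.0)  # ASSUMPTION: placeholder, not from the log
    for n in (ranks, 8):
        print(n, "ranks ->", feasible_grids(n, assumed_box, min_cell))
```

With the placeholder box, no grid fits 16 ranks while 8 ranks still fit as a 2 x 2 x 2 grid, which is the general shape of the problem: whether a work unit can be decomposed depends on the rank count the core ends up using. Note that the log reports 16 ranks even though `-nt 20` was passed; the log excerpt itself does not say why.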