We are running F@H on our cluster at the Fabrique du Loch - FabLab, Auray, France, and we have an issue with one of our nodes.
While the slim nodes (16 cores/32 GB RAM, I think Charmm is running on these) run fine, our SMP node (64/32 cores, 1 TB RAM) fails with the parameters passed to Gromacs:
14:28:54:WU00:FS00:Starting
14:28:54:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /root/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 5606 -checkpoint 15 -np 63
14:28:54:WU00:FS00:Started FahCore on PID 6422
14:28:54:WU00:FS00:Core PID:6426
14:28:54:WU00:FS00:FahCore 0xa7 started
14:28:54:WU00:FS00:0xa7:*********************** Log Started 2020-04-10T14:28:54Z ***********************
14:28:54:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
14:28:54:WU00:FS00:0xa7: Type: 0xa7
14:28:54:WU00:FS00:0xa7: Core: Gromacs
14:28:54:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 705 -lifeline 6422 -checkpoint 15 -np
14:28:54:WU00:FS00:0xa7: 63
14:28:54:WU00:FS00:0xa7:************************************ CBang *************************************
14:28:54:WU00:FS00:0xa7: Date: Nov 5 2019
14:28:54:WU00:FS00:0xa7: Time: 06:06:57
14:28:54:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
14:28:54:WU00:FS00:0xa7: Branch: master
14:28:54:WU00:FS00:0xa7: Compiler: GNU 8.3.0
14:28:54:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
14:28:54:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
14:28:54:WU00:FS00:0xa7: Bits: 64
14:28:54:WU00:FS00:0xa7: Mode: Release
14:28:54:WU00:FS00:0xa7:************************************ System ************************************
14:28:54:WU00:FS00:0xa7: CPU: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
14:28:54:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 45 Stepping 7
14:28:54:WU00:FS00:0xa7: CPUs: 64
14:28:54:WU00:FS00:0xa7: Memory: 1007.37GiB
14:28:54:WU00:FS00:0xa7:Free Memory: 1003.03GiB
14:28:54:WU00:FS00:0xa7: Threads: POSIX_THREADS
14:28:54:WU00:FS00:0xa7: OS Version: 4.18
14:28:54:WU00:FS00:0xa7:Has Battery: false
14:28:54:WU00:FS00:0xa7: On Battery: false
14:28:54:WU00:FS00:0xa7: UTC Offset: 2
14:28:54:WU00:FS00:0xa7: PID: 6426
14:28:54:WU00:FS00:0xa7: CWD: /root/work
14:28:54:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
14:28:54:WU00:FS00:0xa7: Version: 0.0.18
14:28:54:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:28:54:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
14:28:54:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
14:28:54:WU00:FS00:0xa7: Date: Nov 5 2019
14:28:54:WU00:FS00:0xa7: Time: 06:13:26
14:28:54:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
14:28:54:WU00:FS00:0xa7: Branch: master
14:28:54:WU00:FS00:0xa7: Compiler: GNU 8.3.0
14:28:54:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
14:28:54:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
14:28:54:WU00:FS00:0xa7: Bits: 64
14:28:54:WU00:FS00:0xa7: Mode: Release
14:28:54:WU00:FS00:0xa7:************************************ Build *************************************
14:28:54:WU00:FS00:0xa7: SIMD: avx_256
14:28:54:WU00:FS00:0xa7:********************************************************************************
14:28:54:WU00:FS00:0xa7:Project: 16422 (Run 1449, Clone 1, Gen 18)
14:28:54:WU00:FS00:0xa7:Unit: 0x0000001396880e6e5e8bdfe2e38d5b85
14:28:54:WU00:FS00:0xa7:Reading tar file core.xml
14:28:54:WU00:FS00:0xa7:Reading tar file frame18.tpr
14:28:54:WU00:FS00:0xa7:Digital signatures verified
14:28:54:WU00:FS00:0xa7:Calling: mdrun -s frame18.tpr -o frame18.trr -x frame18.xtc -cpt 15 -nt 63
14:28:54:WU00:FS00:0xa7:Steps: first=4500000 total=250000
14:28:54:WU00:FS00:0xa7:ERROR:
14:28:54:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
14:28:54:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
14:28:54:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
14:28:54:WU00:FS00:0xa7:ERROR:
14:28:54:WU00:FS00:0xa7:ERROR:Fatal error:
14:28:54:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 49 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
14:28:54:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
14:28:54:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
14:28:54:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
14:28:54:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
14:28:54:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
14:28:59:WU00:FS00:0xa7:WARNING:Unexpected exit() call
14:28:59:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
14:28:59:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
14:28:59:WU00:FS00:0xa7:Saving result file md.log
14:28:59:WU00:FS00:0xa7:Saving result file science.log
14:29:00:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
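Our unconfirmed guess is that the thread count factors badly: the core is started with 63 threads (63 = 3 x 3 x 7), and the 49 ranks named in the error are 7 x 7. Below is a minimal Python sketch we used to list alternative thread counts. It assumes that counts whose prime factors are all small (2, 3 or 5) are easier for Gromacs to decompose. That is a rule of thumb, not something the log confirms, and the function names are our own.

```python
def prime_factors(n):
    """Return the prime factorization of n as a sorted list, e.g. 63 -> [3, 3, 7]."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever is left after trial division is itself prime.
        factors.append(n)
    return factors


def candidate_thread_counts(max_threads, allowed_primes=(2, 3, 5)):
    """List thread counts <= max_threads, largest first, built only from allowed_primes.

    Assumption: small prime factors make a domain decomposition more likely to exist.
    """
    return [
        n
        for n in range(max_threads, 0, -1)
        if all(p in allowed_primes for p in prime_factors(n))
    ]


if __name__ == "__main__":
    print(prime_factors(63))            # thread count from the log
    print(prime_factors(49))            # rank count from the error message
    print(candidate_thread_counts(64))  # 64 = CPUs reported by the core
```

Whether any of these counts actually passes the minimum cell size check for this work unit is something we have not tested.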
How can we solve this?
Best regards,
Beuk