WU from project 14572 INTERRUPTED
Posted: Tue Mar 31, 2020 7:28 pm
				
The client is frequently blocked by faulty WUs. This time it is one from project 14572.
The faulty project keeps the CPU from doing useful Folding@home work, and the FAH client does not give up on running it. Is there a way to block faulty projects?
19:22:32:WU03:FS00:Starting
19:22:32:WU03:FS00:Removing old file './work/03/logfile_01-20200331-185031.txt'
19:22:32:WU03:FS00:Running FahCore: /home/ubuntu/FAHCoreWrapper /home/ubuntu/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 03 -suffix 01 -version 705 -lifeline 4172 -checkpoint 15 -np 29
19:22:32:WU03:FS00:Started FahCore on PID 75482
19:22:32:WU03:FS00:Core PID:75486
19:22:32:WU03:FS00:FahCore 0xa7 started
19:22:32:WU03:FS00:0xa7:*********************** Log Started 2020-03-31T19:22:32Z ***********************
19:22:32:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:22:32:WU03:FS00:0xa7:       Type: 0xa7
19:22:32:WU03:FS00:0xa7:       Core: Gromacs
19:22:32:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 705 -lifeline 75482 -checkpoint 15 -np
19:22:32:WU03:FS00:0xa7:             29
19:22:32:WU03:FS00:0xa7:************************************ CBang *************************************
19:22:32:WU03:FS00:0xa7:       Date: Nov 5 2019
19:22:32:WU03:FS00:0xa7:       Time: 06:06:57
19:22:32:WU03:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
19:22:32:WU03:FS00:0xa7:     Branch: master
19:22:32:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
19:22:32:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
19:22:32:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
19:22:32:WU03:FS00:0xa7:       Bits: 64
19:22:32:WU03:FS00:0xa7:       Mode: Release
19:22:32:WU03:FS00:0xa7:************************************ System ************************************
19:22:32:WU03:FS00:0xa7:        CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
19:22:32:WU03:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:22:32:WU03:FS00:0xa7:       CPUs: 32
19:22:32:WU03:FS00:0xa7:     Memory: 62.84GiB
19:22:32:WU03:FS00:0xa7:Free Memory: 31.33GiB
19:22:32:WU03:FS00:0xa7:    Threads: POSIX_THREADS
19:22:32:WU03:FS00:0xa7: OS Version: 5.5
19:22:32:WU03:FS00:0xa7:Has Battery: false
19:22:32:WU03:FS00:0xa7: On Battery: false
19:22:32:WU03:FS00:0xa7: UTC Offset: 0
19:22:32:WU03:FS00:0xa7:        PID: 75486
19:22:32:WU03:FS00:0xa7:        CWD: /home/ubuntu/work
19:22:32:WU03:FS00:0xa7:******************************** Build - libFAH ********************************
19:22:32:WU03:FS00:0xa7:    Version: 0.0.18
19:22:32:WU03:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:22:32:WU03:FS00:0xa7:  Copyright: 2019 foldingathome.org
19:22:32:WU03:FS00:0xa7:   Homepage: https://foldingathome.org/
19:22:32:WU03:FS00:0xa7:       Date: Nov 5 2019
19:22:32:WU03:FS00:0xa7:       Time: 06:13:26
19:22:32:WU03:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
19:22:32:WU03:FS00:0xa7:     Branch: master
19:22:32:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
19:22:32:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
19:22:32:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
19:22:32:WU03:FS00:0xa7:       Bits: 64
19:22:32:WU03:FS00:0xa7:       Mode: Release
19:22:32:WU03:FS00:0xa7:************************************ Build *************************************
19:22:32:WU03:FS00:0xa7:       SIMD: avx_256
19:22:32:WU03:FS00:0xa7:********************************************************************************
19:22:32:WU03:FS00:0xa7:Project: 14572 (Run 0, Clone 1661, Gen 7)
19:22:32:WU03:FS00:0xa7:Unit: 0x0000000d287234c95e792c11c06e5a31
19:22:32:WU03:FS00:0xa7:Reading tar file core.xml
19:22:32:WU03:FS00:0xa7:Reading tar file frame7.tpr
19:22:32:WU03:FS00:0xa7:Digital signatures verified
19:22:32:WU03:FS00:0xa7:Reducing thread count from 29 to 28 to avoid domain decomposition by a prime number > 3
19:22:32:WU03:FS00:0xa7:Calling: mdrun -s frame7.tpr -o frame7.trr -x frame7.xtc -cpt 15 -nt 28
19:22:32:WU03:FS00:0xa7:Steps: first=3500000 total=500000
19:22:32:WU03:FS00:0xa7:ERROR:
19:22:32:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
19:22:32:WU03:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
19:22:32:WU03:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
19:22:32:WU03:FS00:0xa7:ERROR:
19:22:32:WU03:FS00:0xa7:ERROR:Fatal error:
19:22:32:WU03:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
19:22:32:WU03:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
19:22:32:WU03:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
19:22:32:WU03:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:22:32:WU03:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:22:32:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
19:22:37:WU03:FS00:0xa7:WARNING:Unexpected exit() call
19:22:37:WU03:FS00:0xa7:WARNING:Unexpected exit from science code
19:22:37:WU03:FS00:0xa7:Saving result file ../logfile_01.txt
19:22:37:WU03:FS00:0xa7:Saving result file md.log
19:22:37:WU03:FS00:0xa7:Saving result file science.log
19:22:37:WU03:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
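To see quickly which project a failed WU came from and why it stopped, the relevant lines of a log like the one above can be pulled out with a short script. This is a minimal Python (3.8+) sketch that assumes the log line format matches this one excerpt. The regular expressions, `WUSummary` and `summarize` are hypothetical names and are not part of the FAH client. The script only reports the failure; it does not block anything.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Patterns written against the log lines in this post (format assumed, not documented).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
THREADS_RE = re.compile(r"Calling: mdrun .* -nt (\d+)")
DD_RE = re.compile(r"no domain decomposition for (\d+) ranks")
RESULT_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")


@dataclass
class WUSummary:
    """Hypothetical summary of one work unit's log excerpt."""
    project: Optional[int] = None
    run: Optional[int] = None
    clone: Optional[int] = None
    gen: Optional[int] = None
    threads: Optional[int] = None   # value passed to mdrun via -nt
    dd_ranks: Optional[int] = None  # rank count in the domain decomposition error
    result: Optional[str] = None    # e.g. "INTERRUPTED"
    code: Optional[int] = None      # numeric return code, e.g. 102


def summarize(lines: Iterable[str]) -> WUSummary:
    """Scan log lines and record the first-class facts about the WU."""
    s = WUSummary()
    for line in lines:
        if m := PROJECT_RE.search(line):
            s.project, s.run, s.clone, s.gen = map(int, m.groups())
        elif m := THREADS_RE.search(line):
            s.threads = int(m.group(1))
        elif m := DD_RE.search(line):
            s.dd_ranks = int(m.group(1))
        elif m := RESULT_RE.search(line):
            s.result, s.code = m.group(1), int(m.group(2))
    return s


# Lines copied from the log above.
sample = [
    "19:22:32:WU03:FS00:0xa7:Project: 14572 (Run 0, Clone 1661, Gen 7)",
    "19:22:32:WU03:FS00:0xa7:Calling: mdrun -s frame7.tpr -o frame7.trr -x frame7.xtc -cpt 15 -nt 28",
    "19:22:32:WU03:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is "
    "compatible with the given box and a minimum cell size of 1.37225 nm",
    "19:22:37:WU03:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]

summary = summarize(sample)
print(summary.project, summary.threads, summary.dd_ranks, summary.result)
```

For this WU the summary shows project 14572 (Run 0, Clone 1661, Gen 7), mdrun started with 28 threads, GROMACS unable to find a domain decomposition for 20 ranks, and the core returning INTERRUPTED (102). Reading the lines from the client's log file instead of the inline `sample` list is left as an assumption about where that file lives on a given install.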