Project: 16404 (Run 0, Clone 5568, Gen 12)

Tohya
Posts: 48
Joined: Thu Feb 07, 2008 12:41 am

Post by Tohya »

Saw a bit of an oddity in the log file. The unit finished folding, but the core logged a domain decomposition message right after the AS lowered the CPU count by 1.

00:50:55:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:16404 run:0 clone:5568 gen:12 core:0xa7 unit:0x0000000ea8f5c67d5e7ed04525ea8433
00:50:55:WU03:FS00:Starting
00:50:55:WARNING:WU03:FS00:AS lowered CPUs from 20 to 19
00:50:55:WU03:FS00:Running FahCore: D:\FAHClient/FAHCoreWrapper.exe D:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 03 -suffix 01 -version 705 -lifeline 10152 -checkpoint 15 -np 19
00:50:55:WU03:FS00:Started FahCore on PID 49520
00:50:55:WU03:FS00:Core PID:43472
00:50:55:WU03:FS00:FahCore 0xa7 started
00:50:56:WU03:FS00:0xa7:*********************** Log Started 2020-04-01T00:50:55Z ***********************
00:50:56:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
00:50:56:WU03:FS00:0xa7:       Type: 0xa7
00:50:56:WU03:FS00:0xa7:       Core: Gromacs
00:50:56:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 705 -lifeline 49520 -checkpoint 15 -np
00:50:56:WU03:FS00:0xa7:             19
00:50:56:WU03:FS00:0xa7:************************************ CBang *************************************
00:50:56:WU03:FS00:0xa7:       Date: Oct 26 2019
00:50:56:WU03:FS00:0xa7:       Time: 01:38:25
00:50:56:WU03:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
00:50:56:WU03:FS00:0xa7:     Branch: master
00:50:56:WU03:FS00:0xa7:   Compiler: Visual C++ 2008
00:50:56:WU03:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:50:56:WU03:FS00:0xa7:   Platform: win32 10
00:50:56:WU03:FS00:0xa7:       Bits: 64
00:50:56:WU03:FS00:0xa7:       Mode: Release
00:50:56:WU03:FS00:0xa7:************************************ System ************************************
00:50:56:WU03:FS00:0xa7:        CPU: AMD Ryzen 9 3900X 12-Core Processor
00:50:56:WU03:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
00:50:56:WU03:FS00:0xa7:       CPUs: 24
00:50:56:WU03:FS00:0xa7:     Memory: 31.95GiB
00:50:56:WU03:FS00:0xa7:Free Memory: 22.32GiB
00:50:56:WU03:FS00:0xa7:    Threads: WINDOWS_THREADS
00:50:56:WU03:FS00:0xa7: OS Version: 6.2
00:50:56:WU03:FS00:0xa7:Has Battery: false
00:50:56:WU03:FS00:0xa7: On Battery: false
00:50:56:WU03:FS00:0xa7: UTC Offset: -5
00:50:56:WU03:FS00:0xa7:        PID: 43472
00:50:56:WU03:FS00:0xa7:        CWD: D:\FAHClient\work
00:50:56:WU03:FS00:0xa7:******************************** Build - libFAH ********************************
00:50:56:WU03:FS00:0xa7:    Version: 0.0.18
00:50:56:WU03:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:50:56:WU03:FS00:0xa7:  Copyright: 2019 foldingathome.org
00:50:56:WU03:FS00:0xa7:   Homepage: https://foldingathome.org/
00:50:56:WU03:FS00:0xa7:       Date: Oct 26 2019
00:50:56:WU03:FS00:0xa7:       Time: 01:52:30
00:50:56:WU03:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
00:50:56:WU03:FS00:0xa7:     Branch: master
00:50:56:WU03:FS00:0xa7:   Compiler: Visual C++ 2008
00:50:56:WU03:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:50:56:WU03:FS00:0xa7:   Platform: win32 10
00:50:56:WU03:FS00:0xa7:       Bits: 64
00:50:56:WU03:FS00:0xa7:       Mode: Release
00:50:56:WU03:FS00:0xa7:************************************ Build *************************************
00:50:56:WU03:FS00:0xa7:       SIMD: avx_256
00:50:56:WU03:FS00:0xa7:********************************************************************************
00:50:56:WU03:FS00:0xa7:Project: 16404 (Run 0, Clone 5568, Gen 12)
00:50:56:WU03:FS00:0xa7:Unit: 0x0000000ea8f5c67d5e7ed04525ea8433
00:50:56:WU03:FS00:0xa7:Reading tar file core.xml
00:50:56:WU03:FS00:0xa7:Reading tar file frame12.tpr
00:50:56:WU03:FS00:0xa7:Digital signatures verified
00:50:56:WU03:FS00:0xa7:Reducing thread count from 19 to 18 to avoid domain decomposition by a prime number > 3
00:50:56:WU03:FS00:0xa7:Calling: mdrun -s frame12.tpr -o frame12.trr -x frame12.xtc -cpt 15 -nt 18
00:50:56:WU03:FS00:0xa7:Steps: first=6000000 total=500000
00:50:56:WU03:FS00:0xa7:Completed 1 out of 500000 steps (0%)

01:28:09:WU03:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
01:28:10:WU03:FS00:0xa7:Saving result file ..\logfile_01.txt
01:28:10:WU03:FS00:0xa7:Saving result file frame12.trr
01:28:10:WU03:FS00:0xa7:Saving result file frame12.xtc
01:28:10:WU03:FS00:0xa7:Saving result file md.log
01:28:10:WU03:FS00:0xa7:Saving result file science.log
01:28:10:WU03:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
01:28:10:WU03:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
01:28:10:WU03:FS00:Sending unit results: id:03 state:SEND error:NO_ERROR project:16404 run:0 clone:5568 gen:12 core:0xa7 unit:0x0000000ea8f5c67d5e7ed04525ea8433
Joe_H
Site Admin
Posts: 8226
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Project: 16404 (Run 0, Clone 5568, Gen 12)

Post by Joe_H »

Perfectly normal. The CPU folding core does not work well with thread counts that are multiples of "large" prime numbers. In practice, 7 and greater is "large", and for some projects 5 is "large" as well. You can see earlier in the log where the AS reduced a request for 20 threads (2x2x5) to 19. Once the core started processing, it reduced that to 18, because 19 is itself a prime.

What the decomposition does is slice up the three-dimensional space that the protein and solvent occupy. A prime count such as 19 could only be turned into 19 thin slices. 18 factors into 2x3x3, so the space can be cut into thirds along two dimensions and in half along the third.
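To make the factoring concrete, here is a rough Python sketch. It is not how GROMACS or the FahCore actually picks its grid (the real code also weighs the box size and the minimum cell size); `prime_factors` and `grid_shape` are hypothetical helper names used only for illustration.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with repeats."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever is left over is itself prime
    return factors


def grid_shape(n):
    """Spread the prime factors of n over three axes, largest factor first.

    Illustration only: each factor goes to whichever axis currently has
    the fewest cuts, so a prime n ends up as n slices along one axis.
    """
    dims = [1, 1, 1]
    for f in sorted(prime_factors(n), reverse=True):
        i = dims.index(min(dims))
        dims[i] *= f
    return tuple(sorted(dims, reverse=True))


# The three thread counts that show up in the first log.
for threads in (20, 19, 18):
    print(threads, prime_factors(threads), grid_shape(threads))
```

A prime count leaves only one axis to cut, which is the "thin slices" case, while 18 spreads across all three axes.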
seesturm
Posts: 4
Joined: Sun Mar 29, 2020 8:50 am

Re: Project: 16404 (Run 0, Clone 5568, Gen 12)

Post by seesturm »

16404 is also broken for me. I'm using the default config (cpu_cores=-1):

15:54:12:WU03:FS00:Starting
15:54:12:WU03:FS00:Running FahCore: /home/ubuntu/FAHCoreWrapper /home/ubuntu/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 03 -suffix 01 -version 705 -lifeline 4172 -checkpoint 15 -np 29
15:54:12:WU03:FS00:Started FahCore on PID 79596
15:54:12:WU03:FS00:Core PID:79600
15:54:12:WU03:FS00:FahCore 0xa7 started
15:54:12:WU03:FS00:0xa7:*********************** Log Started 2020-04-02T15:54:12Z ***********************
15:54:12:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
15:54:12:WU03:FS00:0xa7:       Type: 0xa7
15:54:12:WU03:FS00:0xa7:       Core: Gromacs
15:54:12:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 705 -lifeline 79596 -checkpoint 15 -np
15:54:12:WU03:FS00:0xa7:             29
15:54:12:WU03:FS00:0xa7:************************************ CBang *************************************
15:54:12:WU03:FS00:0xa7:       Date: Nov 5 2019
15:54:12:WU03:FS00:0xa7:       Time: 06:06:57
15:54:12:WU03:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
15:54:12:WU03:FS00:0xa7:     Branch: master
15:54:12:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
15:54:12:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
15:54:12:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:54:12:WU03:FS00:0xa7:       Bits: 64
15:54:12:WU03:FS00:0xa7:       Mode: Release
15:54:12:WU03:FS00:0xa7:************************************ System ************************************
15:54:12:WU03:FS00:0xa7:        CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
15:54:12:WU03:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
15:54:12:WU03:FS00:0xa7:       CPUs: 32
15:54:12:WU03:FS00:0xa7:     Memory: 62.84GiB
15:54:12:WU03:FS00:0xa7:Free Memory: 29.54GiB
15:54:12:WU03:FS00:0xa7:    Threads: POSIX_THREADS
15:54:12:WU03:FS00:0xa7: OS Version: 5.5
15:54:12:WU03:FS00:0xa7:Has Battery: false
15:54:12:WU03:FS00:0xa7: On Battery: false
15:54:12:WU03:FS00:0xa7: UTC Offset: 0
15:54:12:WU03:FS00:0xa7:        PID: 79600
15:54:12:WU03:FS00:0xa7:        CWD: /home/ubuntu/work
15:54:12:WU03:FS00:0xa7:******************************** Build - libFAH ********************************
15:54:12:WU03:FS00:0xa7:    Version: 0.0.18
15:54:12:WU03:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:54:12:WU03:FS00:0xa7:  Copyright: 2019 foldingathome.org
15:54:12:WU03:FS00:0xa7:   Homepage: https://foldingathome.org/
15:54:12:WU03:FS00:0xa7:       Date: Nov 5 2019
15:54:12:WU03:FS00:0xa7:       Time: 06:13:26
15:54:12:WU03:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
15:54:12:WU03:FS00:0xa7:     Branch: master
15:54:12:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
15:54:12:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
15:54:12:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:54:12:WU03:FS00:0xa7:       Bits: 64
15:54:12:WU03:FS00:0xa7:       Mode: Release
15:54:12:WU03:FS00:0xa7:************************************ Build *************************************
15:54:12:WU03:FS00:0xa7:       SIMD: avx_256
15:54:12:WU03:FS00:0xa7:********************************************************************************
15:54:12:WU03:FS00:0xa7:Project: 16404 (Run 0, Clone 6308, Gen 35)
15:54:12:WU03:FS00:0xa7:Unit: 0x00000024a8f5c67d5e7ed03d5cf0d364
15:54:12:WU03:FS00:0xa7:Reading tar file core.xml
15:54:12:WU03:FS00:0xa7:Reading tar file frame35.tpr
15:54:12:WU03:FS00:0xa7:Digital signatures verified
15:54:12:WU03:FS00:0xa7:Reducing thread count from 29 to 28 to avoid domain decomposition by a prime number > 3
15:54:12:WU03:FS00:0xa7:Calling: mdrun -s frame35.tpr -o frame35.trr -x frame35.xtc -cpt 15 -nt 28
15:54:12:WU03:FS00:0xa7:Steps: first=17500000 total=500000
15:54:12:WU03:FS00:0xa7:ERROR:
15:54:12:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
15:54:12:WU03:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
15:54:12:WU03:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
15:54:12:WU03:FS00:0xa7:ERROR:
15:54:12:WU03:FS00:0xa7:ERROR:Fatal error:
15:54:12:WU03:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
15:54:12:WU03:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
15:54:12:WU03:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
15:54:12:WU03:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
15:54:12:WU03:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
15:54:12:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
15:54:17:WU03:FS00:0xa7:WARNING:Unexpected exit() call
15:54:17:WU03:FS00:0xa7:WARNING:Unexpected exit from science code
15:54:17:WU03:FS00:0xa7:Saving result file ../logfile_01.txt
15:54:17:WU03:FS00:0xa7:Saving result file md.log
15:54:17:WU03:FS00:0xa7:Saving result file science.log
15:54:17:WU03:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Project: 16404 (Run 0, Clone 5568, Gen 12)

Post by Neil-B »

The default of -1 (meant to leave the CPU a spare core) dates from a time when core counts were smaller. The core stepped your 29 down to 28 to avoid a prime, but that landed on a multiple of 7 :( Please see the very comprehensive explanation of this type of issue in the middle of this thread: viewtopic.php?f=96&t=33701&p=321282#p321282

Setting that slot to 27 cores might avoid future issues.
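If you want to work out a safe slot size for a different machine, something like the sketch below may help. It assumes the rule of thumb from this thread (no prime factor larger than 3, or larger than 5 for more tolerant projects) and is not taken from the client or the core; `largest_prime_factor` and `pick_thread_count` are made-up names.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (1 if n is 1)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    # Anything left above 1 is a prime bigger than every factor found so far.
    return max(largest, n) if n > 1 else largest


def pick_thread_count(available, largest_allowed=3):
    """Step down from `available` until no prime factor exceeds the limit.

    `largest_allowed` is an assumption based on the rule of thumb above:
    3 is the cautious choice, 5 may be fine for some projects.
    """
    for n in range(available, 0, -1):
        if largest_prime_factor(n) <= largest_allowed:
            return n


print(pick_thread_count(29))  # 28 = 2x2x7 is skipped, 27 = 3x3x3 is accepted
print(pick_thread_count(20))  # 20 = 2x2x5 and 19 are skipped, 18 = 2x3x3 is accepted
```

The value it returns is what you would enter as the CPU count for the slot instead of leaving it at -1.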