by Tohya, Wed Apr 01, 2020 3:49 am

Saw a bit of an oddity in the log file. It finished folding, but threw a domain decomposition error after the AS lowered the core count by 1.
Code:
00:50:55:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:16404 run:0 clone:5568 gen:12 core:0xa7 unit:0x0000000ea8f5c67d5e7ed04525ea8433
00:50:55:WU03:FS00:Starting
00:50:55:WARNING:WU03:FS00:AS lowered CPUs from 20 to 19
00:50:55:WU03:FS00:Running FahCore: D:\FAHClient/FAHCoreWrapper.exe D:\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 03 -suffix 01 -version 705 -lifeline 10152 -checkpoint 15 -np 19
00:50:55:WU03:FS00:Started FahCore on PID 49520
00:50:55:WU03:FS00:Core PID:43472
00:50:55:WU03:FS00:FahCore 0xa7 started
00:50:56:WU03:FS00:0xa7:*********************** Log Started 2020-04-01T00:50:55Z ***********************
00:50:56:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
00:50:56:WU03:FS00:0xa7:       Type: 0xa7
00:50:56:WU03:FS00:0xa7:       Core: Gromacs
00:50:56:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 705 -lifeline 49520 -checkpoint 15 -np
00:50:56:WU03:FS00:0xa7:             19
00:50:56:WU03:FS00:0xa7:************************************ CBang *************************************
00:50:56:WU03:FS00:0xa7:       Date: Oct 26 2019
00:50:56:WU03:FS00:0xa7:       Time: 01:38:25
00:50:56:WU03:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
00:50:56:WU03:FS00:0xa7:     Branch: master
00:50:56:WU03:FS00:0xa7:   Compiler: Visual C++ 2008
00:50:56:WU03:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:50:56:WU03:FS00:0xa7:   Platform: win32 10
00:50:56:WU03:FS00:0xa7:       Bits: 64
00:50:56:WU03:FS00:0xa7:       Mode: Release
00:50:56:WU03:FS00:0xa7:************************************ System ************************************
00:50:56:WU03:FS00:0xa7:        CPU: AMD Ryzen 9 3900X 12-Core Processor
00:50:56:WU03:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
00:50:56:WU03:FS00:0xa7:       CPUs: 24
00:50:56:WU03:FS00:0xa7:     Memory: 31.95GiB
00:50:56:WU03:FS00:0xa7:Free Memory: 22.32GiB
00:50:56:WU03:FS00:0xa7:    Threads: WINDOWS_THREADS
00:50:56:WU03:FS00:0xa7: OS Version: 6.2
00:50:56:WU03:FS00:0xa7:Has Battery: false
00:50:56:WU03:FS00:0xa7: On Battery: false
00:50:56:WU03:FS00:0xa7: UTC Offset: -5
00:50:56:WU03:FS00:0xa7:        PID: 43472
00:50:56:WU03:FS00:0xa7:        CWD: D:\FAHClient\work
00:50:56:WU03:FS00:0xa7:******************************** Build - libFAH ********************************
00:50:56:WU03:FS00:0xa7:    Version: 0.0.18
00:50:56:WU03:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:50:56:WU03:FS00:0xa7:  Copyright: 2019 foldingathome.org
00:50:56:WU03:FS00:0xa7:   Homepage: https://foldingathome.org/
00:50:56:WU03:FS00:0xa7:       Date: Oct 26 2019
00:50:56:WU03:FS00:0xa7:       Time: 01:52:30
00:50:56:WU03:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
00:50:56:WU03:FS00:0xa7:     Branch: master
00:50:56:WU03:FS00:0xa7:   Compiler: Visual C++ 2008
00:50:56:WU03:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:50:56:WU03:FS00:0xa7:   Platform: win32 10
00:50:56:WU03:FS00:0xa7:       Bits: 64
00:50:56:WU03:FS00:0xa7:       Mode: Release
00:50:56:WU03:FS00:0xa7:************************************ Build *************************************
00:50:56:WU03:FS00:0xa7:       SIMD: avx_256
00:50:56:WU03:FS00:0xa7:********************************************************************************
00:50:56:WU03:FS00:0xa7:Project: 16404 (Run 0, Clone 5568, Gen 12)
00:50:56:WU03:FS00:0xa7:Unit: 0x0000000ea8f5c67d5e7ed04525ea8433
00:50:56:WU03:FS00:0xa7:Reading tar file core.xml
00:50:56:WU03:FS00:0xa7:Reading tar file frame12.tpr
00:50:56:WU03:FS00:0xa7:Digital signatures verified
00:50:56:WU03:FS00:0xa7:Reducing thread count from 19 to 18 to avoid domain decomposition by a prime number > 3
00:50:56:WU03:FS00:0xa7:Calling: mdrun -s frame12.tpr -o frame12.trr -x frame12.xtc -cpt 15 -nt 18
00:50:56:WU03:FS00:0xa7:Steps: first=6000000 total=500000
00:50:56:WU03:FS00:0xa7:Completed 1 out of 500000 steps (0%)
01:28:09:WU03:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
01:28:10:WU03:FS00:0xa7:Saving result file ..\logfile_01.txt
01:28:10:WU03:FS00:0xa7:Saving result file frame12.trr
01:28:10:WU03:FS00:0xa7:Saving result file frame12.xtc
01:28:10:WU03:FS00:0xa7:Saving result file md.log
01:28:10:WU03:FS00:0xa7:Saving result file science.log
01:28:10:WU03:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
01:28:10:WU03:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
01:28:10:WU03:FS00:Sending unit results: id:03 state:SEND error:NO_ERROR project:16404 run:0 clone:5568 gen:12 core:0xa7 unit:0x0000000ea8f5c67d5e7ed04525ea8433 
by Joe_H, Wed Apr 01, 2020 7:13 am

Perfectly normal. The CPU folding core does not work well with thread counts that are multiples of "large" prime numbers. In practice, 7 and greater is "large", and for some projects 5 is "large" too. You can see earlier in the log where the AS reduced a request for 20 threads (a multiple of 5) to 19. Once the core started processing, it reduced that to 18, since 19 is itself a prime greater than 3.
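
To make the "large prime" rule concrete, here is a minimal Python sketch that factors a thread count and flags any prime factor above a chosen limit. The function names and the `limit` parameter are illustrative assumptions; this is not the actual logic used by the AS or by FahCore_a7.

```python
def prime_factors(n):
    """Return the prime factors of n (with repetition) in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def has_large_prime_factor(threads, limit=3):
    """True if any prime factor of `threads` exceeds `limit` (hypothetical cutoff)."""
    return any(p > limit for p in prime_factors(threads))


# The three thread counts that appear in the log above.
for threads in (20, 19, 18):
    print(threads, prime_factors(threads), has_large_prime_factor(threads))
# 20 [2, 2, 5] True
# 19 [19] True
# 18 [2, 3, 3] False
```

With a cutoff of 3, both 20 and 19 are flagged and 18 is not, which matches the sequence of reductions seen in the log.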
by seesturm, Thu Apr 02, 2020 3:56 pm

Project 16404 is also broken for me. I'm using the default config (cpu_cores=-1):
Code:
15:54:12:WU03:FS00:Starting
15:54:12:WU03:FS00:Running FahCore: /home/ubuntu/FAHCoreWrapper /home/ubuntu/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 03 -suffix 01 -version 705 -lifeline 4172 -checkpoint 15 -np 29
15:54:12:WU03:FS00:Started FahCore on PID 79596
15:54:12:WU03:FS00:Core PID:79600
15:54:12:WU03:FS00:FahCore 0xa7 started
15:54:12:WU03:FS00:0xa7:*********************** Log Started 2020-04-02T15:54:12Z ***********************
15:54:12:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
15:54:12:WU03:FS00:0xa7:       Type: 0xa7
15:54:12:WU03:FS00:0xa7:       Core: Gromacs
15:54:12:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 705 -lifeline 79596 -checkpoint 15 -np
15:54:12:WU03:FS00:0xa7:             29
15:54:12:WU03:FS00:0xa7:************************************ CBang *************************************
15:54:12:WU03:FS00:0xa7:       Date: Nov 5 2019
15:54:12:WU03:FS00:0xa7:       Time: 06:06:57
15:54:12:WU03:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
15:54:12:WU03:FS00:0xa7:     Branch: master
15:54:12:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
15:54:12:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
15:54:12:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:54:12:WU03:FS00:0xa7:       Bits: 64
15:54:12:WU03:FS00:0xa7:       Mode: Release
15:54:12:WU03:FS00:0xa7:************************************ System ************************************
15:54:12:WU03:FS00:0xa7:        CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
15:54:12:WU03:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
15:54:12:WU03:FS00:0xa7:       CPUs: 32
15:54:12:WU03:FS00:0xa7:     Memory: 62.84GiB
15:54:12:WU03:FS00:0xa7:Free Memory: 29.54GiB
15:54:12:WU03:FS00:0xa7:    Threads: POSIX_THREADS
15:54:12:WU03:FS00:0xa7: OS Version: 5.5
15:54:12:WU03:FS00:0xa7:Has Battery: false
15:54:12:WU03:FS00:0xa7: On Battery: false
15:54:12:WU03:FS00:0xa7: UTC Offset: 0
15:54:12:WU03:FS00:0xa7:        PID: 79600
15:54:12:WU03:FS00:0xa7:        CWD: /home/ubuntu/work
15:54:12:WU03:FS00:0xa7:******************************** Build - libFAH ********************************
15:54:12:WU03:FS00:0xa7:    Version: 0.0.18
15:54:12:WU03:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:54:12:WU03:FS00:0xa7:  Copyright: 2019 foldingathome.org
15:54:12:WU03:FS00:0xa7:   Homepage: https://foldingathome.org/
15:54:12:WU03:FS00:0xa7:       Date: Nov 5 2019
15:54:12:WU03:FS00:0xa7:       Time: 06:13:26
15:54:12:WU03:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
15:54:12:WU03:FS00:0xa7:     Branch: master
15:54:12:WU03:FS00:0xa7:   Compiler: GNU 8.3.0
15:54:12:WU03:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
15:54:12:WU03:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:54:12:WU03:FS00:0xa7:       Bits: 64
15:54:12:WU03:FS00:0xa7:       Mode: Release
15:54:12:WU03:FS00:0xa7:************************************ Build *************************************
15:54:12:WU03:FS00:0xa7:       SIMD: avx_256
15:54:12:WU03:FS00:0xa7:********************************************************************************
15:54:12:WU03:FS00:0xa7:Project: 16404 (Run 0, Clone 6308, Gen 35)
15:54:12:WU03:FS00:0xa7:Unit: 0x00000024a8f5c67d5e7ed03d5cf0d364
15:54:12:WU03:FS00:0xa7:Reading tar file core.xml
15:54:12:WU03:FS00:0xa7:Reading tar file frame35.tpr
15:54:12:WU03:FS00:0xa7:Digital signatures verified
15:54:12:WU03:FS00:0xa7:Reducing thread count from 29 to 28 to avoid domain decomposition by a prime number > 3
15:54:12:WU03:FS00:0xa7:Calling: mdrun -s frame35.tpr -o frame35.trr -x frame35.xtc -cpt 15 -nt 28
15:54:12:WU03:FS00:0xa7:Steps: first=17500000 total=500000
15:54:12:WU03:FS00:0xa7:ERROR:
15:54:12:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
15:54:12:WU03:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
15:54:12:WU03:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
15:54:12:WU03:FS00:0xa7:ERROR:
15:54:12:WU03:FS00:0xa7:ERROR:Fatal error:
15:54:12:WU03:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
15:54:12:WU03:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
15:54:12:WU03:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
15:54:12:WU03:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
15:54:12:WU03:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
15:54:12:WU03:FS00:0xa7:ERROR:-------------------------------------------------------
15:54:17:WU03:FS00:0xa7:WARNING:Unexpected exit() call
15:54:17:WU03:FS00:0xa7:WARNING:Unexpected exit from science code
15:54:17:WU03:FS00:0xa7:Saving result file ../logfile_01.txt
15:54:17:WU03:FS00:0xa7:Saving result file md.log
15:54:17:WU03:FS00:0xa7:Saving result file science.log
15:54:17:WU03:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
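
For anyone who wants to spot these events in a long log, the following is a rough Python sketch that scans log lines for the thread-count adjustments and the domain decomposition failure. The regular expressions are copied from the messages quoted in this thread; the function name `scan_log` and the event labels are made up for illustration, and other client or core versions may word these messages differently.

```python
import re

# Patterns taken from the log lines quoted in this thread.
AS_LOWERED = re.compile(r"AS lowered CPUs from (\d+) to (\d+)")
CORE_REDUCED = re.compile(r"Reducing thread count from (\d+) to (\d+)")
NO_DECOMP = re.compile(r"There is no domain decomposition for (\d+) ranks")


def scan_log(lines):
    """Collect thread-count adjustments and domain decomposition failures."""
    events = []
    for line in lines:
        for kind, pattern in (("as_lowered", AS_LOWERED),
                              ("core_reduced", CORE_REDUCED)):
            m = pattern.search(line)
            if m:
                events.append((kind, int(m.group(1)), int(m.group(2))))
        m = NO_DECOMP.search(line)
        if m:
            events.append(("no_decomposition", int(m.group(1))))
    return events


sample = [
    "15:54:12:WU03:FS00:0xa7:Reducing thread count from 29 to 28 "
    "to avoid domain decomposition by a prime number > 3",
    "15:54:12:WU03:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks "
    "that is compatible with the given box and a minimum cell size of 1.37225 nm",
]
print(scan_log(sample))
# [('core_reduced', 29, 28), ('no_decomposition', 20)]
```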
by Neil-B, Thu Apr 02, 2020 5:10 pm

The default of -1 (to allow the CPU a core) is from a time when core counts were smaller. The core stepped your 29 down to 28 to try to avoid the problem, but hit a multiple of 7. Please see the very comprehensive explanation of this type of issue in the middle of this thread:
viewtopic.php?f=96&t=33701&p=321282#p321282
Setting that slot to 27 cores might avoid future issues.
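
As a rough illustration of why 27 is a reasonable choice, the sketch below steps down from a requested thread count until every prime factor is at most 3. The helper names and the `largest_allowed` parameter are assumptions made for this example; the real AS and core use their own rules, and a count chosen this way is not guaranteed to work for every project.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (1 for n <= 1)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    # Any remainder above 1 is a prime larger than every factor found so far.
    return max(largest, n) if n > 1 else largest


def next_safe_count(requested, largest_allowed=3):
    """Largest count <= requested whose prime factors are all <= largest_allowed."""
    for count in range(requested, 0, -1):
        if largest_prime_factor(count) <= largest_allowed:
            return count
    return 1


for requested in (29, 20):
    print(requested, "->", next_safe_count(requested))
# 29 -> 27
# 20 -> 18
```

Here 29 is rejected because it is prime, 28 because it is 4 x 7, and 27 is accepted because it is 3 x 3 x 3.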