Fatal GROMACS - particles communicated to PME rank...

mcouture
Posts: 1
Joined: Mon Apr 06, 2020 7:17 pm

Fatal GROMACS - particles communicated to PME rank...

Post by mcouture »

Never saw this before:

Last night I saw the following error:

10:45:44:WU00:FS01:0xa7:ERROR:
10:45:44:WU00:FS01:0xa7:ERROR:-------------------------------------------------------
10:45:44:WU00:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
10:45:44:WU00:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
10:45:44:WU00:FS01:0xa7:ERROR:
10:45:44:WU00:FS01:0xa7:ERROR:Fatal error:
10:45:44:WU00:FS01:0xa7:ERROR:5 particles communicated to PME rank 5 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
10:45:44:WU00:FS01:0xa7:ERROR:This usually means that your system is not well equilibrated.
10:45:44:WU00:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
10:45:44:WU00:FS01:0xa7:ERROR:website at xxx
10:45:44:WU00:FS01:0xa7:ERROR:-------------------------------------------------------
Joe_H
Site Admin
Posts: 7927
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GROMACS - Fatal error

Post by Joe_H »

A fairly standard error, but what WU did this error occur on?

If you could at least post the log section from where the WU was started until the point the error occurred, that would be useful for feedback to the researcher running the project.
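
For anyone who wants to pull that section out of a long log, here is a minimal sketch. It assumes the `HH:MM:SS:WUnn:FSnn:` line prefix seen in the logs in this thread. The function name `extract_wu_section` is made up for illustration and is not part of the FAH client.

```python
def extract_wu_section(log_lines, wu, slot):
    """Return the lines for one WU/slot, from its 'Starting' line through
    its 'FahCore returned' line (or to the end of the log if none)."""
    tag = f":WU{wu:02d}:FS{slot:02d}:"
    section = []
    started = False
    for line in log_lines:
        line = line.rstrip("\n")
        if tag not in line:
            continue  # belongs to another WU/slot or to the client itself
        if not started:
            if line.endswith(tag + "Starting"):
                started = True
            else:
                continue  # skip anything logged before the WU was started
        section.append(line)
        if "FahCore returned:" in line:
            break  # the core has exited; the section is complete
    return section


# Sample lines copied from the logs posted in this thread.
sample = [
    "10:45:51:WU02:FS01:Starting",
    "10:45:51:WU00:FS00:Starting",
    "10:45:53:WU00:FS00:0xa7:Completed 1 out of 250000 steps (0%)",
    "10:45:53:WU00:FS00:0xa7:ERROR:Fatal error:",
    "10:45:59:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]

for line in extract_wu_section(sample, wu=0, slot=0):
    print(line)
```

Filtering on the WU and slot tag keeps the interleaved output of the other slot out of the excerpt, which makes it easier for the researcher to read.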
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: GROMACS - Fatal error

Post by PantherX »

Welcome to the F@H Forum mcouture,

If you need some guidance on posting the log file, please see this topic: viewtopic.php?f=24&t=26036
strombergFs
Posts: 16
Joined: Mon Apr 06, 2020 9:14 am

Re: GROMACS - Fatal error

Post by strombergFs »

I have the same error with Project 14619 (Run 1962, Clone 2, Gen 54)
Unit: 0x0000003f2879986c5e888e1748339e04
10:57:02:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
10:57:02:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
10:57:02:WU00:FS00:0xa7:ERROR:
10:57:02:WU00:FS00:0xa7:ERROR:Fatal error:
10:57:02:WU00:FS00:0xa7:ERROR:1 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
10:57:02:WU00:FS00:0xa7:ERROR:This usually means that your system is not well equilibrated.
10:57:02:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
10:57:02:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
10:57:02:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
10:57:07:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)


Please, how can I remove that WU so that my computer can fold again? Right now it is wasting a lot of time and making no progress. Please let me know what I should do.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: GROMACS - Fatal error

Post by Neil-B »

Please post the log, including the top 200 lines or so, which have the system configuration, and the failed WU from download to failure … This will allow guidance/assistance to be given … guidance on posting logs is linked from an earlier response
strombergFs
Posts: 16
Joined: Mon Apr 06, 2020 9:14 am

Re: GROMACS - Fatal error

Post by strombergFs »

Code: Select all

*********************** Log Started 2020-05-01T10:45:51Z ***********************
10:45:51:************************* Folding@home Client *************************
10:45:51:    Website: https://foldingathome.org/
10:45:51:  Copyright: (c) 2009-2018 foldingathome.org
10:45:51:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
10:45:51:       Args: --child --lifeline 1405 /etc/fahclient/config.xml --run-as
10:45:51:             fahclient --pid-file=/var/run/fahclient.pid --daemon
10:45:51:     Config: /etc/fahclient/config.xml
10:45:51:******************************** Build ********************************
10:45:51:    Version: 7.5.1
10:45:51:       Date: May 11 2018
10:45:51:       Time: 19:59:04
10:45:51: Repository: Git
10:45:51:   Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
10:45:51:     Branch: master
10:45:51:   Compiler: GNU 6.3.0 20170516
10:45:51:    Options: -std=gnu++98 -O3 -funroll-loops
10:45:51:   Platform: linux2 4.14.0-3-amd64
10:45:51:       Bits: 64
10:45:51:       Mode: Release
10:45:51:******************************* System ********************************
10:45:51:        CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
10:45:51:     CPU ID: GenuineIntel Family 6 Model 142 Stepping 10
10:45:51:       CPUs: 8
10:45:51:     Memory: 7.65GiB
10:45:51:Free Memory: 6.63GiB
10:45:51:    Threads: POSIX_THREADS
10:45:51: OS Version: 5.3
10:45:51:Has Battery: false
10:45:51: On Battery: false
10:45:51: UTC Offset: 2
10:45:51:        PID: 1407
10:45:51:        CWD: /var/lib/fahclient
10:45:51:         OS: Linux 5.3.0-51-generic x86_64
10:45:51:    OS Arch: AMD64
10:45:51:       GPUs: 0
10:45:51:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
10:45:51:             libcuda.so: cannot open shared object file: No such file or
10:45:51:             directory
10:45:51:     OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
10:45:51:             libOpenCL.so: cannot open shared object file: No such file or
10:45:51:             directory
10:45:51:***********************************************************************
10:45:51:<config>
10:45:51:  <!-- Client Control -->
10:45:51:  <fold-anon v='true'/>
10:45:51:
10:45:51:  <!-- Folding Slot Configuration -->
10:45:51:  <gpu v='false'/>
10:45:51:
10:45:51:  <!-- Network -->
10:45:51:  <proxy v=':8080'/>
10:45:51:
10:45:51:  <!-- Slot Control -->
10:45:51:  <power v='full'/>
10:45:51:
10:45:51:  <!-- User Information -->
10:45:51:  <passkey v='********************************'/>
10:45:51:  <team v='258728'/>
10:45:51:  <user v='chrisNUCi7'/>
10:45:51:
10:45:51:  <!-- Folding Slots -->
10:45:51:  <slot id='0' type='CPU'>
10:45:51:    <cpus v='4'/>
10:45:51:  </slot>
10:45:51:  <slot id='1' type='CPU'>
10:45:51:    <cpus v='4'/>
10:45:51:  </slot>
10:45:51:</config>
10:45:51:Switching to user fahclient
10:45:51:Trying to access database...
10:45:51:Successfully acquired database lock
10:45:51:Enabled folding slot 00: READY cpu:4
10:45:51:Enabled folding slot 01: READY cpu:4
10:45:51:WU02:FS01:Starting
10:45:51:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 02 -suffix 01 -version 705 -lifeline 1407 -checkpoint 15 -np 4
10:45:51:WU02:FS01:Started FahCore on PID 1416
10:45:51:WU02:FS01:Core PID:1420
10:45:51:WU02:FS01:FahCore 0xa7 started
10:45:51:WU00:FS00:Starting
10:45:51:WU00:FS00:Removing old file './work/00/logfile_01-20200501-101357.txt'
10:45:51:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1407 -checkpoint 15 -np 4
10:45:51:WU00:FS00:Started FahCore on PID 1424
10:45:51:WU00:FS00:Core PID:1428
10:45:51:WU00:FS00:FahCore 0xa7 started
10:45:52:WU02:FS01:0xa7:*********************** Log Started 2020-05-01T10:45:51Z ***********************
10:45:52:WU02:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
10:45:52:WU02:FS01:0xa7:       Type: 0xa7
10:45:52:WU02:FS01:0xa7:       Core: Gromacs
10:45:52:WU02:FS01:0xa7:       Args: -dir 02 -suffix 01 -version 705 -lifeline 1416 -checkpoint 15 -np 4
10:45:52:WU02:FS01:0xa7:************************************ CBang *************************************
10:45:52:WU02:FS01:0xa7:       Date: Nov 5 2019
10:45:52:WU02:FS01:0xa7:       Time: 06:06:57
10:45:52:WU02:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
10:45:52:WU02:FS01:0xa7:     Branch: master
10:45:52:WU02:FS01:0xa7:   Compiler: GNU 8.3.0
10:45:52:WU02:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
10:45:52:WU02:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
10:45:52:WU02:FS01:0xa7:       Bits: 64
10:45:52:WU02:FS01:0xa7:       Mode: Release
10:45:52:WU02:FS01:0xa7:************************************ System ************************************
10:45:52:WU02:FS01:0xa7:        CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
10:45:52:WU02:FS01:0xa7:     CPU ID: GenuineIntel Family 6 Model 142 Stepping 10
10:45:52:WU02:FS01:0xa7:       CPUs: 8
10:45:52:WU02:FS01:0xa7:     Memory: 7.65GiB
10:45:52:WU02:FS01:0xa7:Free Memory: 6.62GiB
10:45:52:WU02:FS01:0xa7:    Threads: POSIX_THREADS
10:45:52:WU02:FS01:0xa7: OS Version: 5.3
10:45:52:WU02:FS01:0xa7:Has Battery: false
10:45:52:WU02:FS01:0xa7: On Battery: false
10:45:52:WU02:FS01:0xa7: UTC Offset: 2
10:45:52:WU02:FS01:0xa7:        PID: 1420
10:45:52:WU02:FS01:0xa7:        CWD: /var/lib/fahclient/work
10:45:52:WU02:FS01:0xa7:******************************** Build - libFAH ********************************
10:45:52:WU02:FS01:0xa7:    Version: 0.0.18
10:45:52:WU02:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
10:45:52:WU02:FS01:0xa7:  Copyright: 2019 foldingathome.org
10:45:52:WU02:FS01:0xa7:   Homepage: https://foldingathome.org/
10:45:52:WU02:FS01:0xa7:       Date: Nov 5 2019
10:45:52:WU02:FS01:0xa7:       Time: 06:13:26
10:45:52:WU02:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
10:45:52:WU02:FS01:0xa7:     Branch: master
10:45:52:WU02:FS01:0xa7:   Compiler: GNU 8.3.0
10:45:52:WU02:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
10:45:52:WU02:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
10:45:52:WU02:FS01:0xa7:       Bits: 64
10:45:52:WU02:FS01:0xa7:       Mode: Release
10:45:52:WU02:FS01:0xa7:************************************ Build *************************************
10:45:52:WU02:FS01:0xa7:       SIMD: avx_256
10:45:52:WU02:FS01:0xa7:********************************************************************************
10:45:52:WU02:FS01:0xa7:Project: 16425 (Run 861, Clone 1, Gen 33)
10:45:52:WU02:FS01:0xa7:Unit: 0x00000029a8f5c67d5e914130a7ffb1ab
10:45:52:WU02:FS01:0xa7:Digital signatures verified
10:45:52:WU02:FS01:0xa7:Calling: mdrun -s frame33.tpr -o frame33.trr -x frame33.xtc -cpi state.cpt -cpt 15 -nt 4
10:45:52:WU02:FS01:0xa7:Steps: first=33000000 total=1000000
10:45:52:WU00:FS00:0xa7:*********************** Log Started 2020-05-01T10:45:51Z ***********************
10:45:52:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
10:45:52:WU00:FS00:0xa7:       Type: 0xa7
10:45:52:WU00:FS00:0xa7:       Core: Gromacs
10:45:52:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1424 -checkpoint 15 -np 4
10:45:52:WU00:FS00:0xa7:************************************ CBang *************************************
10:45:52:WU00:FS00:0xa7:       Date: Nov 5 2019
10:45:52:WU00:FS00:0xa7:       Time: 06:06:57
10:45:52:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
10:45:52:WU00:FS00:0xa7:     Branch: master
10:45:52:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
10:45:52:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
10:45:52:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
10:45:52:WU00:FS00:0xa7:       Bits: 64
10:45:52:WU00:FS00:0xa7:       Mode: Release
10:45:52:WU00:FS00:0xa7:************************************ System ************************************
10:45:52:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
10:45:52:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 142 Stepping 10
10:45:52:WU00:FS00:0xa7:       CPUs: 8
10:45:52:WU00:FS00:0xa7:     Memory: 7.65GiB
10:45:52:WU00:FS00:0xa7:Free Memory: 6.61GiB
10:45:52:WU00:FS00:0xa7:    Threads: POSIX_THREADS
10:45:52:WU00:FS00:0xa7: OS Version: 5.3
10:45:52:WU00:FS00:0xa7:Has Battery: false
10:45:52:WU00:FS00:0xa7: On Battery: false
10:45:52:WU00:FS00:0xa7: UTC Offset: 2
10:45:52:WU00:FS00:0xa7:        PID: 1428
10:45:52:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
10:45:52:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
10:45:52:WU00:FS00:0xa7:    Version: 0.0.18
10:45:52:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
10:45:52:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
10:45:52:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
10:45:52:WU00:FS00:0xa7:       Date: Nov 5 2019
10:45:52:WU00:FS00:0xa7:       Time: 06:13:26
10:45:52:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
10:45:52:WU00:FS00:0xa7:     Branch: master
10:45:52:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
10:45:52:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
10:45:52:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
10:45:52:WU00:FS00:0xa7:       Bits: 64
10:45:52:WU00:FS00:0xa7:       Mode: Release
10:45:52:WU00:FS00:0xa7:************************************ Build *************************************
10:45:52:WU00:FS00:0xa7:       SIMD: avx_256
10:45:52:WU00:FS00:0xa7:********************************************************************************
10:45:52:WU00:FS00:0xa7:Project: 14619 (Run 1962, Clone 2, Gen 54)
10:45:52:WU00:FS00:0xa7:Unit: 0x0000003f2879986c5e888e1748339e04
10:45:52:WU00:FS00:0xa7:Digital signatures verified
10:45:52:WU00:FS00:0xa7:Calling: mdrun -s frame54.tpr -o frame54.trr -cpt 15 -nt 4
10:45:52:WU00:FS00:0xa7:Steps: first=0 total=250000
10:45:52:WU02:FS01:0xa7:Completed 343942 out of 1000000 steps (34%)
10:45:53:WU00:FS00:0xa7:Completed 1 out of 250000 steps (0%)
10:45:53:WU00:FS00:0xa7:ERROR:
10:45:53:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
10:45:53:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
10:45:53:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
10:45:53:WU00:FS00:0xa7:ERROR:
10:45:53:WU00:FS00:0xa7:ERROR:Fatal error:
10:45:53:WU00:FS00:0xa7:ERROR:1 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
10:45:53:WU00:FS00:0xa7:ERROR:This usually means that your system is not well equilibrated.
10:45:53:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
10:45:53:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
10:45:53:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
10:45:59:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

strombergFs
Posts: 16
Joined: Mon Apr 06, 2020 9:14 am

Re: GROMACS - Fatal error

Post by strombergFs »

My question is still how this bad WU can be removed so that my computer can fold valid WUs again. I do not want to wait several days without folding because of this bad WU.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm

Re: GROMACS - Fatal error

Post by Neil-B »

Try pausing, then reducing the CPU count of the slot by 1, then unpausing … if this works, set the slot to "finish", then adjust the slot back to its original setting before restarting folding … if that doesn't work, it may need to be dumped, but someone will probably need to let the researcher know first in case there is other information needed from the science logs.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: GROMACS - Fatal error

Post by _r2w_ben »

This looks like a bad work unit. Pause the slot, go to /var/lib/fahclient/work, delete the 00 directory, and resume the slot.

Instead of running two slots with 4 CPUs, you might want to run a single slot with 8 CPUs. Faster completions are preferred and rewarded by the Quick Return Bonus.
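
Before deleting anything, it can help to confirm which work directory belongs to the failing WU. The sketch below is one way to do that. It assumes the log format posted earlier in this thread, where the `Running FahCore` line carries `-dir NN` and the core prints a `Project:` line. The function name `failed_work_dirs` and the default work root are assumptions for illustration. The code only reports paths and does not delete anything.

```python
import re
from pathlib import PurePosixPath

# Patterns modelled on the log lines quoted in this thread (an assumption
# about the format, not a documented FAH client interface).
RUN_RE = re.compile(r":(WU\d+):FS\d+:Running FahCore: .* -dir (\d+) ")
PROJECT_RE = re.compile(r":(WU\d+):FS\d+:0x[0-9a-f]+:Project: (.+)$")
FATAL_RE = re.compile(r":(WU\d+):FS\d+:0x[0-9a-f]+:ERROR:Fatal error:")


def failed_work_dirs(log_lines, work_root="/var/lib/fahclient/work"):
    """Return (wu, project, work_dir) for each WU that logged a fatal error."""
    dirs, projects, failed, seen = {}, {}, [], set()
    for line in log_lines:
        line = line.rstrip()
        if m := RUN_RE.search(line):
            dirs[m.group(1)] = m.group(2)        # e.g. WU00 -> "00"
        elif m := PROJECT_RE.search(line):
            projects[m.group(1)] = m.group(2)    # project/run/clone/gen text
        elif m := FATAL_RE.search(line):
            wu = m.group(1)
            if wu in dirs and wu not in seen:    # report each WU only once
                seen.add(wu)
                path = str(PurePosixPath(work_root) / dirs[wu])
                failed.append((wu, projects.get(wu, "unknown"), path))
    return failed


# Sample lines copied from the log posted above.
sample = [
    "10:45:51:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1407 -checkpoint 15 -np 4",
    "10:45:52:WU00:FS00:0xa7:Project: 14619 (Run 1962, Clone 2, Gen 54)",
    "10:45:53:WU00:FS00:0xa7:ERROR:Fatal error:",
]

for wu, project, path in failed_work_dirs(sample):
    print(f"{wu}: Project {project} -> {path}")
```

Reporting the project, run, clone and gen alongside the path also gives you the details to post here, so the researcher can be told about the WU before it is removed.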
vvoelz
Pande Group Member
Posts: 552
Joined: Sun Dec 02, 2007 8:07 pm
Location: Temple University, Philadelphia PA

Re: GROMACS - Fatal error

Post by vvoelz »

Hi strombergFs -- sorry you got stuck with a bum WU! I do indeed see the errors from P14619/RUN1962/CLONE2 on our server. Things are fine up until Gen 54, then errors are returned. I have STOPPED further WUs from this PROJ/RUN/CLONE. Luckily, the other CLONEs in this RUN look good.

Thanks Neil-B and _r2w_ben for alerting me to this.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GROMACS - Fatal error

Post by bruce »

@VV

I'm going to assume this is NOT the case of a WU exploding, though I don't have enough information to be confident that's true. If it's NOT, then we need to pass this one by our GROMACS expert and see if there's a better setting for PME that can avoid this error. I know _r2w_ben knows how to read the GROMACS manual, but is he also our expert or do we have somebody else? (Maybe one of our European friends?)
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am

Re: GROMACS - Fatal error

Post by PantherX »

strombergFs wrote:...
10:45:51: CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
10:45:51: CPU ID: GenuineIntel Family 6 Model 142 Stepping 10
10:45:51: CPUs: 8
...
10:45:51: <!-- Folding Slots -->
10:45:51: <slot id='0' type='CPU'>
10:45:51: <cpus v='4'/>
10:45:51: </slot>
10:45:51: <slot id='1' type='CPU'>
10:45:51: <cpus v='4'/>
10:45:51: </slot>
...
Just wondering what's the rationale behind running 2 CPU Slots of 4 CPUs each? The optimum settings would be:
1 CPU Slot with 8 CPUs
1 CPU Slot with 6 CPUs if you need spare CPUs for other tasks

Generally speaking, the priority of CPU folding is very low and has low probability of impacting other applications that you might be running on your system. However, you can test the above settings out to see what suits your needs better :)
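
To check how threads are currently split across slots, here is a small sketch that reads a config in the shape quoted above, as it would appear in `config.xml` without the log timestamps. The function name `cpu_slot_summary` and the trimmed-down `CONFIG` string are illustrative assumptions, not part of the client.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slot section quoted above (timestamps removed).
CONFIG = """
<config>
  <power v='full'/>
  <slot id='0' type='CPU'>
    <cpus v='4'/>
  </slot>
  <slot id='1' type='CPU'>
    <cpus v='4'/>
  </slot>
</config>
"""


def cpu_slot_summary(config_xml):
    """Return a list of (slot id, cpu count) for every CPU slot."""
    root = ET.fromstring(config_xml)
    slots = []
    for slot in root.findall("slot"):
        if slot.get("type") != "CPU":
            continue  # GPU slots are not relevant here
        cpus = slot.find("cpus")
        # A missing <cpus> element is reported as None rather than guessed.
        count = int(cpus.get("v")) if cpus is not None else None
        slots.append((slot.get("id"), count))
    return slots


slots = cpu_slot_summary(CONFIG)
print(slots)
if len(slots) > 1:
    total = sum(n for _, n in slots if n is not None)
    print(f"{len(slots)} CPU slots use {total} threads in total; "
          f"a single slot with {total} CPUs is the layout suggested above")
```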
strombergFs
Posts: 16
Joined: Mon Apr 06, 2020 9:14 am

Re: GROMACS - Fatal error

Post by strombergFs »

Thank you very much for your help. I am happy that you confirmed that it is a bad WU. I will delete it now and continue folding.
strombergFs
Posts: 16
Joined: Mon Apr 06, 2020 9:14 am

Re: GROMACS - Fatal error

Post by strombergFs »

Regarding the setup of slots: there was a time when you had to wait a long time to get a new WU to fold, so I thought it would be good to have two slots so that at least one always had something to fold. I never had both slots waiting for a WU, so at least 4 CPUs could always fold. And I guessed that in the end 2x4 CPUs fold the same as 1x8. I did not know about the bonus. OK, I am now running the setup with 8 CPUs. Thank you very much for the nice tip.
prcowley
Posts: 28
Joined: Thu Jan 03, 2019 11:03 pm
Hardware configuration: Op Sys: Linux Ubuntu Studio 24.04 LTS
Kernal: 6.8.0-45-lowlatency (64-bit)
Proc: 16x AMD Ryzen 7 7800X3D 8-Core Processor
Mem: 32 GB
GPU: NVIDIA GeForce RTX 4080 SUPER/PCIe/SSE2
Location: Gisborne, New Zealand
Contact:

Re: Fatal GROMACS - particles communicated to PME rank...

Post by prcowley »

Hi
I have been having the same problem with Project 14700 Run 1132.
I tried changing the number of processors from 0 (allocate as many as needed, minus 1 for the GPU).
Then I tried 14 CPUs and 12 CPUs, but it made no difference.
Having discovered this is a bit of a known issue with this WU, I deleted it and moved on to another, but thought you might like to see the journal in case it is of some help.

Partial log file below

Code: Select all

00:52:51:<config>
00:52:51:  <!-- Folding Slot Configuration -->
00:52:51:  <client-type v='advanced'/>
00:52:51:
00:52:51:  <!-- Network -->
00:52:51:  <proxy v=':8080'/>
00:52:51:
00:52:51:  <!-- Slot Control -->
00:52:51:  <pause-on-battery v='false'/>
00:52:51:
00:52:51:  <!-- User Information -->
00:52:51:  <passkey v='*****'/>
00:52:51:  <team v='163'/>
00:52:51:  <user v='Pcowley'/>
00:52:51:
00:52:51:  <!-- Work Unit Control -->
00:52:51:  <next-unit-percentage v='98'/>
00:52:51:
00:52:51:  <!-- Folding Slots -->
00:52:51:  <slot id='0' type='GPU'>
00:52:51:    <opencl-index v='0'/>
00:52:51:  </slot>
00:52:51:  <slot id='1' type='CPU'>
00:52:51:    <cpus v='12'/>
00:52:51:    <paused v='True'/>
00:52:51:  </slot>
00:52:51:</config>
00:53:00:FS01:Unpaused
00:53:00:WU01:FS01:Starting
00:53:00:WARNING:WU01:FS01:Changed SMP threads from 13 to 12 this can cause some work units to fail
00:53:00:WU01:FS01:Removing old file 'work/01/logfile_01-20200508-001742.txt'
00:53:00:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 20237 -checkpoint 15 -np 12
00:53:00:WU01:FS01:Started FahCore on PID 2505
00:53:00:WU01:FS01:Core PID:2509
00:53:00:WU01:FS01:FahCore 0xa7 started
00:53:00:WU01:FS01:0xa7:*********************** Log Started 2020-05-08T00:53:00Z ***********************
00:53:00:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
00:53:00:WU01:FS01:0xa7:       Type: 0xa7
00:53:00:WU01:FS01:0xa7:       Core: Gromacs
00:53:00:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 2505 -checkpoint 15 -np
00:53:00:WU01:FS01:0xa7:             12
00:53:00:WU01:FS01:0xa7:************************************ CBang *************************************
00:53:00:WU01:FS01:0xa7:       Date: Nov 5 2019
00:53:00:WU01:FS01:0xa7:       Time: 06:06:57
00:53:00:WU01:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
00:53:00:WU01:FS01:0xa7:     Branch: master
00:53:00:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
00:53:00:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
00:53:00:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
00:53:00:WU01:FS01:0xa7:       Bits: 64
00:53:00:WU01:FS01:0xa7:       Mode: Release
00:53:00:WU01:FS01:0xa7:************************************ System ************************************
00:53:00:WU01:FS01:0xa7:        CPU: AMD Ryzen 7 1700 Eight-Core Processor
00:53:00:WU01:FS01:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
00:53:00:WU01:FS01:0xa7:       CPUs: 16
00:53:00:WU01:FS01:0xa7:     Memory: 31.41GiB
00:53:00:WU01:FS01:0xa7:Free Memory: 3.28GiB
00:53:00:WU01:FS01:0xa7:    Threads: POSIX_THREADS
00:53:00:WU01:FS01:0xa7: OS Version: 4.15
00:53:00:WU01:FS01:0xa7:Has Battery: false
00:53:00:WU01:FS01:0xa7: On Battery: false
00:53:00:WU01:FS01:0xa7: UTC Offset: 12
00:53:00:WU01:FS01:0xa7:        PID: 2509
00:53:00:WU01:FS01:0xa7:        CWD: /var/lib/fahclient/work
00:53:00:WU01:FS01:0xa7:******************************** Build - libFAH ********************************
00:53:00:WU01:FS01:0xa7:    Version: 0.0.18
00:53:00:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:53:00:WU01:FS01:0xa7:  Copyright: 2019 foldingathome.org
00:53:00:WU01:FS01:0xa7:   Homepage: https://foldingathome.org/
00:53:00:WU01:FS01:0xa7:       Date: Nov 5 2019
00:53:00:WU01:FS01:0xa7:       Time: 06:13:26
00:53:00:WU01:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
00:53:00:WU01:FS01:0xa7:     Branch: master
00:53:00:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
00:53:00:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
00:53:00:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
00:53:00:WU01:FS01:0xa7:       Bits: 64
00:53:00:WU01:FS01:0xa7:       Mode: Release
00:53:00:WU01:FS01:0xa7:************************************ Build *************************************
00:53:00:WU01:FS01:0xa7:       SIMD: avx_256
00:53:00:WU01:FS01:0xa7:********************************************************************************
00:53:00:WU01:FS01:0xa7:Project: 14700 (Run 1132, Clone 0, Gen 0)
00:53:00:WU01:FS01:0xa7:Unit: 0x000000010002894b5ea9fd1bf4884955
00:53:00:WU01:FS01:0xa7:Digital signatures verified
00:53:00:WU01:FS01:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 12
00:53:00:WU01:FS01:0xa7:Steps: first=0 total=250000
00:53:01:WU01:FS01:0xa7:Completed 1 out of 250000 steps (0%)
00:53:02:WU01:FS01:0xa7:ERROR:
00:53:02:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
00:53:02:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
00:53:02:WU01:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
00:53:02:WU01:FS01:0xa7:ERROR:
00:53:02:WU01:FS01:0xa7:ERROR:Fatal error:
00:53:02:WU01:FS01:0xa7:ERROR:2 particles communicated to PME rank 7 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
00:53:02:WU01:FS01:0xa7:ERROR:This usually means that your system is not well equilibrated.
00:53:02:WU01:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
Cheers
Pete

Mod Edit: Added Code Tags - PantherX