FAHClient is failing on Ubuntu server

If you're new to FAH and need help getting started or you have very basic questions, start here.

Moderators: Site Moderators, FAHC Science Team

zaharcelac
Posts: 2
Joined: Mon Mar 09, 2020 7:04 pm

FAHClient is failing on Ubuntu server

Post by zaharcelac »

Hello,

I hope you can help me get the client running.

Log file:


*********************** Log Started 2020-03-09T19:06:36Z ***********************
19:06:36:************************* Folding@home Client *************************
19:06:36:    Website: https://foldingathome.org/
19:06:36:  Copyright: (c) 2009-2018 foldingathome.org
19:06:36:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:06:36:       Args: --child --lifeline 2586 /etc/fahclient/config.xml --run-as
19:06:36:             fahclient --pid-file=/var/run/fahclient.pid --daemon
19:06:36:     Config: /etc/fahclient/config.xml
19:06:36:******************************** Build ********************************
19:06:36:    Version: 7.5.1
19:06:36:       Date: May 11 2018
19:06:36:       Time: 19:59:04
19:06:36: Repository: Git
19:06:36:   Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
19:06:36:     Branch: master
19:06:36:   Compiler: GNU 6.3.0 20170516
19:06:36:    Options: -std=gnu++98 -O3 -funroll-loops
19:06:36:   Platform: linux2 4.14.0-3-amd64
19:06:36:       Bits: 64
19:06:36:       Mode: Release
19:06:36:******************************* System ********************************
19:06:36:        CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
19:06:36:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
19:06:36:       CPUs: 24
19:06:36:     Memory: 188.90GiB
19:06:36:Free Memory: 188.26GiB
19:06:36:    Threads: POSIX_THREADS
19:06:36: OS Version: 4.15
19:06:36:Has Battery: false
19:06:36: On Battery: false
19:06:36: UTC Offset: -4
19:06:36:        PID: 2588
19:06:36:        CWD: /var/lib/fahclient
19:06:36:         OS: Linux 4.15.0-88-generic x86_64
19:06:36:    OS Arch: AMD64
19:06:36:       GPUs: 0
19:06:36:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
19:06:36:             libcuda.so: cannot open shared object file: No such file or
19:06:36:             directory
19:06:36:     OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
19:06:36:             libOpenCL.so: cannot open shared object file: No such file or
19:06:36:             directory
19:06:36:***********************************************************************
19:06:36:<config>
19:06:36:  <!-- Client Control -->
19:06:36:  <fold-anon v='true'/>
19:06:36:
19:06:36:  <!-- Folding Slot Configuration -->
19:06:36:  <gpu v='false'/>
19:06:36:
19:06:36:  <!-- HTTP Server -->
19:06:36:  <allow v='0.0.0.0/0'/>
19:06:36:
19:06:36:  <!-- Slot Control -->
19:06:36:  <power v='full'/>
19:06:36:
19:06:36:  <!-- User Information -->
19:06:36:  <passkey v='********************************'/>
19:06:36:  <team v='60'/>
19:06:36:  <user v='Z000000001'/>
19:06:36:
19:06:36:  <!-- Web Server -->
19:06:36:  <web-allow v='0.0.0.0/0'/>
19:06:36:
19:06:36:  <!-- Folding Slots -->
19:06:36:  <slot id='0' type='CPU'/>
19:06:36:</config>
19:06:36:Switching to user fahclient
19:06:36:Trying to access database...
19:06:36:Successfully acquired database lock
19:06:36:Enabled folding slot 00: READY cpu:24
19:06:36:WU00:FS00:Starting
19:06:36:WU00:FS00:Removing old file './work/00/logfile_01-20200309-182258.txt'
19:06:36:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 2588 -checkpoint 15 -np 24
19:06:36:WU00:FS00:Started FahCore on PID 2597
19:06:36:WU00:FS00:Core PID:2601
19:06:36:WU00:FS00:FahCore 0xa7 started
19:06:36:WU00:FS00:0xa7:*********************** Log Started 2020-03-09T19:06:36Z ***********************
19:06:36:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:06:36:WU00:FS00:0xa7:       Type: 0xa7
19:06:36:WU00:FS00:0xa7:       Core: Gromacs
19:06:36:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 2597 -checkpoint 15 -np
19:06:36:WU00:FS00:0xa7:             24
19:06:36:WU00:FS00:0xa7:************************************ CBang *************************************
19:06:36:WU00:FS00:0xa7:       Date: Nov 5 2019
19:06:36:WU00:FS00:0xa7:       Time: 05:57:01
19:06:36:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
19:06:36:WU00:FS00:0xa7:     Branch: master
19:06:36:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
19:06:36:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
19:06:36:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
19:06:36:WU00:FS00:0xa7:       Bits: 64
19:06:36:WU00:FS00:0xa7:       Mode: Release
19:06:36:WU00:FS00:0xa7:************************************ System ************************************
19:06:36:WU00:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
19:06:36:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
19:06:36:WU00:FS00:0xa7:       CPUs: 24
19:06:36:WU00:FS00:0xa7:     Memory: 188.90GiB
19:06:36:WU00:FS00:0xa7:Free Memory: 188.26GiB
19:06:36:WU00:FS00:0xa7:    Threads: POSIX_THREADS
19:06:36:WU00:FS00:0xa7: OS Version: 4.15
19:06:36:WU00:FS00:0xa7:Has Battery: false
19:06:36:WU00:FS00:0xa7: On Battery: false
19:06:36:WU00:FS00:0xa7: UTC Offset: -4
19:06:36:WU00:FS00:0xa7:        PID: 2601
19:06:36:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
19:06:36:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
19:06:36:WU00:FS00:0xa7:    Version: 0.0.18
19:06:36:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:06:36:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
19:06:36:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
19:06:36:WU00:FS00:0xa7:       Date: Nov 5 2019
19:06:36:WU00:FS00:0xa7:       Time: 06:13:26
19:06:36:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
19:06:36:WU00:FS00:0xa7:     Branch: master
19:06:36:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
19:06:36:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
19:06:36:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
19:06:36:WU00:FS00:0xa7:       Bits: 64
19:06:36:WU00:FS00:0xa7:       Mode: Release
19:06:36:WU00:FS00:0xa7:************************************ Build *************************************
19:06:36:WU00:FS00:0xa7:       SIMD: sse2
19:06:36:WU00:FS00:0xa7:********************************************************************************
19:06:36:WU00:FS00:0xa7:Project: 14245 (Run 0, Clone 79, Gen 155)
19:06:36:WU00:FS00:0xa7:Unit: 0x000000f380fccb0a5d6fe0b4d8c36094
19:06:36:WU00:FS00:0xa7:Reading tar file core.xml
19:06:36:WU00:FS00:0xa7:Reading tar file frame155.tpr
19:06:36:WU00:FS00:0xa7:Digital signatures verified
19:06:36:WU00:FS00:0xa7:Calling: mdrun -s frame155.tpr -o frame155.trr -x frame155.xtc -cpt 15 -nt 24
19:06:36:WU00:FS00:0xa7:Steps: first=38750000 total=250000
19:06:36:WU00:FS00:0xa7:ERROR:
19:06:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
19:06:36:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
19:06:36:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-sse-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
19:06:36:WU00:FS00:0xa7:ERROR:
19:06:36:WU00:FS00:0xa7:ERROR:Fatal error:
19:06:36:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
19:06:36:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
19:06:36:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
19:06:36:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:06:36:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:06:36:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
19:06:41:WU00:FS00:0xa7:WARNING:Unexpected exit() call
19:06:41:WU00:FS00:0xa7:WARNING:Unexpected exit from science code


Dmesg contains multiple lines like this:


[ 1190.736845] FahCore_a7[2335]: segfault at 50 ip 0000000001001e2d sp 00007ffd15b695d0 error 4 in FahCore_a7[406000+ec4000]
[ 1250.753218] FahCore_a7[2367]: segfault at 50 ip 0000000001001e2d sp 00007ffd972ed840 error 4 in FahCore_a7[406000+ec4000]
[ 1310.777090] FahCore_a7[2399]: segfault at 50 ip 0000000001001e2d sp 00007ffcd4b66850 error 4 in FahCore_a7[406000+ec4000]
[ 1370.825699] FahCore_a7[2432]: segfault at 50 ip 0000000001001e2d sp 00007ffe6fc6e9f0 error 4 in FahCore_a7[406000+ec4000]
[ 1430.834598] FahCore_a7[2466]: segfault at 50 ip 0000000001001e2d sp 00007fff04703540 error 4 in FahCore_a7[406000+ec4000]
[ 1490.853572] FahCore_a7[2498]: segfault at 50 ip 0000000001001e2d sp 00007fffbd0eb240 error 4 in FahCore_a7[406000+ec4000]
[ 1550.877738] FahCore_a7[2531]: segfault at 50 ip 0000000001001e2d sp 00007ffcb708b960 error 4 in FahCore_a7[406000+ec4000]
toTOW
Site Moderator
Posts: 6396
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: FAHClient is failing on Ubuntu server

Post by toTOW »

Don't worry, nothing is wrong with your system or your FAH installation. Folding needs to split every WU into smaller parts, one for each thread of your system. This process is called decomposition, and sometimes no valid decomposition can be found. Whether one exists depends on the project settings and the number of threads in the system.

In this case, it seems that project 14245 has no decomposition for 24 threads. I reported it to the project owner so that the project stops being assigned to machines with 24 threads.
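To make the idea concrete, here is a rough sketch of the kind of check behind that error. It is not GROMACS's actual algorithm: it assumes a rectangular box, ignores load balancing, and uses a made-up box size (`BOX`), because the real box of project 14245 does not appear in the log. Only the minimum cell size comes from the error message. Note that the error mentions 20 ranks although the core was started with 24 threads. GROMACS can reserve some ranks for long-range electrostatics (PME), which would explain the difference, but the log does not confirm it.

```python
def grids(n_ranks):
    """Yield every (nx, ny, nz) with nx * ny * nz == n_ranks."""
    for nx in range(1, n_ranks + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                yield nx, ny, rest // ny


def valid_grids(n_ranks, box_nm, min_cell_nm):
    """Return the grids whose cells are at least min_cell_nm wide on every axis."""
    return [
        grid
        for grid in grids(n_ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, grid))
    ]


# Hypothetical cubic box (assumption); the real box size is not in the log.
BOX = (6.0, 6.0, 6.0)
# Minimum cell size in nm, taken from the error message.
MIN_CELL = 1.45733

for ranks in (16, 20, 24):
    print(ranks, "ranks ->", len(valid_grids(ranks, BOX, MIN_CELL)), "valid grid(s)")
```

With this assumed box, at most 4 cells fit along each axis. Every way of splitting 20 ranks into three factors contains a factor of 5 or more, so no grid fits, while 16 and 24 ranks both have grids that do. That is why a thread count that works for one project can fail for another.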

If you want to get rid of this WU manually, pause the client (or stop the service), go to /var/lib/fahclient/work/ and delete the 00 folder. Then you can resume the client (or restart the service).
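For anyone who prefers to script the cleanup, the deletion step could look roughly like the sketch below. The function name `discard_work_unit` and its `dry_run` switch are made up for illustration. The default path and the `00` folder come from this thread. The script does not pause or stop the client, so do that first, and run it with enough privileges to write to the work directory.

```python
import shutil
from pathlib import Path


def discard_work_unit(work_root="/var/lib/fahclient/work", slot="00", dry_run=True):
    """Delete one work-unit folder and return the path that was targeted.

    With dry_run=True (the default) nothing is deleted; the target is only reported.
    """
    target = Path(work_root) / slot
    if not target.is_dir():
        raise FileNotFoundError(f"no such work folder: {target}")
    if dry_run:
        print(f"would remove {target}")
    else:
        shutil.rmtree(target)
        print(f"removed {target}")
    return target
```

Call `discard_work_unit()` once to check that the reported path is the one you expect, then `discard_work_unit(dry_run=False)` to actually delete it. After that, resume the client or restart the service so it fetches a new WU.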

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
zaharcelac
Posts: 2
Joined: Mon Mar 09, 2020 7:04 pm

Re: FAHClient is failing on Ubuntu server

Post by zaharcelac »

It helped! Thanks! A new WU arrived and my box is 100% loaded again.