Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 10:40 am
by Windhunter
FAHClient successfully runs GPU WUs on my Ubuntu machine, but after I reboot the computer it fails to resume the GPU WU.

Before rebooting I manually paused the GPU slot, and after rebooting I manually started it again.
My system is Ubuntu 20.04.
Before the reboot the GPU WU was 77% complete.
CPU WUs always resume without problems.

Here is the log after rebooting.

Code:

*********************** Log Started 2020-08-12T08:19:56Z ***********************
08:19:56:Trying to access database...
08:19:57:Successfully acquired database lock
08:19:57:Read GPUs.txt
08:19:57:Enabled folding slot 00: READY cpu:2
08:20:05:Enabled folding slot 01: PAUSED gpu:0:GF108 [Quadro 600] 245.8  (by user)
08:20:05:****************************** FAHClient ******************************
08:20:05:      Version: 7.6.13
08:20:05:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:20:05:    Copyright: 2020 foldingathome.org
08:20:05:     Homepage: https://foldingathome.org/
08:20:05:         Date: Apr 28 2020
08:20:05:         Time: 04:20:16
08:20:05:     Revision: 5a652817f46116b6e135503af97f18e094414e3b
08:20:05:       Branch: master
08:20:05:     Compiler: GNU 8.3.0
08:20:05:      Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:20:05:               -fno-pie
08:20:05:     Platform: linux2 4.19.0-5-amd64
08:20:05:         Bits: 64
08:20:05:         Mode: Release
08:20:05:         Args: --child /etc/fahclient/config.xml --run-as fahclient
08:20:05:               --pid-file=/var/run/fahclient.pid --daemon
08:20:05:       Config: /etc/fahclient/config.xml
08:20:05:******************************** CBang ********************************
08:20:05:         Date: Apr 25 2020
08:20:05:         Time: 00:07:53
08:20:05:     Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
08:20:05:       Branch: master
08:20:05:     Compiler: GNU 8.3.0
08:20:05:      Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:20:05:               -fno-pie -fPIC
08:20:05:     Platform: linux2 4.19.0-5-amd64
08:20:05:         Bits: 64
08:20:05:         Mode: Release
08:20:05:******************************* System ********************************
08:20:05:          CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
08:20:05:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:20:05:         CPUs: 4
08:20:05:       Memory: 15.34GiB
08:20:05:  Free Memory: 14.75GiB
08:20:05:      Threads: POSIX_THREADS
08:20:05:   OS Version: 5.4
08:20:05:  Has Battery: false
08:20:05:   On Battery: false
08:20:05:   UTC Offset: 3
08:20:05:          PID: 1167
08:20:05:          CWD: /var/lib/fahclient
08:20:05:           OS: Linux 5.4.0-42-generic x86_64
08:20:05:      OS Arch: AMD64
08:20:05:         GPUs: 1
08:20:05:        GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:2 GF108 [Quadro 600] 245.8
08:20:05:CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:2.1 Driver:9.1
08:20:05:       OpenCL: Not detected: clGetDeviceIDs() returned -1
08:20:05:******************************* libFAH ********************************
08:20:05:         Date: Apr 15 2020
08:20:05:         Time: 21:43:24
08:20:05:     Revision: 216968bc7025029c841ed6e36e81a03a316890d3
08:20:05:       Branch: master
08:20:05:     Compiler: GNU 8.3.0
08:20:05:      Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:20:05:               -fno-pie
08:20:05:     Platform: linux2 4.19.0-5-amd64
08:20:05:         Bits: 64
08:20:05:         Mode: Release
08:20:05:***********************************************************************
08:20:05:<config>
08:20:05:  <!-- Client Control -->
08:20:05:  <fold-anon v='true'/>
08:20:05:  <idle-seconds v='120'/>
08:20:05:
08:20:05:  <!-- Folding Core -->
08:20:05:  <checkpoint v='20'/>
08:20:05:
08:20:05:  <!-- Folding Slot Configuration -->
08:20:05:  <cause v='COVID_19'/>
08:20:05:
08:20:05:  <!-- Network -->
08:20:05:  <proxy v=':8080'/>
08:20:05:
08:20:05:  <!-- Slot Control -->
08:20:05:  <power v='full'/>
08:20:05:
08:20:05:  <!-- User Information -->
08:20:05:  <passkey v='*****'/>
08:20:05:  <team v='279'/>
08:20:05:  <user v='Windhunter'/>
08:20:05:
08:20:05:  <!-- Folding Slots -->
08:20:05:  <slot id='0' type='CPU'>
08:20:05:    <cpus v='2'/>
08:20:05:  </slot>
08:20:05:  <slot id='1' type='GPU'>
08:20:05:    <cuda-index v='0'/>
08:20:05:    <max-packet-size v='small'/>
08:20:05:    <opencl-index v='0'/>
08:20:05:    <pause-on-start v='true'/>
08:20:05:    <paused v='true'/>
08:20:05:  </slot>
08:20:05:</config>
08:20:05:WU00:FS00:Starting
08:20:05:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1167 -checkpoint 20 -np 2
08:20:05:WU00:FS00:Started FahCore on PID 1285
08:20:05:WU00:FS00:Core PID:1289
08:20:05:WU00:FS00:FahCore 0xa7 started
08:20:07:WU00:FS00:0xa7:*********************** Log Started 2020-08-12T08:20:07Z ***********************
08:20:07:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
08:20:07:WU00:FS00:0xa7:       Type: 0xa7
08:20:07:WU00:FS00:0xa7:       Core: Gromacs
08:20:07:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 1285 -checkpoint 20 -np 2
08:20:07:WU00:FS00:0xa7:************************************ CBang *************************************
08:20:07:WU00:FS00:0xa7:       Date: Nov 27 2019
08:20:07:WU00:FS00:0xa7:       Time: 11:26:54
08:20:07:WU00:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
08:20:07:WU00:FS00:0xa7:     Branch: master
08:20:07:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
08:20:07:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:20:07:WU00:FS00:0xa7:             -fno-pie -fPIC
08:20:07:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
08:20:07:WU00:FS00:0xa7:       Bits: 64
08:20:07:WU00:FS00:0xa7:       Mode: Release
08:20:07:WU00:FS00:0xa7:************************************ System ************************************
08:20:07:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
08:20:07:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:20:07:WU00:FS00:0xa7:       CPUs: 4
08:20:07:WU00:FS00:0xa7:     Memory: 15.34GiB
08:20:07:WU00:FS00:0xa7:Free Memory: 14.60GiB
08:20:07:WU00:FS00:0xa7:    Threads: POSIX_THREADS
08:20:07:WU00:FS00:0xa7: OS Version: 5.4
08:20:07:WU00:FS00:0xa7:Has Battery: false
08:20:07:WU00:FS00:0xa7: On Battery: false
08:20:07:WU00:FS00:0xa7: UTC Offset: 3
08:20:07:WU00:FS00:0xa7:        PID: 1289
08:20:07:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
08:20:07:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
08:20:07:WU00:FS00:0xa7:    Version: 0.0.19
08:20:07:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:20:07:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
08:20:07:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
08:20:07:WU00:FS00:0xa7:       Date: Nov 26 2019
08:20:07:WU00:FS00:0xa7:       Time: 00:41:42
08:20:07:WU00:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
08:20:07:WU00:FS00:0xa7:     Branch: master
08:20:07:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
08:20:07:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:20:07:WU00:FS00:0xa7:             -fno-pie
08:20:07:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
08:20:07:WU00:FS00:0xa7:       Bits: 64
08:20:07:WU00:FS00:0xa7:       Mode: Release
08:20:07:WU00:FS00:0xa7:************************************ Build *************************************
08:20:07:WU00:FS00:0xa7:       SIMD: avx_256
08:20:07:WU00:FS00:0xa7:********************************************************************************
08:20:07:WU00:FS00:0xa7:Project: 14379 (Run 885, Clone 1, Gen 356)
08:20:07:WU00:FS00:0xa7:Unit: 0x00000192455e42075e93309c39e0a61d
08:20:07:WU00:FS00:0xa7:Digital signatures verified
08:20:07:WU00:FS00:0xa7:Calling: mdrun -s frame356.tpr -o frame356.trr -cpi state.cpt -cpt 20 -nt 2
08:20:09:WU00:FS00:0xa7:Steps: first=0 total=250000
08:20:11:WU00:FS00:0xa7:Completed 43334 out of 250000 steps (17%)
08:21:56:WU00:FS00:0xa7:Completed 45000 out of 250000 steps (18%)
08:22:00:16:127.0.0.1:New Web session
08:23:25:FS01:Unpaused
08:23:25:FS01:Finishing
08:23:25:WU01:FS01:Starting
08:23:25:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 1167 -checkpoint 20 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
08:23:25:WU01:FS01:Started FahCore on PID 3763
08:23:25:WU01:FS01:Core PID:3767
08:23:25:WU01:FS01:FahCore 0x22 started
08:23:25:WU01:FS01:0x22:*********************** Log Started 2020-08-12T08:23:25Z ***********************
08:23:25:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
08:23:25:WU01:FS01:0x22:       Core: Core22
08:23:25:WU01:FS01:0x22:       Type: 0x22
08:23:25:WU01:FS01:0x22:    Version: 0.0.11
08:23:25:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:23:25:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
08:23:25:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
08:23:25:WU01:FS01:0x22:       Date: Jun 27 2020
08:23:25:WU01:FS01:0x22:       Time: 22:50:00
08:23:25:WU01:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
08:23:25:WU01:FS01:0x22:     Branch: core22-0.0.11
08:23:25:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:23:25:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:23:25:WU01:FS01:0x22:             -funroll-loops
08:23:25:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:23:25:WU01:FS01:0x22:       Bits: 64
08:23:25:WU01:FS01:0x22:       Mode: Release
08:23:25:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
08:23:25:WU01:FS01:0x22:             <peastman@stanford.edu>
08:23:25:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3763 -checkpoint 20
08:23:25:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
08:23:25:WU01:FS01:0x22:************************************ libFAH ************************************
08:23:25:WU01:FS01:0x22:       Date: Jun 27 2020
08:23:25:WU01:FS01:0x22:       Time: 22:11:04
08:23:25:WU01:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
08:23:25:WU01:FS01:0x22:     Branch: HEAD
08:23:25:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:23:25:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:23:25:WU01:FS01:0x22:             -funroll-loops
08:23:25:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:23:25:WU01:FS01:0x22:       Bits: 64
08:23:25:WU01:FS01:0x22:       Mode: Release
08:23:25:WU01:FS01:0x22:************************************ CBang *************************************
08:23:25:WU01:FS01:0x22:       Date: Jun 27 2020
08:23:25:WU01:FS01:0x22:       Time: 22:10:11
08:23:25:WU01:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
08:23:25:WU01:FS01:0x22:     Branch: HEAD
08:23:25:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:23:25:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:23:25:WU01:FS01:0x22:             -funroll-loops -fPIC
08:23:25:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:23:25:WU01:FS01:0x22:       Bits: 64
08:23:25:WU01:FS01:0x22:       Mode: Release
08:23:25:WU01:FS01:0x22:************************************ System ************************************
08:23:25:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
08:23:25:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:23:25:WU01:FS01:0x22:       CPUs: 4
08:23:25:WU01:FS01:0x22:     Memory: 15.34GiB
08:23:25:WU01:FS01:0x22:Free Memory: 11.76GiB
08:23:25:WU01:FS01:0x22:    Threads: POSIX_THREADS
08:23:25:WU01:FS01:0x22: OS Version: 5.4
08:23:25:WU01:FS01:0x22:Has Battery: false
08:23:25:WU01:FS01:0x22: On Battery: false
08:23:25:WU01:FS01:0x22: UTC Offset: 3
08:23:25:WU01:FS01:0x22:        PID: 3767
08:23:25:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
08:23:25:WU01:FS01:0x22:********************************************************************************
08:23:25:WU01:FS01:0x22:Project: 13421 (Run 4757, Clone 4, Gen 1)
08:23:25:WU01:FS01:0x22:Unit: 0x0000000312bc7d9a5f20bd49d859ebba
08:23:25:WU01:FS01:0x22:Digital signatures verified
08:23:25:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
08:23:25:WU01:FS01:0x22:Version 0.0.11
08:23:25:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:23:25:WU01:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:23:25:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:23:25:WU01:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
08:23:32:WU01:FS01:0x22:ERROR:NaNs detected in forces. 0 0
08:23:32:WU01:FS01:0x22:Saving result file ../logfile_01.txt
08:23:32:WU01:FS01:0x22:Saving result file science.log
08:23:32:WU01:FS01:0x22:Saving result file state.xml.bz2
08:23:32:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
08:23:32:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
08:23:33:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4757 clone:4 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20bd49d859ebba
08:23:33:WU01:FS01:Uploading 304.00KiB to 18.188.125.154
08:23:33:WU01:FS01:Connecting to 18.188.125.154:8080
08:23:34:WU01:FS01:Upload complete
08:23:34:WU01:FS01:Server responded WORK_ACK (400)
08:23:34:WU01:FS01:Cleaning up
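
For anyone scanning a long log like the one above, here is a minimal Python sketch that pulls out the core-level `ERROR` lines and the `FahCore returned` warnings. The function name `find_failures` and the regular expressions are my own assumptions, written against the line shapes seen in this log; they are not part of FAHClient.

```python
import re

# Matches client warnings such as:
#   08:23:32:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURN_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)

# Matches core error lines such as:
#   08:23:32:WU01:FS01:0x22:ERROR:NaNs detected in forces. 0 0
ERROR_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"0x[0-9a-fA-F]+:ERROR:(?P<message>.*)"
)


def find_failures(log_text):
    """Return one dict per core error or non-zero core return found in the log."""
    failures = []
    for line in log_text.splitlines():
        for kind, pattern in (("core_error", ERROR_RE), ("core_return", RETURN_RE)):
            # search() rather than match() so stray terminal colour codes
            # around a line do not hide it.
            match = pattern.search(line)
            if match:
                failures.append({"kind": kind, **match.groupdict()})
                break
    return failures
```

Run against the log above, this would report the `NaNs detected in forces` error and the `BAD_WORK_UNIT` return for WU01 on FS01, which is the point where the paused unit was dumped.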

Re: Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 1:15 pm
by Neil-B
Was your reboot linked with patches or updates? I ask because it appears your setup has OpenCL issues

Code:

08:20:05:       OpenCL: Not detected: clGetDeviceIDs() returned -1
which is usually down to a driver issue. If your system was running before the reboot and was patched with drivers that caused this lack of OpenCL, then that might be why.
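
A quick way to check this in a saved log is to look at the System section of the client startup. The sketch below is only an illustration: `opencl_status` is a hypothetical helper, and the patterns are based on the `OpenCL: Not detected: ...` line quoted here and on the `OpenCL Device 0: ...` line that the same client version prints in a later log in this thread. It assumes the text passed in covers a single client startup.

```python
import re

# "OpenCL: Not detected: clGetDeviceIDs() returned -1"
NOT_DETECTED_RE = re.compile(r"OpenCL: Not detected: (?P<reason>.*)")
# "OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.1 Driver:390.138"
DEVICE_RE = re.compile(r"OpenCL Device (?P<index>\d+): (?P<details>.*)")


def opencl_status(log_text):
    """Summarise what the client reported about OpenCL in one startup block."""
    devices = []
    reason = None
    for line in log_text.splitlines():
        device = DEVICE_RE.search(line)
        if device:
            devices.append((int(device.group("index")), device.group("details").strip()))
            continue
        missing = NOT_DETECTED_RE.search(line)
        if missing:
            reason = missing.group("reason").strip()
    return {"detected": bool(devices), "devices": devices, "reason": reason}
```

If `detected` is false and `reason` mentions `clGetDeviceIDs()`, the client could not enumerate an OpenCL device at startup, which matches the driver suspicion above.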

Re: Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 2:10 pm
by Windhunter
Neil-B wrote: Was your reboot linked with patches or updates? I ask because it appears your setup has OpenCL issues

Code:

08:20:05:       OpenCL: Not detected: clGetDeviceIDs() returned -1
which is usually down to a driver issue. If your system was running before the reboot and was patched with drivers that caused this lack of OpenCL, then that might be why.
Maybe.
I recreated the GPU slot, and it started a new WU.

Code:

13:43:05:Adding folding slot 01: PAUSED gpu:0:GF108 [Quadro 600] 245.8  (by user)
13:43:05:Removing old file 'configs/config-20200813-132519.xml'
13:43:05:Saving configuration to /etc/fahclient/config.xml
13:43:05:<config>
13:43:05:  <!-- Client Control -->
13:43:05:  <fold-anon v='true'/>
13:43:05:  <idle-seconds v='120'/>
13:43:05:
13:43:05:  <!-- Folding Core -->
13:43:05:  <checkpoint v='20'/>
13:43:05:
13:43:05:  <!-- Folding Slot Configuration -->
13:43:05:  <cause v='COVID_19'/>
13:43:05:
13:43:05:  <!-- Network -->
13:43:05:  <proxy v=':8080'/>
13:43:05:
13:43:05:  <!-- Slot Control -->
13:43:05:  <power v='full'/>
13:43:05:
13:43:05:  <!-- User Information -->
13:43:05:  <passkey v='*****'/>
13:43:05:  <team v='279'/>
13:43:05:  <user v='Windhunter'/>
13:43:05:
13:43:05:  <!-- Folding Slots -->
13:43:05:  <slot id='0' type='CPU'>
13:43:05:    <cpus v='2'/>
13:43:05:  </slot>
13:43:05:  <slot id='1' type='GPU'>
13:43:05:    <opencl-index v='0'/>
13:43:05:    <pause-on-start v='true'/>
13:43:05:  </slot>
13:43:05:</config>
13:43:14:FS01:Unpaused
13:43:14:WU00:FS01:Connecting to assign1.foldingathome.org:80
13:43:15:WU00:FS01:Assigned to work server 18.188.125.154
13:43:15:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [Quadro 600] 245.8  from 18.188.125.154
13:43:15:WU00:FS01:Connecting to 18.188.125.154:8080
13:43:15:WU00:FS01:Downloading 372.00KiB
13:43:18:WU00:FS01:Download complete
13:43:18:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13421 run:7644 clone:52 gen:2 core:0x22 unit:0x0000000212bc7d9a5f224a46799a0fe6
13:43:18:WU00:FS01:Starting
13:43:18:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1240 -checkpoint 20 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
13:43:18:WU00:FS01:Started FahCore on PID 11318
13:43:18:WU00:FS01:Core PID:11322
13:43:18:WU00:FS01:FahCore 0x22 started
13:43:19:WU00:FS01:0x22:*********************** Log Started 2020-08-13T13:43:18Z ***********************
13:43:19:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:43:19:WU00:FS01:0x22:       Core: Core22
13:43:19:WU00:FS01:0x22:       Type: 0x22
13:43:19:WU00:FS01:0x22:    Version: 0.0.11
13:43:19:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:43:19:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
13:43:19:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
13:43:19:WU00:FS01:0x22:       Date: Jun 27 2020
13:43:19:WU00:FS01:0x22:       Time: 22:50:00
13:43:19:WU00:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
13:43:19:WU00:FS01:0x22:     Branch: core22-0.0.11
13:43:19:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:43:19:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:43:19:WU00:FS01:0x22:             -funroll-loops
13:43:19:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
13:43:19:WU00:FS01:0x22:       Bits: 64
13:43:19:WU00:FS01:0x22:       Mode: Release
13:43:19:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
13:43:19:WU00:FS01:0x22:             <peastman@stanford.edu>
13:43:19:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 11318 -checkpoint 20
13:43:19:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
13:43:19:WU00:FS01:0x22:************************************ libFAH ************************************
13:43:19:WU00:FS01:0x22:       Date: Jun 27 2020
13:43:19:WU00:FS01:0x22:       Time: 22:11:04
13:43:19:WU00:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
13:43:19:WU00:FS01:0x22:     Branch: HEAD
13:43:19:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:43:19:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:43:19:WU00:FS01:0x22:             -funroll-loops
13:43:19:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
13:43:19:WU00:FS01:0x22:       Bits: 64
13:43:19:WU00:FS01:0x22:       Mode: Release
13:43:19:WU00:FS01:0x22:************************************ CBang *************************************
13:43:19:WU00:FS01:0x22:       Date: Jun 27 2020
13:43:19:WU00:FS01:0x22:       Time: 22:10:11
13:43:19:WU00:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
13:43:19:WU00:FS01:0x22:     Branch: HEAD
13:43:19:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:43:19:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
13:43:19:WU00:FS01:0x22:             -funroll-loops -fPIC
13:43:19:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
13:43:19:WU00:FS01:0x22:       Bits: 64
13:43:19:WU00:FS01:0x22:       Mode: Release
13:43:19:WU00:FS01:0x22:************************************ System ************************************
13:43:19:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
13:43:19:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
13:43:19:WU00:FS01:0x22:       CPUs: 4
13:43:19:WU00:FS01:0x22:     Memory: 15.34GiB
13:43:19:WU00:FS01:0x22:Free Memory: 9.18GiB
13:43:19:WU00:FS01:0x22:    Threads: POSIX_THREADS
13:43:19:WU00:FS01:0x22: OS Version: 5.4
13:43:19:WU00:FS01:0x22:Has Battery: false
13:43:19:WU00:FS01:0x22: On Battery: false
13:43:19:WU00:FS01:0x22: UTC Offset: 3
13:43:19:WU00:FS01:0x22:        PID: 11322
13:43:19:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
13:43:19:WU00:FS01:0x22:********************************************************************************
13:43:19:WU00:FS01:0x22:Project: 13421 (Run 7644, Clone 52, Gen 2)
13:43:19:WU00:FS01:0x22:Unit: 0x0000000212bc7d9a5f224a46799a0fe6
13:43:19:WU00:FS01:0x22:Reading tar file core.xml
13:43:19:WU00:FS01:0x22:Reading tar file integrator.xml
13:43:19:WU00:FS01:0x22:Reading tar file state.xml.bz2
13:43:19:WU00:FS01:0x22:Reading tar file system.xml.bz2
13:43:19:WU00:FS01:0x22:Digital signatures verified
13:43:19:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:43:19:WU00:FS01:0x22:Version 0.0.11
13:43:19:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
13:43:19:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
13:43:19:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
13:43:19:WU00:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
13:43:22:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
13:43:37:Removing old file 'configs/config-20200813-132526.xml'
13:43:37:Saving configuration to /etc/fahclient/config.xml
13:43:37:<config>
13:43:37:  <!-- Client Control -->
13:43:37:  <fold-anon v='true'/>
13:43:37:  <idle-seconds v='120'/>
13:43:37:
13:43:37:  <!-- Folding Core -->
13:43:37:  <checkpoint v='20'/>
13:43:37:
13:43:37:  <!-- Folding Slot Configuration -->
13:43:37:  <cause v='COVID_19'/>
13:43:37:
13:43:37:  <!-- Network -->
13:43:37:  <proxy v=':8080'/>
13:43:37:
13:43:37:  <!-- Slot Control -->
13:43:37:  <power v='full'/>
13:43:37:
13:43:37:  <!-- User Information -->
13:43:37:  <passkey v='*****'/>
13:43:37:  <team v='279'/>
13:43:37:  <user v='Windhunter'/>
13:43:37:
13:43:37:  <!-- Folding Slots -->
13:43:37:  <slot id='0' type='CPU'>
13:43:37:    <cpus v='2'/>
13:43:37:  </slot>
13:43:37:  <slot id='1' type='GPU'>
13:43:37:    <opencl-index v='0'/>
13:43:37:    <pause-on-start v='true'/>
13:43:37:  </slot>
13:43:37:</config>
13:46:51:WU01:FS00:0xa7:Completed 27500 out of 125000 steps (22%)
13:48:55:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
But FAHControl keeps showing
OpenCL Not detected

I will try rebooting the system.
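
The recreated GPU slot does not carry the same options as the one logged after the first reboot. A small Python sketch for comparing the two logged `<config>` blocks is below. `slot_options` is a hypothetical helper; it assumes each config line carries an `HH:MM:SS:` prefix as in these logs. Whether the dropped options had anything to do with the failure is not established in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Log lines look like "13:43:05:  <slot id='1' type='GPU'>"; strip the timestamp.
TIMESTAMP_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:")


def slot_options(logged_config):
    """Map slot id -> {option name: value} for a <config> block copied from the log."""
    xml_text = "\n".join(
        TIMESTAMP_RE.sub("", line) for line in logged_config.splitlines()
    )
    root = ET.fromstring(xml_text)
    slots = {}
    for slot in root.findall("slot"):
        # Each option is a child element whose value sits in the 'v' attribute.
        slots[slot.get("id")] = {child.tag: child.get("v") for child in slot}
    return slots
```

Comparing `set(before["1"]) - set(after["1"])` for the two configs above shows that `cuda-index`, `max-packet-size` and `paused` are present on the original slot but not on the recreated one.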

Re: Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 2:33 pm
by Windhunter
I have removed Intel's OpenCL package.
I have rebooted the system.
This time FAHClient found the NVIDIA OpenCL device, but it still could not continue the GPU WU.
Then it downloaded a new WU, which also failed to start.
Only the next WU started without errors.
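
To see which OpenCL vendor drivers are registered on a Linux system, one can list the ICD files that the standard OpenCL ICD loader reads, normally under `/etc/OpenCL/vendors`. The sketch below is only a diagnostic aid under that assumption. `list_opencl_icds` is a hypothetical helper, and nothing in this thread confirms that the Intel ICD was what hid the NVIDIA device.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library named inside it} for registered OpenCL vendors."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No ICD directory: no vendor drivers are registered with the loader.
        return {}
    return {
        icd.name: icd.read_text().strip()
        for icd in sorted(directory.glob("*.icd"))
    }


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

An empty result, or a result without an NVIDIA entry, would be consistent with the client reporting that OpenCL was not detected for the GPU.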

Code:

*********************** Log Started 2020-08-13T14:14:11Z ***********************
14:14:11:Trying to access database...
14:14:12:Successfully acquired database lock
14:14:12:Read GPUs.txt
14:14:12:Enabled folding slot 00: READY cpu:2
14:14:17:Enabled folding slot 01: PAUSED gpu:0:GF108 [Quadro 600] 245.8  (by user)
14:14:17:****************************** FAHClient ******************************
14:14:17:        Version: 7.6.13
14:14:17:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:14:17:      Copyright: 2020 foldingathome.org
14:14:17:       Homepage: https://foldingathome.org/
14:14:17:           Date: Apr 28 2020
14:14:17:           Time: 04:20:16
14:14:17:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
14:14:17:         Branch: master
14:14:17:       Compiler: GNU 8.3.0
14:14:17:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:14:17:                 -funroll-loops -fno-pie
14:14:17:       Platform: linux2 4.19.0-5-amd64
14:14:17:           Bits: 64
14:14:17:           Mode: Release
14:14:17:           Args: --child /etc/fahclient/config.xml --run-as fahclient
14:14:17:                 --pid-file=/var/run/fahclient.pid --daemon
14:14:17:         Config: /etc/fahclient/config.xml
14:14:17:******************************** CBang ********************************
14:14:17:           Date: Apr 25 2020
14:14:17:           Time: 00:07:53
14:14:17:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
14:14:17:         Branch: master
14:14:17:       Compiler: GNU 8.3.0
14:14:17:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:14:17:                 -funroll-loops -fno-pie -fPIC
14:14:17:       Platform: linux2 4.19.0-5-amd64
14:14:17:           Bits: 64
14:14:17:           Mode: Release
14:14:17:******************************* System ********************************
14:14:17:            CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
14:14:17:         CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
14:14:17:           CPUs: 4
14:14:17:         Memory: 15.34GiB
14:14:17:    Free Memory: 14.76GiB
14:14:17:        Threads: POSIX_THREADS
14:14:17:     OS Version: 5.4
14:14:17:    Has Battery: false
14:14:17:     On Battery: false
14:14:17:     UTC Offset: 3
14:14:17:            PID: 1203
14:14:17:            CWD: /var/lib/fahclient
14:14:17:             OS: Linux 5.4.0-42-generic x86_64
14:14:17:        OS Arch: AMD64
14:14:17:           GPUs: 1
14:14:17:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:2 GF108 [Quadro 600] 245.8
14:14:17:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:2.1 Driver:9.1
14:14:17:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.1 Driver:390.138
14:14:17:******************************* libFAH ********************************
14:14:17:           Date: Apr 15 2020
14:14:17:           Time: 21:43:24
14:14:17:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
14:14:17:         Branch: master
14:14:17:       Compiler: GNU 8.3.0
14:14:17:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:14:17:                 -funroll-loops -fno-pie
14:14:17:       Platform: linux2 4.19.0-5-amd64
14:14:17:           Bits: 64
14:14:17:           Mode: Release
14:14:17:***********************************************************************
14:14:17:<config>
14:14:17:  <!-- Client Control -->
14:14:17:  <fold-anon v='true'/>
14:14:17:  <idle-seconds v='120'/>
14:14:17:
14:14:17:  <!-- Folding Core -->
14:14:17:  <checkpoint v='20'/>
14:14:17:
14:14:17:  <!-- Folding Slot Configuration -->
14:14:17:  <cause v='COVID_19'/>
14:14:17:
14:14:17:  <!-- Network -->
14:14:17:  <proxy v=':8080'/>
14:14:17:
14:14:17:  <!-- Slot Control -->
14:14:17:  <power v='full'/>
14:14:17:
14:14:17:  <!-- User Information -->
14:14:17:  <passkey v='*****'/>
14:14:17:  <team v='279'/>
14:14:17:  <user v='Windhunter'/>
14:14:17:
14:14:17:  <!-- Folding Slots -->
14:14:17:  <slot id='0' type='CPU'>
14:14:17:    <cpus v='2'/>
14:14:17:  </slot>
14:14:17:  <slot id='1' type='GPU'>
14:14:17:    <opencl-index v='0'/>
14:14:17:    <pause-on-start v='true'/>
14:14:17:    <paused v='true'/>
14:14:17:  </slot>
14:14:17:</config>
14:14:17:WU01:FS00:Starting
14:14:18:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 1203 -checkpoint 20 -np 2
14:14:18:WU01:FS00:Started FahCore on PID 1312
14:14:18:WU01:FS00:Core PID:1316
14:14:18:WU01:FS00:FahCore 0xa7 started
14:14:20:WU01:FS00:0xa7:*********************** Log Started 2020-08-13T14:14:19Z ***********************
14:14:20:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
14:14:20:WU01:FS00:0xa7:       Type: 0xa7
14:14:20:WU01:FS00:0xa7:       Core: Gromacs
14:14:20:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 1312 -checkpoint 20 -np 2
14:14:20:WU01:FS00:0xa7:************************************ CBang *************************************
14:14:20:WU01:FS00:0xa7:       Date: Nov 27 2019
14:14:20:WU01:FS00:0xa7:       Time: 11:26:54
14:14:20:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
14:14:20:WU01:FS00:0xa7:     Branch: master
14:14:20:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
14:14:20:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
14:14:20:WU01:FS00:0xa7:             -fno-pie -fPIC
14:14:20:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
14:14:20:WU01:FS00:0xa7:       Bits: 64
14:14:20:WU01:FS00:0xa7:       Mode: Release
14:14:20:WU01:FS00:0xa7:************************************ System ************************************
14:14:20:WU01:FS00:0xa7:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
14:14:20:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
14:14:20:WU01:FS00:0xa7:       CPUs: 4
14:14:20:WU01:FS00:0xa7:     Memory: 15.34GiB
14:14:20:WU01:FS00:0xa7:Free Memory: 14.63GiB
14:14:20:WU01:FS00:0xa7:    Threads: POSIX_THREADS
14:14:20:WU01:FS00:0xa7: OS Version: 5.4
14:14:20:WU01:FS00:0xa7:Has Battery: false
14:14:20:WU01:FS00:0xa7: On Battery: false
14:14:20:WU01:FS00:0xa7: UTC Offset: 3
14:14:20:WU01:FS00:0xa7:        PID: 1316
14:14:20:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
14:14:20:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
14:14:20:WU01:FS00:0xa7:    Version: 0.0.19
14:14:20:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:14:20:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
14:14:20:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
14:14:20:WU01:FS00:0xa7:       Date: Nov 26 2019
14:14:20:WU01:FS00:0xa7:       Time: 00:41:42
14:14:20:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
14:14:20:WU01:FS00:0xa7:     Branch: master
14:14:20:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
14:14:20:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
14:14:20:WU01:FS00:0xa7:             -fno-pie
14:14:20:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
14:14:20:WU01:FS00:0xa7:       Bits: 64
14:14:20:WU01:FS00:0xa7:       Mode: Release
14:14:20:WU01:FS00:0xa7:************************************ Build *************************************
14:14:20:WU01:FS00:0xa7:       SIMD: avx_256
14:14:20:WU01:FS00:0xa7:********************************************************************************
14:14:20:WU01:FS00:0xa7:Project: 13827 (Run 813, Clone 2, Gen 169)
14:14:20:WU01:FS00:0xa7:Unit: 0x000000ca80fccb095c9f836d01823063
14:14:20:WU01:FS00:0xa7:Digital signatures verified
14:14:20:WU01:FS00:0xa7:Calling: mdrun -s frame169.tpr -o frame169.trr -x frame169.xtc -cpi state.cpt -cpt 20 -nt 2
14:14:21:WU01:FS00:0xa7:Steps: first=21125000 total=125000
14:14:24:WU01:FS00:0xa7:Completed 35572 out of 125000 steps (28%)
14:16:00:15:127.0.0.1:New Web session
14:16:40:WU01:FS00:0xa7:Completed 36250 out of 125000 steps (29%)
14:18:03:FS01:Unpaused
14:18:03:WU00:FS01:Starting
14:18:03:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1203 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
14:18:03:WU00:FS01:Started FahCore on PID 4443
14:18:03:WU00:FS01:Core PID:4447
14:18:03:WU00:FS01:FahCore 0x22 started
14:18:04:WU00:FS01:0x22:*********************** Log Started 2020-08-13T14:18:03Z ***********************
14:18:04:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:18:04:WU00:FS01:0x22:       Core: Core22
14:18:04:WU00:FS01:0x22:       Type: 0x22
14:18:04:WU00:FS01:0x22:    Version: 0.0.11
14:18:04:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:18:04:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
14:18:04:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
14:18:04:WU00:FS01:0x22:       Date: Jun 27 2020
14:18:04:WU00:FS01:0x22:       Time: 22:50:00
14:18:04:WU00:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
14:18:04:WU00:FS01:0x22:     Branch: core22-0.0.11
14:18:04:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:04:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:04:WU00:FS01:0x22:             -funroll-loops
14:18:04:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:04:WU00:FS01:0x22:       Bits: 64
14:18:04:WU00:FS01:0x22:       Mode: Release
14:18:04:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
14:18:04:WU00:FS01:0x22:             <peastman@stanford.edu>
14:18:04:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4443 -checkpoint 20
14:18:04:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
14:18:04:WU00:FS01:0x22:             0 -gpu 0
14:18:04:WU00:FS01:0x22:************************************ libFAH ************************************
14:18:04:WU00:FS01:0x22:       Date: Jun 27 2020
14:18:04:WU00:FS01:0x22:       Time: 22:11:04
14:18:04:WU00:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
14:18:04:WU00:FS01:0x22:     Branch: HEAD
14:18:04:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:04:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:04:WU00:FS01:0x22:             -funroll-loops
14:18:04:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:04:WU00:FS01:0x22:       Bits: 64
14:18:04:WU00:FS01:0x22:       Mode: Release
14:18:04:WU00:FS01:0x22:************************************ CBang *************************************
14:18:04:WU00:FS01:0x22:       Date: Jun 27 2020
14:18:04:WU00:FS01:0x22:       Time: 22:10:11
14:18:04:WU00:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
14:18:04:WU00:FS01:0x22:     Branch: HEAD
14:18:04:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:04:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:04:WU00:FS01:0x22:             -funroll-loops -fPIC
14:18:04:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:04:WU00:FS01:0x22:       Bits: 64
14:18:04:WU00:FS01:0x22:       Mode: Release
14:18:04:WU00:FS01:0x22:************************************ System ************************************
14:18:04:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
14:18:04:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
14:18:04:WU00:FS01:0x22:       CPUs: 4
14:18:04:WU00:FS01:0x22:     Memory: 15.34GiB
14:18:04:WU00:FS01:0x22:Free Memory: 11.34GiB
14:18:04:WU00:FS01:0x22:    Threads: POSIX_THREADS
14:18:04:WU00:FS01:0x22: OS Version: 5.4
14:18:04:WU00:FS01:0x22:Has Battery: false
14:18:04:WU00:FS01:0x22: On Battery: false
14:18:04:WU00:FS01:0x22: UTC Offset: 3
14:18:04:WU00:FS01:0x22:        PID: 4447
14:18:04:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
14:18:04:WU00:FS01:0x22:********************************************************************************
14:18:04:WU00:FS01:0x22:Project: 13421 (Run 7644, Clone 52, Gen 2)
14:18:04:WU00:FS01:0x22:Unit: 0x0000000212bc7d9a5f224a46799a0fe6
14:18:04:WU00:FS01:0x22:Digital signatures verified
14:18:04:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:18:04:WU00:FS01:0x22:Version 0.0.11
14:18:04:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
14:18:04:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
14:18:04:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
14:18:04:WU00:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
14:18:11:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
14:18:11:WU00:FS01:0x22:Saving result file ../logfile_01.txt
14:18:11:WU00:FS01:0x22:Saving result file science.log
14:18:11:WU00:FS01:0x22:Saving result file state.xml.bz2
14:18:11:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
14:18:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:18:11:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:7644 clone:52 gen:2 core:0x22 unit:0x0000000212bc7d9a5f224a46799a0fe6
14:18:11:WU00:FS01:Uploading 302.00KiB to 18.188.125.154
14:18:11:WU00:FS01:Connecting to 18.188.125.154:8080
14:18:12:WU02:FS01:Connecting to assign1.foldingathome.org:80
14:18:12:WU02:FS01:Assigned to work server 18.188.125.154
14:18:12:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [Quadro 600] 245.8  from 18.188.125.154
14:18:12:WU02:FS01:Connecting to 18.188.125.154:8080
14:18:12:WU00:FS01:Upload complete
14:18:13:WU00:FS01:Server responded WORK_ACK (400)
14:18:13:WU00:FS01:Cleaning up
14:18:13:WU02:FS01:Downloading 360.50KiB
14:18:16:Removing old file 'configs/config-20200813-132929.xml'
14:18:16:Saving configuration to /etc/fahclient/config.xml
14:18:16:<config>
14:18:16:  <!-- Client Control -->
14:18:16:  <fold-anon v='true'/>
14:18:16:  <idle-seconds v='120'/>
14:18:16:
14:18:16:  <!-- Folding Core -->
14:18:16:  <checkpoint v='20'/>
14:18:16:
14:18:16:  <!-- Folding Slot Configuration -->
14:18:16:  <cause v='COVID_19'/>
14:18:16:
14:18:16:  <!-- Network -->
14:18:16:  <proxy v=':8080'/>
14:18:16:
14:18:16:  <!-- Slot Control -->
14:18:16:  <power v='full'/>
14:18:16:
14:18:16:  <!-- User Information -->
14:18:16:  <passkey v='*****'/>
14:18:16:  <team v='279'/>
14:18:16:  <user v='Windhunter'/>
14:18:16:
14:18:16:  <!-- Folding Slots -->
14:18:16:  <slot id='0' type='CPU'>
14:18:16:    <cpus v='2'/>
14:18:16:  </slot>
14:18:16:  <slot id='1' type='GPU'>
14:18:16:    <opencl-index v='0'/>
14:18:16:    <pause-on-start v='true'/>
14:18:16:  </slot>
14:18:16:</config>
14:18:16:WU02:FS01:Download complete
14:18:16:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13421 run:7601 clone:20 gen:2 core:0x22 unit:0x0000000312bc7d9a5f224a46f0e7c3af
14:18:16:WU02:FS01:Starting
14:18:16:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 706 -lifeline 1203 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
14:18:16:WU02:FS01:Started FahCore on PID 4474
14:18:16:WU02:FS01:Core PID:4478
14:18:16:WU02:FS01:FahCore 0x22 started
14:18:17:WU02:FS01:0x22:*********************** Log Started 2020-08-13T14:18:16Z ***********************
14:18:17:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:18:17:WU02:FS01:0x22:       Core: Core22
14:18:17:WU02:FS01:0x22:       Type: 0x22
14:18:17:WU02:FS01:0x22:    Version: 0.0.11
14:18:17:WU02:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:18:17:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
14:18:17:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
14:18:17:WU02:FS01:0x22:       Date: Jun 27 2020
14:18:17:WU02:FS01:0x22:       Time: 22:50:00
14:18:17:WU02:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
14:18:17:WU02:FS01:0x22:     Branch: core22-0.0.11
14:18:17:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:17:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:17:WU02:FS01:0x22:             -funroll-loops
14:18:17:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:17:WU02:FS01:0x22:       Bits: 64
14:18:17:WU02:FS01:0x22:       Mode: Release
14:18:17:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
14:18:17:WU02:FS01:0x22:             <peastman@stanford.edu>
14:18:17:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 4474 -checkpoint 20
14:18:17:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
14:18:17:WU02:FS01:0x22:             0 -gpu 0
14:18:17:WU02:FS01:0x22:************************************ libFAH ************************************
14:18:17:WU02:FS01:0x22:       Date: Jun 27 2020
14:18:17:WU02:FS01:0x22:       Time: 22:11:04
14:18:17:WU02:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
14:18:17:WU02:FS01:0x22:     Branch: HEAD
14:18:17:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:17:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:17:WU02:FS01:0x22:             -funroll-loops
14:18:17:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:17:WU02:FS01:0x22:       Bits: 64
14:18:17:WU02:FS01:0x22:       Mode: Release
14:18:17:WU02:FS01:0x22:************************************ CBang *************************************
14:18:17:WU02:FS01:0x22:       Date: Jun 27 2020
14:18:17:WU02:FS01:0x22:       Time: 22:10:11
14:18:17:WU02:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
14:18:17:WU02:FS01:0x22:     Branch: HEAD
14:18:17:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:18:17:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:18:17:WU02:FS01:0x22:             -funroll-loops -fPIC
14:18:17:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:18:17:WU02:FS01:0x22:       Bits: 64
14:18:17:WU02:FS01:0x22:       Mode: Release
14:18:17:WU02:FS01:0x22:************************************ System ************************************
14:18:17:WU02:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
14:18:17:WU02:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
14:18:17:WU02:FS01:0x22:       CPUs: 4
14:18:17:WU02:FS01:0x22:     Memory: 15.34GiB
14:18:17:WU02:FS01:0x22:Free Memory: 11.30GiB
14:18:17:WU02:FS01:0x22:    Threads: POSIX_THREADS
14:18:17:WU02:FS01:0x22: OS Version: 5.4
14:18:17:WU02:FS01:0x22:Has Battery: false
14:18:17:WU02:FS01:0x22: On Battery: false
14:18:17:WU02:FS01:0x22: UTC Offset: 3
14:18:17:WU02:FS01:0x22:        PID: 4478
14:18:17:WU02:FS01:0x22:        CWD: /var/lib/fahclient/work
14:18:17:WU02:FS01:0x22:********************************************************************************
14:18:17:WU02:FS01:0x22:Project: 13421 (Run 7601, Clone 20, Gen 2)
14:18:17:WU02:FS01:0x22:Unit: 0x0000000312bc7d9a5f224a46f0e7c3af
14:18:17:WU02:FS01:0x22:Reading tar file core.xml
14:18:17:WU02:FS01:0x22:Reading tar file integrator.xml
14:18:17:WU02:FS01:0x22:Reading tar file state.xml.bz2
14:18:17:WU02:FS01:0x22:Reading tar file system.xml.bz2
14:18:17:WU02:FS01:0x22:Digital signatures verified
14:18:17:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:18:17:WU02:FS01:0x22:Version 0.0.11
14:18:17:WU02:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
14:18:17:WU02:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
14:18:17:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
14:18:17:WU02:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
14:18:20:WU02:FS01:0x22:ERROR:NaNs detected in forces. 0 0
14:18:20:WU02:FS01:0x22:Saving result file ../logfile_01.txt
14:18:20:WU02:FS01:0x22:Saving result file science.log
14:18:20:WU02:FS01:0x22:Saving result file state.xml.bz2
14:18:20:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
14:18:20:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:18:20:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:7601 clone:20 gen:2 core:0x22 unit:0x0000000312bc7d9a5f224a46f0e7c3af
14:18:20:WU02:FS01:Uploading 292.50KiB to 18.188.125.154
14:18:20:WU02:FS01:Connecting to 18.188.125.154:8080
14:18:21:WU00:FS01:Connecting to assign1.foldingathome.org:80
14:18:21:WU00:FS01:Assigned to work server 128.252.203.9
14:18:21:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [Quadro 600] 245.8  from 128.252.203.9
14:18:21:WU00:FS01:Connecting to 128.252.203.9:8080
14:18:21:WU02:FS01:Upload complete
14:18:22:WU02:FS01:Server responded WORK_ACK (400)
14:18:22:WU02:FS01:Cleaning up
14:18:23:WU00:FS01:Downloading 20.60MiB
14:18:29:WU00:FS01:Download 6.07%
14:18:35:WU00:FS01:Download 20.63%
14:18:38:FS01:Paused
14:18:41:WU00:FS01:Download 30.34%
14:18:47:WU00:FS01:Download 37.01%
14:18:53:WU00:FS01:Download 43.08%
14:18:59:WU00:FS01:Download 53.09%
14:19:05:WU00:FS01:Download 62.80%
14:19:11:WU00:FS01:Download 71.60%
14:19:17:Removing old file 'configs/config-20200813-133001.xml'
14:19:17:Saving configuration to /etc/fahclient/config.xml
14:19:17:<config>
14:19:17:  <!-- Client Control -->
14:19:17:  <fold-anon v='true'/>
14:19:17:  <idle-seconds v='120'/>
14:19:17:
14:19:17:  <!-- Folding Core -->
14:19:17:  <checkpoint v='20'/>
14:19:17:
14:19:17:  <!-- Folding Slot Configuration -->
14:19:17:  <cause v='COVID_19'/>
14:19:17:
14:19:17:  <!-- Network -->
14:19:17:  <proxy v=':8080'/>
14:19:17:
14:19:17:  <!-- Slot Control -->
14:19:17:  <power v='full'/>
14:19:17:
14:19:17:  <!-- User Information -->
14:19:17:  <passkey v='*****'/>
14:19:17:  <team v='279'/>
14:19:17:  <user v='Windhunter'/>
14:19:17:
14:19:17:  <!-- Folding Slots -->
14:19:17:  <slot id='0' type='CPU'>
14:19:17:    <cpus v='2'/>
14:19:17:  </slot>
14:19:17:  <slot id='1' type='GPU'>
14:19:17:    <opencl-index v='0'/>
14:19:17:    <pause-on-start v='true'/>
14:19:17:    <paused v='true'/>
14:19:17:  </slot>
14:19:17:</config>
14:19:17:WU00:FS01:Download 82.22%
14:19:23:WU00:FS01:Download 92.53%
14:19:29:WU00:FS01:Download 97.08%
14:19:30:WU00:FS01:Download complete
14:19:30:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14484 run:0 clone:233 gen:10 core:0x22 unit:0x0000001180fccb095f171a3d48eb2e52
14:20:47:WU01:FS00:0xa7:Completed 37500 out of 125000 steps (30%)
14:24:53:WU01:FS00:0xa7:Completed 38750 out of 125000 steps (31%)
14:25:08:FS01:Unpaused
14:25:08:WU00:FS01:Starting
14:25:08:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1203 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
14:25:08:WU00:FS01:Started FahCore on PID 4651
14:25:08:WU00:FS01:Core PID:4655
14:25:08:WU00:FS01:FahCore 0x22 started
14:25:08:WU00:FS01:0x22:*********************** Log Started 2020-08-13T14:25:08Z ***********************
14:25:08:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:25:08:WU00:FS01:0x22:       Core: Core22
14:25:08:WU00:FS01:0x22:       Type: 0x22
14:25:08:WU00:FS01:0x22:    Version: 0.0.11
14:25:08:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:25:08:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
14:25:08:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
14:25:08:WU00:FS01:0x22:       Date: Jun 27 2020
14:25:08:WU00:FS01:0x22:       Time: 22:50:00
14:25:08:WU00:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
14:25:08:WU00:FS01:0x22:     Branch: core22-0.0.11
14:25:08:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:08:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:08:WU00:FS01:0x22:             -funroll-loops
14:25:08:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:08:WU00:FS01:0x22:       Bits: 64
14:25:08:WU00:FS01:0x22:       Mode: Release
14:25:08:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
14:25:08:WU00:FS01:0x22:             <peastman@stanford.edu>
14:25:08:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4651 -checkpoint 20
14:25:08:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
14:25:08:WU00:FS01:0x22:             0 -gpu 0
14:25:08:WU00:FS01:0x22:************************************ libFAH ************************************
14:25:08:WU00:FS01:0x22:       Date: Jun 27 2020
14:25:08:WU00:FS01:0x22:       Time: 22:11:04
14:25:08:WU00:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
14:25:08:WU00:FS01:0x22:     Branch: HEAD
14:25:08:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:08:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:08:WU00:FS01:0x22:             -funroll-loops
14:25:08:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:08:WU00:FS01:0x22:       Bits: 64
14:25:08:WU00:FS01:0x22:       Mode: Release
14:25:08:WU00:FS01:0x22:************************************ CBang *************************************
14:25:08:WU00:FS01:0x22:       Date: Jun 27 2020
14:25:08:WU00:FS01:0x22:       Time: 22:10:11
14:25:08:WU00:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
14:25:08:WU00:FS01:0x22:     Branch: HEAD
14:25:08:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:08:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:08:WU00:FS01:0x22:             -funroll-loops -fPIC
14:25:08:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:08:WU00:FS01:0x22:       Bits: 64
14:25:08:WU00:FS01:0x22:       Mode: Release
14:25:08:WU00:FS01:0x22:************************************ System ************************************
14:25:08:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
14:25:08:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
14:25:08:WU00:FS01:0x22:       CPUs: 4
14:25:08:WU00:FS01:0x22:     Memory: 15.34GiB
14:25:08:WU00:FS01:0x22:Free Memory: 11.22GiB
14:25:08:WU00:FS01:0x22:    Threads: POSIX_THREADS
14:25:08:WU00:FS01:0x22: OS Version: 5.4
14:25:08:WU00:FS01:0x22:Has Battery: false
14:25:08:WU00:FS01:0x22: On Battery: false
14:25:08:WU00:FS01:0x22: UTC Offset: 3
14:25:08:WU00:FS01:0x22:        PID: 4655
14:25:08:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
14:25:08:WU00:FS01:0x22:********************************************************************************
14:25:08:WU00:FS01:0x22:Project: 14484 (Run 0, Clone 233, Gen 10)
14:25:08:WU00:FS01:0x22:Unit: 0x0000001180fccb095f171a3d48eb2e52
14:25:08:WU00:FS01:0x22:Reading tar file core.xml
14:25:08:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
14:25:08:WU00:FS01:0x22:Reading tar file state.xml.bz2
14:25:08:WU00:FS01:0x22:Reading tar file system.xml.bz2
14:25:08:WU00:FS01:0x22:Digital signatures verified
14:25:08:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:25:08:WU00:FS01:0x22:Version 0.0.11
14:25:08:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
14:25:08:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
14:25:08:WU00:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
14:25:08:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
14:25:23:Removing old file 'configs/config-20200813-133024.xml'
14:25:23:Saving configuration to /etc/fahclient/config.xml
14:25:23:<config>
14:25:23:  <!-- Client Control -->
14:25:23:  <fold-anon v='true'/>
14:25:23:  <idle-seconds v='120'/>
14:25:23:
14:25:23:  <!-- Folding Core -->
14:25:23:  <checkpoint v='20'/>
14:25:23:
14:25:23:  <!-- Folding Slot Configuration -->
14:25:23:  <cause v='COVID_19'/>
14:25:23:
14:25:23:  <!-- Network -->
14:25:23:  <proxy v=':8080'/>
14:25:23:
14:25:23:  <!-- Slot Control -->
14:25:23:  <power v='full'/>
14:25:23:
14:25:23:  <!-- User Information -->
14:25:23:  <passkey v='*****'/>
14:25:23:  <team v='279'/>
14:25:23:  <user v='Windhunter'/>
14:25:23:
14:25:23:  <!-- Folding Slots -->
14:25:23:  <slot id='0' type='CPU'>
14:25:23:    <cpus v='2'/>
14:25:23:  </slot>
14:25:23:  <slot id='1' type='GPU'>
14:25:23:    <opencl-index v='0'/>
14:25:23:    <pause-on-start v='true'/>
14:25:23:  </slot>
14:25:23:</config>
14:25:31:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
14:26:18:FS01:Paused
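
For anyone scanning a long log like the one above, the core exit lines are the quickest thing to look for. The following is only a minimal Python sketch, not part of FAHClient: the function name `core_exits` is made up for illustration, and the regular expression assumes the `FahCore returned:` line layout seen in this log.

```python
import re

# Matches e.g. "14:18:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
# and      "08:30:50:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)".
EXIT_LINE = re.compile(
    r"(\d\d:\d\d:\d\d):(?:WARNING:)?(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)

def core_exits(log_text):
    """Return (time, work unit, slot, status, code) for every core exit line."""
    exits = []
    for line in log_text.splitlines():
        match = EXIT_LINE.search(line)
        if match:
            time, wu, slot, status, code = match.groups()
            exits.append((time, wu, slot, status, int(code)))
    return exits

sample = "\n".join([
    "14:18:11:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT",
    "14:18:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "14:18:20:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "08:30:50:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
])

for entry in core_exits(sample):
    print(entry)
```

Run on the log above, the result would include the BAD_WORK_UNIT (114) exits on FS01 at 14:18:11 and 14:18:20.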

Re: Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 5:16 pm
by bruce
From what I see, Project: 13421 (Run 4757, Clone 4, Gen 1) hits a NaN error when starting. I believe this is one of the troublesome errors for which no fix has been established yet. Please look in the log history, where you will find the previous log. The tail of that log should contain the start of that same WU: Project: 13421 (Run 4757, Clone 4, Gen 1). Apparently it was downloaded then and perhaps did some processing. Presumably there will also be evidence that you did, in fact, pause that WU before terminating FAHClient. That information will probably be very valuable in diagnosing this problem.
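
If the previous log is long, a small script can locate that WU. This is just a rough Python sketch under assumptions: `find_wu_lines` is a hypothetical helper, and it relies on the two ways a WU appears in these logs (`project:... run:... clone:... gen:...` in client lines and `Project: ... (Run ..., Clone ..., Gen ...)` in core lines).

```python
import re

def find_wu_lines(log_text, project, run, clone, gen):
    """Return the log lines that mention the given work unit."""
    # Client form: "project:13421 run:4757 clone:4 gen:1"
    # (\b keeps gen:1 from also matching gen:10).
    client_form = re.compile(
        rf"project:{project} run:{run} clone:{clone} gen:{gen}\b")
    # Core form: "Project: 13421 (Run 4757, Clone 4, Gen 1)"
    core_form = re.compile(
        rf"Project: {project} \(Run {run}, Clone {clone}, Gen {gen}\)")
    return [line for line in log_text.splitlines()
            if client_form.search(line) or core_form.search(line)]

sample = "\n".join([
    "08:28:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR "
    "project:13421 run:4757 clone:4 gen:1 core:0x22 "
    "unit:0x0000000312bc7d9a5f20bd49d859ebba",
    "08:28:54:WU01:FS01:0x22:Project: 13421 (Run 4757, Clone 4, Gen 1)",
    "08:30:51:WU00:FS00:0xa7:Project: 14480 (Run 0, Clone 132, Gen 42)",
])

for line in find_wu_lines(sample, 13421, 4757, 4, 1):
    print(line)
```

The lines that follow the matches (progress, `Paused`, `Finishing`, shutdown messages) are the part worth posting.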

Re: Failed to continue GPU WU after reboot

Posted: Thu Aug 13, 2020 5:33 pm
by Windhunter
bruce,

Here is the log from before the reboot:

Code: Select all

08:28:49:FS01:Unpaused
08:28:50:WU01:FS01:Connecting to assign1.foldingathome.org:80
08:28:50:WU01:FS01:Assigned to work server 18.188.125.154
08:28:50:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [Quadro 600] 245.8  from 18.188.125.154
08:28:50:WU01:FS01:Connecting to 18.188.125.154:8080
08:28:51:WU01:FS01:Downloading 359.00KiB
08:28:53:WU01:FS01:Download complete
08:28:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13421 run:4757 clone:4 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20bd49d859ebba
08:28:53:WU01:FS01:Starting
08:28:53:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 1190 -checkpoint 20 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
08:28:53:WU01:FS01:Started FahCore on PID 4951
08:28:53:WU01:FS01:Core PID:4955
08:28:53:WU01:FS01:FahCore 0x22 started
08:28:54:WU01:FS01:0x22:*********************** Log Started 2020-08-11T08:28:53Z ***********************
08:28:54:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
08:28:54:WU01:FS01:0x22:       Core: Core22
08:28:54:WU01:FS01:0x22:       Type: 0x22
08:28:54:WU01:FS01:0x22:    Version: 0.0.11
08:28:54:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:28:54:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
08:28:54:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
08:28:54:WU01:FS01:0x22:       Date: Jun 27 2020
08:28:54:WU01:FS01:0x22:       Time: 22:50:00
08:28:54:WU01:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
08:28:54:WU01:FS01:0x22:     Branch: core22-0.0.11
08:28:54:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:28:54:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:28:54:WU01:FS01:0x22:             -funroll-loops
08:28:54:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:28:54:WU01:FS01:0x22:       Bits: 64
08:28:54:WU01:FS01:0x22:       Mode: Release
08:28:54:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
08:28:54:WU01:FS01:0x22:             <peastman@stanford.edu>
08:28:54:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 4951 -checkpoint 20
08:28:54:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
08:28:54:WU01:FS01:0x22:************************************ libFAH ************************************
08:28:54:WU01:FS01:0x22:       Date: Jun 27 2020
08:28:54:WU01:FS01:0x22:       Time: 22:11:04
08:28:54:WU01:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
08:28:54:WU01:FS01:0x22:     Branch: HEAD
08:28:54:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:28:54:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:28:54:WU01:FS01:0x22:             -funroll-loops
08:28:54:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:28:54:WU01:FS01:0x22:       Bits: 64
08:28:54:WU01:FS01:0x22:       Mode: Release
08:28:54:WU01:FS01:0x22:************************************ CBang *************************************
08:28:54:WU01:FS01:0x22:       Date: Jun 27 2020
08:28:54:WU01:FS01:0x22:       Time: 22:10:11
08:28:54:WU01:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
08:28:54:WU01:FS01:0x22:     Branch: HEAD
08:28:54:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:28:54:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
08:28:54:WU01:FS01:0x22:             -funroll-loops -fPIC
08:28:54:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
08:28:54:WU01:FS01:0x22:       Bits: 64
08:28:54:WU01:FS01:0x22:       Mode: Release
08:28:54:WU01:FS01:0x22:************************************ System ************************************
08:28:54:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
08:28:54:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:28:54:WU01:FS01:0x22:       CPUs: 4
08:28:54:WU01:FS01:0x22:     Memory: 15.34GiB
08:28:54:WU01:FS01:0x22:Free Memory: 10.58GiB
08:28:54:WU01:FS01:0x22:    Threads: POSIX_THREADS
08:28:54:WU01:FS01:0x22: OS Version: 5.4
08:28:54:WU01:FS01:0x22:Has Battery: false
08:28:54:WU01:FS01:0x22: On Battery: false
08:28:54:WU01:FS01:0x22: UTC Offset: 3
08:28:54:WU01:FS01:0x22:        PID: 4955
08:28:54:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
08:28:54:WU01:FS01:0x22:********************************************************************************
08:28:54:WU01:FS01:0x22:Project: 13421 (Run 4757, Clone 4, Gen 1)
08:28:54:WU01:FS01:0x22:Unit: 0x0000000312bc7d9a5f20bd49d859ebba
08:28:54:WU01:FS01:0x22:Reading tar file core.xml
08:28:54:WU01:FS01:0x22:Reading tar file integrator.xml
08:28:54:WU01:FS01:0x22:Reading tar file state.xml.bz2
08:28:54:WU01:FS01:0x22:Reading tar file system.xml.bz2
08:28:54:WU01:FS01:0x22:Digital signatures verified
08:28:54:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
08:28:54:WU01:FS01:0x22:Version 0.0.11
08:28:54:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:28:54:WU01:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:28:54:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:28:54:WU01:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
08:28:57:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
08:29:31:Removing old file 'configs/config-20200806-100718.xml'
08:29:31:Saving configuration to /etc/fahclient/config.xml
08:29:31:<config>
08:29:31:  <!-- Client Control -->
08:29:31:  <fold-anon v='true'/>
08:29:31:  <idle-seconds v='120'/>
08:29:31:
08:29:31:  <!-- Folding Core -->
08:29:31:  <checkpoint v='20'/>
08:29:31:
08:29:31:  <!-- Folding Slot Configuration -->
08:29:31:  <cause v='COVID_19'/>
08:29:31:
08:29:31:  <!-- Network -->
08:29:31:  <proxy v=':8080'/>
08:29:31:
08:29:31:  <!-- Slot Control -->
08:29:31:  <power v='full'/>
08:29:31:
08:29:31:  <!-- User Information -->
08:29:31:  <passkey v='*****'/>
08:29:31:  <team v='279'/>
08:29:31:  <user v='Windhunter'/>
08:29:31:
08:29:31:  <!-- Folding Slots -->
08:29:31:  <slot id='0' type='CPU'>
08:29:31:    <cpus v='3'/>
08:29:31:  </slot>
08:29:31:  <slot id='1' type='GPU'>
08:29:31:    <cuda-index v='0'/>
08:29:31:    <max-packet-size v='small'/>
08:29:31:    <opencl-index v='0'/>
08:29:31:    <pause-on-start v='true'/>
08:29:31:  </slot>
08:29:31:</config>
08:29:48:FS01:Finishing
08:30:32:WU00:FS00:0xa7:Completed 105000 out of 125000 steps (84%)
08:30:46:Removing old file 'configs/config-20200806-115807.xml'
08:30:46:Saving configuration to /etc/fahclient/config.xml
08:30:46:<config>
08:30:46:  <!-- Client Control -->
08:30:46:  <fold-anon v='true'/>
08:30:46:  <idle-seconds v='120'/>
08:30:46:
08:30:46:  <!-- Folding Core -->
08:30:46:  <checkpoint v='20'/>
08:30:46:
08:30:46:  <!-- Folding Slot Configuration -->
08:30:46:  <cause v='COVID_19'/>
08:30:46:
08:30:46:  <!-- Network -->
08:30:46:  <proxy v=':8080'/>
08:30:46:
08:30:46:  <!-- Slot Control -->
08:30:46:  <power v='full'/>
08:30:46:
08:30:46:  <!-- User Information -->
08:30:46:  <passkey v='*****'/>
08:30:46:  <team v='279'/>
08:30:46:  <user v='Windhunter'/>
08:30:46:
08:30:46:  <!-- Folding Slots -->
08:30:46:  <slot id='0' type='CPU'>
08:30:46:    <cpus v='2'/>
08:30:46:  </slot>
08:30:46:  <slot id='1' type='GPU'>
08:30:46:    <cuda-index v='0'/>
08:30:46:    <max-packet-size v='small'/>
08:30:46:    <opencl-index v='0'/>
08:30:46:    <pause-on-start v='true'/>
08:30:46:  </slot>
08:30:46:</config>
08:30:46:FS00:Shutting core down
08:30:47:WU00:FS00:0xa7:Caught signal SIGINT(2) on PID 4265
08:30:47:WU00:FS00:0xa7:Exiting, please wait. . .
08:30:50:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
08:30:50:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
08:30:51:WU00:FS00:Starting
08:30:51:WARNING:WU00:FS00:Changed SMP threads from 3 to 2 this can cause some work units to fail
08:30:51:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1190 -checkpoint 20 -np 2
08:30:51:WU00:FS00:Started FahCore on PID 4997
08:30:51:WU00:FS00:Core PID:5001
08:30:51:WU00:FS00:FahCore 0xa7 started
08:30:51:WU00:FS00:0xa7:*********************** Log Started 2020-08-11T08:30:51Z ***********************
08:30:51:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
08:30:51:WU00:FS00:0xa7:       Type: 0xa7
08:30:51:WU00:FS00:0xa7:       Core: Gromacs
08:30:51:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4997 -checkpoint 20 -np 2
08:30:51:WU00:FS00:0xa7:************************************ CBang *************************************
08:30:51:WU00:FS00:0xa7:       Date: Nov 27 2019
08:30:51:WU00:FS00:0xa7:       Time: 11:26:54
08:30:51:WU00:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
08:30:51:WU00:FS00:0xa7:     Branch: master
08:30:51:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
08:30:51:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:30:51:WU00:FS00:0xa7:             -fno-pie -fPIC
08:30:51:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
08:30:51:WU00:FS00:0xa7:       Bits: 64
08:30:51:WU00:FS00:0xa7:       Mode: Release
08:30:51:WU00:FS00:0xa7:************************************ System ************************************
08:30:51:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
08:30:51:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
08:30:51:WU00:FS00:0xa7:       CPUs: 4
08:30:51:WU00:FS00:0xa7:     Memory: 15.34GiB
08:30:51:WU00:FS00:0xa7:Free Memory: 10.90GiB
08:30:51:WU00:FS00:0xa7:    Threads: POSIX_THREADS
08:30:51:WU00:FS00:0xa7: OS Version: 5.4
08:30:51:WU00:FS00:0xa7:Has Battery: false
08:30:51:WU00:FS00:0xa7: On Battery: false
08:30:51:WU00:FS00:0xa7: UTC Offset: 3
08:30:51:WU00:FS00:0xa7:        PID: 5001
08:30:51:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
08:30:51:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
08:30:51:WU00:FS00:0xa7:    Version: 0.0.19
08:30:51:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:30:51:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
08:30:51:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
08:30:51:WU00:FS00:0xa7:       Date: Nov 26 2019
08:30:51:WU00:FS00:0xa7:       Time: 00:41:42
08:30:51:WU00:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
08:30:51:WU00:FS00:0xa7:     Branch: master
08:30:51:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
08:30:51:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
08:30:51:WU00:FS00:0xa7:             -fno-pie
08:30:51:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
08:30:51:WU00:FS00:0xa7:       Bits: 64
08:30:51:WU00:FS00:0xa7:       Mode: Release
08:30:51:WU00:FS00:0xa7:************************************ Build *************************************
08:30:51:WU00:FS00:0xa7:       SIMD: avx_256
08:30:51:WU00:FS00:0xa7:********************************************************************************
08:30:51:WU00:FS00:0xa7:Project: 14480 (Run 0, Clone 132, Gen 42)
08:30:51:WU00:FS00:0xa7:Unit: 0x0000003080fccb095f171bfe195e3cb6
08:30:51:WU00:FS00:0xa7:Digital signatures verified
08:30:51:WU00:FS00:0xa7:Calling: mdrun -s frame42.tpr -o frame42.trr -x frame42.xtc -cpi state.cpt -cpt 20 -nt 2
08:30:51:WU00:FS00:0xa7:Steps: first=5250000 total=125000
08:30:53:WU00:FS00:0xa7:Completed 105112 out of 125000 steps (84%)
08:31:33:Removing old file 'configs/config-20200806-181417.xml'
08:31:33:Saving configuration to /etc/fahclient/config.xml
08:31:33:<config>
08:31:33:  <!-- Client Control -->
08:31:33:  <fold-anon v='true'/>
08:31:33:  <idle-seconds v='120'/>
08:31:33:
08:31:33:  <!-- Folding Core -->
08:31:33:  <checkpoint v='20'/>
08:31:33:
08:31:33:  <!-- Folding Slot Configuration -->
08:31:33:  <cause v='COVID_19'/>
08:31:33:
08:31:33:  <!-- Network -->
08:31:33:  <proxy v=':8080'/>
08:31:33:
08:31:33:  <!-- Slot Control -->
08:31:33:  <power v='full'/>
08:31:33:
08:31:33:  <!-- User Information -->
08:31:33:  <passkey v='*****'/>
08:31:33:  <team v='279'/>
08:31:33:  <user v='Windhunter'/>
08:31:33:
08:31:33:  <!-- Folding Slots -->
08:31:33:  <slot id='0' type='CPU'>
08:31:33:    <cpus v='2'/>
08:31:33:  </slot>
08:31:33:  <slot id='1' type='GPU'>
08:31:33:    <cuda-index v='0'/>
08:31:33:    <max-packet-size v='small'/>
08:31:33:    <opencl-index v='0'/>
08:31:33:    <pause-on-start v='true'/>
08:31:33:  </slot>
08:31:33:</config>
08:34:15:WU00:FS00:0xa7:Completed 106250 out of 125000 steps (85%)
08:34:21:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)

...

15:15:41:WU01:FS01:0x22:Completed 750000 out of 1000000 steps (75%)
15:17:37:WU00:FS00:0xa7:Completed 25000 out of 250000 steps (10%)
15:20:20:WU00:FS00:0xa7:Completed 27500 out of 250000 steps (11%)
15:20:55:WU01:FS01:0x22:Completed 760000 out of 1000000 steps (76%)
15:23:02:WU00:FS00:0xa7:Completed 30000 out of 250000 steps (12%)
15:25:52:WU00:FS00:0xa7:Completed 32500 out of 250000 steps (13%)
15:26:14:WU01:FS01:0x22:Completed 770000 out of 1000000 steps (77%)
15:26:46:FS01:Paused
15:26:46:FS01:Shutting core down
15:26:46:WU01:FS01:0x22:Caught signal SIGINT(2) on PID 4955
15:26:46:WU01:FS01:0x22:Exiting, please wait. . .
15:26:46:WU01:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
15:26:47:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
15:27:22:Removing old file 'configs/config-20200811-080227.xml'
15:27:22:Saving configuration to /etc/fahclient/config.xml
15:27:22:<config>
15:27:22:  <!-- Client Control -->
15:27:22:  <fold-anon v='true'/>
15:27:22:  <idle-seconds v='120'/>
15:27:22:
15:27:22:  <!-- Folding Core -->
15:27:22:  <checkpoint v='20'/>
15:27:22:
15:27:22:  <!-- Folding Slot Configuration -->
15:27:22:  <cause v='COVID_19'/>
15:27:22:
15:27:22:  <!-- Network -->
15:27:22:  <proxy v=':8080'/>
15:27:22:
15:27:22:  <!-- Slot Control -->
15:27:22:  <power v='full'/>
15:27:22:
15:27:22:  <!-- User Information -->
15:27:22:  <passkey v='*****'/>
15:27:22:  <team v='279'/>
15:27:22:  <user v='Windhunter'/>
15:27:22:
15:27:22:  <!-- Folding Slots -->
15:27:22:  <slot id='0' type='CPU'>
15:27:22:    <cpus v='2'/>
15:27:22:  </slot>
15:27:22:  <slot id='1' type='GPU'>
15:27:22:    <cuda-index v='0'/>
15:27:22:    <max-packet-size v='small'/>
15:27:22:    <opencl-index v='0'/>
15:27:22:    <pause-on-start v='true'/>
15:27:22:    <paused v='true'/>
15:27:22:  </slot>
15:27:22:</config>
15:28:35:WU00:FS00:0xa7:Completed 35000 out of 250000 steps (14%)
15:28:59:Caught signal SIGINT(2) on PID 1190
15:28:59:Exiting, please wait. . .
15:29:00:FS00:Shutting core down
15:29:01:WU00:FS00:0xa7:Caught signal SIGINT(2) on PID 23699
15:29:01:WU00:FS00:0xa7:Exiting, please wait. . .
15:29:02:Clean exit

Re: Failed to continue GPU WU after reboot

Posted: Sat Aug 29, 2020 3:16 pm
by ng0177
Hi, I encountered precisely the same problem all of a sudden. It is odd that it only happens for the GPU whilst the CPU is fine.

Code: Select all

aur/foldingathome-beta 7.6.17-1 (+4 1.01) (Installed)
on
5.8.3-arch1-1
In addition, I have to delete these two lines:

Code: Select all

<pci-bus v='29'/>
<pci-slot v='0'/>
from:

Code: Select all

sudo systemctl stop foldingathome

sudo nano /opt/fah/config.xml
<config>
....
  <!-- Folding Slots -->
  <slot id='0' type='CPU'>
  </slot>
  <slot id='1' type='GPU'>
    <pci-bus v='29'/>
    <pci-slot v='0'/>
  </slot>
</config>

sudo systemctl start foldingathome
to do away with the GPU "Disabled" status.
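
For anyone who would rather script that edit than do it by hand in nano, here is a minimal sketch. It assumes only the element names shown above (`pci-bus` and `pci-slot` inside a slot with `type='GPU'`). The function name `strip_pci_entries` and the sample text are hypothetical, and the sketch works on a string instead of touching `/opt/fah/config.xml` directly. As in the manual steps, the service should be stopped before the real file is changed.

```python
import xml.etree.ElementTree as ET

# Element names taken from the post above; everything else is illustrative.
STALE_TAGS = ("pci-bus", "pci-slot")


def strip_pci_entries(config_text: str) -> str:
    """Return config_text with <pci-bus>/<pci-slot> removed from GPU slots."""
    # insert_comments=True (Python 3.8+) keeps the <!-- ... --> section comments.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_text, parser=parser)
    for slot in root.iter("slot"):
        if slot.get("type") != "GPU":
            continue  # leave CPU slots untouched
        for tag in STALE_TAGS:
            # findall() returns a list, so removing while looping is safe.
            for child in slot.findall(tag):
                slot.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample modelled on the config fragment in the post.
SAMPLE = """<config>
  <!-- Folding Slots -->
  <slot id='0' type='CPU'>
  </slot>
  <slot id='1' type='GPU'>
    <pci-bus v='29'/>
    <pci-slot v='0'/>
  </slot>
</config>"""

cleaned = strip_pci_entries(SAMPLE)
print(cleaned)
```

Note that ElementTree writes attributes with double quotes and does not preserve the original indentation exactly, so the output is equivalent XML but not byte-identical to the input.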

If a client is not running continuously, pausing it and restarting it the next day may be typical usage, hopefully without losing work.

Re: Failed to continue GPU WU after reboot

Posted: Sat Aug 29, 2020 3:57 pm
by Neil-B
GPU and CPU WUs use totally different cores (CPU Gromacs based, GPU OpenMM based), so issues with one and not the other are perfectly possible (and probably actually the norm).
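
In the log at the top of the thread the core shows up as the hex ID in each work-unit line: 0xa7 on the CPU slot (FS00) and 0x22 on the GPU slot (FS01). As a rough sketch, assuming log lines in exactly that format, the progress of each core can be pulled out separately like this. The function name and the regular expression are my own, not part of FAHClient.

```python
import re

# Matches lines such as:
# 15:26:14:WU01:FS01:0x22:Completed 770000 out of 1000000 steps (77%)
PROGRESS = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:FS(?P<slot>\d+):(?P<core>0x[0-9a-f]+):"
    r"Completed (?P<done>\d+) out of (?P<total>\d+) steps"
)


def last_progress_by_core(lines):
    """Map each core ID to the (slot, done, total) of its latest progress line."""
    latest = {}
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            latest[m["core"]] = (m["slot"], int(m["done"]), int(m["total"]))
    return latest


# Lines copied from the log posted earlier in this thread.
LOG = [
    "15:25:52:WU00:FS00:0xa7:Completed 32500 out of 250000 steps (13%)",
    "15:26:14:WU01:FS01:0x22:Completed 770000 out of 1000000 steps (77%)",
    "15:26:46:FS01:Paused",
    "15:28:35:WU00:FS00:0xa7:Completed 35000 out of 250000 steps (14%)",
]

progress = last_progress_by_core(LOG)
for core, (slot, done, total) in sorted(progress.items()):
    print(f"core {core} on slot FS{slot}: {done}/{total} steps")
```

Comparing the last progress line per core before and after a restart makes it easy to see which core resumed and which one did not.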

Re: Failed to continue GPU WU after reboot

Posted: Sat Aug 29, 2020 6:46 pm
by PantherX
ng0177 wrote: Hi, I encountered precisely the same problem all of a sudden. It is odd that it only happens for the GPU whilst the CPU is fine.
...aur/foldingathome-beta 7.6.17-1 (+4 1.01) (Installed)...
Please note that your issue seems to come from using a development build, which is not stable. If you revert to V7.6.13, everything will function as expected :)

In future, please keep an eye on this Forum, as any Public Beta will be announced here. At that point you are more than welcome to try out the Beta software and report your feedback. Until then, please refrain from using development builds, as they are unstable.