
Bad Work Unit (Potential Energy??)

Posted: Tue Dec 22, 2015 3:14 am
by Kebast
Anyone know what happened here?

13:57:34:************************* Folding@home Client *************************
13:57:34:    Website: http://folding.stanford.edu/
13:57:34:  Copyright: (c) 2009-2014 Stanford University
13:57:34:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:57:34:       Args: --child --lifeline 1907 /etc/fahclient/config.xml --run-as
13:57:34:             fahclient --pid-file=/var/run/fahclient.pid --daemon
13:57:34:     Config: /etc/fahclient/config.xml
13:57:34:******************************** Build ********************************
13:57:34:    Version: 7.4.4
13:57:34:       Date: Mar 4 2014
13:57:34:       Time: 12:02:38
13:57:34:    SVN Rev: 4130
13:57:34:     Branch: fah/trunk/client
13:57:34:   Compiler: GNU 4.4.7
13:57:34:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
13:57:34:             -fno-unsafe-math-optimizations -msse2
13:57:34:   Platform: linux2 3.2.0-1-amd64
13:57:34:       Bits: 64
13:57:34:       Mode: Release
13:57:34:******************************* System ********************************
13:57:34:        CPU: AMD FX(tm)-6300 Six-Core Processor
13:57:34:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
13:57:34:       CPUs: 6
13:57:34:     Memory: 15.64GiB
13:57:34:Free Memory: 15.12GiB
13:57:34:    Threads: POSIX_THREADS
13:57:34: OS Version: 3.13
13:57:34:Has Battery: false
13:57:34: On Battery: false
13:57:34: UTC Offset: -5
13:57:34:        PID: 1909
13:57:34:        CWD: /var/lib/fahclient
13:57:34:         OS: Linux 3.13.0-71-generic x86_64
13:57:34:    OS Arch: AMD64
13:57:34:       GPUs: 1
13:57:34:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]
13:57:34:       CUDA: 5.0
13:57:34:CUDA Driver: 7050
13:57:34:***********************************************************************
13:57:34:<config>
13:57:34:  <!-- Client Control -->
13:57:34:  <fold-anon v='true'/>
13:57:34:
13:57:34:  <!-- Folding Slot Configuration -->
13:57:34:  <gpu v='false'/>
13:57:34:
13:57:34:  <!-- HTTP Server -->
13:57:34:  <allow v='127.0.0.1,192.168.66.95-192.168.66.151'/>
13:57:34:
13:57:34:  <!-- Network -->
13:57:34:  <proxy v=':8080'/>
13:57:34:
13:57:34:  <!-- Remote Command Server -->
13:57:34:  <command-allow-no-pass v='127.0.0.1,192.168.66.95-192.168.66.151'/>
13:57:34:
13:57:34:  <!-- Slot Control -->
13:57:34:  <power v='full'/>
13:57:34:
13:57:34:  <!-- User Information -->
13:57:34:  <passkey v='********************************'/>
13:57:34:  <team v='229226'/>
13:57:34:  <user v='Kebast'/>
13:57:34:
13:57:34:  <!-- Work Unit Control -->
13:57:34:  <next-unit-percentage v='100'/>
13:57:34:
13:57:34:  <!-- Folding Slots -->
13:57:34:  <slot id='1' type='GPU'>
13:57:34:    <client-type v='advanced'/>
13:57:34:  </slot>
13:57:34:</config>

21:00:27:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 171.64.65.104
21:00:27:WU00:FS01:Connecting to 171.64.65.104:8080
21:00:29:WU00:FS01:Downloading 10.04MiB
21:00:35:WU00:FS01:Download 8.72%
21:00:45:WU00:FS01:Download 13.70%
21:00:52:WU00:FS01:Download 16.81%
21:00:58:WU00:FS01:Download 18.68%
21:01:04:WU00:FS01:Download 40.48%
21:01:08:WU00:FS01:Download complete
21:01:08:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9209 run:0 clone:0 gen:97 core:0x21 unit:0x0000008e664f2dd055edef326a5050fe
21:01:08:WU00:FS01:Starting
21:01:08:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1909 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:01:08:WU00:FS01:Started FahCore on PID 24167
21:01:08:WU00:FS01:Core PID:24171
21:01:08:WU00:FS01:FahCore 0x21 started
21:01:09:WU00:FS01:0x21:*********************** Log Started 2015-12-21T21:01:08Z ***********************
21:01:09:WU00:FS01:0x21:Project: 9209 (Run 0, Clone 0, Gen 97)
21:01:09:WU00:FS01:0x21:Unit: 0x0000008e664f2dd055edef326a5050fe
21:01:09:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:01:09:WU00:FS01:0x21:Machine: 1
21:01:09:WU00:FS01:0x21:Reading tar file core.xml
21:01:09:WU00:FS01:0x21:Reading tar file system.xml
21:01:10:WU00:FS01:0x21:Reading tar file integrator.xml
21:01:10:WU00:FS01:0x21:Reading tar file state.xml
21:01:11:WU00:FS01:0x21:Digital signatures verified
21:01:11:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:01:11:WU00:FS01:0x21:Version 0.0.12
21:01:43:WU00:FS01:0x21:ERROR:Potential energy error of 1624.45, threshold of 10
21:01:43:WU00:FS01:0x21:ERROR:Reference Potential Energy: -1.23981e+06 | Given Potential Energy: -1.23818e+06
21:01:43:WU00:FS01:0x21:Saving result file logfile_01.txt
21:01:43:WU00:FS01:0x21:Saving result file log.txt
21:01:43:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
21:01:43:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:01:43:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9209 run:0 clone:0 gen:97 core:0x21 unit:0x0000008e664f2dd055edef326a5050fe
21:01:43:WU00:FS01:Uploading 2.36KiB to 171.64.65.104
21:01:43:WU00:FS01:Connecting to 171.64.65.104:8080
21:01:44:WU00:FS01:Upload complete

Re: Bad Work Unit (Potential Energy??)

Posted: Tue Dec 22, 2015 5:30 am
by bruce
Yes. There are two possible causes.

1) That core has had a number of changes since the version you have, and most likely you've hit a bug for which a fix will be available very soon. You're running version 0.0.12; version 0.0.17 is currently in beta testing and will hopefully be ready for distribution very soon.

2) If you're overclocking, GPU instability can produce the same error message.
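
For anyone reading the two ERROR lines in the log, here is a rough Python sketch that checks whether the numbers agree with each other. It assumes the reported error is simply the absolute difference between the reference and given energies; the log does not say so, that is only a guess. The name parse_energy_error is made up for this example.

```python
import re

# The two ERROR lines, copied from the log in the first post.
LOG = """\
21:01:43:WU00:FS01:0x21:ERROR:Potential energy error of 1624.45, threshold of 10
21:01:43:WU00:FS01:0x21:ERROR:Reference Potential Energy: -1.23981e+06 | Given Potential Energy: -1.23818e+06
"""

# Matches plain and scientific-notation numbers such as 10, 1624.45, -1.23981e+06.
NUM = r"(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"


def parse_energy_error(text):
    """Return (reported_error, threshold, reference, given) as floats, or None."""
    err = re.search(r"Potential energy error of " + NUM + r", threshold of " + NUM, text)
    ref = re.search(
        r"Reference Potential Energy: " + NUM + r" \| Given Potential Energy: " + NUM,
        text,
    )
    if not (err and ref):
        return None
    return tuple(float(x) for x in err.groups() + ref.groups())


reported, threshold, reference, given = parse_energy_error(LOG)

# Assumption: the core reports |given - reference|.
recomputed = abs(given - reference)

# Both energies are printed with six significant digits, so each can be off
# by up to 5 units; the recomputed value can differ from the reported one by
# up to 10.
rounding_slack = 10.0

print(f"reported={reported} recomputed={recomputed} threshold={threshold}")
print("consistent with a plain difference:", abs(recomputed - reported) <= rounding_slack)
```

Under that assumption the printed energies differ by 1630, which matches the reported 1624.45 within rounding, and either figure is far above the threshold of 10.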

Re: Bad Work Unit (Potential Energy??)

Posted: Tue Dec 22, 2015 6:14 am
by bruce
You appear to be running Linux.

* Pause all active FAH WUs and wait for all FAHCore* processes to end.
* > cd /var/lib/fahclient
* > sudo rm -r cores
* One by one, restart each of your slots; they should download fresh copies of the FahCores, including a later version of FahCore_21.
(It won't be 0.0.17, but it will be a later version that fixed at least one of the possible causes of that error.)

Re: Bad Work Unit (Potential Energy??)

Posted: Tue Dec 22, 2015 9:27 am
by toTOW
Note that someone else has been able to complete the WU.

Re: Bad Work Unit (Potential Energy??)

Posted: Thu Dec 24, 2015 11:19 pm
by Kebast
Thanks, it is updating the core now.
I'd suspect a bug rather than an overclocking issue. It failed immediately, but I am lightly overclocking that card, so it could be that too. I looked back through my logs, and the only other error for the entire month of December was one of the same type on 12-DEC-15: a Potential Energy error on 9211-0-69-10. It also failed immediately, before showing any work percentage completed.
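
In case it helps anyone else look back through their logs the same way, here is a small Python sketch that lists every Potential Energy error together with the project/run/clone/gen it belongs to. It relies only on the two line formats visible in the log above; find_energy_errors is a made-up name, and the sample lines are copied from that log.

```python
import re

# "0x21:Project: 9209 (Run 0, Clone 0, Gen 97)" lines written by the core.
PROJECT = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x\w+:"
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
# "0x21:ERROR:Potential energy error of 1624.45, threshold of 10" lines.
ENERGY = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x\w+:"
    r"ERROR:Potential energy error of ([-+.\deE]+), threshold of ([-+.\deE]+)"
)


def find_energy_errors(lines):
    """Return one dict per Potential Energy error found in the log lines."""
    current = {}  # (WU, FS) -> (project, run, clone, gen) last seen for that slot
    hits = []
    for line in lines:
        m = PROJECT.match(line)
        if m:
            current[(m.group(2), m.group(3))] = tuple(int(g) for g in m.groups()[3:])
            continue
        m = ENERGY.match(line)
        if m:
            hits.append({
                "time": m.group(1),
                "prcg": current.get((m.group(2), m.group(3))),
                "error": float(m.group(4)),
            })
    return hits


sample = [
    "21:01:09:WU00:FS01:0x21:Project: 9209 (Run 0, Clone 0, Gen 97)",
    "21:01:11:WU00:FS01:0x21:Version 0.0.12",
    "21:01:43:WU00:FS01:0x21:ERROR:Potential energy error of 1624.45, threshold of 10",
]
for hit in find_energy_errors(sample):
    print(hit["time"], hit["prcg"], hit["error"])
```

To run it on a real log, pass the open file instead of the sample list, for example find_energy_errors(open(path)), where path is wherever your client writes its log.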