
Re: Cancel GPU WU

Posted: Mon Oct 24, 2016 5:26 pm
by SteveWillis
ChristianVirtual wrote:@SteveWillis,
can you please post the part of the log file where the completion is mentioned; there were cases recently with another project where the total step count to complete a WU was wrong.
Ideally, find in your log a WU from the impacted project which ran well (and post a few completion log lines) and one with a longer completion time (in all cases, please post the full Project-Run-Clone-Gen information).
Any update? I keep getting these and they are killing my PPD.
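For pulling those completion and PRCG lines out of the client log, something like this works (a rough sketch, assuming the default Linux log location):

Code:

import re

# Assumed default FAHClient log path on Linux.
LOG_PATH = "/var/lib/fahclient/log.txt"

# FahCore 0x21 progress lines look like
#   "Completed 1250000 out of 2500000 steps (50%)"
# and the WU identification looks like
#   "Project: 9212 (Run 0, Clone 1, Gen 2)"
completed = re.compile(r"Completed \d+ out of \d+ steps")
prcg = re.compile(r"Project: \d+ \(Run \d+, Clone \d+, Gen \d+\)")

with open(LOG_PATH) as log:
    for line in log:
        if completed.search(line) or prcg.search(line):
            print(line.rstrip())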

Re: Cancel GPU WU

Posted: Mon Oct 24, 2016 9:16 pm
by ChristianVirtual
SteveWillis wrote:
ChristianVirtual wrote:@SteveWillis,
can you please post the part of the log file where the completion is mentioned; there were cases recently with another project where the total step count to complete a WU was wrong.
Ideally, find in your log a WU from the impacted project which ran well (and post a few completion log lines) and one with a longer completion time (in all cases, please post the full Project-Run-Clone-Gen information).
Any update? I keep getting these and they are killing my PPD.
Wonder what motherboard you have, what CPU, and whether your 3rd card gets enough data through the PCIe lanes. Can you please share the first 100 lines of your logfile, with the system info and actual config? Also, which driver version from nVidia?
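To check whether that 3rd card is getting a full link, nvidia-smi's query mode can report the negotiated PCIe generation and width per GPU; a sketch (run it while the cores are folding, since the link can downshift at idle):

Code:

import subprocess

# Report the negotiated PCIe link per GPU under load; a card stuck at
# x4/x1 or gen1 would point at a bandwidth bottleneck on the 3rd slot.
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=index,name,pcie.link.gen.current,"
    "pcie.link.width.current,utilization.gpu",
    "--format=csv",
]).decode()
print(out)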

A 9802 on my 980 Ti took a 4m21s TPF, resulting in 571k PPD; not exciting either, compared to the 700k I normally get.

So I think we partly have a problem with the capacity of your MB/card combination, but also "unlucky" benchmarking of these early projects for faster GPUs (that happens from time to time).
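For reference on how TPF turns into that estimate: PPD follows from the quick-return bonus, final credit = base credit x max(1, sqrt(k x timeout / WU-days)). A sketch below; the base credit, k-factor and timeout are illustrative guesses tuned to land near the 571k figure above, not project 9802's published constants:

Code:

import math

def estimated_ppd(tpf_seconds, base_credit, k_factor, timeout_days):
    """Rough PPD from TPF via the quick-return bonus:
    final credit = base * max(1, sqrt(k * timeout / wu_days))."""
    wu_days = tpf_seconds * 100 / 86400.0  # a WU is 100 frames
    bonus = max(1.0, math.sqrt(k_factor * timeout_days / wu_days))
    return base_credit * bonus / wu_days   # credit per WU * WUs per day

# Hypothetical constants, chosen only to land near the 571k above:
print(round(estimated_ppd(tpf_seconds=261, base_credit=30000,
                          k_factor=0.75, timeout_days=14)))  # ~585k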

Re: Cancel GPU WU

Posted: Mon Oct 24, 2016 10:51 pm
by SteveWillis
ChristianVirtual wrote:
SteveWillis wrote:
ChristianVirtual wrote:@SteveWillis,
can you please post the part of the log file where the completion is mentioned; there were cases recently with another project where the total step count to complete a WU was wrong.
Ideally, find in your log a WU from the impacted project which ran well (and post a few completion log lines) and one with a longer completion time (in all cases, please post the full Project-Run-Clone-Gen information).
Any update? I keep getting these and they are killing my PPD.
Wonder what motherboard you have, what CPU, and whether your 3rd card gets enough data through the PCIe lanes. Can you please share the first 100 lines of your logfile, with the system info and actual config? Also, which driver version from nVidia?

A 9802 on my 980 Ti took a 4m21s TPF, resulting in 571k PPD; not exciting either, compared to the 700k I normally get.

So I think we partly have a problem with the capacity of your MB/card combination, but also "unlucky" benchmarking of these early projects for faster GPUs (that happens from time to time).
Thank you for taking the time to look at this; I really appreciate it.
These are occurring randomly across my 3 GPUs. The only thing the projects have in common is that they are all 92xx and all run by Jade SHi. Right now two of my three work queues are these, so I have one project earning 837K PPD and two earning 438K and 466K. When I have three projects that aren't these, the PPD is typically around 800K each, but that seldom happens since I'm getting so many of these Jade SHi ones.
You may be completely right, but then why do I occasionally get three projects at once that together give me 2.4M PPD?


Motherboard = Asus Sabertooth 990FX [MB-AM3-AS-SB-990FXR2], qty 1

CPU = AMD FX-8320 Eight-Core 3.5GHz [CPU-AM3-FX-8320BR], qty 1


Code:

*********************** Log Started 2016-10-19T08:37:21Z ***********************
08:37:21:************************* Folding@home Client *************************
08:37:21:    Website: http://folding.stanford.edu/
08:37:21:  Copyright: (c) 2009-2014 Stanford University
08:37:21:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:37:21:       Args: --child --lifeline 1796 /etc/fahclient/config.xml --run-as
08:37:21:             fahclient --pid-file=/var/run/fahclient.pid --daemon
08:37:21:     Config: /etc/fahclient/config.xml
08:37:21:******************************** Build ********************************
08:37:21:    Version: 7.4.4
08:37:21:       Date: Mar 4 2014
08:37:21:       Time: 12:02:38
08:37:21:    SVN Rev: 4130
08:37:21:     Branch: fah/trunk/client
08:37:21:   Compiler: GNU 4.4.7
08:37:21:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
08:37:21:             -fno-unsafe-math-optimizations -msse2
08:37:21:   Platform: linux2 3.2.0-1-amd64
08:37:21:       Bits: 64
08:37:21:       Mode: Release
08:37:21:******************************* System ********************************
08:37:21:        CPU: AMD FX(tm)-8320 Eight-Core Processor
08:37:21:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
08:37:21:       CPUs: 8
08:37:21:     Memory: 31.32GiB
08:37:21:Free Memory: 30.92GiB
08:37:21:    Threads: POSIX_THREADS
08:37:21: OS Version: 3.19
08:37:21:Has Battery: false
08:37:21: On Battery: false
08:37:21: UTC Offset: -5
08:37:21:        PID: 1798
08:37:21:        CWD: /var/lib/fahclient
08:37:21:         OS: Linux 3.19.0-32-generic x86_64
08:37:21:    OS Arch: AMD64
08:37:21:       GPUs: 6
08:37:21:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]
08:37:21:      GPU 1: UNSUPPORTED: NV3 [PCI]
08:37:21:      GPU 2: NVIDIA:5 GP104 [GeForce GTX 1080]
08:37:21:      GPU 3: UNSUPPORTED: NV3 [PCI]
08:37:21:      GPU 4: NVIDIA:5 GP104 [GeForce GTX 1080]
08:37:21:      GPU 5: UNSUPPORTED: NV3 [PCI]
08:37:21:       CUDA: 6.1
08:37:21:CUDA Driver: 8000
08:37:21:***********************************************************************
08:37:21:<config>
08:37:21:  <!-- Client Control -->
08:37:21:  <fold-anon v='true'/>
08:37:21:
08:37:21:  <!-- Folding Slot Configuration -->
08:37:21:  <gpu v='false'/>
08:37:21:
08:37:21:  <!-- Network -->
08:37:21:  <proxy v=':8080'/>
08:37:21:
08:37:21:  <!-- User Information -->
08:37:21:  <passkey v='********************************'/>
08:37:21:  <team v='11086'/>
08:37:21:  <user v='SteveWillis'/>
08:37:21:
08:37:21:  <!-- Folding Slots -->
08:37:21:  <slot id='1' type='GPU'/>
08:37:21:  <slot id='2' type='GPU'/>
08:37:21:  <slot id='3' type='GPU'/>
08:37:21:</config>
08:37:21:Switching to user fahclient
08:37:21:Trying to access database...
08:37:21:Successfully acquired database lock
08:37:21:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1080]
08:37:21:Enabled folding slot 02: READY gpu:2:GP104 [GeForce GTX 1080]
08:37:21:Enabled folding slot 03: READY gpu:4:GP104 [GeForce GTX 1080]
08:37:21:WU02:FS01:Starting
08:37:21:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1798 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
08:37:21:WU02:FS01:Started FahCore on PID 1829
08:37:22:WU02:FS01:Core PID:1838
08:37:22:WU02:FS01:FahCore 0x21 started
08:37:22:WU00:FS02:Starting
08:37:22:WU00:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1798 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
08:37:22:WU00:FS02:Started FahCore on PID 1840
08:37:22:WU00:FS02:Core PID:1844
08:37:22:WU00:FS02:FahCore 0x21 started
08:37:23:WU01:FS03:Starting
08:37:23:WU01:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1798 -checkpoint 15 -gpu 2 -gpu-vendor nvidia

Re: Cancel GPU WU

Posted: Mon Oct 24, 2016 11:06 pm
by SteveWillis
I guess the definitive test is to remove one of the GPUs and see how the remaining two do. I have one set to finish now.
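(Setting a slot to finish can also be done through the client's local command interface; a minimal sketch, assuming the default port 36330 and a hypothetical slot id 02 — check "slot-info" for the real ids:)

Code:

import socket

# Ask the local FAHClient (default command port 36330) to let one slot
# finish its current WU and then stop taking new work. The slot id is
# an assumption; "slot-info" lists the real ones.
with socket.create_connection(("127.0.0.1", 36330)) as s:
    s.recv(4096)                 # consume the welcome banner
    s.sendall(b"finish 02\n")
    s.sendall(b"quit\n")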

Re: Cancel GPU WU

Posted: Tue Oct 25, 2016 6:14 am
by bruce
SteveWillis wrote:Let me throw this in. I'm folding on three nVidia GTX 1080 cards. No overclock, no overheating. Most WUs give me about 700-800K estimated PPD, with around half a minute to two minutes per fold. In the last couple of days I have noticed 3 WUs in the 92xx range: 9210, 9212, and 9208, all run by Jade SHi. These all run significantly longer, give about half the PPD, and have TPFs of 5 to 7 minutes. Just wanted to let you know you aren't alone.
WUs which generate lower PPD are a legitimate problem (addressed in your other topic on the same issue), but ...
unluckycandy wrote:Is there a way to cancel a work unit it is extremely unstable, is taking forever, and keeps re-starting.
You didn't report anything that was "extremely unstable" or that "keeps re-starting", and those are quite different issues from what you might call "taking forever".

Re: Cancel GPU WU

Posted: Tue Oct 25, 2016 7:51 am
by SteveWillis
bruce wrote:You didn't report anything that was "extremely unstable" or that "keeps re-starting", and those are quite different issues from what you might call "taking forever".
I can't argue with that.

Re: Cancel GPU WU

Posted: Tue Oct 25, 2016 11:38 am
by SteveWillis
Well, at least it isn't just me. viewtopic.php?f=74&t=29035