Project 16918 sitting at less than 10% of average performance

ThWuensche
Posts: 79
Joined: Fri May 29, 2020 4:10 pm

Project 16918 sitting at less than 10% of average performance

Post by ThWuensche

I now have four 16918 WUs running on four Radeon VII cards in one host, and they are all in the range of 90,000 to 110,000 PPD, compared to an average of about 1,350,000 PPD otherwise. These WUs have been running for around 18 h, whereas other WUs typically take from 1.5 to 4 h.

What's wrong with them?
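As a quick check of the "less than 10%" in the title, the figures from this post can be turned into a ratio. A minimal Python sketch; the variable and function names are illustrative and not part of any F@H tool:

```python
# Figures quoted in the post above: the PPD range of the 16918 WUs and the
# typical PPD the same Radeon VII cards reach on other WUs.
observed_low_ppd = 90_000
observed_high_ppd = 110_000
typical_ppd = 1_350_000


def fraction_of_typical(observed: float, typical: float) -> float:
    """Return observed throughput as a fraction of typical throughput."""
    return observed / typical


low = fraction_of_typical(observed_low_ppd, typical_ppd)
high = fraction_of_typical(observed_high_ppd, typical_ppd)
print(f"16918 runs at {low:.1%} to {high:.1%} of typical PPD")
# prints: 16918 runs at 6.7% to 8.1% of typical PPD
```

Keep in mind that PPD includes the quick-return bonus, which shrinks as a WU takes longer, so a PPD ratio exaggerates the drop in raw simulation speed.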
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by _r2w_ben

Projects 16900 to 16919, except 16918, have low atom counts, i.e. fewer than 15K atoms. If this were one of those other projects, the low PPD would be explainable, since the Radeon VII is a wide GPU with a large number of shaders that a small system cannot keep fully busy. Can you open science.log in the work folder and post this section?

Code:

Stream copied, deserializing...
    Checking for system.xml
    Found system.xml
  Deserializing System...successful.
  Found 11541 atoms, 5 forces.
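The check _r2w_ben asks for can also be scripted. A Python sketch, assuming the line keeps the `Found N atoms, M forces.` format shown above; `atoms_and_forces` and `atoms_from_file` are hypothetical helpers, and the path to science.log in the work folder is yours to supply:

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches Core22 lines such as "  Found 11541 atoms, 5 forces."
ATOMS_RE = re.compile(r"Found (\d+) atoms, (\d+) forces\.")


def atoms_and_forces(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (atoms, forces) from the first matching line, or None."""
    match = ATOMS_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def atoms_from_file(path: str) -> Optional[Tuple[int, int]]:
    """Read a science.log (path supplied by the caller) and parse it."""
    # errors="replace" keeps parsing going if the log contains stray bytes.
    return atoms_and_forces(Path(path).read_text(errors="replace"))


sample = """Stream copied, deserializing...
    Checking for system.xml
    Found system.xml
  Deserializing System...successful.
  Found 11541 atoms, 5 forces.
"""
print(atoms_and_forces(sample))  # prints: (11541, 5)
```

An atom count well under 15K would point to one of the small 169xx systems; the 16918 log in the next post reports 65436 atoms.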
ThWuensche
Posts: 79
Joined: Fri May 29, 2020 4:10 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by ThWuensche

Here is the relevant part of science.log. I pasted from the beginning of the file so that the setup is visible. The CPU is running a WU on 12 of its 24 threads, so there is plenty of free CPU capacity.

The PRCG (Project, Run, Clone, Gen) is 16918,124,3,9. Four other WUs from the same series are currently running with similar performance.

Code:

*************************** Core22 Folding@home Core ***************************
       Core: Core22
       Type: 0x22
    Version: 0.0.11
     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
  Copyright: 2020 foldingathome.org
   Homepage: https://foldingathome.org/
       Date: Jun 27 2020
       Time: 22:50:00
   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
     Branch: core22-0.0.11
   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
             -funroll-loops
   Platform: linux2 4.19.76-linuxkit
       Bits: 64
       Mode: Release
Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
             <peastman@stanford.edu>
       Args: -dir 00 -suffix 01 -version 706 -lifeline 6465 -checkpoint 15
             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
************************************ libFAH ************************************
       Date: Jun 27 2020
       Time: 22:11:04
   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
     Branch: HEAD
   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
             -funroll-loops
   Platform: linux2 4.19.76-linuxkit
       Bits: 64
       Mode: Release
************************************ CBang *************************************
       Date: Jun 27 2020
       Time: 22:10:11
   Revision: f8529962055b0e7bde23e429f5072ff758089dee
     Branch: HEAD
   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
             -funroll-loops -fPIC
   Platform: linux2 4.19.76-linuxkit
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD Ryzen 9 3900X 12-Core Processor
     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
       CPUs: 24
     Memory: 15.58GiB
Free Memory: 5.64GiB
    Threads: POSIX_THREADS
 OS Version: 5.6
Has Battery: false
 On Battery: false
 UTC Offset: 2
        PID: 6469
        CWD: /var/lib/fahclient/work
********************************************************************************
Folding@home GPU Core22 Folding@home Core
Version 0.0.11
[1] compatible platform(s):
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 2.0 AMD-APP (3137.0)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.

(4) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = gfx906+sram-ecc
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 2.0 

  -- 1 --
  DEVICE_NAME = gfx906+sram-ecc
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 2.0 

  -- 2 --
  DEVICE_NAME = gfx906+sram-ecc
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 2.0 

  -- 3 --
  DEVICE_NAME = gfx906+sram-ecc
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 2.0 

[ Entering Init ]
  Launch time: 2020-08-01T22:06:28Z
  Arguments passed: -dir 00 -suffix 01 -version 706 -lifeline 6465 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0 
  For testState comparison of CPU and GPU, will use:
    forceTolerance: 5 kJ/mol/nm
    energyTolerance: 10000 kJ/mol
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 5000000
  Checkpoint write interval: 100000 steps (2%) [50 total]
  JSON viewer frame write interval: 50000 steps (1%) [100 total]
  XTC frame write interval: 250000 steps (5%) [20 total]
  Global context and integrator variables write interval: disabled
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: amd...found on platformId 0
  Setting platform precision to mixed
  Setting DisablePmeStream to 1
    Checking for integrator.xml
    Found integrator.xml
Loading integrator from integrator.xml
Stream copied, deserializing...
    Checking for integrator.xml
    Found integrator.xml
Loading integrator from integrator.xml
Stream copied, deserializing...
    Checking for system.xml
    Found system.xml
  Deserializing System...successful.
  Found 65436 atoms, 6 forces.
  Finding State XML file...
    Checking for state.xml
    Found state.xml
  Deserializing State...successful.
    Ewald error tolerance in force 0 is 0.0005
    Ewald parameters: alpha 2.62826 nx 72 ny 72 nz 72
    Integrator Type: N6OpenMM18LangevinIntegratorE
    Constraint Tolerance: 1e-06
    Time Step in PS: 0.002
    Using CPU platform for reference calculations.
  Performing initial sanity checks before starting work...
  Comparing forces and energies between initial State and CPU...
  Comparing forces and energies between GPU and CPU...
[ Initialized Core Contexts... ]
[ Finding non-water atoms ]
  1755 non-water atoms found / 65436 particles
[ Finished finiding non-water atoms ]
  1755 atoms specified to write to XTC files
[ Writing viewer topology ]
  1755 atoms written
[ Viewer topology written ]
  Using OpenCL on platformId 0 and gpu 0
  Creating a list of all Context parameters:
    MonteCarloPressure
    MonteCarloTemperature

  v(^_^)v  MD ready starting from step 0

  Checking for existence of retries file (numRetries) ...
  Starting watchdog...
Completed 0 out of 5000000 steps (0%)
  Running tests
  All tests passed.
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
Completed 50000 out of 5000000 steps (1%)
Completed 100000 out of 5000000 steps (2%)
  Performance since last checkpoint: 32.40810203 ns/day
  Running tests
  All tests passed.
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
Completed 150000 out of 5000000 steps (3%)
Completed 200000 out of 5000000 steps (4%)
  Performance since last checkpoint: 32.43243243 ns/day
  Running tests
  All tests passed.
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
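The log above contains everything needed to turn the logged rate into an expected run time: 5,000,000 steps at 0.002 ps per step is 10 ns of simulated time. A Python sketch of that arithmetic using the first checkpoint's rate; the rate can change over a run, so treat the result as an estimate only:

```python
# Values copied from the science.log above.
total_steps = 5_000_000         # "Total number of steps"
time_step_ps = 0.002            # "Time Step in PS"
rate_ns_per_day = 32.40810203   # first "Performance since last checkpoint"

simulated_ns = total_steps * time_step_ps / 1000.0  # ps -> ns
days = simulated_ns / rate_ns_per_day
print(f"{simulated_ns:.1f} ns to simulate, about {days * 24:.1f} h at this rate")
# prints: 10.0 ns to simulate, about 7.4 h at this rate
```

The same two header values (step count and time step) are what to check when asking whether a WU was generated with more steps than normal.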
ThWuensche
Posts: 79
Joined: Fri May 29, 2020 4:10 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by ThWuensche

In contrast to the many very slow, long-running WUs in this series, I now have one (16918,151,58,8) that so far indicates 1.1 million PPD. One is at 466 kPPD (16918,58,99,2), one at 104 kPPD (16918,29,79,4), and one at 112 kPPD (16918,73,101,1).

I don't care about the PPD; I don't get anything for them anyway. But low PPD probably means ineffective use of the GPU resources and low computational throughput, harming project results.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 16918 sitting at less than 10% of average performance

Post by bruce

ThWuensche wrote: ...But low PPD probably means ineffective use of the GPU resources and low computational throughput, harming project results.
I'd guess the same thing. I do know there's a team looking into why some work might be running inefficiently and looking for ways to correct that problem.

If a certain problem causes a WU to crash, there's a crash report that can be used as a starting point for an investigation. If the GPU is running at, say, 50% efficiency, there's no report unless you observe it and report it: "50%" doesn't appear in either of the logs.
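Since a slowdown like "50%" is never written to the logs, one way to spot it is to pull the per-checkpoint ns/day figures out of science.log and compare them with a baseline you know from earlier WUs of the same project on the same card. A Python sketch; the helper names are hypothetical and the baseline number in it is made up for illustration:

```python
import re
from statistics import mean
from typing import List, Optional

# Core22 prints this line at each checkpoint.
PERF_RE = re.compile(r"Performance since last checkpoint: ([\d.]+) ns/day")


def checkpoint_rates(log_text: str) -> List[float]:
    """Return every checkpoint ns/day value, in log order."""
    return [float(value) for value in PERF_RE.findall(log_text)]


def efficiency(log_text: str, baseline_ns_per_day: float) -> Optional[float]:
    """Mean logged rate as a fraction of a caller-supplied baseline.

    The baseline must come from the same project on comparable hardware;
    ns/day is not comparable across projects with different atom counts.
    """
    rates = checkpoint_rates(log_text)
    if not rates:
        return None
    return mean(rates) / baseline_ns_per_day


sample = """  Performance since last checkpoint: 32.40810203 ns/day
  Performance since last checkpoint: 32.43243243 ns/day
"""
ILLUSTRATIVE_BASELINE = 64.8  # made-up value, not a measured figure
print(f"{efficiency(sample, ILLUSTRATIVE_BASELINE):.0%} of baseline")
```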
Gary480six
Posts: 93
Joined: Mon Jan 21, 2008 6:42 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by Gary480six

Chiming in to say I'm having trouble with one of these work units too.

WU: 16918 R155 C9 G7
The GPU is a GTX 650 Ti, so no stellar performer, but I seem to be getting about 1/4 of the PPD I've gotten on other 0x22 work units. I'm kinda new to folding on Ubuntu, so it could be my fault too. :oops:


Code:

16:50:55:WU02:FS01:0x22:*********************** Log Started 2020-08-03T16:50:54Z ***********************
16:50:55:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:50:55:WU02:FS01:0x22:       Core: Core22
16:50:55:WU02:FS01:0x22:       Type: 0x22
16:50:55:WU02:FS01:0x22:    Version: 0.0.11
16:50:55:WU02:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:50:55:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
16:50:55:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
16:50:55:WU02:FS01:0x22:       Date: Jun 27 2020
16:50:55:WU02:FS01:0x22:       Time: 22:50:00
16:50:55:WU02:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
16:50:55:WU02:FS01:0x22:     Branch: core22-0.0.11
16:50:55:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
16:50:55:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
16:50:55:WU02:FS01:0x22:             -funroll-loops
16:50:55:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
16:50:55:WU02:FS01:0x22:       Bits: 64
16:50:55:WU02:FS01:0x22:       Mode: Release
16:50:55:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
16:50:55:WU02:FS01:0x22:             <peastman@stanford.edu>
16:50:55:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1629 -checkpoint 15
16:50:55:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
16:50:55:WU02:FS01:0x22:             0 -gpu 0
16:50:55:WU02:FS01:0x22:************************************ libFAH ************************************
16:50:55:WU02:FS01:0x22:       Date: Jun 27 2020
16:50:55:WU02:FS01:0x22:       Time: 22:11:04
16:50:55:WU02:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
16:50:55:WU02:FS01:0x22:     Branch: HEAD
16:50:55:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
16:50:55:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
16:50:55:WU02:FS01:0x22:             -funroll-loops
16:50:55:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
16:50:55:WU02:FS01:0x22:       Bits: 64
16:50:55:WU02:FS01:0x22:       Mode: Release
16:50:55:WU02:FS01:0x22:************************************ CBang *************************************
16:50:55:WU02:FS01:0x22:       Date: Jun 27 2020
16:50:55:WU02:FS01:0x22:       Time: 22:10:11
16:50:55:WU02:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
16:50:55:WU02:FS01:0x22:     Branch: HEAD
16:50:55:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
16:50:55:WU02:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
16:50:55:WU02:FS01:0x22:             -funroll-loops -fPIC
16:50:55:WU02:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
16:50:55:WU02:FS01:0x22:       Bits: 64
16:50:55:WU02:FS01:0x22:       Mode: Release
16:50:55:WU02:FS01:0x22:************************************ System ************************************
16:50:55:WU02:FS01:0x22:        CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
16:50:55:WU02:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
16:50:55:WU02:FS01:0x22:       CPUs: 8
16:50:55:WU02:FS01:0x22:     Memory: 7.70GiB
16:50:55:WU02:FS01:0x22:Free Memory: 6.38GiB
16:50:55:WU02:FS01:0x22:    Threads: POSIX_THREADS
16:50:55:WU02:FS01:0x22: OS Version: 5.4
16:50:55:WU02:FS01:0x22:Has Battery: false
16:50:55:WU02:FS01:0x22: On Battery: false
16:50:55:WU02:FS01:0x22: UTC Offset: -4
16:50:55:WU02:FS01:0x22:        PID: 1633
16:50:55:WU02:FS01:0x22:        CWD: /var/lib/fahclient/work
16:50:55:WU02:FS01:0x22:********************************************************************************
16:50:55:WU02:FS01:0x22:Project: 16918 (Run 155, Clone 9, Gen 7)
16:50:55:WU02:FS01:0x22:Unit: 0x0000000a0002894c5f176173855a4285
16:50:55:WU02:FS01:0x22:Digital signatures verified
16:50:55:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:50:55:WU02:FS01:0x22:Version 0.0.11
16:50:55:WU02:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
16:50:55:WU02:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
16:50:55:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
16:50:55:WU02:FS01:0x22:  Global context and integrator variables write interval: disabled
16:50:55:WU00:FS00:0xa7:*********************** Log Started 2020-08-03T16:50:54Z ***********************
16:50:55:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:50:55:WU00:FS00:0xa7:       Type: 0xa7
16:50:55:WU00:FS00:0xa7:       Core: Gromacs
16:50:55:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 1635 -checkpoint 15 -np 7
16:50:55:WU00:FS00:0xa7:************************************ CBang *************************************
16:50:55:WU00:FS00:0xa7:       Date: Nov 27 2019
16:50:55:WU00:FS00:0xa7:       Time: 11:26:54
16:50:55:WU00:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
16:50:55:WU00:FS00:0xa7:     Branch: master
16:50:55:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:50:55:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
16:50:55:WU00:FS00:0xa7:             -fno-pie -fPIC
16:50:55:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:50:55:WU00:FS00:0xa7:       Bits: 64
16:50:55:WU00:FS00:0xa7:       Mode: Release
16:50:55:WU00:FS00:0xa7:************************************ System ************************************
16:50:55:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
16:50:55:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
16:50:55:WU00:FS00:0xa7:       CPUs: 8
16:50:55:WU00:FS00:0xa7:     Memory: 7.70GiB
16:50:55:WU00:FS00:0xa7:Free Memory: 6.37GiB
16:50:55:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:50:55:WU00:FS00:0xa7: OS Version: 5.4
16:50:55:WU00:FS00:0xa7:Has Battery: false
16:50:55:WU00:FS00:0xa7: On Battery: false
16:50:55:WU00:FS00:0xa7: UTC Offset: -4
16:50:55:WU00:FS00:0xa7:        PID: 1639
16:50:55:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
16:50:55:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:50:55:WU00:FS00:0xa7:    Version: 0.0.19
16:50:55:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:50:55:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:50:55:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
16:50:55:WU00:FS00:0xa7:       Date: Nov 26 2019
16:50:55:WU00:FS00:0xa7:       Time: 00:41:42
16:50:55:WU00:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
16:50:55:WU00:FS00:0xa7:     Branch: master
16:50:55:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:50:55:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
16:50:55:WU00:FS00:0xa7:             -fno-pie
16:50:55:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:50:55:WU00:FS00:0xa7:       Bits: 64
16:50:55:WU00:FS00:0xa7:       Mode: Release
16:50:55:WU00:FS00:0xa7:************************************ Build *************************************
16:50:55:WU00:FS00:0xa7:       SIMD: avx_256
16:50:55:WU00:FS00:0xa7:********************************************************************************
16:50:55:WU00:FS00:0xa7:Project: 14216 (Run 671, Clone 1, Gen 230)
16:50:55:WU00:FS00:0xa7:Unit: 0x00000111cedfaa925ea343b7cefc451e
16:50:55:WU00:FS00:0xa7:Digital signatures verified
16:50:55:WU00:FS00:0xa7:Reducing thread count from 7 to 6 to avoid domain decomposition by a prime number > 3
16:50:55:WU00:FS00:0xa7:Calling: mdrun -s frame230.tpr -o frame230.trr -x frame230.xtc -cpi state.cpt -cpt 15 -nt 6
16:50:55:WU00:FS00:0xa7:Steps: first=14375000 total=62500
16:50:59:WU00:FS00:0xa7:Completed 5382 out of 62500 steps (8%)
16:51:02:WU02:FS01:0x22:Completed 0 out of 5000000 steps (0%)
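The client log above interleaves two slots: FS01 is the GPU slot running Core22 (0x22) and FS00 is the CPU slot running the Gromacs core (0xa7). When reading such logs it can help to keep only one slot's lines and pull out its PRCG. A Python sketch, assuming every line follows the `HH:MM:SS:WUxx:FSyy:0xNN:message` layout seen here; the helper names are hypothetical:

```python
import re
from typing import Iterator, Optional, Tuple

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS(\d+):(0x[0-9a-fA-F]+):(.*)$")
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def slot_messages(log_text: str, slot: int) -> Iterator[str]:
    """Yield the message part of every line that belongs to folding slot `slot`."""
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and int(m.group(3)) == slot:
            yield m.group(5)


def prcg(log_text: str, slot: int) -> Optional[Tuple[int, ...]]:
    """Return (project, run, clone, gen) for the slot, e.g. (16918, 155, 9, 7)."""
    for msg in slot_messages(log_text, slot):
        m = PRCG_RE.search(msg)
        if m:
            return tuple(int(g) for g in m.groups())
    return None


sample = """16:50:55:WU02:FS01:0x22:Project: 16918 (Run 155, Clone 9, Gen 7)
16:50:55:WU00:FS00:0xa7:Project: 14216 (Run 671, Clone 1, Gen 230)
16:51:02:WU02:FS01:0x22:Completed 0 out of 5000000 steps (0%)"""
print(prcg(sample, 1))  # prints: (16918, 155, 9, 7)
```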
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by _r2w_ben

ThWuensche wrote: Here is the part of science.log. I pasted from the beginning of the file, so that the setup is visible. The CPU is running a WU on 12 of the 24 cores, so plenty of free capacity on the CPU.

The PRCG is 16918,124,3,9. Four other WUs from the same series are currently running with similar performance.

Code:

Stream copied, deserializing...
    Checking for system.xml
    Found system.xml
  Deserializing System...successful.
  Found 65436 atoms, 6 forces.
That is the expected number of atoms, so it's not a smaller system like some of the other 169xx projects. In the past there was an issue where a folder path on the server was wrong and the wrong data was packaged up as a work unit.
Joe_H
Site Admin
Posts: 7946
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 16918 sitting at less than 10% of average performance

Post by Joe_H

Is the number of steps correct for this project? There have occasionally been problems in the past where WUs were generated with more steps than normal.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Jaqui
Posts: 7
Joined: Sun Aug 09, 2020 2:02 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by Jaqui

I only recently started with the newest web-based client, so I have less detailed data about the cores. I have unit 16918 (46, 22, 5) running on one of the GPUs (dual Radeon m7 / m6) and noticed it is a long-running unit that takes a long time to finish.
Oddly, the unit processes BETTER (faster) if I have the CPU power level set to medium than if it is set to full. It still looks to be .01 days late to finish, but it was .15 days late 2 hours ago.
ThWuensche
Posts: 79
Joined: Fri May 29, 2020 4:10 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by ThWuensche

Jaqui wrote: Oddly, the unit processes BETTER ( faster ) if I have the cpu power level set to medium than if it is set to full. It still looks to be .01 days late to finish, but it was .15 days late 2 hours ago.
I don't think that's odd. If the power level is set to full, the client gives more CPU threads to the WU running on the CPU, leaving less for handling the GPU. For example, I have set the number of CPUs to the number of physical cores on the processor, since my impression is that running two threads per core through hyperthreading (SMT) gives very little benefit for CPU throughput but harms GPU performance. This is on a Ryzen 3x00; I don't know about Intel CPUs.
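To apply this on Linux you need the physical core count, as opposed to the logical CPU count the client log reports (the 3900X shows up as "CPUs: 24"). A Python sketch, assuming the x86 /proc/cpuinfo layout in which SMT siblings share the same `physical id` and `core id`; on systems that omit those fields it returns 0:

```python
def physical_core_count(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text.

    SMT siblings share a pair, so this is the number of physical cores.
    """
    cores = set()
    physical_id = core_id = None
    # The extra "" makes sure the last record is closed even without a trailing blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-CPU record.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


sample = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 2\nphysical id\t: 0\ncore id\t\t: 1\n"
)
print(physical_core_count(sample))  # prints: 2
```

On a real machine, pass it the contents of `/proc/cpuinfo` and use the result as the CPU slot's thread count, as the post describes.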
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Project 16918 sitting at less than 10% of average performance

Post by Neil-B

Jaqui wrote: It still looks to be .01 days late to finish, but it was .15 days late 2 hours ago.
... is that to the Timeout or to the Expiration deadline? If it's the Timeout, there's nothing to worry about, as it will still get returned and will further the science. If it's the Expiration, fingers crossed it sneaks in before then. I am guessing from what you have posted that you are using Web Control rather than Advanced Control, and iirc that interface only uses the Expiration deadline, so it might be tight, but hopefully it will get there.
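The distinction drawn here can be written as a small decision function. A Python sketch; the deadlines passed in are whatever the client shows for the WU, and the dates in the example are made up for illustration, not the real deadlines of any 16918 WU:

```python
from datetime import datetime, timedelta


def deadline_status(eta: datetime, timeout: datetime, expiration: datetime) -> str:
    """Classify a projected finish time against a WU's Timeout and Expiration."""
    if eta <= timeout:
        return "before timeout"
    if eta <= expiration:
        # Late for the Timeout, but per the post the result is still
        # returned and still furthers the science.
        return "after timeout, before expiration"
    return "after expiration"


# Illustrative deadlines only.
assigned = datetime(2020, 8, 9, 12, 0)
timeout = assigned + timedelta(days=1)
expiration = assigned + timedelta(days=2)
eta = expiration + timedelta(hours=5, minutes=28)
print(deadline_status(eta, timeout, expiration))  # prints: after expiration
```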

Moving the slider to medium releases one more CPU thread for servicing the GPU. It has no effect as such on the GPU, since GPU folding is either on or off (Light would turn it off). As the previous poster mentioned, the extra CPU thread helps to serve the GPU and helps with the CPU load during checkpointing and sanity checks, which will potentially speed up GPU folding.
Last edited by Neil-B on Sun Aug 09, 2020 6:01 pm, edited 2 times in total.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 16918 sitting at less than 10% of average performance

Post by bruce

FAH is planning an enhancement to how GPUs are rated, how projects are rated, and which GPU/project pairs are optimal. To do that, benchmarks are run with both good and bad combinations. Unfortunately, your current assignment is a less-than-optimal combination. That will be taken into account when the new system is rolled out. All we can do is thank you for your contribution to the future system.
Jaqui
Posts: 7
Joined: Sun Aug 09, 2020 2:02 pm

Re: Project 16918 sitting at less than 10% of average performance

Post by Jaqui

Neil-B wrote: I am guessing from what you have posted you are using web control rather than advanced control and iirc that interface only uses the Expiration Deadline so might be tight but hopefully it will get there.
Yup, I'm using Web Control, and nope, it didn't squeeze in: it was 5 hours and 28 minutes late. The unit really slowed down near completion.
Ah well, if I occasionally get units that my hardware can't complete in time, it happens.
(That info helps with tuning the units for future processing.)

It's just too bad that my own health issues aren't among those included in the projects, but then they are not curable at all, only controllable.
ThWuensche
Posts: 79
Joined: Fri May 29, 2020 4:10 pm

Fixed: Project 16918 sitting at less than 10% of average performance

Post by ThWuensche

It looks like the issue is fixed; it was some kind of compatibility problem between this WU and the AMD graphics firmware. After upgrading first the Linux kernel from buster-backports 5.6 to 5.7 and then firmware-amd-gpu from buster-backports 20190717-2~bpo10+1 to 20200619-1 from testing, the problem vanished. It should be noted that the problem occurred only with this type of WU, not with other WUs, and that it showed up only on the PC with four GPUs, not (or rarely) on those with two GPUs. Why firmware, which applies to each card individually, would have an effect that depends on the number of cards is unclear. My guess is that it relates to available resources in combination with this specific WU type, but this will most likely never be clarified.
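The two firmware versions quoted above follow Debian's version syntax (`[epoch:]upstream[-revision]`), in which `~` sorts before anything else, even the end of the string. On a Debian system `dpkg --compare-versions` is the tool to use; the Python sketch below re-implements the same comparison rules for illustration only and may miss corner cases of the real dpkg code:

```python
DIGITS = "0123456789"


def _order(c: str) -> int:
    """Weight of one character in dpkg's non-digit comparison ("" = end)."""
    if c == "" or c in DIGITS:
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)
    if c == "~":
        return -1  # tilde sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream or revision strings the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one by one.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare numerically.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i] in DIGITS and j < len(b) and b[j] in DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in DIGITS:
            return 1  # a has more significant digits, so it is larger
        if j < len(b) and b[j] in DIGITS:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split "[epoch:]upstream[-revision]" into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    result = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)
```

For the upgrade in this post, `compare_versions("20190717-2~bpo10+1", "20200619-1")` returns -1: the buster-backports firmware is older than the one from testing.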

Anyhow, these four Radeon VIIs can now contribute effectively to the project, running at slightly above 1 MPPD, instead of blocking the GPUs for 16 to 20 h at 100 kPPD.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Project 16918 sitting at less than 10% of average performance

Post by foldy

Maybe, because you run 4 GPUs concurrently, the mainboard uses something like a PCIe switch chip, and if that has issues with certain firmware, PCIe bandwidth may become a bottleneck? Good that you solved it with the Linux kernel and firmware changes.