
Vega 56 Running Poorly

Posted: Sat May 30, 2020 7:08 pm
by Ricketyedge
Has anyone else folding on Vega seen anything like this before? GPU usage is spiking up and down like crazy and PPD output looks lousy.

Image

Re: Vega 56 Running Poorly

Posted: Sat May 30, 2020 8:23 pm
by BobWilliams757
What is your usual PPD average with that GPU?

I have seen some spikes in usage with some WUs, but rarely to that extent. I've run several of that WU and had expected average results on all of them.

Are you running anything that tracks your temps as you are folding this?

Re: Vega 56 Running Poorly

Posted: Sat May 30, 2020 9:56 pm
by _r2w_ben
Does the log show progress on that slot? The work unit might have stalled and you'll need to pause and restart the slot or Windows completely.
FAHControl shows 17% in your screenshot but it can get ahead of what is actually in the log file.
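
One way to check this without relying on FAHControl is to look for the newest "Completed ... steps" line for the slot in log.txt. The following is only a rough sketch: the function names are made up for illustration, and it assumes progress lines of the form HH:MM:SS:WUxx:FSxx:0xNN:Completed N out of M steps (P%) with timestamps in UTC.

```python
import re

# Assumed line format (taken from a typical FAHClient 7.x log):
# 02:15:38:WU04:FS01:0x22:Completed 2950000 out of 5000000 steps (59%)
PROGRESS = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU\d+:FS(?P<slot>\d+):0x\w+:"
    r"Completed \d+ out of \d+ steps \((?P<pct>\d+)%\)"
)


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def last_progress(log_lines, slot):
    """Return (time, percent) from the newest progress line of a slot, or None."""
    latest = None
    for line in log_lines:
        m = PROGRESS.match(line)
        if m and int(m.group("slot")) == slot:
            latest = (m.group("time"), int(m.group("pct")))
    return latest


def seconds_since_progress(log_lines, slot, now_hms):
    """Seconds between the newest progress line and now_hms (both UTC).

    The modulo keeps the result sane across a midnight rollover, since the
    log only records the time of day.
    """
    latest = last_progress(log_lines, slot)
    if latest is None:
        return None
    return (to_seconds(now_hms) - to_seconds(latest[0])) % 86400
```

If the gap is many times longer than the usual interval between progress lines for that slot, the WU has probably stalled and a pause/restart is worth trying.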

Re: Vega 56 Running Poorly

Posted: Sat May 30, 2020 10:52 pm
by Crawdaddy79
I think there may be something wrong. My Vega 64 TPF on 11744 is generally between 48 and 56 seconds. You're seeing 1:28 in the screen capture, and there is not that much performance delta between our cards.

Your PPD is another issue. Your estimated credit should not match your base credit unless you're expected to finish the WU after the timeout. Make sure your passkey is set up correctly so that you get bonus points for early returns.

Visit https://apps.foldingathome.org/getpasskey to start.
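
For comparing TPF between cards, it can be estimated straight from the client log instead of reading it off FAHControl. This is a minimal sketch, assuming a frame is 1% of the WU's total steps and that all progress lines passed in for the slot belong to the same WU; the function name tpf_seconds is hypothetical.

```python
import re

# Assumed line format (taken from a typical FAHClient 7.x log):
# 02:15:38:WU04:FS01:0x22:Completed 2950000 out of 5000000 steps (59%)
PROGRESS = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU\d+:FS(?P<slot>\d+):0x\w+:"
    r"Completed (?P<done>\d+) out of (?P<total>\d+) steps"
)


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def tpf_seconds(log_lines, slot):
    """Average seconds per frame (1% of the WU) for one slot, or None.

    Uses the first and last progress lines found for the slot, so feed it
    lines from a single uninterrupted run of one WU.
    """
    samples = []
    for line in log_lines:
        m = PROGRESS.match(line)
        if m and int(m.group("slot")) == slot:
            samples.append(
                (to_seconds(m.group("time")),
                 int(m.group("done")),
                 int(m.group("total")))
            )
    if len(samples) < 2:
        return None
    (t0, done0, total), (t1, done1, _) = samples[0], samples[-1]
    frames = (done1 - done0) * 100 / total  # progress expressed in frames
    if frames <= 0:
        return None
    elapsed = (t1 - t0) % 86400  # tolerate a midnight rollover
    return elapsed / frames
```

Skip the first progress line after a restart when picking the sample, because it includes core startup and resuming from the checkpoint.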

Re: Vega 56 Running Poorly

Posted: Sat May 30, 2020 11:13 pm
by PantherX
Welcome to the F@H Forum, Ricketyedge.

Can you please post the log file? Ensure you include the first 100 lines which will inform us of what the system configuration is and what the client settings are. If you require guidance, please view this topic: viewtopic.php?f=24&t=26036

Re: Vega 56 Running Poorly

Posted: Sun May 31, 2020 2:31 am
by Ricketyedge
I wasn't set up with a passkey and didn't realize it could make that much difference. Did that and PPD is great now, but I'm still seeing those GPU usage spikes. It looks like a heart rate monitor: right now GPU usage tanks to zero every 30 seconds and otherwise quivers between 60 and 80 percent. You could set your watch to it. GPU and Mem temps are steady around 60 degrees.


*********************** Log Started 2020-05-31T02:14:02Z ***********************
02:14:02:Trying to access database...
02:14:02:Successfully acquired database lock
02:14:02:Read GPUs.txt
02:14:02:Enabled folding slot 00: READY cpu:15
02:14:03:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
02:14:03:****************************** FAHClient ******************************
02:14:03:        Version: 7.6.13
02:14:03:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:14:03:      Copyright: 2020 foldingathome.org
02:14:03:       Homepage: https://foldingathome.org/
02:14:03:           Date: Apr 27 2020
02:14:03:           Time: 21:21:01
02:14:03:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
02:14:03:         Branch: master
02:14:03:       Compiler: Visual C++ 2008
02:14:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:03:       Platform: win32 10
02:14:03:           Bits: 32
02:14:03:           Mode: Release
02:14:03:           Args: --open-web-control
02:14:03:         Config: C:\Users\liedt\AppData\Roaming\FAHClient\config.xml
02:14:03:******************************** CBang ********************************
02:14:03:           Date: Apr 24 2020
02:14:03:           Time: 17:07:55
02:14:03:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
02:14:03:         Branch: master
02:14:03:       Compiler: Visual C++ 2008
02:14:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:03:       Platform: win32 10
02:14:03:           Bits: 32
02:14:03:           Mode: Release
02:14:03:******************************* System ********************************
02:14:03:            CPU: AMD Ryzen 7 2700 Eight-Core Processor
02:14:03:         CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
02:14:03:           CPUs: 16
02:14:03:         Memory: 31.94GiB
02:14:03:    Free Memory: 27.02GiB
02:14:03:        Threads: WINDOWS_THREADS
02:14:03:     OS Version: 6.2
02:14:03:    Has Battery: false
02:14:03:     On Battery: false
02:14:03:     UTC Offset: -5
02:14:03:            PID: 5732
02:14:03:            CWD: C:\Users\liedt\AppData\Roaming\FAHClient
02:14:03:  Win32 Service: false
02:14:03:             OS: Windows 10 Enterprise
02:14:03:        OS Arch: AMD64
02:14:03:           GPUs: 1
02:14:03:          GPU 0: Bus:10 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
02:14:03:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
02:14:03:                 specified module could not be found.
02:14:03:
02:14:03:OpenCL Device 0: Platform:0 Device:0 Bus:10 Slot:0 Compute:1.2 Driver:3004.8
02:14:03:******************************* libFAH ********************************
02:14:03:           Date: Apr 15 2020
02:14:03:           Time: 14:53:14
02:14:03:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
02:14:03:         Branch: master
02:14:03:       Compiler: Visual C++ 2008
02:14:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:03:       Platform: win32 10
02:14:03:           Bits: 32
02:14:03:           Mode: Release
02:14:03:***********************************************************************
02:14:03:<config>
02:14:03:  <!-- Network -->
02:14:03:  <proxy v=':8080'/>
02:14:03:
02:14:03:  <!-- Slot Control -->
02:14:03:  <power v='full'/>
02:14:03:
02:14:03:  <!-- User Information -->
02:14:03:  <passkey v='*****'/>
02:14:03:  <team v='223518'/>
02:14:03:  <user v='RicketyEdge'/>
02:14:03:
02:14:03:  <!-- Folding Slots -->
02:14:03:  <slot id='0' type='CPU'/>
02:14:03:  <slot id='1' type='GPU'/>
02:14:03:</config>
02:14:03:WU01:FS00:Cleaning up
02:14:03:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:14:03:WU00:FS00:Cleaning up
02:14:03:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:14:03:WU04:FS01:Starting
02:14:03:WU04:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\liedt\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 04 -suffix 01 -version 706 -lifeline 5732 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:14:03:WU04:FS01:Started FahCore on PID 14620
02:14:03:WU04:FS01:Core PID:9576
02:14:03:WU04:FS01:FahCore 0x22 started
02:14:03:WU02:FS00:Starting
02:14:03:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\liedt\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 5732 -checkpoint 15 -np 15
02:14:03:WU02:FS00:Started FahCore on PID 2220
02:14:03:WU02:FS00:Core PID:2940
02:14:03:WU02:FS00:FahCore 0xa7 started
02:14:03:WU01:FS00:Cleaning up
02:14:03:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:14:03:WU00:FS00:Cleaning up
02:14:03:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:14:04:WU04:FS01:0x22:*********************** Log Started 2020-05-31T02:14:03Z ***********************
02:14:04:WU04:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:14:04:WU04:FS01:0x22:       Type: 0x22
02:14:04:WU04:FS01:0x22:       Core: Core22
02:14:04:WU04:FS01:0x22:    Website: https://foldingathome.org/
02:14:04:WU04:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:14:04:WU04:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:14:04:WU04:FS01:0x22:             <rafal.wiewiora@choderalab.org>
02:14:04:WU04:FS01:0x22:       Args: -dir 04 -suffix 01 -version 706 -lifeline 14620 -checkpoint 15
02:14:04:WU04:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:14:04:WU04:FS01:0x22:     Config: <none>
02:14:04:WU04:FS01:0x22:************************************ Build *************************************
02:14:04:WU04:FS01:0x22:    Version: 0.0.5
02:14:04:WU04:FS01:0x22:       Date: Apr 22 2020
02:14:04:WU04:FS01:0x22:       Time: 04:42:59
02:14:04:WU04:FS01:0x22: Repository: Git
02:14:04:WU04:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:14:04:WU04:FS01:0x22:     Branch: HEAD
02:14:04:WU04:FS01:0x22:   Compiler: Visual C++ 2008
02:14:04:WU04:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:04:WU04:FS01:0x22:   Platform: win32 10
02:14:04:WU04:FS01:0x22:       Bits: 64
02:14:04:WU04:FS01:0x22:       Mode: Release
02:14:04:WU04:FS01:0x22:************************************ System ************************************
02:14:04:WU04:FS01:0x22:        CPU: AMD Ryzen 7 2700 Eight-Core Processor
02:14:04:WU04:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
02:14:04:WU04:FS01:0x22:       CPUs: 16
02:14:04:WU04:FS01:0x22:     Memory: 31.94GiB
02:14:04:WU04:FS01:0x22:Free Memory: 26.99GiB
02:14:04:WU04:FS01:0x22:    Threads: WINDOWS_THREADS
02:14:04:WU04:FS01:0x22: OS Version: 6.2
02:14:04:WU04:FS01:0x22:Has Battery: false
02:14:04:WU04:FS01:0x22: On Battery: false
02:14:04:WU04:FS01:0x22: UTC Offset: -5
02:14:04:WU04:FS01:0x22:        PID: 9576
02:14:04:WU04:FS01:0x22:        CWD: C:\Users\liedt\AppData\Roaming\FAHClient\work
02:14:04:WU04:FS01:0x22:         OS: Windows 10 Pro
02:14:04:WU04:FS01:0x22:    OS Arch: AMD64
02:14:04:WU04:FS01:0x22:********************************************************************************
02:14:04:WU04:FS01:0x22:Project: 16435 (Run 334, Clone 4, Gen 24)
02:14:04:WU04:FS01:0x22:Unit: 0x0000003403854c135e9a4efbf7caa572
02:14:04:WU04:FS01:0x22:Digital signatures verified
02:14:04:WU04:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:14:04:WU04:FS01:0x22:Version 0.0.5
02:14:04:WU04:FS01:0x22:  Found a checkpoint file
02:14:04:WU02:FS00:0xa7:*********************** Log Started 2020-05-31T02:14:03Z ***********************
02:14:04:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:14:04:WU02:FS00:0xa7:       Type: 0xa7
02:14:04:WU02:FS00:0xa7:       Core: Gromacs
02:14:04:WU02:FS00:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 2220 -checkpoint 15 -np
02:14:04:WU02:FS00:0xa7:             15
02:14:04:WU02:FS00:0xa7:************************************ CBang *************************************
02:14:04:WU02:FS00:0xa7:       Date: Oct 26 2019
02:14:04:WU02:FS00:0xa7:       Time: 01:38:25
02:14:04:WU02:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
02:14:04:WU02:FS00:0xa7:     Branch: master
02:14:04:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
02:14:04:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:04:WU02:FS00:0xa7:   Platform: win32 10
02:14:04:WU02:FS00:0xa7:       Bits: 64
02:14:04:WU02:FS00:0xa7:       Mode: Release
02:14:04:WU02:FS00:0xa7:************************************ System ************************************
02:14:04:WU02:FS00:0xa7:        CPU: AMD Ryzen 7 2700 Eight-Core Processor
02:14:04:WU02:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
02:14:04:WU02:FS00:0xa7:       CPUs: 16
02:14:04:WU02:FS00:0xa7:     Memory: 31.94GiB
02:14:04:WU02:FS00:0xa7:Free Memory: 26.98GiB
02:14:04:WU02:FS00:0xa7:    Threads: WINDOWS_THREADS
02:14:04:WU02:FS00:0xa7: OS Version: 6.2
02:14:04:WU02:FS00:0xa7:Has Battery: false
02:14:04:WU02:FS00:0xa7: On Battery: false
02:14:04:WU02:FS00:0xa7: UTC Offset: -5
02:14:04:WU02:FS00:0xa7:        PID: 2940
02:14:04:WU02:FS00:0xa7:        CWD: C:\Users\liedt\AppData\Roaming\FAHClient\work
02:14:04:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
02:14:04:WU02:FS00:0xa7:    Version: 0.0.18
02:14:04:WU02:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:14:04:WU02:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:14:04:WU02:FS00:0xa7:   Homepage: https://foldingathome.org/
02:14:04:WU02:FS00:0xa7:       Date: Oct 26 2019
02:14:04:WU02:FS00:0xa7:       Time: 01:52:30
02:14:04:WU02:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
02:14:04:WU02:FS00:0xa7:     Branch: master
02:14:04:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
02:14:04:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:14:04:WU02:FS00:0xa7:   Platform: win32 10
02:14:04:WU02:FS00:0xa7:       Bits: 64
02:14:04:WU02:FS00:0xa7:       Mode: Release
02:14:04:WU02:FS00:0xa7:************************************ Build *************************************
02:14:04:WU02:FS00:0xa7:       SIMD: avx_256
02:14:04:WU02:FS00:0xa7:********************************************************************************
02:14:04:WU02:FS00:0xa7:Project: 13851 (Run 0, Clone 34902, Gen 141)
02:14:04:WU02:FS00:0xa7:Unit: 0x000000a8287234c95e788191b90a50c6
02:14:04:WU02:FS00:0xa7:Digital signatures verified
02:14:04:WU02:FS00:0xa7:Calling: mdrun -s frame141.tpr -o frame141.trr -x frame141.xtc -e frame141.edr -cpi state.cpt -cpt 15 -nt 15
02:14:04:WU02:FS00:0xa7:Steps: first=70500000 total=500000
02:14:05:WU02:FS00:0xa7:Completed 375802 out of 500000 steps (75%)
02:14:17:WU04:FS01:0x22:Completed 2930000 out of 5000000 steps (58%)
02:14:17:WU04:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:14:19:10:127.0.0.1:New Web session
02:15:03:WU01:FS00:Cleaning up
02:15:03:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:15:03:WU00:FS00:Cleaning up
02:15:03:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:15:13:WU02:FS00:0xa7:Completed 380000 out of 500000 steps (76%)
02:15:38:WU04:FS01:0x22:Completed 2950000 out of 5000000 steps (59%)
02:16:26:WU02:FS00:0xa7:Completed 385000 out of 500000 steps (77%)
02:16:41:WU01:FS00:Cleaning up
02:16:41:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:16:41:WU00:FS00:Cleaning up
02:16:41:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:17:41:WU02:FS00:0xa7:Completed 390000 out of 500000 steps (78%)
02:18:44:WU04:FS01:0x22:Completed 3000000 out of 5000000 steps (60%)
02:18:55:WU02:FS00:0xa7:Completed 395000 out of 500000 steps (79%)
02:19:18:WU01:FS00:Cleaning up
02:19:18:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:19:18:WU00:FS00:Cleaning up
02:19:18:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:20:10:WU02:FS00:0xa7:Completed 400000 out of 500000 steps (80%)
02:21:23:WU02:FS00:0xa7:Completed 405000 out of 500000 steps (81%)
02:21:43:WU04:FS01:0x22:Completed 3050000 out of 5000000 steps (61%)
02:22:38:WU02:FS00:0xa7:Completed 410000 out of 500000 steps (82%)
02:23:32:WU01:FS00:Cleaning up
02:23:32:ERROR:WU01:FS00:Exception: Failed to remove directory 'work/01': The directory is not empty.
02:23:32:WU00:FS00:Cleaning up
02:23:32:ERROR:WU00:FS00:Exception: Failed to remove directory 'work/00': The directory is not empty.
02:23:52:WU02:FS00:0xa7:Completed 415000 out of 500000 steps (83%)
02:24:56:WU04:FS01:0x22:Completed 3100000 out of 5000000 steps (62%)
02:25:05:WU02:FS00:0xa7:Completed 420000 out of 500000 steps (84%)
02:26:20:WU02:FS00:0xa7:Completed 425000 out of 500000 steps (85%)
02:27:34:WU02:FS00:0xa7:Completed 430000 out of 500000 steps (86%)
02:28:06:WU04:FS01:0x22:Completed 3150000 out of 5000000 steps (63%)
02:28:49:WU02:FS00:0xa7:Completed 435000 out of 500000 steps (87%)

Re: Vega 56 Running Poorly

Posted: Sun May 31, 2020 3:20 am
by bruce
Any chance that the 30 second intervals coincide with the checkpoints and/or the snapshots for the viewer?

In the "science" log for Core_22, about 60 lines down, you'll find a message like this: Checkpoint write frequency: 50000 (5%)
Snapshots for the viewer are called ViewerFrame*.

The next version of FAHCore_22 should transfer the (5%) into FAH's client log.

Both are found in FAH's data directory, in the subdirectory /work/0n, where n is the WU number (not the slot number).
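
To test the snapshot theory, the spacing between the ViewerFrame* files can be compared with the 30 second dips. A rough sketch, assuming the file modification times reflect when each snapshot was written; the function name is hypothetical and the path in the comment is only an example.

```python
from pathlib import Path


def viewer_frame_intervals(work_dir):
    """Seconds between consecutive ViewerFrame* files, by modification time."""
    mtimes = sorted(
        p.stat().st_mtime for p in Path(work_dir).glob("ViewerFrame*")
    )
    return [later - earlier for earlier, later in zip(mtimes, mtimes[1:])]


# Example call (adjust the path to your own data directory and WU number):
# viewer_frame_intervals(r"C:\Users\<you>\AppData\Roaming\FAHClient\work\04")
```

Intervals clustering near 30 seconds would point at the snapshots; much longer intervals would suggest looking at something else.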

Re: Vega 56 Running Poorly

Posted: Sun May 31, 2020 5:46 am
by MeeLee
Your Vega GPU frequency chart looks normal.
The dips you see are save states. It also never loads at 100%, even on Nvidia; it's always somewhere between 90 and 98%.
Your GPU isn't running optimally, though; you could manually increase the fan speed to 100%.
It's clear from the dip at the end that the GPU is sensitive to temperature. Adding that extra 20% fan speed could help your GPU perform better.