MultiGPU problem
Posted: Mon Dec 05, 2016 7:55 pm
by yalexey
After almost every reboot, I see messages like these in the log:
14:51:48:ERROR:FS01:OpenCL device not found for 'opencl-index' = 3 with vendor ID = 0x10de, plese correct this by removing the manually configured 'opencl-index' option.
14:51:48:ERROR:FS03:'opencl-index'=2 is in use by another folding slot but GPU 1 matches this device's PCI bus=4 and PCI slot=0, please correct this by removing any manually configured 'opencl-index' options.
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
After that, one or two video cards are sometimes left without work. I have to remove slots or assign the index values manually, restart the client, and so on. Simply deleting the slots is not enough; the slot options have to be set by hand.
I have 4 GPUs in the system. One of them is the CPU's integrated graphics core. The other three are Nvidia GTX 1070 cards from Palit and Gigabyte. The OS is Windows 10.
Is it possible to somehow get rid of this problem completely?
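To see at a glance which folding slots are affected after a reboot, the ERROR lines can be grouped by slot. The following is only a rough sketch: it assumes the line layout seen in the excerpt above (time, `ERROR`, an optional `WUnn` tag, then `FSnn`), and the function name `errors_by_slot` is made up for illustration. The log messages are shortened copies of the ones quoted above.

```python
import re
from collections import defaultdict

# Log lines copied from the post above (messages shortened).
LOG = """\
14:51:48:ERROR:FS01:OpenCL device not found for 'opencl-index' = 3 with vendor ID = 0x10de
14:51:48:ERROR:FS03:'opencl-index'=2 is in use by another folding slot but GPU 1 matches this device's PCI bus=4 and PCI slot=0
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
"""

# "HH:MM:SS:ERROR:", an optional "WUnn:", then "FSnn:" and the message text.
ERROR_RE = re.compile(r"^\d\d:\d\d:\d\d:ERROR:(?:WU\d+:)?FS(\d+):(.*)$")


def errors_by_slot(log_text):
    """Return {slot_number: [messages]} for every ERROR line naming a slot."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            found[int(m.group(1))].append(m.group(2).strip())
    return dict(found)


slot_errors = errors_by_slot(LOG)
for slot, messages in sorted(slot_errors.items()):
    print(f"slot {slot:02d}: {len(messages)} error(s)")
```

For the excerpt above this reports problems on slots 01 and 03 only, which matches the cards that end up idle.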
Re: MultiGPU problem
Posted: Mon Dec 05, 2016 8:22 pm
by bruce
FAHClient does have some problems in multi-GPU installations. Some changes have been made to the beta version which might or might not help, but it's still an open issue.
Have you tried the V7.4.15 Open Beta?
We need to see more of the log ... from the beginning through the portion you've included.
(See my Signature if you need help)
Re: MultiGPU problem
Posted: Mon Dec 05, 2016 9:00 pm
by yalexey
Yes, this is the V7.4.15 client.
14:51:48:************************* Folding@home Client *************************
14:51:48: Website: http://folding.stanford.edu/
14:51:48: Copyright: (c) 2009-2016 Stanford University
14:51:48: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:51:48: Args:
14:51:48: Config: C:\Users\folder\AppData\Roaming\FAHClient\config.xml
14:51:48:******************************** Build ********************************
14:51:48: Version: 7.4.15
14:51:48: Date: Aug 17 2016
14:51:48: Time: 04:33:41
14:51:48: Repository: Git
14:51:48: Revision: 4f3e0e25571a9f691719f0c273739294bde517dd
14:51:48: Branch: master
14:51:48: Compiler: GNU 5.3.1 20160205
14:51:48: Options: -std=gnu++98 -I/mingw64/include -O3 -funroll-loops -ffast-math
14:51:48: -mfpmath=sse -fno-unsafe-math-optimizations -msse2
14:51:48: Platform: linux2 4.6.0-1-amd64
14:51:48: Bits: 64
14:51:48: Mode: Release
14:51:48:******************************* System ********************************
14:51:48: CPU: Intel(R) Pentium(R) CPU G4400 @ 3.30GHz
14:51:48: CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
14:51:48: CPUs: 2
14:51:48: Memory: 3.92GiB
14:51:48: Free Memory: 2.38GiB
14:51:48: Threads: WINDOWS_THREADS
14:51:48: OS Version: 6.2
14:51:48: Has Battery: false
14:51:48: On Battery: false
14:51:48: UTC Offset: 3
14:51:48: PID: 4900
14:51:48: CWD: C:\Users\folder\AppData\Roaming\FAHClient
14:51:48: OS: Windows 10 Pro
14:51:48: OS Arch: AMD64
14:51:48: GPUs: 3
14:51:48: GPU 0: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48: GPU 1: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48: GPU 2: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:8.0
14:51:48: CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:6.1 Driver:8.0
14:51:48: CUDA Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:6.1 Driver:8.0
14:51:48:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 3: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
14:51:48:OpenCL Device 4: Platform:1 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:6.6
14:51:48: Win32 Service: false
14:51:48:***********************************************************************
14:51:48:<config>
14:51:48: <service-description v='Folding@home Client'/>
14:51:48: <service-restart v='true'/>
14:51:48: <service-restart-delay v='5000'/>
14:51:48:
14:51:48: <!-- Client Control -->
14:51:48: <client-threads v='6'/>
14:51:48: <cycle-rate v='4'/>
14:51:48: <cycles v='-1'/>
14:51:48: <data-directory v='.'/>
14:51:48: <disable-sleep-when-active v='true'/>
14:51:48: <exec-directory v='C:\Program Files\FAHClient'/>
14:51:48: <exit-when-done v='false'/>
14:51:48: <fold-anon v='false'/>
14:51:48: <open-web-control v='false'/>
14:51:48:
14:51:48: <!-- Configuration -->
14:51:48: <config-rotate v='true'/>
14:51:48: <config-rotate-dir v='configs'/>
14:51:48: <config-rotate-max v='16'/>
14:51:48:
14:51:48: <!-- Debugging -->
14:51:48: <assignment-servers>
14:51:48: assign3.stanford.edu:8080 assign4.stanford.edu:80
14:51:48: </assignment-servers>
14:51:48: <auth-as v='true'/>
14:51:48: <capture-directory v='capture'/>
14:51:48: <capture-on-error v='false'/>
14:51:48: <capture-packets v='false'/>
14:51:48: <capture-requests v='false'/>
14:51:48: <capture-responses v='false'/>
14:51:48: <capture-sockets v='false'/>
14:51:48: <core-exec v='FahCore_$type'/>
14:51:48: <core-wrapper-exec v='FAHCoreWrapper'/>
14:51:48: <debug-sockets v='false'/>
14:51:48: <exception-locations v='true'/>
14:51:48: <gpu-assignment-servers>
14:51:48: assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
14:51:48: </gpu-assignment-servers>
14:51:48: <stack-traces v='false'/>
14:51:48:
14:51:48: <!-- Error Handling -->
14:51:48: <max-slot-errors v='10'/>
14:51:48: <max-unit-errors v='5'/>
14:51:48:
14:51:48: <!-- Folding Core -->
14:51:48: <checkpoint v='5'/>
14:51:48: <core-dir v='cores'/>
14:51:48: <core-priority v='low'/>
14:51:48: <cpu-affinity v='false'/>
14:51:48: <cpu-usage v='100'/>
14:51:48: <gpu-usage v='100'/>
14:51:48: <no-assembly v='false'/>
14:51:48:
14:51:48: <!-- Folding Slot Configuration -->
14:51:48: <cause v='ANY'/>
14:51:48: <client-subtype v='STDCLI'/>
14:51:48: <client-type v='advanced'/>
14:51:48: <cpu-species v='X86_PENTIUM_II'/>
14:51:48: <cpu-type v='AMD64'/>
14:51:48: <cpus v='-1'/>
14:51:48: <disable-viz v='false'/>
14:51:48: <gpu v='true'/>
14:51:48: <max-packet-size v='normal'/>
14:51:48: <os-species v='WIN_8'/>
14:51:48: <os-type v='WIN32'/>
14:51:48: <project-key v='0'/>
14:51:48: <smp v='true'/>
14:51:48:
14:51:48: <!-- GUI -->
14:51:48: <gui-enabled v='true'/>
14:51:48:
14:51:48: <!-- HTTP Server -->
14:51:48: <allow v='127.0.0.1, 192.168.147.90-192.168.147.120'/>
14:51:48: <connection-timeout v='60'/>
14:51:48: <deny v='0/0'/>
14:51:48: <http-addresses v='0:7396'/>
14:51:48: <https-addresses v=''/>
14:51:48: <max-connect-time v='900'/>
14:51:48: <max-connections v='800'/>
14:51:48: <max-request-length v='52428800'/>
14:51:48: <min-connect-time v='300'/>
14:51:48:
14:51:48: <!-- Logging -->
14:51:48: <log v='log.txt'/>
14:51:48: <log-color v='false'/>
14:51:48: <log-crlf v='true'/>
14:51:48: <log-date v='false'/>
14:51:48: <log-date-periodically v='21600'/>
14:51:48: <log-domain v='false'/>
14:51:48: <log-header v='true'/>
14:51:48: <log-level v='true'/>
14:51:48: <log-no-info-header v='true'/>
14:51:48: <log-redirect v='false'/>
14:51:48: <log-rotate v='true'/>
14:51:48: <log-rotate-dir v='logs'/>
14:51:48: <log-rotate-max v='16'/>
14:51:48: <log-short-level v='false'/>
14:51:48: <log-simple-domains v='true'/>
14:51:48: <log-thread-id v='false'/>
14:51:48: <log-thread-prefix v='true'/>
14:51:48: <log-time v='true'/>
14:51:48: <log-to-screen v='true'/>
14:51:48: <log-truncate v='false'/>
14:51:48: <verbosity v='4'/>
14:51:48:
14:51:48: <!-- Network -->
14:51:48: <proxy v='5.189.132.136:1455'/>
14:51:48: <proxy-enable v='false'/>
14:51:48: <proxy-pass v=''/>
14:51:48: <proxy-user v=''/>
14:51:48:
14:51:48: <!-- Process Control -->
14:51:48: <child v='false'/>
14:51:48: <daemon v='false'/>
14:51:48: <pid v='false'/>
14:51:48: <pid-file v='Folding@home Client.pid'/>
14:51:48: <respawn v='false'/>
14:51:48: <service v='false'/>
14:51:48:
14:51:48: <!-- Remote Command Server -->
14:51:48: <command-address v='0.0.0.0'/>
14:51:48: <command-allow-no-pass v='127.0.0.1, 192.168.147.90-192.168.147.120'/>
14:51:48: <command-deny-no-pass v='0/0'/>
14:51:48: <command-enable v='true'/>
14:51:48: <command-port v='36330'/>
14:51:48: <password v='*'/>
14:51:48:
14:51:48: <!-- Slot Control -->
14:51:48: <idle v='false'/>
14:51:48: <max-shutdown-wait v='60'/>
14:51:48: <pause-on-battery v='false'/>
14:51:48: <pause-on-start v='false'/>
14:51:48: <paused v='false'/>
14:51:48: <power v='full'/>
14:51:48: <streaming v='false'/>
14:51:48:
14:51:48: <!-- Web Server -->
14:51:48: <web-allow v='127.0.0.1'/>
14:51:48: <web-deny v='0/0'/>
14:51:48: <web-enable v='true'/>
14:51:48:
14:51:48: <!-- Web Server Sessions -->
14:51:48: <session-cookie v='sid'/>
14:51:48: <session-lifetime v='86400'/>
14:51:48: <session-timeout v='3600'/>
14:51:48:
14:51:48: <!-- Work Unit Control -->
14:51:48: <dump-after-deadline v='true'/>
14:51:48: <max-queue v='16'/>
14:51:48: <max-units v='0'/>
14:51:48: <next-unit-percentage v='99'/>
14:51:48: <stall-detection-enabled v='false'/>
14:51:48: <stall-percent v='5'/>
14:51:48: <stall-timeout v='1800'/>
14:51:48:
14:51:48: <!-- Folding Slots -->
14:51:48: <slot id='1' type='GPU'>
14:51:48: <opencl-index v='3'/>
14:51:48: </slot>
14:51:48: <slot id='2' type='GPU'>
14:51:48: <cuda-index v='1'/>
14:51:48: <gpu-index v='2'/>
14:51:48: </slot>
14:51:48: <slot id='3' type='GPU'>
14:51:48: <cuda-index v='0'/>
14:51:48: </slot>
14:51:48:</config>
14:51:48:Trying to access database...
14:51:48:Successfully acquired database lock
14:51:48:ERROR:FS01:OpenCL device not found for 'opencl-index' = 3 with vendor ID = 0x10de, plese correct this by removing the manually configured 'opencl-index' option.
14:51:48:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1070]
14:51:48:Enabled folding slot 02: READY gpu:2:GP104 [GeForce GTX 1070]
14:51:48:ERROR:FS03:'opencl-index'=2 is in use by another folding slot but GPU 1 matches this device's PCI bus=4 and PCI slot=0, please correct this by removing any manually configured 'opencl-index' options.
14:51:48:Enabled folding slot 03: READY gpu:1:GP104 [GeForce GTX 1070]
14:51:48:WU01:FS02:Starting
14:51:48:WU01:FS02:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" C:\Users\folder\AppData\Roaming\FAHClient\cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 4900 -checkpoint 5 -opencl-platform 0 -gpu-vendor nvidia -gpu 2
14:51:48:WU01:FS02:Started FahCore on PID 4432
14:51:48:WU01:FS02:Core PID:1788
14:51:48:WU01:FS02:FahCore 0x21 started
14:51:49:WU03:FS03:Starting
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
14:51:49:WU03:FS03:Starting
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
Apparently, for some reason, the client sometimes counts the CPU's built-in graphics core first (index 0) and sometimes puts it last in the list of devices.
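The log above is consistent with that: slot 1 is configured with `opencl-index` 3, but in this boot OpenCL Device 3 sits on Platform 1 with `Bus:NA`, so it cannot be one of the GTX 1070 cards (0x10de is NVIDIA's PCI vendor ID). A small sketch of that check follows. It assumes that Platform 0 is the NVIDIA platform in this boot (its bus numbers match the CUDA devices in the same log), and `stale_slots` is a hypothetical helper, not part of FAHClient.

```python
import re

# OpenCL device lines copied from the log above.
LOG = """\
14:51:48:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 3: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
14:51:48:OpenCL Device 4: Platform:1 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:6.6
"""

# Slot id -> manually configured opencl-index, taken from the <slot> entries
# in the config dump above (only slot 1 sets opencl-index).
CONFIGURED = {1: 3}

# Assumption: platform 0 is the NVIDIA platform in this particular boot.
NVIDIA_PLATFORM = 0

OPENCL_RE = re.compile(r"OpenCL Device (\d+): Platform:(\d+) Device:(\d+) Bus:(\w+)")


def opencl_platforms(log_text):
    """Map OpenCL device index -> platform number as listed in the log."""
    return {int(idx): int(plat)
            for idx, plat, _dev, _bus in OPENCL_RE.findall(log_text)}


def stale_slots(configured, platforms, wanted_platform):
    """Slots whose opencl-index is missing or points at another platform."""
    return sorted(slot for slot, idx in configured.items()
                  if platforms.get(idx) != wanted_platform)


platforms = opencl_platforms(LOG)
print(stale_slots(CONFIGURED, platforms, NVIDIA_PLATFORM))
```

If the device order changes between boots, a fixed index like this will point at a different device each time, which would explain why the error only shows up after some restarts.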
Re: MultiGPU problem
Posted: Mon Dec 05, 2016 9:40 pm
by bruce
PLEASE turn off the added verbosity. It makes it more difficult to see what you've changed.
[Added to existing GitHub ticket.]
Re: MultiGPU problem
Posted: Mon Dec 05, 2016 10:17 pm
by des1957
This is a known problem with the 7.4.15 beta. On every restart it fails to assign the proper slots. Revert to the previous version. I ran 4 GPUs on the old version with no problems. I tried the beta version and ran into the same problem.
Re: MultiGPU problem
Posted: Mon Dec 05, 2016 10:56 pm
by Joe_H
One thing that was seen in early reports of this GPU problem and the 7.4.15 public beta is that this bug shows up more for multi-GPU setups where the cards are all the same model. It is less of an issue with mixed GPU installations.
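The log posted earlier in this thread shows what that looks like in practice: all three GPU entries are the same model and all report `Bus:4 Slot:0`, while the CUDA and OpenCL lists show buses 1, 2 and 4. A minimal sketch for spotting such indistinguishable entries is below. The helper name `ambiguous_gpus` is made up, and whether these identical entries are the actual cause of the bug is an assumption, not something confirmed here.

```python
import re
from collections import defaultdict

# GPU lines copied from the System section of the log posted earlier.
LOG = """\
14:51:48: GPU 0: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48: GPU 1: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48: GPU 2: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
"""

# Captures GPU index, bus, slot and the model name inside the brackets.
GPU_RE = re.compile(r"\bGPU (\d+): Bus:(\d+) Slot:(\d+) .*\[(.+?)\]")


def ambiguous_gpus(log_text):
    """Group GPU indexes that share model, bus and slot in the log."""
    groups = defaultdict(list)
    for idx, bus, slot, model in GPU_RE.findall(log_text):
        groups[(model, int(bus), int(slot))].append(int(idx))
    # Keep only the groups that contain more than one GPU.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


clashes = ambiguous_gpus(LOG)
for (model, bus, slot), ids in clashes.items():
    print(f"{model} bus={bus} slot={slot}: GPUs {ids}")
```

With mixed cards the model names differ, so each entry would land in its own group and nothing would be reported.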