Difficulty downloading GPU WU on multiple machines

Posted: Fri Mar 24, 2017 8:43 am
by Librarian
Hello,

For about 3 days now, my 2 folding rigs have often failed to get new GPU work units. At 99% the client tries to start a download and reaches the AS and WS, but then the download begins and freezes. I've found it hanging for hours afterwards. The only way I can get it working again is to shut down FAH completely, delete the "wuinfo_01.dat" file in the work folder (the only file present), and restart the program. It then normally gets a WU immediately and starts folding.

This is happening on both of my computers, which are in separate locations on different ISPs. Neither runs firewall software, and the problem has only been occurring since Sunday. I'm posting a couple of log excerpts, but let me know if you need to look at anything else.

Thank you,

Code:

*********************** Log Started 2017-03-24T05:43:13Z ***********************
05:43:13:************************* Folding@home Client *************************
05:43:13:      Website: http://folding.stanford.edu/
05:43:13:    Copyright: (c) 2009-2014 Stanford University
05:43:13:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
05:43:13:         Args: --open-web-control
05:43:13:******************************** Build ********************************
05:43:13:      Version: 7.4.4
05:43:13:         Date: Mar 4 2014
05:43:13:         Time: 20:26:54
05:43:13:      SVN Rev: 4130
05:43:13:       Branch: fah/trunk/client
05:43:13:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
05:43:13:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
05:43:13:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
05:43:13:     Platform: win32 XP
05:43:13:         Bits: 32
05:43:13:         Mode: Release
05:43:13:******************************* System ********************************
05:43:13:          CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
05:43:13:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
05:43:13:         CPUs: 8
05:43:13:       Memory: 15.94GiB
05:43:13:  Free Memory: 12.30GiB
05:43:13:      Threads: WINDOWS_THREADS
05:43:13:   OS Version: 6.1
05:43:13:  Has Battery: false
05:43:13:   On Battery: false
05:43:13:   UTC Offset: -10
05:43:13:          PID: 5784
05:43:13:           OS: Windows 7 Professional
05:43:13:      OS Arch: AMD64
05:43:13:         GPUs: 2
05:43:13:        GPU 0: UNSUPPORTED: NV3 [PCI]
05:43:13:        GPU 1: NVIDIA:5 GP104 [GeForce GTX 1070]
05:43:13:         CUDA: 6.1
05:43:13:  CUDA Driver: 8000
05:43:13:Win32 Service: false
05:43:13:***********************************************************************
05:43:13:<config>
05:43:13:  <service-description v='Folding@home Client'/>
05:43:13:  <service-restart v='true'/>
05:43:13:  <service-restart-delay v='5000'/>
05:43:13:
05:43:13:  <!-- Client Control -->
05:43:13:  <client-threads v='6'/>
05:43:13:  <cycle-rate v='4'/>
05:43:13:  <cycles v='-1'/>
05:43:13:  <data-directory v='.'/>
05:43:13:  <disable-sleep-when-active v='true'/>
05:43:13:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
05:43:13:  <exit-when-done v='false'/>
05:43:13:  <fold-anon v='false'/>
05:43:13:  <open-web-control v='true'/>
05:43:13:
05:43:13:  <!-- Configuration -->
05:43:13:  <config-rotate v='true'/>
05:43:13:  <config-rotate-dir v='configs'/>
05:43:13:  <config-rotate-max v='16'/>
05:43:13:
05:43:13:  <!-- Debugging -->
05:43:13:  <assignment-servers>
05:43:13:    assign3.stanford.edu:8080 assign4.stanford.edu:80
05:43:13:  </assignment-servers>
05:43:13:  <auth-as v='true'/>
05:43:13:  <capture-directory v='capture'/>
05:43:13:  <capture-on-error v='false'/>
05:43:13:  <capture-packets v='false'/>
05:43:13:  <capture-requests v='false'/>
05:43:13:  <capture-responses v='false'/>
05:43:13:  <capture-sockets v='false'/>
05:43:13:  <core-exec v='FahCore_$type'/>
05:43:13:  <core-wrapper-exec v='FAHCoreWrapper'/>
05:43:13:  <debug-sockets v='false'/>
05:43:13:  <exception-locations v='true'/>
05:43:13:  <gpu-assignment-servers>
05:43:13:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
05:43:13:  </gpu-assignment-servers>
05:43:13:  <stack-traces v='false'/>
05:43:13:
05:43:13:  <!-- Error Handling -->
05:43:13:  <max-slot-errors v='10'/>
05:43:13:  <max-unit-errors v='5'/>
05:43:13:
05:43:13:  <!-- Folding Core -->
05:43:13:  <checkpoint v='10'/>
05:43:13:  <core-dir v='cores'/>
05:43:13:  <core-priority v='idle'/>
05:43:13:  <cpu-affinity v='false'/>
05:43:13:  <cpu-usage v='100'/>
05:43:13:  <gpu-usage v='100'/>
05:43:13:  <no-assembly v='false'/>
05:43:13:
05:43:13:  <!-- Folding Slot Configuration -->
05:43:13:  <cause v='ANY'/>
05:43:13:  <client-subtype v='STDCLI'/>
05:43:13:  <client-type v='normal'/>
05:43:13:  <cpu-species v='X86_PENTIUM_II'/>
05:43:13:  <cpu-type v='AMD64'/>
05:43:13:  <cpus v='-1'/>
05:43:13:  <gpu v='true'/>
05:43:13:  <max-packet-size v='normal'/>
05:43:13:  <os-species v='UNKNOWN'/>
05:43:13:  <os-type v='WIN32'/>
05:43:13:  <project-key v='0'/>
05:43:13:  <smp v='true'/>
05:43:13:
05:43:13:  <!-- GUI -->
05:43:13:  <gui-enabled v='true'/>
05:43:13:
05:43:13:  <!-- HTTP Server -->
05:43:13:  <allow v='127.0.0.1'/>
05:43:13:  <connection-timeout v='60'/>
05:43:13:  <deny v='0/0'/>
05:43:13:  <http-addresses v='0:7396'/>
05:43:13:  <https-addresses v=''/>
05:43:13:  <max-connect-time v='900'/>
05:43:13:  <max-connections v='800'/>
05:43:13:  <max-request-length v='52428800'/>
05:43:13:  <min-connect-time v='300'/>
05:43:13:  <threads v='8'/>
05:43:13:
05:43:13:  <!-- Logging -->
05:43:13:  <log v='log.txt'/>
05:43:13:  <log-color v='false'/>
05:43:13:  <log-crlf v='true'/>
05:43:13:  <log-date v='false'/>
05:43:13:  <log-date-periodically v='21600'/>
05:43:13:  <log-debug v='true'/>
05:43:13:  <log-domain v='false'/>
05:43:13:  <log-header v='true'/>
05:43:13:  <log-level v='true'/>
05:43:13:  <log-no-info-header v='true'/>
05:43:13:  <log-redirect v='false'/>
05:43:13:  <log-rotate v='true'/>
05:43:13:  <log-rotate-dir v='logs'/>
05:43:13:  <log-rotate-max v='16'/>
05:43:13:  <log-short-level v='false'/>
05:43:13:  <log-simple-domains v='true'/>
05:43:13:  <log-thread-id v='false'/>
05:43:13:  <log-thread-prefix v='true'/>
05:43:13:  <log-time v='true'/>
05:43:13:  <log-to-screen v='true'/>
05:43:13:  <log-truncate v='false'/>
05:43:13:  <verbosity v='4'/>
05:43:13:
05:43:13:  <!-- Network -->
05:43:13:  <proxy v=':8080'/>
05:43:13:  <proxy-enable v='false'/>
05:43:13:  <proxy-pass v=''/>
05:43:13:  <proxy-user v=''/>
05:43:13:
05:43:13:  <!-- Process Control -->
05:43:13:  <child v='false'/>
05:43:13:  <daemon v='false'/>
05:43:13:  <pid v='false'/>
05:43:13:  <pid-file v='Folding@home Client.pid'/>
05:43:13:  <respawn v='false'/>
05:43:13:  <service v='false'/>
05:43:13:
05:43:13:  <!-- Remote Command Server -->
05:43:13:  <command-address v='0.0.0.0'/>
05:43:13:  <command-allow-no-pass v='127.0.0.1'/>
05:43:13:  <command-deny-no-pass v='0/0'/>
05:43:13:  <command-enable v='true'/>
05:43:13:  <command-port v='36330'/>
05:43:13:
05:43:13:  <!-- Slot Control -->
05:43:13:  <idle v='false'/>
05:43:13:  <max-shutdown-wait v='60'/>
05:43:13:  <pause-on-battery v='true'/>
05:43:13:  <pause-on-start v='false'/>
05:43:13:  <paused v='false'/>
05:43:13:  <power v='full'/>
05:43:13:
05:43:13:  <!-- User Information -->
05:43:13:  <machine-id v='0'/>
05:43:13:  <passkey v='********************************'/>
05:43:13:  <team v='11314'/>
05:43:13:  <user v='[TLB]Librarian'/>
05:43:13:
05:43:13:  <!-- Web Server -->
05:43:13:  <web-allow v='127.0.0.1'/>
05:43:13:  <web-deny v='0/0'/>
05:43:13:  <web-enable v='true'/>
05:43:13:
05:43:13:  <!-- Web Server Sessions -->
05:43:13:  <session-cookie v='sid'/>
05:43:13:  <session-lifetime v='86400'/>
05:43:13:  <session-timeout v='3600'/>
05:43:13:
05:43:13:  <!-- Work Unit Control -->
05:43:13:  <dump-after-deadline v='true'/>
05:43:13:  <max-queue v='16'/>
05:43:13:  <max-units v='0'/>
05:43:13:  <next-unit-percentage v='99'/>
05:43:13:  <stall-detection-enabled v='false'/>
05:43:13:  <stall-percent v='5'/>
05:43:13:  <stall-timeout v='1800'/>
05:43:13:
05:43:13:  <!-- Folding Slots -->
05:43:13:  <slot id='0' type='CPU'>
05:43:13:    <cpus v='6'/>
05:43:13:    <paused v='true'/>
05:43:13:  </slot>
05:43:13:  <slot id='1' type='GPU'>
05:43:13:    <client-type v='beta'/>
05:43:13:    <paused v='true'/>
05:43:13:  </slot>
05:43:13:</config>
05:43:13:Trying to access database...
05:43:13:Successfully acquired database lock
05:43:13:Enabled folding slot 00: PAUSED cpu:6 (by user)
05:43:13:Enabled folding slot 01: PAUSED gpu:1:GP104 [GeForce GTX 1070] (by user)
05:43:13:WARNING:WU02:Missing data files, dumping
05:43:13:WU02:FS01:Cleaning up
05:43:15:1:127.0.0.1 GET /ping?_=1490334195241&callback=jQuery1900028221646554233093_1490334195240
05:43:15:2:127.0.0.1 GET /?nocache=0.7535321118413016
05:43:16:3:127.0.0.1 GET /api/updates/set?_=1490334196461&sid=a31ee93c69a5eaff005fffa765ace319&update_id=0&update_path=/api/basic&update_rate=1
05:43:16:4:127.0.0.1 GET /api/updates/set?_=1490334196462&sid=a31ee93c69a5eaff005fffa765ace319&update_id=1&update_path=/api/slots&update_rate=1
05:43:16:5:127.0.0.1 GET /api/configured?_=1490334196463&sid=a31ee93c69a5eaff005fffa765ace319
05:43:17:FS00:Unpaused
05:43:17:FS01:Unpaused
05:43:17:WU00:FS00:Starting
05:43:17:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Adam/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/AVX/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 704 -lifeline 5784 -checkpoint 10 -np 6
05:43:17:WU00:FS00:Started FahCore on PID 1996
05:43:17:WU00:FS00:Core PID:2484
05:43:17:WU00:FS00:FahCore 0xa7 started
05:43:17:WU00:FS00:0xa7:*********************** Log Started 2017-03-24T05:43:17Z ***********************
05:43:17:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
05:43:17:WU00:FS00:0xa7:       Type: 0xa7
05:43:17:WU00:FS00:0xa7:       Core: Gromacs
05:43:17:WU00:FS00:0xa7:    Website: http://folding.stanford.edu/
05:43:17:WU00:FS00:0xa7:  Copyright: (c) 2009-2016 Stanford University
05:43:17:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
05:43:17:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 704 -lifeline 1996 -checkpoint 10 -np 6
05:43:17:WU00:FS00:0xa7:     Config: <none>
05:43:17:WU00:FS00:0xa7:************************************ Build *************************************
05:43:17:WU00:FS00:0xa7:    Version: 0.0.11
05:43:17:WU00:FS00:0xa7:       Date: Sep 21 2016
05:43:17:WU00:FS00:0xa7:       Time: 01:43:48
05:43:17:WU00:FS00:0xa7: Repository: Git
05:43:17:WU00:FS00:0xa7:   Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
05:43:17:WU00:FS00:0xa7:     Branch: master
05:43:17:WU00:FS00:0xa7:   Compiler: GNU 4.2.1 Compatible Clang 3.9.0 (trunk 274080)
05:43:17:WU00:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
05:43:17:WU00:FS00:0xa7:             -fno-unsafe-math-optimizations -msse2 -I/mingw64/include
05:43:17:WU00:FS00:0xa7:             -Wno-inconsistent-dllimport -Wno-parentheses-equality
05:43:17:WU00:FS00:0xa7:             -Wno-deprecated-register -Wno-unused-local-typedef
05:43:17:WU00:FS00:0xa7:   Platform: linux2 4.6.0-1-amd64
05:43:17:WU00:FS00:0xa7:       Bits: 64
05:43:17:WU00:FS00:0xa7:       Mode: Release
05:43:17:WU00:FS00:0xa7:       SIMD: avx_256
05:43:17:WU00:FS00:0xa7:************************************ System ************************************
05:43:17:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
05:43:17:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
05:43:17:WU00:FS00:0xa7:       CPUs: 8
05:43:17:WU00:FS00:0xa7:     Memory: 15.94GiB
05:43:17:WU00:FS00:0xa7:Free Memory: 11.99GiB
05:43:17:WU00:FS00:0xa7:    Threads: WINDOWS_THREADS
05:43:17:WU00:FS00:0xa7: OS Version: 6.1
05:43:17:WU00:FS00:0xa7:Has Battery: false
05:43:17:WU00:FS00:0xa7: On Battery: false
05:43:17:WU00:FS00:0xa7: UTC Offset: -10
05:43:17:WU00:FS00:0xa7:        PID: 2484
05:43:17:WU00:FS00:0xa7:         OS: Windows 7 Professional Service Pack 1
05:43:17:WU00:FS00:0xa7:    OS Arch: AMD64
05:43:17:WU00:FS00:0xa7:********************************************************************************
05:43:17:WU00:FS00:0xa7:Project: 13800 (Run 0, Clone 2121, Gen 35)
05:43:17:WU00:FS00:0xa7:Unit: 0x0000002a80fccb0458a5fcd781443144
05:43:17:WU00:FS00:0xa7:Digital signatures verified
05:43:17:WU00:FS00:0xa7:Calling: mdrun -s frame35.tpr -o frame35.trr -x frame35.xtc -cpi state.cpt -cpt 10 -nt 6
05:43:17:WU00:FS00:0xa7:Steps: first=8750000 total=250000
05:43:18:WU01:FS01:Connecting to 171.67.108.45:80
05:43:20:WU00:FS00:0xa7:Completed 57322 out of 250000 steps (22%)
05:43:21:6:127.0.0.1 GET /?nocache=0.7535321118413016
05:43:21:7:127.0.0.1 GET /api/updates?_=1490334196464&sid=a31ee93c69a5eaff005fffa765ace319
05:43:21:8:127.0.0.1 GET /api/updates?_=1490334196465&sid=a31ee93c69a5eaff005fffa765ace319
05:43:21:WU01:FS01:Assigned to work server 171.67.108.157
05:43:21:WU01:FS01:Requesting new work unit for slot 01: READY gpu:1:GP104 [GeForce GTX 1070] from 171.67.108.157
05:43:21:WU01:FS01:Connecting to 171.67.108.157:8080
05:43:22:WU01:FS01:Downloading 5.18MiB
05:43:24:WU01:FS01:Download complete
05:43:24:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9414 run:241 clone:0 gen:146 core:0x21 unit:0x000000abab436c9d585e0691b674b2e5
05:43:24:WU01:FS01:Starting
05:43:24:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Adam/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/NVIDIA/Fermi/beta/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 5784 -checkpoint 10 -gpu 0 -gpu-vendor nvidia
05:43:24:WU01:FS01:Started FahCore on PID 6984
05:43:24:WU01:FS01:Core PID:2480
05:43:24:WU01:FS01:FahCore 0x21 started
05:43:24:WU01:FS01:0x21:*********************** Log Started 2017-03-24T05:43:24Z ***********************
05:43:24:WU01:FS01:0x21:Project: 9414 (Run 241, Clone 0, Gen 146)
05:43:24:WU01:FS01:0x21:Unit: 0x000000abab436c9d585e0691b674b2e5
05:43:24:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
05:43:24:WU01:FS01:0x21:Machine: 1
05:43:24:WU01:FS01:0x21:Reading tar file core.xml
05:43:24:WU01:FS01:0x21:Reading tar file integrator.xml
05:43:24:WU01:FS01:0x21:Reading tar file state.xml
05:43:24:WU01:FS01:0x21:Reading tar file system.xml
05:43:24:WU01:FS01:0x21:Digital signatures verified
05:43:24:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
05:43:24:WU01:FS01:0x21:Version 0.0.18
05:43:27:WU01:FS01:0x21:Completed 0 out of 6250000 steps (0%)
05:43:27:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
05:43:44:WU00:FS00:0xa7:Completed 57500 out of 250000 steps (23%)
05:44:14:Removing old file 'configs/config-20170321-085054.xml'
05:44:14:Saving configuration to config.xml
05:44:14:<config>
05:44:14:  <!-- Folding Core -->
05:44:14:  <checkpoint v='10'/>
05:44:14:
05:44:14:  <!-- Logging -->
05:44:14:  <verbosity v='4'/>
05:44:14:
05:44:14:  <!-- Network -->
05:44:14:  <proxy v=':8080'/>
05:44:14:
05:44:14:  <!-- Slot Control -->
05:44:14:  <power v='full'/>
05:44:14:
05:44:14:  <!-- User Information -->
05:44:14:  <passkey v='********************************'/>
05:44:14:  <team v='11314'/>
05:44:14:  <user v='[TLB]Librarian'/>
05:44:14:
05:44:14:  <!-- Folding Slots -->
05:44:14:  <slot id='0' type='CPU'>
05:44:14:    <cpus v='6'/>
05:44:14:  </slot>
05:44:14:  <slot id='1' type='GPU'>
05:44:14:    <client-type v='beta'/>
05:44:14:  </slot>
05:44:14:</config>
05:44:26:WU01:FS01:0x21:Completed 62500 out of 6250000 steps (1%)
05:45:24:WU01:FS01:0x21:Completed 125000 out of 6250000 steps (2%)
05:46:22:WU01:FS01:0x21:Completed 187500 out of 6250000 steps (3%)
05:47:21:WU01:FS01:0x21:Completed 250000 out of 6250000 steps (4%)
05:48:19:WU01:FS01:0x21:Completed 312500 out of 6250000 steps (5%)
05:49:17:WU00:FS00:0xa7:Completed 60000 out of 250000 steps (24%)
05:49:18:WU01:FS01:0x21:Completed 375000 out of 6250000 steps (6%)
05:50:16:WU01:FS01:0x21:Completed 437500 out of 6250000 steps (7%)
05:51:15:WU01:FS01:0x21:Completed 500000 out of 6250000 steps (8%)
05:52:14:WU01:FS01:0x21:Completed 562500 out of 6250000 steps (9%)
05:53:13:WU01:FS01:0x21:Completed 625000 out of 6250000 steps (10%)
05:54:11:WU01:FS01:0x21:Completed 687500 out of 6250000 steps (11%)
05:54:48:WU00:FS00:0xa7:Completed 62500 out of 250000 steps (25%)
05:55:09:WU01:FS01:0x21:Completed 750000 out of 6250000 steps (12%)
05:56:08:WU01:FS01:0x21:Completed 812500 out of 6250000 steps (13%)
05:57:07:WU01:FS01:0x21:Completed 875000 out of 6250000 steps (14%)
05:58:05:WU01:FS01:0x21:Completed 937500 out of 6250000 steps (15%)
05:59:04:WU01:FS01:0x21:Completed 1000000 out of 6250000 steps (16%)
06:00:03:WU01:FS01:0x21:Completed 1062500 out of 6250000 steps (17%)
06:00:20:WU00:FS00:0xa7:Completed 65000 out of 250000 steps (26%)
06:01:01:WU01:FS01:0x21:Completed 1125000 out of 6250000 steps (18%)
06:02:00:WU01:FS01:0x21:Completed 1187500 out of 6250000 steps (19%)
06:02:58:WU01:FS01:0x21:Completed 1250000 out of 6250000 steps (20%)
06:03:57:WU01:FS01:0x21:Completed 1312500 out of 6250000 steps (21%)
06:04:56:WU01:FS01:0x21:Completed 1375000 out of 6250000 steps (22%)
06:05:51:WU00:FS00:0xa7:Completed 67500 out of 250000 steps (27%)
06:05:54:WU01:FS01:0x21:Completed 1437500 out of 6250000 steps (23%)
06:06:53:WU01:FS01:0x21:Completed 1500000 out of 6250000 steps (24%)
06:07:52:WU01:FS01:0x21:Completed 1562500 out of 6250000 steps (25%)
06:08:50:WU01:FS01:0x21:Completed 1625000 out of 6250000 steps (26%)
06:09:49:WU01:FS01:0x21:Completed 1687500 out of 6250000 steps (27%)
06:10:47:WU01:FS01:0x21:Completed 1750000 out of 6250000 steps (28%)
06:11:22:WU00:FS00:0xa7:Completed 70000 out of 250000 steps (28%)
06:11:46:WU01:FS01:0x21:Completed 1812500 out of 6250000 steps (29%)
06:12:45:WU01:FS01:0x21:Completed 1875000 out of 6250000 steps (30%)
06:13:43:WU01:FS01:0x21:Completed 1937500 out of 6250000 steps (31%)
06:14:42:WU01:FS01:0x21:Completed 2000000 out of 6250000 steps (32%)
06:15:41:WU01:FS01:0x21:Completed 2062500 out of 6250000 steps (33%)
06:16:39:WU01:FS01:0x21:Completed 2125000 out of 6250000 steps (34%)
06:16:53:WU00:FS00:0xa7:Completed 72500 out of 250000 steps (29%)
06:17:38:WU01:FS01:0x21:Completed 2187500 out of 6250000 steps (35%)
06:18:37:WU01:FS01:0x21:Completed 2250000 out of 6250000 steps (36%)
06:19:35:WU01:FS01:0x21:Completed 2312500 out of 6250000 steps (37%)
06:20:34:WU01:FS01:0x21:Completed 2375000 out of 6250000 steps (38%)
06:21:33:WU01:FS01:0x21:Completed 2437500 out of 6250000 steps (39%)
06:22:24:WU00:FS00:0xa7:Completed 75000 out of 250000 steps (30%)
06:22:31:WU01:FS01:0x21:Completed 2500000 out of 6250000 steps (40%)
06:23:30:WU01:FS01:0x21:Completed 2562500 out of 6250000 steps (41%)
06:24:28:WU01:FS01:0x21:Completed 2625000 out of 6250000 steps (42%)
06:25:27:WU01:FS01:0x21:Completed 2687500 out of 6250000 steps (43%)
06:26:26:WU01:FS01:0x21:Completed 2750000 out of 6250000 steps (44%)
06:27:25:WU01:FS01:0x21:Completed 2812500 out of 6250000 steps (45%)
06:27:56:WU00:FS00:0xa7:Completed 77500 out of 250000 steps (31%)
06:28:23:WU01:FS01:0x21:Completed 2875000 out of 6250000 steps (46%)
06:29:22:WU01:FS01:0x21:Completed 2937500 out of 6250000 steps (47%)
06:30:20:WU01:FS01:0x21:Completed 3000000 out of 6250000 steps (48%)
06:31:20:WU01:FS01:0x21:Completed 3062500 out of 6250000 steps (49%)
06:32:18:WU01:FS01:0x21:Completed 3125000 out of 6250000 steps (50%)
06:33:17:WU01:FS01:0x21:Completed 3187500 out of 6250000 steps (51%)
06:33:27:WU00:FS00:0xa7:Completed 80000 out of 250000 steps (32%)
06:34:15:WU01:FS01:0x21:Completed 3250000 out of 6250000 steps (52%)
06:35:14:WU01:FS01:0x21:Completed 3312500 out of 6250000 steps (53%)
06:36:13:WU01:FS01:0x21:Completed 3375000 out of 6250000 steps (54%)
06:37:11:WU01:FS01:0x21:Completed 3437500 out of 6250000 steps (55%)
06:38:10:WU01:FS01:0x21:Completed 3500000 out of 6250000 steps (56%)
06:38:58:WU00:FS00:0xa7:Completed 82500 out of 250000 steps (33%)
06:39:09:WU01:FS01:0x21:Completed 3562500 out of 6250000 steps (57%)
06:40:07:WU01:FS01:0x21:Completed 3625000 out of 6250000 steps (58%)
06:41:06:WU01:FS01:0x21:Completed 3687500 out of 6250000 steps (59%)
06:42:05:WU01:FS01:0x21:Completed 3750000 out of 6250000 steps (60%)
06:43:04:WU01:FS01:0x21:Completed 3812500 out of 6250000 steps (61%)
06:44:02:WU01:FS01:0x21:Completed 3875000 out of 6250000 steps (62%)
06:44:29:WU00:FS00:0xa7:Completed 85000 out of 250000 steps (34%)
06:45:01:WU01:FS01:0x21:Completed 3937500 out of 6250000 steps (63%)
06:45:59:WU01:FS01:0x21:Completed 4000000 out of 6250000 steps (64%)
06:46:59:WU01:FS01:0x21:Completed 4062500 out of 6250000 steps (65%)
06:47:57:WU01:FS01:0x21:Completed 4125000 out of 6250000 steps (66%)
06:48:56:WU01:FS01:0x21:Completed 4187500 out of 6250000 steps (67%)
06:49:54:WU01:FS01:0x21:Completed 4250000 out of 6250000 steps (68%)
06:50:00:WU00:FS00:0xa7:Completed 87500 out of 250000 steps (35%)
06:50:53:WU01:FS01:0x21:Completed 4312500 out of 6250000 steps (69%)
06:51:52:WU01:FS01:0x21:Completed 4375000 out of 6250000 steps (70%)
06:52:50:WU01:FS01:0x21:Completed 4437500 out of 6250000 steps (71%)
06:53:49:WU01:FS01:0x21:Completed 4500000 out of 6250000 steps (72%)
06:54:48:WU01:FS01:0x21:Completed 4562500 out of 6250000 steps (73%)
06:55:32:WU00:FS00:0xa7:Completed 90000 out of 250000 steps (36%)
06:55:46:WU01:FS01:0x21:Completed 4625000 out of 6250000 steps (74%)
06:56:45:WU01:FS01:0x21:Completed 4687500 out of 6250000 steps (75%)
06:57:44:WU01:FS01:0x21:Completed 4750000 out of 6250000 steps (76%)
06:58:42:WU01:FS01:0x21:Completed 4812500 out of 6250000 steps (77%)
06:59:41:WU01:FS01:0x21:Completed 4875000 out of 6250000 steps (78%)
07:00:40:WU01:FS01:0x21:Completed 4937500 out of 6250000 steps (79%)
07:01:03:WU00:FS00:0xa7:Completed 92500 out of 250000 steps (37%)
07:01:38:WU01:FS01:0x21:Completed 5000000 out of 6250000 steps (80%)
07:02:37:WU01:FS01:0x21:Completed 5062500 out of 6250000 steps (81%)
07:03:36:WU01:FS01:0x21:Completed 5125000 out of 6250000 steps (82%)
07:04:34:WU01:FS01:0x21:Completed 5187500 out of 6250000 steps (83%)
07:05:33:WU01:FS01:0x21:Completed 5250000 out of 6250000 steps (84%)
07:06:32:WU01:FS01:0x21:Completed 5312500 out of 6250000 steps (85%)
07:06:34:WU00:FS00:0xa7:Completed 95000 out of 250000 steps (38%)
07:07:30:WU01:FS01:0x21:Completed 5375000 out of 6250000 steps (86%)
07:08:29:WU01:FS01:0x21:Completed 5437500 out of 6250000 steps (87%)
07:09:27:WU01:FS01:0x21:Completed 5500000 out of 6250000 steps (88%)
07:10:27:WU01:FS01:0x21:Completed 5562500 out of 6250000 steps (89%)
07:11:25:WU01:FS01:0x21:Completed 5625000 out of 6250000 steps (90%)
07:12:05:WU00:FS00:0xa7:Completed 97500 out of 250000 steps (39%)
07:12:24:WU01:FS01:0x21:Completed 5687500 out of 6250000 steps (91%)
07:13:22:WU01:FS01:0x21:Completed 5750000 out of 6250000 steps (92%)
07:14:21:WU01:FS01:0x21:Completed 5812500 out of 6250000 steps (93%)
07:15:20:WU01:FS01:0x21:Completed 5875000 out of 6250000 steps (94%)
07:16:18:WU01:FS01:0x21:Completed 5937500 out of 6250000 steps (95%)
07:17:17:WU01:FS01:0x21:Completed 6000000 out of 6250000 steps (96%)
07:17:37:WU00:FS00:0xa7:Completed 100000 out of 250000 steps (40%)
07:18:16:WU01:FS01:0x21:Completed 6062500 out of 6250000 steps (97%)
07:19:15:WU01:FS01:0x21:Completed 6125000 out of 6250000 steps (98%)
07:20:13:WU01:FS01:0x21:Completed 6187500 out of 6250000 steps (99%)
07:20:14:WU02:FS01:Connecting to 171.67.108.45:80
07:20:15:WU02:FS01:Assigned to work server 140.163.4.243
07:20:15:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:GP104 [GeForce GTX 1070] from 140.163.4.243
07:20:15:WU02:FS01:Connecting to 140.163.4.243:8080
07:20:15:WU02:FS01:Downloading 2.67MiB
07:21:12:WU01:FS01:0x21:Completed 6250000 out of 6250000 steps (100%)
07:21:12:WU01:FS01:0x21:Saving result file logfile_01.txt
07:21:12:WU01:FS01:0x21:Saving result file checkpointState.xml
07:21:12:WU01:FS01:0x21:Saving result file checkpt.crc
07:21:12:WU01:FS01:0x21:Saving result file log.txt
07:21:12:WU01:FS01:0x21:Saving result file positions.xtc
07:21:12:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
07:21:13:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
07:21:13:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:9414 run:241 clone:0 gen:146 core:0x21 unit:0x000000abab436c9d585e0691b674b2e5
07:21:13:WU01:FS01:Uploading 7.82MiB to 171.67.108.157
07:21:13:WU01:FS01:Connecting to 171.67.108.157:8080
07:21:19:WU01:FS01:Upload 15.19%
07:21:25:WU01:FS01:Upload 27.18%
07:21:31:WU01:FS01:Upload 39.18%
07:21:37:WU01:FS01:Upload 51.17%
07:21:43:WU01:FS01:Upload 63.16%
07:21:49:WU01:FS01:Upload 75.96%
07:21:55:WU01:FS01:Upload 87.95%
07:22:01:WU01:FS01:Upload 99.94%
07:22:01:WU01:FS01:Upload complete
07:22:01:WU01:FS01:Server responded WORK_ACK (400)
07:22:01:WU01:FS01:Final credit estimate, 43647.00 points
07:22:01:WU01:FS01:Cleaning up
07:23:02:WU00:FS00:0xa7:Completed 102500 out of 250000 steps (41%)
07:28:17:WU00:FS00:0xa7:Completed 105000 out of 250000 steps (42%)
07:33:33:WU00:FS00:0xa7:Completed 107500 out of 250000 steps (43%)
07:38:49:WU00:FS00:0xa7:Completed 110000 out of 250000 steps (44%)
07:44:03:WU00:FS00:0xa7:Completed 112500 out of 250000 steps (45%)
07:47:11:FS00:Paused
07:47:11:FS01:Paused
07:47:11:FS00:Shutting core down
07:47:11:WU00:FS00:0xa7:WARNING:Console control signal 1 on PID 2484
07:47:11:WU00:FS00:0xa7:Exiting, please wait. . .
07:47:12:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:47:15:Removing old file 'configs/config-20170322-052147.xml'
07:47:15:Saving configuration to config.xml
07:47:15:<config>
07:47:15:  <!-- Folding Core -->
07:47:15:  <checkpoint v='10'/>
07:47:15:
07:47:15:  <!-- Logging -->
07:47:15:  <verbosity v='4'/>
07:47:15:
07:47:15:  <!-- Network -->
07:47:15:  <proxy v=':8080'/>
07:47:15:
07:47:15:  <!-- Slot Control -->
07:47:15:  <power v='full'/>
07:47:15:
07:47:15:  <!-- User Information -->
07:47:15:  <passkey v='********************************'/>
07:47:15:  <team v='11314'/>
07:47:15:  <user v='[TLB]Librarian'/>
07:47:15:
07:47:15:  <!-- Folding Slots -->
07:47:15:  <slot id='0' type='CPU'>
07:47:15:    <cpus v='6'/>
07:47:15:    <paused v='true'/>
07:47:15:  </slot>
07:47:15:  <slot id='1' type='GPU'>
07:47:15:    <client-type v='beta'/>
07:47:15:    <paused v='true'/>
07:47:15:  </slot>
07:47:15:</config>

Here is a snippet of the part that keeps happening to me. It gets that far and just never finishes the download.

Code:

07:20:14:WU02:FS01:Connecting to 171.67.108.45:80
07:20:15:WU02:FS01:Assigned to work server 140.163.4.243
07:20:15:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:GP104 [GeForce GTX 1070] from 140.163.4.243
07:20:15:WU02:FS01:Connecting to 140.163.4.243:8080
07:20:15:WU02:FS01:Downloading 2.67MiB

Re: Difficulty downloading GPU WU on multiple machines

Posted: Tue Mar 28, 2017 4:09 pm
by foldy
I think this is a known issue. FAHClient cannot resume from a stalled download?

Re: Difficulty downloading GPU WU on multiple machines

Posted: Tue Mar 28, 2017 4:23 pm
by Joe_H
Yes, this is a known issue. The networking code in the public beta (7.4.16) is somewhat improved, but it still sometimes fails to detect a stalled download or upload and retry the connection. You should only need to restart the FAHClient process; no file deletion is necessary.

The usual cause is a less than 100% reliable connection. This can be wireless, a flaky router or other network component, or even a bad cable. In my case I have seen it happen on my laptop when RF interference from a microwave temporarily disrupted its Wi-Fi connection.

P.S. Please return the logging verbosity to the default setting of 3. The higher verbosity level almost never provides any extra useful information and makes it harder to diagnose an issue.
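
Since the client may not notice a stalled download by itself, one way to spot it from the outside is to scan log.txt for a "Downloading" line that was never followed by "Download complete". The following is only a minimal sketch: the two message texts are taken from the log posted earlier in this thread, while the function name `find_stalled_downloads`, the 600-second default, and the treatment of a client "Log Started" banner as a restart are assumptions for illustration, not part of FAHClient.

```python
import re

# Client lines look like "07:20:15:WU02:FS01:Downloading 2.67MiB".
LINE_RE = re.compile(r"^(\d{1,2}):(\d{2}):(\d{2}):(WU\d+):(FS\d+):(.*)$")


def to_seconds(hours, minutes, seconds):
    """Convert an HH:MM:SS log timestamp to seconds since midnight."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def find_stalled_downloads(lines, now_seconds, timeout_seconds=600):
    """Return (wu, slot, elapsed_seconds) for downloads open longer than the timeout.

    `now_seconds` is the current time of day in seconds, in the same
    timezone as the log (UTC in the log shown in this thread).
    """
    open_downloads = {}  # (wu, slot) -> start time in seconds since midnight
    for raw in lines:
        line = raw.strip()
        # Assumption: a client "Log Started" banner means the client was
        # restarted, so any download in flight before it was abandoned.
        if line.startswith("*") and "Log Started" in line:
            open_downloads.clear()
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        hours, minutes, seconds, wu, slot, message = match.groups()
        if message.startswith("Downloading "):
            open_downloads[(wu, slot)] = to_seconds(hours, minutes, seconds)
        elif message == "Download complete":
            open_downloads.pop((wu, slot), None)

    stalled = []
    for (wu, slot), started in open_downloads.items():
        # The log carries no date, so take the difference modulo one day;
        # this copes with a single midnight rollover.
        elapsed = (now_seconds - started) % 86400
        if elapsed > timeout_seconds:
            stalled.append((wu, slot, elapsed))
    return stalled
```

A slow connection can legitimately need more than ten minutes for a large WU, so the timeout would have to be chosen with that in mind. A hit only tells you that a restart of FAHClient is probably needed; the sketch does not restart anything itself.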

Re: Difficulty downloading GPU WU on multiple machines

Posted: Tue Mar 28, 2017 4:54 pm
by Ricorocks
My two pennies! For me this only happens on two of my three folders, both 64-bit, and always late at night. A new modem arrives today (fingers crossed that this helps). I've had 3 yellow circles (Web Control showing a yellow circle on the GPU for a failed download) in the last 5 days.

The folders are on Ethernet. The 32-bit one never fails. I tried its cable on the others and still got a Y.C.; I tried switching a Y.C. machine to Wi-Fi and still got a Y.C. If the modem does not cure this, it's either the FAH client or the ISP.

Is there any kind of software that can monitor/log the network?
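
One low-tech way to monitor and log the network is a small script that opens a TCP connection to a server at regular intervals and records whether it succeeded and how long it took. The following is a rough sketch using only the Python standard library; the names `probe`, `format_result`, and `monitor` and the log layout are made up for illustration. It only shows whether a connection can be opened at a given moment, so it can reveal dropouts but not a transfer that stalls halfway through.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, seconds_taken, error_text)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - started, ""
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return False, time.monotonic() - started, str(exc)


def format_result(host, port, ok, seconds, error):
    """Build one log line with a UTC timestamp, target, status, and latency."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    status = "OK" if ok else "FAIL"
    return f"{stamp} {host}:{port} {status} {seconds * 1000:.0f}ms {error}".rstrip()


def monitor(targets, log_path, interval=60.0, rounds=None):
    """Probe every (host, port) target each `interval` seconds.

    Results are appended to `log_path`. With rounds=None the loop runs
    until interrupted; otherwise it stops after that many rounds.
    """
    done = 0
    while rounds is None or done < rounds:
        with open(log_path, "a", encoding="utf-8") as log:
            for host, port in targets:
                result = probe(host, port)
                log.write(format_result(host, port, *result) + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

For example, `monitor([("140.163.4.243", 8080)], "netlog.txt")` would probe the work server address that appears in the log earlier in this thread once a minute; that address is only an example and may not answer anymore. Comparing the FAIL timestamps with the times of the yellow circles would show whether the two coincide.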

Re: Difficulty downloading GPU WU on multiple machines

Posted: Thu Mar 30, 2017 1:55 am
by Ricorocks
I got the new modem/router 'bridged' so it uses my Asus AC router. After I explained the situation, the phone company sent a technician to check the connection from the switch to the house. Having a line test done by the phone company is free and eliminates one possible cause of the problem.
Hopefully no more Y.C. now.

Re: Difficulty downloading GPU WU on multiple machines

Posted: Thu Mar 30, 2017 3:20 am
by Ricorocks
Well, this blows. At 10pm I checked the machines, and the one with the GTX 960 & 1060 had a Y.C. on both. I can close the FAH client from the system tray, but the only way to get it running again is a reboot; the FAH client will not start from its icon(s) :twisted:

Re: Difficulty downloading GPU WU on multiple machines

Posted: Thu Mar 30, 2017 3:31 am
by Ricorocks
I downloaded client 7.4.16. Should I uninstall and then reinstall, or install over 7.4.15?

Re: Difficulty downloading GPU WU on multiple machines

Posted: Thu Mar 30, 2017 9:47 am
by foldy
You need to uninstall the previous FAHClient first.
Then install 7.4.16.

Re: Difficulty downloading GPU WU on multiple machines

Posted: Fri Mar 31, 2017 5:09 pm
by Ricorocks
With 7.4.16 and the newest Nvidia driver (x.92) I'm getting yellow circles again on the two-GPU machine. It's too early to tell on the single-GPU 1050 Ti. After uninstalling and upgrading to 7.4.16 I can quit FAH, but I still cannot restart it from within Windows; only a REBOOT starts FAH. I tried "Run as admin", which brings up the Windows user account prompt ("Allow this machine to make..." yes/no). I say yes and it still won't start, so I must reboot.

I'm hoping this is the known flaw with two GPUs; the single-GPU 1050 Ti needs to not fail!

Re: Difficulty downloading GPU WU on multiple machines

Posted: Fri Mar 31, 2017 11:08 pm
by bruce
Ricorocks wrote: I'm hoping this is the known flaw with two GPUs; the single-GPU 1050 Ti needs to not fail!
The order of installation can mess up a second GPU.
First, all (both) GPUs must be physically operational. Then (re-)install the drivers from the manufacturer. Then (re-)install FAHClient.
I've seen a lot of strange problems if things are done in a different order.

Perhaps Fah Client?

Posted: Sun Apr 02, 2017 2:40 pm
by Ricorocks
I understand there is a flaw in the client regarding two GPUs in one box, normally associated with a failed download and the client failing to restart. To eliminate the network as the culprit, I now have a new modem, router, Ethernet cables, and switch; the phone company tested the connection between the office and my home; I switched to Nvidia driver xxx.92; and the FAH client went from 7.xx.15 to 7.xx.16. In the last 5 days I've had 4 yellow circles (Web Control), each requiring a reboot to restart.

NOTE: after closing the FAH client, I cannot restart it from its icon, even when choosing 'run as admin'. The only way to restart FAH is a reboot.

This morning's yellow circle log showed:

Code:

13:22:25:WU00:FS00:0xa7:Completed 1300000 out of 2500000 steps (52%)
13:23:05:WARNING:FS02:Size of positions 394 does not match topology 391
13:23:27:WU01:FS02:0x21:Completed 562500 out of 6250000 steps (9%)
13:24:53:WARNING:FS02:Size of positions 394 does not match topology 391
13:24:56:WU00:FS00:0xa7:Completed 1325000 out of 2500000 steps (53%)
13:25:43:WU01:FS02:0x21:Completed 625000 out of 6250000 steps (10%)
13:26:42:WARNING:FS02:Size of positions 394 does not match topology 391
13:27:27:WU00:FS00:0xa7:Completed 1350000 out of 2500000 steps (54%)
13:27:59:WU01:FS02:0x21:Completed 687500 out of 6250000 steps (11%)
13:28:31:WARNING:FS02:Size of positions 394 does not match topology 391
13:29:22:94:127.0.0.1:New Web connection
13:29:58:WU00:FS00:0xa7:Completed 1375000 out of 2500000 steps (55%)
13:30:14:WU01:FS02:0x21:Completed 750000 out of 6250000 steps (12%)
13:30:20:WARNING:FS02:Size of positions 394 does not match topology 391
13:32:08:WARNING:FS02:Size of positions 394 does not match topology 391
13:32:29:WU00:FS00:0xa7:Completed 1400000 out of 2500000 steps (56%)
13:32:31:WU01:FS02:0x21:Completed 812500 out of 6250000 steps (13%)
13:33:57:WARNING:FS02:Size of positions 394 does not match topology 391
13:34:46:WU01:FS02:0x21:Completed 875000 out of 6250000 steps (14%)
13:35:01:WU00:FS00:0xa7:Completed 1425000 out of 2500000 steps (57%)
13:35:46:WARNING:FS02:Size of positions 394 does not match topology 391
13:37:02:WU01:FS02:0x21:Completed 937500 out of 6250000 steps (15%)
13:37:32:WU00:FS00:0xa7:Completed 1450000 out of 2500000 steps (58%)
13:37:34:WARNING:FS02:Size of positions 394 does not match topology 391
13:39:18:WU01:FS02:0x21:Completed 1000000 out of 6250000 steps (16%)
13:39:23:WARNING:FS02:Size of positions 394 does not match topology 391

Questions:

1. Did the failure happen at 13:39:23, since there are no more entries after 13:39:23?

2. Is this still a failed download, or does it have to do with 'topology'?

3. The single-GPU machine has had zero yellow circles in the last 5 days.

4. The 32-bit machine with one GPU never fails. As previously mentioned, the 32-bit machine may download less frequently and therefore have a reduced chance of failure. This still seems odd, as it NEVER gets the yellow circle for the GPU.

Thanks
Rick

Re: Perhaps Fah Client?

Posted: Mon Apr 03, 2017 5:41 am
by bruce
There is no "failed download" or "failed upload" shown in your log. Those are normal messages showing processing progress.
Below is a normal download.
Obviously the number of "Download xx.xx%" messages will depend on the speed of your connection and on the size of the WU.

A download failure would have some "Download xx.xx%" messages, but they would stop before reaching the last two lines.

Ignore intervening lines from other slots or WUs which don't match the string "WU01:FS02:" or whatever you see in your example. A successful or a failed upload looks very similar.

Code: Select all

21:21:40:WU01:FS02:Requesting new work unit for slot 02: RUNNING [device] from xxx.xxx.xxx.xxx
21:21:40:WU01:FS02:Connecting to 140.163.4.242:8080
21:21:40:WU01:FS02:Downloading 3.47MiB
21:21:46:WU01:FS02:Download 45.00%
21:21:52:WU01:FS02:Download 99.01%
21:21:52:WU01:FS02:Download complete
21:21:52:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:....
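
If you'd rather not eyeball the log, the check described above can be sketched in a few lines of Python. This is only an illustration, not part of FAHClient: the function name `classify_download` and the state labels are made up, and the line layout (`HH:MM:SS:WUxx:FSyy:message`) is assumed from the sample above. Note that "stalled" just means a download started and no "Download complete" has been logged yet, so a download that is still in progress looks the same until it finishes.

```python
import re

# Assumed line layout, taken from the sample log: HH:MM:SS:WUxx:FSyy:message
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")


def classify_download(log_lines, wu="WU01", slot="FS02"):
    """Return 'complete', 'stalled', or 'none' for the last download
    attempt logged for one WU/slot pair (hypothetical helper)."""
    state = "none"
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group(2) != wu or m.group(3) != slot:
            continue  # ignore other slots, other WUs, and non-WU lines
        msg = m.group(4)
        if msg.startswith("Downloading "):
            # A download started; treat it as stalled until we see it finish.
            state = "stalled"
        elif msg.startswith("Download complete"):
            state = "complete"
    return state
```

To use it, read the log file into a list of lines and pass the WU and slot you care about, for example `classify_download(open("log.txt").read().splitlines(), "WU01", "FS02")`, where the file name is whatever your log is called.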

Re: Difficulty downloading GPU WU on multiple machines

Posted: Wed Apr 05, 2017 2:11 pm
by Ricorocks
Thanks Bruce!

1. More than one issue can cause "Yellow Circles", correct?

This is from the single-GPU 64-bit machine with the 1050 Ti, which yellow-circled this morning.

end of log:

Code: Select all

10:12:38:WU00:FS00:0xa4:Completed 1575000 out of 2500000 steps  (63%)
10:29:35:WU00:FS00:0xa4:Completed 1600000 out of 2500000 steps  (64%)
10:46:33:WU00:FS00:0xa4:Completed 1625000 out of 2500000 steps  (65%)
11:03:31:WU00:FS00:0xa4:Completed 1650000 out of 2500000 steps  (66%)
11:20:29:WU00:FS00:0xa4:Completed 1675000 out of 2500000 steps  (67%)
11:37:28:WU00:FS00:0xa4:Completed 1700000 out of 2500000 steps  (68%)
11:54:26:WU00:FS00:0xa4:Completed 1725000 out of 2500000 steps  (69%)
12:11:24:WU00:FS00:0xa4:Completed 1750000 out of 2500000 steps  (70%)
12:28:15:WU00:FS00:0xa4:Completed 1775000 out of 2500000 steps  (71%)
12:45:09:WU00:FS00:0xa4:Completed 1800000 out of 2500000 steps  (72%)
13:02:03:WU00:FS00:0xa4:Completed 1825000 out of 2500000 steps  (73%)
13:18:57:WU00:FS00:0xa4:Completed 1850000 out of 2500000 steps  (74%)
13:31:08:531:127.0.0.1:New Web connection
13:35:53:WU00:FS00:0xa4:Completed 1875000 out of 2500000 steps  (75%)


Note that the log from the 1050 Ti machine has no new lines AFTER 13:35:53.

In your example log of a failed download, the download goes like this: requesting > connecting > downloading > download 45%, 99.01% > Download complete. What tips you off that this is a failed download?

Is it that at 21:21:52, after "Download complete", the log does not continue?
___________________________________________________________

New question: I can quit the FAH client. I click FAH's tray icon (one of the icons near the time and date in the lower right corner) and choose "Quit", and FAH stops running. FAH's desktop icon does not restart the FAH client, even when choosing "run as admin". THE ONLY WAY TO RESTART FAH IS A REBOOT. This is SOP on two machines, one with 7.4.15 and the other with 7.4.16.

Re: Difficulty downloading GPU WU on multiple machines

Posted: Wed Apr 05, 2017 2:51 pm
by bruce
Ricorocks wrote:1. More than one issue can cause "Yellow Circles", correct?
Yes.

As I said earlier, you're confusing a download failure with a failure during processing. That is NOT a failed download. As foldy said, the second log snippet in the OP is a failed download.

When a WU that has been issuing "Completed M out of N steps (xx%)" messages stops issuing them, the FAHCore has stopped processing. Whether it's still running and is "hung" or the process has somehow aborted without a message cannot be determined from the log.

(The report of estimated progress incorrectly continues, assuming linear increments, until it gets updated information from the FAHCore or until it reaches 99.99%.)
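
Since the displayed progress keeps climbing, the log timestamps are the more reliable signal. Here is a rough Python sketch of how one might flag a slot whose "Completed ... steps" messages have gone quiet. It is only an illustration: `looks_stalled`, `progress_times`, and the `factor` threshold are made-up names and values, the line layout is assumed from the logs posted in this thread, and `now` must be on the same clock as the log timestamps. It also assumes the silence is shorter than one day, because the log lines carry no date.

```python
import re
from datetime import timedelta

# Assumed layout: HH:MM:SS:WUxx:FSyy:0xNN:Completed M out of N steps (xx%)
STEP_RE = re.compile(
    r"^(\d{1,2}):(\d{2}):(\d{2}):(WU\d+):(FS\d+):0x\w+:Completed (\d+) out of (\d+) steps"
)
ONE_DAY = timedelta(days=1)


def progress_times(log_lines, slot):
    """Collect the time of day of every progress message for one slot."""
    times = []
    for line in log_lines:
        m = STEP_RE.match(line.strip())
        if m and m.group(5) == slot:
            h, mi, s = int(m.group(1)), int(m.group(2)), int(m.group(3))
            times.append(timedelta(hours=h, minutes=mi, seconds=s))
    return times


def looks_stalled(log_lines, slot, now, factor=3.0):
    """True if the slot has been silent for more than `factor` times its
    typical interval between progress messages (hypothetical heuristic).
    `now` is a timedelta since midnight on the same clock as the log."""
    times = progress_times(log_lines, slot)
    if len(times) < 2:
        return False  # not enough history to know the normal interval
    # Modulo one day keeps gaps positive across a midnight rollover.
    gaps = sorted((b - a) % ONE_DAY for a, b in zip(times, times[1:]))
    typical = gaps[len(gaps) // 2]  # median-like gap between messages
    silence = (now - times[-1]) % ONE_DAY
    return silence > factor * typical
```

A slot that normally reports every 17 minutes would be flagged after roughly 51 minutes of silence with the default `factor`. The heuristic cannot tell a hung FAHCore from one that has aborted, for the reason given above.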

A failed download requires the restart of FAHClient. A FAHCore hang can be temporarily fixed by a pause/resume of that slot.

All the instances of a FAHCore hang that I've seen were caused by a hardware problem -- most notably by overclocking or overheating. In essence, you have an unstable computer that's capable of running an OS and one or more overclocking benchmark programs but is unstable when running FAH. Give it more margin or check the thermal grease under the heatsink and the case airflow (Dust? Too much added heat?).