
Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 11:59 am
by ComputerGenie
I'm starting a new thread because my posts keep getting merged into issues that are supposedly "fixed" (along with threats that "we do have ways we can deal with that type of activity" if I post in any other 171.67.108.xx thread, because I'm "trying to hijack the topic").

Everyone says I need to provide a log, but when I ask from which system, I get no answer. So we'll start with the Win 7 rig, picked at random (it has the lowest IP on my LAN, which I guess is as good a reason as any to pick one rig out of several), and go from there. (Yes, verbosity is set to 5, because there are no obvious indicators at 3 or 4.)


*********************** Log Started 2017-06-12T11:36:03Z ***********************
11:36:03:************************* Folding@home Client *************************
11:36:03:        Website: http://folding.stanford.edu/
11:36:03:      Copyright: (c) 2009-2016 Stanford University
11:36:03:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
11:36:03:           Args: 
11:36:03:         Config: C:\Users\ComputerGenie\AppData\Roaming\FAHClient\config.xml
11:36:03:******************************** Build ********************************
11:36:03:        Version: 7.4.16
11:36:03:           Date: Jan 6 2017
11:36:03:           Time: 00:25:14
11:36:03:     Repository: Git
11:36:03:       Revision: a9e9e27dc2ee6ff01398c439677bc27f6cb74032
11:36:03:         Branch: master
11:36:03:       Compiler: Visual C++ 2008
11:36:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox -arch:SSE /MT
11:36:03:       Platform: win32 10
11:36:03:           Bits: 32
11:36:03:           Mode: Release
11:36:03:******************************* System ********************************
11:36:03:            CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
11:36:03:         CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
11:36:03:           CPUs: 8
11:36:03:         Memory: 63.94GiB
11:36:03:    Free Memory: 59.71GiB
11:36:03:        Threads: WINDOWS_THREADS
11:36:03:     OS Version: 6.1
11:36:03:    Has Battery: false
11:36:03:     On Battery: false
11:36:03:     UTC Offset: -5
11:36:03:            PID: 4256
11:36:03:            CWD: C:\Users\ComputerGenie\AppData\Roaming\FAHClient
11:36:03:             OS: Windows 7 Ultimate
11:36:03:        OS Arch: AMD64
11:36:03:           GPUs: 0
11:36:03:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:6.1 Driver:8.0
11:36:03:  CUDA Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:6.1 Driver:8.0
11:36:03:OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:382.53
11:36:03:OpenCL Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:1.2 Driver:382.53
11:36:03:  Win32 Service: false
11:36:03:***********************************************************************
11:36:03:<config>
11:36:03:  <service-description v='Folding@home Client'/>
11:36:03:  <service-restart v='true'/>
11:36:03:  <service-restart-delay v='5000'/>
11:36:03:
11:36:03:  <!-- Client Control -->
11:36:03:  <client-threads v='6'/>
11:36:03:  <cycle-rate v='4'/>
11:36:03:  <cycles v='-1'/>
11:36:03:  <data-directory v='.'/>
11:36:03:  <disable-sleep-when-active v='true'/>
11:36:03:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
11:36:03:  <exit-when-done v='false'/>
11:36:03:  <fold-anon v='false'/>
11:36:03:  <open-web-control v='false'/>
11:36:03:
11:36:03:  <!-- Configuration -->
11:36:03:  <config-rotate v='true'/>
11:36:03:  <config-rotate-dir v='configs'/>
11:36:03:  <config-rotate-max v='16'/>
11:36:03:
11:36:03:  <!-- Debugging -->
11:36:03:  <assignment-servers>
11:36:03:    assign3.stanford.edu:8080 assign4.stanford.edu:80
11:36:03:  </assignment-servers>
11:36:03:  <auth-as v='true'/>
11:36:03:  <capture-directory v='capture'/>
11:36:03:  <capture-on-error v='false'/>
11:36:03:  <capture-packets v='false'/>
11:36:03:  <capture-requests v='false'/>
11:36:03:  <capture-responses v='false'/>
11:36:03:  <capture-sockets v='false'/>
11:36:03:  <core-exec v='FahCore_$type'/>
11:36:03:  <core-wrapper-exec v='FAHCoreWrapper'/>
11:36:03:  <debug-sockets v='false'/>
11:36:03:  <exception-locations v='true'/>
11:36:03:  <gpu-assignment-servers>
11:36:03:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
11:36:03:  </gpu-assignment-servers>
11:36:03:  <stack-traces v='false'/>
11:36:03:
11:36:03:  <!-- Error Handling -->
11:36:03:  <max-slot-errors v='10'/>
11:36:03:  <max-unit-errors v='5'/>
11:36:03:
11:36:03:  <!-- Folding Core -->
11:36:03:  <checkpoint v='30'/>
11:36:03:  <core-dir v='cores'/>
11:36:03:  <core-priority v='idle'/>
11:36:03:  <cpu-affinity v='false'/>
11:36:03:  <cpu-usage v='100'/>
11:36:03:  <gpu-usage v='100'/>
11:36:03:  <no-assembly v='false'/>
11:36:03:
11:36:03:  <!-- Folding Slot Configuration -->
11:36:03:  <cause v='ALZHEIMERS'/>
11:36:03:  <client-subtype v='STDCLI'/>
11:36:03:  <client-type v='normal'/>
11:36:03:  <cpu-species v='X86_PENTIUM_II'/>
11:36:03:  <cpu-type v='AMD64'/>
11:36:03:  <cpus v='-1'/>
11:36:03:  <disable-viz v='false'/>
11:36:03:  <gpu v='true'/>
11:36:03:  <max-packet-size v='normal'/>
11:36:03:  <os-species v='WIN_7'/>
11:36:03:  <os-type v='WIN32'/>
11:36:03:  <project-key v='0'/>
11:36:03:  <smp v='true'/>
11:36:03:
11:36:03:  <!-- GUI -->
11:36:03:  <gui-enabled v='true'/>
11:36:03:
11:36:03:  <!-- HTTP Server -->
11:36:03:  <allow v='127.0.0.1'/>
11:36:03:  <connection-timeout v='60'/>
11:36:03:  <deny v='0/0'/>
11:36:03:  <http-addresses v='0:7396'/>
11:36:03:  <https-addresses v=''/>
11:36:03:  <max-connect-time v='900'/>
11:36:03:  <max-connections v='800'/>
11:36:03:  <max-request-length v='52428800'/>
11:36:03:  <min-connect-time v='300'/>
11:36:03:
11:36:03:  <!-- Logging -->
11:36:03:  <log v='log.txt'/>
11:36:03:  <log-color v='false'/>
11:36:03:  <log-crlf v='true'/>
11:36:03:  <log-date v='false'/>
11:36:03:  <log-date-periodically v='21600'/>
11:36:03:  <log-domain v='false'/>
11:36:03:  <log-header v='true'/>
11:36:03:  <log-level v='true'/>
11:36:03:  <log-no-info-header v='true'/>
11:36:03:  <log-redirect v='false'/>
11:36:03:  <log-rotate v='true'/>
11:36:03:  <log-rotate-dir v='logs'/>
11:36:03:  <log-rotate-max v='16'/>
11:36:03:  <log-short-level v='false'/>
11:36:03:  <log-simple-domains v='true'/>
11:36:03:  <log-thread-id v='false'/>
11:36:03:  <log-thread-prefix v='true'/>
11:36:03:  <log-time v='true'/>
11:36:03:  <log-to-screen v='true'/>
11:36:03:  <log-truncate v='false'/>
11:36:03:  <verbosity v='5'/>
11:36:03:
11:36:03:  <!-- Network -->
11:36:03:  <proxy v=':8080'/>
11:36:03:  <proxy-enable v='false'/>
11:36:03:  <proxy-pass v=''/>
11:36:03:  <proxy-user v=''/>
11:36:03:
11:36:03:  <!-- Process Control -->
11:36:03:  <child v='false'/>
11:36:03:  <daemon v='false'/>
11:36:03:  <pid v='false'/>
11:36:03:  <pid-file v='Folding@home Client.pid'/>
11:36:03:  <respawn v='false'/>
11:36:03:  <service v='false'/>
11:36:03:
11:36:03:  <!-- Remote Command Server -->
11:36:03:  <command-address v='0.0.0.0'/>
11:36:03:  <command-allow-no-pass v='127.0.0.1'/>
11:36:03:  <command-deny-no-pass v='0/0'/>
11:36:03:  <command-enable v='true'/>
11:36:03:  <command-port v='36330'/>
11:36:03:
11:36:03:  <!-- Slot Control -->
11:36:03:  <idle v='false'/>
11:36:03:  <max-shutdown-wait v='60'/>
11:36:03:  <pause-on-battery v='false'/>
11:36:03:  <pause-on-start v='true'/>
11:36:03:  <paused v='false'/>
11:36:03:  <power v='full'/>
11:36:03:
11:36:03:  <!-- User Information -->
11:36:03:  <machine-id v='0'/>
11:36:03:  <passkey v='********************************'/>
11:36:03:  <team v='*******'/>
11:36:03:  <user v='********************************'/>
11:36:03:
11:36:03:  <!-- Web Server -->
11:36:03:  <web-allow v='127.0.0.1'/>
11:36:03:  <web-deny v='0/0'/>
11:36:03:  <web-enable v='true'/>
11:36:03:
11:36:03:  <!-- Web Server Sessions -->
11:36:03:  <session-cookie v='sid'/>
11:36:03:  <session-lifetime v='86400'/>
11:36:03:  <session-timeout v='3600'/>
11:36:03:
11:36:03:  <!-- Work Unit Control -->
11:36:03:  <dump-after-deadline v='true'/>
11:36:03:  <max-queue v='16'/>
11:36:03:  <max-units v='0'/>
11:36:03:  <next-unit-percentage v='99'/>
11:36:03:  <stall-detection-enabled v='false'/>
11:36:03:  <stall-percent v='5'/>
11:36:03:  <stall-timeout v='1800'/>
11:36:03:
11:36:03:  <!-- Folding Slots -->
11:36:03:  <slot id='0' type='GPU'>
11:36:03:    <next-unit-percentage v='98'/>
11:36:03:    <paused v='true'/>
11:36:03:  </slot>
11:36:03:  <slot id='1' type='GPU'>
11:36:03:    <next-unit-percentage v='98'/>
11:36:03:  </slot>
11:36:03:</config>
11:36:03:Connecting to assign-GPU.stanford.edu:80
11:36:03:Updated GPUs.txt
11:36:03:Read GPUs.txt
11:36:03:Trying to access database...
11:36:03:Successfully acquired database lock
....
11:36:22:FS00:Unpaused
11:36:22:FS01:Unpaused
11:36:23:WU00:FS00:Connecting to 171.67.108.45:80
11:36:23:WU01:FS01:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Assigned to work server 171.67.108.157
11:36:23:WU01:FS01:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GP102 [GeForce GTX 1080 Ti] 11380 from 171.67.108.157
11:36:23:WU00:FS00:Connecting to 171.67.108.157:8080
11:36:24:WU01:FS01:Assigned to work server 171.67.108.102
11:36:24:WU00:FS00:Downloading 5.16MiB
11:36:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:1:GP104 [GeForce GTX 1080] 8873 from 171.67.108.102
11:36:24:WU01:FS01:Connecting to 171.67.108.102:8080
11:36:34:WU01:FS01:Downloading 7.06MiB
11:36:40:WU01:FS01:Download 94.77%
11:36:40:WU01:FS01:Download complete
11:36:40:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13204 run:31 clone:16 gen:98 core:0x21 unit:0x00000035ab436c6657894f0c0a705f85
User and team are ****ed out because, aside from being irrelevant, not all rigs are on the same team or user (but all are suffering the same issue).
Downloads just "hang", and there is nothing further in the log about that slot: FS01 got work, while FS00 never got past "Downloading 5.16MiB".
The "cause" setting doesn't matter (it happens with "any" or with any individual cause).
As you can see, it's not an "internet connection issue" (since 1 of the 2 cards got work). I await further instructions/advice.
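For anyone trying to spot this hang across many logs, here is a minimal Python sketch (not part of FAHClient) that scans lines in the format quoted above and reports each WU/slot whose latest `Downloading` line was never followed by `Download complete`. The function name `find_hung_downloads` and the line pattern are assumptions inferred only from the log excerpt in this post.

```python
import re

# Matches per-WU lines such as "11:36:24:WU00:FS00:Downloading 5.16MiB".
# The pattern is inferred from the log excerpt above, not from a spec.
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")


def find_hung_downloads(lines):
    """Return {(wu, slot): start_time} for downloads with no 'Download complete'."""
    pending = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header, config or other non-WU line
        time, wu, slot, msg = m.groups()
        key = (wu, slot)
        if msg.startswith("Downloading "):
            pending[key] = time       # a download started (or restarted)
        elif msg.startswith("Download complete"):
            pending.pop(key, None)    # that download finished
    return pending


if __name__ == "__main__":
    sample = [
        "11:36:24:WU00:FS00:Downloading 5.16MiB",
        "11:36:34:WU01:FS01:Downloading 7.06MiB",
        "11:36:40:WU01:FS01:Download 94.77%",
        "11:36:40:WU01:FS01:Download complete",
    ]
    print(find_hung_downloads(sample))  # {('WU00', 'FS00'): '11:36:24'}
```

Fed the four lines from 11:36:24 to 11:36:40 above, it flags only `WU00:FS00`, the slot that never got its work unit.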

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 3:59 pm
by bollix47
Okay there is at least one obvious problem in your log:

11:36:03: GPUs: 0

That usually indicates your drivers are not new enough to support those GPUs. It can have other meanings, such as a missing GPUs.txt file, but there's no sign of that in the log.

In this case I would start by setting the client to FINISH. When the WU is finished, close the client and install the latest drivers from NVIDIA; this should ensure that they support your GPUs.

Then reinstall the client: uninstall it including data (don't save your config.xml), re-run the installer, set up your folding info and configuration, and try to fold again.
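To check this without reading the whole header, a small Python sketch can pull the detected GPU count out of a log. The helper `detected_gpu_count` is hypothetical and relies only on the `GPUs:` line format shown in this thread; the last match wins, so a later client start overrides an earlier one.

```python
import re

# Matches the system-header line, e.g. "11:36:03:           GPUs: 0".
# Per-device lines such as "GPU 0: Bus:2 ..." deliberately do not match.
GPUS_RE = re.compile(r"^\d\d:\d\d:\d\d:\s*GPUs:\s*(\d+)\s*$")


def detected_gpu_count(lines):
    """Return the GPU count from the most recent 'GPUs: N' header line, or None."""
    count = None
    for line in lines:
        m = GPUS_RE.match(line.strip())
        if m:
            count = int(m.group(1))  # a later client start overrides earlier ones
    return count
```

A result of 0, as in the log above, or `None` (no header found) would be the cue to sort out drivers before letting the slots request work.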

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:12 pm
by ComputerGenie
bollix47 wrote:Okay there is at least one obvious problem in your log:

11:36:03: GPUs: 0

That usually indicates your drivers are not new enough to support those GPUs although it may have other meanings such as no GPUs.txt file but there's no sign of that in the log.

In this case I would start by setting the client to FINISH, when the WU is finished close the client and install the latest drivers from nvidia. This should ensure that they support your GPUs.

Then re-install the client by uninstalling including data, re-run the installer and try to Fold again.
That was just a first-run thing; after a close/restart:


12:00:42:           GPUs: 2
12:00:42:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP102 [GeForce GTX 1080 Ti] 11380
12:00:42:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1080] 8873
and Driver:382.53 is the newest (3 days old - since nothing else worked)

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:13 pm
by bruce
1) Please reset verbosity to the default value.
2) The key piece of information in that log is GPUs: 0. Why do you need to restart?
3) Are you running a copy of 32-bit Windows in a virtual machine? The emulated GPUs created by VMs are typically not supported (just as they are not supported when running in a Windows service). They have to be real GPUs.

Also, I strongly recommend a 64-bit OS.

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:15 pm
by ComputerGenie
bruce wrote:1) please reset verbosity to the default value.
2) The key piece of information in that log is GPUs: 0
3) Are you running a copy of 32-bit Windows in a virtual machine? The emulated GPUs created by VMs are typically not supported (just as they are not supported when running in a Windows service). They have to be real GPUs.

Also, I strongly recommend a 64-bit OS.
That log is from a 64-bit Win 7 physical machine (with real GPUs).

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:19 pm
by bruce
Suggestion: There may be a problem with assignments downloaded with no functional GPUs. Set pause-on-start=true and fix whatever problem causes the GPU:0 error before you download a new assignment.

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:20 pm
by bollix47
ComputerGenie wrote:That was just a 1st run thing, the close/restart =


12:00:42:           GPUs: 2
12:00:42:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP102 [GeForce GTX 1080 Ti] 11380
12:00:42:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1080] 8873
and Driver:382.53 is the newest (3 days old - since nothing else worked)
When you installed those 3 day old drivers did you re-install the Folding@home software? If not then try that now as described in my previous post.

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:23 pm
by foldy
Now "GPUs: 2" looks good.

The original problem, I think, is that the download never finishes:
11:36:24:WU00:FS00:Downloading 5.16MiB

I think that is a known issue where a downloading work unit sometimes gets stuck.
The only workaround I know is to restart FAHClient.

The question is why your PCs get this error more often?
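If restarting the client is the only workaround, anything automating it would first have to decide when a download counts as stuck. Here is a minimal Python sketch of that decision using the HH:MM:SS log times; the function names and the 600-second limit are hypothetical choices, not FAHClient settings.

```python
def seconds_since(start_hms, now_hms):
    """Elapsed seconds between two log times (HH:MM:SS), allowing one midnight wrap."""
    def to_seconds(hms):
        h, m, s = (int(x) for x in hms.split(":"))
        return h * 3600 + m * 60 + s
    # Modulo 86400 turns a negative difference (crossed midnight) into the right value.
    return (to_seconds(now_hms) - to_seconds(start_hms)) % 86400


def is_stalled(start_hms, now_hms, limit_s=600):
    """True if a download begun at start_hms has run longer than limit_s (hypothetical default)."""
    return seconds_since(start_hms, now_hms) > limit_s
```

Using the latest timestamp in the log as `now_hms` keeps the check independent of the wall clock on the machine reading the log.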

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:29 pm
by ComputerGenie
foldy wrote:...The original problem I think is that download never finishes.
11:36:24:WU00:FS00:Downloading 5.16MiB

I think that is a known issue where a downloading work unit gets stuck sometimes...
That's what I've been saying. :(
foldy wrote:...The only workaround I know is to restart the FahClient...
Given that it happened to 3 separate systems more than 60 times over the weekend, there's little chance of my continuing to do that. :(
foldy wrote:...The question is why your PCs get this error more often?
If I knew that, I'd have ~3 million more points. :P

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 4:40 pm
by ComputerGenie
bruce wrote:Suggestion: There may be a problem with assignments downloaded with no functional GPUs. Set pause-on-start=true and fix whatever problem causes the GPU:0 error before you download a new assignment.
Because of everything that loads on any given rig, that's already set. The "0" issue was specific to that particular start (maybe due to clearing all of the temp files except themes and config).
bollix47 wrote:When you installed those 3 day old drivers did you re-install the Folding@home software? If not then try that now as described in my previous post.
Yeah, I've tried straight update, update with reinstall, and any number of different possibilities as far as the chicken/egg of drivers and F@H.

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 5:11 pm
by rwh202
That log shows a failed download from 171.67.108.157:8080 and a successful one from 171.67.108.102:8080.

Have you ever had success from .157, or a failure from another server that has sometimes worked, e.g. .102?

Just trying to ascertain whether it's just one server or not.

Also, what happens when you try to access those addresses from a browser on your LAN, do you get the WS splash page for both?

Finally, are all your clients at the same location on the same LAN or is this affecting you in multiple locations?
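To answer the "one server or several" question systematically, a Python sketch like the one below could be run over the current log and its rotated copies (the config above rotates logs into the `logs` directory). It attributes each download to the IP in the preceding `Assigned to work server` line and counts completed versus never-finished downloads per server. The function name and the regexes are assumptions based only on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Per-WU log lines, e.g. "11:36:23:WU00:FS00:Assigned to work server 171.67.108.157".
LINE_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):(.*)$")
ASSIGNED_RE = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")


def tally_by_server(lines):
    """Return (completed, unfinished) Counters of downloads keyed by work-server IP."""
    server = {}        # (wu, slot) -> IP from the most recent assignment
    pending = set()    # (wu, slot) keys with a download started but not completed
    completed, unfinished = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        wu, slot, msg = m.groups()
        key = (wu, slot)
        assigned = ASSIGNED_RE.search(msg)
        if assigned:
            if key in pending:  # reassigned before the old download finished
                unfinished[server[key]] += 1
                pending.discard(key)
            server[key] = assigned.group(1)
        elif msg.startswith("Downloading ") and key in server:
            pending.add(key)
        elif msg.startswith("Download complete") and key in pending:
            completed[server[key]] += 1
            pending.discard(key)
    for key in pending:  # still unfinished when the log ends
        unfinished[server[key]] += 1
    return completed, unfinished
```

On the excerpt from the first post it would report one completed download from .102 and one unfinished download from .157.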

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 5:22 pm
by ComputerGenie
rwh202 wrote:That log shows a failed download from 171.67.108.157:8080 and a successful one from 171.67.108.102:8080

Have you ever had success from .157 or a failure from another server that has sometimes worked e.g. .102...
It has happened with more than one server (ironically, .102 is usually the most afflicted; however, that could just be because it's also the one I'm assigned to most often).
rwh202 wrote:...Also, what happens when you try to access those addresses from a browser on your LAN, do you get the WS splash page for both?...
Yes, the splash page shows fine, and pings and tracert look fine too (usually under 10 hops).

rwh202 wrote:...Finally, are all your clients at the same location on the same LAN or is this affecting you in multiple locations?
Same geographic location, 2 separate buildings, and 1 rig is on a different LAN (if that's at all possible to follow in a text-based forum).

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 7:16 pm
by ComputerGenie
Is there any chance that it's a time sync issue (either in the software or on my end)?
I'm looking at the most recent WU...
on the "Status" tab, it says "Assigned: 2017-06-12T19:02:46Z"
in the log is: "19:02:22:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)"
Meaning that 24 seconds before status claims it was assigned, it was 3% done. :shock:
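The 24-second gap can be checked directly. The log header reads `Log Started 2017-06-12T11:36:03Z` and the first lines are stamped `11:36:03`, so the log times appear to be UTC, like the `Assigned` field. Here is a minimal Python sketch, assuming both timestamps fall on the same UTC day (`seconds_between` is a hypothetical helper):

```python
from datetime import datetime, timezone


def seconds_between(assigned_iso, log_hms):
    """Seconds from the 'Assigned' time to a same-day log time (negative = log is earlier)."""
    assigned = datetime.strptime(assigned_iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    h, m, s = (int(x) for x in log_hms.split(":"))
    logged = assigned.replace(hour=h, minute=m, second=s)  # same date and tz, log's time of day
    return (logged - assigned).total_seconds()


print(seconds_between("2017-06-12T19:02:46Z", "19:02:22"))  # -24.0
```

A negative result means the progress line was logged before the recorded assignment time, which is the inconsistency described above.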

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 7:56 pm
by rwh202
ComputerGenie wrote:Is there a possible chance that it's a time sync issue (either in the software or on my end)?
I'm looking at the most recent WU...
on the "Status" tab, it says "Assigned: 2017-06-12T19:02:46Z"
in the log is: "19:02:22:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)"
Meaning that 24 seconds before status claims it was assigned, it was 3% done. :shock:
That's an interesting observation. Not sure I've ever seen that - just checked on 3 slots and the assigned time matches the download entry in the log to within 2 seconds. Can't understand how it could impact things to that extent but running out of other ideas. Is your system time accurate or is it the server that's out?

With the separate LANs is there any shared hardware at all? The only times I've had similar issues, it was an overheating network switch that would drop packets when under load and then a dodgy cable, both causing permanently hung downloads in FAHClient.

Re: Sporadic issues with 171.67.108.xx servers

Posted: Mon Jun 12, 2017 8:13 pm
by ComputerGenie
rwh202 wrote:...Can't understand how it could impact things to that extent but running out of other ideas. Is your system time accurate or is it the server that's out?

With the separate LANs is there any shared hardware at all? The only times I've had similar issues, it was an overheating network switch that would drop packets when under load and then a dodgy cable, both causing permanently hung downloads in FAHClient.
All of my systems are synced at least once per day (I have one that is synced 3 times per day with a specific German server, but that's totally unrelated).
The only hardware that all 3 systems share is a single 4-port switch (and technically a modem).