Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 2:34 pm
by foldy
SteveWillis wrote: Mine has been running all day without missing a beat and the script hasn't triggered the pause/unpause even once. I'm going to show you my firewall settings. I messed around with them some and maybe it will be some help.
Code: Select all
Status: active
To                         Action      From
--                         ------      ----
Anywhere                   REJECT      171.67.108.105
Anywhere                   ALLOW       171.67.108.102
Anywhere                   REJECT OUT  171.67.108.105
Anywhere                   ALLOW OUT   171.67.108.102
171.67.108.102             ALLOW OUT   Anywhere
171.67.108.105             REJECT OUT  Anywhere
I see you also reject outgoing traffic to 171.67.108.105?
For LINUX this is:
Code: Select all
sudo ufw reject in from 171.67.108.105
sudo ufw reject out to 171.67.108.105
To remove when server works again:
Code: Select all
sudo ufw delete reject in from 171.67.108.105
sudo ufw delete reject out to 171.67.108.105
For WINDOWS this is:
Code: Select all
netsh advfirewall firewall add rule name="FahClient workaround" dir=in interface=any action=block remoteip=171.67.108.105
netsh advfirewall firewall add rule name="FahClient workaround" dir=out interface=any action=block remoteip=171.67.108.105
To remove when server works again:
Code: Select all
netsh advfirewall firewall delete rule name="FahClient workaround"
Paste them into an admin cmd shell and press the Enter key. It should say OK.
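If you have several machines to set up, the commands above can be generated instead of retyped. Here is a minimal Python sketch, assuming you only want the exact ufw and netsh lines from this post with a different IP filled in. The function name `workaround_commands` is made up for illustration, and it only builds the command strings; it does not run anything.

```python
import ipaddress

RULE_NAME = 'name="FahClient workaround"'  # rule name taken from the post above


def workaround_commands(ip, system, remove=False):
    """Build the firewall commands from the post for one work-server IP.

    Hypothetical helper: it only returns command strings, it runs nothing.
    """
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    if system == "linux":
        verb = "sudo ufw delete reject" if remove else "sudo ufw reject"
        return [f"{verb} in from {ip}", f"{verb} out to {ip}"]
    if system == "windows":
        base = "netsh advfirewall firewall"
        if remove:
            return [f"{base} delete rule {RULE_NAME}"]
        return [
            f"{base} add rule {RULE_NAME} dir={direction} "
            f"interface=any action=block remoteip={ip}"
            for direction in ("in", "out")
        ]
    raise ValueError(f"unsupported system: {system!r}")


for cmd in workaround_commands("171.67.108.105", "linux"):
    print(cmd)
```

Pass `remove=True` to get the matching delete commands once the server works again.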
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 6:54 pm
by PS3EdOlkkola
@foldy, thank you for the workaround. I think I'll keep this rule in place even after the server issue is fixed. Why? Because if the researcher(s) who own the project(s) on this server can't be bothered to fix the issue on a weekend, then all of my 41 slots of high-performance GPUs will go to other projects.
In the real world outside of academia, the individuals responsible for wasting untold thousands of hours of unrecoverable donor time would be unceremoniously fired. Sure, there is tremendous demand for computational biologists and data scientists, but there are two questions that all potential employees emanating from Stanford PL/FAH should be asked by their prospective employers: "In your research work, did you ever have an instance where one of your computational projects became disabled? After being notified, what specifically did you do to remedy the situation, and how long did it take to get your project restored?" If the answer has any tinge of procrastination where they weren't on it like white-on-rice, and/or (as is the case here) crickets when it comes to notifying donors of the issue and what's being done to fix it, then move on to the next candidate.
Spending the better part of the weekend pausing and unpausing slots hoping to get a different server is a Whiskey Tango Foxtrot situation.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 7:48 pm
by foldy
The workaround was found by SteveWillis. Some say it is not working, while others have had success with it.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 8:04 pm
by bollix47
I'm not using any form of workaround and I've had no problem with assignments since yesterday ... a few "Server did not assign ..." but only one or two at a time and work continued pretty much immediately. I have not used the Pause/fold sequence once today.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 9:06 pm
by rwh202
Just to echo PS3EdOlkkola, there is a serious lack of respect and responsibility being shown in the handling of this issue.
I was ill at the weekend and literally crawling from room to room to sort out stuck clients and firewall rules - that's how seriously I take my 'responsibility' to donate to this project - I just wish there was some evidence of likewise at Stanford.
They have an amazing petaflop scale resource at their disposal, but just because they aren't paying for it doesn't mean it should be taken for granted.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Mon May 29, 2017 11:34 pm
by SteveWillis
I should mention that my older machine has also not had any problem at all. Only my newer machine had the problem. I mentioned it earlier but didn't bother to include my log.
Code: Select all
*********************** Log Started 2017-05-29T23:18:46Z ***********************
23:18:46:************************* Folding@home Client *************************
23:18:46: Website: http://folding.stanford.edu/
23:18:46: Copyright: (c) 2009-2014 Stanford University
23:18:46: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:18:46: Args: --child --lifeline 1895 /etc/fahclient/config.xml --run-as
23:18:46: fahclient --pid-file=/var/run/fahclient.pid --daemon
23:18:46: Config: /etc/fahclient/config.xml
23:18:46:******************************** Build ********************************
23:18:46: Version: 7.4.4
23:18:46: Date: Mar 4 2014
23:18:46: Time: 12:02:38
23:18:46: SVN Rev: 4130
23:18:46: Branch: fah/trunk/client
23:18:46: Compiler: GNU 4.4.7
23:18:46: Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
23:18:46: -fno-unsafe-math-optimizations -msse2
23:18:46: Platform: linux2 3.2.0-1-amd64
23:18:46: Bits: 64
23:18:46: Mode: Release
23:18:46:******************************* System ********************************
23:18:46: CPU: AMD FX(tm)-8320 Eight-Core Processor
23:18:46: CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
23:18:46: CPUs: 8
23:18:46: Memory: 31.32GiB
23:18:46:Free Memory: 30.66GiB
23:18:46: Threads: POSIX_THREADS
23:18:46: OS Version: 3.19
23:18:46:Has Battery: false
23:18:46: On Battery: false
23:18:46: UTC Offset: -5
23:18:46: PID: 1897
23:18:46: CWD: /var/lib/fahclient
23:18:46: OS: Linux 3.19.0-32-generic x86_64
23:18:46: OS Arch: AMD64
23:18:46: GPUs: 6
23:18:46: GPU 0: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46: GPU 1: UNSUPPORTED: NV3 [PCI]
23:18:46: GPU 2: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46: GPU 3: UNSUPPORTED: NV3 [PCI]
23:18:46: GPU 4: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46: GPU 5: UNSUPPORTED: NV3 [PCI]
23:18:46: CUDA: 6.1
23:18:46:CUDA Driver: 8000
23:18:46:***********************************************************************
23:18:46:<config>
23:18:46: <!-- Client Control -->
23:18:46: <fold-anon v='true'/>
23:18:46:
23:18:46: <!-- Folding Core -->
23:18:46: <checkpoint v='30'/>
23:18:46:
23:18:46: <!-- Folding Slot Configuration -->
23:18:46: <cause v='HUNTINGTONS'/>
23:18:46:
23:18:46: <!-- Network -->
23:18:46: <proxy v=':8080'/>
23:18:46:
23:18:46: <!-- Slot Control -->
23:18:46: <power v='full'/>
23:18:46:
23:18:46: <!-- User Information -->
23:18:46: <passkey v='********************************'/>
23:18:46: <team v='224497'/>
23:18:46: <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
23:18:46:
23:18:46: <!-- Work Unit Control -->
23:18:46: <next-unit-percentage v='100'/>
23:18:46:
23:18:46: <!-- Folding Slots -->
23:18:46: <slot id='0' type='GPU'/>
23:18:46: <slot id='1' type='GPU'/>
23:18:46: <slot id='2' type='GPU'/>
23:18:46:</config>
23:18:46:Switching to user fahclient
23:18:46:Trying to access database...
23:18:46:Successfully acquired database lock
23:18:46:Enabled folding slot 00: READY gpu:0:GP104 [GeForce GTX 1080] 8873
23:18:46:Enabled folding slot 01: READY gpu:2:GP104 [GeForce GTX 1080] 8873
23:18:46:Enabled folding slot 02: READY gpu:4:GP104 [GeForce GTX 1080] 8873
23:18:46:WU01:FS02:Starting
23:18:46:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 2 -gpu-vendor nvidia
23:18:46:WU01:FS02:Started FahCore on PID 1907
23:18:46:WU01:FS02:Core PID:1911
23:18:46:WU01:FS02:FahCore 0x21 started
23:18:47:WU03:FS01:Starting
23:18:47:WU03:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 03 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 1 -gpu-vendor nvidia
23:18:47:WU03:FS01:Started FahCore on PID 1923
23:18:47:WU03:FS01:Core PID:1927
23:18:47:WU03:FS01:FahCore 0x18 started
23:18:48:WU02:FS00:Starting
23:18:48:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
23:18:48:WU02:FS00:Started FahCore on PID 1944
23:18:48:WU02:FS00:Core PID:1948
23:18:48:WU02:FS00:FahCore 0x21 started
23:18:50:WU02:FS00:0x21:*********************** Log Started 2017-05-29T23:18:49Z ***********************
23:18:50:WU02:FS00:0x21:Project: 11407 (Run 0, Clone 14, Gen 611)
23:18:50:WU02:FS00:0x21:Unit: 0x000003308ca304f25686b2425fe1bc9b
23:18:50:WU02:FS00:0x21:CPU: 0x00000000000000000000000000000000
23:18:50:WU02:FS00:0x21:Machine: 0
23:18:50:WU02:FS00:0x21:Digital signatures verified
23:18:50:WU02:FS00:0x21:Folding@home GPU Core21 Folding@home Core
23:18:50:WU02:FS00:0x21:Version 0.0.18
23:18:50:WU02:FS00:0x21: Found a checkpoint file
23:18:50:WU01:FS02:0x21:*********************** Log Started 2017-05-29T23:18:49Z ***********************
23:18:50:WU01:FS02:0x21:Project: 11431 (Run 2, Clone 22, Gen 47)
23:18:50:WU01:FS02:0x21:Unit: 0x000000368ca304e858e137b8db71c725
23:18:50:WU01:FS02:0x21:CPU: 0x00000000000000000000000000000000
23:18:50:WU01:FS02:0x21:Machine: 2
23:18:50:WU01:FS02:0x21:Digital signatures verified
23:18:50:WU01:FS02:0x21:Folding@home GPU Core21 Folding@home Core
23:18:50:WU01:FS02:0x21:Version 0.0.18
23:18:50:WU01:FS02:0x21: Found a checkpoint file
23:18:51:WU03:FS01:0x18:*********************** Log Started 2017-05-29T23:18:50Z ***********************
23:18:51:WU03:FS01:0x18:Project: 10490 (Run 20, Clone 0, Gen 989)
23:18:51:WU03:FS01:0x18:Unit: 0x000004738ca304f45537e8e07e943650
23:18:51:WU03:FS01:0x18:CPU: 0x00000000000000000000000000000000
23:18:51:WU03:FS01:0x18:Machine: 1
23:18:51:WU03:FS01:0x18:Digital signatures verified
23:18:51:WU03:FS01:0x18:Folding@home GPU core18
23:18:51:WU03:FS01:0x18:Version 0.0.4
23:18:51:WU03:FS01:0x18: Found a checkpoint file
23:19:04:WU02:FS00:0x21:Completed 1125000 out of 5000000 steps (22%)
23:19:04:WU02:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:19:09:WU01:FS02:0x21:Completed 3000000 out of 5000000 steps (60%)
23:19:09:WU01:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:19:10:WU03:FS01:0x18:Completed 375000 out of 5000000 steps (7%)
23:19:10:WU03:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:19:52:WU03:FS01:0x18:Completed 400000 out of 5000000 steps (8%)
23:19:52:WU02:FS00:0x21:Completed 1150000 out of 5000000 steps (23%)
23:21:10:WU03:FS01:0x18:Completed 450000 out of 5000000 steps (9%)
23:21:28:WU02:FS00:0x21:Completed 1200000 out of 5000000 steps (24%)
23:21:31:WU01:FS02:0x21:Completed 3050000 out of 5000000 steps (61%)
23:22:28:WU03:FS01:0x18:Completed 500000 out of 5000000 steps (10%)
23:23:03:WU02:FS00:0x21:Completed 1250000 out of 5000000 steps (25%)
23:23:51:WU03:FS01:0x18:Completed 550000 out of 5000000 steps (11%)
23:23:53:WU01:FS02:0x21:Completed 3100000 out of 5000000 steps (62%)
23:24:40:WU02:FS00:0x21:Completed 1300000 out of 5000000 steps (26%)
23:25:09:WU03:FS01:0x18:Completed 600000 out of 5000000 steps (12%)
23:26:14:WU01:FS02:0x21:Completed 3150000 out of 5000000 steps (63%)
23:26:16:WU02:FS00:0x21:Completed 1350000 out of 5000000 steps (27%)
23:26:31:WU03:FS01:0x18:Completed 650000 out of 5000000 steps (13%)
23:27:50:WU03:FS01:0x18:Completed 700000 out of 5000000 steps (14%)
23:27:52:WU02:FS00:0x21:Completed 1400000 out of 5000000 steps (28%)
23:28:36:WU01:FS02:0x21:Completed 3200000 out of 5000000 steps (64%)
23:29:08:WU03:FS01:0x18:Completed 750000 out of 5000000 steps (15%)
23:29:27:WU02:FS00:0x21:Completed 1450000 out of 5000000 steps (29%)
23:30:27:WU03:FS01:0x18:Completed 800000 out of 5000000 steps (16%)
23:30:53:WU01:FS02:0x21:Completed 3250000 out of 5000000 steps (65%)
23:30:57:WU02:FS00:0x21:Completed 1500000 out of 5000000 steps (30%)
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 4:04 am
by RABishop
I have been having trouble with the *.105 one of these servers for quite some time now. The server shows as not online, as in down, missing, useless, and more than just a little annoying, as this has been going on for over a day, maybe even longer than that. I have 6 computers with lots of GTX 1080s on them that keep running into this trash. I'm on #3 right now, and here is the log. Oh, well. I guess for this one, never mind. All the GPUs are working now. It's just the CPU that is stuck. It was stuck on that one before, but now it's reading all zeros. I just got the newest version of V7 on this machine, because it was having horrible problems. I'll have to go through the other 5 and see. But I'll still show you the log.
03:46:50:Adding folding slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873
03:46:50:Saving configuration to /etc/fahclient/config.xml
03:46:50:<config>
03:46:50: <!-- Network -->
03:46:50: <proxy v=':8080'/>
03:46:50:
03:46:50: <!-- Slot Control -->
03:46:50: <power v='full'/>
03:46:50:
03:46:50: <!-- User Information -->
03:46:50: <passkey v='********************************'/>
03:46:50: <user v='RABishop'/>
03:46:50:
03:46:50: <!-- Folding Slots -->
03:46:50: <slot id='0' type='CPU'/>
03:46:50: <slot id='1' type='GPU'/>
03:46:50: <slot id='2' type='GPU'/>
03:46:50: <slot id='3' type='GPU'/>
03:46:50:</config>
03:46:51:WU00:FS03:Connecting to 171.67.108.45:80
03:46:51:WU00:FS03:Assigned to work server 171.67.108.105
03:46:51:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.105
03:46:51:WU00:FS03:Connecting to 171.67.108.105:8080
03:46:51:ERROR:WU00:FS03:Exception: Server did not assign work unit
03:46:51:WU00:FS03:Connecting to 171.67.108.45:80
03:46:51:WU00:FS03:Assigned to work server 171.67.108.105
03:46:51:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.105
03:46:51:WU00:FS03:Connecting to 171.67.108.105:8080
03:46:52:ERROR:WU00:FS03:Exception: Server did not assign work unit
03:47:02:WU01:FS01:0x21:Completed 100000 out of 2500000 steps (4%)
03:47:13:WU02:FS02:0x21:Completed 72000 out of 2400000 steps (3%)
03:47:46:WU01:FS01:0x21:Completed 125000 out of 2500000 steps (5%)
03:47:47:Saving configuration to /etc/fahclient/config.xml
03:47:47:<config>
03:47:47: <!-- Network -->
03:47:47: <proxy v=':8080'/>
03:47:47:
03:47:47: <!-- Slot Control -->
03:47:47: <power v='full'/>
03:47:47:
03:47:47: <!-- User Information -->
03:47:47: <passkey v='********************************'/>
03:47:47: <user v='RABishop'/>
03:47:47:
03:47:47: <!-- Folding Slots -->
03:47:47: <slot id='0' type='CPU'/>
03:47:47: <slot id='1' type='GPU'/>
03:47:47: <slot id='2' type='GPU'/>
03:47:47: <slot id='3' type='GPU'/>
03:47:47:</config>
03:47:51:WU00:FS03:Connecting to 171.67.108.45:80
03:47:51:WU00:FS03:Assigned to work server 171.67.108.105
03:47:51:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.105
03:47:51:WU00:FS03:Connecting to 171.67.108.105:8080
03:47:52:ERROR:WU00:FS03:Exception: Server did not assign work unit
03:47:52:WU02:FS02:0x21:Completed 96000 out of 2400000 steps (4%)
03:48:12:WU03:FS00:Connecting to 171.67.108.45:8080
03:48:12:WARNING:WU03:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
03:48:12:WU03:FS00:Connecting to 171.64.65.35:80
03:48:12:WARNING:WU03:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
03:48:12:ERROR:WU03:FS00:Exception: Could not get an assignment
03:48:29:WU01:FS01:0x21:Completed 150000 out of 2500000 steps (6%)
03:48:31:WU02:FS02:0x21:Completed 120000 out of 2400000 steps (5%)
03:49:09:WU02:FS02:0x21:Completed 144000 out of 2400000 steps (6%)
03:49:13:WU01:FS01:0x21:Completed 175000 out of 2500000 steps (7%)
03:49:28:WU00:FS03:Connecting to 171.67.108.45:80
03:49:29:WU00:FS03:Assigned to work server 171.67.108.105
03:49:29:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.105
03:49:29:WU00:FS03:Connecting to 171.67.108.105:8080
03:49:29:ERROR:WU00:FS03:Exception: Server did not assign work unit
03:49:48:WU02:FS02:0x21:Completed 168000 out of 2400000 steps (7%)
03:49:56:WU01:FS01:0x21:Completed 200000 out of 2500000 steps (8%)
03:50:27:WU02:FS02:0x21:Completed 192000 out of 2400000 steps (8%)
03:50:41:WU01:FS01:0x21:Completed 225000 out of 2500000 steps (9%)
03:50:49:WU03:FS00:Connecting to 171.67.108.45:8080
03:50:49:WARNING:WU03:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
03:50:49:WU03:FS00:Connecting to 171.64.65.35:80
03:50:49:WARNING:WU03:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
03:50:49:ERROR:WU03:FS00:Exception: Could not get an assignment
03:51:09:WU02:FS02:0x21:Completed 216000 out of 2400000 steps (9%)
03:51:25:WU01:FS01:0x21:Completed 250000 out of 2500000 steps (10%)
03:51:49:WU02:FS02:0x21:Completed 240000 out of 2400000 steps (10%)
03:52:06:WU00:FS03:Connecting to 171.67.108.45:80
03:52:06:WU00:FS03:Assigned to work server 171.67.108.105
03:52:06:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.105
03:52:06:WU00:FS03:Connecting to 171.67.108.105:8080
03:52:06:ERROR:WU00:FS03:Exception: Server did not assign work unit
03:52:09:WU01:FS01:0x21:Completed 275000 out of 2500000 steps (11%)
03:52:28:WU02:FS02:0x21:Completed 264000 out of 2400000 steps (11%)
03:52:53:WU01:FS01:0x21:Completed 300000 out of 2500000 steps (12%)
03:53:08:WU02:FS02:0x21:Completed 288000 out of 2400000 steps (12%)
03:53:37:WU01:FS01:0x21:Completed 325000 out of 2500000 steps (13%)
03:53:47:WU02:FS02:0x21:Completed 312000 out of 2400000 steps (13%)
03:54:21:WU01:FS01:0x21:Completed 350000 out of 2500000 steps (14%)
03:54:26:WU02:FS02:0x21:Completed 336000 out of 2400000 steps (14%)
03:55:03:WU03:FS00:Connecting to 171.67.108.45:8080
03:55:03:WARNING:WU03:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
03:55:03:WU03:FS00:Connecting to 171.64.65.35:80
03:55:04:WARNING:WU03:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
03:55:04:ERROR:WU03:FS00:Exception: Could not get an assignment
03:55:05:WU02:FS02:0x21:Completed 360000 out of 2400000 steps (15%)
03:55:05:WU01:FS01:0x21:Completed 375000 out of 2500000 steps (15%)
03:55:44:WU02:FS02:0x21:Completed 384000 out of 2400000 steps (16%)
03:55:49:WU01:FS01:0x21:Completed 400000 out of 2500000 steps (16%)
03:56:20:WU00:FS03:Connecting to 171.67.108.45:80
03:56:20:WU00:FS03:Assigned to work server 171.67.108.160
03:56:20:WU00:FS03:Requesting new work unit for slot 03: READY gpu:4:GP104 [GeForce GTX 1080] 8873 from 171.67.108.160
03:56:20:WU00:FS03:Connecting to 171.67.108.160:8080
03:56:23:WU00:FS03:Downloading 2.02MiB
03:56:23:WU00:FS03:Download complete
03:56:23:WU00:FS03:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9839 run:0 clone:26 gen:205 core:0x21 unit:0x000000eeab436ca05890cac62307f54f
03:56:23:WU00:FS03:Starting
03:56:23:WU00:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1459 -checkpoint 15 -gpu 2 -gpu-vendor nvidia
03:56:23:WU00:FS03:Started FahCore on PID 4042
03:56:23:WU00:FS03:Core PID:4046
03:56:23:WU00:FS03:FahCore 0x21 started
03:56:24:WU02:FS02:0x21:Completed 408000 out of 2400000 steps (17%)
03:56:24:WU00:FS03:0x21:*********************** Log Started 2017-05-30T03:56:23Z ***********************
03:56:24:WU00:FS03:0x21:Project: 9839 (Run 0, Clone 26, Gen 205)
03:56:24:WU00:FS03:0x21:Unit: 0x000000eeab436ca05890cac62307f54f
03:56:24:WU00:FS03:0x21:CPU: 0x00000000000000000000000000000000
03:56:24:WU00:FS03:0x21:Machine: 3
03:56:24:WU00:FS03:0x21:Reading tar file core.xml
03:56:24:WU00:FS03:0x21:Reading tar file integrator.xml
03:56:24:WU00:FS03:0x21:Reading tar file state.xml
03:56:24:WU00:FS03:0x21:Reading tar file system.xml
03:56:24:WU00:FS03:0x21:Digital signatures verified
03:56:24:WU00:FS03:0x21:Folding@home GPU Core21 Folding@home Core
03:56:24:WU00:FS03:0x21:Version 0.0.18
03:56:27:WU00:FS03:0x21:Completed 0 out of 2400000 steps (0%)
03:56:27:WU00:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:56:33:WU01:FS01:0x21:Completed 425000 out of 2500000 steps (17%)
03:57:03:WU02:FS02:0x21:Completed 432000 out of 2400000 steps (18%)
03:57:03:WU00:FS03:0x21:Completed 24000 out of 2400000 steps (1%)
03:57:17:WU01:FS01:0x21:Completed 450000 out of 2500000 steps (18%)
03:57:39:WU00:FS03:0x21:Completed 48000 out of 2400000 steps (2%)
03:57:41:WU02:FS02:0x21:Completed 456000 out of 2400000 steps (19%)
03:58:01:WU01:FS01:0x21:Completed 475000 out of 2500000 steps (19%)
03:58:15:WU00:FS03:0x21:Completed 72000 out of 2400000 steps (3%)
03:58:20:WU02:FS02:0x21:Completed 480000 out of 2400000 steps (20%)
03:58:45:WU01:FS01:0x21:Completed 500000 out of 2500000 steps (20%)
03:58:51:WU00:FS03:0x21:Completed 96000 out of 2400000 steps (4%)
03:58:59:WU02:FS02:0x21:Completed 504000 out of 2400000 steps (21%)
03:59:28:WU00:FS03:0x21:Completed 120000 out of 2400000 steps (5%)
03:59:30:WU01:FS01:0x21:Completed 525000 out of 2500000 steps (21%)
03:59:38:WU02:FS02:0x21:Completed 528000 out of 2400000 steps (22%)
04:00:04:WU00:FS03:0x21:Completed 144000 out of 2400000 steps (6%)
04:00:13:WU01:FS01:0x21:Completed 550000 out of 2500000 steps (22%)
04:00:17:WU02:FS02:0x21:Completed 552000 out of 2400000 steps (23%)
04:00:40:WU00:FS03:0x21:Completed 168000 out of 2400000 steps (7%)
04:00:55:WU02:FS02:0x21:Completed 576000 out of 2400000 steps (24%)
04:00:57:WU01:FS01:0x21:Completed 575000 out of 2500000 steps (23%)
04:01:17:WU00:FS03:0x21:Completed 192000 out of 2400000 steps (8%)
04:01:34:WU02:FS02:0x21:Completed 600000 out of 2400000 steps (25%)
04:01:41:WU01:FS01:0x21:Completed 600000 out of 2500000 steps (24%)
04:01:53:WU00:FS03:0x21:Completed 216000 out of 2400000 steps (9%)
04:01:55:WU03:FS00:Connecting to 171.67.108.45:8080
04:01:55:WARNING:WU03:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
04:01:55:WU03:FS00:Connecting to 171.64.65.35:80
04:01:55:WARNING:WU03:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
04:01:55:ERROR:WU03:FS00:Exception: Could not get an assignment
04:02:13:WU02:FS02:0x21:Completed 624000 out of 2400000 steps (26%)
04:02:26:WU01:FS01:0x21:Completed 625000 out of 2500000 steps (25%)
04:02:29:WU00:FS03:0x21:Completed 240000 out of 2400000 steps (10%)
It hasn't been running long.
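For anyone wanting to see at a glance which work server a slot keeps failing against, the log can be tallied with a short script. This is a rough Python sketch, assuming the log lines look exactly like the ones pasted above ("Assigned to work server ..." followed by "Server did not assign work unit"). The function name and the regular expressions are my own and not part of FAHClient.

```python
import re
from collections import Counter

# Patterns written against the log lines shown in this post (an assumption:
# other client versions may format these lines differently).
ASSIGNED = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:(FS\d+):Assigned to work server (\S+)"
)
FAILED = re.compile(
    r"^\d\d:\d\d:\d\d:ERROR:WU\d+:(FS\d+):Exception: Server did not assign work unit"
)


def count_failed_assignments(lines):
    """Count 'Server did not assign work unit' errors per (slot, work server)."""
    last_server = {}      # slot -> work server most recently assigned to it
    failures = Counter()  # (slot, server) -> number of failed requests
    for line in lines:
        match = ASSIGNED.match(line)
        if match:
            last_server[match.group(1)] = match.group(2)
            continue
        match = FAILED.match(line)
        if match:
            slot = match.group(1)
            failures[(slot, last_server.get(slot, "unknown"))] += 1
    return failures


sample = [
    "03:46:51:WU00:FS03:Connecting to 171.67.108.45:80",
    "03:46:51:WU00:FS03:Assigned to work server 171.67.108.105",
    "03:46:51:WU00:FS03:Connecting to 171.67.108.105:8080",
    "03:46:51:ERROR:WU00:FS03:Exception: Server did not assign work unit",
    "03:56:20:WU00:FS03:Assigned to work server 171.67.108.160",
]
for (slot, server), count in count_failed_assignments(sample).items():
    print(slot, server, count)
```

To run it on a real log, pass an open file instead of `sample`. The log path depends on your install, so check where your client writes its log. The CPU slot's "Could not get an assignment" errors are a different message and are deliberately not counted here.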
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 5:29 am
by TristanChen
PS3EdOlkkola's sentiments are spot on. Letting scores of petaflops of research computing power go to waste is a damn shame in any setting. Savvy donors will take their computing power elsewhere. The guys who keep coming back to prod for action are the guys who care... If you look at the top 10 folders in the world, nearly all of them have had their output cut by a third to a half. Some (like #2 folder in the world msi_TW) have quit entirely since the issue began.
Judging from my 24 GPUs, 171.67.108.45/171.67.108.105 is pretty much causing 100% of the issues, and has been for the past weekend. Even after implementing the firewall rules suggested on this forum, FAH still calls the server and stalls. I've had to manually shut down the client and restart multiple times (sometimes as many as 10 times) until the client stops calling 171.67.108.45. As of an hour ago, 80% of WU requests from my computers at home are still pointing to 171.67.108.45, and this appears to be especially the case for high-end Nvidia 10x-series on my most productive machines. Over an entire unsupervised day, the failure rate of these cards is basically 100%.
My father died of cancer 3 years ago. Since then, I've dedicated myself to becoming a FAH billionaire. Currently running 8 1080Tis, 2 1080s, 9 1070s, 1 1060, 1 690, 1 7970, 1 670, and baby r7-260 through several different accounts... And yes, to echo PS3EdOlkkola, this past weekend has been a whiskey-tango-foxtrot episode from the twilight zone.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 6:20 am
by TristanChen
Just reconfirming that the 171.67.108.45/171.67.108.105 combo is still causing nearly 100% of my Nvidia 10x-series cards to stall this morning. It seems to be the default server for these cards, and is called 8 out of 10 times, meaning that over 24 unsupervised hrs, folding slots with these newer Nvidia cards will stall nearly 100% of the time. For Windows 10, the previously suggested firewall settings did not fix the issue for me, and so I've had to manually shut down and relaunch FAHClient sometimes as many as 10 times before a different, working server is called. Really really frustrating. It looks like some of our top folders have already quit (e.g. msi_TW) and the output of nearly everyone in the top 10 worldwide has been cut by 1/3 to 1/2. Quite the fiasco.
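The jump from "8 out of 10 requests" to "nearly 100% of slots stalled" can be sanity-checked with a quick calculation. This is a rough Python sketch under two simplifying assumptions that are not measured facts: each work request independently lands on the bad server with probability 0.8, and an unsupervised slot stays stalled once it does. The function name is only for illustration.

```python
def chance_slot_still_running(p_bad, requests):
    """Probability that a slot avoided the bad server on every request so far.

    Assumes independent requests and that one bad assignment stalls the slot
    for good (a simplification of what the logs in this thread show).
    """
    return (1.0 - p_bad) ** requests


for n in (1, 2, 3, 5):
    print(n, chance_slot_still_running(0.8, n))
```

Under those assumptions a slot has only a 0.2 chance of getting past its first request and well under a 1% chance of getting past three, so a fast card that finishes several WUs a day is almost certain to stall if left alone.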
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 9:03 am
by midnytwarrior
Hello all
In my case the workaround from foldy initially worked for a full day yesterday (Thank You Foldy!).
However, now it's not working.
I did it over and over again and even restarted my system.
I did the pause and un-pause method and I'm still being directed to that *105 server.
I hope someone from Stanford will be able to rectify this issue.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 10:51 am
by boristsybin
i switched my (ge)Force(s) to the Dark Side (mining)
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 12:28 pm
by PS3EdOlkkola
@Adam A. Wanderer 41 GPU folding slots plus 1 Intel Phi 7210 (256 CPUs) delivers between 34 and 35 million points per day when there are no issues. The last two days have been between 40% and 75% of the normal total, and only that high because I've had to stay on top of pausing/unpausing slots every couple of hours since Friday afternoon.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 1:00 pm
by Serge_Grenier
Seems <client-type v='beta'/> is working to get WUs since yesterday.
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 1:07 pm
by Nert
This whole episode is sad and disrespectful to the people that contribute to this project. Two questions come to mind:
1) Why do the volunteer contributors have a sense of urgency and those responsible for the project do not?
2) These problems ALWAYS seem to happen over holiday weekends. Is everything so fragile that it fails when no one is there to hand hold the systems and keep them running?
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105
Posted: Tue May 30, 2017 2:33 pm
by boristsybin
Serge_Grenier wrote: Seems <client-type v='beta'/> is working to get WUs since yesterday.
seems it works