unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Sun Mar 22, 2015 10:14 am
by hisui
I can reach that server in a browser (it returns 401 HTTP UNAUTHORIZED for /), and both ping and traceroute look fine. This problem has lasted over 24 hours.
Also, http://assign3.stanford.edu:8080/ doesn't load completely (it always gets stuck at 80-90%).
Other servers such as 155.247.166.219:8080 work fine (they seem to be hosted on a different network).
Ping result:
Pinging 171.64.65.124 with 32 bytes of data:
Reply from 171.64.65.124: bytes=32 time=170ms TTL=44
Reply from 171.64.65.124: bytes=32 time=171ms TTL=44
Reply from 171.64.65.124: bytes=32 time=172ms TTL=44
Reply from 171.64.65.124: bytes=32 time=170ms TTL=44

tcping result:
Probing 171.64.65.124:8080/tcp - Port is open - time=175.356ms
Probing 171.64.65.124:8080/tcp - Port is open - time=170.436ms
Probing 171.64.65.124:8080/tcp - Port is open - time=169.812ms
Probing 171.64.65.124:8080/tcp - Port is open - time=169.943ms
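
For anyone without tcping, here is a minimal Python sketch of the same kind of check, using only the standard library. `tcp_probe` is a hypothetical helper name, not a tool used in this thread. Note that a successful connect only shows that the TCP handshake works; it says nothing about whether a large transfer will complete.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.perf_counter()
    try:
        # create_connection performs the full TCP handshake and then closes.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

Calling `tcp_probe("171.64.65.124", 8080)` a few times in a row should give figures comparable to the tcping output above, assuming the server is reachable from your network.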

Error log below:

Code:

22:20:58:Trying to access database...
22:20:58:Successfully acquired database lock
22:20:58:Enabled folding slot 00: READY smp:2
22:20:58:WU00:FS00:Starting
22:20:58:WU00:FS00:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe E:/FAHCdata/cores/web.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 702 -lifeline 3156 -checkpoint 15 -np 2
22:20:58:WU00:FS00:Started FahCore on PID 3236
22:20:58:WU00:FS00:Core PID:3248
22:20:58:WU00:FS00:FahCore 0xa4 started
22:20:59:WU00:FS00:0xa4:
22:20:59:WU00:FS00:0xa4:*------------------------------*
22:20:59:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
22:20:59:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
22:20:59:WU00:FS00:0xa4:
22:20:59:WU00:FS00:0xa4:Preparing to commence simulation
22:20:59:WU00:FS00:0xa4:- Looking at optimizations...
22:20:59:WU00:FS00:0xa4:- Files status OK
22:20:59:WU00:FS00:0xa4:- Expanded 118604 -> 268860 (decompressed 226.6 percent)
22:20:59:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=118604 data_size=268860, decompressed_data_size=268860 diff=0
22:20:59:WU00:FS00:0xa4:- Digital signature verified
22:20:59:WU00:FS00:0xa4:
22:20:59:WU00:FS00:0xa4:Project: 6397 (Run 37, Clone 6, Gen 85)
22:20:59:WU00:FS00:0xa4:
22:20:59:WU00:FS00:0xa4:Assembly optimizations on if available.
22:20:59:WU00:FS00:0xa4:Entering M.D.
22:21:00:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
22:21:04:WU00:FS00:0xa4:Using Gromacs checkpoints
22:21:04:WU00:FS00:0xa4:Mapping NT from 2 to 2 
22:21:05:WU00:FS00:0xa4:Resuming from checkpoint
22:21:05:WU00:FS00:0xa4:Verified 00/wudata_01.log
22:21:05:WU00:FS00:0xa4:Verified 00/wudata_01.trr
22:21:05:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
22:21:05:WU00:FS00:0xa4:Verified 00/wudata_01.edr
22:21:05:WU00:FS00:0xa4:Completed 4146440 out of 5000000 steps  (82%)
22:21:30:WU00:FS00:0xa4:Completed 4150000 out of 5000000 steps  (83%)
22:27:33:WU00:FS00:0xa4:Completed 4200000 out of 5000000 steps  (84%)
22:33:34:WU00:FS00:0xa4:Completed 4250000 out of 5000000 steps  (85%)
22:39:35:WU00:FS00:0xa4:Completed 4300000 out of 5000000 steps  (86%)
22:45:35:WU00:FS00:0xa4:Completed 4350000 out of 5000000 steps  (87%)
22:51:37:WU00:FS00:0xa4:Completed 4400000 out of 5000000 steps  (88%)
22:57:37:WU00:FS00:0xa4:Completed 4450000 out of 5000000 steps  (89%)
23:03:39:WU00:FS00:0xa4:Completed 4500000 out of 5000000 steps  (90%)
23:09:41:WU00:FS00:0xa4:Completed 4550000 out of 5000000 steps  (91%)
23:15:43:WU00:FS00:0xa4:Completed 4600000 out of 5000000 steps  (92%)
23:21:45:WU00:FS00:0xa4:Completed 4650000 out of 5000000 steps  (93%)
23:27:46:WU00:FS00:0xa4:Completed 4700000 out of 5000000 steps  (94%)
23:33:48:WU00:FS00:0xa4:Completed 4750000 out of 5000000 steps  (95%)
23:39:51:WU00:FS00:0xa4:Completed 4800000 out of 5000000 steps  (96%)
23:45:53:WU00:FS00:0xa4:Completed 4850000 out of 5000000 steps  (97%)
23:51:56:WU00:FS00:0xa4:Completed 4900000 out of 5000000 steps  (98%)
23:58:02:WU00:FS00:0xa4:Completed 4950000 out of 5000000 steps  (99%)
23:58:03:WU01:FS00:Connecting to assign3.stanford.edu:8080
23:58:09:WU01:FS00:News: 
23:58:09:WU01:FS00:Assigned to work server 171.64.65.124
23:58:09:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:2 from 171.64.65.124
23:58:09:WU01:FS00:Connecting to 171.64.65.124:8080
23:58:12:WU01:FS00:Downloading 897.22KiB
23:58:56:WU01:FS00:Download 7.13%
23:59:18:WU01:FS00:Download 14.27%
00:04:07:WU00:FS00:0xa4:Completed 5000000 out of 5000000 steps  (100%)
00:04:07:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
00:04:17:WU00:FS00:0xa4:
00:04:17:WU00:FS00:0xa4:Finished Work Unit:
00:04:17:WU00:FS00:0xa4:- Reading up to 1254528 from "00/wudata_01.trr": Read 1254528
00:04:17:WU00:FS00:0xa4:trr file hash check passed.
00:04:17:WU00:FS00:0xa4:- Reading up to 110140 from "00/wudata_01.xtc": Read 110140
00:04:17:WU00:FS00:0xa4:xtc file hash check passed.
00:04:17:WU00:FS00:0xa4:edr file hash check passed.
00:04:17:WU00:FS00:0xa4:logfile size: 88603
00:04:17:WU00:FS00:0xa4:Leaving Run
00:04:18:WU00:FS00:0xa4:- Writing 1525171 bytes of core data to disk...
00:04:19:WU00:FS00:0xa4:Done: 1524659 -> 1290327 (compressed to 84.6 percent)
00:04:19:WU00:FS00:0xa4:  ... Done.
00:04:19:WU00:FS00:0xa4:- Shutting down core
00:04:19:WU00:FS00:0xa4:
00:04:19:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
00:04:19:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
00:04:19:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:6397 run:37 clone:6 gen:85 core:0xa4 unit:0x000000600002894b5462c9a4a9d5ac50
00:04:19:WU00:FS00:Uploading 1.23MiB to 155.247.166.219
00:04:19:WU00:FS00:Connecting to 155.247.166.219:8080
00:04:32:WU00:FS00:Upload 10.15%
00:04:40:WU00:FS00:Upload 20.31%
00:04:46:WU00:FS00:Upload 25.39%
00:04:55:WU00:FS00:Upload 35.54%
00:05:01:WU00:FS00:Upload 45.69%
00:05:10:WU00:FS00:Upload 55.85%
00:05:18:WU00:FS00:Upload 66.00%
00:05:26:WU00:FS00:Upload 76.16%
00:05:35:WU00:FS00:Upload 86.31%
00:05:45:WU00:FS00:Upload 96.46%
00:05:49:WU00:FS00:Upload complete
00:05:49:WU00:FS00:Server responded WORK_ACK (400)
00:05:49:WU00:FS00:Final credit estimate, 1239.00 points
00:05:49:WU00:FS00:Cleaning up
******************************** Date: 22/03/15 ********************************
09:52:36:Lost lifeline PID 3132, exiting
09:52:37:Server connection id=1 ended
Another example:

Code:

*********************** Log Started 2015-03-21T13:50:11Z ***********************
13:50:11:************************* Folding@home Client *************************
13:50:11:      Website: http://folding.stanford.edu/
13:50:11:    Copyright: (c) 2009-2012 Stanford University
13:50:11:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:50:11:         Args: --lifeline 3300 --command-port=36330
13:50:11:       Config: E:/FAHCdata/config.xml
13:50:11:******************************** Build ********************************
13:50:11:      Version: 7.2.9
13:50:11:         Date: Oct 3 2012
13:50:11:         Time: 18:05:48
13:50:11:      SVN Rev: 3578
13:50:11:       Branch: fah/trunk/client
13:50:11:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
13:50:11:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
13:50:11:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
13:50:11:     Platform: win32 XP
13:50:11:         Bits: 32
13:50:11:         Mode: Release
13:50:11:******************************* System ********************************
13:50:11:          CPU: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
13:50:11:       CPU ID: GenuineIntel Family 6 Model 23 Stepping 10
13:50:11:         CPUs: 4
13:50:11:       Memory: 3.93GiB
13:50:11:  Free Memory: 2.99GiB
13:50:11:      Threads: WINDOWS_THREADS
13:50:11:   On Battery: false
13:50:11:   UTC offset: 8
13:50:11:          PID: 1920
13:50:11:          CWD: E:/FAHCdata
13:50:11:           OS: Microsoft Windows Server 2003 Service Pack 2
13:50:11:      OS Arch: X86
13:50:11:         GPUs: 1
13:50:11:        GPU 0: NVIDIA:1 G98 [GeForce 9300 GE]
13:50:11:         CUDA: Not detected
13:50:11:Win32 Service: false
13:50:11:***********************************************************************
13:50:11:<config>
13:50:11:  <!-- Network -->
13:50:11:  <proxy v='127.0.0.1:8080'/>
13:50:11:
13:50:11:  <!-- User Information -->
13:50:11:  <passkey v='********************************'/>
13:50:11:  <user v='Hisui'/>
13:50:11:
13:50:11:  <!-- Folding Slots -->
13:50:11:  <slot id='0' type='SMP'>
13:50:11:    <cpus v='2'/>
13:50:11:  </slot>
13:50:11:</config>
13:50:11:Trying to access database...
13:50:11:Successfully acquired database lock
13:50:11:Enabled folding slot 00: READY smp:2
13:50:11:WU00:FS00:Connecting to assign3.stanford.edu:8080
13:50:12:WU00:FS00:News: 
13:50:12:WU00:FS00:Assigned to work server 171.64.65.100
13:50:12:WU00:FS00:Requesting new work unit for slot 00: READY smp:2 from 171.64.65.100
13:50:12:WU00:FS00:Connecting to 171.64.65.100:8080
13:50:13:WU00:FS00:Downloading 614.23KiB
13:50:14:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
13:50:30:WU00:FS00:Download 10.42%
13:50:50:WU00:FS00:Download 20.84%
13:51:08:WU00:FS00:Download 31.26%
13:52:05:WU00:FS00:Download 41.68%
13:52:49:WU00:FS00:Download 52.10%
13:59:56:Lost lifeline PID 3300, exiting
13:59:57:Server connection id=1 ended

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Sun Mar 22, 2015 3:00 pm
by Joe_H
I was able to connect to all servers you mentioned without any problem, so that points to a network issue between your system and the servers, not to the servers themselves. Is something on your network, or at your ISP, using a lot of bandwidth for streaming video or torrents? Or are you on a wireless connection that suffers from distance or from interference by other devices? I mention this because the upload and download speeds shown are relatively slow compared with my DSL line.
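
To put rough numbers on those speeds, here is a small Python sketch that works them out from the sizes and timestamps in the first log of this thread. The helper names are hypothetical, and the result is only an average over the logged interval. Keep in mind that the download came from 171.64.65.124 while the upload went to 155.247.166.219, so the two figures describe different servers.

```python
from datetime import datetime, timedelta


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS log stamps, assuming at most one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta.total_seconds()


def rate_kib_per_s(total_kib, fraction_done, start, end):
    """Average transfer rate in KiB/s over the logged interval."""
    return total_kib * fraction_done / elapsed_seconds(start, end)


# Figures copied from the first log in this thread.
download = rate_kib_per_s(897.22, 0.1427, "23:58:12", "23:59:18")
upload = rate_kib_per_s(1.23 * 1024, 1.0, "00:04:19", "00:05:49")
print(f"download ~{download:.1f} KiB/s, upload ~{upload:.1f} KiB/s")
```

That works out to roughly 2 KiB/s down and 14 KiB/s up for the intervals shown.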

There is also a known issue with the network code in the folding client: it can sometimes fail to retry a download or upload that has failed for some reason. If this happens, the client will just sit there and never retry the connection. It is more commonly seen when there are network problems, and there are some slight improvements in the current version 7.4.4 client. When this happens, the only way to get the download or upload to resume is to restart the FAHClient process, either by rebooting the system or by manually stopping and restarting that process.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Sun Mar 22, 2015 9:51 pm
by hisui
Thanks for your reply. The problem has now resolved itself, so it seems to have been an ISP-side problem. I didn't change anything.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 8:38 am
by TwistedKestrel
I think there's something up with at least 171.64.65.124, I'm running into the same thing now. Noticed the CPU slot on my client stalled, stopped FAH (had to kill the process) and started it again, and then I got this.

Code:

*********************** Log Started 2015-03-23T08:27:25Z ***********************
08:27:34:FS00:Unpaused
08:27:35:WU01:FS00:Connecting to 171.67.108.200:8080
08:27:39:WU01:FS00:Assigned to work server 171.64.65.124
08:27:39:WU01:FS00:Requesting new work unit for slot 00: READY cpu:4 from 171.64.65.124
08:27:39:WU01:FS00:Connecting to 171.64.65.124:8080
08:27:42:WU01:FS00:Downloading 808.52KiB
08:28:20:WU01:FS00:Download 7.92%
08:29:29:WU01:FS00:Download 10.81%
08:29:29:ERROR:WU01:FS00:Exception: Transfer failed
08:29:30:WU01:FS00:Connecting to 171.67.108.200:8080
08:29:30:WU01:FS00:Assigned to work server 171.64.65.124
08:29:30:WU01:FS00:Requesting new work unit for slot 00: READY cpu:4 from 171.64.65.124
08:29:30:WU01:FS00:Connecting to 171.64.65.124:8080
08:29:31:WU01:FS00:Downloading 902.89KiB
08:30:19:WU01:FS00:Download 7.09%
08:32:28:WU01:FS00:Download 14.18%
Ping and tracert don't show anything unusual.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 3:16 pm
by bruce
How long did you let it retry before you decided to kill the process? Sometimes it takes 5 or 10 minutes before the client will abandon a download connection and connect to another server. The log you posted only shows two download attempts about two minutes apart.

The server logs indicate the server has been normally active over the past 8+ hours.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 4:49 pm
by Joe_H
Also take into consideration that this is a busy server: I have seen as many as a couple thousand WUs listed as returned over the course of an hour.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 5:02 pm
by Grendel
TwistedKestrel wrote:[..]Noticed the CPU slot on my client stalled, stopped FAH (had to kill the process) and started it again[..]
Had/have the same thing going on on three different machines -- CPU and GPU clients all stall out (indefinitely) while receiving a new WU. First noticed it on Sat; ever since, I have had to Pause, Quit, Kill, and restart the clients to get a new WU.

Edit: Notice the time stamps at the end:

Code:

04:48:32:WU00:FS01:0xa4:Completed 247500 out of 250000 steps  (99%)
04:48:33:WU01:FS01:Connecting to 171.67.108.200:8080
04:48:34:WU01:FS01:Assigned to work server 171.64.65.124
04:48:34:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:4 from 171.64.65.124
04:48:34:WU01:FS01:Connecting to 171.64.65.124:8080
04:48:35:WU01:FS01:Downloading 860.71KiB
04:48:51:WU01:FS01:Download 7.44%
04:51:08:WU00:FS01:0xa4:Completed 250000 out of 250000 steps  (100%)
04:51:09:WU00:FS01:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
04:51:19:WU00:FS01:0xa4:
04:51:19:WU00:FS01:0xa4:Finished Work Unit:
04:51:19:WU00:FS01:0xa4:- Reading up to 915792 from "00/wudata_01.trr": Read 915792
04:51:19:WU00:FS01:0xa4:trr file hash check passed.
04:51:19:WU00:FS01:0xa4:- Reading up to 839252 from "00/wudata_01.xtc": Read 839252
04:51:19:WU00:FS01:0xa4:xtc file hash check passed.
04:51:19:WU00:FS01:0xa4:edr file hash check passed.
04:51:19:WU00:FS01:0xa4:logfile size: 23494
04:51:19:WU00:FS01:0xa4:Leaving Run
04:51:23:WU00:FS01:0xa4:- Writing 1781026 bytes of core data to disk...
04:51:24:WU00:FS01:0xa4:Done: 1780514 -> 1722053 (compressed to 96.7 percent)
04:51:24:WU00:FS01:0xa4:  ... Done.
04:51:24:WU00:FS01:0xa4:- Shutting down core
04:51:24:WU00:FS01:0xa4:
04:51:24:WU00:FS01:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
04:51:24:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:51:24:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:9014 run:373 clone:7 gen:128 core:0xa4 unit:0x00000090664f2de45491d4a2588b8185
04:51:24:WU00:FS01:Uploading 1.64MiB to 171.64.65.124
04:51:24:WU00:FS01:Connecting to 171.64.65.124:8080
04:51:30:WU00:FS01:Upload 22.83%
04:51:36:WU00:FS01:Upload 57.07%
04:51:42:WU00:FS01:Upload 87.50%
04:51:56:WU00:FS01:Upload complete
04:51:56:WU00:FS01:Server responded WORK_ACK (400)
04:51:56:WU00:FS01:Final credit estimate, 1310.00 points
04:51:56:WU00:FS01:Cleaning up
******************************* Date: 2015-03-23 *******************************
16:50:20:FS01:Paused
Edit2: Tail-end of log from another machine (GPU+CPU) :

Code:

******************************* Date: 2015-03-21 *******************************
07:11:55:WU01:FS00:0x18:Completed 2225000 out of 2500000 steps (89%)
07:12:06:WU02:FS01:0xa4:Completed 20000 out of 250000 steps  (8%)
07:15:15:WU02:FS01:0xa4:Completed 22500 out of 250000 steps  (9%)
07:18:24:WU02:FS01:0xa4:Completed 25000 out of 250000 steps  (10%)
07:18:26:WU01:FS00:0x18:Completed 2250000 out of 2500000 steps (90%)
07:21:30:WU02:FS01:0xa4:Completed 27500 out of 250000 steps  (11%)
07:24:38:WU02:FS01:0xa4:Completed 30000 out of 250000 steps  (12%)
07:24:55:WU01:FS00:0x18:Completed 2275000 out of 2500000 steps (91%)
07:27:45:WU02:FS01:0xa4:Completed 32500 out of 250000 steps  (13%)
07:30:52:WU02:FS01:0xa4:Completed 35000 out of 250000 steps  (14%)
07:31:25:WU01:FS00:0x18:Completed 2300000 out of 2500000 steps (92%)
07:33:58:WU02:FS01:0xa4:Completed 37500 out of 250000 steps  (15%)
07:37:06:WU02:FS01:0xa4:Completed 40000 out of 250000 steps  (16%)
07:38:13:WU01:FS00:0x18:Completed 2325000 out of 2500000 steps (93%)
07:40:14:WU02:FS01:0xa4:Completed 42500 out of 250000 steps  (17%)
07:43:21:WU02:FS01:0xa4:Completed 45000 out of 250000 steps  (18%)
07:44:43:WU01:FS00:0x18:Completed 2350000 out of 2500000 steps (94%)
07:46:29:WU02:FS01:0xa4:Completed 47500 out of 250000 steps  (19%)
07:49:39:WU02:FS01:0xa4:Completed 50000 out of 250000 steps  (20%)
07:51:12:WU01:FS00:0x18:Completed 2375000 out of 2500000 steps (95%)
07:52:47:WU02:FS01:0xa4:Completed 52500 out of 250000 steps  (21%)
07:55:56:WU02:FS01:0xa4:Completed 55000 out of 250000 steps  (22%)
07:57:41:WU01:FS00:0x18:Completed 2400000 out of 2500000 steps (96%)
07:59:04:WU02:FS01:0xa4:Completed 57500 out of 250000 steps  (23%)
08:03:08:WU02:FS01:0xa4:Completed 60000 out of 250000 steps  (24%)
08:04:30:WU01:FS00:0x18:Completed 2425000 out of 2500000 steps (97%)
08:06:16:WU02:FS01:0xa4:Completed 62500 out of 250000 steps  (25%)
08:09:23:WU02:FS01:0xa4:Completed 65000 out of 250000 steps  (26%)
08:11:00:WU01:FS00:0x18:Completed 2450000 out of 2500000 steps (98%)
08:12:32:WU02:FS01:0xa4:Completed 67500 out of 250000 steps  (27%)
08:15:39:WU02:FS01:0xa4:Completed 70000 out of 250000 steps  (28%)
08:17:30:WU01:FS00:0x18:Completed 2475000 out of 2500000 steps (99%)
08:18:48:WU02:FS01:0xa4:Completed 72500 out of 250000 steps  (29%)
08:21:55:WU02:FS01:0xa4:Completed 75000 out of 250000 steps  (30%)
08:24:00:WU01:FS00:0x18:Completed 2500000 out of 2500000 steps (100%)
08:24:04:WU00:FS00:Connecting to 171.67.108.200:80
08:24:05:WU00:FS00:Assigned to work server 171.64.65.84
08:24:05:WU00:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:GK106 [GeForce GTX 660] from 171.64.65.84
08:24:05:WU00:FS00:Connecting to 171.64.65.84:8080
08:24:07:WU00:FS00:Downloading 3.46MiB
08:24:18:WU01:FS00:0x18:Saving result file logfile_01.txt
08:24:18:WU01:FS00:0x18:Saving result file checkpointState.xml
08:24:20:WU01:FS00:0x18:Saving result file checkpt.crc
08:24:20:WU01:FS00:0x18:Saving result file log.txt
08:24:20:WU01:FS00:0x18:Saving result file positions.xtc
08:24:21:WU01:FS00:0x18:Folding@home Core Shutdown: FINISHED_UNIT
08:24:22:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:24:22:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9111 run:12 clone:6 gen:66 core:0x18 unit:0x000000570a3b1e805473859597d1b4ab
08:24:22:WU01:FS00:Uploading 7.18MiB to 171.64.65.92
08:24:22:WU01:FS00:Connecting to 171.64.65.92:8080
08:24:28:WU01:FS00:Upload 16.53%
08:24:34:WU01:FS00:Upload 35.68%
08:24:40:WU01:FS00:Upload 55.69%
08:24:46:WU01:FS00:Upload 73.96%
08:24:52:WU01:FS00:Upload 92.24%
08:25:01:WU01:FS00:Upload complete
08:25:01:WU01:FS00:Server responded WORK_ACK (400)
08:25:01:WU01:FS00:Final credit estimate, 19811.00 points
08:25:01:WU01:FS00:Cleaning up
08:25:02:WU02:FS01:0xa4:Completed 77500 out of 250000 steps  (31%)
08:25:12:WU00:FS00:Download 1.81%
08:25:42:WU00:FS00:Download 3.61%
08:25:51:WU00:FS00:Download 5.42%
08:28:07:WU02:FS01:0xa4:Completed 80000 out of 250000 steps  (32%)
08:31:13:WU02:FS01:0xa4:Completed 82500 out of 250000 steps  (33%)
08:34:18:WU02:FS01:0xa4:Completed 85000 out of 250000 steps  (34%)
08:37:24:WU02:FS01:0xa4:Completed 87500 out of 250000 steps  (35%)
08:40:29:WU02:FS01:0xa4:Completed 90000 out of 250000 steps  (36%)
08:43:34:WU02:FS01:0xa4:Completed 92500 out of 250000 steps  (37%)
08:46:39:WU02:FS01:0xa4:Completed 95000 out of 250000 steps  (38%)
08:49:44:WU02:FS01:0xa4:Completed 97500 out of 250000 steps  (39%)
08:52:49:WU02:FS01:0xa4:Completed 100000 out of 250000 steps  (40%)
08:55:54:WU02:FS01:0xa4:Completed 102500 out of 250000 steps  (41%)
08:58:58:WU02:FS01:0xa4:Completed 105000 out of 250000 steps  (42%)
09:02:03:WU02:FS01:0xa4:Completed 107500 out of 250000 steps  (43%)
09:05:09:WU02:FS01:0xa4:Completed 110000 out of 250000 steps  (44%)
09:08:13:WU02:FS01:0xa4:Completed 112500 out of 250000 steps  (45%)
09:11:18:WU02:FS01:0xa4:Completed 115000 out of 250000 steps  (46%)
09:14:23:WU02:FS01:0xa4:Completed 117500 out of 250000 steps  (47%)
09:17:27:WU02:FS01:0xa4:Completed 120000 out of 250000 steps  (48%)
09:20:32:WU02:FS01:0xa4:Completed 122500 out of 250000 steps  (49%)
09:23:36:WU02:FS01:0xa4:Completed 125000 out of 250000 steps  (50%)
09:26:40:WU02:FS01:0xa4:Completed 127500 out of 250000 steps  (51%)
09:29:45:WU02:FS01:0xa4:Completed 130000 out of 250000 steps  (52%)
09:32:49:WU02:FS01:0xa4:Completed 132500 out of 250000 steps  (53%)
09:35:54:WU02:FS01:0xa4:Completed 135000 out of 250000 steps  (54%)
09:38:59:WU02:FS01:0xa4:Completed 137500 out of 250000 steps  (55%)
09:42:04:WU02:FS01:0xa4:Completed 140000 out of 250000 steps  (56%)
09:45:08:WU02:FS01:0xa4:Completed 142500 out of 250000 steps  (57%)
09:48:12:WU02:FS01:0xa4:Completed 145000 out of 250000 steps  (58%)
09:51:16:WU02:FS01:0xa4:Completed 147500 out of 250000 steps  (59%)
09:54:21:WU02:FS01:0xa4:Completed 150000 out of 250000 steps  (60%)
09:57:24:WU02:FS01:0xa4:Completed 152500 out of 250000 steps  (61%)
10:00:28:WU02:FS01:0xa4:Completed 155000 out of 250000 steps  (62%)
10:03:33:WU02:FS01:0xa4:Completed 157500 out of 250000 steps  (63%)
10:06:37:WU02:FS01:0xa4:Completed 160000 out of 250000 steps  (64%)
10:09:42:WU02:FS01:0xa4:Completed 162500 out of 250000 steps  (65%)
10:12:46:WU02:FS01:0xa4:Completed 165000 out of 250000 steps  (66%)
10:15:50:WU02:FS01:0xa4:Completed 167500 out of 250000 steps  (67%)
10:18:55:WU02:FS01:0xa4:Completed 170000 out of 250000 steps  (68%)
10:21:58:WU02:FS01:0xa4:Completed 172500 out of 250000 steps  (69%)
10:25:03:WU02:FS01:0xa4:Completed 175000 out of 250000 steps  (70%)
10:28:07:WU02:FS01:0xa4:Completed 177500 out of 250000 steps  (71%)
10:31:12:WU02:FS01:0xa4:Completed 180000 out of 250000 steps  (72%)
10:34:16:WU02:FS01:0xa4:Completed 182500 out of 250000 steps  (73%)
10:37:20:WU02:FS01:0xa4:Completed 185000 out of 250000 steps  (74%)
10:40:25:WU02:FS01:0xa4:Completed 187500 out of 250000 steps  (75%)
10:43:29:WU02:FS01:0xa4:Completed 190000 out of 250000 steps  (76%)
10:46:34:WU02:FS01:0xa4:Completed 192500 out of 250000 steps  (77%)
10:49:39:WU02:FS01:0xa4:Completed 195000 out of 250000 steps  (78%)
10:52:44:WU02:FS01:0xa4:Completed 197500 out of 250000 steps  (79%)
10:55:49:WU02:FS01:0xa4:Completed 200000 out of 250000 steps  (80%)
10:58:54:WU02:FS01:0xa4:Completed 202500 out of 250000 steps  (81%)
11:01:59:WU02:FS01:0xa4:Completed 205000 out of 250000 steps  (82%)
11:05:03:WU02:FS01:0xa4:Completed 207500 out of 250000 steps  (83%)
11:08:08:WU02:FS01:0xa4:Completed 210000 out of 250000 steps  (84%)
11:11:13:WU02:FS01:0xa4:Completed 212500 out of 250000 steps  (85%)
11:14:17:WU02:FS01:0xa4:Completed 215000 out of 250000 steps  (86%)
11:17:22:WU02:FS01:0xa4:Completed 217500 out of 250000 steps  (87%)
11:20:26:WU02:FS01:0xa4:Completed 220000 out of 250000 steps  (88%)
11:23:31:WU02:FS01:0xa4:Completed 222500 out of 250000 steps  (89%)
11:26:36:WU02:FS01:0xa4:Completed 225000 out of 250000 steps  (90%)
11:29:40:WU02:FS01:0xa4:Completed 227500 out of 250000 steps  (91%)
11:32:44:WU02:FS01:0xa4:Completed 230000 out of 250000 steps  (92%)
11:35:48:WU02:FS01:0xa4:Completed 232500 out of 250000 steps  (93%)
11:38:52:WU02:FS01:0xa4:Completed 235000 out of 250000 steps  (94%)
11:41:57:WU02:FS01:0xa4:Completed 237500 out of 250000 steps  (95%)
11:45:02:WU02:FS01:0xa4:Completed 240000 out of 250000 steps  (96%)
11:48:06:WU02:FS01:0xa4:Completed 242500 out of 250000 steps  (97%)
11:51:11:WU02:FS01:0xa4:Completed 245000 out of 250000 steps  (98%)
11:54:15:WU02:FS01:0xa4:Completed 247500 out of 250000 steps  (99%)
11:54:17:WU01:FS01:Connecting to 171.67.108.200:8080
11:54:18:WU01:FS01:Assigned to work server 171.64.65.124
11:54:18:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:3 from 171.64.65.124
11:54:18:WU01:FS01:Connecting to 171.64.65.124:8080
11:54:18:WU01:FS01:Downloading 902.77KiB
11:54:31:WU01:FS01:Download 7.09%
11:55:32:WU01:FS01:Download 14.18%
11:56:06:WU01:FS01:Download 21.27%
11:56:24:WU01:FS01:Download 28.36%
11:57:08:WU01:FS01:Download 42.54%
11:57:17:WU01:FS01:Download 49.62%
11:57:20:WU02:FS01:0xa4:Completed 250000 out of 250000 steps  (100%)
11:57:20:WU02:FS01:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
11:57:28:WU01:FS01:Download 56.71%
11:57:30:WU02:FS01:0xa4:
11:57:30:WU02:FS01:0xa4:Finished Work Unit:
11:57:30:WU02:FS01:0xa4:- Reading up to 905376 from "02/wudata_01.trr": Read 905376
11:57:30:WU02:FS01:0xa4:trr file hash check passed.
11:57:30:WU02:FS01:0xa4:- Reading up to 829280 from "02/wudata_01.xtc": Read 829280
11:57:30:WU02:FS01:0xa4:xtc file hash check passed.
11:57:30:WU02:FS01:0xa4:edr file hash check passed.
11:57:30:WU02:FS01:0xa4:logfile size: 23676
11:57:30:WU02:FS01:0xa4:Leaving Run
11:57:32:WU02:FS01:0xa4:- Writing 1760820 bytes of core data to disk...
11:57:32:WU02:FS01:0xa4:Done: 1760308 -> 1702981 (compressed to 96.7 percent)
11:57:32:WU02:FS01:0xa4:  ... Done.
11:57:32:WU02:FS01:0xa4:- Shutting down core
11:57:32:WU02:FS01:0xa4:
11:57:32:WU02:FS01:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
11:57:33:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
11:57:33:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9012 run:911 clone:1 gen:107 core:0xa4 unit:0x00000085664f2de453c8809aa9816d8f
11:57:33:WU02:FS01:Uploading 1.62MiB to 171.64.65.124
11:57:33:WU02:FS01:Connecting to 171.64.65.124:8080
11:57:39:WU02:FS01:Upload 96.18%
11:57:40:WU02:FS01:Upload complete
11:57:40:WU02:FS01:Server responded WORK_ACK (400)
11:57:40:WU02:FS01:Final credit estimate, 1259.00 points
11:57:40:WU02:FS01:Cleaning up
******************************* Date: 2015-03-23 *******************************
16:51:50:FS00:Paused
16:51:50:FS01:Paused

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 5:51 pm
by TwistedKestrel
bruce wrote:How long did you let it retry before you decided to kill the process? Sometimes it takes 5 or 10 minutes before the client will abandon a download connection and connect to another server. The log you posted only shows two download attempts about two minutes apart.

The server logs indicate the server has been normally active over the past 8+ hours.
Well, about ten minutes after that, I restarted the machine entirely and started the client again and left it overnight. Timestamps are in the same day as the previous log.

Code:

08:41:38:FS00:Unpaused
08:41:39:WU01:FS00:Connecting to 171.67.108.200:8080
08:41:40:WU01:FS00:Assigned to work server 171.64.65.124
08:41:40:WU01:FS00:Requesting new work unit for slot 00: READY cpu:4 from 171.64.65.124
08:41:40:WU01:FS00:Connecting to 171.64.65.124:8080
08:41:41:WU01:FS00:Downloading 902.29KiB
******************************* Date: 2015-03-23 *******************************
16:47:55:FS00:Paused
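
A quick way to spot a stall like this in a long log is to look for the longest silence between consecutive stamped lines. The following is a minimal Python sketch; `longest_gap` is a hypothetical helper, and it assumes a backwards-going stamp means a single midnight rollover.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}):")


def longest_gap(lines):
    """Return (seconds, earlier_line, later_line) for the longest silence in a log.

    Lines without a leading HH:MM:SS stamp (date banners, blanks) are skipped.
    A stamp that goes backwards is assumed to mean one midnight rollover.
    """
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        now = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None:
            delta = now - prev_time
            if delta < timedelta(0):
                delta += timedelta(days=1)
            if delta.total_seconds() > best[0]:
                best = (delta.total_seconds(), prev_line, line)
        prev_time, prev_line = now, line
    return best


# Tail of the log above.
log = """\
08:41:40:WU01:FS00:Connecting to 171.64.65.124:8080
08:41:41:WU01:FS00:Downloading 902.29KiB
******************************* Date: 2015-03-23 *******************************
16:47:55:FS00:Paused
""".splitlines()

seconds, before, after = longest_gap(log)
print(f"{seconds / 3600:.1f} h of silence after: {before}")
```

For this log the gap is a little over eight hours, starting right after the "Downloading" line.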

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 10:30 pm
by jimerickson
There is something wrong with this server. I had to dump 3 units when I got home from work. They were all 9015. They were worth near zero, so they had been hung for a while. There is no telling how long they were sitting there, and I am not digging through the logs to find out. Just have someone take a look at the server. It is futile to continue folding units from this server, as it is not uploading them.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Mon Mar 23, 2015 10:39 pm
by Joe_H
It is not a general problem with this server: almost all of my WUs for the last couple of weeks on three different systems have come from that server and have been returned to it. Quite a few others are also using this server with little to no problems; as I mentioned earlier in this thread, over 2,000 completed WUs an hour are being uploaded most of the time.

There are a very small number of reports of issues, and almost all appear to point to networking issues between the persons reporting them and the WS. That may be on the person's own network or with their local ISP; it is not clear which.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 9:15 am
by billford
Joe_H wrote: There are a very small number of reports of issues, and almost all appear to point to networking issues between the persons reporting them and the WS. That may be on the person's own network or with their local ISP; it is not clear which.
I don't think that's a valid argument- if it were "local" problems it's unlikely that only two Stanford servers would be involved.

I had a stuck P9014 this morning- coincidence only goes so far.

It's not impossible that there's an intermittent problem, which I'll concede will make it a pig to trace :(

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 5:18 pm
by Joe_H
Or your stuck WU coincided with a period of time that this server was actually in the Reject state. There were several periods of this last night, the last was for about an hour and ended by 23:20 PDT. With the time difference, that is morning for you. Since then whatever was causing the server to go into Reject status appears to have been corrected.

Unlike yours, many of these other reports did not happen at a time that the server was verifiably offline. Yes, it might be an intermittent problem on the WS end, but I would expect more frequent reports given the high volume of work downloading and uploading. And no, these are not the only two Stanford servers with occasional reports of download or upload problems.

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 5:32 pm
by Joe_H
jimerickson wrote:... There is no telling how long they were sitting there, and I am not digging through the logs to find out.
There is no need to dig through the logs: FAHControl shows the time a WU was assigned and the two timeout values. WUs earn bonus points up until the first, and base points after that until the expiration time.
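
The rule can be written down as a short Python sketch. `credit_status` is a hypothetical helper and the dates are made up for illustration; what happens after the expiration time is not covered above, so the sketch only labels that case.

```python
from datetime import datetime


def credit_status(returned, timeout, expiration):
    """Classify a returned WU by the two deadlines FAHControl displays."""
    if returned <= timeout:
        return "bonus"      # returned before the first deadline
    if returned <= expiration:
        return "base"       # past the timeout but before expiration
    return "expired"        # past expiration; not covered in the post above


# Hypothetical deadlines, for illustration only.
timeout = datetime(2015, 3, 24, 12, 0)
expiration = datetime(2015, 3, 30, 12, 0)

print(credit_status(datetime(2015, 3, 23, 22, 30), timeout, expiration))  # bonus
print(credit_status(datetime(2015, 3, 26, 8, 0), timeout, expiration))    # base
```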

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 6:26 pm
by bruce
Grendel wrote:Had/have the same thing going on on three different machines -- CPU and GPU clients all stall out (indefinitely) while receiving a new WU. First noticed it on Sat; ever since, I have had to Pause, Quit, Kill, and restart the clients to get a new WU.

Code:

11:54:15:WU02:FS01:0xa4:Completed 247500 out of 250000 steps  (99%)
11:54:17:WU01:FS01:Connecting to 171.67.108.200:8080
11:54:18:WU01:FS01:Assigned to work server 171.64.65.124
11:54:18:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:3 from 171.64.65.124
11:54:18:WU01:FS01:Connecting to 171.64.65.124:8080
11:54:18:WU01:FS01:Downloading 902.77KiB
11:54:31:WU01:FS01:Download 7.09%
11:55:32:WU01:FS01:Download 14.18%
11:56:06:WU01:FS01:Download 21.27%
11:56:24:WU01:FS01:Download 28.36%
11:57:08:WU01:FS01:Download 42.54%
11:57:17:WU01:FS01:Download 49.62%
11:57:20:WU02:FS01:0xa4:Completed 250000 out of 250000 steps  (100%)
11:57:20:WU02:FS01:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
11:57:28:WU01:FS01:Download 56.71%
11:57:30:WU02:FS01:0xa4:
11:57:30:WU02:FS01:0xa4:Finished Work Unit:
11:57:30:WU02:FS01:0xa4:- Reading up to 905376 from "02/wudata_01.trr": Read 905376
11:57:30:WU02:FS01:0xa4:trr file hash check passed.
11:57:30:WU02:FS01:0xa4:- Reading up to 829280 from "02/wudata_01.xtc": Read 829280
11:57:30:WU02:FS01:0xa4:xtc file hash check passed.
11:57:30:WU02:FS01:0xa4:edr file hash check passed.
11:57:30:WU02:FS01:0xa4:logfile size: 23676
11:57:30:WU02:FS01:0xa4:Leaving Run
11:57:32:WU02:FS01:0xa4:- Writing 1760820 bytes of core data to disk...
11:57:32:WU02:FS01:0xa4:Done: 1760308 -> 1702981 (compressed to 96.7 percent)
11:57:32:WU02:FS01:0xa4:  ... Done.
11:57:32:WU02:FS01:0xa4:- Shutting down core
11:57:32:WU02:FS01:0xa4:
11:57:32:WU02:FS01:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
11:57:33:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
11:57:33:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9012 run:911 clone:1 gen:107 core:0xa4 unit:0x00000085664f2de453c8809aa9816d8f
11:57:33:WU02:FS01:Uploading 1.62MiB to 171.64.65.124
11:57:33:WU02:FS01:Connecting to 171.64.65.124:8080
11:57:39:WU02:FS01:Upload 96.18%
11:57:40:WU02:FS01:Upload complete
11:57:40:WU02:FS01:Server responded WORK_ACK (400)
11:57:40:WU02:FS01:Final credit estimate, 1259.00 points
11:57:40:WU02:FS01:Cleaning up
******************************* Date: 2015-03-23 *******************************
16:51:50:FS00:Paused
16:51:50:FS01:Paused

Is your internet connection by satellite or is there some other reason why it seems to be operating in half-duplex? Most people's connection can handle both an upload and a download concurrently and that is usually more efficient. Since that isn't working for you, we can probably avoid that condition by adjusting next-unit-percentage.

In your case, next-unit-percentage is set to 99% so the download begins at 11:54:18 but is unable to finish by the time the upload begins at 11:57:33 causing your connection to fail. (I have no explanation for why your connection can't do that.)

You can choose one of two options. Set next-unit-percentage to 98% or even 97%, allowing enough time for the download to complete before the upload begins.

The other option would be to set next-unit-percentage to 100% which may or may not turn out to be a better option for you.
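
As a rough check on the 98% suggestion, here is a Python sketch that estimates the lead time from the stamps in the quoted log. It assumes the download keeps its average rate and that each percent of the WU takes about as long as the last one did, neither of which is guaranteed; the variable names are only illustrative.

```python
import math
from datetime import datetime


def seconds_between(start, end):
    """Seconds between two same-day HH:MM:SS log stamps."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Stamps copied from the quoted log.
seconds_per_percent = seconds_between("11:54:15", "11:57:20")   # 99% -> 100%
download_elapsed = seconds_between("11:54:18", "11:57:28")      # start -> 56.71%
head_start = seconds_between("11:54:18", "11:57:33")            # download start -> upload start

# Assumption: the download keeps its average rate for the remaining data.
estimated_download = download_elapsed / 0.5671

# Whole percent steps of lead time needed to cover the estimated download.
lead_percent = math.ceil(estimated_download / seconds_per_percent)
suggested = 100 - lead_percent

print(f"frame time {seconds_per_percent:.0f} s, download needs ~{estimated_download:.0f} s, "
      f"head start was {head_start:.0f} s")
print(f"suggested next-unit-percentage: {suggested}")
```

With these stamps the download would need about 335 s but only had a 195 s head start, and the sketch arrives at 98, in line with the first option above.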

Re: unable to get wu from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 6:56 pm
by Grendel
bruce wrote:Is your internet connection by satellite or is there some other reason why it seems to be operating in half-duplex? Most people's connection can handle both an upload and a download concurrently and that is usually more efficient. Since that isn't working for you, we can probably avoid that condition by adjusting next-unit-percentage.

In your case, next-unit-percentage is set to 99% so the download begins at 11:54:18 but is unable to finish by the time the upload begins at 11:57:33 causing your connection to fail. (I have no explanation for why your connection can't do that.)

You can choose one of two options. Set next-unit-percentage to 98% or even 97%, allowing enough time for the download to complete before the upload begins.

The other option would be to set next-unit-percentage to 100% which may or may not turn out to be a better option for you.
Thanks for taking the time to respond. I don't think my connection has anything to do w/ it (altho there's no way to know for sure.) I'm folding at two sites (same ISP but different paths), happened at both of them. I have had this ISP for almost a year w/o ever seeing this problem before (been folding for ten years now.) Right now it seems to be working. What I find interesting is that the client apparently doesn't let go of the TCP connection even after two days (!), that may be worthwhile looking into.

Edit: I see what you mean -- WU is d/ling VERY slow (that's a 20Mb connection..), not finished when the upload starts. Normally I wouldn't expect that to be a problem tho, port 8080 should only be used for the initial connection negotiation if I remember my TCP/IP stuff right.. ;) Still think the client should terminate the d/l after a reasonable period of inactivity ?
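
To illustrate what such an inactivity timeout could look like, here is a minimal Python sketch. It is not how FAHClient is implemented; `read_until_idle` is a hypothetical helper. The socket timeout applies to each `recv` call, so it limits the time without data rather than the total transfer time.

```python
import socket


def read_until_idle(sock, idle_seconds, chunk_size=65536):
    """Read from sock until EOF; raise TimeoutError after idle_seconds without data."""
    sock.settimeout(idle_seconds)
    received = bytearray()
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except socket.timeout:
            raise TimeoutError(
                f"no data for {idle_seconds} s after {len(received)} bytes"
            ) from None
        if not chunk:          # peer closed the connection: transfer finished
            return bytes(received)
        received.extend(chunk)
```

A caller could catch the `TimeoutError`, close the socket, and start a fresh request instead of waiting on a dead connection indefinitely.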

Edit2: having to kill FAHClient.exe once this happens could point to the hang happening w/in the windows kernel, wonder if a recent update messed things up...