WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Moderators: Site Moderators, FAHC Science Team

dustingebhardt
Posts: 50
Joined: Tue Apr 05, 2011 8:26 pm

WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by dustingebhardt »

I have two P7809 WUs waiting to upload: one is (Run 3, Clone 143, Gen 4) and the other is (Run 2, Clone 404, Gen 3). I see this sequence repeated several times in the log file:

21:33:43:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
21:33:43:Unit 01: Uploading 4.13MiB to 171.64.65.99
21:33:43:Connecting to 171.64.65.99:8080
21:34:02:Unit 01: 69.78%
21:34:32:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
21:34:32:Trying to send results to collection server
21:34:32:Unit 01: Uploading 4.13MiB to 171.67.108.49
21:34:32:Connecting to 171.67.108.49:8080
21:34:51:Unit 01: 69.78%
21:35:21:ERROR: Exception: Received short response, expected 512 bytes, got 0
Any help?
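The log shows the client's upload sequence: it sends the result to the work server (WS) and, when that fails, falls back to the collection server (CS). The error text suggests the client expects a fixed 512-byte reply after an upload and treats a connection closed with no reply as a failure. Below is a minimal Python sketch of that flow, assuming a 512-byte reply (inferred only from the message "expected 512 bytes, got 0"). `upload_with_fallback`, `read_exact` and the fake servers are hypothetical names for illustration, not FAHClient code.

```python
from typing import Callable, List, Tuple

# Assumption: each upload is answered with a fixed-size 512-byte reply,
# inferred from the log message "expected 512 bytes, got 0".
RESPONSE_SIZE = 512

# Hypothetical stand-ins for a socket: a Sender uploads the data and
# returns a recv(n) function for reading the server's reply.
Recv = Callable[[int], bytes]
Sender = Callable[[bytes], Recv]


class UploadError(Exception):
    pass


def read_exact(recv: Recv, n: int) -> bytes:
    """Read exactly n bytes, failing if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = recv(n - len(buf))
        if not chunk:  # empty read: the server closed the connection
            raise UploadError(
                f"Received short response, expected {n} bytes, got {len(buf)}")
        buf += chunk
    return buf


def upload_with_fallback(servers: List[Tuple[str, Sender]],
                         data: bytes) -> Tuple[str, bytes]:
    """Try each server in order (work server first, then collection server)."""
    errors = []
    for name, send in servers:
        try:
            return name, read_exact(send(data), RESPONSE_SIZE)
        except UploadError as exc:
            errors.append(f"{name}: {exc}")
    raise UploadError("; ".join(errors))


# Fake servers for a local demo.
def closes_without_reply(data: bytes) -> Recv:
    return lambda n: b""  # takes the upload, then hangs up without replying


def replies_normally(data: bytes) -> Recv:
    reply = bytearray(RESPONSE_SIZE)

    def recv(n: int) -> bytes:
        chunk = bytes(reply[:n])
        del reply[:n]  # consume what was read
        return chunk
    return recv


servers = [("WS", closes_without_reply), ("CS", replies_normally)]
print(upload_with_fallback(servers, b"results")[0])  # prints CS
```

When both fake servers close without replying, both attempts fail and the combined error mirrors the WS-then-CS pair of messages seen in the log.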
gwildperson
Posts: 450
Joined: Tue Dec 04, 2007 8:36 pm

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by gwildperson »

What kind of error was reported when those two WUs finished computing?

Which version of V7 are you running?
dustingebhardt
Posts: 50
Joined: Tue Apr 05, 2011 8:26 pm

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by dustingebhardt »

The client was completing and sending units successfully up until the last 2 WUs. Here is a log snippet from the time the good WUs were being sent and the newer ones failed to upload.

06:30:09:Unit 01:Completed 1470000 out of 1500000 steps  (98%)
07:02:05:Unit 01:Completed 1485000 out of 1500000 steps  (99%)
07:02:05:Connecting to assign3.stanford.edu:8080
07:02:05:News: Welcome to Folding@Home
07:02:05:Assigned to work server 171.64.65.99
07:02:05:Requesting new work unit for slot 00: RUNNING smp:2 from 171.64.65.99
07:02:05:Connecting to 171.64.65.99:8080
07:02:07:Slot 00: Downloading 1.98MiB
07:02:13:Slot 00: 55.34%
07:02:17:Slot 00: Download complete
07:02:17:Received Unit: id:00 state:DOWNLOAD project:7809 run:2 clone:404 gen:3 core:0xa4 unit:0x000000050a3b1e874e310a978ba06525
07:30:11:Unit 01:Completed 1500000 out of 1500000 steps  (100%)
07:30:12:Unit 01:DynamicWrapper: Finished Work Unit: sleep=10000
07:30:22:Unit 01:
07:30:22:Unit 01:Finished Work Unit:
07:30:22:Unit 01:- Reading up to 2908800 from "01/wudata_01.trr": Read 2908800
07:30:22:Unit 01:trr file hash check passed.
07:30:22:Unit 01:- Reading up to 1554512 from "01/wudata_01.xtc": Read 1554512
07:30:22:Unit 01:xtc file hash check passed.
07:30:22:Unit 01:edr file hash check passed.
07:30:22:Unit 01:logfile size: 34787
07:30:22:Unit 01:Leaving Run
07:30:25:Unit 01:- Writing 4503111 bytes of core data to disk...
07:30:26:Unit 01:Done: 4502599 -> 4325772 (compressed to 96.0 percent)
07:30:26:Unit 01:  ... Done.
07:30:28:Unit 01:- Shutting down core
07:30:28:Unit 01:
07:30:28:Unit 01:Folding@home Core Shutdown: FINISHED_UNIT
07:30:29:FahCore, running Unit 01, returned: FINISHED_UNIT (100 = 0x64)
07:30:29:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:30:29:Starting Unit 00
07:30:29:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:30:29:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 00 -suffix 01 -lifeline 1400 -version 701 -checkpoint 15 -np 2 -service
07:30:29:Connecting to 171.64.65.99:8080
07:30:29:Started core on PID 5160
07:30:29:FahCore 0xa4 started
07:30:29:Unit 00:
07:30:29:Unit 00:*------------------------------*
07:30:29:Unit 00:Folding@Home Gromacs GB Core
07:30:29:Unit 00:Version 2.27 (Dec. 15, 2010)
07:30:29:Unit 00:
07:30:29:Unit 00:Preparing to commence simulation
07:30:29:Unit 00:- Looking at optimizations...
07:30:29:Unit 00:- Created dyn
07:30:29:Unit 00:- Files status OK
07:30:30:Unit 00:- Expanded 2079304 -> 5386224 (decompressed 259.0 percent)
07:30:30:Unit 00:Called DecompressByteArray: compressed_data_size=2079304 data_size=5386224, decompressed_data_size=5386224 diff=0
07:30:30:Unit 00:- Digital signature verified
07:30:30:Unit 00:
07:30:30:Unit 00:Project: 7809 (Run 2, Clone 404, Gen 3)
07:30:30:Unit 00:
07:30:30:Unit 00:Assembly optimizations on if available.
07:30:30:Unit 00:Entering M.D.
07:30:36:Unit 00:Mapping NT from 2 to 2 
07:30:36:Unit 00:Completed 0 out of 1500000 steps  (0%)
07:30:49:Unit 01: 69.68%
07:31:19:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
07:31:19:Trying to send results to collection server
07:31:19:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:31:19:Connecting to 171.67.108.49:8080
07:31:46:Unit 01: 69.78%
07:32:16:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:32:16:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:32:17:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:32:17:Connecting to 171.64.65.99:8080
07:32:47:WARNING: Exception: Failed to send results to work server: Upload failed
07:32:47:Trying to send results to collection server
07:32:47:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:32:47:Connecting to 171.67.108.49:8080
07:33:17:ERROR: Exception: Upload failed
07:33:17:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:33:17:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:33:17:Connecting to 171.64.65.99:8080
07:33:48:WARNING: Exception: Failed to send results to work server: Upload failed
07:33:48:Trying to send results to collection server
07:33:48:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:33:48:Connecting to 171.67.108.49:8080
07:34:18:ERROR: Exception: Upload failed
07:34:54:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:34:55:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:34:55:Connecting to 171.64.65.99:8080
07:35:18:Unit 01: 69.78%
07:35:48:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
07:35:48:Trying to send results to collection server
07:35:48:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:35:48:Connecting to 171.67.108.49:8080
07:36:17:Unit 01: 69.68%
07:36:47:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:37:32:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:37:32:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:37:32:Connecting to 171.64.65.99:8080
07:37:57:Unit 01: 69.78%
07:38:27:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
07:38:27:Trying to send results to collection server
07:38:27:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:38:27:Connecting to 171.67.108.49:8080
07:38:57:Unit 01: 69.68%
07:39:27:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:41:46:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:41:46:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:41:46:Connecting to 171.64.65.99:8080
07:42:09:Unit 01: 69.78%
07:42:39:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
07:42:39:Trying to send results to collection server
07:42:39:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:42:39:Connecting to 171.67.108.49:8080
07:43:04:Unit 01: 69.78%
07:43:34:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:48:37:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:48:37:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:48:37:Connecting to 171.64.65.99:8080
07:48:57:Unit 01: 69.78%
07:49:27:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
07:49:27:Trying to send results to collection server
07:49:27:Unit 01: Uploading 4.13MiB to 171.67.108.49
07:49:27:Connecting to 171.67.108.49:8080
07:49:51:Unit 01: 69.68%
07:50:21:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:59:43:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
07:59:43:Unit 01: Uploading 4.13MiB to 171.64.65.99
07:59:43:Connecting to 171.64.65.99:8080
08:00:05:Unit 01: 69.78%
08:00:35:WARNING: Exception: Failed to send results to work server: Received short response, expected 512 bytes, got 0
08:00:35:Trying to send results to collection server
08:00:35:Unit 01: Uploading 4.13MiB to 171.67.108.49
08:00:35:Connecting to 171.67.108.49:8080
08:01:03:Unit 01: 69.78%
08:01:33:ERROR: Exception: Received short response, expected 512 bytes, got 0
08:02:13:Unit 00:Completed 15000 out of 1500000 steps  (1%)
I'm using v7.1.33
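The timestamps in this log show the client waiting longer between successive upload attempts. A small Python sketch that measures those waits, assuming every line starts with an `HH:MM:SS` timestamp followed by `:` and that the log does not cross midnight. `retry_gaps` is a hypothetical helper, not part of FAHClient, and the log lines below are copied from the post with trailing fields trimmed.

```python
from datetime import datetime
from typing import List, Optional


def retry_gaps(log_lines: List[str]) -> List[int]:
    """Seconds between each ERROR line and the next 'Sending unit results' line."""
    fmt = "%H:%M:%S"
    gaps: List[int] = []
    last_error: Optional[datetime] = None
    for line in log_lines:
        stamp, message = line[:8], line[9:]  # "HH:MM:SS" then the text after ':'
        if message.startswith("ERROR:"):
            last_error = datetime.strptime(stamp, fmt)
        elif message.startswith("Sending unit results") and last_error is not None:
            gaps.append(int((datetime.strptime(stamp, fmt) - last_error).total_seconds()))
            last_error = None
    return gaps


# ERROR and "Sending unit results" lines from the log above (other lines omitted).
log = """\
07:30:29:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:32:16:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:32:16:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:33:17:ERROR: Exception: Upload failed
07:33:17:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:34:18:ERROR: Exception: Upload failed
07:34:54:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:36:47:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:37:32:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:39:27:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:41:46:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:43:34:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:48:37:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
07:50:21:ERROR: Exception: Received short response, expected 512 bytes, got 0
07:59:43:Sending unit results: id:01 state:SEND project:7809 run:3 clone:143 gen:4
"""

print(retry_gaps(log.splitlines()))  # prints [0, 0, 36, 45, 139, 303, 562]
```

The log alone does not say which retry rule the client follows, so this only measures the gaps that were actually observed.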
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by bruce »

Please upgrade to V7.1.38. See viewtopic.php?f=67&t=19795

Download the new client, choosing Save. Shut down the old client. Run the installer for the new client. It will detect the old one and ask if you want to remove it. Click retry to remove it but do not remove the data. Run the installer again to install and start the new client.
dustingebhardt
Posts: 50
Joined: Tue Apr 05, 2011 8:26 pm

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by dustingebhardt »

Upgraded to 7.1.38 but the 2 WUs are still not sending. I'm about to finish another unit (@97% currently). Here is the whole log after the version update:

*********************** Log Started 2011-10-19T11:50:01 ************************
11:50:01:************************* Folding@home Client *************************
11:50:01:      Website: http://folding.stanford.edu/
11:50:01:    Copyright: (c) 2009-2011 Stanford University
11:50:01:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
11:50:01:         Args: 
11:50:01:       Config: C:/Documents and Settings/All Users/Application
11:50:01:               Data/FAHClient/config.xml
11:50:01:******************************** Build ********************************
11:50:01:      Version: 7.1.38
11:50:01:         Date: Oct 6 2011
11:50:01:         Time: 19:57:04
11:50:01:      SVN Rev: 3080
11:50:01:       Branch: fah/trunk/client
11:50:01:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
11:50:01:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
11:50:01:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
11:50:01:     Platform: win32 XP
11:50:01:         Bits: 32
11:50:01:         Mode: Release
11:50:01:******************************* System ********************************
11:50:01:          CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
11:50:01:       CPU ID: GenuineIntel Family 6 Model 23 Stepping 6
11:50:01:         CPUs: 2
11:50:01:       Memory: 4.00GiB
11:50:01:  Free Memory: 2.66GiB
11:50:01:      Threads: WINDOWS_THREADS
11:50:01:   On Battery: false
11:50:01:   UTC offset: -4
11:50:01:          PID: 5716
11:50:01:          CWD: C:/WINDOWS/system32
11:50:01:           OS: Microsoft Windows Server 2003 Service Pack 2
11:50:01:      OS Arch: X86
11:50:01:         GPUs: 2
11:50:01:        GPU 0: UNSUPPORTED: RV370 5B60 [Radeon X300 (PCIE)]
11:50:01:        GPU 1: UNSUPPORTED: RV370 [Radeon X300SE]
11:50:01:         CUDA: Not detected
11:50:01:Win32 Service: true
11:50:01:***********************************************************************
11:50:01:<config>
11:50:01:  <!-- Folding Slot Configuration -->
11:50:01:  <gpu v='true'/>
11:50:01:
11:50:01:  <!-- Network -->
11:50:01:  <proxy v=':8080'/>
11:50:01:
11:50:01:  <!-- Remote Command Server -->
11:50:01:  <command-allow v='127.0.0.1 192.168.1.0/24'/>
11:50:01:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
11:50:01:  <password v='********'/>
11:50:01:
11:50:01:  <!-- User Information -->
11:50:01:  <passkey v='********************************'/>
11:50:01:  <team v='33'/>
11:50:01:  <user v='Dustin_Gebhardt'/>
11:50:01:
11:50:01:  <!-- Folding Slots -->
11:50:01:  <slot id='0' type='SMP'/>
11:50:01:</config>
11:50:01:Trying to access database...
11:50:02:Upgrading database schema from version 9 to 10
11:50:02:Successfully acquired database lock
11:50:02:Enabled folding slot 00: READY smp:2
11:50:02:Downloading project 7809 description
11:50:02:Connecting to fah-web.stanford.edu:80
11:50:02:Sending unit results: id:01 state:SEND error:OK project:7809 run:3 clone:143 gen:4 core:0xa4 unit:0x000000060a3b1e874e310ba04d4a4112
11:50:02:Unit 01: Uploading 4.13MiB to 171.64.65.99
11:50:02:Sending unit results: id:00 state:SEND error:OK project:7809 run:2 clone:404 gen:3 core:0xa4 unit:0x000000050a3b1e874e310a978ba06525
11:50:02:Connecting to 171.64.65.99:8080
11:50:02:Starting Unit 02
11:50:02:Unit 00: Uploading 4.13MiB to 171.64.65.99
11:50:02:Connecting to 171.64.65.99:8080
11:50:02:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 02 -suffix 01 -lifeline 5716 -version 701 -checkpoint 15 -np 2 -service
11:50:02:Started core on PID 2620
11:50:02:FahCore 0xa4 started
11:50:03:Unit 02:
11:50:03:Unit 02:*------------------------------*
11:50:03:Unit 02:Folding@Home Gromacs GB Core
11:50:03:Unit 02:Version 2.27 (Dec. 15, 2010)
11:50:03:Unit 02:
11:50:03:Unit 02:Preparing to commence simulation
11:50:03:Unit 02:- Ensuring status. Please wait.
11:50:05:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
11:50:12:Unit 02:- Looking at optimizations...
11:50:12:Unit 02:- Working with standard loops on this execution.
11:50:12:Unit 02:- Previous termination of core was improper.
11:50:12:Unit 02:- Files status OK
11:50:13:Unit 02:- Expanded 2079427 -> 5386224 (decompressed 259.0 percent)
11:50:13:Unit 02:Called DecompressByteArray: compressed_data_size=2079427 data_size=5386224, decompressed_data_size=5386224 diff=0
11:50:13:Unit 02:- Digital signature verified
11:50:13:Unit 02:
11:50:13:Unit 02:Project: 7809 (Run 3, Clone 372, Gen 2)
11:50:13:Unit 02:
11:50:13:Unit 02:Entering M.D.
11:50:19:Unit 02:Using Gromacs checkpoints
11:50:19:Unit 02:Mapping NT from 2 to 2 
11:50:20:Unit 02:Resuming from checkpoint
11:50:20:Unit 02:Verified 02/wudata_01.log
11:50:20:Unit 02:Verified 02/wudata_01.trr
11:50:20:Unit 02:Verified 02/wudata_01.xtc
11:50:20:Unit 02:Verified 02/wudata_01.edr
11:50:20:Unit 02:Completed 1458650 out of 1500000 steps  (97%)
11:50:33:Unit 01: 69.68%
11:50:33:Unit 00: 69.78%
11:51:03:WARNING: Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
11:51:03:Trying to send results to collection server
11:51:03:Unit 01: Uploading 4.13MiB to 171.67.108.49
11:51:03:Connecting to 171.67.108.49:8080
11:51:04:WARNING: Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
11:51:04:Trying to send results to collection server
11:51:04:Unit 00: Uploading 4.13MiB to 171.67.108.49
11:51:04:Connecting to 171.67.108.49:8080
schwancr
Pande Group Member
Posts: 136
Joined: Wed Jun 01, 2011 9:45 pm

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by schwancr »

Hi

"Hi Dustin_Gebhardt (team 33),
Your WU (P7809 R3 C143 G4) was added to the stats database on 2011-10-15 01:07:40 for 6765.98 points of credit."

"Hi Dustin_Gebhardt (team 33),
Your WU (P7809 R2 C404 G3) was added to the stats database on 2011-10-17 03:07:17 for 6731.53 points of credit."

The work server is denying access because it recognizes that it already received these WUs.
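A minimal sketch of the duplicate check described above, assuming the server keys each result by the project/run/clone/gen tuple that appears in the stats messages. `WorkUnitId` and `ResultStore` are hypothetical names; the real work server's bookkeeping is not shown in this thread.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen makes instances hashable, so they can go in a set
class WorkUnitId:
    project: int
    run: int
    clone: int
    gen: int


@dataclass
class ResultStore:
    credited: set = field(default_factory=set)

    def accept(self, wu: WorkUnitId) -> bool:
        """Credit a new result and return True; return False for a duplicate."""
        if wu in self.credited:
            return False
        self.credited.add(wu)
        return True


store = ResultStore()
wu = WorkUnitId(7809, 3, 143, 4)
print(store.accept(wu))  # True: the first upload is credited
print(store.accept(wu))  # False: a repeat upload of the same WU is refused
```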

This could be a client issue. Bruce, has this phenomenon been reported before?

Thanks,
Christian
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WS 171.64.65.99 and CS 171.67.108.49 busy/down?

Post by bruce »

schwancr wrote: The work server is denying access because it recognizes that it already received these WUs.

This could be a client issue. Bruce, has this phenomenon been reported before?
Yes, but it's not a common situation. I know of a few scenarios (and, of course, there's always the possibility of another scenario).

1) If a donor believes a WU has been submitted but it has not been credited, they sometimes restore a backup and attempt to get the client to send the WU again. For some donors who watch the credits very closely, even a short delay in crediting can induce them to try this method, particularly if the points are less than the donor expects.

2) If the donor adds a new client by copying the client's files to a new location, both clients can complete the active WU, and of course only one of them will get credit for it. Additional complications can arise if the UserID/MachineID settings are duplicated, such as when Norton Ghost is used to clone a system.

3) Even if the donor does nothing, sometimes a WU is accepted by the server but the message from the server confirming to the client that the WU was accepted gets "lost." The client's log then says the WU was not uploaded, and the client keeps the results and keeps retrying the upload. This happens more frequently with certain types of proxies than when the client is not connected through a proxy, but it also seems to happen occasionally on any connection.
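Scenario 3 can be illustrated with a small simulation. It assumes, as schwancr's reply indicates, that the server refuses a WU it has already credited, and that the client keeps a result queued until it receives a confirmation. `simulate_lost_ack` is a hypothetical helper, not FAHClient or server code.

```python
from typing import List


def simulate_lost_ack(max_attempts: int, lose_first_ack: bool = True) -> List[str]:
    """Trace repeated uploads of one WU when the first confirmation may be lost."""
    credited = set()            # server side: WUs already received
    wu = (7809, 3, 143, 4)      # project, run, clone, gen
    events = []
    for attempt in range(1, max_attempts + 1):
        if wu in credited:
            # The server refuses a duplicate, so the client again sees a failure.
            events.append(f"attempt {attempt}: server rejects duplicate, client sees failure")
            continue
        credited.add(wu)
        if lose_first_ack:
            # Credited on the server, but the confirmation never reaches the client.
            events.append(f"attempt {attempt}: server credits WU, confirmation lost, client sees failure")
            continue
        events.append(f"attempt {attempt}: server credits WU, client sees success")
        break  # the client stops retrying once it gets a confirmation
    return events


for event in simulate_lost_ack(3):
    print(event)
```

Once the first confirmation is lost, no later attempt can produce one, so the client retries indefinitely even though the WU was already credited.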