
171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 2:45 pm
by tourbound129
My SMP client has completed its last 2 WUs, but when it tries to send them it says "Already sending work" and then "Sent 0 of 2 completed units to the server".

[11:52:40] Completed 490000 out of 500000 steps (98%)
[12:00:31] Completed 495000 out of 500000 steps (99%)
[12:08:20] Completed 500000 out of 500000 steps (100%)
[12:08:22] DynamicWrapper: Finished Work Unit: sleep=10000
[12:08:31]
[12:08:31] Finished Work Unit:
[12:08:31] - Reading up to 3700128 from "work/wudata_06.trr": Read 3700128
[12:08:31] trr file hash check passed.
[12:08:31] edr file hash check passed.
[12:08:31] logfile size: 58980
[12:08:31] Leaving Run
[12:08:36] - Writing 3794660 bytes of core data to disk...
[12:08:36] ... Done.
[12:08:37] - Shutting down core
[12:08:37]
[12:08:37] Folding@home Core Shutdown: FINISHED_UNIT
[12:08:39] CoreStatus = 64 (100)
[12:08:39] Unit 6 finished with 90 percent of time to deadline remaining.
[12:08:39] Updated performance fraction: 0.884859
[12:08:39] Sending work to server
[12:08:39] - Already sending work
[12:08:39] Trying to send all finished work units
[12:08:39] - Already sending work
[12:08:39] - Already sending work
[12:08:39] + Sent 0 of 2 completed units to the server
[12:08:39] - Preparing to get new work unit...
[12:08:39] Cleaning up work directory
[12:08:40] + Attempting to get work packet
[12:08:40] Passkey found
[12:08:40] - Will indicate memory of 3582 MB
[12:08:40] - Connecting to assignment server
[12:08:40] Connecting to http://assign.stanford.edu:8080/
[12:08:41] Posted data.
[12:08:41] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[12:08:41] + News From Folding@Home: Welcome to Folding@Home
[12:08:41] Loaded queue successfully.
[12:08:41] Sent data
[12:08:41] Connecting to http://171.64.65.54:8080/
[12:08:42] Posted data.
[12:08:42] Initial: 0000; - Receiving payload (expected size: 1764113)
[12:08:48] - Downloaded at ~287 kB/s
[12:08:48] - Averaged speed for that direction ~256 kB/s
[12:08:48] + Received work.
[12:08:48] Trying to send all finished work units
[12:08:48] - Already sending work
[12:08:48] - Already sending work
[12:08:48] + Sent 0 of 2 completed units to the server
[12:08:48] + Closed connections
[12:08:48]
[12:08:48] + Processing work unit
[12:08:48] Core required: FahCore_a3.exe
[12:08:48] Core found.
[12:08:48] Working on queue slot 07 [September 7 12:08:48 UTC]
[12:08:48] + Working ...
[12:08:48] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 07 -np 4 -checkpoint 15 -verbose -lifeline 8076 -version 630'

[12:08:48]
[12:08:48] *------------------------------*
[12:08:48] Folding@Home Gromacs SMP Core
[12:08:48] Version 2.22 (Mar 12, 2010)
[12:08:48]
[12:08:48] Preparing to commence simulation
[12:08:48] - Looking at optimizations...
[12:08:48] - Created dyn
[12:08:48] - Files status OK
[12:08:49] - Expanded 1763601 -> 2248557 (decompressed 127.4 percent)
[12:08:49] Called DecompressByteArray: compressed_data_size=1763601 data_size=2248557, decompressed_data_size=2248557 diff=0
[12:08:49] - Digital signature verified
[12:08:49]
[12:08:49] Project: 6064 (Run 0, Clone 13, Gen 165)
[12:08:49]
[12:08:49] Assembly optimizations on if available.
[12:08:49] Entering M.D.
[12:08:55] Completed 0 out of 500000 steps (0%)
[12:16:43] Completed 5000 out of 500000 steps (1%)
[12:24:21] Completed 10000 out of 500000 steps (2%)
[12:32:17] Completed 15000 out of 500000 steps (3%)
[12:40:10] Completed 20000 out of 500000 steps (4%)
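
For anyone who wants to spot this symptom without reading the whole log, here is a minimal Python sketch. It assumes the log lines look exactly like the ones pasted above; the function name `find_stalled_sends` and the regular expression are illustrative and are not part of the client.

```python
import re

# Matches send summaries such as:
# [12:08:39] + Sent 0 of 2 completed units to the server
SENT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] \+ Sent (\d+) of (\d+) completed units"
)


def find_stalled_sends(log_text):
    """Return (timestamp, sent, total) for each summary where sent < total."""
    stalled = []
    for line in log_text.splitlines():
        match = SENT_RE.match(line.strip())
        if match is None:
            continue
        sent, total = int(match.group(2)), int(match.group(3))
        if sent < total:
            stalled.append((match.group(1), sent, total))
    return stalled


# Excerpt taken from the log above.
sample = """\
[12:08:39] - Already sending work
[12:08:39] + Sent 0 of 2 completed units to the server
[12:08:48] + Received work.
[12:08:48] + Sent 0 of 2 completed units to the server
"""

for timestamp, sent, total in find_stalled_sends(sample):
    print(f"{timestamp}: only {sent} of {total} finished units were sent")
```

On this excerpt the function reports both 0-of-2 attempts, at 12:08:39 and 12:08:48.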

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 3:51 pm
by 7im
Please post the PRCG numbers of the "already sending" work units so that someone can check their history.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 4:28 pm
by tourbound129
6065 (0,159,39)
2633 (6,19,16)

Those are the two completed WUs giving the "Already sending work" message.

Thanks for the reply and the help.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 4:39 pm
by lanbrown
Do you have more than one machine? If so, did you just copy the client from one to the other? Have you checked to see what WU the other machine is working on? It sounds like you have more than one machine and that they are both getting the same WU. In essence, both machines look identical to Stanford and thus you are only getting credit for one return.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 4:56 pm
by tourbound129
lanbrown wrote: Do you have more than one machine? If so, did you just copy the client from one to the other? Have you checked to see what WU the other machine is working on? It sounds like you have more than one machine and that they are both getting the same WU. In essence, both machines look identical to Stanford and thus you are only getting credit for one return.
No, I only have one machine, running SMP and GPU. The GPU is sending just fine and receiving credit. The SMP is the problem right now. It can receive work (it is crunching another WU right now) but has just started having this "Already sending work" problem. It had sent 24 WUs in a row without a problem until yesterday, when this came up.

As best I can tell, I have not received any credit for these WUs.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 5:06 pm
by lanbrown
Stop the client, run it with -queueinfo, and post the results. Let's see whether it has really sent them or is just holding on to them.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 5:21 pm
by tourbound129
Slot 05 and 06 are the projects in question.

Code:

--- Opening Log file [September 7 17:16:35 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\nkc\FAH
Executable: fah6.exe
Arguments: -queueinfo -smp -verbosity 9 

[17:16:35] - Ask before connecting: No
[17:16:35] - User name: pgatour (Team 55280)
[17:16:35] - User ID: 46A1A8753D7559ED
[17:16:35] - Machine ID: 1
[17:16:35] 
[17:16:35] Loaded queue successfully.
[17:16:35] Printing Queue Information
Current Queue: 
Slot 08  Empty/Deleted
Project: 6057 (Run 0, Clone 121, Gen 55), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 2 08:42:31
Finished date: September 2 23:30:53

Slot 09  Empty/Deleted
Project: 6014 (Run 0, Clone 159, Gen 265), Core: a3
Work server: 130.237.232.140:8080
Collection server: 130.237.165.141
Download date: September 2 23:31:55
Finished date: September 3 14:50:53

Slot 00  Empty/Deleted
Project: 6065 (Run 0, Clone 81, Gen 158), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 3 14:57:17
Finished date: September 4 05:08:09

Slot 01  Empty/Deleted
Project: 6012 (Run 1, Clone 48, Gen 205), Core: a3
Work server: 130.237.232.140:8080
Collection server: 130.237.165.141
Download date: September 4 05:09:39
Finished date: September 4 18:52:56

Slot 02  Empty/Deleted
Project: 2633 (Run 10, Clone 3, Gen 11), Core: a3
Work server: 171.67.108.24:8080
Collection server: 171.67.108.25
Download date: September 4 19:00:00
Finished date: September 5 02:01:03

Slot 03  Empty/Deleted
Project: 6067 (Run 1, Clone 117, Gen 6), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 5 02:21:16
Finished date: September 5 15:31:33

Slot 04  Empty/Deleted
Project: 6054 (Run 1, Clone 197, Gen 15), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 5 15:32:40
Finished date: September 6 05:29:04

Slot 05  Done     
Project: 2633 (Run 6, Clone 19, Gen 16), Core: a3
Work server: 171.67.108.24:8080
Collection server: 171.67.108.25
Download date: September 6 05:30:07
Finished date: September 6 12:51:12
Failed uploads: 1

Slot 06  Done     
Project: 6065 (Run 0, Clone 159, Gen 39), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 6 21:59:56
Finished date: September 7 12:08:39

Slot 07 *Ready    
Project: 6064 (Run 0, Clone 13, Gen 165), Core: a3
Work server: 171.64.65.54:8080
Collection server: 171.67.108.25
Download date: September 7 12:08:48
Deadline date: September 13 12:08:48

PF: 0.884859 based on last 4 slot(s)
[17:16:35] ***** Got a SIGTERM signal (2)
[17:16:35] Killing all core threads
[17:16:35] Killing 4 cores
[17:16:35] Killing core 0
[17:16:35] Killing core 1
[17:16:35] Killing core 2
[17:16:35] Killing core 3

Folding@Home Client Shutdown.
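
The slots of interest in a dump like this can be picked out mechanically. Below is a rough Python sketch that assumes the -queueinfo layout shown above ("Slot NN  Status", then "Project: ..." and an optional "Failed uploads: N" line). The names `parse_queue`, `starred`, and the regular expressions are made up for illustration. It treats "Done" as finished but still held by the client, which is how the output is read in this thread.

```python
import re

SLOT_RE = re.compile(r"^Slot (\d{2})\s*(\*?)\s*(\S.*?)\s*$")
PROJECT_RE = re.compile(r"^Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FAILED_RE = re.compile(r"^Failed uploads: (\d+)")


def parse_queue(text):
    """Parse -queueinfo text into one dict per queue slot."""
    slots, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        slot = SLOT_RE.match(line)
        if slot:
            current = {
                "slot": slot.group(1),
                "starred": slot.group(2) == "*",  # slot printed with an asterisk
                "status": slot.group(3),
                "prcg": None,
                "failed_uploads": 0,
            }
            slots.append(current)
        elif current is not None:
            project = PROJECT_RE.match(line)
            failed = FAILED_RE.match(line)
            if project:
                # (Project, Run, Clone, Gen) as integers
                current["prcg"] = tuple(int(g) for g in project.groups())
            elif failed:
                current["failed_uploads"] = int(failed.group(1))
    return slots


# Excerpt of the queue dump above.
sample = """\
Slot 04  Empty/Deleted
Project: 6054 (Run 1, Clone 197, Gen 15), Core: a3

Slot 05  Done
Project: 2633 (Run 6, Clone 19, Gen 16), Core: a3
Failed uploads: 1

Slot 06  Done
Project: 6065 (Run 0, Clone 159, Gen 39), Core: a3

Slot 07 *Ready
Project: 6064 (Run 0, Clone 13, Gen 165), Core: a3
"""

pending = [s for s in parse_queue(sample) if s["status"] == "Done"]
for s in pending:
    print(s["slot"], s["prcg"], "failed uploads:", s["failed_uploads"])
```

For this excerpt, the pending list holds slots 05 and 06, matching the two PRCGs posted earlier in the thread.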


Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 5:37 pm
by lanbrown
They have not been sent. I would stop the client and issue fah6.exe -smp -verbosity 9 -send 05 to see whether it can send the unit successfully. Post the log of what happens. You might also try rebooting first and then trying the upload manually. It is almost as if it is still trying to send the WUs.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 5:44 pm
by tourbound129
After I restarted the client to run -queueinfo, it said "attempting to send results" and then went ahead and continued working on the current WU. 14 minutes later I got "Sent 2 of 2 completed units to the server. Autosend complete".

Not sure what the issue was, but at least it seems to have resolved itself.

Re: 171.64.65.54 - Already sending work

Posted: Tue Sep 07, 2010 6:04 pm
by 7im
The client does an auto send of all completed work units each time the client starts. -send 05 is redundant.

I haven't seen that error before. I'm guessing that one of the 2 completed WUs was already trying to upload when the auto send timer was triggered. And since the upload was already in progress, you got that "Already sending work" message.
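
To make that guess concrete, here is a toy Python model. It is not the client's code, and the real client's internals are unknown here. The class `UploadQueue`, its single `sending` flag, and its method names are all assumptions chosen to reproduce the log messages seen in this thread.

```python
class UploadQueue:
    """Toy model only: a single 'upload in progress' flag guards all sends."""

    def __init__(self, finished_units):
        self.finished_units = list(finished_units)
        self.sending = False  # hypothetical in-progress marker
        self.log = []

    def begin_upload(self):
        """Simulate an upload that has started but not yet completed."""
        self.sending = True

    def finish_upload(self):
        """Simulate the in-progress upload ending (for example, after a restart)."""
        self.sending = False

    def send_all(self):
        """Try to send every finished unit; skip them all while the flag is set."""
        total = len(self.finished_units)
        sent = 0
        for unit in list(self.finished_units):
            if self.sending:
                self.log.append("- Already sending work")
                continue
            self.finished_units.remove(unit)
            sent += 1
        self.log.append(f"+ Sent {sent} of {total} completed units to the server")
        return sent


queue = UploadQueue(["slot 05", "slot 06"])
queue.begin_upload()   # an earlier upload is still (or appears to be) running
queue.send_all()       # the timer fires: both units are skipped
queue.finish_upload()  # the stuck state is cleared
queue.send_all()       # the next attempt goes through
print("\n".join(queue.log))
```

In this model, the first `send_all` call logs "Already sending work" twice and sends 0 of 2, and the call after the flag is cleared sends 2 of 2. Whether the real client keeps such a flag, and what clears it, is speculation.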

The client is programmed to be self correcting, and would have likely sorted this out eventually. But good to see you gave it a kick in the right direction. ;)

Re: 171.64.65.54 - Already sending work

Posted: Thu Sep 09, 2010 2:22 am
by gwildperson
In a different topic, Kasson said "We just restarted the server code; hopefully that will help with any "stuck" transfers."

I propose the following theory: Your machine was one of those "stuck" transfers and restarting your client successfully reduced the number clogging the server by one.