171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 6:20 am
by billford
First occasion:

03:04:21:WU00:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
03:05:56:WU00:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
03:05:56:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
03:05:57:WU01:FS00:Connecting to assign3.stanford.edu:8080
03:05:58:WU01:FS00:News: Welcome to Folding@Home
03:05:58:WU01:FS00:Assigned to work server 171.64.65.124
03:05:58:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.64.65.124
03:05:58:WU01:FS00:Connecting to 171.64.65.124:8080
03:05:59:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x00000046664f2de4532787bd4648660a aborting download
03:05:59:WU01:FS00:Connecting to assign3.stanford.edu:8080
03:06:00:WU01:FS00:News: Welcome to Folding@Home
03:06:00:WU01:FS00:Assigned to work server 171.64.65.124
03:06:00:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.64.65.124
03:06:00:WU01:FS00:Connecting to 171.64.65.124:8080
03:06:01:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x00000046664f2de4532787bd4648660a aborting download
.
03:06:59:WU01:FS00:Connecting to assign3.stanford.edu:8080
03:07:00:WU01:FS00:News: Welcome to Folding@Home
03:07:00:WU01:FS00:Assigned to work server 171.64.65.124
03:07:00:WU01:FS00:Requesting new work unit for slot 00: READY cpu:4 from 171.64.65.124
03:07:00:WU01:FS00:Connecting to 171.64.65.124:8080
03:07:01:WU01:FS00:Downloading 862.13KiB
03:07:03:WU01:FS00:Download complete
03:07:03:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9005 run:806 clone:3 gen:46 core:0xa4 unit:0x00000030664f2de4533b285405592957
03:07:03:WU01:FS00:Starting

Then again later:

05:46:01:WU01:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
05:47:37:WU01:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
05:47:37:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
05:47:38:WU00:FS00:Connecting to assign3.stanford.edu:8080
05:47:39:WU00:FS00:News: Welcome to Folding@Home
05:47:39:WU00:FS00:Assigned to work server 171.64.65.124
05:47:39:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.64.65.124
05:47:39:WU00:FS00:Connecting to 171.64.65.124:8080
05:47:40:ERROR:WU00:FS00:Exception: Have already seen this work unit 0x00000030664f2de4533b285405592957 aborting download
05:47:40:WU00:FS00:Connecting to assign3.stanford.edu:8080
05:47:41:WU00:FS00:News: Welcome to Folding@Home
05:47:41:WU00:FS00:Assigned to work server 171.64.65.124
05:47:41:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.64.65.124
05:47:41:WU00:FS00:Connecting to 171.64.65.124:8080
05:47:42:ERROR:WU00:FS00:Exception: Have already seen this work unit 0x00000030664f2de4533b285405592957 aborting download
.
05:48:40:WU00:FS00:Connecting to assign3.stanford.edu:8080
05:48:41:WU00:FS00:News: Welcome to Folding@Home
05:48:41:WU00:FS00:Assigned to work server 171.64.65.124
05:48:41:WU00:FS00:Requesting new work unit for slot 00: READY cpu:4 from 171.64.65.124
05:48:41:WU00:FS00:Connecting to 171.64.65.124:8080
05:48:42:WU00:FS00:Downloading 808.25KiB
05:48:44:WU00:FS00:Download complete
05:48:44:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9007 run:779 clone:3 gen:41 core:0xa4 unit:0x0000002b664f2de4533b36d9844ae364
05:48:44:WU00:FS00:Starting

System and config:

15:09:42:************************* Folding@home Client *************************
15:09:42:    Website: http://folding.stanford.edu/
15:09:42:  Copyright: (c) 2009-2013 Stanford University
15:09:42:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:09:42:       Args: 
15:09:42:     Config: /Library/Application Support/FAHClient/config.xml
15:09:42:******************************** Build ********************************
15:09:42:    Version: 7.3.6
15:09:42:       Date: Feb 18 2013
15:09:42:       Time: 15:24:11
15:09:42:    SVN Rev: 3923
15:09:42:     Branch: fah/trunk/client
15:09:42:   Compiler: GNU 4.2.1 (Apple Inc. build 5666) (dot 3)
15:09:42:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
15:09:42:             -fno-unsafe-math-optimizations -msse3 -arch x86_64
15:09:42:             -mmacosx-version-min=10.5 -isysroot /Developer/SDKs/MacOSX10.5.sdk
15:09:42:   Platform: darwin 10.8.0
15:09:42:       Bits: 64
15:09:42:       Mode: Release
15:09:42:******************************* System ********************************
15:09:42:        CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
15:09:42:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
15:09:42:       CPUs: 4
15:09:42:     Memory: 8.00GiB
15:09:42:Free Memory: 712.34MiB
15:09:42:    Threads: POSIX_THREADS
15:09:42:Has Battery: true
15:09:42: On Battery: false
15:09:42: UTC offset: 1
15:09:42:        PID: 42011
15:09:42:        CWD: /Library/Application Support/FAHClient
15:09:42:         OS: Darwin 12.5.0 x86_64
15:09:42:    OS Arch: AMD64
15:09:42:       GPUs: 1
15:09:42:      GPU 0: NVIDIA:3 GK104 [GeForce GTX 675MX]
15:09:42:       CUDA: Not detected
15:09:42:***********************************************************************
15:09:42:<config>
15:09:42:  <!-- Folding Slot Configuration -->
15:09:42:  <client-type v='advanced'/>
15:09:42:  <power v='full'/>
15:09:42:
15:09:42:  <!-- HTTP Server -->
15:09:42:  <allow v='127.0.0.1  192.168.1.0/24'/>
15:09:42:
15:09:42:  <!-- Network -->
15:09:42:  <proxy v=':8080'/>
15:09:42:
15:09:42:  <!-- Remote Command Server -->
15:09:42:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
15:09:42:  <password v=''/>
15:09:42:
15:09:42:  <!-- Slot Control -->
15:09:42:  <pause-on-start v='true'/>
15:09:42:
15:09:42:  <!-- User Information -->
15:09:42:  <passkey v='********************************'/>
15:09:42:  <user v='lbford@cromwell-home.co.uk'/>
15:09:42:
15:09:42:  <!-- Folding Slots -->
15:09:42:  <slot id='0' type='CPU'>
15:09:42:    <cpus v='-1'/>
15:09:42:    <next-unit-percentage v='100'/>
15:09:42:  </slot>
15:09:42:</config>

Edit: this is also happening on a second machine, with essentially the same logs and setup. I can provide details if requested.

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 1:47 pm
by 7im
The server only accepts an upload of a work unit from your client once. If you try to upload it more than once, that's when the message "Have already seen this work unit" is shown. It does not allow repeated uploads of the same WU to go through.

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 3:08 pm
by billford
7im wrote: The server only accepts an upload of a work unit from your client once. If you try to upload it more than once, that's when the message "Have already seen this work unit" is shown. It does not allow repeated uploads of the same WU to go through.
Very true, but hardly relevant: the message came from the client while it was downloading… it hadn't even got as far as "Finished Unit" at that point.

03:05:58:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.64.65.124
03:05:58:WU01:FS00:Connecting to 171.64.65.124:8080
03:05:59:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x00000046664f2de4532787bd4648660a aborting download

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 6:00 pm
by 7im
New one on me. Never seen that message on a download before.

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 6:10 pm
by billford
Me neither.

If that's as far as the problem extends then it may not be much to worry about, but it does imply that the server isn't keeping accurate/complete track of which WUs are being assigned to which clients… which could have all sorts of ramifications in a worst-case scenario :(

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 6:45 pm
by Joe_H
I recall reading a post about this message coming up for another folder maybe a couple of years ago. The response then was that the client does have code to avoid running the same WU more than once on the same system. My guess is that this code is there as a failsafe against the code on the servers that will resend the same WU in case there was a bad download for some reason. I don't know how long after the initial download attempt the window for this retry stays open. But if it was set too long for a combination of a fast CPU and a small WU, I can see that causing an attempt to resend a WU in the gap of time before a finished WU was uploaded. At least it appears to be working, and after a few seconds it is getting you a different WU assignment.
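The second log excerpt in the first post seems consistent with this: the unit ID rejected at 05:47:40 is the same ID that was received at 03:07:03 and finished at 05:47:37. A rough Python sketch for checking that kind of match in a log follows. The regular expressions are based only on the lines quoted in this thread, the function name `find_reoffers` is made up for illustration, and it assumes all timestamps fall on the same day (no midnight wrap).

```python
import re
from datetime import datetime

# Log lines copied from the first post in this thread.
LOG = """\
03:07:03:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9005 run:806 clone:3 gen:46 core:0xa4 unit:0x00000030664f2de4533b285405592957
05:47:37:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
05:47:40:ERROR:WU00:FS00:Exception: Have already seen this work unit 0x00000030664f2de4533b285405592957 aborting download
05:47:42:ERROR:WU00:FS00:Exception: Have already seen this work unit 0x00000030664f2de4533b285405592957 aborting download
05:48:44:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9007 run:779 clone:3 gen:41 core:0xa4 unit:0x0000002b664f2de4533b36d9844ae364
"""

# Patterns inferred from the quoted log lines only (an assumption, not a spec).
RECEIVED = re.compile(r"^(\d\d:\d\d:\d\d):.*Received Unit:.*unit:(0x[0-9a-f]+)")
REJECTED = re.compile(
    r"^(\d\d:\d\d:\d\d):ERROR:.*Have already seen this work unit (0x[0-9a-f]+)"
)


def parse_time(stamp):
    """Parse an HH:MM:SS log timestamp (date is ignored)."""
    return datetime.strptime(stamp, "%H:%M:%S")


def find_reoffers(log_text):
    """Return (unit_id, seconds_since_received) for each rejected re-offer
    whose unit ID was received earlier in the same log."""
    received_at = {}
    reoffers = []
    for line in log_text.splitlines():
        match = RECEIVED.match(line)
        if match:
            received_at[match.group(2)] = parse_time(match.group(1))
            continue
        match = REJECTED.match(line)
        if match and match.group(2) in received_at:
            delta = parse_time(match.group(1)) - received_at[match.group(2)]
            reoffers.append((match.group(2), int(delta.total_seconds())))
    return reoffers


for unit_id, seconds in find_reoffers(LOG):
    print(unit_id, "re-offered", seconds, "seconds after it was received")
```

This only shows that the rejected unit was one the client already held. It says nothing about how long the server's retry window actually is.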

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 6:55 pm
by billford
Thanks Joe.

If it's known about and it's nothing to worry about then I won't :D

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 7:54 pm
by PantherX
billford wrote: ...If that's as far as the problem extends then it may not be much to worry about, but it does imply that the server isn't keeping accurate/complete track of which WUs are being assigned to which clients… which could have all sorts of ramifications in a worst-case scenario :(
It could be a rare case in which two duplicates of the same WU (issued to verify whether it is a bad WU, or because the original WU is lost) are assigned to your client: one that is being processed (near finishing) and the other getting ready to be processed (downloading). Joe_H's explanation is correct. In v6, you had to change the Machine ID to get a different WU, but v7 has better logic that doesn't require user intervention. Moreover, you will get a similar message when the client attempts to download a FahCore version that is already present on your system.
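To illustrate the general idea of "refuse a repeated unit, then ask again", here is a minimal Python sketch. It is not FAHClient code: `SeenUnits`, `DuplicateWorkUnit`, and `request_until_new` are hypothetical names, the list of offers stands in for server responses, and the retry limit is an arbitrary assumption.

```python
class DuplicateWorkUnit(Exception):
    """Raised when an offered unit ID has already been seen by this client."""


class SeenUnits:
    """Hypothetical record of unit IDs this client has already downloaded."""

    def __init__(self):
        self._seen = set()

    def accept(self, unit_id):
        """Record and return a new unit ID, or raise if it is a repeat."""
        if unit_id in self._seen:
            raise DuplicateWorkUnit(
                f"Have already seen this work unit {unit_id}"
            )
        self._seen.add(unit_id)
        return unit_id


def request_until_new(offers, seen, max_attempts=5):
    """Walk through successive offers until one is new.

    Returns the accepted unit ID, or None if every attempt was a repeat
    or the (assumed) attempt limit was reached.
    """
    for attempt, unit_id in enumerate(offers, start=1):
        if attempt > max_attempts:
            break
        try:
            return seen.accept(unit_id)
        except DuplicateWorkUnit:
            continue  # a real client would presumably wait before retrying
    return None


# Unit IDs taken from the logs in the first post.
seen = SeenUnits()
finished = "0x00000030664f2de4533b285405592957"
fresh = "0x0000002b664f2de4533b36d9844ae364"
seen.accept(finished)
print(request_until_new([finished, finished, fresh], seen))
```

The point of the sketch is only that no user action (such as changing a Machine ID) is needed: the repeat is rejected locally and the next offer is tried.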

Re: 171.64.65.124 repeating wus?

Posted: Sat Apr 19, 2014 8:12 pm
by billford
Joe_H's reply reassured me, but I can also see the reasoning behind it now, thanks

Re: 171.64.65.124 repeating wus?

Posted: Sun Apr 20, 2014 5:07 am
by 7im
billford wrote: Me neither.

If that's as far as the problem extends then it may not be much to worry about, but it does imply that the server isn't keeping accurate/complete track of which WUs are being assigned to which clients… which could have all sorts of ramifications in a worst-case scenario :(
Please try to refrain from making snap judgements that always assume the worst case. Newly seen problems are simply unexplained until explained, as Joe did. This project has been running for more than 10 years. Losing track of WUs is not a common problem, nor has it ever been.

Re: 171.64.65.124 repeating wus?

Posted: Sun Apr 20, 2014 1:40 pm
by billford
7im wrote: Please try to refrain from making snap judgements that always assume the worst case.
I didn't.

I reported a symptom, made a couple of observations when that report was misread, and accepted the reassurance when given. In that order.
7im wrote: Losing track of WUs is not a common problem, nor has it ever been.
I never said or implied that it was common. But even uncommon problems happen now and then, and this could have been one.

Would you prefer that a minor anomaly is ignored until it develops into a major problem?

Re: 171.64.65.124 repeating wus?

Posted: Sun Apr 20, 2014 2:15 pm
by 7im
They prefer constructive feedback without the worst case commentary.

Your feedback was well written and well documented, and didn't need the part after the ...

Re: 171.64.65.124 repeating wus?

Posted: Sun Apr 20, 2014 2:25 pm
by billford
They probably don't need comments on posts when they haven't been read properly either, but heigh ho.

Your opinion is noted.

Re: 171.64.65.124 repeating wus?

Posted: Sun Apr 20, 2014 2:43 pm
by bollix47
The original question has been answered.

Topic locked.