
171.65.103.162 assigning same WU twice

Posted: Sat Apr 04, 2009 8:07 pm
by Foxery
Hello! One of my machines running multiple Uniprocessor clients fetched the same WU on two clients today. Both have been running quite happily for 9 months or more; separate working directories, unique Machine IDs, etc.

Obligatory log snippet below. I stopped one client and deleted \work and queue.dat, yet the same assignment repeated 4 times until I changed one of the clients from "big" to "normal" WUs and finally received p5113. The 2nd client will still finish its p2484 WU as expected.

Code:

--- Opening Log file [April 4 19:45:01 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.20

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Meat\Folding2
Service: C:\Meat\Folding2\fah6-c2
Arguments: -svcstart -d C:\Meat\Folding2 -verbosity 9 -forceasm 

Launched as a service.
Entered C:\Meat\Folding2 to do work.

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[19:45:01] - Ask before connecting: No
[19:45:01] - User name: Foxery (Team 198)
[19:45:01] - User ID: 2102BD015E3C6C0F
[19:45:01] - Machine ID: 5
[19:45:01] 
[19:45:01] Work directory not found. Creating...
[19:45:01] Could not open work queue, generating new queue...
[19:45:01] - Preparing to get new work unit...
[19:45:01] - Autosending finished units... [April 4 19:45:01 UTC][19:45:01] + Attempting to get work packet

[19:45:01] Trying to send all finished work units
[19:45:01] - Will indicate memory of 2046 MB
[19:45:01] + No unsent completed units remaining.
[19:45:01] - Autosend completed
[19:45:01] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[19:45:01] - Connecting to assignment server
[19:45:01] Connecting to http://assign.stanford.edu:8080/
[19:45:02] Posted data.
[19:45:02] Initial: 41AB; - Successful: assigned to (171.65.103.162).
[19:45:02] + News From Folding@Home: Welcome to Folding@Home
[19:45:02] Loaded queue successfully.
[19:45:02] Connecting to http://171.65.103.162:8080/
[19:45:08] Posted data.
[19:45:08] Initial: 0000; - Receiving payload (expected size: 2964493)
[19:45:13] - Downloaded at ~579 kB/s
[19:45:13] - Averaged speed for that direction ~579 kB/s
[19:45:13] + Received work.
[19:45:13] + Closed connections
[19:45:13] 
[19:45:13] + Processing work unit
[19:45:13] Core required: FahCore_78.exe
[19:45:13] Core found.
[19:45:13] Working on queue slot 01 [April 4 19:45:13 UTC]
[19:45:13] + Working ...
[19:45:13] - Calling '.\FahCore_78.exe -dir work/ -suffix 01 -checkpoint 12 -service -forceasm -verbose -lifeline 3732 -version 620'

[19:45:13] 
[19:45:13] *------------------------------*
[19:45:13] Folding@Home Gromacs Core
[19:45:13] Version 1.90 (March 8, 2006)
[19:45:13] 
[19:45:13] Preparing to commence simulation
[19:45:13] - Assembly optimizations manually forced on.
[19:45:13] - Not checking prior termination.
[19:45:16] - Expanded 2963981 -> 15070653 (decompressed 508.4 percent)
[19:45:16] - Starting from initial work packet
[19:45:16] 
[19:45:16] Project: 2484 (Run 199, Clone 33, Gen 16)
[19:45:16] 
[19:45:16] Assembly optimizations on if available.
[19:45:16] Entering M.D.

Re: 171.65.103.162 assigning same WU twice

Posted: Sat Apr 04, 2009 8:48 pm
by bruce
If you have been running for 9 months, it's really strange to see
[19:45:01] Work directory not found. Creating...
[19:45:01] Could not open work queue, generating new queue...

Does that mean you're sneakernetting?

It is possible for two copies of the same WU to be in circulation, but the conditions are rather rare. The chances of both of them being sent to different MachineIDs with the same UserID are also extremely small, but that's not the same as impossible. If that's the case, both WUs can get credit.
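
To make that kind of race concrete, here is a toy sketch (Python, purely illustrative -- this is not the actual assignment-server code) of how a non-atomic "read the next WU, then mark it assigned" step can hand the same unit to two clients that ask at nearly the same moment:

Code:

import threading, time

# Toy illustration only -- NOT the real Folding@Home assignment server.
# Both requests read the head of the queue before either has marked it
# as assigned, so both walk away with the same work unit.
available = ["WU-A", "WU-B"]          # hypothetical unassigned units
assigned = {}

def assign(client_id):
    wu = available[0]                 # step 1: read the next unassigned WU
    time.sleep(0.01)                  # window for the other request to sneak in
    if wu in available:               # step 2: mark it assigned -- too late to be safe
        available.remove(wu)
    assigned[client_id] = wu

t1 = threading.Thread(target=assign, args=("client-1",))
t2 = threading.Thread(target=assign, args=("client-2",))
t1.start(); t2.start(); t1.join(); t2.join()
print(assigned)                       # both clients usually end up with "WU-A"

A server that holds a lock (or does an atomic pop) around both steps would hand out WU-A and WU-B instead.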

If you're sneakernetting or otherwise manipulating which WU is being processed by certain clients, it's not worth reporting because you're the one who is responsible.

If you're asking for help rather than just reporting, you'll have to explain what you were doing in more detail.

Re: 171.65.103.162 assigning same WU twice

Posted: Sat Apr 04, 2009 9:00 pm
by anandhanju
Foxery wrote: I stopped one client and deleted \work and queue.dat
That explains:

Code:

[19:45:01] Work directory not found. Creating...
[19:45:01] Could not open work queue, generating new queue...
How much of a time difference was there between the two clients receiving the same work unit? As bruce mentioned, there have been reports of race conditions where the same WU is issued to instances running different machine IDs. I think that's what happened here, although I cannot say for certain.
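
For anyone who wants to pull that time difference straight out of the logs, here is a rough sketch (Python; the file names below are placeholders -- point it at each client's own FAHlog.txt) that prints the timestamp of every "Project: ... (Run, Clone, Gen)" line:

Code:

import re, sys

# Scan one or more FAHlog.txt files and print the time at which each
#   [hh:mm:ss] Project: NNNN (Run R, Clone C, Gen G)
# line appears, so the two assignment times can be compared directly.
line_re = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Project: (\d+) "
                     r"\(Run (\d+), Clone (\d+), Gen (\d+)\)")

def assignments(path):
    with open(path, errors="replace") as fh:
        for line in fh:
            m = line_re.search(line)
            if m:
                t, proj, run, clone, gen = m.groups()
                yield t, "p%s (R%s, C%s, G%s)" % (proj, run, clone, gen)

# usage: python wu_times.py client1\FAHlog.txt client2\FAHlog.txt
for path in sys.argv[1:]:
    print(path)
    for t, wu in assignments(path):
        print(" ", t, wu)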

Re: 171.65.103.162 assigning same WU twice

Posted: Sat Apr 04, 2009 9:11 pm
by bruce
Yes, that explains it, but the FAHlog copies that would be helpful are the one in his recycle bin and the one belonging to the other client.

His use of the word "Obligatory" probably means he knew that the snippet that he posted wouldn't be helpful.

Re: 171.65.103.162 assigning same WU twice

Posted: Mon Apr 06, 2009 1:37 pm
by Tigerbiten
I've had identical work-units, with the same P(R,C,G) numbers, on a couple of my computers in the past.
It happened when two clients on different machines asked for the same type of work-unit at the same time.
I think I got points for both work-units folded, as it's a slight server glitch and not a user error.

Luck ........... :D

Re: 171.65.103.162 assigning same WU twice

Posted: Mon Apr 06, 2009 2:15 pm
by Foxery
Tigerbiten wrote: I've had identical work-units, with the same P(R,C,G) numbers, on a couple of my computers in the past.
It happened when two clients on different machines asked for the same type of work-unit at the same time.
I think I got points for both work-units folded, as it's a slight server glitch and not a user error.

Luck ........... :D
Yes, this is exactly what happened. Two clients on the same machine finished within a minute of each other, connected to the AS within a minute of each other, and received the same WU. Given how many thousands of connections there are every day, this hardly seems like a "rare" occurrence, and I wanted to be sure Stanford's servers weren't bugging out and assigning everyone the same work.
bruce wrote: Yes, that explains it, but the FAHlog copies that would be helpful are the one in his recycle bin and the one belonging to the other client.

His use of the word "Obligatory" probably means he knew that the snippet that he posted wouldn't be helpful.
Correct. There isn't much useful information to be gleaned from a perfectly normal log with no actual errors. I didn't consider that the original timestamps would be useful, so I'll try to find the appropriate spots when I get home tonight.

Re: 171.65.103.162 assigning same WU twice

Posted: Mon Apr 06, 2009 3:14 pm
by Tigerbiten
Just happened again for me.
Two of my GPU clients on the same box, but with different MachineIDs, both picked up p5774 (3-12-54) at ~14:52:06.

Tigger G1.

Code:

[14:52:04] - Connecting to assignment server
[14:52:04] Connecting to http://assign-GPU.stanford.edu:8080/
[14:52:06] Posted data.
[14:52:06] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[14:52:06] + News From Folding@Home: GPU folding beta
[14:52:06] Loaded queue successfully.
[14:52:06] Connecting to http://171.64.65.106:8080/
[14:52:07] Posted data.
[14:52:07] Initial: 0000; - Receiving payload (expected size: 68392)
[14:52:09] - Downloaded at ~33 kB/s
[14:52:09] - Averaged speed for that direction ~27 kB/s
[14:52:09] + Received work.
[14:52:09] + Closed connections
[14:52:09] 
[14:52:09] + Processing work unit
[14:52:09] Core required: FahCore_11.exe
[14:52:09] Core found.
[14:52:09] Working on queue slot 08 [April 6 14:52:09 UTC]
[14:52:09] + Working ...
[14:52:09] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 3488 -version 620'

[14:52:09] 
[14:52:09] *------------------------------*
[14:52:09] Folding@Home GPU Core - Beta
[14:52:09] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:52:09] 
[14:52:09] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:52:09] Build host: amoeba
[14:52:09] Board Type: Nvidia
[14:52:09] Core      : 
[14:52:09] Preparing to commence simulation
[14:52:09] - Looking at optimizations...
[14:52:09] - Created dyn
[14:52:09] - Files status OK
[14:52:09] - Expanded 67880 -> 350980 (decompressed 517.0 percent)
[14:52:09] Called DecompressByteArray: compressed_data_size=67880 data_size=350980, decompressed_data_size=350980 diff=0
[14:52:09] - Digital signature verified
[14:52:09] 
[14:52:09] Project: 5774 (Run 3, Clone 12, Gen 54)
Tigger G3.

Code:

[14:52:01] - Connecting to assignment server
[14:52:01] Connecting to http://assign-GPU.stanford.edu:8080/
[14:52:04] Posted data.
[14:52:04] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[14:52:04] + News From Folding@Home: GPU folding beta
[14:52:04] Loaded queue successfully.
[14:52:04] Connecting to http://171.64.65.106:8080/
[14:52:06] Posted data.
[14:52:06] Initial: 0000; - Receiving payload (expected size: 68392)
[14:52:08] - Downloaded at ~33 kB/s
[14:52:08] - Averaged speed for that direction ~25 kB/s
[14:52:08] + Received work.
[14:52:08] + Closed connections
[14:52:08] 
[14:52:08] + Processing work unit
[14:52:08] Core required: FahCore_11.exe
[14:52:08] Core found.
[14:52:08] Working on queue slot 05 [April 6 14:52:08 UTC]
[14:52:08] + Working ...
[14:52:08] - Calling '.\FahCore_11.exe -dir work/ -suffix 05 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 492 -version 620'

[14:52:08] 
[14:52:08] *------------------------------*
[14:52:08] Folding@Home GPU Core - Beta
[14:52:08] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:52:08] 
[14:52:08] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:52:08] Build host: amoeba
[14:52:08] Board Type: Nvidia
[14:52:08] Core      : 
[14:52:08] Preparing to commence simulation
[14:52:08] - Looking at optimizations...
[14:52:08] - Created dyn
[14:52:08] - Files status OK
[14:52:08] - Expanded 67880 -> 350980 (decompressed 517.0 percent)
[14:52:08] Called DecompressByteArray: compressed_data_size=67880 data_size=350980, decompressed_data_size=350980 diff=0
[14:52:08] - Digital signature verified
[14:52:08] 
[14:52:08] Project: 5774 (Run 3, Clone 12, Gen 54)
I only see this with the GPU client when the servers are heavily loaded and two of my clients on the same box ask for work at the same time.
I've probably done over 40k GPU2 work-units, and this is only the 3rd or 4th time I've seen it.
Hence my calling it a slight glitch.

Hopefully I'll get points for both work-units turned in again ...... :p

Luck ........... :D

Re: 171.65.103.162 assigning same WU twice

Posted: Mon Apr 06, 2009 4:26 pm
by ppetrone
Hi. Thank you for these reports. At first sight this looks like an unusual event to me.
I have already made some changes to the assignment parameters which should fix this problem.
If you see this happening again, please let us know, including, if possible, a timestamp of the duplicate assignments.

Re: 171.65.103.162 assigning same WU twice

Posted: Wed Apr 08, 2009 2:17 am
by Foxery
(Run 199, Clone 33, Gen 16)
Found it. Both clients connected to send their completed units at the same time, and both failed several times. Original timestamp for client 1:
April 4, 2009

Code:

[19:03:03] Completed 250000 out of 250000 steps  (100%)
[19:03:03] Writing final coordinates.
[19:03:04] Past main M.D. loop
[19:04:04] 
[19:04:04] Finished Work Unit:
[19:04:04] - Reading up to 2300688 from "work/wudata_05.arc": Read 2300688
[19:04:04] - Reading up to 96764 from "work/wudata_05.xtc": Read 96764
[19:04:04] goefile size: 0
[19:04:04] logfile size: 48385
[19:04:04] Leaving Run
[19:04:07] - Writing 2464669 bytes of core data to disk...
[19:04:07]   ... Done.
[19:04:07] - Shutting down core
[19:04:07] 
[19:04:07] Folding@home Core Shutdown: FINISHED_UNIT
[19:04:10] CoreStatus = 64 (100)
[19:04:10] Unit 5 finished with 98 percent of time to deadline remaining.
[19:04:10] Updated performance fraction: 0.985064
[19:04:10] Sending work to server
[19:04:10] Project: 2483 (Run 235, Clone 11, Gen 11)
[19:04:10] - Read packet limit of 540015616... Set to 524286976.


[19:04:10] + Attempting to send results [April 4 19:04:10 UTC]
[19:04:10] - Reading file work/wuresults_05.dat from core
[19:04:10]   (Read 2464669 bytes from disk)
[19:04:10] Connecting to http://171.65.103.162:8080/
[19:04:10] - Couldn't send HTTP request to server
[19:04:10]   (Got status 503)
[19:04:10] + Could not connect to Work Server (results)
[19:04:10]     (171.65.103.162:8080)
[19:04:10] + Retrying using alternative port
[19:04:10] Connecting to http://171.65.103.162:80/
[19:04:11] - Couldn't send HTTP request to server
[19:04:11] + Could not connect to Work Server (results)
[19:04:11]     (171.65.103.162:80)
[19:04:11] - Error: Could not transmit unit 05 (completed April 4) to work server.
[19:04:11] - 1 failed uploads of this unit.
[19:04:11]   Keeping unit 05 in queue.
[19:04:12] Trying to send all finished work units
[19:04:12] Project: 2483 (Run 235, Clone 11, Gen 11)
[19:04:12] - Read packet limit of 540015616... Set to 524286976.


[19:04:12] + Attempting to send results [April 4 19:04:12 UTC]
[19:04:12] - Reading file work/wuresults_05.dat from core
[19:04:12]   (Read 2464669 bytes from disk)
[19:04:12] Connecting to http://171.65.103.162:8080/
[19:04:12] - Couldn't send HTTP request to server
[19:04:12]   (Got status 503)
[19:04:12] + Could not connect to Work Server (results)
[19:04:12]     (171.65.103.162:8080)
[19:04:12] + Retrying using alternative port
[19:04:12] Connecting to http://171.65.103.162:80/
[19:04:13] - Couldn't send HTTP request to server
[19:04:13] + Could not connect to Work Server (results)
[19:04:13]     (171.65.103.162:80)
[19:04:13] - Error: Could not transmit unit 05 (completed April 4) to work server.
[19:04:13] - 2 failed uploads of this unit.
[19:04:13] - Read packet limit of 540015616... Set to 524286976.


[19:04:13] + Attempting to send results [April 4 19:04:13 UTC]
[19:04:13] - Reading file work/wuresults_05.dat from core
[19:04:13]   (Read 2464669 bytes from disk)
[19:04:13] Connecting to http://171.65.103.100:8080/
[19:04:33] Posted data.
[19:04:33] Initial: 0000; - Uploaded at ~120 kB/s
[19:04:33] - Averaged speed for that direction ~98 kB/s
[19:04:33] - Server does not have record of this unit. Will try again later.
[19:04:33]   Could not transmit unit 05 to Collection server; keeping in queue.
[19:04:33] + Sent 0 of 1 completed units to the server
[19:04:33] - Preparing to get new work unit...
[19:04:33] + Attempting to get work packet
[19:04:33] - Will indicate memory of 2046 MB
[19:04:33] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[19:04:33] - Connecting to assignment server
[19:04:33] Connecting to http://assign.stanford.edu:8080/
[19:04:34] Posted data.
[19:04:34] Initial: 41AB; - Successful: assigned to (171.65.103.162).
[19:04:34] + News From Folding@Home: Welcome to Folding@Home
[19:04:34] Loaded queue successfully.
[19:04:34] Connecting to http://171.65.103.162:8080/
[19:04:40] Posted data.
[19:04:40] Initial: 0000; - Receiving payload (expected size: 2964493)
[19:04:45] - Downloaded at ~579 kB/s
[19:04:45] - Averaged speed for that direction ~477 kB/s
[19:04:45] + Received work.
[19:04:45] Trying to send all finished work units
[19:04:45] Project: 2483 (Run 235, Clone 11, Gen 11)
[19:04:45] - Read packet limit of 540015616... Set to 524286976.


[19:04:45] + Attempting to send results [April 4 19:04:45 UTC]
[19:04:45] - Reading file work/wuresults_05.dat from core
[19:04:45]   (Read 2464669 bytes from disk)
[19:04:45] Connecting to http://171.65.103.162:8080/
[19:05:05] Posted data.
[19:05:05] Initial: 0000; - Uploaded at ~109 kB/s
[19:05:07] - Averaged speed for that direction ~100 kB/s
[19:05:07] + Results successfully sent
[19:05:07] Thank you for your contribution to Folding@Home.
[19:05:07] + Number of Units Completed: 183

[19:05:07] + Sent 1 of 1 completed units to the server
[19:05:07] + Closed connections
[19:05:07] 
[19:05:07] + Processing work unit
[19:05:07] Core required: FahCore_78.exe
[19:05:07] Core found.
[19:05:07] Working on queue slot 06 [April 4 19:05:07 UTC]
[19:05:07] + Working ...
[19:05:07] - Calling '.\FahCore_78.exe -dir work/ -suffix 06 -checkpoint 12 -service -forceasm -verbose -lifeline 704 -version 620'

[19:05:07] 
[19:05:07] *------------------------------*
[19:05:07] Folding@Home Gromacs Core
[19:05:07] Version 1.90 (March 8, 2006)
[19:05:07] 
[19:05:07] Preparing to commence simulation
[19:05:07] - Assembly optimizations manually forced on.
[19:05:07] - Not checking prior termination.
[19:05:10] - Expanded 2963981 -> 15070653 (decompressed 508.4 percent)
[19:05:10] - Starting from initial work packet
[19:05:10] 
[19:05:10] Project: 2484 (Run 199, Clone 33, Gen 16)
[19:05:10] 
[19:05:10] Assembly optimizations on if available.
[19:05:10] Entering M.D.
and for client 2... it actually took 10 minutes of retries to receive one, and it was still a dupe...? Odd.

Code:

[19:12:49] Completed 250000 out of 250000 steps  (100%)
[19:12:49] Writing final coordinates.
[19:12:51] Past main M.D. loop
[19:13:51] 
[19:13:51] Finished Work Unit:
[19:13:51] - Reading up to 2295720 from "work/wudata_08.arc": Read 2295720
[19:13:51] - Reading up to 443064 from "work/wudata_08.xtc": Read 443064
[19:13:51] goefile size: 0
[19:13:51] logfile size: 47537
[19:13:51] Leaving Run
[19:13:52] - Writing 2804497 bytes of core data to disk...
[19:13:52]   ... Done.
[19:13:53] - Shutting down core
[19:13:53] 
[19:13:53] Folding@home Core Shutdown: FINISHED_UNIT
[19:13:55] CoreStatus = 64 (100)
[19:13:55] Unit 8 finished with 98 percent of time to deadline remaining.
[19:13:55] Updated performance fraction: 0.985018
[19:13:55] Sending work to server
[19:13:55] Project: 2485 (Run 123, Clone 33, Gen 6)
[19:13:55] - Read packet limit of 540015616... Set to 524286976.


[19:13:55] + Attempting to send results [April 4 19:13:55 UTC]
[19:13:55] - Reading file work/wuresults_08.dat from core
[19:13:55]   (Read 2804497 bytes from disk)
[19:13:55] Connecting to http://171.65.103.162:8080/
[19:13:55] - Couldn't send HTTP request to server
[19:13:55]   (Got status 503)
[19:13:55] + Could not connect to Work Server (results)
[19:13:55]     (171.65.103.162:8080)
[19:13:55] + Retrying using alternative port
[19:13:55] Connecting to http://171.65.103.162:80/
[19:13:56] - Couldn't send HTTP request to server
[19:13:56] + Could not connect to Work Server (results)
[19:13:56]     (171.65.103.162:80)
[19:13:56] - Error: Could not transmit unit 08 (completed April 4) to work server.
[19:13:56] - 1 failed uploads of this unit.
[19:13:56]   Keeping unit 08 in queue.
[19:13:56] Trying to send all finished work units
[19:13:56] Project: 2485 (Run 123, Clone 33, Gen 6)
[19:13:56] - Read packet limit of 540015616... Set to 524286976.


[19:13:56] + Attempting to send results [April 4 19:13:56 UTC]
[19:13:56] - Reading file work/wuresults_08.dat from core
[19:13:56]   (Read 2804497 bytes from disk)
[19:13:56] Connecting to http://171.65.103.162:8080/
[19:13:56] - Couldn't send HTTP request to server
[19:13:56]   (Got status 503)
[19:13:56] + Could not connect to Work Server (results)
[19:13:56]     (171.65.103.162:8080)
[19:13:56] + Retrying using alternative port
[19:13:56] Connecting to http://171.65.103.162:80/
[19:13:57] - Couldn't send HTTP request to server
[19:13:57] + Could not connect to Work Server (results)
[19:13:57]     (171.65.103.162:80)
[19:13:57] - Error: Could not transmit unit 08 (completed April 4) to work server.
[19:13:57] - 2 failed uploads of this unit.
[19:13:57] - Read packet limit of 540015616... Set to 524286976.


[19:13:57] + Attempting to send results [April 4 19:13:57 UTC]
[19:13:57] - Reading file work/wuresults_08.dat from core
[19:13:57]   (Read 2804497 bytes from disk)
[19:13:57] Connecting to http://171.65.103.100:8080/
[19:14:21] Posted data.
[19:14:21] Initial: 0000; - Uploaded at ~114 kB/s
[19:14:21] - Averaged speed for that direction ~92 kB/s
[19:14:21] - Server does not have record of this unit. Will try again later.
[19:14:21]   Could not transmit unit 08 to Collection server; keeping in queue.
[19:14:21] + Sent 0 of 1 completed units to the server
[19:14:21] - Preparing to get new work unit...
[19:14:21] + Attempting to get work packet
[19:14:21] - Will indicate memory of 2046 MB
[19:14:21] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[19:14:21] - Connecting to assignment server
[19:14:21] Connecting to http://assign.stanford.edu:8080/
[19:14:21] Posted data.
[19:14:21] Initial: 41AB; - Successful: assigned to (171.65.103.162).
[19:14:21] + News From Folding@Home: Welcome to Folding@Home
[19:14:21] Loaded queue successfully.
[19:14:21] Connecting to http://171.65.103.162:8080/
[19:14:27] Posted data.
[19:14:27] Initial: 0000; - Receiving payload (expected size: 2964493)
[19:14:33] - Downloaded at ~482 kB/s
[19:14:33] - Averaged speed for that direction ~425 kB/s
[19:14:33] + Received work.
[19:14:33] Trying to send all finished work units
[19:14:33] Project: 2485 (Run 123, Clone 33, Gen 6)
[19:14:33] - Read packet limit of 540015616... Set to 524286976.


[19:14:33] + Attempting to send results [April 4 19:14:33 UTC]
[19:14:33] - Reading file work/wuresults_08.dat from core
[19:14:33]   (Read 2804497 bytes from disk)
[19:14:33] Connecting to http://171.65.103.162:8080/
[19:14:56] Posted data.
[19:14:56] Initial: 0000; - Uploaded at ~109 kB/s
[19:14:58] - Averaged speed for that direction ~95 kB/s
[19:14:58] + Results successfully sent
[19:14:58] Thank you for your contribution to Folding@Home.
[19:14:58] + Number of Units Completed: 182

[19:14:58] + Sent 1 of 1 completed units to the server
[19:14:58] + Closed connections
[19:14:58] 
[19:14:58] + Processing work unit
[19:14:58] Core required: FahCore_78.exe
[19:14:58] Core found.
[19:14:58] Working on queue slot 09 [April 4 19:14:58 UTC]
[19:14:58] + Working ...
[19:14:58] - Calling '.\FahCore_78.exe -dir work/ -suffix 09 -checkpoint 12 -service -forceasm -verbose -lifeline 3988 -version 620'

[19:14:58] 
[19:14:58] *------------------------------*
[19:14:58] Folding@Home Gromacs Core
[19:14:58] Version 1.90 (March 8, 2006)
[19:14:58] 
[19:14:58] Preparing to commence simulation
[19:14:58] - Assembly optimizations manually forced on.
[19:14:58] - Not checking prior termination.
[19:15:01] - Expanded 2963981 -> 15070653 (decompressed 508.4 percent)
[19:15:01] - Starting from initial work packet
[19:15:01] 
[19:15:01] Project: 2484 (Run 199, Clone 33, Gen 16)
[19:15:01] 
[19:15:01] Assembly optimizations on if available.
[19:15:01] Entering M.D.
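
Both logs above follow the same fallback order before the duplicate download: the work server refuses the upload on port 8080 (status 503), the client retries the work server on port 80, then hands the result to the collection server at 171.65.103.100, which answers "Server does not have record of this unit", and only after the new (duplicate) WU has been downloaded does the retry to 171.65.103.162 go through. As a reading aid (reconstructed from these log lines, not from any official client specification), the fallback amounts to something like:

Code:

# Upload fallback order as it appears in the two logs above
# (a reading aid, not the official v6 client logic).
WORK_SERVER = "171.65.103.162"
COLLECTION_SERVER = "171.65.103.100"

UPLOAD_ATTEMPTS = [
    (WORK_SERVER, 8080),        # "+ Attempting to send results"  -> got status 503
    (WORK_SERVER, 80),          # "+ Retrying using alternative port"
    (COLLECTION_SERVER, 8080),  # "Server does not have record of this unit"
]

def send_results(try_post):
    """try_post(host, port) -> True if that server accepted the results."""
    for host, port in UPLOAD_ATTEMPTS:
        if try_post(host, port):
            return True
    return False                # "Keeping unit in queue" -- the client retries later

# dry run: pretend only the collection server accepts the upload
print(send_results(lambda host, port: host == COLLECTION_SERVER))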


Re: 171.65.103.162 assigning same WU twice

Posted: Wed Apr 08, 2009 2:21 am
by bruce
Please confirm that you see different values for MachineID printed in the first page of the two FAHlogs.

Re: 171.65.103.162 assigning same WU twice

Posted: Wed Apr 08, 2009 2:34 am
by bruce
Tigerbiten wrote: Just happened again for me.
Two of my GPU clients on the same box, but with different MachineIDs, both picked up p5774 (3-12-54) at ~14:52:06.
Well, at least the stats system is recognizing that you should be getting credit for both WUs. That doesn't explain why the work is being duplicated, but it does say you shouldn't be worrying about the points.

CPUId: 4810XXXXXXXXXXXXAE (the sum of UserID and MachineID)
Hi Tigerbiten (team 33),
Your WU (P5774 R3 C12 G54) was added to the stats database on 2009-04-06 12:12:36 for 768 points of credit.

CPUId: 4810XXXXXXXXXXXXAC
Hi Tigerbiten (team 33),
Your WU (P5774 R3 C12 G54) was added to the stats database on 2009-04-06 12:12:36 for 768 points of credit.

NOTE: This is server 171.64.65.106 rather than 171.65.103.162
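
Taking "the sum of UserID and MachineID" literally as hex addition (that is my reading of the parenthetical note above, not something confirmed elsewhere in this thread), two clients sharing one UserID but with MachineIDs two apart would indeed produce CPUIds ending ...AC and ...AE, as in the two stats entries quoted. A minimal sketch with made-up digits:

Code:

# Assumes the stats CPUId is literally UserID + MachineID, added as hex
# numbers, per the note above.  The UserID digits are placeholders, not
# Tigerbiten's real ID, and the MachineIDs (2 and 4) are hypothetical.
def cpuid(user_id_hex, machine_id):
    return format(int(user_id_hex, 16) + machine_id, "X")

user_id = "4810AAAAAAAAAAAA"    # placeholder
print(cpuid(user_id, 2))        # ends in ...AC
print(cpuid(user_id, 4))        # ends in ...AE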

Re: 171.65.103.162 assigning same WU twice

Posted: Wed Apr 08, 2009 10:03 pm
by ppetrone
Hi, and thanks for taking the time to report this.
The problem seems to be solved on 171.65.103.162. Let me forward this message to someone in charge of 171.64.65.106.
Paula.

Re: 171.65.103.162 assigning same WU twice

Posted: Fri Apr 10, 2009 6:21 pm
by amba
ppetrone wrote: Hi, and thanks for taking the time to report this.
The problem seems to be solved on 171.65.103.162.
Paula.
Hm... I do not think so. There are MASSIVE problems with this server, IMHO. Until approximately April 5 or 6, everything was fine with it. Then I received the first duplicate WU from it. I "fixed" it -- that is, I stopped the client, deleted the contents of the "work" dir along with "queue.dat", and then restarted the client. There were a couple of dupe WUs on Monday and Tuesday; I "fixed" them too. But today I got THREE identical WUs on a machine with 8 cores. Pity I didn't take a screenshot before I "fixed" those as well :)
To cut a long story short, below is a screenshot from the monitor of my little farm. Duplicate WUs are in bold red. You may say that all dupes except one were received by different machines, but I had never seen anything like this before.
A little explanation: the second column is the date/time of receipt (MSD, UTC+4; all dates are April), and the fourth column is the WU data.
Gray rows are the summary stats for a machine; the white rows beneath are the stats for each core in that machine.

[Image: screenshot of the farm monitor, duplicate WUs highlighted in red]

Re: 171.65.103.162 assigning same WU twice

Posted: Fri Apr 10, 2009 7:12 pm
by ppetrone
Thank you. This is very useful. Give me a couple of hours and I will try to figure out what's happening.
Paula

Re: 171.65.103.162 assigning same WU twice

Posted: Mon Apr 13, 2009 4:32 pm
by amba
It seems the problem has gone away with this fix. At least for me.
Thanks Paula!