Observation: Duplicate HFM WU History Entries
I've seen a spate lately of duplicate WU history entries - roughly correlated to a spate of upload/download issues that may be my network carrier (Comcast), or maybe due to the Stanford servers, or may be due to some of the characteristics of the current (7.3.6) FAH software.
I'm using HFM software 0.9.1, the latest available.
I detail here one simple instance- only two entries for the completed work unit. I have other examples where there were many times repeated unsuccessful upload attempts. One has 33 HFM WU History entries over the 36 minutes of repeated upload attempts.
I still use the HFM history data via the SQL database for various EXCEL analysis, and thankfully it is easy for Excel to be programmed to eliminate duplicates. But, HFM doesn't seem to be triggering on the "Upload Complete" "final Credit Estimate" lines. This may be due to the fact that these events are entered in the FAH log but are not passed across the Client Socket Interface and official APIs.
When I have upload issues and the upload has to start multiple times, I am getting multiple line items in the HFM log. I can't completely correlate them - but let me give an example from both the HFM logs and the FAH logs. I've shortened what I post here to what I think is the most relevant, but obviously more is available.
This instance is for completed Work Unit project:8090 run:628 clone:0 gen:8 core running in Folding Slot 1 (SMP5) on my computer named DigiStorm. Slots 0 and 2 are two GPUs also folding on this machine.
The FAH log is in GMT; the HFM log seems to be in PDT local time.
FAH shows the following Upload attempts (highly edited log transcript - full transcript below):
-
16:15:18:WU03:FS01:Connecting to 171.67.108.60:8080
- 16:15:50:WU03:FS01:Upload 66.20%
- 16:15:50:WARNING:WU03:FS01:Exception: Failed to send results to work server: Transfer failed
- 16:15:50:WU03:FS01:Trying to send results to collection server
- 16:15:50:WU03:FS01:Uploading 4.15MiB to 171.65.103.160
-
16:15:50:WU03:FS01:Connecting to 171.65.103.160:8080
- 16:16:21:ERROR:WU03:FS01:Exception: Transfer failed
- 16:16:22:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:8090 run:628 clone:0 gen:8 core:0xa4 unit:0x0000000b6652edcc51674aa8dab63eb9
- 16:16:22:WU03:FS01:Uploading 4.15MiB to 171.67.108.60
-
16:16:22:WU03:FS01:Connecting to 171.67.108.60:8080
- 16:17:02:WARNING:WU03:FS01:Exception: Failed to send results to work server: Transfer failed
- 16:17:02:WU03:FS01:Trying to send results to collection server
- 16:17:02:WU03:FS01:Uploading 4.15MiB to 171.65.103.160
-
16:17:02:WU03:FS01:Connecting to 171.65.103.160:8080
- 16:17:52:ERROR:WU03:FS01:Exception: Transfer failed
- 16:17:52:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:8090 run:628 clone:0 gen:8 core:0xa4 unit:0x0000000b6652edcc51674aa8dab63eb9
- 16:17:52:WU03:FS01:Uploading 4.15MiB to 171.67.108.60
-
16:17:52:WU03:FS01:Connecting to 171.67.108.60:8080
- 16:18:32:WU03:FS01:Upload complete
- 16:18:32:WU03:FS01:Server responded WORK_ACK (400)
- 16:18:32:WU03:FS01:Final credit estimate, 7093.00 points
Now, HFM ended up with two WU history entries:
[5/14/2013-09:15:19 AM] - (DigiStorm Slot 01) Writing unit to database: P8090 (R628, C0, G8)
[5/14/2013-09:15:50 AM] - (DigiStorm Slot 01) Writing unit to database: P8090 (R628, C0, G8)
These are entries I copied from the HFM Work Unit History display. Both entries are at 9:15 AM - this log display doesn't show seconds.
8090 p8090 DigiStorm Slot 01 Unknown GRO_A4 2.27 00:06:48 0.75 7,107.8 15,051.9 5/13/2013 09:40 PM 5/14/2013 09:15 AM 100 FINISHED_UNIT 39000 628 0 8
8090 p8090 DigiStorm Slot 01 Unknown GRO_A4 2.27 00:06:48 0.75 7,110.4 15,057.3 5/13/2013 09:40 PM 5/14/2013 09:15 AM 100 FINISHED_UNIT 39000 628 0 8
FAH V7.3.6 log of repeated upload failures until one finally got through. Filtered for time, and to show only Folding Slot 01 - the SMP slot.
Code: Select all
16:15:18:WU03:FS01:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
16:15:18:WU03:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
16:15:18:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:8090 run:628 clone:0 gen:8 core:0xa4 unit:0x0000000b6652edcc51674aa8dab63eb9
16:15:18:WU03:FS01:Uploading 4.15MiB to 171.67.108.60
16:15:18:WU03:FS01:Connecting to 171.67.108.60:8080
16:15:18:WU02:FS01:Starting
16:15:18:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Al/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/beta/Core_a3.fah/FahCore_a3.exe -dir 02 -suffix 01 -version 703 -lifeline 5052 -checkpoint 15 -np 5
16:15:18:WU02:FS01:Started FahCore on PID 4524
16:15:18:WU02:FS01:Core PID:6284
16:15:18:WU02:FS01:FahCore 0xa3 started
16:15:19:WU02:FS01:0xa3:
16:15:19:WU02:FS01:0xa3:*------------------------------*
16:15:19:WU02:FS01:0xa3:Folding@Home Gromacs SMP Core
16:15:19:WU02:FS01:0xa3:Version 2.27 (Dec. 15, 2010)
16:15:19:WU02:FS01:0xa3:
16:15:19:WU02:FS01:0xa3:Preparing to commence simulation
16:15:19:WU02:FS01:0xa3:- Looking at optimizations...
16:15:19:WU02:FS01:0xa3:- Created dyn
16:15:19:WU02:FS01:0xa3:- Files status OK
16:15:19:WU02:FS01:0xa3:- Expanded 1948785 -> 2929924 (decompressed 150.3 percent)
16:15:19:WU02:FS01:0xa3:Called DecompressByteArray: compressed_data_size=1948785 data_size=2929924, decompressed_data_size=2929924 diff=0
16:15:19:WU02:FS01:0xa3:- Digital signature verified
16:15:19:WU02:FS01:0xa3:
16:15:19:WU02:FS01:0xa3:Project: 7513 (Run 0, Clone 23, Gen 85)
16:15:19:WU02:FS01:0xa3:
16:15:19:WU02:FS01:0xa3:Assembly optimizations on if available.
16:15:19:WU02:FS01:0xa3:Entering M.D.
16:15:24:WU03:FS01:Upload 45.13%
16:15:25:WU02:FS01:0xa3:Mapping NT from 5 to 5
16:15:25:WU02:FS01:0xa3:Completed 0 out of 500000 steps (0%)
16:15:30:WU03:FS01:Upload 51.15%
16:15:37:WU00:FS00:0x17:Completed 375000 out of 2500000 steps (15%)
16:15:50:WU03:FS01:Upload 66.20%
16:15:50:WARNING:WU03:FS01:Exception: Failed to send results to work server: Transfer failed
16:15:50:WU03:FS01:Trying to send results to collection server
16:15:50:WU03:FS01:Uploading 4.15MiB to 171.65.103.160
16:15:50:WU03:FS01:Connecting to 171.65.103.160:8080
16:15:58:WU03:FS01:Upload 6.02%
16:16:21:WU03:FS01:Upload 42.12%
16:16:21:ERROR:WU03:FS01:Exception: Transfer failed
16:16:22:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:8090 run:628 clone:0 gen:8 core:0xa4 unit:0x0000000b6652edcc51674aa8dab63eb9
16:16:22:WU03:FS01:Uploading 4.15MiB to 171.67.108.60
16:16:22:WU03:FS01:Connecting to 171.67.108.60:8080
16:16:28:WU03:FS01:Upload 4.51%
16:16:41:WU03:FS01:Upload 9.03%
16:17:02:WU03:FS01:Upload 24.07%
16:17:02:WARNING:WU03:FS01:Exception: Failed to send results to work server: Transfer failed
16:17:02:WU03:FS01:Trying to send results to collection server
16:17:02:WU03:FS01:Uploading 4.15MiB to 171.65.103.160
16:17:02:WU03:FS01:Connecting to 171.65.103.160:8080
16:17:10:WU03:FS01:Upload 1.50%
16:17:21:WU03:FS01:Upload 4.51%
16:17:33:WU03:FS01:Upload 37.61%
16:17:52:WU03:FS01:Upload 39.12%
16:17:52:ERROR:WU03:FS01:Exception: Transfer failed
16:17:52:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:8090 run:628 clone:0 gen:8 core:0xa4 unit:0x0000000b6652edcc51674aa8dab63eb9
16:17:52:WU03:FS01:Uploading 4.15MiB to 171.67.108.60
16:17:52:WU03:FS01:Connecting to 171.67.108.60:8080
16:18:02:WU03:FS01:Upload 1.50%
16:18:22:WU03:FS01:Upload 13.54%
16:18:28:WU03:FS01:Upload 70.71%
16:18:32:WU03:FS01:Upload complete
16:18:32:WU03:FS01:Server responded WORK_ACK (400)
16:18:32:WU03:FS01:Final credit estimate, 7093.00 points
16:18:32:WU03:FS01:Cleaning up
HFM Log - edited to the same time frame as the FAH log, and filtered for only the entries from my computer named DigiStorm (this HFM instance was monitoring four separate local computers running FAH V7)
Code: Select all
[5/14/2013-01:31:15 AM] -
[5/14/2013-01:31:15 AM] - Starting - HFM.NET v0.9.1.595
[5/14/2013-01:31:15 AM] -
[5/14/2013-01:31:28 AM] - SQLite DLL Path: C:\Program Files (x86)\HFM.NET\SQLite\AMD64\System.Data.SQLite.dll
[5/14/2013-09:15:05 AM] - (DigiStorm Slot 01) Slot Status: Running
[5/14/2013-09:15:19 AM] ! (DigiStorm Slot 01) Could not find log section for slot 1. Cannot update frame data for this slot.
[5/14/2013-09:15:19 AM] - (DigiStorm Slot 01) Writing unit to database: P8090 (R628, C0, G8)
[5/14/2013-09:15:20 AM] - (DigiStorm Slot 01) Slot Status: Running
[5/14/2013-09:15:50 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:15:50 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:15:50 AM] - (DigiStorm Slot 01) Writing unit to database: P8090 (R628, C0, G8)
[5/14/2013-09:15:50 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:15:50 AM] - (DigiStorm) Retrieval finished in 114 ms
[5/14/2013-09:16:20 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:16:20 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:16:20 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:16:20 AM] - (DigiStorm) Retrieval finished in 61 ms
[5/14/2013-09:16:50 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:16:50 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:16:50 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:16:50 AM] - (DigiStorm) Retrieval finished in 59 ms
[5/14/2013-09:17:20 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:17:20 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:17:20 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:17:20 AM] - (DigiStorm) Retrieval finished in 85 ms
[5/14/2013-09:17:50 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:17:50 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:17:50 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:17:50 AM] - (DigiStorm) Retrieval finished in 80 ms
[5/14/2013-09:18:20 AM] - (DigiStorm Slot 00) Slot Status: Running
[5/14/2013-09:18:20 AM] - (DigiStorm Slot 02) Slot Status: Running
[5/14/2013-09:18:20 AM] - (DigiStorm Slot 01) Slot Status: RunningNoFrameTimes
[5/14/2013-09:18:20 AM] - (DigiStorm) Retrieval finished in 61 ms