Daily User Summary file statistics


KneeDeep
Posts: 8
Joined: Tue Oct 13, 2009 3:20 pm


Post by KneeDeep

Curiosity led me to download the DUS (Daily User Summary) files and compare how they grew/changed between 11/20 and 11/23. I wrote a Perl script to compare their data; the script may have a few bugs, but most of the results are probably correct.
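The Perl script itself isn't shown, so here is a rough Python sketch of the first step: reading one DUS file into records. The field order (name, score, work units, team, separated by TABs) is an assumption based on the fields mentioned in this post, and `DusRecord`, `parse_dus_line` and `parse_dus` are hypothetical names. Non-numeric fields become `None` rather than raising, so oddities like "NULL" scores survive parsing.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class DusRecord(NamedTuple):
    """One account line, assuming the field order name, score, WUs, team."""
    name: str
    score: Optional[int]  # None if the field is empty or not an integer (e.g. "NULL")
    wus: Optional[int]
    team: Optional[int]  # None if the trailing field is missing


def _to_int(field: str) -> Optional[int]:
    """Convert a field to int, returning None instead of raising on bad input."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_dus_line(line: str) -> Optional[DusRecord]:
    """Split one line on TABs; return None unless it has exactly four fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        return None
    name, score, wus, team = fields
    return DusRecord(name, _to_int(score), _to_int(wus), _to_int(team))


def parse_dus(lines: Iterable[str]) -> Iterator[DusRecord]:
    """Yield records, skipping lines that do not split into four fields."""
    for line in lines:
        record = parse_dus_line(line)
        if record is not None:
            yield record
```

Usage would be along the lines of `with open(path) as f: records = list(parse_dus(f))`, with `path` pointing at a downloaded copy. The sketch doesn't try to recognise header lines; if the real file starts with any, skip them before parsing.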

I was surprised (and disappointed) to find that only 3.5% of the accounts were active (accumulating new score) over those three days. If ~1.3M accounts are inactive, that seems like a lot of overhead to drag along. Is it time to trim the database? Say, drop accounts with no activity in the last 30 days (or choose your own number)?
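One way to compute an "active" percentage like this is to compare two snapshots and count the accounts whose score went up. A minimal sketch, assuming each snapshot has already been reduced to a dict from a hypothetical (name, team) key to an integer score; because Name+Team isn't unique (see the duplicates noted further down), colliding keys make the result approximate.

```python
from typing import Dict, Hashable


def active_fraction(old: Dict[Hashable, int], new: Dict[Hashable, int]) -> float:
    """Share of accounts in the later snapshot whose score increased.

    Keys are a hypothetical account identifier such as (name, team).
    An account missing from `old` is treated as starting from zero, so a
    brand-new account counts as active once it has any score.
    """
    if not new:
        return 0.0
    active = sum(1 for key, score in new.items() if score > old.get(key, 0))
    return active / len(new)
```

Treating missing keys as zero is a design choice: it folds brand-new accounts into the active count instead of reporting them separately.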

There were nearly 200 accounts with a null Name (the first character of the line was the TAB field separator), and two of these were accumulating WUs!

There were 100+ accounts with the character string "NULL" in the Score field, and all of them had Team number 2**32-1 (4294967295). Perhaps this is a transient data state?

There were 100+ accounts missing a field: they have three fields of digits and an empty last field, i.e. the last character of the line is the TAB separator (e.g. "0<TAB>19<TAB>38296<TAB>").
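The three oddities above (an empty name, a literal "NULL" score paired with team 2**32-1, and an empty last field) can be tallied straight from the raw lines, before numeric parsing hides them. A sketch, assuming the same four-field TAB-separated layout with the score in the second position; the label strings are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List

SENTINEL_TEAM = str(2**32 - 1)  # "4294967295", the team seen alongside "NULL" scores


def anomalies(line: str) -> List[str]:
    """Return illustrative labels for the oddities found in one raw line."""
    fields = line.rstrip("\r\n").split("\t")
    found = []
    if len(fields) != 4:
        found.append("wrong field count")
    if fields[0] == "":
        found.append("null name")  # line starts with the TAB separator
    if len(fields) > 1 and fields[1] == "NULL":
        label = "NULL score, sentinel team" if fields[-1] == SENTINEL_TEAM else "NULL score"
        found.append(label)
    if len(fields) > 1 and fields[-1] == "":
        found.append("missing last field")  # line ends with the TAB separator
    return found


def tally(lines: Iterable[str]) -> Counter:
    """Count each anomaly label across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(anomalies(line))
    return counts
```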

There were 72k+ accounts with a score of zero. Perhaps a large fraction of these users walked away right after installation... should those older than a certain age be purged?
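The DUS file carries no dates, so an age-based purge rule would have to run against the master database. The sketch below assumes a hypothetical `Account` row with a `created` date; neither the type nor its fields come from the real stats schema. Swapping `created` for a last-activity date would express the "no activity in N days" rule suggested earlier.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Account:
    """Hypothetical master-database row; the DUS file itself has no dates."""
    name: str
    team: int
    score: int
    created: date


def zero_score_purge_candidates(
    accounts: List[Account], today: date, min_age_days: int
) -> List[Account]:
    """Accounts that never earned any score and are at least `min_age_days` old."""
    cutoff = today - timedelta(days=min_age_days)
    return [a for a in accounts if a.score == 0 and a.created <= cutoff]
```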

There were 1.3k+ new accounts created over those three days (i.e., their entries were missing from the first file). That doesn't square very well with the number of active users unless there is a VERY high rate of attrition among new "Folders".
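Finding new accounts is a set difference between the two snapshots' keys. Counting how many of them already show a nonzero score helps check whether new accounts account for much of the active total. A sketch, assuming dicts keyed by a hypothetical (name, team) pair, so duplicated pairs collapse into one entry.

```python
from typing import Dict, Hashable, Tuple


def new_account_summary(
    old: Dict[Hashable, int], new: Dict[Hashable, int]
) -> Tuple[int, int]:
    """Return (keys only in `new`, how many of those already have a nonzero score)."""
    added = new.keys() - old.keys()  # set difference of the two key views
    scoring = sum(1 for key in added if new[key] > 0)
    return len(added), scoring
```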

There are 12k+ duplicated Name+Team entries. [These make it intrinsically hard to determine an account's activity from the DUS file alone, but my script handled most instances properly.] The vast majority had only two matching duplicates, and many had very specific names: I'm guessing those users re-installed the software or put it on a second computer in a way that created a "new" identity rather than joining the existing one. I expect that one or both accounts in each pair has had no recent activity. If there's adequate date info in the master DB, perhaps these accounts could be joined? [Nah...]
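Counting duplicated Name+Team pairs, and ranking them like the "PS3" case mentioned below, only needs a counter over the key. A sketch, assuming records are (name, score, wus, team) tuples in the assumed field order; `duplicated_pairs` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicated_pairs(
    records: Iterable[Tuple[str, int, int, int]]
) -> List[Tuple[Tuple[str, int], int]]:
    """Return ((name, team), count) for every pair seen more than once, most frequent first."""
    counts = Counter((name, team) for name, _score, _wus, team in records)
    return [(pair, n) for pair, n in counts.most_common() if n > 1]
```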

Out of curiosity: since you allow duplicate Name+Team accounts, how do you map some pairs together and not others? Why are my two computers lumped together in their results instead of being shown as two separate accounts? Is it my use of a common passkey? I'm NOT complaining -- I want them combined -- I just wonder what controls the merging.

My compliments to those with the wackiest account names: nice to see some imagination! At the other extreme, "PS3" [366] and null ("") [116] far outnumbered every other name among the duplicates, with only a few other names reaching double digits.

So much for the DUS -- off to catch "House"....
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Daily User Summary file statistics

Post by bruce

KneeDeep wrote: Out of curiosity: since you allow duplicate Name+Team accounts, how do you map some pairs together and not others? Why are my two computers lumped together in their results instead of being shown as two separate accounts? Is it my use of a common passkey? I'm NOT complaining -- I want them combined -- I just wonder what controls the merging.
Nope, there is no merging in the official stats (though some 3rd party stats sites do). If you enter your email account as your FAH name, the ISP portion (the @ and what follows it) will not be displayed as an anti-spam feature. The full name is still used internally. Thus the following three accounts will appear to be duplicates.

Joe@IPS1.com
Joe@IPS2.com
Joe
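The masking described here can be sketched as a simple string rule. The exact server-side logic isn't given in the thread, so `display_name` below is a hypothetical stand-in that cuts at the first "@". Grouping by the masked name shows how three distinct internal names end up looking like duplicates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def display_name(internal_name: str) -> str:
    """Drop the first '@' and everything after it (assumed masking rule)."""
    return internal_name.split("@", 1)[0]


def colliding_display_names(internal_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Group distinct internal names that would be shown identically."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in internal_names:
        groups[display_name(name)].add(name)
    return {shown: full for shown, full in groups.items() if len(full) > 1}
```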
KneeDeep

Re: Daily User Summary file statistics

Post by KneeDeep

bruce wrote: Nope, there is no merging in the official stats (though some 3rd party stats sites do). If you enter your email account as your FAH name, the ISP portion (the @ and what follows it) will not be displayed as an anti-spam feature. The full name is still used internally. Thus the following three accounts will appear to be duplicates.

Joe@IPS1.com
Joe@IPS2.com
Joe
So... you're saying the bond that unites accounts, but isn't carried into the DUS, is the "local-name" ["Joe"] of the affiliated email account? And... that some/many of the duplicates arise when folks re-create their F@h configuration using a different email account?

Something about that troubles me... and perhaps it's just my deplorable memory... but I don't recall any temptation to re-identify my email account as I added F@h to the second computer, or as I re-installed it on my original computer [which I did MANY times as I found SMP/beta unreliable]. No implied "challenge" here... I'm just a bit confused... which describes most of my life, anyway.

Whatever... I don't suppose my observations will change much. I was interested enough in the oddities that I've just wasted a few electrons sharing them! :)
bruce

Re: Daily User Summary file statistics

Post by bruce

No. What I'm saying is that there are probably several people who have chosen to use the name "Joe" and they can only be differentiated because they have different email accounts.