Daily User Summary file statistics
Posted: Tue Nov 24, 2009 1:03 am
Curiosity led me to download and look into the DUS file... then to compare two snapshots for growth/change -- 11/20 vs 11/23. I wrote a Perl script to compare their data; it may have some bugs, but most of the results are probably correct.
I was surprised (disappointed) to find only 3.5% of the accounts were Active (creating new Scores) over three days. If ~1.3M accounts are inactive, that seems like a large overhead to drag along. Is it time/appropriate to reduce the database? Say, drop accounts with no activity in [30... or choose your own number] days?
There were nearly 200 accounts with a null Name -- the first character of the line was the TAB field separator -- and two of these were accumulating WUs!
There were 100+ with the character string "NULL" in the Score field -- and all had Team number = 2**32-1. Perhaps this is a transient data state?
There were 100+ accounts missing a field: they have three fields of digits and a missing last field -- the last character is the TAB separator. [e.g., "0<TAB>19<TAB>38296<TAB>"].
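For anyone who wants to repeat these checks: the following is not the Perl script mentioned above, just a minimal Python sketch of how such records could be flagged. It assumes four TAB-separated fields per line in the order Name, Score, WU, Team; that column order and the label names are my assumptions, not something taken from the file's documentation.

```python
from collections import Counter

MAX_UINT32 = 2**32 - 1  # Team value reported alongside the "NULL" scores


def classify(line):
    """Return a list of anomaly labels for one TAB-separated record."""
    # Strip only the line ending; a trailing TAB must survive so that an
    # empty last field is still visible after the split.
    fields = line.rstrip("\r\n").split("\t")
    # Assumed column order: name, score, work units, team.
    if len(fields) != 4:
        return ["wrong_field_count"]
    name, score, _wu, team = fields
    labels = []
    if name == "":
        labels.append("null_name")
    if score == "NULL":
        labels.append("null_score")
    elif score == "0":
        labels.append("zero_score")
    if team == "":
        labels.append("missing_last_field")
    elif team == str(MAX_UINT32):
        labels.append("team_is_max_uint32")
    return labels


def tally(lines):
    """Count how often each anomaly label occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(classify(line))
    return counts
```

Calling `tally(open(path))` on a snapshot (with `path` being wherever you saved it) would give one count per label, comparable to the figures quoted above.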
There were 72k+ accounts with a Score of Zero. Perhaps a large fraction of these walked away after the installation... should those over a certain age be purged?
There were 1.3k+ new accounts created over those three days (i.e., their entries were missing from the first file). That doesn't square very well with the number of Active users unless there is a VERY high rate of attrition amongst new "Folders".
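The "new" and "Active" counts come from comparing the two snapshots. A rough Python sketch of that comparison (again, not the original Perl) is below. It assumes the column order Name, Score, WU, Team, keys each account by Name+Team, skips records whose Score is not a plain number, and sums the Scores of duplicate Name+Team entries, which is only one of several ways the duplicates could be handled.

```python
def load_totals(lines):
    """Map (name, team) -> summed score, skipping malformed records."""
    totals = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumed column order: name, score, work units, team.
        if len(fields) != 4 or not fields[1].isdigit():
            continue  # e.g. "NULL" scores or a wrong field count
        name, score, _wu, team = fields
        key = (name, team)
        # Duplicate Name+Team entries are merged by summing their scores.
        totals[key] = totals.get(key, 0) + int(score)
    return totals


def compare(old_lines, new_lines):
    """Return (created, active) lists of (name, team) keys."""
    old = load_totals(old_lines)
    new = load_totals(new_lines)
    # "Created": present in the later snapshot but not the earlier one.
    created = [k for k in new if k not in old]
    # "Active": present in both, with a higher score in the later one.
    active = [k for k in new if k in old and new[k] > old[k]]
    return created, active
```

Dividing `len(active)` by the number of keys in the later snapshot gives an activity fraction of the kind quoted earlier, though the exact figure depends on how malformed and duplicate records are treated.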
There are 12k+ duplicated Name+Team entries. [These raise intrinsic problems in attempting to ascertain the Activity of accounts from the DUS file alone, but my script accounted properly in most instances.] The vast majority of those had only 2 matching dups, and many had very specific names: I'm guessing they re-installed the s/w or put it on a second computer in some way that created a "new" identity rather than a "join" with the existing one. I expect one or both of the accounts have had no recent activity. If there's adequate date info in the master DB, perhaps these accounts could be joined? [Nah....]
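Counting the duplicates is the simplest of these checks. A minimal Python sketch, assuming the same four-field layout (Name first, Team last) and ignoring lines with any other field count:

```python
from collections import Counter


def duplicate_keys(lines):
    """Return {(name, team): count} for every pair seen more than once."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # skip records with a wrong field count
        # Assumed layout: name is the first field, team the last.
        counts[(fields[0], fields[3])] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Sorting the result by count, e.g. `sorted(dups.items(), key=lambda kv: -kv[1])`, puts the most heavily duplicated Name+Team pairs first.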
Out of curiosity: since you allow duplicate Name+Team accounts, how do you map some pairs together and not others? Why are my two computers lumped together in their results instead of being shown as two separate accounts? Is it my use of a common passkey? I'm NOT complaining -- I want them combined -- I just wonder what controls the merging.
My compliments to those with the wackiest account names: nice to see some imagination! In the other direction, "PS3" [366] and null ("") [116] far exceeded all other Names in the number of dups, with only a few other names breaking into the double digits.
So much for the DUS -- off to catch "House"....