Donor & Team lists issues

Posted: Tue Dec 02, 2008 6:19 pm
by pafka
Downloading the daily_user_summary.txt and daily_team_summary.txt files is OK.

Issues:

1. Importing it has revealed that non-alphanumeric ( even non-ASCII ) names exist, which does not match the site requirement?
- probably not really an issue, since I can still store those as binary;
- but it would still be a good hint to know how F@H handles these names.

2. There are non existent teams ( e.g. 1565204469 ), which confronts "Default (includes all those WU returned without valid team number) (0)"?!

3. There are a number of repeating records for a given name and team ( e.g. andy from 0 team ), which implies a few issues.
- F@H site does not show more than one record, which obviously can not be true?!
- if F@H site has any id for those users that would probably provide a little help!
- it is also obvious there is no way for an end user to distinguish which of the 8 records corresponds to which user in the previous list - I WOULD appreciate any help from the F@H site on that one!
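
A minimal sketch of how the repeated records from issue 3 could be found after import. It assumes each data row is tab-separated as name, credit, work units, team; that layout and the function name `find_repeated_records` are assumptions for illustration, not a documented format.

```python
from collections import defaultdict

def find_repeated_records(lines):
    """Group rows by (name, team) and return the groups with more than one row.

    Assumes each data row is tab-separated as: name, credit, work units, team.
    That layout is an assumption for this sketch, not a documented format.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank lines and rows that do not have four fields
        name, credit, wus, team = fields
        if not (credit.isdigit() and wus.isdigit() and team.isdigit()):
            continue  # skip a column-header row or other non-numeric rows
        groups[(name, team)].append((int(credit), int(wus)))
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

# Hypothetical sample rows, not real export data.
sample = [
    "andy\t100\t3\t0\n",
    "andy\t50\t1\t0\n",
    "bob\t10\t1\t42\n",
]
repeated = find_repeated_records(sample)
```

When reading a real file, opening it with `encoding="latin-1"` avoids decode errors on non-ASCII names, because every byte value maps to a character in that encoding.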

Re: Donor & Team lists issues

Posted: Tue Dec 02, 2008 6:40 pm
by codysluder
Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.

Regarding items 1 and 2: Even though new information is probably checked for conformance with the current rules, that was not always true. The fundamental statement that data will not be adjusted after the results are uploaded applies to everything, even data that was improperly accepted before the incoming data was checked for conformance.

Re: Donor & Team lists issues

Posted: Tue Dec 02, 2008 8:27 pm
by pafka
codysluder wrote:Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.
Well, I am familiar with the way emails are treated when used as name.

Still, that does not answer the question how F@H are officially treating those names, and how these are used to display user stats!

So, there might be an Andy wishing to get his certificate for let me say 1234 WUs he has truly done, but he only receives the other andy's certificate which states 1 WU.
Of course, you may check the reverse situation.

Re: Donor & Team lists issues

Posted: Tue Dec 02, 2008 8:35 pm
by codysluder
To get a certificate, you have to use the official Stanford stats. If andy enters his name as andy@domain1.com, the stats will know exactly who he is and give him the correct certificate. This cannot be done on a 3rd party stats site that depends on the donor and team lists as the source for their information.

Re: Donor & Team lists issues

Posted: Thu Dec 04, 2008 1:05 am
by pafka
codysluder wrote:To get a certificate, you have to use the official Stanford stats. If andy enters his name as andy@domain1.com, the stats will know exactly who he is and give him the correct certificate. This cannot be done on a 3rd party stats site that depends on the donor and team lists as the source for their information.
That does not look like a problem solved situation - it only looks like you do understand the issue. I would appreciate it if you stick to a solution next time.
Well, I'd expect we'd soon agree that F@H should put some IDs on those users, wouldn't you?
We also know that it would not be so smart to expect F@H to provide the emails as IDs ... at least I do not.
Any integer or hash value, etc. would do just fine.

Re: Donor & Team lists issues

Posted: Thu Dec 04, 2008 2:04 am
by anandhanju
If you look at the MyFolding file (which is the correct place to see a person's stats), you'll see that a donor's page is determined by the username AND the team. If Andy from Team FF wished to see his stats, he'd access it via this link. If you search for donors by name and see the page for Andy, you'll see the rolled up credits for all users with that id. FAH has no way of determining if it was one andy who worked on all these or if they were different donors.

The passkey functionality in the new clients has been introduced keeping this enhancement in mind. There is no estimate when this will be enabled on the server side.

For 1), names can contain special characters. The non alphanumeric and non ASCII characters are represented as URL safe Unicode. E.g. See this URL.

For 2), I'm not sure what you mean by "confronts" but I think these "non-existent" teams are GAH teams that were transitioned over to FAH but were abandoned when the team numbering was restarted.
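
As an illustration of the point about names in 1), here is a minimal sketch of percent-encoding a non-ASCII name so it is safe in a URL, and decoding it again. It assumes UTF-8 percent-encoding; the post only says "URL safe Unicode", so the exact scheme FAH uses is not confirmed here, and the name is hypothetical.

```python
from urllib.parse import quote, unquote

name = "José"  # hypothetical donor name with one non-ASCII character

# quote() encodes the name as UTF-8 and escapes each non-safe byte as %XX.
encoded = quote(name, safe="")

# unquote() reverses the escaping and decodes the bytes as UTF-8.
decoded = unquote(encoded)
```

A third-party importer could store the decoded form and only apply `quote` when building links.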

Re: Donor & Team lists issues

Posted: Thu Dec 04, 2008 11:16 am
by pafka
anandhanju wrote:If you look at the MyFolding file (which is the correct place to see a person's stats), you'll see that a donor's page is determined by the username AND the team. If Andy from Team FF wished to see his stats, he'd access it via this link. If you search for donors by name and see the page for Andy, you'll see the rolled up credits for all users with that id. FAH has no way of determining if it was one andy who worked on all these or if they were different donors.

The passkey functionality in the new clients has been introduced keeping this enhancement in mind. There is no estimate when this will be enabled on the server side.

For 1), names can contain special characters. The non alphanumeric and non ASCII characters are represented as URL safe Unicode. E.g. See this URL.

For 2), I'm not sure what you mean by "confronts" but I think these "non-existent" teams are GAH teams that were transitioned over to FAH but were abandoned when the team numbering was restarted.
You are so wrong about that!
- F@H would not show you rolled-up credits for all the "andy"s!
- F@H DOES have a way to determine and distinguish users using emails! ( One can check that on the user stats pages. I hope I'm not revealing a sort of an inside secret here. )

F@H uses some sort of encryption of the user names, including the email part. Well, those encrypted emails look like a win win situation at least until the passkey starts to work ( which in fact may not happen ).
Well, I am too old to reverse engineer the encryption used, just to find out it's a one-way encryption of some sort ( which, at first glance, seems not to be the case ).
Frankly, I am confident it'd be better to ask for a third "daily_user_summary.txt" file with the encrypted names included, instead of scanning the site for these.

And again:
for 1) I have a solution for that ... forget about it, or consider it just as a note.
for 2) "confront" should be self explaining when you look at the description of the Default team, which is expected to collect all those WUs returned with an invalid team number.

PS: I do hope a member of F@H's development team would finally look here and provide a useful hint/information, even in private messaging, because I do want to remove those thoughts about reverse engineering and parsing the F@H site from my head!
Thanks.

Re: Donor & Team lists issues

Posted: Fri Dec 05, 2008 2:51 pm
by VijayPande
pafka wrote:
codysluder wrote:Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.
Well, I am familiar with the way emails are treated when used as name.

Still, that does not answer the question how F@H are officially treating those names, and how these are used to display user stats!

So, there might be an Andy wishing to get his certificate for let me say 1234 WUs he has truly done, but he only receives the other andy's certificate which states 1 WU.
Of course, you may check the reverse situation.
When a donor puts their full donorname (including email) we parse it on the other side and make sure it goes to the right account, even if it's not in the text pages. The stats are more complex than what's in the text pages, since we must hide emails in the text pages.

Re: Donor & Team lists issues

Posted: Fri Dec 05, 2008 2:56 pm
by MtM
pafka wrote:PS: I do hope a member of F@H's development team would finally look here and provide a useful hint/information, even in private messaging, because I do want to remove those thoughts about reverse engineering and parsing the F@H site from my head!
Thanks.
Which isn't allowed :?: ;)

Re: Donor & Team lists issues

Posted: Sat Dec 06, 2008 10:16 pm
by codysluder
VijayPande wrote:When a donor puts their full donorname (including email) we parse it on the other side and make sure it goes to the right account, even if it's not in the text pages. The stats are more complex than what's in the text pages, since we must hide emails in the text pages.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form). Microsoft figured out how to do that with the old DOS 8.3 filenames.

Re: Donor & Team lists issues

Posted: Sun Dec 07, 2008 12:14 am
by pafka
codysluder wrote:
VijayPande wrote:When a donor puts their full donorname (including email) we parse it on the other side and make sure it goes to the right account, even if it's not in the text pages. The stats are more complex than what's in the text pages, since we must hide emails in the text pages.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form). Microsoft figured out how to do that with the old DOS 8.3 filenames.
Well, I wouldn't come so far if that was an option.
You see - it looks like F@H provide the exports ordered by credit ( descending ), which you have to agree ruins the idea in time.
Yes, that would be an option if lists come ordered by time a name has been first seen.

I am so sorry we're still missing a valuable opinion from F@H member.

Re: Donor & Team lists issues

Posted: Mon Dec 15, 2008 12:58 am
by VijayPande
pafka wrote: I am so sorry we're still missing a valuable opinion from F@H member.
Sorry, with literally hundreds of threads, it can take a while before we get to all of them, especially if we have recently replied to a given thread.

We'll look into this, but with all that we have going on right now (in particular shoring up GPU2 and SMP/SMP2), this may have to wait a bit.

Re: Donor & Team lists issues

Posted: Mon Dec 15, 2008 1:01 am
by VijayPande
codysluder wrote:
VijayPande wrote:When a donor puts their full donorname (including email) we parse it on the other side and make sure it goes to the right account, even if it's not in the text pages. The stats are more complex than what's in the text pages, since we must hide emails in the text pages.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form). Microsoft figured out how to do that with the old DOS 8.3 filenames.
The tricky part here is that the keys have to be consistent from update to update. Let's say andy@domainA.com has 100 points and andy@domainB.com has 50 in one update and we list it as
andy@1 100
andy@2 50
based on ordering by highest to lowest. However, there's no guarantee that ordering will hold. It gets even more complex when andy@domainC.com comes in. I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
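
The instability described above can be sketched in a few lines. The function `rank_keys` is a hypothetical illustration of keys derived purely from credit ranking; the credit values in the second update are made up.

```python
def rank_keys(credits_by_donor):
    """Label donors andy@1, andy@2, ... by descending credit (hypothetical scheme)."""
    ranked = sorted(credits_by_donor.items(), key=lambda item: item[1], reverse=True)
    return {donor: f"andy@{i}" for i, (donor, _) in enumerate(ranked, start=1)}

# First update: domainA is ahead, so it is labeled andy@1.
update_1 = {"andy@domainA.com": 100, "andy@domainB.com": 50}
# Later update: domainB has overtaken domainA.
update_2 = {"andy@domainA.com": 100, "andy@domainB.com": 123}

keys_1 = rank_keys(update_1)
keys_2 = rank_keys(update_2)
```

Here `andy@domainA.com` is `andy@1` in the first update and `andy@2` in the second, so a consumer tracking `andy@1` would silently switch donors.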

Re: Donor & Team lists issues

Posted: Mon Dec 15, 2008 2:10 pm
by pafka
VijayPande wrote:
codysluder wrote:
VijayPande wrote:When a donor puts their full donorname (including email) we parse it on the other side and make sure it goes to the right account, even if it's not in the text pages. The stats are more complex than what's in the text pages, since we must hide emails in the text pages.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form). Microsoft figured out how to do that with the old DOS 8.3 filenames.
The tricky part here is that the keys have to be consistent from update to update. Let's say andy@domainA.com has 100 points and andy@domainB.com has 50 in one update and we list it as
andy@1 100
andy@2 50
based on ordering by highest to lowest. However, there's no guarantee that ordering will hold. It gets even more complex when andy@domainC.com comes in. I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
There is a solution w/o storing additional information.

Currently ( if ordered by credit ):
  • andy@1 100 3 0
    andy@2 50 1 0
could become in the next update:
  • andy@2 123 4 0
    andy@1 100 3 0
and seen in the donor list like:
  • andy 123 4 0
    andy 100 3 0
and the swap is impossible to detect.


But, if the list is ordered by the moment ( unix time, datetime, timestamp, etc... ) a user has been first spotted ( e.g. reported his first unit ) by the system we'd have:
  • andy@1 100 3 0 ( 2008-11-23 01:23:45 )
    andy@2 50 1 0 ( 2008-12-01 10:45:23 )
and in the next update the order would look like:
  • andy@1 100 3 0 ( 2008-11-23 01:23:45 )
    andy@2 123 4 0 ( 2008-12-01 10:45:23 )
    andy@N 12 1 0 ( 2008-12-02 10:45:23 )
and the donor list would look like:
  • andy 100 3 0
    andy 123 4 0
    andy 12 1 0
an order which one could use to auto-assign IDs to the users in a database, e.g.:
  • 21 andy 100 3 0
    22 andy 123 4 0
    23 andy 12 1 0
and we can see a swap can not occur, but the information needed exists and the file's structure has been preserved.
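
The first-seen idea above can be sketched on the third-party side as follows. The sketch assumes the export keeps donors with the same name and team in first-seen order and never drops one; the function `assign_ids` and the `known_ids` table are hypothetical names, and the rows mirror the example above.

```python
from collections import defaultdict
from itertools import count

def assign_ids(rows, known_ids, next_id):
    """Attach a stable local ID to each row by its position among same-named rows.

    rows: list of (name, credit, wus, team) in export order.
    known_ids: dict mapping (name, team, occurrence) -> local ID, kept between updates.
    next_id: iterator yielding fresh IDs.
    Assumes same-named donors always appear in first-seen order and none is removed.
    """
    seen = defaultdict(int)
    result = []
    for name, credit, wus, team in rows:
        occurrence = seen[(name, team)]  # 0 for the first "andy", 1 for the second, ...
        seen[(name, team)] += 1
        key = (name, team, occurrence)
        if key not in known_ids:
            known_ids[key] = next(next_id)  # a newly seen donor gets a fresh ID
        result.append((known_ids[key], name, credit, wus, team))
    return result

known_ids = {}
next_id = count(21)  # 21 matches the starting ID used in the example above

first = assign_ids([("andy", 100, 3, 0), ("andy", 50, 1, 0)], known_ids, next_id)
second = assign_ids(
    [("andy", 100, 3, 0), ("andy", 123, 4, 0), ("andy", 12, 1, 0)],
    known_ids,
    next_id,
)
```

The ID table lives entirely with the third-party stats site, so the donor list itself needs no extra column.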


Those IDs are not supposed to come from F@H system, but are my responsibility to add and follow.
That would allow third party stats to show all users' stats, and users of those sites could locate themselves and keep track of their personal stats.

That is all possible without changing the structure of the donor list.

PS: Of course, a case exists, where F@H would take care of some sort of IDs.
That would require additional column and could be provided e.g. in another file.
But I have already given up demanding such a file and that is really out of my scope right now.
Would probably get back to that when ( if ever ) the password functionality becomes mandatory.

Re: Donor & Team lists issues

Posted: Mon Dec 15, 2008 8:12 pm
by codysluder
VijayPande wrote:I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
Well, one way to do it is to store the state information for your andy example as
domainA.com=1
domainB.com=2
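
A minimal sketch of that stored state, assuming a per-name table of domain to index that persists between updates. The function `stable_key` and the shape of `state` are hypothetical illustrations, not how the stats server works.

```python
def stable_key(donor, state):
    """Return name@N for a full donor name, allocating N once and remembering it.

    state maps name -> {domain: index}; it is the extra state that must persist
    between updates. The structure is a hypothetical illustration.
    """
    name, _, domain = donor.partition("@")
    domains = state.setdefault(name, {})
    if domain not in domains:
        domains[domain] = len(domains) + 1  # next unused index for this name
    return f"{name}@{domains[domain]}"

state = {}
a = stable_key("andy@domainA.com", state)
b = stable_key("andy@domainB.com", state)
c = stable_key("andy@domainC.com", state)
a_again = stable_key("andy@domainA.com", state)
```

Because the index is allocated on first sight and never recomputed, a later change in credit ordering does not change any key.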

There's another possibility: Don't you already have a way to store that information? Suppose each of the donors has a passkey, whether they've ever used it or not. If andy@domainA.com has never used a passkey, create a dummy passkey for him. If andy@domainB.com has used their passkey, then you can use it. I'm sure there's a reasonably direct way to convert all these passkeys into a series of integers. I'd think that would be a lot better than whatever method you presently use for the stats, but even that would be a third choice.
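
One way such a conversion could look, as a sketch only: derive an integer from each passkey with a keyed hash, so the published number is stable but does not reveal the passkey. The function `opaque_id`, the secret, and the passkey strings are all hypothetical; nothing here reflects how passkeys are actually handled.

```python
import hashlib
import hmac

def opaque_id(passkey: str, secret: bytes) -> int:
    """Derive a stable 64-bit integer from a passkey without exposing it."""
    digest = hmac.new(secret, passkey.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")  # keep the first 8 bytes as the ID

secret = b"server-side secret"  # hypothetical; it would never be published
id_a = opaque_id("passkey-for-andy-at-domainA", secret)
id_b = opaque_id("passkey-for-andy-at-domainB", secret)
```

The same passkey always maps to the same integer, so the ID stays consistent from update to update without any ordering assumption.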