From: walt01 on
Hi,

I would like to ask your help in coming up with an answer to the below, which appears in a way an inverse birthday problem... An answer so far has been elusive, and been the subject of many a coffee corner speculation...

We have a system with a population of (N) 4000000 users, with each user having a self generated/assigned id (this id is actually 48 bits, but it likely doesn't matter).
Each user is assigned (randomly) into one of (D) 6000 communities/servers.
In the above system we're picking up (A) 500 duplicate id alarms at any given time, whereby a duplicate is two users or more assigned to the same community who use the same id. (in the physical system we're able to detect the dupes)

What is the approximate number of duplicate id's in the system overall, and/or the probability of users assigning a duplicate?

Thanks,
Walt.
From: Tim Little on
On 2010-06-23, walt01 <wodec(a)hotmail.com> wrote:
> Each user is assigned (randomly) into one of (D) 6000
> communities/servers. In the above system we're picking up (A) 500
> duplicate id alarms at any given time, whereby a duplicate is two
> users or more assigned to the same community who use the same
> id. (in the physical system we're able to detect the dupes)
>
> What is the approximate number of duplicate id's in the system
> overall, and/or the probability of users assigning a duplicate?

At one extreme, there may be just a single id picked by about 2000
users, with no other collisions present. In this case you register a
collision whenever two such users are assigned to the same group.

At the other extreme about 3/4 of your users have an id duplicated
with one other user each. Then you see a collision when one of the
500-ish paired users per community happens to be in the same community
as their id-mate.


The actual truth is likely to be somewhere in between. Is it not
possible to measure these statistics rather than to try extrapolating
them from a single data point?


- Tim