assign a unique random integer to each unique id [SAS]


From: Gary Klein on 16 Jan 2010 17:24

You may also want to try randomly permuting your subjects, then numbering them 1 to your sample size. Then each one will have an integer in random order without duplicates.

For a random permutation, you could either use proc surveyselect to draw a simple random sample without replacement, or use proc plan:

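data oldids;
do oldid = 1 to 4;                 * toy input: subjects 1-4 ;
output;
end;
run;

proc plan;
factors oldid=4 random /noprint;   * a random ordering of 1-4 ;
output out=temp data=oldids;       * permute the OLDID values of OLDIDS into TEMP ;
run;
quit;

data newids;
set temp;
newid = _n_;                       * number the permuted rows 1, 2, ..., N ;
run;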
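
The same permute-then-number idea can also be sketched without proc plan or proc surveyselect, by attaching a uniform random sort key to each row and sorting on it. This is a swapped-in alternative, not code from the thread; the helper variable SORTKEY, the output name NEWIDS2 and the seed 20100116 are arbitrary, hypothetical choices, and the toy OLDIDS step is repeated so the block runs on its own.

```sas
/* Toy input, same as the OLDIDS step above: four subjects. */
data oldids;
  do oldid = 1 to 4;
    output;
  end;
run;

/* Attach a uniform random sort key to every subject.           */
/* SORTKEY and the seed 20100116 are hypothetical choices.      */
data keyed;
  set oldids;
  sortkey = ranuni(20100116);
run;

/* Sorting on the random key puts the subjects in random order. */
proc sort data=keyed;
  by sortkey;
run;

/* Number the shuffled rows 1..N; _N_ keeps NEWID unique even if two keys tie. */
data newids2;
  set keyed;
  newid = _n_;
  drop sortkey;
run;
```

Because NEWID comes from _N_ rather than from the random key itself, uniqueness does not depend on the random numbers being distinct; the random key only decides the order.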

--- On Sat, 1/16/10, Ai Hua Wang <aihuawang(a)YAHOO.COM> wrote:

> From: Ai Hua Wang <aihuawang(a)YAHOO.COM>
> Subject: Re: assign a unique random integer to each unique id
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Saturday, January 16, 2010, 1:30 PM
> Hi Dan:
>
> Thank you very much for your thoughtful follow-up. Please see my answers
> below.
>
> Why does your multiplier need to be proportional to dataset size?
> That is just what I found by trial. When I used a smaller multiplier I
> got many more duplicates; when I increased it I got fewer. Eventually I
> concluded that it should be at least proportional to the size of the
> data set.
>
> Why do you want random integers assigned to your data?
> I need to use the assigned random integers as unique IDs so that data
> users can identify each unique record. I thought it was better to use
> integers than decimal numbers.
>
> And why do they need to be unique?
> See the description above; in addition, the new ID replaces sensitive
> information (the original unique ID) for privacy and confidentiality
> reasons.
>
> I hope this helps you provide more insightful answers.
>
> Best Regards,
> Aihua
>
> --- On Sat, 1/16/10, Nordlund, Dan (DSHS/RDA) <NordlDJ(a)dshs.wa.gov> wrote:
>
> From: Nordlund, Dan (DSHS/RDA) <NordlDJ(a)dshs.wa.gov>
> Subject: RE: assign a unique random integer to each unique id
> To: "Ai Hua Wang" <aihuawang(a)YAHOO.COM>, SAS-L(a)LISTSERV.UGA.EDU
> Received: Saturday, January 16, 2010, 1:11 AM
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Ai Hua Wang
> > Sent: Friday, January 15, 2010 8:10 AM
> > To: SAS-L(a)LISTSERV.UGA.EDU
> > Subject: assign a unique random integer to each unique id
> >
> > Hi,
> >
> > I was wondering if anybody on this list could advise how I can assign a
> > unique random integer to each unique id. I have written the following
> > code, but it does not give me unique random integers. The 10000000 is
> > proportional to the size of the data set. Am I missing anything in the
> > code? Please kindly provide your suggestions.
> >
> > Thanks,
> > Aihua
> >
> >
> > data temp;
> > set datset1;
> > urand=ceil(ranuni(1)*10000000);
> > run;
> >
> >
>
> Aihua,
>
> Well, without knowing what you are going to use those
> "random" numbers for, it is hard to give good advice. Why
> does your multiplier need to be proportional to dataset
> size? Why do you want random integers assigned to your
> data? And why do they need to be unique? If you tell us
> more about what your actual needs are, we might be able to
> provide better help.
>
> That being said, the following code will assign unique integers to your
> data, as long as you have fewer than 2**31 - 1 records.
>
> data want;
> if _n_=1 then do;
> **----urand will be your random integer----**;
> urand=0;
> call ranuni(urand,dummy); **get a starting seed;
> put "original seed = " urand; **"save" starting seed to log;
> retain urand ;
> end;
>
> set datset1;
> call ranuni(urand,dummy); **advance the seed: next state = next unique integer;
>
> drop dummy;
> run;
>
> Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
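
Why Dan's CALL RANUNI program quoted above gives unique integers can be sketched as follows. This assumes, as the SAS documentation describes for RANUNI, a prime-modulus multiplicative congruential generator with modulus 2**31 - 1, multiplier 397204094 and full period 2**31 - 2, and that CALL RANUNI writes the generator's current state back into the seed variable (URAND in Dan's code), so successive URAND values are successive states x_k:

```latex
\begin{aligned}
&x_{k+1} = a\,x_k \bmod m, \qquad m = 2^{31}-1, \quad a = 397204094, \quad 0 < x_0 < m \\
&x_j \equiv a^{\,j-i}\,x_i \pmod m \qquad (j > i) \\
&x_i = x_j \;\Rightarrow\; a^{\,j-i} \equiv 1 \pmod m \qquad (x_i \text{ is invertible because } m \text{ is prime}) \\
&\operatorname{ord}_m(a) = m - 1 \;\Rightarrow\; x_1, \dots, x_{m-1} \text{ are pairwise distinct, each in } \{1, \dots, 2^{31}-2\}
\end{aligned}
```

So the guarantee comes from the generator's period rather than from luck. Aihua's ceil(ranuni(1)*10000000) instead maps those states onto a much smaller range of integers, where different states can land on the same value.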

From: Daniel Nordlund on 16 Jan 2010 20:46

Aihua,

Given your answers below, my original solution will work just fine. You will get unique integers for your random IDs. However, the IDs will be randomly selected from the range 1 to 2**31 - 1. If you would prefer to have "random" IDs that have a range of 1 to the number of observations in your dataset, then you could do something like:

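data _null_;
if 0 then set have nobs=nobs;        * NOBS is filled at compile time, no rows are read ;
call symput('nobs', cats(nobs));     * store the row count in macro variable NOBS ;
run;

data want;
array _x[*] _x1-_x&nobs;             * one slot per observation ;
drop _x1-_x&nobs;
do _n_ = 1 to &nobs;                 * fill the slots with 1, 2, ..., N ;
_x[_n_] = _n_;
end;

seed = 9385247;
drop seed;
call ranperm(seed, of _x:);          * shuffle the slot values in place ;

do _n_ = 1 to &nobs;                 * read HAVE in order, handing out the shuffled values ;
set have;
rand_id = _x[_n_];
output;
end;
run;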

This will assign random IDs in one pass through the data. Since you probably know the number of observations in your data set, you could drop the DATA _NULL_ step and simply replace &nobs everywhere with that number.
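
Whichever of Dan's programs is used, it is cheap to confirm the result before releasing the file. A minimal sketch, assuming the output data set is WANT and the new ID is RAND_ID (use URAND for the first program); the toy WANT built here and the macro variable N_BAD are hypothetical stand-ins.

```sas
/* Toy stand-in for WANT: 10 rows whose RAND_ID is a random permutation of 1..10. */
data want;
  array _x[10] _x1-_x10 (1 2 3 4 5 6 7 8 9 10);
  seed = 9385247;
  call ranperm(seed, of _x1-_x10);   /* shuffle the ten values in place */
  do i = 1 to 10;
    rand_id = _x[i];
    output;
  end;
  drop _x1-_x10 seed i;
run;

/* COUNT(*) counts every row; COUNT(DISTINCT ...) skips missing values,  */
/* so a nonzero difference means duplicate or missing IDs.               */
proc sql noprint;
  select count(*) - count(distinct rand_id) into :n_bad
    from want;
quit;

%put NOTE: rows with duplicate or missing RAND_ID = &n_bad;
```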

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA


From: Warren Schlechte on 18 Jan 2010 06:14

Here's what I would do:

Extract just the ids, in random order: proc sql; create table ids as select id, ranuni(0) as random from (select distinct id from dataset) order by random; quit;
Then, using a data step, number them: data ids; set ids; integer = _n_; run;
Then merge this dataset back into the original dataset by id.
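
Warren's three steps, written out as one runnable sketch. The toy data set HAVE stands in for Warren's DATASET, ID is assumed to be the sensitive key, and the names IDS, SORTKEY, INTEGER and ANON plus the seed 2010 are hypothetical. The inner query removes duplicate IDs before the random key is attached, so each ID gets exactly one number.

```sas
/* Toy source data: ID is the sensitive identifier; A17 appears twice. */
data have;
  input id $ value;
  datalines;
A17 10
B22 20
A17 30
C05 40
;
run;

/* Step 1: one row per distinct ID, in random order. */
proc sql;
  create table ids as
  select id, ranuni(2010) as sortkey
  from (select distinct id from have)
  order by sortkey;
quit;

/* Step 2: number the IDs 1..N in that random order. */
data ids;
  set ids;
  integer = _n_;
  drop sortkey;
run;

/* Step 3: attach INTEGER to every record, then drop the original ID. */
proc sort data=have; by id; run;
proc sort data=ids;  by id; run;

data anon;
  merge have ids;
  by id;
  drop id;
run;
```

Keep a copy of IDS in a secure location if the mapping back to the original IDs is ever needed; ANON alone no longer contains it.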

Warren Schlechte

From: Muthia Kachirayan on 18 Jan 2010 11:59

Aihua,

Your clarification to Dan makes your need clear. My earlier array
solution randomly permutes the observation numbers to get the urands, so
every urand lies between 1 and the number of observations, and some records
may even receive their own observation number as urand. Such urands look
too much like observation numbers. This can be avoided by multiplying the
number of observations by a factor, so that the urands are drawn from a
much larger range.
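
A rough estimate of why a larger multiplier reduces, but never eliminates, duplicates, and hence why the hash programs below retry on collision. Assume n records each draw an integer independently and uniformly from 1 to M, where M = n × f and f plays the role of NUM_FOLD; this is the standard birthday-problem approximation, not a calculation from the thread:

```latex
\begin{aligned}
E[\#\text{colliding pairs}] &= \binom{n}{2}\frac{1}{M} = \frac{n(n-1)}{2M} = \frac{n-1}{2f} \\
P(\text{no duplicates}) &= \prod_{i=0}^{n-1}\Bigl(1 - \frac{i}{M}\Bigr) \approx \exp\!\Bigl(-\frac{n(n-1)}{2M}\Bigr)
\end{aligned}
```

For example, n = 10,000 and f = 1000 give about 5 colliding pairs and P(no duplicates) of roughly e^(-5), about 0.007. For a fixed f the expected number of collisions still grows linearly with n, so some check-and-retry step is needed.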

The test data set is:

data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;

The data set, HAVE, has KEY as the primary key in ascending order with some
SAT data.

key sat
1 0101
2 0202
3 0303
4 0404
5 0505
6 0606
7 0707
8 0808
9 0909
10 1010

The following program creates new unique IDs (UID) based only on the number
of observations in the data set; note that KEY is not used in the process.
Let us use a number, NUM_FOLD, say 1000, to multiply the number of
observations, so that each UID is drawn from 1 to NOBS * NUM_FOLD.

%let num_fold = 1000;

data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = ceil(ranuni(123) * num * &num_fold);
do rc = h.check() by 0 while (rc = 0);
uid = ceil(ranuni(123) * num * &num_fold); ** Try another random number ;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;

proc print data = need;
run;

The data set NEED gives the UID for each record ID (RID); KEY is dropped to
keep the original identifiers confidential.

RID sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877

My earlier array solution is a special case when NUM_FOLD = 1.

This program also creates another data set, LOOKUP, that links each UID back
to its RID and KEY. Once it is sorted by UID, recovering the RID/KEY for a
given UID is easy.
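
Recovering the original KEY for each UID might look like the sketch below, which sorts by UID and match-merges as suggested above. It assumes NEED (RID, SAT, UID) and LOOKUP (RID, KEY, UID) come from the hash program; the three toy rows are copied from the listing above only to make the block runnable, and RECOVERED and IN_NEED are hypothetical names.

```sas
/* Toy stand-ins for NEED and LOOKUP (three rows from the listing above). */
data need;
  input rid sat $ uid;
  datalines;
1 0101 7504
2 0202 3210
3 0303 1784
;
run;

data lookup;
  input rid key uid;
  datalines;
1 1 7504
2 2 3210
3 3 1784
;
run;

/* Sort both by UID, then match-merge to bring KEY back. */
proc sort data=lookup; by uid; run;
proc sort data=need;   by uid; run;

data recovered;
  merge need (in=in_need) lookup (keep=uid key);
  by uid;
  if in_need;   /* keep only records that exist in NEED */
run;
```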

There is another possibility: creating UIDs based on the KEYs rather than on
the observation number. The following program does this with the MOD()
function. However, the KEYs have to be numeric; this restriction can be
lifted in some circumstances by converting character KEYs to numbers with
informats such as PIBw. For further details, refer to Dorfman (Key-Indexing,
Bitmapping and Hashing).

data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = mod(key, num) + 1;
do rc = h.check() by 0 while (rc = 0);
** Slot already taken: try the next integer, wrapping around to 1 ;
uid = uid + 1;
if uid > num then uid = 1;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;
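
Converting character KEYs to numbers with PIBw. before applying MOD() could look like the sketch below. It assumes keys of at most 6 bytes, so the integer stays below 2**48 and is stored exactly in a SAS numeric; HAVE_C, CKEY and NKEY are hypothetical names. PIBw. reads raw bytes in the host's byte order, so the numbers differ across platforms, but distinct 6-byte keys still give distinct integers.

```sas
/* Hypothetical character keys; LENGTH $6 keeps every value at exactly 6 bytes. */
data have_c;
  length ckey $6;
  input ckey $;
  /* PIB6. reads the 6 raw bytes (blank padding included) as an unsigned  */
  /* integer below 2**48; distinct byte strings give distinct integers.   */
  nkey = input(ckey, pib6.);
  datalines;
AB12
XY99
Q7
;
run;
```

NKEY could then take the place of KEY in the MOD() program above.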

Do you find this program useful for solving your issue? Your feedback will
help SAS-Lers suggest alternative solutions.

Kind regards,
Muthia Kachirayan

From: Mike Zdeb on 18 Jan 2010 14:07

hi ... nice

also, same answer, slight tweak of code in the hash routine ...

data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;

%let m=1000;

data want;
declare hash rnd ();
rnd.definekey ('uid');
rnd.definedone ();
do until (done);
set have end=done nobs=n;
do until (not rc);
uid = ceil(ranuni(123) * n * &m);
rc=rnd.add();
end;
output;
end;
stop;
drop key rc;
run;

proc print data=want;
run;

Obs sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877

--
Mike Zdeb
U(a)Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
