Prev: assign a unique random integer to each unique id
Next: Problem in reading a password protected Excel File: Could not
From: Muthia Kachirayan on 18 Jan 2010 14:37 Mike, It is a good tweak ! Thanks. On Mon, Jan 18, 2010 at 3:07 PM, Mike Zdeb <msz03(a)albany.edu> wrote: > hi ... nice > > also, same answer, slight tweak of code in the hash routine ... > > data have; > do key = 1 to 10; > sat = put(key * 100 + key, z4.); > output; > end; > run; > > %let m=1000; > > data want; > declare hash rnd (); > rnd.definekey ('uid'); > rnd.definedone (); > do until (done); > set have end=done nobs=n; > do until (not rc); > uid = ceil(ranuni(123) * n * &m); > rc=rnd.add(); > end; > output; > end; > stop; > drop key rc; > run; > > proc print data=want; > run; > > Obs sat uid > 1 0101 7504 > 2 0202 3210 > 3 0303 1784 > 4 0404 9061 > 5 0505 3572 > 6 0606 2212 > 7 0707 7865 > 8 0808 3981 > 9 0909 1247 > 10 1010 1877 > > > -- > Mike Zdeb > U(a)Albany School of Public Health > One University Place > Rensselaer, New York 12144-3456 > P/518-402-6479 F/630-604-1475 > > > Aihua, > > > > Your clarification to Dan makes your need understood. My earlier array > > solution permutes the observation numbers in random order to get the > > urands. There is some chance that both of them may be same. The urands > will > > suggest the similarity between observation numbers and urands. This can > be > > rectified by choosing a multiplier to the number of observations to get a > > random sequence of urands. > > > > The test data set is: > > > > data have; > > do key = 1 to 10; > > sat = put(key * 100 + key, z4.); > > output; > > end; > > run; > > > > The data set, HAVE, has KEY as the primary key in ascending order with > some > > SAT data. > > > > key sat > > 1 0101 > > 2 0202 > > 3 0303 > > 4 0404 > > 5 0505 > > 6 0606 > > 7 0707 > > 8 0808 > > 9 0909 > > 10 1010 > > > > The following program creates new UNIQUE IDs(uid) based on the number of > > observations in the data set and note that the KEY is not used in the > > process. Let us use a number, NUM_FOLD, to multiply the observation > number, > > say 1000. > > > > %let num_fold = 1000; > > > > data need; > > if _n_ = 1 then do; > > declare hash h(hashexp:16); > > h.definekey('uid'); > > h.definedata('RID','key','uid'); > > h.definedone(); > > end; > > do RID = 1 to num; > > set have nobs = num ; > > uid = ceil(ranuni(123) * num * &num_fold); > > do rc = h.check() by 0 while (rc = 0); > > uid = ceil(ranuni(123) * num * &num_fold); ** Try another random > > number ; > > rc = h.check(); > > end; > > h.add(); > > output; > > end; > > h.output(dataset:'LOOKUP'); > > stop; > > drop rc key; > > run; > > > > proc print data = need; > > run; > > > > The data set, NEED, gives the UID for the corresponding Record ID(RID) > and > > the KEY is dropped to keep the secrecy of data set. > > > > RID sat uid > > 1 0101 7504 > > 2 0202 3210 > > 3 0303 1784 > > 4 0404 9061 > > 5 0505 3572 > > 6 0606 2212 > > 7 0707 7865 > > 8 0808 3981 > > 9 0909 1247 > > 10 1010 1877 > > > > My earlier array solution is a special case when NUM_FOLD = 1. > > > > This program also gives another data set, LOOKUP, giving links to > RID/KEY. > > When it is sorted by UID, the reverse process of getting RID/KEY becomes > > easy. > > > > There is another possiblity of creating UIDs based on the KEYs and not > based > > on the observation number. The following program does it based on MOD() > > function. However, KEYs, have to nemeric but this restriction can be > removed > > in some circumstances when the character-type KEYs can be changed to > numeric > > by the use of functions like, PIBw. For further details refer to > Dorfman(Key > > indexing, Bitmapping and Hashing). > > > > data need; > > if _n_ = 1 then do; > > declare hash h(hashexp:16); > > h.definekey('uid'); > > h.definedata('RID','key','uid'); > > h.definedone(); > > end; > > do RID = 1 to num; > > set have nobs = num ; > > uid = mod(key, num) + 1; > > do rc = h.check() by 0 while (rc = 0); > > ** Try another random number ; > > uid = uid + 1; > > if uid > num then uid = 1; > > rc = h.check(); > > end; > > h.add(); > > output; > > end; > > h.output(dataset:'LOOKUP'); > > stop; > > drop rc key; > > run; > > > > Do you find this program useful to solve your issue ? Your feedback will > be > > useful to SAS-Lers to give alternate solutions. > > > > Kind regards, > > Muthia Kachirayan > > > > > > > > > > On Sat, Jan 16, 2010 at 5:30 PM, Ai Hua Wang <aihuawang(a)yahoo.com> > wrote: > > > >> Hi Dan: > >> > >> Thank you very much for your thoughtful follow up. Please see my answers > >> below. > >> > >> Why does your multiplier need to be proportional to dataset size? > >> That is just my thought after I tried. Because when I use the smaller > >> multiplier I got much more duplicates. When I increase it I got less. > >> Eventually I found that it should be at least propotional to the size of > the > >> data set. > >> > >> Why do you want random integers assigned to your data? > >> I need to use the assigned random integers as the unique id to allow the > >> data users to identify each unique record. I thought it is better to use > the > >> integer than the decimal numbers. > >> > >> And why do they need to be unique? > >> See above description and plus: > >> It is used as the replacement of the sensitive information (unique id) > for > >> the privacy and confidentialiy concern. > >> > >> I hope this is helpful when you provide more insightful answers. > >> > >> Best Regards, > >> Aihua > >> > >> > > > > |