Prev: STATISTICAL ANALYSIS OF TEXT EMAILS Re: Refreshments after SOMS
Next: Send Unlimited Free sms World Wide
From: Gary on 6 Dec 2009 02:30 Hi all, I've done some reading up on hash objects and I'm really keen on picking up this tool for merging large datasets. I was wondering if someone could point me in the right direction on the following: 1. Is it possible to merge on keys with different variable names? I have tried renaming the keys through a dataset rename option, but the program complains that the key cannot be found. For example: data merge; if 0 then set lookup (rename=(oldkeyname=newkeyname)); etc... run; 2. Is there any documentation which shows how to mimic SQL joins using hash object, especially with duplicate keys? Thanks, Gary
From: Proc Me on 6 Dec 2009 09:48 Gary, Check out Judy Loren's excellent paper: How Do I Love Hash Tables? Let Me Count The Ways! http://www2.sas.com/proceedings/forum2008/029-2008.pdf Paul M Dorfman's papers are also worth checking out, especially his more recent ones such as: The Hash Object in Action http://support.sas.com/resources/papers/proceedings09/153-2009.pdf There is also a wealth of info on SAS-L, Proc Me
From: Paul Dorfman on 6 Dec 2009 12:41
Gary, If you decide to rename a key variable, you should rename it both in the place where you read the descriptor of the file being hashed for parameter matching (the IF 0 THEN SET part) and where you actually load it. If you load it explicitly, i.e. reading it in a loop, add the rename statement there. If you are loading it via the dataset: parameter, rename it there, as in dcl hash h (dataset: "to_hash (rename = (old=new))") ; Unfortunately, the possibility to specify data set options in this manner is available in 9.2 but not in 9.1.3. If you are running the latter, you can create a view where the key is renamed and then reference the view in your DATA step without worrying about renaming the key in two places. However, since there exists parameter tag KEY:, which accepts any expression as long as it is of the expected type (i.e. numeric or variable), none of the above is actually necessary. You can define and load the object with the old key, but then when calling .check or .find methods against it, specify the new key to the key: tag, as in rc = h.find (key: old) ; Once again, remember that what you specify to KEY: (or, in fact, any other parameter tag) can be any expression of the correct type. As far as mimicking SQL with hash code is concerned, it may not be as beneficial as you expect given the latest improvements in the SQL optimizer. If you are after an equijoin - expecially with many-to-many key relationship - SQL will use its inner hash algorithm (SQXJHSH access method) implicitly if you prompt it to do so with high enough buffersize= option, and it is going to be faster than its possible hash object replacement. A non-equijoin may be a different story, because for some inexplicable reason, SAS SQL does not use the SQXJHSH access method for non-equijoins. In these cases, follow the good advice and check Judy Loren's paper. Duplicate hash keys can be handled in both 9.1 and 9.2. In 9.2 it is canned functionality with the availability of multi-hash and corresponding methods; in 9.1 it can be handled programmatically in a number of different ways, all of them relying on the addition of a unique discriminating key to the hash in question. If you are interested in more details, try: http://www2.sas.com/proceedings/sugi31/232-31.pdf where dupe hash keys are handled in 9.1.3 via a second hash table. Kind regards ------------ Paul Dorfman Jax, FL ------------ On Sat, 5 Dec 2009 23:30:57 -0800, Gary <gshyoung(a)GMAIL.COM> wrote: >Hi all, > >I've done some reading up on hash objects and I'm really keen on >picking up this tool for merging large datasets. I was wondering if >someone could point me in the right direction on the following: > >1. Is it possible to merge on keys with different variable names? I >have tried renaming the keys through a dataset rename option, but the >program complains that the key cannot be found. For example: > >data merge; > if 0 then set lookup (rename=(oldkeyname=newkeyname)); > >etc... > >run; > >2. Is there any documentation which shows how to mimic SQL joins using >hash object, especially with duplicate keys? > >Thanks, >Gary |