From: Duane Rettig on 18 Jan 2010 13:18 On Jan 18, 7:11 am, Robert Uhl <eadmun...(a)NOSPAMgmail.com> wrote: > "Frode V. Fjeld" <fr...(a)netfonds.no> writes: > > > > > I.e. to hash over some combination of each key's sxhash (taking care not > > to combine them so as to generate a bignum). > > Note that SXHASH doesn't care about anything more than the top level of > a list: > > cl-user> (sxhash '(1 2 3 (4))) > 224526978 > cl-user> (sxhash '(1 2 3 (4 5))) > 224526978 Depends on your implementation. On Allegro CL: CL-USER(1): (sxhash '(1 2 3 (4))) 4246036626 CL-USER(2): (sxhash '(1 2 3 (4 5))) 4247011474 CL-USER(3): (other versions of Allegro CL will generate different hash codes, depending on endianness and 32-vs-64 bit, but all will distinguish this case) The issue is one of hash-code generation speed vs test predicate lookup speed. The hash code generation for a deeper list structure does take a bit more time, but for heavily used hash tables the better distribution of hash codes over the range of possible hash codes results in fewer collisions and thus fewer invocations of the test predicate. In an 'eq hash-table, this is not a savings. But in an 'equal hash-table, the test between '(1 2 3 (4)) and '(1 2 3 (4 5)) will have to descend into all levels and lengths anyway, so the hash- code-generation savings is lost in an average of 50% of hashing situations (depending on whether the lookup in the table got the desired key or the other one). All in all, we've found that users tend to need excellent hash-code distributions in their apps because their hash-tables must scale, so we go to extra lengths to get the hash-code as well-distributed as possible. Duane
From: Duane Rettig on 18 Jan 2010 13:38 On Jan 18, 10:04 am, Robert Uhl <eadmun...(a)NOSPAMgmail.com> wrote: > Madhu <enom...(a)meer.net> writes: > > | > > | Note that SXHASH doesn't care about anything more than the top level > > | of a list: > > | > > | cl-user> (sxhash '(1 2 3 (4))) > > | 224526978 > > | cl-user> (sxhash '(1 2 3 (4 5))) > > | 224526978 > > > I believe Frodef was suggesting you should do something like: > > > * (reduce 'logior '(1 2 3 (4)) :key #'sxhash) > > 1031 > > > * (reduce 'logior '(1 2 3 (4 5)) :key #'sxhash) > > 526343 The problem with reducing on something like logior is that it is too simple an operation, and results in many duplicate hash codes being generated for given inputs. The key to hash tables [sic] is excellent hash-code distribution. > And yet: > > cl-user> (reduce 'logior '(1 2 3 (4 5 6 (6 7))) :key #'sxhash) > 501984895 > cl-user> (reduce 'logior '(1 2 3 (4 5 6 (6 7 8))) :key #'sxhash) > 501984895 This is precisely the problem; bits are lost by logior, by virtue of being masked by other bits in the same place in the word. The distribution becomes full of holes and clumps of values. > The basic issue is that SXHASH really isn't the right tool for the > job--the guarantees that the standard gives aren't good enough for > generating a unique key, which is what's needed to key a hashtable. This isn't really true, either. The precise results of sxhash is implementation dependent, based on other rules so it can be the right tool for the job. But neither is a unique hash necessary or possible; (re: necessary): the intention of sxhash is to give a good distribution to reduce collisions. If this is done well, sxhash has done its job - (re: possible): the width of the hash-code will be bounded, yet the possible combinations of key values are not as well bounded. The best that can be done on a normal (not "perfect") hash is to close all the holes and to use as many of them up as possible before duplicating a hash-code. > That makes sense, since SXHASH is meant to _hash_ a unique key, and > collisions are expected. This is true, but there's nothing wrong with collisions (duplicate hash codes) since the hash-table implementation is expected to handle collisions. This is purely a performance issue, not a functional one. Duane
From: Robert Uhl on 18 Jan 2010 14:18 Duane Rettig <duane(a)franz.com> writes: > >> That makes sense, since SXHASH is meant to _hash_ a unique key, and >> collisions are expected. > > This is true, but there's nothing wrong with collisions (duplicate > hash codes) since the hash-table implementation is expected to handle > collisions. IIRC the original suggested case was to use SXHASH to generate hash keys, which is pretty broken, but will seem to work for fairly small hash tables. If I recall incorrectly: doh! -- Robert A. Uhl To refuse awards is another way of accepting them with more noise than is normal. --Sir Peter Ustinov
From: Ray on 26 Jan 2010 15:23
piscesboy wrote: > Is it possible to map more than one key per hash table value? For > example, if I create a hash table and I want to use a name and a date > to identify a single value in the table, that's two keys that I want > to store per entry in the table that I want to use to uniquely > identify an entry. Because one entry can share a single date, and in > that case, the second identifier, the name can be used to uniquely > identify the entry. Um, sure. If you want to use two fields to make a hash key, that's easy. It's still a single hash key. If you want a system where you can look up an article by *either* key, you just want two hash tables, both with pointers to the main object. What was the question again? How is this different from one of these two (trivial) solutions? Bear |