From: Patricia Shanahan on 24 Nov 2009 17:58

markspace wrote:
> Tom Anderson wrote:
>>
>>> long bits = Double.doubleToLongBits( key );
>>> int hash = (int)(bits ^ (bits >>> 32));
>>>
>>> provides terrible performance.
>>
>> Interesting. I chose that function because it's what java.lang.Double
>> does, rather than because i thought it'd be good, but i am surprised
>> to hear it's terrible - doubles are quite complicated internally, so
>> would have thought that a parade of natural numbers would give
>> reasonably well-distributed hashes this way (whereas longs wouldn't,
>> of course). How did you conclude it's terrible?
>
> Writing my own hash table implementation, I noticed that I was getting
> terrible performance with a ton of collisions and everything was heaped
> up in a tiny spot in the table.
>
> Inspecting the hash in hexadecimal, I realized that Jon's data keys --
> the natural counting numbers 1, 2, 3, etc. -- are represented in a
> double as a few bits in the uppermost bits of the double. The lower
> bits are always 0, even after slicing the 64-bit double's bit pattern
> in half and xoring the two halves.
>
> This xoring results in regular hash bit patterns like:
>
> 0x20200000
> 0x40200000
> 0x40600000
> 0x60200000
>
> etc. as the numbers count up
> (bit patterns made up from memory, but you get the idea.)
>
> i.e., hashes with very few bits different, and all in the uppermost
> bits of the hash. This is exactly the opposite of what you want in a
> good hash, which is lots of randomness in the lower bits of the hash
> code.
>
> So I concluded: absent any other perturbation in the hash, it sucks.

Given current division speeds, does it really make sense to use a
power-of-two bucket count? Many years ago, I had to design a hash table
for use on a machine with integer remainder *very* slow compared to
masking. I found that I got slightly more collisions with a power-of-two
size than a prime size, but overall better lookup performance because of
the remainder cost. If integer remainder had been within a factor of 10
of masking, the prime bucket count would have won.

Patricia
From: Thomas Pornin on 24 Nov 2009 18:14

According to Patricia Shanahan <pats(a)acm.org>:
> Given current division speeds, does it really make sense to use a
> power-of-two bucket count?

Integer division tends to be slow on modern hardware. The Intel
optimization manual which happens to conveniently reside on my hard disk
states that latency for a simple "idiv" opcode is between 56 and 70
clock cycles, while masking uses only a fraction of a clock cycle.
Division actually becomes relatively slower over time: on an 80386,
signed division took 43 clock cycles while a bit-masking operation used
2 clock cycles. I do not have Intel manuals for newer processors (these
should be downloadable from Intel's web site), but I expect integer
division to remain slow on newer designs. On some RISC architectures
(e.g. Alpha), there is no integer division opcode at all; division is
performed in software and is even slower.

In Sun's JRE, Hashtable and HashMap differ in their strategy. Hashtable
uses a bucket count initially set to the provided value, or 11 by
default. If the count is n and must be increased, then the new count is
2*n+1. Conversely, HashMap uses only powers of two for its bucket count
(if given an initial capacity, it rounds it up to the next power of
two).

	--Thomas Pornin
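The trade-off Patricia and Thomas are discussing can be sketched in Java. This is an illustrative class (the names are not from any JDK source): a power-of-two table turns the bucket-index computation into a single AND, while an arbitrary (e.g. prime) bucket count needs an integer remainder, which on x86 compiles to the slow idiv opcode.

```java
// Illustrative sketch: the two bucket-index strategies from the thread.
public class BucketIndex {

    // Power-of-two table (HashMap's strategy): one AND instruction.
    static int maskIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    // Arbitrary table length (Hashtable's strategy: 11, then 2*n+1):
    // needs an integer remainder, i.e. an idiv on x86.
    static int modIndex(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength; // clear sign bit first
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        System.out.println("mask, 16 buckets: " + maskIndex(h, 16));
        System.out.println("mod,  11 buckets: " + modIndex(h, 11));
    }
}
```

As Patricia notes, the prime count tends to give slightly fewer collisions; the mask wins only because of the cost gap between AND and idiv.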
From: Jon Harrop on 24 Nov 2009 19:57

Tom Anderson wrote:
> As long as you only want to store doubles in it.

No, it applies to all untagged value types including byte, int, long,
float, double and any user defined value types like complex numbers or
vertex data.

Interestingly, this reminds me of a lecture one of the JVM designers
gave, describing how they felt it was preferable to build a VM upon
dynamic language principles. The JVM's awful performance here is a
direct consequence of inheriting the awful uniform representation
approach taken by most dynamic languages precisely because they lack
useful static type information like parametric polymorphism.

> I imagine that if you're using object keys, the CLR's advantage is
> rather smaller.

Yes.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Marcin Rzeźnicki on 24 Nov 2009 18:55

On 24 Lis, 20:17, Tom Anderson <t...(a)urchin.earth.li> wrote:
> On Sun, 22 Nov 2009, Marcin Rzeźnicki wrote:
> > On 21 Lis, 20:44, Tom Anderson <t...(a)urchin.earth.li> wrote:
> >> On Sat, 21 Nov 2009, Marcin Rzeźnicki wrote:
> >>> On 21 Lis, 19:33, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> >>>> I'm having trouble getting Java's hash tables to run as fast as
> >>>> .NET's. Specifically, the following program is 32x slower than the
> >>>> equivalent on .NET:
>
> >>> You are using Hashtable instead of HashMap - probably the performance
> >>> loss you've observed is due to synchronization (though "fat"
> >>> synchronization might be optimized away in case of single thread you
> >>> still pay the price, though lower).
>
> >> I'd be *very* surprised if that was true. In this simple program,
> >> escape analysis could eliminate the locking entirely - and current
> >> versions of JDK 1.6 do escape analysis.
>
> > First of all, escape analysis is turned off by default.
>
> Curses! I knew it was experimental when it first came into 1.6, but i
> thought it was mature and on by default in the latest version.

I suppose that it is mature since they've made it available, and I also
suppose that they found out that the benefits of EA were not that great
for many classes of programs when compared to longer JITting - but
that's just my thinking.

> > The next thing is that there is a subtle difference between
> > synchronized methods and synchronized blocks. Hashtable has the
> > former - escape analysis does not help here very much afaik.
>
> Could you explain why? I don't see why they should be any different.
> You can rewrite:
>
> synchronized void foo() {
>     // ...
> }
>
> As:
>
> void foo() {
>     synchronized (this) {
>         // ...
>     }
> }
>
> With exactly the same semantics, no?

Semantically yes, but in the case of a synchronized method there is no
accompanying bytecode instruction, namely monitorenter/monitorexit
(plus the 'finally' block added by the compiler). When a method is
marked as synchronized, that ends up as a flag telling the VM internals
to take care of the monitor. Of course, not all is lost: if the method
is inlined, the EA engine will see the pattern, but if the method is
not inlined then, I think, it will not help.
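The two forms Marcin contrasts can be put side by side in one illustrative class (the class and method names are hypothetical). Both have identical semantics; the difference he describes is in how javac compiles them: the first is marked with the ACC_SYNCHRONIZED method flag and carries no locking bytecode, the second gets explicit monitorenter/monitorexit instructions in its body.

```java
// Illustrative sketch of the two synchronization forms from the thread.
public class SyncForms {
    private int count = 0;

    // Compiled with the ACC_SYNCHRONIZED flag; no monitor bytecode
    // appears in the method body.
    public synchronized void incSyncMethod() {
        count++;
    }

    // Compiled to explicit monitorenter/monitorexit instructions
    // (with a compiler-generated handler to release the monitor on
    // exceptional exit).
    public void incSyncBlock() {
        synchronized (this) {
            count++;
        }
    }

    public int count() {
        return count;
    }
}
```

Marcin's point is that escape analysis pattern-matches on the bytecode it can see, so the flag-based form may only become visible to it once the method is inlined.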
From: Tom Anderson on 24 Nov 2009 18:58
On Tue, 24 Nov 2009, markspace wrote:

> Tom Anderson wrote:
>>
>>> long bits = Double.doubleToLongBits( key );
>>> int hash = (int)(bits ^ (bits >>> 32));
>>>
>>> provides terrible performance.
>>
>> Interesting. I chose that function because it's what java.lang.Double
>> does, rather than because i thought it'd be good, but i am surprised to
>> hear it's terrible - doubles are quite complicated internally, so would
>> have thought that a parade of natural numbers would give reasonably
>> well-distributed hashes this way (whereas longs wouldn't, of course).
>> How did you conclude it's terrible?
>
> Writing my own hash table implementation, I noticed that I was getting
> terrible performance with a ton of collisions and everything was heaped
> up in a tiny spot in the table.
>
> Inspecting the hash in hexadecimal, I realized that Jon's data keys --
> the natural counting numbers 1, 2, 3, etc. -- are represented in a
> double as a few bits in the uppermost bits of the double. The lower
> bits are always 0, even after slicing the 64-bit double's bit pattern
> in half and xoring the two halves.
>
> This xoring results in regular hash bit patterns like:
>
> 0x20200000
> 0x40200000
> 0x40600000
> 0x60200000
>
> etc. as the numbers count up
> (bit patterns made up from memory, but you get the idea.)

Ahaaa, yes, of course.
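markspace's observation is easy to reproduce. The following small program (illustrative, not from the thread) applies the Double-style fold to the natural numbers stored as doubles: every low bit of every hash comes out zero, so in a masked power-of-two table all these keys land in bucket 0.

```java
// Reproduces the problem: Double-style hashes of small natural numbers
// have all their entropy in the upper bits.
public class DoubleHashDemo {

    // The hash function under discussion (same fold as java.lang.Double).
    static int hash(double key) {
        long bits = Double.doubleToLongBits(key);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 8; i++) {
            // e.g. 1.0 -> 0x3ff00000, 2.0 -> 0x40000000: low bits all zero
            System.out.printf("%4.1f -> 0x%08x%n", (double) i, hash(i));
        }
    }
}
```

The exponent and the top of the mantissa sit in the upper 32 bits of the double, and for small integers the lower 32 bits are entirely zero, so the XOR fold changes nothing.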
Those bits in full:

 0.0 0000000000000000000000000000000000000000000000000000000000000000
 1.0 0011111111110000000000000000000000000000000000000000000000000000
 2.0 0100000000000000000000000000000000000000000000000000000000000000
 3.0 0100000000001000000000000000000000000000000000000000000000000000
 4.0 0100000000010000000000000000000000000000000000000000000000000000
 5.0 0100000000010100000000000000000000000000000000000000000000000000
 6.0 0100000000011000000000000000000000000000000000000000000000000000
 7.0 0100000000011100000000000000000000000000000000000000000000000000
 8.0 0100000000100000000000000000000000000000000000000000000000000000
 9.0 0100000000100010000000000000000000000000000000000000000000000000
10.0 0100000000100100000000000000000000000000000000000000000000000000
11.0 0100000000100110000000000000000000000000000000000000000000000000
12.0 0100000000101000000000000000000000000000000000000000000000000000
13.0 0100000000101010000000000000000000000000000000000000000000000000
14.0 0100000000101100000000000000000000000000000000000000000000000000
15.0 0100000000101110000000000000000000000000000000000000000000000000
16.0 0100000000110000000000000000000000000000000000000000000000000000
17.0 0100000000110001000000000000000000000000000000000000000000000000
18.0 0100000000110010000000000000000000000000000000000000000000000000
19.0 0100000000110011000000000000000000000000000000000000000000000000
20.0 0100000000110100000000000000000000000000000000000000000000000000

I'd been thinking that there would be lots of mismatch between base 10
and base 2, giving us long binary expansions of those numbers, but of
course 1 is sort of a common ground in both bases!
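A table like the one above can be regenerated with a few lines of Java (an illustrative program, presumably similar to whatever produced it; `String.repeat` needs Java 11+):

```java
// Prints the raw IEEE-754 bit patterns of small doubles, one per line.
public class DoubleBits {

    // Full 64-bit pattern of a double, left-padded with zeros.
    static String bits(double d) {
        String s = Long.toBinaryString(Double.doubleToLongBits(d));
        return "0".repeat(64 - s.length()) + s;
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 20; i++) {
            System.out.println(i + ".0 " + bits(i));
        }
    }
}
```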
On the other hand, if we divide those numbers by 10:

0.0 0000000000000000000000000000000000000000000000000000000000000000
0.1 0011111110111001100110011001100110011001100110011001100110011010
0.2 0011111111001001100110011001100110011001100110011001100110011010
0.3 0011111111010011001100110011001100110011001100110011001100110011
0.4 0011111111011001100110011001100110011001100110011001100110011010
0.5 0011111111100000000000000000000000000000000000000000000000000000
0.6 0011111111100011001100110011001100110011001100110011001100110011
0.7 0011111111100110011001100110011001100110011001100110011001100110
0.8 0011111111101001100110011001100110011001100110011001100110011010
0.9 0011111111101100110011001100110011001100110011001100110011001101
1.0 0011111111110000000000000000000000000000000000000000000000000000
1.1 0011111111110001100110011001100110011001100110011001100110011010
1.2 0011111111110011001100110011001100110011001100110011001100110011
1.3 0011111111110100110011001100110011001100110011001100110011001101
1.4 0011111111110110011001100110011001100110011001100110011001100110
1.5 0011111111111000000000000000000000000000000000000000000000000000
1.6 0011111111111001100110011001100110011001100110011001100110011010
1.7 0011111111111011001100110011001100110011001100110011001100110011
1.8 0011111111111100110011001100110011001100110011001100110011001101
1.9 0011111111111110011001100110011001100110011001100110011001100110
2.0 0100000000000000000000000000000000000000000000000000000000000000

Perhaps not radically better, but at least there are more ones around.

> i.e., hashes with very few bits different, and all in the uppermost
> bits of the hash. This is exactly the opposite of what you want in a
> good hash, which is lots of randomness in the lower bits of the hash
> code.
>
> So I concluded: absent any other perturbation in the hash, it sucks.

Indeed.
I suppose you could argue that the natural numbers represented as
doubles is a pathological case - if you have the naturals, why are you
using doubles? - but even so, it's not hard to come up with sequences
which have minimally different hashes:

1.0                0011111111110000000000000000000000000000000000000000000000000000
1.0000000000000002 0011111111110000000000000000000000000000000000000000000000000001
1.0000000000000004 0011111111110000000000000000000000000000000000000000000000000010
1.0000000000000007 0011111111110000000000000000000000000000000000000000000000000011
1.0000000000000009 0011111111110000000000000000000000000000000000000000000000000100
1.000000000000001  0011111111110000000000000000000000000000000000000000000000000101
1.0000000000000013 0011111111110000000000000000000000000000000000000000000000000110
1.0000000000000016 0011111111110000000000000000000000000000000000000000000000000111
1.0000000000000018 0011111111110000000000000000000000000000000000000000000000001000
1.000000000000002  0011111111110000000000000000000000000000000000000000000000001001
1.0000000000000022 0011111111110000000000000000000000000000000000000000000000001010
1.0000000000000024 0011111111110000000000000000000000000000000000000000000000001011
1.0000000000000027 0011111111110000000000000000000000000000000000000000000000001100
1.0000000000000029 0011111111110000000000000000000000000000000000000000000000001101
1.000000000000003  0011111111110000000000000000000000000000000000000000000000001110
1.0000000000000033 0011111111110000000000000000000000000000000000000000000000001111
1.0000000000000036 0011111111110000000000000000000000000000000000000000000000010000
1.0000000000000038 0011111111110000000000000000000000000000000000000000000000010001
1.000000000000004  0011111111110000000000000000000000000000000000000000000000010010
1.0000000000000042 0011111111110000000000000000000000000000000000000000000000010011
1.0000000000000044 0011111111110000000000000000000000000000000000000000000000010100

And,
yes, for any hash function, there will be sets of values for which the
hashes are minimally different, or even the same. But they should be
harder to come up with than that!

This still leaves us with the question of what would constitute a good
hash. And perhaps a broader question: whose job is it to turn an object
into a well-distributed hashtable index? Should the object - every
object - undertake to provide a well-distributed hash (and i can't
really define what i mean by that, but it means something not like
Double), or is it up to the hashtable to take whatever hashes it gets
and stir them up to make good indices?

tom

--
A is for Absinthe, for which I now thirst |
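For what it's worth, Sun's HashMap takes the second view: it does not trust hashCode() and applies a supplemental mixing step before masking. The sketch below uses the shift constants from the JDK 6 HashMap source, as I remember them (the class and method names here are illustrative); this is exactly the kind of perturbation that rescues Double's upper-bit-heavy hashes.

```java
// Illustrative sketch of HashMap-style supplemental hashing (JDK 6 era).
public class SupplementalHash {

    // Spreads high-bit entropy down into the low bits.
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index in a power-of-two table, after mixing.
    static int indexFor(int h, int length) {
        return mix(h) & (length - 1);
    }

    public static void main(String[] args) {
        // The raw Double-style hashes of 1.0..4.0 all mask to bucket 0
        // in a 16-bucket table; after mixing they spread out.
        for (double k = 1.0; k <= 4.0; k += 1.0) {
            long bits = Double.doubleToLongBits(k);
            int raw = (int) (bits ^ (bits >>> 32));
            System.out.printf("%.1f: raw bucket %d, mixed bucket %d%n",
                    k, raw & 15, indexFor(raw, 16));
        }
    }
}
```

So in practice the answer has been "both": objects provide whatever hash is cheap, and the table stirs it.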