From: Patricia Shanahan on
markspace wrote:
> Tom Anderson wrote:
>>
>>> long bits = Double.doubleToLongBits( key );
>>> int hash = (int)(bits ^ (bits >>> 32));
>>>
>>> provides terrible performance.
>>
>> Interesting. I chose that function because it's what java.lang.Double
>> does, rather than because i thought it'd be good, but i am surprised
>> to hear it's terrible - doubles are quite complicated internally, so i
>> would have thought that a parade of natural numbers would give
>> reasonably well-distributed hashes this way (whereas longs wouldn't,
>> of course). How did you conclude it's terrible?
>
>
> Writing my own hash table implementation, I noticed that I was getting
> terrible performance with a ton of collisions and everything was heaped
> up in a tiny spot in the table.
>
> Inspecting the hash in hexadecimal, I realized that Jon's data keys --
> the natural counting numbers 1, 2, 3, etc. -- are represented in a
> double as a few bits in the uppermost bits of the double. The lower
> bits are always 0, even after slicing the 64-bit double's bit pattern in
> half and XORing the two halves.
>
> This XORing results in regular hash bit patterns like:
>
> 0x20200000
> 0x40200000
> 0x40600000
> 0x60200000
> etc. as the numbers count up
> (bit patterns made up from memory, but you get the idea.)
>
> i.e., hashes with very few bits different, and all in the uppermost
> bits of the hash. This is exactly the opposite of what you want in a
> good hash, which is lots of randomness in the lower bits of the hash code.
>
> So I concluded: absent any other perturbation in the hash, it sucks.
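
A quick way to reproduce the observation quoted above - only a sketch (the
class name and output format are arbitrary), but the hex values it prints
follow directly from the IEEE 754 encoding:

public class DoubleHashDemo {
    public static void main(String[] args) {
        for (double key = 1.0; key <= 8.0; key += 1.0) {
            // The same folding used by java.lang.Double.hashCode().
            long bits = Double.doubleToLongBits(key);
            int hash = (int) (bits ^ (bits >>> 32));
            // Small naturals print with all-zero low bits, e.g. 0x3FF00000.
            System.out.printf("%4.1f -> 0x%08X%n", key, hash);
        }
    }
}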

Given current division speeds, does it really make sense to use a
power-of-two bucket count?

Many years ago, I had to design a hash table for use on a machine with
integer remainder *very* slow compared to masking. I found that I got
slightly more collisions with a power-of-two size than with a prime size, but
overall better lookup performance because of the remainder cost.

If integer remainder had been within a factor of 10 of masking, the prime
bucket count would have won.
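
In code, the two indexing strategies look something like this - purely an
illustrative sketch, with arbitrary example sizes rather than anything from
a real implementation:

final class BucketIndex {
    static final int POW2_SIZE  = 1 << 16;  // 65536 buckets
    static final int PRIME_SIZE = 65521;    // a prime just below 2^16

    // Masking: a single AND, but only the low bits of the hash matter.
    static int maskIndex(int hash) {
        return hash & (POW2_SIZE - 1);
    }

    // Remainder: mixes in all bits of the hash, at the cost of an
    // integer division per lookup.
    static int modIndex(int hash) {
        return (hash & 0x7FFFFFFF) % PRIME_SIZE;
    }
}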

Patricia
From: Thomas Pornin on
According to Patricia Shanahan <pats(a)acm.org>:
> Given current division speeds, does it really make sense to use a
> power-of-two bucket count?

Integer division tends to be slow on modern hardware. The Intel
optimization manual, which happens to conveniently reside on my hard
disk, states that the latency of a simple "idiv" opcode is between 56
and 70 clock cycles, while masking uses only a fraction of a clock
cycle. Division has actually become slower over time: on an 80386,
signed division took 43 clock cycles while a bit-masking operation used
2 clock cycles.
I do not have Intel manuals for newer processors (these should be
downloadable from Intel's web site), but I expect integer division to
remain slow on newer designs.

On some RISC architectures (e.g. Alpha), there is no integer division
opcode at all; division is performed in software and is even slower.


In Sun's JRE, Hashtable and HashMap differ in their strategy.
Hashtable uses a bucket count initially set to the provided value,
or 11 by default. If the count is n and must be increased, then the
new count is 2*n+1. Conversely, HashMap uses only powers of two for
its bucket count (if given an initial capacity, it rounds it up to
the next power of two).
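
Roughly, the two policies amount to the following - a paraphrase, not a copy
of Sun's source:

final class GrowthPolicies {
    // Hashtable: start from 11 (or the requested capacity) and, when the
    // table must grow, move from n buckets to 2*n+1.
    static int nextHashtableCapacity(int n) {
        return 2 * n + 1;
    }

    // HashMap: round any requested capacity up to the next power of two,
    // so the bucket index can always be computed by masking.
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }
}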


--Thomas Pornin
From: Jon Harrop on
Tom Anderson wrote:
> As long as you only want to store doubles in it.

No, it applies to all untagged value types, including byte, int, long, float,
double and any user-defined value types like complex numbers or vertex
data.
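
For doubles specifically, the cost shows up as autoboxing. A minimal sketch
(class name and loop size are arbitrary) of what a generic map implies:

import java.util.HashMap;
import java.util.Map;

public class BoxingDemo {
    public static void main(String[] args) {
        Map<Double, Double> m = new HashMap<Double, Double>();
        for (int i = 1; i <= 1000000; i++) {
            double x = i;
            // Each put autoboxes both the key and the value into
            // java.lang.Double objects on the heap.
            m.put(x, 1.0 / x);
        }
        System.out.println(m.get(100.0));
    }
}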

Interestingly, this reminds me of a lecture one of the JVM designers gave,
describing how they felt it was preferable to build a VM upon dynamic
language principles. The JVM's awful performance here is a direct
consequence of inheriting the awful uniform representation approach taken
by most dynamic languages precisely because they lack useful static type
information like parametric polymorphism.

> I imagine that if you're using object keys, the CLR's advantage is rather
> smaller.

Yes.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Marcin Rzeźnicki on
On 24 Nov, 20:17, Tom Anderson <t...(a)urchin.earth.li> wrote:
> On Sun, 22 Nov 2009, Marcin Rzeźnicki wrote:
> > On 21 Nov, 20:44, Tom Anderson <t...(a)urchin.earth.li> wrote:
> >> On Sat, 21 Nov 2009, Marcin Rzeźnicki wrote:
> >>> On 21 Lis, 19:33, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> >>>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> >>>> Specifically, the following program is 32x slower than the equivalent
> >>>> on .NET:
>
> >>> You are using Hashtable instead of HashMap - probably the performance
> >>> loss you've observed is due to synchronization (though "fat"
> >>> synchronization might be optimized away in the single-threaded case,
> >>> you still pay a price, albeit a lower one).
>
> >> I'd be *very* surprised if that was true. In this simple program,
> >> escape analysis could eliminate the locking entirely - and current
> >> versions of JDK 1.6 do escape analysis.
>
> > First of all, escape analysis is turned off by default.
>
> Curses! I knew it was experimental when it first came into 1.6, but i
> thought it was mature and on by default in the latest version.
>

I suppose that it is mature, since they've made it available; I also
suppose that they found the benefits of EA were not that great for many
classes of programs when compared to the longer JIT times - but that's
just my thinking.

> > The next thing is that there is a subtle difference between a synchronized
> > method and a synchronized block. Hashtable has the former - escape
> > analysis does not help here very much, afaik.
>
> Could you explain why? I don't see why they should be any different. You
> can rewrite:
>
> synchronized void foo() {
>         // ...
>
> }
>
> As:
>
> void foo() {
>         synchronized (this) {
>                 // ...
>         }
>
> }
>
> With exactly the same semantics, no?
>

Semantically, yes, but in the case of a synchronized method there are no
accompanying bytecode instructions - no monitorenter/monitorexit pair
(nor the 'finally' handler the compiler adds to release the monitor).
When a method is marked as synchronized, that ends up as a flag in the
class file, and the VM internals take care of the monitor. Of course,
not all is lost: if the method is inlined, the EA engine will see the
pattern, but if the method is not inlined then, as I think, it will not
help.
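
To make the difference concrete - a small sketch, with made-up names, that
can be compared with "javap -c -v" on the compiled class:

public class SyncForms {
    // Compiled with only the ACC_SYNCHRONIZED flag set on the method;
    // there are no monitor instructions in its bytecode.
    public synchronized void syncMethod() {
        // ... runs while holding the monitor of 'this'
    }

    // Compiled with explicit monitorenter/monitorexit instructions and a
    // compiler-generated handler that releases the monitor on exception.
    public void syncBlock() {
        synchronized (this) {
            // ... same semantics as syncMethod()
        }
    }
}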



From: Tom Anderson on
On Tue, 24 Nov 2009, markspace wrote:

> Tom Anderson wrote:
>>
>>> long bits = Double.doubleToLongBits( key );
>>> int hash = (int)(bits ^ (bits >>> 32));
>>>
>>> provides terrible performance.
>>
>> Interesting. I chose that function because it's what java.lang.Double
>> does, rather than because i thought it'd be good, but i am surprised to
>> hear it's terrible - doubles are quite complicated internally, so i would
>> have thought that a parade of natural numbers would give reasonably
>> well-distributed hashes this way (whereas longs wouldn't, of course).
>> How did you conclude it's terrible?
>
> Writing my own hash table implementation, I noticed that I was getting
> terrible performance with a ton of collisions and everything was heaped up in
> a tiny spot in the table.
>
> Inspecting the hash in hexadecimal, I realized that Jon's data keys --
> the natural counting numbers 1, 2, 3, etc. -- are represented in a
> double as a few bits in the uppermost bits of the double. The lower
> bits are always 0, even after slicing the 64-bit double's bit pattern in
> half and XORing the two halves.
>
> This XORing results in regular hash bit patterns like:
>
> 0x20200000
> 0x40200000
> 0x40600000
> 0x60200000
>
> etc. as the numbers count up
> (bit patterns made up from memory, but you get the idea.)

Ahaaa, yes, of course. Those bits in full:

0.0 0000000000000000000000000000000000000000000000000000000000000000
1.0 0011111111110000000000000000000000000000000000000000000000000000
2.0 0100000000000000000000000000000000000000000000000000000000000000
3.0 0100000000001000000000000000000000000000000000000000000000000000
4.0 0100000000010000000000000000000000000000000000000000000000000000
5.0 0100000000010100000000000000000000000000000000000000000000000000
6.0 0100000000011000000000000000000000000000000000000000000000000000
7.0 0100000000011100000000000000000000000000000000000000000000000000
8.0 0100000000100000000000000000000000000000000000000000000000000000
9.0 0100000000100010000000000000000000000000000000000000000000000000
10.0 0100000000100100000000000000000000000000000000000000000000000000
11.0 0100000000100110000000000000000000000000000000000000000000000000
12.0 0100000000101000000000000000000000000000000000000000000000000000
13.0 0100000000101010000000000000000000000000000000000000000000000000
14.0 0100000000101100000000000000000000000000000000000000000000000000
15.0 0100000000101110000000000000000000000000000000000000000000000000
16.0 0100000000110000000000000000000000000000000000000000000000000000
17.0 0100000000110001000000000000000000000000000000000000000000000000
18.0 0100000000110010000000000000000000000000000000000000000000000000
19.0 0100000000110011000000000000000000000000000000000000000000000000
20.0 0100000000110100000000000000000000000000000000000000000000000000

I'd been thinking that there would be lots of mismatch between base 10 and
base 2 giving us long binary expansions of those numbers, but of course 1
is sort of a common ground in both bases! On the other hand, if we divide
those numbers by 10:

0.0 0000000000000000000000000000000000000000000000000000000000000000
0.1 0011111110111001100110011001100110011001100110011001100110011010
0.2 0011111111001001100110011001100110011001100110011001100110011010
0.3 0011111111010011001100110011001100110011001100110011001100110011
0.4 0011111111011001100110011001100110011001100110011001100110011010
0.5 0011111111100000000000000000000000000000000000000000000000000000
0.6 0011111111100011001100110011001100110011001100110011001100110011
0.7 0011111111100110011001100110011001100110011001100110011001100110
0.8 0011111111101001100110011001100110011001100110011001100110011010
0.9 0011111111101100110011001100110011001100110011001100110011001101
1.0 0011111111110000000000000000000000000000000000000000000000000000
1.1 0011111111110001100110011001100110011001100110011001100110011010
1.2 0011111111110011001100110011001100110011001100110011001100110011
1.3 0011111111110100110011001100110011001100110011001100110011001101
1.4 0011111111110110011001100110011001100110011001100110011001100110
1.5 0011111111111000000000000000000000000000000000000000000000000000
1.6 0011111111111001100110011001100110011001100110011001100110011010
1.7 0011111111111011001100110011001100110011001100110011001100110011
1.8 0011111111111100110011001100110011001100110011001100110011001101
1.9 0011111111111110011001100110011001100110011001100110011001100110
2.0 0100000000000000000000000000000000000000000000000000000000000000

Perhaps not radically better, but at least there are more ones around.
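
(For the record, a small sketch that reproduces listings like the two above:
it just dumps the raw IEEE 754 bit pattern of each double, zero-padded to 64
binary digits.)

public class DoubleBits {
    static String bits(double d) {
        String s = Long.toBinaryString(Double.doubleToLongBits(d));
        return String.format("%64s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 20; i++) {
            System.out.println(i + ".0 " + bits(i));             // 0.0 .. 20.0
            System.out.println(i / 10.0 + " " + bits(i / 10.0)); // 0.0 .. 2.0
        }
    }
}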

> i.e., hashes with very few bits different, and all in the uppermost
> bits of the hash. This is exactly the opposite of what you want in a
> good hash, which is lots of randomness in the lower bits of the hash
> code.
>
> So I concluded: absent any other perturbation in the hash, it sucks.

Indeed. I suppose you could argue that the natural numbers represented as
doubles are a pathological case - if you have the naturals, why are you
using doubles? - but even so, it's not hard to come up with sequences
which have minimally different hashes:

1.0 0011111111110000000000000000000000000000000000000000000000000000
1.0000000000000002 0011111111110000000000000000000000000000000000000000000000000001
1.0000000000000004 0011111111110000000000000000000000000000000000000000000000000010
1.0000000000000007 0011111111110000000000000000000000000000000000000000000000000011
1.0000000000000009 0011111111110000000000000000000000000000000000000000000000000100
1.000000000000001 0011111111110000000000000000000000000000000000000000000000000101
1.0000000000000013 0011111111110000000000000000000000000000000000000000000000000110
1.0000000000000016 0011111111110000000000000000000000000000000000000000000000000111
1.0000000000000018 0011111111110000000000000000000000000000000000000000000000001000
1.000000000000002 0011111111110000000000000000000000000000000000000000000000001001
1.0000000000000022 0011111111110000000000000000000000000000000000000000000000001010
1.0000000000000024 0011111111110000000000000000000000000000000000000000000000001011
1.0000000000000027 0011111111110000000000000000000000000000000000000000000000001100
1.0000000000000029 0011111111110000000000000000000000000000000000000000000000001101
1.000000000000003 0011111111110000000000000000000000000000000000000000000000001110
1.0000000000000033 0011111111110000000000000000000000000000000000000000000000001111
1.0000000000000036 0011111111110000000000000000000000000000000000000000000000010000
1.0000000000000038 0011111111110000000000000000000000000000000000000000000000010001
1.000000000000004 0011111111110000000000000000000000000000000000000000000000010010
1.0000000000000042 0011111111110000000000000000000000000000000000000000000000010011
1.0000000000000044 0011111111110000000000000000000000000000000000000000000000010100

And, yes, for any hash function, there will be sets of values for which
the hashes are minimally different, or even the same. But they should be
harder to come up with than that!
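
Generating that kind of sequence is a one-liner with Math.nextUp - a sketch
(the folding is the same one quoted at the top of the thread):

public class AdjacentDoubles {
    public static void main(String[] args) {
        double d = 1.0;
        for (int i = 0; i < 20; i++) {
            long bits = Double.doubleToLongBits(d);
            int hash = (int) (bits ^ (bits >>> 32));
            // Successive representable doubles give hashes that differ
            // only in their lowest bits, e.g. 0x3FF00000, 0x3FF00001, ...
            System.out.printf("%s -> 0x%08X%n", d, hash);
            d = Math.nextUp(d);  // step to the next representable double
        }
    }
}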

This still leaves us with the question of what would constitute a good
hash. And perhaps a broader question: whose job is it to turn an object
into a well-distributed hashtable index? Should the object - every object
- undertake to provide a well-distributed hash (and i can't really define
what i mean by that, but it means something not like Double), or is it
up to the hashtable to take whatever hashes it gets and stir them up to
make good indices?
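
Sun's 1.6 HashMap seems to take the second position: it runs every key's
hashCode() through a supplemental mixing step before masking it down to a
bucket index. From memory (so treat the method name and constants as a
sketch), it looks like this:

static int supplementalHash(int h) {
    // Smear the high bits down into the low bits, since the table
    // only ever looks at the low bits of the result.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}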

tom

--
A is for Absinthe, for which I now thirst