From: Jon Harrop on
Marcin Rzeźnicki wrote:
> In your program you have:
> let m = System.Collections.Generic.Dictionary(n)
>
> Maybe I am not reading this line correctly but where are generics used
> here?

F# has type inference so you don't specify the types in the source code. In
this case, it infers Dictionary<float, float> from its subsequent use.

>> > Which still has nothing to do
>> > with type erasure. Type erasure does not forbid compiler from making
>> > this kind of optimization which you've been talking about.
>>
>> The VM cannot type specialize data structures if the type information has
>> been erased.
>
> Yes, but if a value type were descendants of Object (as it is done in
> CLR) then type erasure would not prevent compiler from using this
> optimization.

Type erasure relies upon a uniform representation to work. Value types
cannot work with that. So you'd end up boxing your value types to get the
same uniform (reference) representation for all types.

> The cost of type erasure would lie only in casts that
> would have to be performed.

Casting a value type to Object will almost certainly incur an allocation.
That is the cost (and the subsequent collection, of course).
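To make Jon's point concrete: here is a minimal Java sketch (Java rather than the CLR, since that's the language under discussion in this thread) showing that casting a primitive up to Object goes through autoboxing, which in general allocates a fresh wrapper object:

```java
public class BoxingCost {
    public static void main(String[] args) {
        double x = 1234.5;

        // Upcasting a primitive to Object triggers autoboxing: the
        // 8-byte double must be wrapped in a heap-allocated Double.
        Object a = x;
        Object b = x;

        // JLS 5.1.7 requires caching of boxed values only for small
        // integral types, never for double, so on stock OpenJDK each
        // boxing above produced a distinct object.
        System.out.println(a == b);      // false on stock OpenJDK
        System.out.println(a.equals(b)); // true: same value, distinct boxes
    }
}
```

Each such allocation is also work for the collector later, which is the "subsequent collection" cost mentioned above.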

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Peter Duniho on
Lew wrote:
> Jon Harrop wrote:
>> Copying a few bytes of data is a *lot* faster than allocating a few bytes of
>> data.
>>
>
> According to sun.com and Brian Goetz's articles on DeveloperWorks,
> object allocation in Java (discounting GC) is an O(1) operation
> involving 10-12 machine instructions.
> <http://www.ibm.com/developerworks/java/library/j-jtp09275.html>
> "The answer may surprise you -- allocation in modern JVMs is far
> faster than the best performing malloc implementations. The common
> code path for new Object() in HotSpot 1.4.2 and later is approximately
> 10 machine instructions (data provided by Sun; see Resources),"

Copying a 64-bit word from one memory location to another should be a
single instruction, or two at most.

So, an allocation is 5 to 10 times more work, not even counting that
after the allocation, the instance needs initialization.

All of the above analysis (from Jon's post onward, including my own) is
completely bogus anyway, because a lot of the overhead here isn't
directly related to the instruction count, but rather to speed of RAM
and/or cache, and what happens to be in the cache, and a lot of other
stuff that's unpredictable from the information available and which may
be amortized over a large number of instructions executed anyway.

But if you're looking just at numbers of instructions, I'd say even based
on the reference you've provided, copying a 64-bit word from one place
to another is a lot faster than allocating a few bytes of data.

Pete
From: Peter Duniho on
Jon Harrop wrote:
> [...]
> Also, some influential people on the JVM Languages mailing list were
> discussing what they perceived to be the main issues with the JVM today and
> they flagged lack of value types as a major issue. This benchmark seems to
> have verified their findings. [...]

I agree that lack of user-defined value types in Java is a significant
missing feature. But, I don't see how that directly leads to the
performance problem here. That is, you can still have an array of
"double" instead of "Double", and that array has value type semantics
for each element.

So if you _need_ to avoid the boxing overhead, you can. It's just more
work, because you can't leverage an existing collection class to do it.
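As a hedged sketch of that workaround: a loop over a primitive double[] performs no per-element allocation at all, whereas filling a HashMap<Double, Double> (the generic collection one would otherwise reach for) autoboxes every key and value:

```java
import java.util.HashMap;
import java.util.Map;

public class PrimitiveVsBoxed {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Value-type semantics: n doubles stored inline in the array,
        // zero boxing, zero per-element allocation.
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }

        // Boxed alternative: each put() autoboxes both key and value,
        // costing two Double allocations plus the map's entry node.
        Map<Double, Double> boxed = new HashMap<>(n);
        for (int i = 0; i < n; i++) {
            boxed.put((double) i, (double) i);
        }

        double sum = 0.0;
        for (double v : values) {
            sum += v; // reads straight from the array, no unboxing
        }
        System.out.println(sum);
    }
}
```

The price, as noted above, is that the primitive version can't reuse the existing generic collection classes; you end up hand-rolling the data structure.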

(I'll also point out that your comments about generics might not be
subjected to so much nit-picking if your original code example had in
fact used a generic class. I realize that in the end, the performance
characteristics are going to be the same whether you use Hashtable in
its raw form or generic form, but we're all programmers here, and that
means a higher-than-normal proportion of nit-pickers :) ).

Pete
From: Marcin Rzeźnicki on
On 24 Nov, 07:13, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> Marcin Rzeźnicki wrote:
> > In your program you have:
> > let m = System.Collections.Generic.Dictionary(n)
>
> > Maybe I am not reading this line correctly but where are generics used
> > here?
>
> F# has type inference so you don't specify the types in the source code. In
> this case, it infers Dictionary<float, float> from its subsequent use.
>

I see, thx for the explanation.

> >> > Which still has nothing to do
> >> > with type erasure. Type erasure does not forbid compiler from making
> >> > this kind of optimization which you've been talking about.
>
> >> The VM cannot type specialize data structures if the type information has
> >> been erased.
>
> > Yes, but if a value type were descendants of Object (as it is done in
> > CLR) then type erasure would not prevent compiler from using this
> > optimization.
>
> Type erasure relies upon a uniform representation to work. Value types
> cannot work with that. So you'd end up boxing your value types to get the
> same uniform (reference) representation for all types.
>

??

> > The cost of type erasure would lie only in casts that
> > would have to be performed.
>
> Casting a value type to Object will almost certainly incur an allocation.
> That is the cost (and the subsequent collection, of course).
>

I'll have to check that because I am not sure whether what you said is
true. I can certainly imagine a system where allocation is not needed
in such a case.



From: markspace on
Tom Anderson wrote:
>
>> long bits = Double.doubleToLongBits( key );
>> int hash = (int)(bits ^ (bits >>> 32));
>>
>> provides terrible performance.
>
> Interesting. I chose that function because it's what java.lang.Double
> does, rather than because i thought it'd be good, but i am surprised to
> hear it's terrible - doubles are quite complicated internally, so would
> have thought that a parade of natural numbers would give reasonably
> well-distributed hashes this way (whereas longs wouldn't, of course).
> How did you conclude it's terrible?


Writing my own hash table implementation, I noticed that I was getting
terrible performance with a ton of collisions and everything was heaped
up in a tiny spot in the table.

Inspecting the hash in hexadecimal, I realized that Jon's data keys --
the natural counting numbers 1, 2, 3, etc. -- occupy only a few of the
uppermost bits of the double. The lower bits are always 0, even after
slicing the 64-bit double's bit pattern in half and XORing the two
halves.

This XORing results in regular hash bit patterns like:

0x20200000
0x40200000
0x40600000
0x60200000
etc. as the numbers count up
(bit patterns made up from memory, but you get the idea.)

i.e., hashes differing in very few bits, all of them in the uppermost
bits of the hash. This is exactly the opposite of what you want in a
good hash, which is lots of randomness in the lower bits of the hash code.

So I concluded: absent any other perturbation in the hash, it sucks.
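The pattern described above is easy to reproduce. A small Java check, folding the bits the same way java.lang.Double.hashCode does, shows the actual values (the low 20 bits stay zero as the naturals count up):

```java
public class DoubleHashDemo {
    public static void main(String[] args) {
        // Same mix java.lang.Double.hashCode uses: fold the 64-bit
        // IEEE-754 pattern into 32 bits by XORing the two halves.
        for (double key = 1.0; key <= 4.0; key += 1.0) {
            long bits = Double.doubleToLongBits(key);
            int hash = (int) (bits ^ (bits >>> 32));
            System.out.printf("%.1f -> 0x%08X%n", key, hash);
        }
        // Prints:
        // 1.0 -> 0x3FF00000
        // 2.0 -> 0x40000000
        // 3.0 -> 0x40080000
        // 4.0 -> 0x40100000
        // Small naturals live entirely in the sign/exponent and top
        // mantissa bits, so the hash varies only in its uppermost bits:
        // exactly the collision-heavy behavior described above.
    }
}
```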