Hash table performance [Java Programming]

Prev: The future of Java
Next: weird issue with new lines

From: Jon Harrop on 21 Nov 2009 13:33

I'm having trouble getting Java's hash tables to run as fast as .NET's.
Specifically, the following program is 32x slower than the equivalent
on .NET:

import java.util.Hashtable;

public class Hashtbl {
public static void main(String args[]){
Hashtable hashtable = new Hashtable();

for(int i=1; i<=10000000; ++i) {
double x = i;
hashtable.put(x, 1.0 / x);
}

System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
}
}

My guess is that this is because the JVM is boxing every floating point
number individually in the hash table due to type erasure whereas .NET
creates a specialized data structure specifically for a float->float hash
table with the floats unboxed. Consequently, the JVM is doing enormously
numbers of allocations whereas .NET is not.

Is that correct?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

From: Tom Anderson on 21 Nov 2009 12:43

On Sat, 21 Nov 2009, Jon Harrop wrote:

> I'm having trouble getting Java's hash tables

I note you're using java.util.Hashtable. That's an ancient class which has
a few things wrong with it: i'd strongly suggest using java.util.HashMap
instead. I wouldn't expect it to be significantly faster in this test,
though.

> to run as fast as .NET's. [...] My guess is that this is because the JVM
> is boxing every floating point number individually in the hash table due
> to type erasure whereas .NET creates a specialized data structure
> specifically for a float->float hash table with the floats unboxed.
> Consequently, the JVM is doing enormously numbers of allocations whereas
> .NET is not.
>
> Is that correct?

I can't comment on what the CLR is doing in a comparable situation, but
that is certainly a reasonable description of what java is doing.

Could you try changing the put line to:

hashtable.put(Double.toString(x), Double.toString(1.0 / x));

And making the corresponding change in the C# or whatever version, and
making the comparison again? That eliminates boxing in java, so if the
difference is due to boxing, it will be significantly reduced, which will
give you some clues as to what's going on.

If you need optimal performance for a double->double map in java, you
would need to write one specifically for that case. Or rather, someone
would: you could have a google to see if someone already has. Also, since
converting doubles to longs is cheap and reversible, you could look for a
long->long hashmap, which strikes me as more likely to exist. I didn't
find one in two minutes of googling, but i did find some long->Object
hashmaps, which get you halfway there.

tom

--
Oh, well of course *everything* looks bad if you remember it

From: John B. Matthews on 21 Nov 2009 12:54

In article <T9GdnWzZ_sPfvJXWnZ2dnUVZ7vadnZ2d(a)brightview.co.uk>,
Jon Harrop <jon(a)ffconsultancy.com> wrote:

> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> Specifically, the following program is 32x slower than the equivalent
> on .NET:
>
> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }
>
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormously
> numbers of allocations whereas .NET is not.
>
> Is that correct?

In addition to using HashMap, you might also specify an appropriate
initialCapacity and loadFactor. You may be seeing a lot of rehash
operations.

<http://java.sun.com/javase/6/docs/api/java/util/HashMap.html>

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

From: Arne Vajhøj on 21 Nov 2009 13:17

Jon Harrop wrote:
> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> Specifically, the following program is 32x slower than the equivalent
> on .NET:
>
> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }
>
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormously
> numbers of allocations whereas .NET is not.
>
> Is that correct?

Yes.

And why do you ask when it is so easy to test and verify?

Arne

From: Arne Vajhøj on 21 Nov 2009 13:19

Tom Anderson wrote:
> Could you try changing the put line to:
>
> hashtable.put(Double.toString(x), Double.toString(1.0 / x));
>
> And making the corresponding change in the C# or whatever version, and
> making the comparison again? That eliminates boxing in java, so if the
> difference is due to boxing, it will be significantly reduced, which
> will give you some clues as to what's going on.

I would still consider it boxing - just to String instead of Double.

Arne

| Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: The future of Java
Next: weird issue with new lines