Hash table performance [Java Programming]

Prev: The future of Java
Next: weird issue with new lines

From: Jon Harrop on 21 Nov 2009 14:47

Arne Vajhøj wrote:
> And why do you ask when it is so easy to test and verify?

In case there is a faster alternative that I'm not aware of.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

From: Marcin Rzeźnicki on 21 Nov 2009 13:47

On 21 Nov, 19:33, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> Specifically, the following program is 32x slower than the equivalent
> on .NET:
>
> import java.util.Hashtable;
>
> public class Hashtbl {
>     public static void main(String args[]){
>         Hashtable hashtable = new Hashtable();
>
>         for(int i=1; i<=10000000; ++i) {
>             double x = i;
>             hashtable.put(x, 1.0 / x);
>         }
>
>         System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>     }
> }
>
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormous
> numbers of allocations whereas .NET is not.
>
> Is that correct?
>

Hi Jon
You are using Hashtable instead of HashMap, so the performance loss
you've observed is probably due to synchronization (although "fat"
synchronization might be optimized away in the single-threaded case,
you still pay a price, though a lower one). If you take a look at the
Javadoc, you'll notice that Hashtable's methods are synchronized. As
for boxing, you are correct (though there is no type erasure in your
example, because you did not specify type parameters at all), but I
suspect that these costs are not the biggest contributor to the overall
poor performance. I'd blame synchronization in the first place.
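
One way to check the synchronization hypothesis is to time the same loop against both classes. The following is a minimal sketch, not a rigorous benchmark: the class name MapTiming, the helper fill(), and the default size are made up for illustration, and a single timed pass is sensitive to JIT warm-up and garbage collection.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class MapTiming {
    // Fills the map with n boxed Double pairs, as in the original program,
    // and returns the elapsed wall-clock time in nanoseconds.
    static long fill(Map<Double, Double> map, int n) {
        long start = System.nanoTime();
        for (int i = 1; i <= n; ++i) {
            double x = i;
            map.put(x, 1.0 / x); // key and value are both autoboxed to Double
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Assumed default size; pass 10000000 to match the original program
        // (that may require a larger heap, e.g. via -Xmx).
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;

        // Warm-up pass so the JIT compiler has seen fill() before timing.
        fill(new Hashtable<Double, Double>(), n / 10);
        fill(new HashMap<Double, Double>(), n / 10);

        long hashtableNanos = fill(new Hashtable<Double, Double>(), n);
        long hashMapNanos = fill(new HashMap<Double, Double>(), n);
        System.out.println("Hashtable: " + hashtableNanos / 1000000 + " ms");
        System.out.println("HashMap:   " + hashMapNanos / 1000000 + " ms");
    }
}
```

If the two timings come out close, synchronization is unlikely to be the dominant cost on that machine.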

From: Patricia Shanahan on 21 Nov 2009 14:00

Jon Harrop wrote:
> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> Specifically, the following program is 32x slower than the equivalent
> on .NET:
>
> import java.util.Hashtable;
>
> public class Hashtbl {
>     public static void main(String args[]){
>         Hashtable hashtable = new Hashtable();
>
>         for(int i=1; i<=10000000; ++i) {
>             double x = i;
>             hashtable.put(x, 1.0 / x);
>         }
>
>         System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>     }
> }
>
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormous
> numbers of allocations whereas .NET is not.
>
> Is that correct?
>

I think there has to be something more to it than just the autoboxing.

My reasoning is that you never reuse a key, so every put call creates a
new Entry instance. Creating a Double from a double is about as simple
as object creation can be, so I don't see how the boxing could do more
than triple the time spent in object creation during an average put
call. That cannot, by itself, account for a 32x performance ratio.

Patricia
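
To make the object count in this argument concrete, here is a sketch of what a single put amounts to once the compiler's boxing calls are written out by hand. BoxingDemo and putBoxed are hypothetical names. Whether Double.valueOf allocates a fresh object on every call is an assumption about typical JVMs; the Javadoc permits caching.

```java
import java.util.Hashtable;
import java.util.Map;

class BoxingDemo {
    // Equivalent to map.put(x, 1.0 / x) with the autoboxing made explicit.
    static void putBoxed(Map<Double, Double> map, double x) {
        Double key = Double.valueOf(x);         // object 1: boxed key
        Double value = Double.valueOf(1.0 / x); // object 2: boxed value
        map.put(key, value); // object 3: the table's internal entry for a new key
    }

    public static void main(String[] args) {
        Map<Double, Double> map = new Hashtable<Double, Double>();
        for (int i = 1; i <= 1000; ++i) {
            putBoxed(map, i);
        }
        // The lookup key is boxed too; it matches via Double.equals/hashCode.
        System.out.println("map(100.0) = " + map.get(100.0));
    }
}
```

Under that assumption each new key costs up to three small objects instead of one, which is where the "triple" estimate comes from.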

From: Peter Duniho on 21 Nov 2009 14:02

Arne Vajhøj wrote:
> Tom Anderson wrote:
>> Could you try changing the put line to:
>>
>> hashtable.put(Double.toString(x), Double.toString(1.0 / x));
>>
>> And making the corresponding change in the C# or whatever version, and
>> making the comparison again? That eliminates boxing in java, so if the
>> difference is due to boxing, it will be significantly reduced, which
>> will give you some clues as to what's going on.
>
> I would still consider it boxing - just to String instead of Double.

Right. In fact, because the conversion to a string is costlier, it
could obscure the real performance costs inherent in the hash data
structure.

One could do a similar "even the odds" change simply by forcing boxing
in .NET, by using "Object" as the type in the .NET data structure
instead of "Double". That way both versions are boxing, but in the most
efficient way available for the platform.

Pete

From: Arne Vajhøj on 21 Nov 2009 14:06

Patricia Shanahan wrote:
> Jon Harrop wrote:
>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
>> Specifically, the following program is 32x slower than the equivalent
>> on .NET:
>>
>> import java.util.Hashtable;
>> public class Hashtbl {
>>     public static void main(String args[]){
>>         Hashtable hashtable = new Hashtable();
>>         for(int i=1; i<=10000000; ++i) {
>>             double x = i;
>>             hashtable.put(x, 1.0 / x);
>>         }
>>         System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>>     }
>> }
>>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormous
>> numbers of allocations whereas .NET is not.
>>
>> Is that correct?
>
> I think there has to be something more to it than just the autoboxing.
>
> My reasoning is that you never reuse a key, so every put call creates a
> new Entry instance. Creating a Double from a double is about as simple
> as object creation can be, so I don't see how the boxing could do more
> than triple the time spent in object creation during an average put
> call. That cannot, by itself, account for a 32x performance ratio.

I think it can.

I did some tests on my PC.

In Java, both HashMap and Hashtable are about 8 times slower
than .NET's Dictionary<double,double>, but .NET's
Dictionary<object,object> and Hashtable are both about the
same speed as Java.

Arne
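
For comparison on the Java side, a table that stores keys and values in primitive arrays avoids boxing entirely. The sketch below is a hand-written illustration of that idea only: DoubleDoubleMap is a hypothetical class, it is not part of the JDK, it is not how .NET implements Dictionary<double,double>, and it makes simplifying assumptions (fixed capacity, linear probing, no removal, NaN returned for a missing key).

```java
class DoubleDoubleMap {
    private final double[] keys;
    private final double[] values;
    private final boolean[] used; // marks occupied slots
    private final int mask;       // capacity - 1 (capacity is a power of two)
    private final int shift;      // 64 - log2(capacity)
    private int size;

    // Capacity is the smallest power of two >= 2 * expectedSize; it never grows.
    DoubleDoubleMap(int expectedSize) {
        if (expectedSize < 0 || expectedSize > (1 << 29)) {
            throw new IllegalArgumentException("unsupported size: " + expectedSize);
        }
        int cap = 16;
        while (cap < 2L * expectedSize) {
            cap <<= 1;
        }
        keys = new double[cap];
        values = new double[cap];
        used = new boolean[cap];
        mask = cap - 1;
        shift = 64 - Integer.numberOfTrailingZeros(cap);
    }

    // Returns the slot holding key, or the empty slot where it would be stored.
    private int find(double key) {
        long bits = Double.doubleToLongBits(key);
        // Multiplicative hashing: use the top bits of the product as the index.
        int i = (int) ((bits * 0x9E3779B97F4A7C15L) >>> shift);
        while (used[i] && Double.doubleToLongBits(keys[i]) != bits) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    void put(double key, double value) {
        int i = find(key);
        if (!used[i]) {
            // Keep at least one slot empty so that find() always terminates.
            if (size + 1 == keys.length) {
                throw new IllegalStateException("table full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Returns the stored value, or NaN if the key is absent.
    double get(double key) {
        int i = find(key);
        return used[i] ? values[i] : Double.NaN;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        // Assumed default size; pass 10000000 to match the original program.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        DoubleDoubleMap map = new DoubleDoubleMap(n);
        for (int i = 1; i <= n; ++i) {
            double x = i;
            map.put(x, 1.0 / x); // no Double objects are created here
        }
        System.out.println("map(100.0) = " + map.get(100.0));
    }
}
```

Timing this against the Hashtable version of the same loop gives a rough idea of how much of the gap is attributable to boxing and per-entry allocation, under the assumptions above.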
