Hash table performance [Java Programming]

Prev: The future of Java
Next: weird issue with new lines

From: Tom Anderson on 21 Nov 2009 16:59

On Sat, 21 Nov 2009, Peter Duniho wrote:

> Tom Anderson wrote:
>> On Sat, 21 Nov 2009, Arne Vajh?j wrote:
>>
>>> Tom Anderson wrote:
>>>> Could you try changing the put line to:
>>>>
>>>> hashtable.put(Double.toString(x), Double.toString(1.0 / x));
>>>>
>>>> And making the corresponding change in the C# or whatever version, and
>>>> making the comparison again? That eliminates boxing in java, so if the
>>>> difference is due to boxing, it will be significantly reduced, which will
>>>> give you some clues as to what's going on.
>>>
>>> I would still consider it boxing - just to String instead of Double.
>>
>> I take the point that it's still creating an object. But it is simply,
>> undeniably, not boxing.
>>
>> Moreover, the point is that i was suggesting adding the *same* object
>> creation to both the java and .NET versions, which would eliminate any
>> differences in language-level boxing behaviour.
>
> But you can force boxing in .NET just by using Object as the type for the
> collection class type parameters instead of Double. So wouldn't a better
> test be to do that?

Yes, absolutely. It just didn't occur to me to do that!

tom

--
NO REAL THAN YOU ARE -- Ego Leonard, The Zandvoort Man

From: Tom Anderson on 21 Nov 2009 22:21

On Sat, 21 Nov 2009, markspace wrote:

> Jon Harrop wrote:
>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormously
>> numbers of allocations whereas .NET is not.
>
> 3. Autoboxing appears to have an impact but it's less than either #1 or #2.
>
> Lastly, removing the autoboxing by writing a specialized class saved
> about 0.7 second from just using HashMap. The specialized version time
> was 5.3 seconds. Note that I allocated a new object myself -- the
> "Entry" for the HashMap -- each time a new double is added. I suspect
> that any reasonable C# hash object must do the same, so you're not
> losing as much time as you think by Java's auto object creation.

Hang on, if i've read your code right, then what you've done is replaced
automatic boxing with manual boxing. The point is that in .NET, there is
*no* boxing in this case - doubles are stored unwrapped in the map.
Although in your code, you put both doubles in a single box, so if boxing
is a slowdown, your code should roughly halve it.

I took the liberty of converting java's HashMap to work directly with
primitive doubles:

http://urchin.earth.li/~twic/Code/DoubleMap.java

I haven't tested that this implementation is correct, but on the benchmark
i posted a little earlier, it comes out 17% faster than a HashMap with
boxing.

So, 7.5% for synchronization, 17% for boxing - we're still a good way off
this reported 32x!

tom

> <code>
> package cljp;
>
> import java.util.HashMap;
> import java.util.Hashtable;
> import static java.lang.System.out;
>
> public class HashTest
> {
>
> public static void main( String[] args )
> {
> Hashtbl.test();
> HashM.test();
> HashTest.test();
> }
>
> HashMap<Entry,Entry> hash = new HashMap<Entry,Entry>();
>
> private static class Entry {
> final double key;
> final double value;
>
> public Entry( double key, double value )
> {
> this.key = key;
> this.value = value;
> }
>
> @Override
> public int hashCode()
> {
> long bits = Double.doubleToLongBits( key );
> bits ^= bits >>> 32;
> return (int)bits;
> }
>
> @Override
> public boolean equals( Object obj )
> {
> if( !(obj instanceof Entry ) ){
> return false;
> }
> return key == ((Entry)obj).key;
> }
> }
>
> public void put( double key, double value ) {
> Entry e = new Entry(key, value );
> hash.put( e, e );
> }
>
> public double get( double key ) {
> Entry e = new Entry( key, 0.0 );
> Entry valueEntry = hash.get( e );
> if( valueEntry != null ) {
> return valueEntry.value;
> } else {
> throw new IllegalArgumentException("Not found: "+key);
> }
> }
> public static void test()
> {
> long start = System.nanoTime();
> HashTest hashTest = new HashTest();
> for( int i = 1; i <= 10000000; ++i ) {
> double x = i;
> hashTest.put( x, 1.0 / x );
> }
> long end = System.nanoTime();
> out.println( "HashTest time: "+ (end-start)/1000000 );
> out.println( "hashtable(100.0) = " +
> hashTest.get( 100.0 ) );
> }
>
> }
> class HashM
> {
>
> public static void test()
> {
> long start = System.nanoTime();
> HashMap<Double,Double> hashM = new HashMap<Double, Double>();
> for( int i = 1; i <= 10000000; ++i ) {
> double x = i;
> hashM.put( x, 1.0 / x );
> }
> long end = System.nanoTime();
> out.println( "HashMap time: "+ (end-start)/1000000 );
> out.println( "hashtable(100.0) = " +
> hashM.get( 100.0 ) );
> }
> }
> class Hashtbl
> {
>
> public static void test()
> {
> long start = System.nanoTime();
> Hashtable hashtable = new Hashtable();
> for( int i = 1; i <= 10000000; ++i ) {
> double x = i;
> hashtable.put( x, 1.0 / x );
> }
> long end = System.nanoTime();
> out.println( "HashTable time: "+ (end-start)/1000000 );
> out.println( "hashtable(100.0) = " +
> hashtable.get( 100.0 ) );
> }
> }
> </code>
>

--
NO REAL THAN YOU ARE -- Ego Leonard, The Zandvoort Man

From: Patricia Shanahan on 21 Nov 2009 22:41

Tom Anderson wrote:
....
> So, 7.5% for synchronization, 17% for boxing - we're still a good way
> off this reported 32x!
....

In my experience, there are two main ways of getting a 32x performance
ratio for the same job:

1. Different algorithm.

2. Memory issues.

In this case, I suspect possibly memory issues. If the .NET table is
more compact, because of using primitives, it might fit into a level in
Jon's computer's memory hierarchy that the Java Hashtable does not fit.
This sort of thing is configuration dependent, so the performance
difference might not be reproducible on a different computer, even using
the same programs.

Patricia

From: markspace on 22 Nov 2009 01:53

Tom Anderson wrote:

>
> Hang on, if i've read your code right, then what you've done is replaced
> automatic boxing with manual boxing. The point is that in .NET, there is

Yeah, I don't make any effort to conserve "Entries" that are already
made. I don't think it would matter. Jon's test makes new doubles, it
never repeats a value.

All I'm really trying to do is provide a close approximation to what I
think C# might be doing. So far the result doesn't seem to speed up
running time much, so I'm not sure of the value of perusing this
further. Jon's 32x number seems bogus. I think there's some
environmental or JVM issue for him.

From: Kevin McMurtrie on 22 Nov 2009 03:55

In article <T9GdnWzZ_sPfvJXWnZ2dnUVZ7vadnZ2d(a)brightview.co.uk>,
Jon Harrop <jon(a)ffconsultancy.com> wrote:

> I'm having trouble getting Java's hash tables to run as fast as .NET's.
> Specifically, the following program is 32x slower than the equivalent
> on .NET:
>
> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }
>
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormously
> numbers of allocations whereas .NET is not.
>
> Is that correct?

Yes. Creating custom hashing classes for primitives pays off if
performance needs to be very high. There are also alternative ways of
handling collisions that will eliminate common memory allocations in
put(). It's unfortunate that creating these classes is such a manual
process.
--
I won't see Goolge Groups replies because I must filter them as spam

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Prev: The future of Java
Next: weird issue with new lines