From: Marcin Rzeźnicki on 23 Nov 2009 11:51

On 23 Lis, 17:11, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> Lew wrote:
> > Jon Harrop wrote:
>
> Type erasure forces all values to be cast to Object which forces value
> types (like double) to be boxed into objects which is an allocation. The
> JVM boxes all 20,000,000 of the doubles generated by this program whereas
> the CLR boxes none of them. That is the "orders of magnitude" difference
> I was referring to.

That's not true, Jon. Type erasure does not force anything in your example - you did not even use generics at all, so how can it affect anything? The problem here is that Java lacks value types, which is orthogonal to the generics implementation.

> > It simply means that a parametrized type is treated as an 'Object' by
> > the JVM. It affects the number of casts, which may be affected by the
> > HotSpot optimizer.
>
> Casting a primitive type to Object incurs an allocation.

Not necessarily - the Java specification permits caching instances of boxed numerics.

> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo'
> > instances, you have 1001 allocations plus however many allocation/copy
> > operations are necessary to grow the 'Set' to hold 1000 references (six
> > with a typical default initial size, zero if you made it big enough to
> > hold 1000). Type erasure has nothing to do with it - you're still
> > creating only 'Foo' objects and enough 'Set' slots to refer to them.
>
> You are assuming that Foo is an object which is not true in general and
> is not true in this case.

It is true in general; furthermore, it is always true in F# or any language implemented on top of the CLR, where there is no notion of "primitive type". The thing we should discuss, after filtering the half-truths out, is whether the difference between value types and reference types might cause a 32x performance degradation.
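[The caching point above is worth pinning down. The Java Language Specification (5.1.7) only guarantees a boxing cache for booleans, bytes, chars up to \u007f, and shorts/ints in -128..127; there is no such guarantee for double, and the stock Sun/Oracle JDK's Double.valueOf allocates a fresh object on every call. A small sketch - the == results in the comments assume HotSpot's default caches:]

```java
public class BoxingCacheDemo {
    public static void main(String[] args) {
        // JLS 5.1.7 guarantees caching for int values in -128..127,
        // so both autoboxed references point at the same object:
        Integer a = 100, b = 100;
        System.out.println(a == b);   // true: cached

        // No such guarantee exists for double; the stock JDK's
        // Double.valueOf allocates a new box every time:
        Double p = 100.0, q = 100.0;
        System.out.println(p == q);   // false: two distinct objects
    }
}
```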
From: Marcin Rzeźnicki on 23 Nov 2009 11:57

On 23 Lis, 17:00, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> Lew wrote:
> > Tom Anderson wrote:
> >>> I'd be *very* surprised if that was true. In this simple program,
> >>> escape analysis could eliminate the locking entirely - and current
> >>> versions of JDK 1.6 do escape analysis.
> >
> > The OP is not using a current version of Java 6.
> >
> > Jon Harrop wrote:
> >>> $ java -version
> >>> java version "1.6.0_13"
> >>> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> >>> Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
> >
> > According to <http://java.sun.com/javase/6/webnotes/6u14.html>
> >> Optionally available are two new features -
> >> escape analysis and compressed object pointers.
> >
> > Which implies strongly that escape analysis, being "new" in 6u14, was
> > not available in 6u13. Even then, as Marcin Rzeźnicki wrote:
> >> ... escape analysis is turned off by default.
> > ...
> >> Well, I am more inclined to believe that his VM somehow did not
> >> perform lock optimization.
>
> Am I correct in thinking that escape analysis might result in unboxing of
> local values in functions but it will never result in unboxed data
> structures on the heap? For example, it cannot unbox the keys and values
> in a hash table?

Neither holds, because Java makes a distinction between primitives and reference types. If a primitive value is boxed, it means it is to be used in a context where a reference type is expected; if a reference type is unboxed to a primitive, it means a primitive is expected. No optimization mechanism can freely mix the two, because that would produce non-verifiable code, which the Java bytecode verifier would reject as incorrect.
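[The verifier argument above is visible at the source level too: generic type arguments must be reference types, so a HashMap can only ever hold boxed keys and values on the heap, regardless of what any optimizer does with locals. A minimal sketch:]

```java
import java.util.HashMap;

public class ErasureBoxing {
    public static void main(String[] args) {
        // HashMap<double, double> m;  // would not compile: type arguments
        //                             // must be reference types

        HashMap<Double, Double> m = new HashMap<Double, Double>();
        m.put(2.0, 0.5);           // both arguments autoboxed via Double.valueOf

        // After erasure the map stores plain Object references, so the
        // result must be unboxed (Double.doubleValue) before arithmetic:
        double half = m.get(2.0);
        System.out.println(half);  // 0.5
    }
}
```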
From: Marcin Rzeźnicki on 23 Nov 2009 11:59

On 23 Lis, 16:58, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> Patricia Shanahan wrote:
> > My reasoning is that you never reuse a key, so every put call creates a
> > new Entry instance.
>
> Note that an "Entry instance" is a value type on the CLR, something that
> the JVM is incapable of expressing.
>
> > Creating a Double from a double is about as simple
> > as object creation can be,
>
> Note that there is no object creation on the CLR and, indeed, I believe
> that is precisely why it is so much faster.

But, correct me if I am wrong, it involves copying a value type. If that is so, then I am not sure why copying would be better than an object allocation.
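[In Java terms, the trade-off being asked about looks roughly like this: copying a primitive double is just a bit-copy with no heap or GC involvement, while going through a reference type adds an allocation up front and a dereference on every read. A tiny illustrative sketch, not a benchmark:]

```java
public class CopyVsAllocate {
    public static void main(String[] args) {
        double a = 0.01;
        double b = a;        // value-style: copies 8 bytes, no heap work

        Double boxed = a;    // reference-style: heap allocation (plus GC later)
        double back = boxed; // and a pointer dereference to read it back

        System.out.println(b == back);  // true: same value either way
    }
}
```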
From: Patricia Shanahan on 23 Nov 2009 12:01

Marcin Rzeźnicki wrote:
...
> It is true in general, furthermore it is always true in F# or any
> language implemented on top of CLR where there is no notion of
> "primitive type". The thing we should discuss after filtering
> half-truths out is whether difference between value types and reference
> types might cause 32x performance degradation
...

I would back off even further, to something like "What are the probable causes of Jon's observations?".

Patricia
From: Marcin Rzeźnicki on 23 Nov 2009 12:51
On 23 Lis, 18:01, Patricia Shanahan <p...(a)acm.org> wrote:
> Marcin Rzeźnicki wrote:
> ...
> > It is true in general, furthermore it is always true in F# or any
> > language implemented on top of CLR where there is no notion of
> > "primitive type". The thing we should discuss after filtering
> > half-truths out is whether difference between value types and reference
> > types might cause 32x performance degradation
> ...
>
> I would back off even further to something like "What are the probable
> causes of Jon's observations?".
>
> Patricia

I profiled his example in NetBeans. This is my JVM:

C:\Users\Rzeźnik\Documents\java>java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

Here is the code I used:

package hashmapexample;

import java.util.HashMap;

/**
 * @author Rzeźnik
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        HashMap<Double, Double> hashtable = new HashMap<Double, Double>();
        /* changed upper bound to 1m - sorry, no patience */
        for (int i = 1; i <= 1000000; ++i) {
            double x = i;
            hashtable.put(x, 1.0 / x);
        }
        System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
}

I ran it with -Xms512m -Xmx512m to eliminate excessive garbage collections. The results of profiling are as follows:

54.2% of time spent in java.util.HashMap.put(Object, Object) (1m invocations), of which:
    19.5% in java.util.HashMap.addEntry(int, Object, Object, int)
        11.1% in java.util.HashMap.resize(int) (17 invocations) <--- !!!
        3.3% self-time
        1.4% in java.util.HashMap$Entry.<init>(int, Object, Object, java.util.HashMap.Entry) <-- so the cost of allocating entries is negligible
    8.1% in java.lang.Double.hashCode() <--- that's too much (?)
    (rest of put omitted, circa 1%)

Now, the interesting part: 30.3% of time spent in java.lang.Double.valueOf(double) <--- that's boxing.

Furthermore, there were 2m + 1 calls to new Double, meaning that no caching occurred.
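[For comparison, those 2m + 1 boxing calls disappear entirely if the benchmark is restated over primitive arrays. This is a hypothetical rewrite, not a drop-in replacement for HashMap - it exploits the fact that the keys in this particular benchmark are exactly 1..n, so key i can simply live at index i - 1:]

```java
public class NoBoxingVariant {
    public static void main(String[] args) {
        final int n = 1000000;
        // One primitive array instead of a HashMap<Double, Double>:
        // no Double boxes, no HashMap.Entry objects, no resizing.
        double[] values = new double[n];
        for (int i = 1; i <= n; ++i) {
            values[i - 1] = 1.0 / i;   // plain store, zero allocations
        }
        System.out.println("value(100.0) = " + values[99]);
    }
}
```

Profiling this variant would isolate how much of the original 30.3% really is boxing cost rather than hashing or collision handling.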