Hash table performance [Java Programming]

Prev: The future of Java
Next: weird issue with new lines

From: markspace on 23 Nov 2009 12:52

Jon Harrop wrote:

> That takes around 1s and the Java code takes around 32s here.

32 seconds for the Java code you posted is almost certainly an outlier.
I can run the original code you posed on a laptop with a 32 bit JVM in
13 seconds. Clearly, something is up with your environment.

From: Marcin Rzeźnicki on 23 Nov 2009 13:00

On 23 Lis, 18:51, Marcin Rze¼nicki <marcin.rzezni...(a)gmail.com> wrote:
>
> I profiled his example in net beans.
>
> That's my JVM
> C:\Users\Rze¼nik\Documents\java>java -version
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
>
> Here is the code I used:
>
> package hashmapexample;
>
> import java.util.HashMap;
>
> /**
> *
> * @author Rze¼nik
> */
> public class Main {
>
> /**
> * @param args the command line arguments
> */
> public static void main(String[] args) {
> HashMap<Double, Double> hashtable = new HashMap<Double, Double>
> ();
> for (int i = 1; i <= 1000000; ++i) { /* changed upper bound to
> 1m - sorry no, patience */
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get
> (100.0));
> }
>
> }
>
> I used -Xms512m -Xmx512m to eliminate extensive collections.
>
> The results of profiling are as follows:
> 54.2% of time spent in java.util.HashMap.put(Object, Object) (1m
> invocations)
> of which:
> * * 19.5% in java.util.HashMap.addEntry(int, Object, Object, int)
> * * * * 11.1% in java.util.HashMap.resize(int) (17 invocations)
> <--- !!!
> * * * * 3.3% self-time
> * * * * 1.4% in java.util.HashMap$Entry.<init>(int, Object, Object,
> java.util.HashMap.Entry) <-- so the cost of allocating entries is
> negligible
> * * 8.1% in java.lang.Double.hashCode() <--- that's too much (?)
> ... rest of put omitted, circa 1%
>
> Now, the interesting part is
> 30.3% of time spent in java.lang.Double.valueOf(double) <--- that's
> boxing
> Furthermore, there were 2m + 1 calls to new Double meaning that no
> caching occurred.

Oh yes, conclusions:
Taking Jon's 32s of the execution time he could have saved around 3-4s
had he preallocated HashMap. He actually did that in his F# so this
modification alone might have caused F# version to run in, let's say,
28s. He, of course, could not eliminate boxing which might have taken
around 10s of his original execution time. So subtracting costs of
boxing from implied theoretical F# version's execution time we end up
with conclusion that F# should have executed in ~18s (which is
erroneous proceeder in itself because F# probably copies values from
stack). Roughly 1:2 in favor of F#.

From: Roedy Green on 23 Nov 2009 13:16

On Mon, 23 Nov 2009 16:21:25 +0000, Jon Harrop <jon(a)ffconsultancy.com>
wrote, quoted or indirectly quoted someone who said :

>I was considering the possibility of hand coding type-specialized hash
>tables in Java to work around this deficiency in generics on the JVM when I
>realised that it is actually impossible in the general case because the JVM
>lacks value types and, therefore, is incapable of expressing efficient hash
>tables.

Could you elaborate a bit on what value types would look like if they
existed in Java and how you would use them for efficient hash tables?
--
Roedy Green Canadian Mind Products
http://mindprod.com
Finding a bug is a sign you were asleep a the switch when coding. Stop debugging, and go back over your code line by line.

From: Jon Harrop on 23 Nov 2009 14:30

Marcin Rzeźnicki wrote:
> On 23 Lis, 16:58, Jon Harrop <j...(a)ffconsultancy.com> wrote:
>> Patricia Shanahan wrote:
>> > My reasoning is that you never reuse a key, so every put call creates a
>> > new Entry instance.
>>
>> Note that an "Entry instance" is a value type on the CLR, something that
>> the JVM is incapable of expressing.
>>
>> > Creating a Double from a double is about as simple
>> > as object creation can be,
>>
>> Note that there is no object creation on the CLR and, indeed, I believe
>> that is precisely why it is so much faster.
>
> But, correct me if I am wrong, it involves copying a value type.

Correct.

> If it is so, then I am not sure why copying would be better than an object
> allocation

Copying a few bytes of data is a *lot* faster than allocating a few bytes of
data.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

From: Jon Harrop on 23 Nov 2009 14:31

Patricia Shanahan wrote:
> What is the size of an F# float?

64 bits.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: The future of Java
Next: weird issue with new lines