Hash table performance [Java Programming]


From: markspace on 23 Nov 2009 15:34

Jon Harrop wrote:
> markspace wrote:
>> Marcin Rzeźnicki wrote:
>>> That could well be hidden in GC/heap resizing costs if he did not
>>> allocate Java heap properly. I prevented these effects mostly from
>>> occurring by running this example with -Xms512m -Xmx512m.
>> I ran my test (18 seconds for Jon's code on a 32 bit laptop)
>
> This is a 2x quadcore 2.0GHz Intel Xeon E5405. What is your CPU?

Intel Core Duo T2600 (2.16 GHz).

2 Gigs of main memory installed. 667MHz - 2 DIMM Slots (2 x 1GB).

>
> $ time java Hashtbl

This obviously includes the start-up time of the JVM. That's not the
runtime of your algorithm.

Try this:

import java.util.HashMap;
import static java.lang.System.out;

class HashM
{
    public static void main( String... args )
    {
        // Time only the map population, not the JVM start-up.
        long start = System.nanoTime();
        HashMap<Double,Double> hashM = new HashMap<Double, Double>();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashM.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        // Convert nanoseconds to milliseconds for display.
        out.println( "HashMap time: " + (end-start)/1000000 + " ms" );
        out.println( "hashmap(100.0) = " + hashM.get( 100.0 ) );
    }
}

I'm also curious: what runs the F# runtime under Unix? Which version of
Unix are you using? Are you dual booting or running Cygwin?

I'm still suspecting that it's differences in the environment (drivers
under Unix/Cygwin vs. Microsoft) that make the most difference in your
observed times.

From: markspace on 23 Nov 2009 15:41

Jon Harrop wrote:
> Jon Harrop wrote:
>> Doesn't seem to make any difference:
>
> Spoke too soon. This is 4x faster than before:
>
> $ time java -Xms2000m -Xmx2000m Hashtbl
> hashtable(100.0) = 0.01
>
> real 0m8.198s

This number here is close to the real CPU run times I'm seeing using
HashMap, and about half the time I see using your Hashtbl benchmark. I
think we're closer to running with equivalent environments now. I'm
going to guess that your 64 bit JVM also needs a bit more memory than
mine (-Xmx800m) to prevent or at least reduce garbage collection.
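
One way to check whether garbage collection is still eating into the run is to ask the JVM for its collector statistics before and after the loop. A minimal sketch, assuming the standard java.lang.management beans are available; the class name GcCost and its helper methods are made up for illustration, and the default entry count is kept small so it runs without extra heap flags:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcCost {
    // Total milliseconds the JVM reports having spent in GC so far.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fills a map with n entries, the same way the benchmark does.
    static Map<Double, Double> fill(int n) {
        Map<Double, Double> m = new HashMap<Double, Double>();
        for (int i = 1; i <= n; ++i) {
            double x = i;
            m.put(x, 1.0 / x);
        }
        return m;
    }

    public static void main(String[] args) {
        // Entry count is an optional argument; 1,000,000 is an arbitrary default.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        long gcBefore = gcMillis();
        long start = System.nanoTime();
        Map<Double, Double> m = fill(n);
        long wallMs = (System.nanoTime() - start) / 1000000;
        long gcMs = gcMillis() - gcBefore;
        System.out.println("entries: " + m.size());
        System.out.println("wall time: " + wallMs + " ms, reported GC time: " + gcMs + " ms");
    }
}
```

Running it with different -Xms/-Xmx settings should show how much of the wall time the collectors account for on a given machine.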

From: John B. Matthews on 23 Nov 2009 15:45

In article
<f97396c8-46ac-4c38-a790-0e09b4d17fc3(a)g31g2000vbr.googlegroups.com>,
Marcin Rzeźnicki <marcin.rzeznicki(a)gmail.com> wrote:

> Could you please take a look at my benchmark (it states that the slowdown
> should be ~30%) and, if you have access to a machine with NetBeans
> (or any Java profiler), perform similar profiling on your own? I
> suspect there is more to this than meets the eye.

As you observed above, Jon needs to specify the initialCapacity of his
Java data structure, just as he does in F# and I do in Ada. The
difference changes the Java profile percentages significantly. A larger
initial heap is helpful, too:

<http://sites.google.com/site/trashgod/hashmap>
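
For readers who want to try the initialCapacity suggestion, here is a minimal sketch of presizing the map. It assumes HashMap's documented default load factor of 0.75; the class name PresizedMap and the helper capacityFor are made up for illustration and are not taken from the page linked above:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Initial capacity large enough that n entries stay under the
    // resize threshold at the default load factor of 0.75.
    static int capacityFor(int n) {
        return (int) (n / 0.75f) + 1;
    }

    // Fills a presized map with n entries, as in the benchmark.
    static Map<Double, Double> fill(int n) {
        Map<Double, Double> m = new HashMap<Double, Double>(capacityFor(n));
        for (int i = 1; i <= n; ++i) {
            double x = i;
            m.put(x, 1.0 / x);
        }
        return m;
    }

    public static void main(String[] args) {
        // Entry count is an optional argument; 1,000,000 is an arbitrary default.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        long start = System.nanoTime();
        Map<Double, Double> m = fill(n);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println("presized HashMap time: " + ms + " ms");
        System.out.println("hashmap(100.0) = " + m.get(100.0));
    }
}
```

Hashtable also has a constructor taking an initial capacity, so the same idea should carry over to the original Hashtbl benchmark.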

As others have observed, something else is awry.

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

From: Roedy Green on 23 Nov 2009 21:15

On Mon, 23 Nov 2009 21:07:48 +0000, Jon Harrop <jon(a)ffconsultancy.com>
wrote, quoted or indirectly quoted someone who said :

>
>Spoke too soon. This is 4x faster than before:

Also keep in mind the way the JVM works. It runs code in interpreted
mode for a while, then takes time out to compile it to native machine
code, then picks up where it left off. You want to discount that
initial period.
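
A rough way to discount that period is to repeat the loop several times inside one JVM and ignore the first round or two. This is only a sketch: the class name WarmUp, the round count, and the default entry count are arbitrary choices, and a proper benchmark harness would do more than this:

```java
import java.util.HashMap;
import java.util.Map;

public class WarmUp {
    // Times one fill of n entries, in milliseconds.
    static long timeFillMillis(int n) {
        long start = System.nanoTime();
        Map<Double, Double> m = new HashMap<Double, Double>();
        for (int i = 1; i <= n; ++i) {
            double x = i;
            m.put(x, 1.0 / x);
        }
        long elapsed = (System.nanoTime() - start) / 1000000;
        // Use the map so the work cannot be treated as dead code.
        if (m.size() != n) {
            throw new IllegalStateException("unexpected size " + m.size());
        }
        return elapsed;
    }

    // Runs the fill several times in the same JVM and returns every timing.
    static long[] run(int n, int rounds) {
        long[] times = new long[rounds];
        for (int r = 0; r < rounds; ++r) {
            times[r] = timeFillMillis(n);
        }
        return times;
    }

    public static void main(String[] args) {
        // Entry count is an optional argument; 1,000,000 is an arbitrary default.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        long[] times = run(n, 5);
        for (int r = 0; r < times.length; ++r) {
            String note = (r == 0) ? " (includes warm-up, discard)" : "";
            System.out.println("round " + r + ": " + times[r] + " ms" + note);
        }
    }
}
```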
--
Roedy Green Canadian Mind Products
http://mindprod.com
I mean the word proof not in the sense of the lawyers, who set two half proofs equal to a whole one, but in the sense of a mathematician, where half proof = 0, and it is demanded for proof that every doubt becomes impossible.
~ Carl Friedrich Gauss

From: Roedy Green on 23 Nov 2009 21:43

On Sat, 21 Nov 2009 18:33:14 +0000, Jon Harrop <jon(a)ffconsultancy.com>
wrote, quoted or indirectly quoted someone who said :

> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }

Some more datapoints:

java Hashtbl --> out of memory error

java -Xmx1000m Hashtbl (hashtable(100.0) = 0.01) --> 29 secs

Hashtbl.exe (Jet statically compiled, hashtable(100.0) = null) --> 5 sec
(Why did this code fail? I have reported the bug to Jet.)

machine is Athlon 64 X2 3800+, 2GHz. 3 gig ram.

Since you did not provide an initial capacity, the table will have to
be rehashed over and over, roughly doubling the buffer size each time.
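
To put a number on the recopying, here is a back-of-the-envelope sketch. It is a simplified model, not the JDK's actual code: it assumes HashMap's documented defaults (capacity 16, load factor 0.75) and a plain doubling on each resize. The class name ResizeCount is made up for illustration:

```java
public class ResizeCount {
    // Counts how many doublings are needed before the table can hold
    // n entries without exceeding capacity * loadFactor.
    static int resizes(long n, long initialCapacity, double loadFactor) {
        long capacity = initialCapacity;
        int count = 0;
        while (capacity * loadFactor < n) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long n = 10000000L;
        // Default-sized table versus one given enough room up front.
        System.out.println("default capacity: " + resizes(n, 16, 0.75) + " resizes");
        System.out.println("presized:         " + resizes(n, 13333334L, 0.75) + " resizes");
    }
}
```

Under this model, ten million entries starting from capacity 16 need 20 doublings, and none when enough capacity is given up front. As far as I know, Hashtable starts at 11 and grows to 2n+1, so its count should be similar.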

Most of us now have dual-core machines. Perhaps with such large
datasets you could split the work so two CPUs work simultaneously,
e.g. two hash tables, one for small numbers and one for big ones.
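
A minimal sketch of that split, assuming two worker threads that each fill their own HashMap so no locking is needed while loading. The class name SplitFill and its methods are made up for illustration, and whether this is actually faster will depend on the machine and the heap settings:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitFill {
    // Fills a private map with the keys lo..hi inclusive.
    static Map<Double, Double> fillRange(int lo, int hi) {
        Map<Double, Double> m = new HashMap<Double, Double>();
        for (int i = lo; i <= hi; ++i) {
            double x = i;
            m.put(x, 1.0 / x);
        }
        return m;
    }

    // Fills two maps in parallel: index 0 holds 1..n/2, index 1 holds the rest.
    static List<Map<Double, Double>> fillSplit(final int n) {
        final int mid = n / 2;
        Callable<Map<Double, Double>> smallTask = () -> fillRange(1, mid);
        Callable<Map<Double, Double>> bigTask = () -> fillRange(mid + 1, n);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Map<Double, Double>> small = pool.submit(smallTask);
            Future<Map<Double, Double>> big = pool.submit(bigTask);
            List<Map<Double, Double>> parts = new ArrayList<Map<Double, Double>>();
            parts.add(small.get());
            parts.add(big.get());
            return parts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Picks the map that owns the key, then looks it up there.
    static Double lookup(List<Map<Double, Double>> parts, int n, double key) {
        return key <= n / 2 ? parts.get(0).get(key) : parts.get(1).get(key);
    }

    public static void main(String[] args) {
        // Entry count is an optional argument; 1,000,000 is an arbitrary default.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        long start = System.nanoTime();
        List<Map<Double, Double>> parts = fillSplit(n);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println("split fill time: " + ms + " ms");
        System.out.println("lookup(100.0) = " + lookup(parts, n, 100.0));
    }
}
```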
--
Roedy Green Canadian Mind Products
http://mindprod.com
