From: Tom Anderson on 22 Nov 2009 08:54

On Sat, 21 Nov 2009, Patricia Shanahan wrote:

> Tom Anderson wrote:
> ...
>> So, 7.5% for synchronization, 17% for boxing - we're still a good way
>> off this reported 32x!
> ...
>
> In my experience, there are two main ways of getting a 32x performance
> ratio for the same job:
>
> 1. Different algorithm.
>
> 2. Memory issues.
>
> In this case, I suspect possibly memory issues. If the .NET table is
> more compact, because of using primitives, it might fit into a level
> in Jon's computer's memory hierarchy that the Java Hashtable does not
> fit. This sort of thing is configuration dependent, so the performance
> difference might not be reproducible on a different computer, even
> using the same programs.

Ah, my machine is rather slow, and I didn't have the patience to wait
for the full-size version to run, so I significantly reduced the size
of the table in my tests to 1000 entries, which will easily fit in the
L3 cache and have pretty good representation in the higher levels.
Thus, any effects caused by data hugeness may be lost.

tom

--
Would you like to remember more?
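If Patricia's memory-hierarchy explanation is right, the effect should
show up as a jump in per-put cost once the table outgrows a cache
level. One rough way to probe that is to time the same put loop at
several table sizes - a quick sketch, not code from the thread; the
sizes and the lack of warmup are arbitrary choices:

  import java.util.HashMap;
  import java.util.Map;

  public class CacheEffect {
      public static void main(String[] args) {
          // Sizes chosen to straddle typical cache capacities; adjust
          // for the machine under test. No warmup, so treat the first
          // rows with suspicion.
          int[] sizes = {1000, 10000, 100000, 1000000, 10000000};
          for (int n : sizes) {
              long t0 = System.nanoTime();
              Map<Double, Double> table = new HashMap<Double, Double>();
              for (int i = 1; i <= n; ++i) {
                  double x = i;
                  table.put(x, 1.0 / x);
              }
              long perEntry = (System.nanoTime() - t0) / n;
              System.out.println(n + " entries: " + perEntry + " ns/put");
          }
      }
  }

A roughly flat ns/put across sizes would argue against the cache
hypothesis on that machine; a step upward at some size would support it.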
From: markspace on 22 Nov 2009 11:49

Kevin McMurtrie wrote:

> Yes. Creating custom hashing classes for primitives pays off if
> performance needs to be very high.

I've actually been playing around with "alternate ways of handling
collisions" and I agree that hashing a double is hard. The default
algorithm both Tom and I used:

  long bits = Double.doubleToLongBits( key );
  int hash = (int)(bits ^ (bits >>> 32));

provides terrible performance. Java Map implementations provide some
"hash mangling" that helps poor hash algorithms, so if you roll your
own Map class you'll have to provide a better hash function or provide
your own mangle function.
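The "mangling" in question is a supplemental bit-spreading function
that the JDK 6 HashMap applies to every hashCode before indexing. A
small sketch combining that function (as it appears in the JDK 6
HashMap source) with the default double hash above - the combination
itself is illustrative, not code from the thread:

  public class DoubleHash {
      // Supplemental hash from the JDK 6 HashMap source: spreads
      // entropy from the high bits into the low bits, which are the
      // only bits a power-of-two-sized table actually uses.
      static int supplementalHash(int h) {
          h ^= (h >>> 20) ^ (h >>> 12);
          return h ^ (h >>> 7) ^ (h >>> 4);
      }

      // Default Double hash (what Double.hashCode() computes), mangled.
      static int hash(double key) {
          long bits = Double.doubleToLongBits(key);
          return supplementalHash((int) (bits ^ (bits >>> 32)));
      }

      public static void main(String[] args) {
          // Small integral doubles have all-zero low bits in their
          // raw hash, so without mangling they all land in slot 0 of
          // a 1024-slot table; with it they spread out.
          for (int i = 1; i <= 8; ++i) {
              double x = i;
              System.out.println(x + " -> slot " + (hash(x) & 1023));
          }
      }
  }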
From: markspace on 22 Nov 2009 12:10

markspace wrote:

> Kevin McMurtrie wrote:
>
>> Yes. Creating custom hashing classes for primitives pays off if
>> performance needs to be very high.
>
> I've actually been playing around with "alternate ways of handling
> collisions" and I agree that hashing a double is hard. The default
> algorithm both Tom and I used:
>
>   long bits = Double.doubleToLongBits( key );
>   int hash = (int)(bits ^ (bits >>> 32));
>
> provides terrible performance. Java Map implementations provide some
> "hash mangling" that helps poor hash algorithms, so if you roll your
> own Map class you'll have to provide a better hash function or
> provide your own mangle function.

Note: found this site. Haven't read it all but it seems cool:

<http://www.partow.net/programming/hashfunctions/>
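To make Kevin's suggestion concrete: a hand-rolled primitive map avoids
the Double boxing entirely. Below is a minimal open-addressing
double->double map - an illustrative sketch, not code from the thread.
It has no resizing or removal, treats NaN as the not-found value, and
requires a power-of-two capacity larger than the entry count; a better
hash function (such as those catalogued on the site above) could be
dropped into indexFor():

  public class DoubleDoubleMap {
      // Open addressing with linear probing over parallel primitive
      // arrays; no Double objects are ever allocated.
      private final long[] keyBits;
      private final double[] values;
      private final boolean[] used;
      private final int mask;

      public DoubleDoubleMap(int capacityPowerOfTwo) {
          keyBits = new long[capacityPowerOfTwo];
          values = new double[capacityPowerOfTwo];
          used = new boolean[capacityPowerOfTwo];
          mask = capacityPowerOfTwo - 1;
      }

      // Same bit-spreading idea discussed above, on the raw key bits.
      private int indexFor(long bits) {
          int h = (int) (bits ^ (bits >>> 32));
          h ^= (h >>> 20) ^ (h >>> 12);
          h ^= (h >>> 7) ^ (h >>> 4);
          return h & mask;
      }

      public void put(double key, double value) {
          long bits = Double.doubleToLongBits(key);
          int i = indexFor(bits);
          while (used[i] && keyBits[i] != bits) {
              i = (i + 1) & mask; // linear probe to the next slot
          }
          used[i] = true;
          keyBits[i] = bits;
          values[i] = value;
      }

      public double get(double key) {
          long bits = Double.doubleToLongBits(key);
          int i = indexFor(bits);
          while (used[i]) {
              if (keyBits[i] == bits) {
                  return values[i];
              }
              i = (i + 1) & mask;
          }
          return Double.NaN; // "not found" marker
      }

      public static void main(String[] args) {
          DoubleDoubleMap map = new DoubleDoubleMap(1 << 21); // ~2M slots
          for (int i = 1; i <= 1000000; ++i) {
              double x = i;
              map.put(x, 1.0 / x);
          }
          System.out.println("map(100.0) = " + map.get(100.0));
      }
  }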
From: Marcin Rzeźnicki on 22 Nov 2009 12:34

On 21 Nov, 20:44, Tom Anderson <t...(a)urchin.earth.li> wrote:
> On Sat, 21 Nov 2009, Marcin Rzeźnicki wrote:
>> On 21 Nov, 19:33, Jon Harrop <j...(a)ffconsultancy.com> wrote:
>>> I'm having trouble getting Java's hash tables to run as fast as
>>> .NET's. Specifically, the following program is 32x slower than the
>>> equivalent on .NET:
>>>
>>> import java.util.Hashtable;
>>>
>>> public class Hashtbl {
>>>   public static void main(String args[]){
>>>     Hashtable hashtable = new Hashtable();
>>>
>>>     for(int i=1; i<=10000000; ++i) {
>>>       double x = i;
>>>       hashtable.put(x, 1.0 / x);
>>>     }
>>>
>>>     System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>>>   }
>>> }
>>>
>>> My guess is that this is because the JVM is boxing every floating
>>> point number individually in the hash table due to type erasure
>>> whereas .NET creates a specialized data structure specifically for
>>> a float->float hash table with the floats unboxed. Consequently,
>>> the JVM is doing enormous numbers of allocations whereas .NET is
>>> not.
>>>
>>> Is that correct?
>>
>> You are using Hashtable instead of HashMap - probably the
>> performance loss you've observed is due to synchronization (though
>> "fat" synchronization might be optimized away in the single-threaded
>> case, you still pay the price, though a lower one). If you took a
>> look at the JavaDoc, you'd notice that Hashtable's methods are
>> synchronized. As for boxing, you are correct (though there is no
>> type erasure in your example because you did not specify type
>> parameters at all), but I suspect that these costs are not the most
>> contributing factor to the overall poor performance. I'd blame
>> synchronization in the first place.
>
> I'd be *very* surprised if that was true. In this simple program,
> escape analysis could eliminate the locking entirely - and current
> versions of JDK 1.6 do escape analysis.

First of all, escape analysis is turned off by default. The next thing
is that there is a subtle difference between a synchronized method and
a synchronized block. Hashtable has the former, and escape analysis
does not help very much there, afaik. So the only benefit would be
gained from thin locks.

> Even if for some reason it didn't, you'd only be using a thin lock
> here, which takes two x86 instructions and one memory access for each
> lock and unlock operation, far less than the boxing or unboxing.

Probably because it was off, and it wouldn't have eliminated the locks
anyway. I wonder - maybe something prevented taking the fast-path
locking?
Boxing is fast as well - it is one simple monomorphic call resulting in
the creation of one Double (which may be cached anyway).

> I modified the test code to look like this (yes, with no warmup -
> this is very quick and dirty):
>
> import java.util.Map;
> import java.util.HashMap;
> import java.util.Hashtable;
>
> public class HashPerf {
>   public static void main(String args[]) throws InterruptedException{
>     for(int i=1; i<=100; ++i) {
>       long t0 = System.nanoTime();
>       test();
>       long t1 = System.nanoTime();
>       long dt = t1 - t0;
>       System.out.println(dt);
>       System.gc();
>       Thread.sleep(200);
>     }
>   }
>   private static void test(){
>     Map<Double, Double> hashtable = new HashMap<Double, Double>();
>     // Map<Double, Double> hashtable = new Hashtable<Double, Double>();
>     for(int i=1; i<=1000; ++i) {
>       double x = i;
>       // synchronized (hashtable) {
>         hashtable.put(x, 1.0 / x);
>       // }
>     }
>   }
> }
>
> And then ran it with three variations on the comments: one as above,
> one uncommenting the synchronization of the hashtable, and one
> switching the HashMap to a Hashtable. I have java 1.5.0_19 on an
> elderly and ailing PowerPC Mac laptop. I ran with -server and
> otherwise stock settings.
>
> The timings for each show the usual power curve distribution: 80% of
> the measurements are no more than 50% longer than the fastest, and
> 90% are no more than twice as long, with the last 10% being up to 10
> times longer. If we say that the slowest 10% are artifacts of warmup,
> GC, the machine doing other things, etc, and ignore them, then the
> average times I got were (with standard error of the mean, which is
> broadly like a ~60% confidence limit IIRC):
>
> HashMap       933500 +/- 15006
> sync HashMap 1003200 +/- 16187
> Hashtable     868322 +/- 11602
>
> That is, adding synchronization to the accesses adds a 7.5% overhead.
> Although somehow, the old Hashtable comes out faster!

Interesting indeed - anyway, your synchronized block is more likely to
be optimized away with EA turned on than a synchronized method's lock
on this, though I am not quite sure, seeing your results :-) Well, I am
more inclined to believe that his VM somehow did not perform lock
optimization.
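The synchronized-method versus synchronized-block distinction Marcin
keeps returning to, in code form - an illustrative sketch with invented
names; whether HotSpot actually elides either lock depends on the VM
version and on flags such as -XX:+DoEscapeAnalysis and
-XX:+EliminateLocks:

  import java.util.HashMap;
  import java.util.Map;

  public class LockShapes {
      private final Map<Double, Double> map = new HashMap<Double, Double>();

      // Hashtable-style: the lock on 'this' is part of the method
      // declaration itself.
      public synchronized void putSyncMethod(double x) {
          map.put(x, 1.0 / x);
      }

      // Synchronized block: the monitor is an explicit object whose
      // scope the JIT can inspect directly.
      public void putSyncBlock(double x) {
          synchronized (map) {
              map.put(x, 1.0 / x);
          }
      }

      public static void main(String[] args) {
          // This instance never escapes main(); compare timings with
          // and without the flags above to see whether locks on it
          // are elided on a given VM.
          LockShapes ls = new LockShapes();
          for (int i = 1; i <= 1000000; ++i) {
              ls.putSyncMethod(i);
          }
          System.out.println(ls.map.get(100.0));
      }
  }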
From: Marcin Rzeźnicki on 22 Nov 2009 12:35
On 21 Nov, 20:07, Arne Vajhøj <a...(a)vajhoej.dk> wrote:
> Marcin Rzeźnicki wrote:
>> On 21 Nov, 19:33, Jon Harrop <j...(a)ffconsultancy.com> wrote:
>>> I'm having trouble getting Java's hash tables to run as fast as
>>> .NET's. Specifically, the following program is 32x slower than the
>>> equivalent on .NET:
>>>
>>> import java.util.Hashtable;
>>>
>>> public class Hashtbl {
>>>   public static void main(String args[]){
>>>     Hashtable hashtable = new Hashtable();
>>>
>>>     for(int i=1; i<=10000000; ++i) {
>>>       double x = i;
>>>       hashtable.put(x, 1.0 / x);
>>>     }
>>>
>>>     System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>>>   }
>>> }
>>>
>>> My guess is that this is because the JVM is boxing every floating
>>> point number individually in the hash table due to type erasure
>>> whereas .NET creates a specialized data structure specifically for
>>> a float->float hash table with the floats unboxed. Consequently,
>>> the JVM is doing enormous numbers of allocations whereas .NET is
>>> not.
>>>
>>> Is that correct?
>>
>> You are using Hashtable instead of HashMap - probably the
>> performance loss you've observed is due to synchronization (though
>> "fat" synchronization might be optimized away in the single-threaded
>> case, you still pay the price, though a lower one). If you took a
>> look at the JavaDoc, you'd notice that Hashtable's methods are
>> synchronized. As for boxing, you are correct (though there is no
>> type erasure in your example because you did not specify type
>> parameters at all), but I suspect that these costs are not the most
>> contributing factor to the overall poor performance. I'd blame
>> synchronization in the first place.
>
> That does not match measurements.
>
> Arne

Yeah, I see. Well, what VM did Jon use? That might be crucial to
knowing what happened.
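Since the thread keeps circling back to which VM produced which
numbers, one low-effort safeguard is to print the JVM's identifying
system properties next to any timings - a small sketch; these property
keys are standard:

  public class VmInfo {
      public static void main(String[] args) {
          // Standard system properties identifying the JVM and
          // runtime; worth logging alongside any benchmark numbers.
          System.out.println("java.version:    " + System.getProperty("java.version"));
          System.out.println("java.vm.name:    " + System.getProperty("java.vm.name"));
          System.out.println("java.vm.version: " + System.getProperty("java.vm.version"));
          System.out.println("java.vm.vendor:  " + System.getProperty("java.vm.vendor"));
          System.out.println("os.arch:         " + System.getProperty("os.arch"));
      }
  }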