From: Arne Vajhøj on 6 Jun 2010 22:26

On 03-06-2010 23:09, Mike Schilling wrote:
> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
> news:4c085e13$0$280$14726298(a)news.sunsite.dk...
>> PS: A quick glance in IDS indicates that it is locking on
>> mutexes, not regular $ENQ/$DEQ, that raises priority.
>
> That's a great book, but I must have given away my copy at least
> fifteen years ago.

I still have the VAX 3.3, VAX 5.2 and Alpha 1.5 versions on the shelf.

Arne
From: Kevin McMurtrie on 7 Jun 2010 02:25

In article <4c0c57a7$0$282$14726298(a)news.sunsite.dk>,
Arne Vajhøj <arne(a)vajhoej.dk> wrote:

> On 02-06-2010 01:45, Kevin McMurtrie wrote:
>> In article <4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
>> Kevin McMurtrie <mcmurtrie(a)pixelmemory.us> wrote:
>>> I've been assisting in load testing some new high-performance servers
>>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux
>>> is suspending threads for time-slicing in very unfortunate locations.
>>> For example, a thread might suspend in Hashtable.get(Object) after a
>>> call to getProperty(String) on the system properties. It's a
>>> synchronized global, so a few hundred threads might pile up until the
>>> lock holder resumes. Odds are that those hundreds of threads won't
>>> finish before another one stops to time-slice again. The performance
>>> hit has a ton of hysteresis, so the server doesn't recover until it
>>> has a lower load than before the backlog started.
>>>
>>> The brute-force fix is of course to eliminate calls to shared
>>> synchronized objects. All of the easy stuff has been done. Some
>>> operations aren't well suited to simple CAS. Bottlenecks that are
>>> part of well-established Java APIs are time-consuming to fix/avoid.
>>>
>>> Is there JVM or Linux tuning that will change the behavior of thread
>>> time slicing or preemption? I checked the JDK 6 options page but
>>> didn't find anything that appears to be applicable.
>>
>> To clarify a bit, this isn't hammering a shared resource. I'm talking
>> about 100 to 800 synchronizations on a shared object per second for a
>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
>> cause a complete collapse of concurrency.
>
> But either it does or your entire problem analysis is wrong.
>
>> My older 4-core Mac Xeon can have 64 threads call getProperty(String)
>> on a shared Properties instance 2 million times each in only 21 real
>> seconds. That's one call every 164 ns. It's not as good as
>> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>
> That is a call per clock cycle.

HotSpot has some (benchmark-driven?) optimizations for this case. It's
hard to not hit them when using simple tests on String and
ConcurrentHashMap.

> ?!?!
>
>> Many of the basic Sun Java classes are synchronized.
>
> Practically only old ones that you should not be using anymore
> anyway.
>
> Arne

Properties is a biggie. A brute-force replacement of Properties just
moved the collapse into Spring's ResourceBundleMessageSource - system
throughput still dropped to almost nothing. There's definitely a JVM/OS
problem. The next test is to disable hyperthreading.

--
I won't see Google Groups replies because I must filter them as spam
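A rough reconstruction of the kind of micro-benchmark Kevin describes:
64 threads hammering getProperty(String) on one shared Properties
object. The key name is invented, and the timings will vary with
hardware and HotSpot warm-up; this is a sketch, not his actual test
code.

import java.util.Properties;
import java.util.concurrent.CountDownLatch;

public class SharedPropertiesBench {
    static final int THREADS = 64;
    static final int CALLS = 2000000;   // 2 million calls per thread

    public static void main(String[] args) throws InterruptedException {
        final Properties props = new Properties();
        props.setProperty("some.key", "some.value");

        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(THREADS);

        for (int i = 0; i < THREADS; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await();
                        // getProperty() funnels through Hashtable's
                        // synchronized get(), so all 64 threads share
                        // one monitor. The result is deliberately
                        // ignored; a real harness would consume it.
                        for (int n = 0; n < CALLS; n++) {
                            props.getProperty("some.key");
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }

        long t0 = System.nanoTime();
        start.countDown();              // release all threads at once
        done.await();
        long elapsed = System.nanoTime() - t0;
        System.out.printf("%.1f s total, %.0f ns/call%n",
                elapsed / 1e9,
                elapsed / ((double) THREADS * CALLS));
    }
}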
From: Robert Klemme on 7 Jun 2010 12:44

On 07.06.2010 08:25, Kevin McMurtrie wrote:
> In article <4c0c57a7$0$282$14726298(a)news.sunsite.dk>,
> Arne Vajhøj <arne(a)vajhoej.dk> wrote:
>
> [snip - original problem description quoted in full above]
>
>>> My older 4-core Mac Xeon can have 64 threads call getProperty(String)
>>> on a shared Properties instance 2 million times each in only 21 real
>>> seconds. That's one call every 164 ns. It's not as good as
>>> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>>
>> That is a call per clock cycle.
>
> HotSpot has some (benchmark-driven?) optimizations for this case. It's
> hard to not hit them when using simple tests on String and
> ConcurrentHashMap.

What exactly do you mean by that? I can't seem to get rid of the
impression that you are doing the second step (micro-optimization with
JVM internals in mind) before the first (proper design and
implementation).

>> ?!?!
>>
>>> Many of the basic Sun Java classes are synchronized.
>>
>> Practically only old ones that you should not be using anymore
>> anyway.
>
> Properties is a biggie. A brute-force replacement of Properties just
> moved the collapse into Spring's ResourceBundleMessageSource - system
> throughput still dropped to almost nothing. There's definitely a JVM/OS
> problem. The next test is to disable hyperthreading.

As someone else (Lew?) pointed out, it's a bad idea to always go to
System.properties. You should instead read them once at startup and
initialize some other data structure - if only to avoid repeating the
same input-value checks over and over again.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
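A minimal sketch of the startup snapshot Robert suggests, assuming the
relevant properties are plain strings and can be frozen at class-load
time; the Config class name is invented.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public final class Config {
    // One immutable snapshot, taken when the class loads. Reads after
    // that touch no lock at all.
    private static final Map<String, String> SNAPSHOT;

    static {
        Map<String, String> m = new HashMap<String, String>();
        Properties props = System.getProperties();
        // stringPropertyNames() (Java 6) returns a copied Set, so the
        // single contended pass over the Hashtable happens once here.
        for (String name : props.stringPropertyNames()) {
            m.put(name, props.getProperty(name));
        }
        SNAPSHOT = Collections.unmodifiableMap(m);
    }

    private Config() {}

    public static String get(String key) {
        return SNAPSHOT.get(key);
    }
}

Callers would then use Config.get("x") instead of
System.getProperty("x"). The trade-off is that live updates need an
explicit reload hook, which is exactly the objection Kevin raises
below.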
From: Lew on 7 Jun 2010 21:40

Kevin McMurtrie wrote:
>> Properties is a biggie. A brute-force replacement of Properties just
>> moved the collapse into Spring's ResourceBundleMessageSource - system
>> throughput still dropped to almost nothing. There's definitely a JVM/OS
>> problem. The next test is to disable hyperthreading.

Robert Klemme wrote:
> As someone else (Lew?) pointed out, it's a bad idea to always go to
> System.properties. You should instead read them once at startup and
> initialize some other data structure - if only to avoid repeating the
> same input-value checks over and over again.

I worked on a big Java Enterprise project a while ago that had a highly
concurrent deployment but made quite a number of concurrency mistakes
that hugely slowed it down.

Kevin's comments about clock cycles and all are somewhat beside the
point. There is a cascade effect once locks start undergoing
contention. Aside from the fact that the JVM optimizes lock acquisition
in the uncontended case, once a thread blocks on a monitor, all the
other threads trying to acquire that monitor block too. As soon as one
finally gobbles the lock, the rest mill about waiting their turn while
still more threads pile up on the monitor. Sure, the critical section
might only require a few hundred clock cycles, but the threads can wait
seconds, even minutes, as they jostle about the revolving door trying
to enter. Up until threads start blocking, you can get quite good
performance, but heaven help you once contention gets heavy.

On that big project we proved this with various enterprise monitoring
products that reported on locks, wait times for locks, and other
performance issues.

Nothing beats immutable members for eliminating that contention. Right
with that is not sharing data between threads in the first place. We
did three things on that project to improve concurrency: eliminated
shared data, made the remaining shared data immutable, and used
'java.util.concurrent' classes. 'ConcurrentHashMap', for example, with
its multiple lock stripes, unblocked one major bottleneck on
synchronized 'Map' access. (I fought for eliminating the shared 'Map'
entirely, but lost that battle. You can lead a horse to water ...)

Stop hitting System.properties altogether, except for once at static
class initialization.

--
Lew
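A minimal before/after illustration of the 'java.util.concurrent' swap
Lew mentions; the map and its contents are invented for the example.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheSwap {
    // Before: one monitor guards every call, so under load readers
    // and writers all queue on a single lock.
    static final Map<String, String> coarse =
            Collections.synchronizedMap(new HashMap<String, String>());

    // After: ConcurrentHashMap splits the table into lock stripes
    // (16 segments by default in JDK 6), so reads almost never block
    // and writes only contend within one segment.
    static final Map<String, String> striped =
            new ConcurrentHashMap<String, String>();

    public static void main(String[] args) {
        coarse.put("user:42", "session-a");
        striped.put("user:42", "session-a");
        System.out.println(coarse.get("user:42")
                + " / " + striped.get("user:42"));
    }
}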
From: Kevin McMurtrie on 7 Jun 2010 23:39
In article <874m08Fib7U1(a)mid.individual.net>,
Robert Klemme <shortcutter(a)googlemail.com> wrote:

> On 07.06.2010 08:25, Kevin McMurtrie wrote:
>
> [snip - thread quoted in full above]
>
> As someone else (Lew?) pointed out, it's a bad idea to always go to
> System.properties. You should instead read them once at startup and
> initialize some other data structure - if only to avoid repeating the
> same input-value checks over and over again.

The properties aren't immutable. The whole point of using properties
rather than hard-coded values is being able to update them in an
emergency without a server restart. Anyway, that was fixed by
overriding every method in Properties with a high-concurrency
implementation. Too bad Properties isn't an interface. Fixing every
single shared synchronized method in every third-party library could
take a very, very long time.

Today's test had hyperthreading turned off. The performance drop-off
wasn't a fatal collapse like before, but it was still bad. The backlog
came out of nowhere, cycled through several points of code, then went
away. It's starting to sound like a HotSpot problem. Argh.

We left Java 1.5.0_16 because of GC stalling. We left 1.5.0_21 because
it unrolled loops incorrectly. Java 1.6.0_17 optimized away
monitorexit, which is amusing but quite fatal. We're on 1.6.0_20 now,
so it may be time to ask Sun/Oracle for help.

--
I won't see Google Groups replies because I must filter them as spam
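A rough sketch of the kind of brute-force replacement Kevin describes:
a Properties subclass that serves reads from a ConcurrentHashMap while
staying updatable at runtime. Only the hot paths are shown (his fix
overrode every method), and the class name is invented.

import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentProperties extends Properties {
    // Reads come from here instead of the synchronized Hashtable
    // that Properties inherits.
    private final ConcurrentMap<String, String> map =
            new ConcurrentHashMap<String, String>();

    @Override
    public String getProperty(String key) {
        return map.get(key);                // no monitor on reads
    }

    @Override
    public String getProperty(String key, String defaultValue) {
        String value = map.get(key);
        return value != null ? value : defaultValue;
    }

    @Override
    public Object setProperty(String key, String value) {
        return map.put(key, value);         // still updatable live
    }

    @Override
    public Object get(Object key) {
        return map.get(key);
    }
}

Anything that walks the inherited Hashtable directly - store(),
propertyNames(), keys(), and so on - would need overriding too, which
is why it matters that Properties is a class rather than an interface.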