From: Arne Vajhøj on
On 03-06-2010 23:09, Mike Schilling wrote:
> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
> news:4c085e13$0$280$14726298(a)news.sunsite.dk...
>> PS: A quick glance in IDS indicates that it is locking on
>> mutexes, not regular $ENQ/$DEQ, that raises priority.
>
> That's a great book, but I must have given away my copy at least fifteen
> years ago.

I still have the VAX 3.3, VAX 5.2 and Alpha 1.5 versions on the shelf.

Arne

From: Kevin McMurtrie on
In article <4c0c57a7$0$282$14726298(a)news.sunsite.dk>,
Arne Vajhøj <arne(a)vajhoej.dk> wrote:

> On 02-06-2010 01:45, Kevin McMurtrie wrote:
> > In article<4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
> > Kevin McMurtrie<mcmurtrie(a)pixelmemory.us> wrote:
> >> I've been assisting in load testing some new high performance servers
> >> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
> >> suspending threads for time-slicing in very unfortunate locations. For
> >> example, a thread might suspend in Hashtable.get(Object) after a call to
> >> getProperty(String) on the system properties. It's a synchronized
> >> global so a few hundred threads might pile up until the lock holder
> >> resumes. Odds are that those hundreds of threads won't finish before
> >> another one stops to time slice again. The performance hit has a ton of
> >> hysteresis so the server doesn't recover until it has a lower load than
> >> before the backlog started.
> >>
> >> The brute force fix is of course to eliminate calls to shared
> >> synchronized objects. All of the easy stuff has been done. Some
> >> operations aren't well suited to simple CAS. Bottlenecks that are part
> >> of well established Java APIs are time consuming to fix/avoid.
> >>
> >> Is there JVM or Linux tuning that will change the behavior of thread
> >> time slicing or preemption? I checked the JDK 6 options page but didn't
> >> find anything that appears to be applicable.
> >
> > To clarify a bit, this isn't hammering a shared resource. I'm talking
> > about 100 to 800 synchronizations on a shared object per second for a
> > duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
> > cause a complete collapse of concurrency.
>
> But either it does or your entire problem analysis is wrong.
>
> > My older 4 core Mac Xeon can have 64 threads call getProperty(String)
> > on a shared Properties instance 2 million times each in only 21 real
> > seconds. That's one call every 164 ns. It's not as good as
> > ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>
> That is a call per clock cycle.

HotSpot has some (benchmark-driven?) optimizations for this case. It's
hard not to hit them in simple tests on String and
ConcurrentHashMap.



> ?!?!
>
> > Many of the basic Sun Java classes are synchronized.
>
> Practically only old ones that you should not be using anymore
> anyway.
>
> Arne

Properties is a biggie. A brute-force replacement of Properties caused
the system throughput to collapse to almost nothing in Spring's
ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
next test is to disable hyperthreading.
--
I won't see Google Groups replies because I must filter them as spam
From: Robert Klemme on
On 07.06.2010 08:25, Kevin McMurtrie wrote:
> In article<4c0c57a7$0$282$14726298(a)news.sunsite.dk>,
> Arne Vajhøj<arne(a)vajhoej.dk> wrote:
>
>> On 02-06-2010 01:45, Kevin McMurtrie wrote:
>>> In article<4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
>>> Kevin McMurtrie<mcmurtrie(a)pixelmemory.us> wrote:
>>>> I've been assisting in load testing some new high performance servers
>>>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
>>>> suspending threads for time-slicing in very unfortunate locations. For
>>>> example, a thread might suspend in Hashtable.get(Object) after a call to
>>>> getProperty(String) on the system properties. It's a synchronized
>>>> global so a few hundred threads might pile up until the lock holder
>>>> resumes. Odds are that those hundreds of threads won't finish before
>>>> another one stops to time slice again. The performance hit has a ton of
>>>> hysteresis so the server doesn't recover until it has a lower load than
>>>> before the backlog started.
>>>>
>>>> The brute force fix is of course to eliminate calls to shared
>>>> synchronized objects. All of the easy stuff has been done. Some
>>>> operations aren't well suited to simple CAS. Bottlenecks that are part
>>>> of well established Java APIs are time consuming to fix/avoid.
>>>>
>>>> Is there JVM or Linux tuning that will change the behavior of thread
>>>> time slicing or preemption? I checked the JDK 6 options page but didn't
>>>> find anything that appears to be applicable.
>>>
>>> To clarify a bit, this isn't hammering a shared resource. I'm talking
>>> about 100 to 800 synchronizations on a shared object per second for a
>>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
>>> cause a complete collapse of concurrency.
>>
>> But either it does or your entire problem analysis is wrong.
>>
>>> My older 4 core Mac Xeon can have 64 threads call getProperty(String)
>>> on a shared Properties instance 2 million times each in only 21 real
>>> seconds. That's one call every 164 ns. It's not as good as
>>> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>>
>> That is a call per clock cycle.
>
> HotSpot has some (benchmark-driven?) optimizations for this case. It's
> hard not to hit them in simple tests on String and
> ConcurrentHashMap.

What exactly do you mean by that? I can't shake the impression that you
are doing the second step (micro-optimization with JVM internals in
mind) before the first (proper design and implementation).

>> ?!?!
>>
>>> Many of the basic Sun Java classes are synchronized.
>>
>> Practically only old ones that you should not be using anymore
>> anyway.
>
> Properties is a biggie. A brute-force replacement of Properties caused
> the system throughput to collapse to almost nothing in Spring's
> ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
> next test is to disable hyperthreading.

As someone else (Lew?) pointed out, it's a bad idea to always go to
System.properties. You should instead read them at startup and
initialize some other data structure - if only to avoid repeating the
same checks on input values over and over again.
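Reading them once at startup can look something like this minimal sketch
(class and method names are illustrative, not from anyone's actual code):
the snapshot is built in a static initializer, then served from an
unsynchronized, immutable map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: snapshot the system properties once at class
 *  initialization so the hot path never touches the synchronized
 *  Hashtable inside java.util.Properties. */
public final class PropertySnapshot {
    private static final Map<String, String> SNAPSHOT;
    static {
        Map<String, String> m = new HashMap<String, String>();
        for (String name : System.getProperties().stringPropertyNames()) {
            m.put(name, System.getProperty(name));
        }
        // Immutable after publication: safe for lock-free concurrent reads.
        SNAPSHOT = Collections.unmodifiableMap(m);
    }

    private PropertySnapshot() {}

    public static String get(String name) {
        return SNAPSHOT.get(name); // no synchronization on the read path
    }
}
```

The obvious trade-off is that a property changed at runtime is not seen
until restart, which matters if you rely on emergency updates.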

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From: Lew on
Kevin McMurtrie wrote:
>> Properties is a biggie. A brute-force replacement of Properties caused
>> the system throughput to collapse to almost nothing in Spring's
>> ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
>> next test is to disable hyperthreading.

Robert Klemme wrote:
> As someone else (Lew?) pointed out, it's a bad idea to always go to
> System.properties. You should instead read them at startup and
> initialize some other data structure - if only to avoid repeating the
> same checks on input values over and over again.

I worked on a big Java Enterprise project a while ago that had highly
concurrent deployment but made quite a number of concurrency mistakes that
hugely slowed it down.

Kevin's comments about clock cycles and all are somewhat beside the point.
There is a cascade effect once locks start undergoing contention. Aside from
the fact that the JVM optimizes lock acquisition in the uncontended case, once
a thread blocks on a monitor, every other thread trying to acquire that
monitor blocks as well. As soon as one finally gobbles the lock, the rest
mill about waiting their turn while still more threads pile up on the monitor.
Sure, the critical section might only require a few hundred clock cycles,
but the threads can wait seconds, even minutes, as they jostle about the
revolving door trying to enter.

Up until threads start blocking, you can get quite good performance, but
heaven help you once contention gets heavy.
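The pile-up is easy to provoke with a toy monitor (a hypothetical demo,
not code from that project): each critical section lasts nanoseconds,
yet with enough threads the monitor becomes the revolving door described
above.

```java
/** Hypothetical demonstration of a lock convoy: N threads serialize on a
 *  single monitor even though each critical section is tiny. */
public class ConvoyDemo {
    private static final Object LOCK = new Object();
    private static long counter;

    /** Runs the contended loop across the given thread count and
     *  returns the final count (threads * iterations). */
    public static long run(int threads, final int iterations)
            throws InterruptedException {
        counter = 0;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < iterations; j++) {
                        // Held for nanoseconds, but every thread funnels through it.
                        synchronized (LOCK) { counter++; }
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();
        System.out.println((System.nanoTime() - start) / 1000000 + " ms");
        return counter;
    }
}
```

Comparing the printed wall-clock time for one thread versus many shows
the cascade: elapsed time grows far faster than the total work does.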

On that big project we proved this with various enterprise monitoring products
that reported on locks, wait times for locks and other performance issues.

Nothing beats immutable members for eliminating that contention. Right up
there with immutability is not sharing data between threads in the first place.

We did three things on that project to improve concurrency: eliminated shared
data, made shared data immutable, and used 'java.util.concurrent' classes.

'ConcurrentHashMap', for example, with its multiple lock stripes, unlimbered
one major bottleneck on synchronized 'Map' access. (I fought for eliminating
the shared 'Map' entirely, but lost that battle. You can lead a horse to
water ...)
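A minimal sketch of that kind of swap (names hypothetical): the
lock-striped map makes reads mostly lock-free and writes contend only
per stripe, where a synchronized Map serializes every call on one
monitor.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the swap described above: replacing a fully
 *  synchronized Map (Hashtable / Collections.synchronizedMap), where
 *  every call contends for one monitor, with a lock-striped
 *  ConcurrentHashMap whose reads are mostly lock-free. */
public class StripedCache {
    private static final ConcurrentHashMap<String, String> cache =
            new ConcurrentHashMap<String, String>();

    public static String lookup(String key) {
        String v = cache.get(key);                   // no monitor acquired on a hit
        if (v == null) {
            v = compute(key);
            String prev = cache.putIfAbsent(key, v); // atomic; contends per stripe only
            if (prev != null) v = prev;              // another thread won the race
        }
        return v;
    }

    private static String compute(String key) {
        return key.toUpperCase(); // stand-in for the real (expensive) computation
    }
}
```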

Stop hitting System.properties altogether, except once at static class
initialization.

--
Lew
From: Kevin McMurtrie on
In article <874m08Fib7U1(a)mid.individual.net>,
Robert Klemme <shortcutter(a)googlemail.com> wrote:

> On 07.06.2010 08:25, Kevin McMurtrie wrote:
> > In article<4c0c57a7$0$282$14726298(a)news.sunsite.dk>,
> > Arne Vajhøj<arne(a)vajhoej.dk> wrote:
> >
> >> On 02-06-2010 01:45, Kevin McMurtrie wrote:
> >>> In article<4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
> >>> Kevin McMurtrie<mcmurtrie(a)pixelmemory.us> wrote:
> >>>> I've been assisting in load testing some new high performance servers
> >>>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
> >>>> suspending threads for time-slicing in very unfortunate locations. For
> >>>> example, a thread might suspend in Hashtable.get(Object) after a call to
> >>>> getProperty(String) on the system properties. It's a synchronized
> >>>> global so a few hundred threads might pile up until the lock holder
> >>>> resumes. Odds are that those hundreds of threads won't finish before
> >>>> another one stops to time slice again. The performance hit has a ton of
> >>>> hysteresis so the server doesn't recover until it has a lower load than
> >>>> before the backlog started.
> >>>>
> >>>> The brute force fix is of course to eliminate calls to shared
> >>>> synchronized objects. All of the easy stuff has been done. Some
> >>>> operations aren't well suited to simple CAS. Bottlenecks that are part
> >>>> of well established Java APIs are time consuming to fix/avoid.
> >>>>
> >>>> Is there JVM or Linux tuning that will change the behavior of thread
> >>>> time slicing or preemption? I checked the JDK 6 options page but didn't
> >>>> find anything that appears to be applicable.
> >>>
> >>> To clarify a bit, this isn't hammering a shared resource. I'm talking
> >>> about 100 to 800 synchronizations on a shared object per second for a
> >>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
> >>> cause a complete collapse of concurrency.
> >>
> >> But either it does or your entire problem analysis is wrong.
> >>
> >>> My older 4 core Mac Xeon can have 64 threads call getProperty(String)
> >>> on a shared Properties instance 2 million times each in only 21 real
> >>> seconds. That's one call every 164 ns. It's not as good as
> >>> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
> >>
> >> That is a call per clock cycle.
> >
> > HotSpot has some (benchmark-driven?) optimizations for this case. It's
> > hard not to hit them in simple tests on String and
> > ConcurrentHashMap.
>
> What exactly do you mean by that? I can't shake the impression that
> you are doing the second step (micro-optimization with JVM internals
> in mind) before the first (proper design and implementation).
>
> >> ?!?!
> >>
> >>> Many of the basic Sun Java classes are synchronized.
> >>
> >> Practically only old ones that you should not be using anymore
> >> anyway.
> >
> > Properties is a biggie. A brute-force replacement of Properties caused
> > the system throughput to collapse to almost nothing in Spring's
> > ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
> > next test is to disable hyperthreading.
>
> As someone else (Lew?) pointed out, it's a bad idea to always go to
> System.properties. You should instead read them at startup and
> initialize some other data structure - if only to avoid repeating the
> same checks on input values over and over again.
>
> Cheers
>
> robert

The properties aren't immutable. The best feature of properties over
hard-coded values is being able to update them in an emergency without
server restarts. Anyway, that was fixed by overriding every method in
Properties with a high-concurrency implementation. Too bad Properties
isn't an interface.
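An override along those lines might look like this sketch (hypothetical,
not the actual fix): the hot accessors are redirected to a
ConcurrentHashMap so reads never reach the synchronized Hashtable
storage that Properties inherits, while writes still work for emergency
updates.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a high-concurrency Properties: the hot
 *  accessors bypass the synchronized Hashtable storage that
 *  java.util.Properties inherits. A complete replacement must override
 *  every accessor (and honor the defaults chain), not just these three. */
public class ConcurrentProperties extends Properties {
    private final ConcurrentMap<Object, Object> store =
            new ConcurrentHashMap<Object, Object>();

    @Override
    public String getProperty(String key) {
        Object v = store.get(key);   // ConcurrentHashMap.get, not Hashtable.get
        return (v instanceof String) ? (String) v : null;
    }

    @Override
    public Object setProperty(String key, String value) {
        return store.put(key, value); // still mutable: emergency updates work
    }

    @Override
    public Object get(Object key) {
        return store.get(key);
    }
}
```

Since Properties is a class rather than an interface, every inherited
Hashtable method left unoverridden still hits the old synchronized path.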

Fixing every single shared synchronized method in every 3rd party
library could take a very, very long time.

Today's test had hyperthreading turned off. The performance drop-off
wasn't a fatal collapse like before but it was still bad. The backlog
came out of nowhere, cycled through several points of code, then went
away. It's starting to sound like a HotSpot problem. Argh. We left
Java 1.5.0_16 because of GC stalling. We left 1.5.0_21 because it
unrolled loops incorrectly. Java 1.6.0_17 optimized away monitorexit,
which is amusing but quite fatal. We're on 1.6.0_20 now so it may be
time to ask Sun/Oracle for help.
--
I won't see Google Groups replies because I must filter them as spam