From: Kevin McMurtrie on
In article <l6udnc0kra5W-pvRnZ2dnUVZ_qadnZ2d(a)earthlink.com>,
Patricia Shanahan <pats(a)acm.org> wrote:

> Kevin McMurtrie wrote:
> ...
> > To clarify a bit, this isn't hammering a shared resource. I'm talking
> > about 100 to 800 synchronizations on a shared object per second for a
> > duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
> > cause a complete collapse of concurrency.
> >
> ...
>
> Have you considered other possibilities, such as memory thrashing? The
> resource does not seem heavily enough used for contention to be a big
> issue, but it is about the sort of access rate that is low enough to
> allow a page to be swapped out, but high enough for the time waiting for
> it to matter.
>
> Patricia

It happened again today during testing of a different server class on
the same OS and hardware. This time it was under a microscope. There
were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
contention, and total CPU was around 25%. There was no gridlock
effect - it always involved a single synchronized method that did not
depend on other resources to complete. Throughput dropped to ~250 calls
per second at a specific method for several seconds, then recovered.
Then it happened again elsewhere, then recovered. After several minutes
the server was at top speed again. We then pushed traffic until its
1 Gbps Ethernet link was saturated, and there wasn't a trace of the
thread contention returning.
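
For anyone who wants to poke at this, the shape of the scenario is easy
to mock up. A toy version follows (thread count and class names made
up, not our server code) - whether it reproduces the stall will depend
heavily on the OS and scheduler:

  import java.util.concurrent.atomic.AtomicLong;

  // Toy probe: a few hundred threads hammer one synchronized method that
  // holds its lock only long enough to bump a counter, while the main
  // thread reports the aggregate call rate once per second.
  public class ContentionProbe {
      private static final AtomicLong calls = new AtomicLong();

      static synchronized void criticalSection() {
          calls.incrementAndGet();  // lock held for nanoseconds
      }

      public static void main(String[] args) throws InterruptedException {
          for (int i = 0; i < 200; i++) {  // "a few hundred threads"
              Thread t = new Thread(new Runnable() {
                  public void run() {
                      while (true) criticalSection();
                  }
              });
              t.setDaemon(true);
              t.start();
          }
          long prev = 0;
          while (true) {
              Thread.sleep(1000);
              long now = calls.get();
              System.out.println((now - prev) + " calls/sec");
              prev = now;
          }
      }
  }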
--
I won't see Google Groups replies because I must filter them as spam
From: Robert Klemme on
On 09.06.2010 07:13, Kevin McMurtrie wrote:
> In article<l6udnc0kra5W-pvRnZ2dnUVZ_qadnZ2d(a)earthlink.com>,
> Patricia Shanahan<pats(a)acm.org> wrote:
>
>> Kevin McMurtrie wrote:
>> ...
>>> To clarify a bit, this isn't hammering a shared resource. I'm talking
>>> about 100 to 800 synchronizations on a shared object per second for a
>>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
>>> cause a complete collapse of concurrency.
>>>
>> ...
>>
>> Have you considered other possibilities, such as memory thrashing? The
>> resource does not seem heavily enough used for contention to be a big
>> issue, but it is about the sort of access rate that is low enough to
>> allow a page to be swapped out, but high enough for the time waiting for
>> it to matter.
>>
>> Patricia
>
> It happened again today during testing of a different server class on
> the same OS and hardware. This time it was under a microscope. There
> were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
> contention, and total CPU was around 25%. There was no gridlock
> effect - it always involved a single synchronized method that did not
> depend on other resources to complete. Throughput dropped to ~250 calls
> per second at a specific method for several seconds, then recovered.
> Then it happened again elsewhere, then recovered. After several minutes
> the server was at top speed again. We then pushed traffic until its
> 1 Gbps Ethernet link was saturated, and there wasn't a trace of the
> thread contention returning.

Did you scrutinize the GC logs? That is something I would definitely
look into. Other than that, it's difficult to offer concrete
suggestions with such a general problem description.
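
In case it's useful, on a 1.6 VM I would start with something along
these lines (the exact flag set varies per installation, adjust to
taste):

  java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
       -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.log ...

-XX:+PrintGCApplicationStoppedTime is handy here because it reports the
time application threads were stopped, which can reveal pauses beyond
the collections themselves.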

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From: Robert Klemme on
On 09.06.2010 05:09, Mike Schilling wrote:
>
>
> "Robert Klemme" <shortcutter(a)googlemail.com> wrote in message
> news:877quaFr6gU1(a)mid.individual.net...
>> On 08.06.2010 05:39, Kevin McMurtrie wrote:
>
>>
>>> Fixing every single shared synchronized method in every 3rd party
>>> library could take a very, very long time.
>>
>> I have no idea where you got that from. Nobody suggested fixing third
>> party libraries - if anything the suggestion was to use them properly.
>
> What if they use system properties promiscuously? Hypothetically:
>
> 1. My application receives XML messages.
> 2. I use a third-party library to deserialize the XML into Java objects.
> 3. The third-party library uses JAXP to find an XML parser.
> 4. JAXP always checks for a system property that points to the parser's
> class name.
>
> Even if the details are off (I don't know whether current versions of
> JAXP cache the class name), you get the idea.

In that case I would check whether the lib was being used properly,
and if not, then indeed the lib would need fixing. Alternatively you
would have to replace it with something else (or a newer version, but
IIRC JAXP is part of the JDK nowadays).
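
If the JAXP lookup itself turned out to be the hot spot, caching the
resolved parser per thread would keep the system-property check (a
synchronized Hashtable read) off the per-message path. A minimal sketch
of the idea - the class name and structure are made up, not from any
particular lib:

  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.parsers.ParserConfigurationException;

  // Hypothetical wrapper: DocumentBuilderFactory.newInstance() consults
  // the system property "javax.xml.parsers.DocumentBuilderFactory"
  // first, so resolve it once per thread instead of once per message.
  public class ParserCache {
      private static final ThreadLocal<DocumentBuilder> BUILDER =
          new ThreadLocal<DocumentBuilder>() {
              @Override protected DocumentBuilder initialValue() {
                  try {
                      return DocumentBuilderFactory.newInstance()
                              .newDocumentBuilder();
                  } catch (ParserConfigurationException e) {
                      throw new IllegalStateException(e);
                  }
              }
          };

      public static DocumentBuilder get() {
          DocumentBuilder b = BUILDER.get();
          b.reset();  // reusable but not thread-safe; reset between uses
          return b;
      }
  }

Of course that only helps code you control; a lib that performs the
JAXP lookup internally on every message would still pay for it.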

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From: Kevin McMurtrie on
In article <86mc28Fn90U1(a)mid.individual.net>,
Robert Klemme <shortcutter(a)googlemail.com> wrote:

> On 02.06.2010 07:45, Kevin McMurtrie wrote:
> > In article<4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
> > Kevin McMurtrie<mcmurtrie(a)pixelmemory.us> wrote:
> >
> >> I've been assisting in load testing some new high performance servers
> >> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
> >> suspending threads for time-slicing in very unfortunate locations. For
> >> example, a thread might suspend in Hashtable.get(Object) after a call to
> >> getProperty(String) on the system properties. It's a synchronized
> >> global so a few hundred threads might pile up until the lock holder
> >> resumes. Odds are that those hundreds of threads won't finish before
> >> another one stops to time slice again. The performance hit has a ton of
> >> hysteresis so the server doesn't recover until it has a lower load than
> >> before the backlog started.
> >>
> >> The brute force fix is of course to eliminate calls to shared
> >> synchronized objects. All of the easy stuff has been done. Some
> >> operations aren't well suited to simple CAS. Bottlenecks that are part
> >> of well established Java APIs are time consuming to fix/avoid.
> >>
> >> Is there JVM or Linux tuning that will change the behavior of thread
> >> time slicing or preemption? I checked the JDK 6 options page but didn't
> >> find anything that appears to be applicable.
> >
> > To clarify a bit, this isn't hammering a shared resource. I'm talking
> > about 100 to 800 synchronizations on a shared object per second for a
> > duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
> > cause a complete collapse of concurrency.
>
> It's the nature of locking issues. Up to a particular point it works
> pretty well and then locking delays explode because of the positive
> feedback.
>
> If you have "a few hundred threads" accessing a single shared lock at
> a frequency of 800 Hz then you have a design issue - whether you call
> it "hammering" or not. It's simply not scalable, and if it doesn't
> break now it will likely break with the next increase in load.
>
> > My older 4-core Mac Xeon can have 64 threads call getProperty(String)
> > on a shared Properties instance 2 million times each in only 21 real
> > seconds. That's one call every 164 ns. It's not as good as
> > ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>
> Well, then stick with the old CPU. :-) It's not uncommon that moving to
> newer hardware with increased processing resources uncovers issues like
> this.
>
> > Many of the basic Sun Java classes are synchronized. Eliminating all
> > shared synchronized objects without making a mess of 3rd party library
> > integration is no easy task.
>
> It would certainly help the discussion if you pointed out which exact
> classes and methods you are referring to. I would readily agree that
> Sun did a few things wrong initially in the std lib (Vector) which they
> partly fixed later. But I am not inclined to believe in a massive (i.e.
> affecting many areas) concurrency problem in the std lib.
>
> If they synchronize they do it for good reasons - and you simply need to
> limit the number of threads that try to access a resource. A globally
> synchronized, frequently accessed resource in a system with several
> hundred threads is a design problem - but not necessarily in the
> implementation of the resource used but rather in the usage.
>
> > Next up is looking at the Linux scheduler version and the HotSpot
> > spinlock timeout. Maybe the two don't mesh and a thread is very likely
> to enter a semaphore right as its quantum runs out.
>
> Btw, as far as I can see you haven't yet disclosed how you found out
> where the threads get suspended. I'm still curious to learn how you
> did it. Might be a valuable addition to my toolbox.
>
> Kind regards
>
> robert

I have tools based on java.lang.management that will trace thread
contention. Thread dumps from QUIT signals show it too. The threads
aren't permanently stuck; they're just passing through 100,000 times
slower than normal.
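
Roughly what those tools boil down to, minus the bookkeeping - a
stripped-down sketch using the standard ThreadMXBean API, not the
actual in-house code:

  import java.lang.management.ManagementFactory;
  import java.lang.management.ThreadInfo;
  import java.lang.management.ThreadMXBean;

  // Turn on monitor-contention bookkeeping, then report which threads
  // have been blocking, for how long, and on whose lock.
  public class ContentionTracer {
      public static void dump() {
          ThreadMXBean mx = ManagementFactory.getThreadMXBean();
          if (mx.isThreadContentionMonitoringSupported()
                  && !mx.isThreadContentionMonitoringEnabled()) {
              mx.setThreadContentionMonitoringEnabled(true);
          }
          for (ThreadInfo ti : mx.getThreadInfo(mx.getAllThreadIds())) {
              if (ti != null && ti.getBlockedCount() > 0) {
                  System.out.println(ti.getThreadName() + ": blocked "
                      + ti.getBlockedCount() + " times, "
                      + ti.getBlockedTime() + " ms total, currently on "
                      + ti.getLockName() + " held by "
                      + ti.getLockOwnerName());
              }
          }
      }
  }

Note that getBlockedTime() only accumulates while contention monitoring
is enabled (it returns -1 otherwise), so the first dump is just a
baseline.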

The problem with staying on the old system is that Oracle bought Sun
and some unpleasant changes are coming. Mac OS X is only suited for
development machines.

Problem areas:

java.util.Properties - Removed from in-house code but still used
everywhere else for everything. Used heavily by Sun and 3rd-party code.
Only performs poorly on Linux (see the sketch at the end of this list).

org.springframework.context.support.ReloadableResourceBundleMessageSource
- Single-threaded methods down in the bowels of Spring. Only performs
poorly on Linux.

Log4J - Always sucks and needs to be replaced. In the meantime,
removing logging calls except when critical.

Pools, caches, and resource managers - In-house code that is expected to
run 100 - 300 times per second. Has no dependencies during
synchronization. Has been carefully tuned to be capable of millions of
calls per second on 2-, 4-, and 8-core hardware. It only stalls on
high-end Linux boxes.
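
As referenced above, the general idea behind taking java.util.Properties
off the hot path - an illustrative sketch, not our exact code: snapshot
the system properties into a lock-free map and refresh it rarely. It
only helps code we control; 3rd-party calls to System.getProperty still
take the Hashtable lock.

  import java.util.Map;
  import java.util.Properties;
  import java.util.concurrent.ConcurrentHashMap;

  // Readers hit a lock-free ConcurrentHashMap; the synchronized
  // Hashtable inside System.getProperties() is only touched during a
  // refresh. Writes via System.setProperty are only visible after the
  // next refresh.
  public final class FastProperties {
      private static volatile Map<String, String> snapshot = takeSnapshot();

      private static Map<String, String> takeSnapshot() {
          Map<String, String> m = new ConcurrentHashMap<String, String>();
          Properties p = System.getProperties();
          synchronized (p) {  // consistent copy while others may write
              for (Map.Entry<Object, Object> e : p.entrySet()) {
                  m.put(String.valueOf(e.getKey()),
                        String.valueOf(e.getValue()));
              }
          }
          return m;
      }

      public static String get(String key) {
          return snapshot.get(key);  // no monitor acquired
      }

      public static void refresh() {  // e.g. from a background timer
          snapshot = takeSnapshot();
      }
  }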
--
I won't see Google Groups replies because I must filter them as spam
From: Lew on
Kevin McMurtrie wrote:
> The problem with staying with on the old system is that Oracle bought
> Sun and some unpleasant changes are coming.

Oh?

--
Lew