From: Tom Anderson on
On Tue, 8 Jun 2010, Kevin McMurtrie wrote:

> The problem with staying on the old system is that Oracle bought
> Sun and some unpleasant changes are coming. MacOS X is only suited for
> development machines.

BSD?

tom

--
On Question Time last night, Tony Benn was saying that the way to solve
the low turnout at elections was to make voting compulsory. I think the
solution is for someone to start a political party that doesn't contain
wall-to-wall bastards. -- John Rowland
From: Tom Anderson on
On Tue, 8 Jun 2010, Robert Klemme wrote:

> On 08.06.2010 05:39, Kevin McMurtrie wrote:
>> In article<874m08Fib7U1(a)mid.individual.net>,
>> Robert Klemme<shortcutter(a)googlemail.com> wrote:
>>
>>> On 07.06.2010 08:25, Kevin McMurtrie wrote:
>>>
>>>> Properties is a biggie. A brute-force replacement of Properties caused
>>>> the system throughput to collapse to almost nothing in Spring's
>>>> ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
>>>> next test is to disable hyperthreading.
>>>
>>> As someone else (Lew?) pointed out, it's a bad idea to always go to
>>> System.properties. You should rather evaluate them on startup and
>>> initialize some other data structure - if only to avoid checking the
>>> input values over and over again.
>>
>> The properties aren't immutable. The best feature of properties over
>> hard-coded values is being able to update them in an emergency
>> without server restarts. Anyway, that was fixed by overriding every
>> method in Properties with a high-concurrency implementation. Too bad
>> Properties isn't an interface.
>
> Well, then use an immutable Hash map as Lew suggested and store it via
> AtomicReference.

Or even a volatile variable. If you're not doing CAS, there's no advantage
to using an AtomicReference over a volatile. Mind you, there shouldn't be
any disadvantage either.
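
For illustration, here's a minimal sketch of what that could look like:
an immutable snapshot published through a plain volatile field. The
class and method names are invented, and it assumes the properties can
be copied wholesale whenever they are updated.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigSnapshot {
    // A volatile reference gives safe publication as long as every
    // update is a full replacement of the map, never a mutation.
    private volatile Map<String, String> snapshot = Collections.emptyMap();

    // Called from the emergency-update path; rebuilds the snapshot.
    public void reload(Properties source) {
        Map<String, String> copy = new HashMap<String, String>();
        for (String name : source.stringPropertyNames()) {
            copy.put(name, source.getProperty(name));
        }
        snapshot = Collections.unmodifiableMap(copy);
    }

    // Readers never take a lock; they just dereference the field.
    public String get(String key) {
        return snapshot.get(key);
    }
}

Swapping the volatile field for an AtomicReference<Map<String, String>>
changes nothing for the readers; it only starts to matter if you later
need compareAndSet on the reference.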

tom

--
On Question Time last night, Tony Benn was saying that the way to solve
the low turnout at elections was to make voting compulsory. I think the
solution is for someone to start a political party that doesn't contain
wall-to-wall bastards. -- John Rowland
From: Robert Klemme on
On 09.06.2010 08:06, Kevin McMurtrie wrote:
> In article<86mc28Fn90U1(a)mid.individual.net>,
> Robert Klemme<shortcutter(a)googlemail.com> wrote:
>
>> On 02.06.2010 07:45, Kevin McMurtrie wrote:
>>> In article<4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
>>> Kevin McMurtrie<mcmurtrie(a)pixelmemory.us> wrote:
>>>
>>>> I've been assisting in load testing some new high performance servers
>>>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
>>>> suspending threads for time-slicing in very unfortunate locations. For
>>>> example, a thread might suspend in Hashtable.get(Object) after a call to
>>>> getProperty(String) on the system properties. It's a synchronized
>>>> global so a few hundred threads might pile up until the lock holder
>>>> resumes. Odds are that those hundreds of threads won't finish before
>>>> another one stops to time slice again. The performance hit has a ton of
>>>> hysteresis so the server doesn't recover until it has a lower load than
>>>> before the backlog started.
>>>>
>>>> The brute force fix is of course to eliminate calls to shared
>>>> synchronized objects. All of the easy stuff has been done. Some
>>>> operations aren't well suited to simple CAS. Bottlenecks that are part
>>>> of well established Java APIs are time consuming to fix/avoid.
>>>>
>>>> Is there JVM or Linux tuning that will change the behavior of thread
>>>> time slicing or preemption? I checked the JDK 6 options page but didn't
>>>> find anything that appears to be applicable.
>>>
>>> To clarify a bit, this isn't hammering a shared resource. I'm talking
>>> about 100 to 800 synchronizations on a shared object per second for a
>>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
>>> cause a complete collapse of concurrency.
>>
>> It's the nature of locking issues. Up to a particular point it works
>> pretty well and then locking delays explode because of the positive
>> feedback.
>>
>> If you have "a few hundred threads" accessing a single shared lock at
>> a frequency of 800Hz then you have a design issue - whether you call
>> it "hammering" or not. It's simply not scalable, and if it doesn't
>> break now it will likely break with the next increase in load.
>>
>>> My older 4-core Mac Xeon can have 64 threads call getProperty(String)
>>> on a shared Property instance 2 million times each in only 21 real
>>> seconds. That's one call every 164 ns. It's not as good as
>>> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
>>
>> Well, then stick with the old CPU. :-) It's not uncommon that moving to
>> newer hardware with increased processing resources uncovers issues like
>> this.
>>
>>> Many of the basic Sun Java classes are synchronized. Eliminating all
>>> shared synchronized objects without making a mess of 3rd party library
>>> integration is no easy task.
>>
>> It would certainly help the discussion if you pointed out which exact
>> classes and methods you are referring to. I would readily agree that
>> Sun did a few things wrong initially in the std lib (Vector) which they
>> partly fixed later. But I am not inclined to believe in a massive (i.e.
>> affecting many areas) concurrency problem in the std lib.
>>
>> If they synchronize they do it for good reasons - and you simply need to
>> limit the number of threads that try to access a resource. A globally
>> synchronized, frequently accessed resource in a system with several
>> hundred threads is a design problem - not necessarily in the
>> implementation of the resource itself, but rather in how it is used.
>>
>>> Next up is looking at the Linux scheduler version and the HotSpot
>>> spinlock timeout. Maybe the two don't mesh and a thread is very likely
>>> to enter a semaphore right as its quantum runs out.
>>
>> Btw, as far as I can see you haven't yet disclosed how you found out
>> where the thread is suspended. I'm still curious to learn how
>> you found out. Might be a valuable addition to my toolbox.

> I have tools based on java.lang.management that will trace thread
> contention.

Which tools?

> Thread dumps from QUIT signals show it too. The threads
> aren't permanently stuck; they're just passing through 100,000 times
> slower than normal.

I am not sure I understand how you found out with these tools that
threads are suspended "for time-slicing in very unfortunate locations".
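
Something along these lines, perhaps? This is only a guess at what such
a tool looks like - a rough sketch using ThreadMXBean's contention
monitoring - and it only reports blocked counts and times per thread,
not the point at which the scheduler preempted the lock holder.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ContentionDump {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
        Thread.sleep(10000L);   // let the load test run for a while
        ThreadInfo[] infos =
                mx.getThreadInfo(mx.getAllThreadIds(), Integer.MAX_VALUE);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue;       // thread exited in the meantime
            }
            System.out.printf("%s: blocked %d times, %d ms total, lock=%s%n",
                    info.getThreadName(), info.getBlockedCount(),
                    info.getBlockedTime(), info.getLockName());
        }
    }
}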

> The problem with staying on the old system is that Oracle bought
> Sun and some unpleasant changes are coming. MacOS X is only suited for
> development machines.

Which changes do you expect?

> Problem areas:
>
> java.util.Properties - Removed from in-house code but still everywhere
> else for everything. Used a lot by Sun and 3rd party code. Only
> performs poorly on Linux.

Even if not shared across threads?

> org.springframework.context.support.ReloadableResourceBundleMessageSource
> - Single-threaded methods down in the bowels of Spring. Only performs
> poorly on Linux.
>
> Log4J - Always sucks and needs to be replaced. In the meantime,
> removing logging calls except when critical.

Hm, so far we haven't had issues with Log4J unless it was used for
excessive logging (i.e. running production at DEBUG level, which is not
really its intended use). As long as you log into a single sink, any
logging solution used concurrently has good potential for contention. :-)
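
If excessive logging turns out to be part of the picture, the usual
first aid is to guard the calls with level checks so a disabled DEBUG
statement costs next to nothing; pushing the single sink behind log4j's
AsyncAppender is the other common mitigation. Just a generic log4j 1.2
sketch, nothing specific to your setup - the class here is invented:

import org.apache.log4j.Logger;

public class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class);

    public void process(String orderId) {
        // The level check keeps message construction (and the trip to
        // the appender) off the hot path unless DEBUG is really on.
        if (LOG.isDebugEnabled()) {
            LOG.debug("processing order " + orderId);
        }
        // ... actual work ...
    }
}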

> Pools, caches, and resource managers - In-house code that is expected to
> run 100 - 300 times per second. Has no dependencies during
> synchronization. Has been carefully tuned to be capable of millions of
> calls per second on 2-, 4-, and 8-core hardware. They only stall on
> high-end Linux boxes.

Since your high-end box has more cores (does it?) and is generally
faster, it will exhibit bottlenecks sooner via the cascading effect Lew
described earlier. Although I would readily concede that JVMs and the
Java standard library do have bugs, I am generally more inclined to
believe in a design-level solution. For example: if you have a global
connection pool and all threads share it, increasing the number of
threads will at some point lead to contention. In that case you might
have to group threads with a fixed maximum group size and have a pool
per group. We did a similar thing with ThreadPoolExecutor: we created
several ThreadPoolExecutors and at enqueue time used round robin to
pick an instance. This limits the number of threads competing for a
single queue's locks. Scheduling is done via
AtomicInteger.incrementAndGet().
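
Roughly like this - a stripped-down sketch of that scheme, with the
class name and pool sizes invented and shutdown/error handling left out:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardedExecutor {
    private final ExecutorService[] shards;
    private final AtomicInteger next = new AtomicInteger();

    public ShardedExecutor(int shardCount, int threadsPerShard) {
        shards = new ExecutorService[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = Executors.newFixedThreadPool(threadsPerShard);
        }
    }

    public void execute(Runnable task) {
        // Mask off the sign bit so the index stays valid after overflow.
        int index = (next.incrementAndGet() & Integer.MAX_VALUE)
                % shards.length;
        shards[index].execute(task);
    }
}

Every submitter still touches the single AtomicInteger, but that is a
non-blocking increment; the queue locks, which are what actually park
threads, are spread across the shards.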

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From: Arne Vajhøj on
On 09-06-2010 01:13, Kevin McMurtrie wrote:
> In article<l6udnc0kra5W-pvRnZ2dnUVZ_qadnZ2d(a)earthlink.com>,
> Patricia Shanahan<pats(a)acm.org> wrote:
>> Kevin McMurtrie wrote:
>> ...
>>> To clarify a bit, this isn't hammering a shared resource. I'm talking
>>> about 100 to 800 synchronizations on a shared object per second for a
>>> duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
>>> cause a complete collapse of concurrency.
>>>
>> ...
>>
>> Have you considered other possibilities, such as memory thrashing? The
>> resource does not seem heavily enough used for contention to be a big
>> issue, but it is about the sort of access rate that is low enough to
>> allow a page to be swapped out, but high enough for the time waiting for
>> it to matter.
>
> It happened today again during testing of a different server class on
> the same OS and hardware. This time it was under a microscope. There
> were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
> contention, and the total CPU was around 25%. There was no gridlock
> effect - it always involved one synchronized method that did not depend
> on other resources to complete. Throughput dropped to ~250 calls per
> second at a specific method for several seconds then it recovered. Then
> it happened again elsewhere, then recovered. After several minutes the
> server was at top speed again. We then pushed traffic until its 1Gbps
> Ethernet link saturated and there wasn't a trace of thread contention
> ever returning.

That periodic behavior points to something related to GC.

You could experiment with various -XX options affecting GC
to see whether they change the behavior. If they do, that
somewhat verifies that it is related to GC.

Another interesting thing would be to try another
JVM (from Sun, IBM, or BEA/Oracle).

Arne

From: Arne Vajhøj on
On 09-06-2010 02:06, Kevin McMurtrie wrote:
> In article<86mc28Fn90U1(a)mid.individual.net>,
> Robert Klemme<shortcutter(a)googlemail.com> wrote:
>> Well, then stick with the old CPU. :-) It's not uncommon that moving to
>> newer hardware with increased processing resources uncovers issues like
>> this.

> The problem with staying on the old system is that Oracle bought
> Sun and some unpleasant changes are coming. MacOS X is only suited for
> development machines.

AFAIK Oracle has not announced any unpleasant things,
and unless you happen to be Larry's neighbor and get some
inside tips over the fence, it sounds like rumors.

> Log4J - Always sucks and needs to be replaced. In the meantime,
> removing logging calls except when critical.

Many people use log4j in high volume apps.

Arne