From: Robert Klemme on 2 Jun 2010 01:36

On 02.06.2010 06:57, Mike Schilling wrote:
> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
> news:4c059872$0$272$14726298(a)news.sunsite.dk...
>> On 01-06-2010 00:21, Kevin McMurtrie wrote:
>>> I've been assisting in load testing some new high performance servers
>>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux
>>> is suspending threads for time-slicing in very unfortunate locations.
>>
>> That should not come as a surprise.
>>
>> The thread scheduler does not examine the code for convenience.
>>
>> Correct code must work no matter when the switch in and out of
>> the CPU happens.
>>
>> High performance code must work efficiently no matter when the
>> switch in and out of the CPU happens.
>>
>>> For example, a thread might suspend in Hashtable.get(Object) after a
>>> call to getProperty(String) on the system properties. It's a
>>> synchronized global so a few hundred threads might pile up until the
>>> lock holder resumes. Odds are that those hundreds of threads won't
>>> finish before another one stops to time slice again. The performance
>>> hit has a ton of hysteresis so the server doesn't recover until it
>>> has a lower load than before the backlog started.
>>>
>>> The brute force fix is of course to eliminate calls to shared
>>> synchronized objects. All of the easy stuff has been done. Some
>>> operations aren't well suited to simple CAS. Bottlenecks that are
>>> part of well established Java APIs are time consuming to fix/avoid.
>>
>> High performance code needs to be designed not to synchronize
>> extensively.
>>
>> If the code does and there is a performance problem, then fix
>> the code.
>>
>> There are no miracles.
>
> Though giving a thread higher priority while it holds a shared lock
> isn't exactly rocket science; VMS did it back in the early 80s. JVMs
> could do a really nice job of this, noticing which monitors cause
> contention and how long they tend to be held. A shame they don't.

I can imagine that changing a thread's priority frequently causes
severe overhead because the OS scheduler has to adjust all the time.
Thread and process priorities are usually set once to indicate overall
processing priority - not to speed up certain operations. Also,
changing the priority does not guarantee anything - there could be
other threads with higher priority around. I don't think it's a viable
approach - especially if applied to fix broken code (or even a broken
design).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
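[The "fix the code" route Arne recommends usually means keeping hot reads
away from the one global lock. A rough illustration of what that can look
like - PropertyCache is an invented name, and the sketch assumes the
properties do not change after their first read:

    import java.util.concurrent.ConcurrentHashMap;

    /** Hypothetical read-once cache that keeps hot lookups away from the
     *  globally synchronized Hashtable behind System.getProperty(). */
    public final class PropertyCache {
        private static final ConcurrentHashMap<String, String> CACHE =
                new ConcurrentHashMap<String, String>();

        private PropertyCache() {}

        /** Touches the synchronized system Properties table at most once
         *  per key; later lookups hit only the lock-free map. */
        public static String get(final String key) {
            String value = CACHE.get(key);
            if (value == null) {
                value = System.getProperty(key);   // the contended call
                if (value != null) {
                    CACHE.putIfAbsent(key, value); // benign race: same value
                }
            }
            return value;
        }
    }

The trade-off is staleness: if a property is modified at runtime, this
cache never notices.]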
From: Kevin McMurtrie on 2 Jun 2010 01:45

In article <4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
Kevin McMurtrie <mcmurtrie(a)pixelmemory.us> wrote:

> I've been assisting in load testing some new high performance servers
> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
> suspending threads for time-slicing in very unfortunate locations. For
> example, a thread might suspend in Hashtable.get(Object) after a call
> to getProperty(String) on the system properties. It's a synchronized
> global so a few hundred threads might pile up until the lock holder
> resumes. Odds are that those hundreds of threads won't finish before
> another one stops to time slice again. The performance hit has a ton
> of hysteresis so the server doesn't recover until it has a lower load
> than before the backlog started.
>
> The brute force fix is of course to eliminate calls to shared
> synchronized objects. All of the easy stuff has been done. Some
> operations aren't well suited to simple CAS. Bottlenecks that are part
> of well established Java APIs are time consuming to fix/avoid.
>
> Is there JVM or Linux tuning that will change the behavior of thread
> time slicing or preemption? I checked the JDK 6 options page but
> didn't find anything that appears to be applicable.

To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations on a shared object per second, each
lasting 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't cause
a complete collapse of concurrency.

My older 4-core Mac Xeon can have 64 threads call getProperty(String)
on a shared Properties instance 2 million times each in only 21 real
seconds. That's one call every 164 ns. It's not as good as
ConcurrentHashMap (one per 0.30 ns) but it's no collapse.

Many of the basic Sun Java classes are synchronized. Eliminating all
shared synchronized objects without making a mess of 3rd party library
integration is no easy task.

Next up is looking at the Linux scheduler version and the HotSpot
spinlock timeout. Maybe the two don't mesh and a thread is very likely
to enter a semaphore right as its quantum runs out.

--
I won't see Google Groups replies because I must filter them as spam
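[For anyone who wants to try reproducing figures like Kevin's, a harness
along these lines measures the aggregate wall-clock cost per call of
getProperty(String) on one shared Properties instance. This is only a
sketch built from his description - the key name and output format are
invented, and it is not his actual test code:

    import java.util.Properties;
    import java.util.concurrent.CountDownLatch;

    public class SharedPropertiesBench {
        private static final int THREADS = 64;
        private static final int CALLS = 2000000;

        private static volatile int sink; // defeats dead-code elimination

        public static void main(String[] args) throws InterruptedException {
            final Properties shared = new Properties();
            shared.setProperty("some.key", "some.value");

            final CountDownLatch start = new CountDownLatch(1);
            final CountDownLatch done = new CountDownLatch(THREADS);

            for (int i = 0; i < THREADS; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        int local = 0;
                        try {
                            start.await();
                            for (int c = 0; c < CALLS; c++) {
                                // getProperty() synchronizes on the table
                                String v = shared.getProperty("some.key");
                                if (v != null) {
                                    local += v.length();
                                }
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            sink += local; // racy, but only a JIT sink
                            done.countDown();
                        }
                    }
                }).start();
            }

            long t0 = System.nanoTime();
            start.countDown();
            done.await();
            long elapsed = System.nanoTime() - t0;
            // Aggregate throughput per call, as in Kevin's 164 ns figure;
            // this is not the latency of an individual call.
            System.out.printf("%.1f s total, %d ns per call%n",
                    elapsed / 1e9, elapsed / ((long) THREADS * CALLS));
        }
    }

Absolute numbers will vary widely with hardware, JVM version and
scheduler behavior, which is exactly the effect under discussion.]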
From: Mike Schilling on 2 Jun 2010 02:02

"Robert Klemme" <shortcutter(a)googlemail.com> wrote in message
news:86m8vjF4ulU2(a)mid.individual.net...
> On 02.06.2010 06:57, Mike Schilling wrote:
>> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
>> news:4c059872$0$272$14726298(a)news.sunsite.dk...
>>> On 01-06-2010 00:21, Kevin McMurtrie wrote:
>>>> I've been assisting in load testing some new high performance
>>>> servers running Tomcat 6 and Java 1.6.0_20. It appears that the JVM
>>>> or Linux is suspending threads for time-slicing in very unfortunate
>>>> locations.
>>>
>>> That should not come as a surprise.
>>>
>>> The thread scheduler does not examine the code for convenience.
>>>
>>> Correct code must work no matter when the switch in and out of
>>> the CPU happens.
>>>
>>> High performance code must work efficiently no matter when the
>>> switch in and out of the CPU happens.
>>>
>>>> For example, a thread might suspend in Hashtable.get(Object) after a
>>>> call to getProperty(String) on the system properties. It's a
>>>> synchronized global so a few hundred threads might pile up until the
>>>> lock holder resumes. Odds are that those hundreds of threads won't
>>>> finish before another one stops to time slice again. The performance
>>>> hit has a ton of hysteresis so the server doesn't recover until it
>>>> has a lower load than before the backlog started.
>>>>
>>>> The brute force fix is of course to eliminate calls to shared
>>>> synchronized objects. All of the easy stuff has been done. Some
>>>> operations aren't well suited to simple CAS. Bottlenecks that are
>>>> part of well established Java APIs are time consuming to fix/avoid.
>>>
>>> High performance code needs to be designed not to synchronize
>>> extensively.
>>>
>>> If the code does and there is a performance problem, then fix
>>> the code.
>>>
>>> There are no miracles.
>>
>> Though giving a thread higher priority while it holds a shared lock
>> isn't exactly rocket science; VMS did it back in the early 80s. JVMs
>> could do a really nice job of this, noticing which monitors cause
>> contention and how long they tend to be held. A shame they don't.
>
> I can imagine that changing a thread's priority frequently causes
> severe overhead because the OS scheduler has to adjust all the time.
> Thread and process priorities are usually set once to indicate overall
> processing priority - not to speed up certain operations.

Not at all. In time-sharing systems, it's a common scheduling algorithm
to adjust the effective priority of a process dynamically, e.g.
processes that require user input get a boost above compute-bound ones,
to help keep response times low. As I said, I'm not inventing this: it
was state of the art about 30 years ago.
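[A hand-rolled approximation of the boost Mike describes is easy to write
down, which also makes its weakness visible: Thread.setPriority() is only
a hint to the OS scheduler, so unlike a VM- or kernel-level mechanism it
guarantees nothing. The class below (BoostingLock, an invented name) is
purely illustrative:

    import java.util.concurrent.locks.ReentrantLock;

    /** Sketch of a VMS-style priority boost done at application level:
     *  raise the holder's priority while it owns the lock, restore it
     *  afterwards. Priorities are advisory hints in Java, and the extra
     *  scheduler adjustments may cost more than they save - which is
     *  exactly the objection Robert raises below. */
    public final class BoostingLock {
        private final ReentrantLock lock = new ReentrantLock();

        public void withBoost(Runnable critical) {
            Thread self = Thread.currentThread();
            int original = self.getPriority();
            lock.lock();
            try {
                self.setPriority(Thread.MAX_PRIORITY); // boost while holding
                critical.run();
            } finally {
                self.setPriority(original);            // restore first
                lock.unlock();
            }
        }
    }
]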
From: Robert Klemme on 2 Jun 2010 02:18

On 02.06.2010 08:02, Mike Schilling wrote:
> "Robert Klemme" <shortcutter(a)googlemail.com> wrote in message
> news:86m8vjF4ulU2(a)mid.individual.net...
>> On 02.06.2010 06:57, Mike Schilling wrote:
>>> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
>>> news:4c059872$0$272$14726298(a)news.sunsite.dk...
>>>> On 01-06-2010 00:21, Kevin McMurtrie wrote:
>>>>> I've been assisting in load testing some new high performance
>>>>> servers running Tomcat 6 and Java 1.6.0_20. It appears that the JVM
>>>>> or Linux is suspending threads for time-slicing in very unfortunate
>>>>> locations.
>>>>
>>>> That should not come as a surprise.
>>>>
>>>> The thread scheduler does not examine the code for convenience.
>>>>
>>>> Correct code must work no matter when the switch in and out of
>>>> the CPU happens.
>>>>
>>>> High performance code must work efficiently no matter when the
>>>> switch in and out of the CPU happens.
>>>>
>>>>> For example, a thread might suspend in Hashtable.get(Object) after
>>>>> a call to getProperty(String) on the system properties. It's a
>>>>> synchronized global so a few hundred threads might pile up until
>>>>> the lock holder resumes. Odds are that those hundreds of threads
>>>>> won't finish before another one stops to time slice again. The
>>>>> performance hit has a ton of hysteresis so the server doesn't
>>>>> recover until it has a lower load than before the backlog started.
>>>>>
>>>>> The brute force fix is of course to eliminate calls to shared
>>>>> synchronized objects. All of the easy stuff has been done. Some
>>>>> operations aren't well suited to simple CAS. Bottlenecks that are
>>>>> part of well established Java APIs are time consuming to fix/avoid.
>>>>
>>>> High performance code needs to be designed not to synchronize
>>>> extensively.
>>>>
>>>> If the code does and there is a performance problem, then fix
>>>> the code.
>>>>
>>>> There are no miracles.
>>>
>>> Though giving a thread higher priority while it holds a shared lock
>>> isn't exactly rocket science; VMS did it back in the early 80s. JVMs
>>> could do a really nice job of this, noticing which monitors cause
>>> contention and how long they tend to be held. A shame they don't.
>>
>> I can imagine that changing a thread's priority frequently causes
>> severe overhead because the OS scheduler has to adjust all the time.
>> Thread and process priorities are usually set once to indicate overall
>> processing priority - not to speed up certain operations.
>
> Not at all. In time-sharing systems, it's a common scheduling algorithm
> to adjust the effective priority of a process dynamically, e.g.
> processes that require user input get a boost above compute-bound ones,
> to help keep response times low. As I said, I'm not inventing this: it
> was state of the art about 30 years ago.

That's true, but in those cases it's the OS that does it - not the JVM.
From the OS point of view the JVM is just another process, and I doubt
there is an interface for adjusting the automatic priority (which in a
way would defy "automatic"). The base priority, on the other hand, is
there to indicate the general priority of a thread / process, and I
still don't think it's a good idea to change it all the time. So either
the OS honors thread state (mutex, IO etc.) and adjusts the priority
accordingly, or it doesn't. Either way, I don't think it's the job of
the JVM.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From: Robert Klemme on 2 Jun 2010 02:29

On 02.06.2010 07:45, Kevin McMurtrie wrote:
> In article <4c048acd$0$22090$742ec2ed(a)news.sonic.net>,
> Kevin McMurtrie <mcmurtrie(a)pixelmemory.us> wrote:
>
>> I've been assisting in load testing some new high performance servers
>> running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux
>> is suspending threads for time-slicing in very unfortunate locations.
>> For example, a thread might suspend in Hashtable.get(Object) after a
>> call to getProperty(String) on the system properties. It's a
>> synchronized global so a few hundred threads might pile up until the
>> lock holder resumes. Odds are that those hundreds of threads won't
>> finish before another one stops to time slice again. The performance
>> hit has a ton of hysteresis so the server doesn't recover until it
>> has a lower load than before the backlog started.
>>
>> The brute force fix is of course to eliminate calls to shared
>> synchronized objects. All of the easy stuff has been done. Some
>> operations aren't well suited to simple CAS. Bottlenecks that are
>> part of well established Java APIs are time consuming to fix/avoid.
>>
>> Is there JVM or Linux tuning that will change the behavior of thread
>> time slicing or preemption? I checked the JDK 6 options page but
>> didn't find anything that appears to be applicable.
>
> To clarify a bit, this isn't hammering a shared resource. I'm talking
> about 100 to 800 synchronizations on a shared object per second, each
> lasting 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
> cause a complete collapse of concurrency.

It's the nature of locking issues. Up to a particular point things work
pretty well, and then locking delays explode because of the positive
feedback. If you have "a few hundred threads" accessing a single shared
lock with a frequency of 800 Hz, then you have a design issue - whether
you call it "hammering" or not. It's simply not scalable, and if it
doesn't break now, it will likely break with the next increase in load.

> My older 4-core Mac Xeon can have 64 threads call getProperty(String)
> on a shared Properties instance 2 million times each in only 21 real
> seconds. That's one call every 164 ns. It's not as good as
> ConcurrentHashMap (one per 0.30 ns) but it's no collapse.

Well, then stick with the old CPU. :-) Seriously: it's not uncommon
that moving to newer hardware with increased processing resources
uncovers issues like this.

> Many of the basic Sun Java classes are synchronized. Eliminating all
> shared synchronized objects without making a mess of 3rd party library
> integration is no easy task.

It would certainly help the discussion if you pointed out which exact
classes and methods you are referring to. I readily agree that Sun
initially got a few things wrong in the standard library (Vector),
which they partly fixed later. But I am not inclined to believe in a
massive (i.e. affecting many areas) concurrency problem in the standard
library. Where classes synchronize, they do it for good reasons - and
you simply need to limit the number of threads that try to access the
resource. A globally synchronized, frequently accessed resource in a
system with several hundred threads is a design problem - though not
necessarily in the implementation of the resource itself, but rather in
the way it is used.

> Next up is looking at the Linux scheduler version and the HotSpot
> spinlock timeout. Maybe the two don't mesh and a thread is very likely
> to enter a semaphore right as its quantum runs out.
Btw, as far as I can see you didn't yet disclose how you found out at
which points the threads are suspended. I'm still curious to learn how
you did it - it might be a valuable addition to my toolbox.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
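[Kevin doesn't say in this thread which tool he used. One conventional
JDK 6-era answer to Robert's question, besides taking periodic thread
dumps with jstack, is the java.lang.management API, which can report
blocked threads together with the monitor they wait on and its owner.
A minimal sampler, offered as a sketch rather than as Kevin's tooling:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    /** Periodically dumps all threads and reports the BLOCKED ones,
     *  the monitor they are waiting for, its current owner, and the
     *  top stack frame - enough to spot pile-ups like the one in
     *  Hashtable.get() described above. */
    public class ContentionSampler {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            while (true) {
                for (ThreadInfo info : mx.dumpAllThreads(true, false)) {
                    if (info.getThreadState() == Thread.State.BLOCKED) {
                        StackTraceElement[] stack = info.getStackTrace();
                        System.out.println(info.getThreadName()
                                + " blocked on " + info.getLockName()
                                + " held by " + info.getLockOwnerName()
                                + (stack.length > 0 ? " at " + stack[0] : ""));
                    }
                }
                Thread.sleep(500); // sample twice a second
            }
        }
    }

Run against a loaded server (e.g. attached via a small agent, or built
into a diagnostics servlet), the same few monitors showing up in every
sample is a strong hint of the kind of convoy Kevin describes.]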