From: Knute Johnson on
On 7/4/2010 5:31 AM, Robert Klemme wrote:
> On 04.07.2010 11:28, Christian wrote:
>> Am 02.07.2010 16:40, schrieb Robert Klemme:
>>>
>>> the recent thread "Serious concurrency problems on fast systems"
>>> inspired me to put together a small demo that shows how different ways
>>> of concurrency control affect execution. The example shares a dummy
>>> configuration with a single long value. Multiple threads access the
>>> shared resource read only and depending on test scenario a single thread
>>> updates it from time to time. Concurrency control is done in these ways:
>>>
>>> 1. Plain synchronized on a single shared resource.
>>>
>>> 2. Synchronized but with redundant storage and update via observer
>>> pattern.
>>>
>>> 3. Copy on write with an immutable object and an AtomicReference.
>>>
>>> You can download it here
>>>
>>> http://docs.google.com/leaf?id=0B7Q7WZzdIMlIMDI4ZDk0ZGItYzk1My00ZTc1LWJlYmQtNDYzNWNlNzA3YTJm&hl=en
>>>
>>
>> You could also try the (Reentrant)ReadWriteLock provided by Java ...
>> it might (I am not sure) be cheaper with lots of reads and few
>> writes than normal synchronization.
>
> That's an excellent idea:
>
> https://docs.google.com/leaf?id=0B7Q7WZzdIMlIZjAxZDg5YzYtNzE3YS00ZjAyLTgzMmQtMTExMmYwZjcwODAz&sort=name&layout=list&num=50
>
>
> However, while it is faster than plain old synchronized on the global
> resource, this confirms that centralized locking is an inferior approach
> in highly concurrent systems. :-)
>
> Kind regards
>
> robert
>

Patient "Doctor it hurts when I do that." Doctor "Well don't do that!"

--

Knute Johnson
email s/nospam/knute2010/

From: Kevin McMurtrie on
In article <89bd9hFs3nU1(a)mid.individual.net>,
Robert Klemme <shortcutter(a)googlemail.com> wrote:

> On 04.07.2010 11:28, Christian wrote:
> > Am 02.07.2010 16:40, schrieb Robert Klemme:
> >>
> >> the recent thread "Serious concurrency problems on fast systems"
> >> inspired me to put together a small demo that shows how different ways
> >> of concurrency control affect execution. The example shares a dummy
> >> configuration with a single long value. Multiple threads access the
> >> shared resource read only and depending on test scenario a single thread
> >> updates it from time to time. Concurrency control is done in these ways:
> >>
> >> 1. Plain synchronized on a single shared resource.
> >>
> >> 2. Synchronized but with redundant storage and update via observer
> >> pattern.
> >>
> >> 3. Copy on write with an immutable object and an AtomicReference.
> >>
> >> You can download it here
> >>
> >> http://docs.google.com/leaf?id=0B7Q7WZzdIMlIMDI4ZDk0ZGItYzk1My00ZTc1LWJlYmQtNDYzNWNlNzA3YTJm&hl=en
> >
> > You could also try the (Reentrant)ReadWriteLock provided by Java ...
> > it might (I am not sure) be cheaper with lots of reads and few
> > writes than normal synchronization.
>
> That's an excellent idea:
>
> https://docs.google.com/leaf?id=0B7Q7WZzdIMlIZjAxZDg5YzYtNzE3YS00ZjAyLTgzMmQtMTExMmYwZjcwODAz&sort=name&layout=list&num=50
>
> However, while it is faster than plain old synchronized on the global
> resource, this confirms that centralized locking is an inferior approach
> in highly concurrent systems. :-)
>
> Kind regards
>
> robert

ALL thread interaction on a multi-CPU system is slow. Try 10 threads
writing to adjacent memory in a shared array compared to 10 threads
writing to memory in their own array. Not only is that a baseline for
all concurrency techniques, but it can hinder performance in non-obvious
areas.
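
The adjacent-memory effect described above is false sharing: counters that
land on the same cache line force the cores to fight over that line even
though no counter is logically shared. A minimal sketch (class, constants,
and method names are my own, not from the benchmark described above):

```java
// False-sharing sketch: the counters are logically independent in both
// cases, but in the "shared" case they sit next to each other in one
// long[] (same cache lines), while in the "private" case each thread
// owns its own array. Correctness is identical; only throughput differs.
public class FalseSharingDemo {
    static final int THREADS = 4;
    static final int ITERATIONS = 5_000_000;

    // Returns elapsed nanoseconds for THREADS threads doing
    // ITERATIONS increments each.
    static long run(boolean shared) throws InterruptedException {
        long[] common = new long[THREADS];
        long[][] slots = new long[THREADS][];
        for (int i = 0; i < THREADS; i++)
            slots[i] = shared ? common : new long[THREADS];
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int n = 0; n < ITERATIONS; n++)
                    slots[id][id]++;     // no logical sharing either way
            });
        }
        long start = System.nanoTime();
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("adjacent counters: %d ms%n", run(true) / 1_000_000);
        System.out.printf("private counters:  %d ms%n", run(false) / 1_000_000);
    }
}
```

On most multi-core machines the adjacent case is noticeably slower, though
the exact ratio depends on cache-line size and core count.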

In my benchmarks, a 'synchronized' block has less overhead than all of
the Java 1.5 lock mechanisms. If you must alter a set of data in a
consistent state, it seems to be the way to go. Its downside is that a
thread can use up its CPU quanta while holding the lock and everything
else piles up for a while. Java 1.5 locks give you options for shared
read locks and fail-fast lock acquisition that may reduce blocking.
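
The shared-read and fail-fast options mentioned above map onto
ReentrantReadWriteLock and tryLock(). A minimal sketch (the Config class
and its field are my own illustration):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers share the read lock; a writer can fail fast with
// tryLock() instead of blocking behind a lock held through a quantum.
public class Config {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long value;

    public long read() {
        lock.readLock().lock();     // shared: readers don't block each other
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public boolean tryWrite(long v) {
        if (!lock.writeLock().tryLock())  // fail fast instead of piling up
            return false;
        try {
            value = v;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```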

For syncing a single primitive, the CAS operations in 'Atomic...'
classes are faster than a 'synchronized' block. Not super fast, because
of cache syncing, but much faster. CAS loops have a worst-case of
eating CPU time but they don't have a worst-case of waiting for a
suspended thread to release a lock.
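
A typical CAS loop over one of the 'Atomic...' classes looks like this
(the bounded counter is my own example, not from the benchmarks above):

```java
import java.util.concurrent.atomic.AtomicLong;

// CAS loop: retry until compareAndSet wins, spinning briefly on
// contention rather than blocking on a lock.
public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    // Adds delta unless that would exceed limit; returns the resulting
    // (or unchanged) value. Never holds a lock, so a suspended thread
    // can never block the others.
    public long addIfBelow(long delta, long limit) {
        while (true) {
            long cur = value.get();
            long next = cur + delta;
            if (next > limit)
                return cur;                      // give up without blocking
            if (value.compareAndSet(cur, next))  // lost the race? loop again
                return next;
        }
    }
}
```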

If 'wait' and 'notify' are needed without the locking of 'synchronized',
performance can be improved using LockSupport.park() and unpark(). I
haven't tested on Solaris or Linux yet, but wait/notify hit a bottleneck
in the Mac OS X kernel.
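
A minimal park/unpark handoff in that spirit (the Handoff class is my own
sketch; note that unpark() issues a permit, so the wakeup is not lost even
if it arrives before the corresponding park()):

```java
import java.util.concurrent.locks.LockSupport;

// Single-producer, single-consumer handoff without any monitor:
// the consumer parks itself until the producer publishes a message
// and unparks it.
public class Handoff {
    private volatile String message;
    private volatile Thread waiter;

    public String take() {
        waiter = Thread.currentThread();
        while (message == null)
            LockSupport.park();      // no lock held while waiting;
                                     // loop guards against spurious wakeups
        String m = message;
        message = null;
        return m;
    }

    public void put(String m) {
        message = m;                 // volatile write publishes the message
        LockSupport.unpark(waiter);  // no effect if waiter is still null
    }
}
```
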
--
I won't see Google Groups replies because I must filter them as spam
From: Robert Klemme on
On 04.07.2010 18:47, Kevin McMurtrie wrote:
> In article<89bd9hFs3nU1(a)mid.individual.net>,
> Robert Klemme<shortcutter(a)googlemail.com> wrote:
>
>> On 04.07.2010 11:28, Christian wrote:
>>> Am 02.07.2010 16:40, schrieb Robert Klemme:
>>>>
>>>> the recent thread "Serious concurrency problems on fast systems"
>>>> inspired me to put together a small demo that shows how different ways
>>>> of concurrency control affect execution. The example shares a dummy
>>>> configuration with a single long value. Multiple threads access the
>>>> shared resource read only and depending on test scenario a single thread
>>>> updates it from time to time. Concurrency control is done in these ways:
>>>>
>>>> 1. Plain synchronized on a single shared resource.
>>>>
>>>> 2. Synchronized but with redundant storage and update via observer
>>>> pattern.
>>>>
>>>> 3. Copy on write with an immutable object and an AtomicReference.
>>>>
>>>> You can download it here
>>>>
>>>> http://docs.google.com/leaf?id=0B7Q7WZzdIMlIMDI4ZDk0ZGItYzk1My00ZTc1LWJlYmQtNDYzNWNlNzA3YTJm&hl=en
>>>
>>> You could also try the (Reentrant)ReadWriteLock provided by Java ...
>>> it might (I am not sure) be cheaper with lots of reads and few
>>> writes than normal synchronization.
>>
>> That's an excellent idea:
>>
>> https://docs.google.com/leaf?id=0B7Q7WZzdIMlIZjAxZDg5YzYtNzE3YS00ZjAyLTgzMmQtMTExMmYwZjcwODAz&sort=name&layout=list&num=50
>>
>> However, while it is faster than plain old synchronized on the global
>> resource, this confirms that centralized locking is an inferior approach
>> in highly concurrent systems. :-)
>
> ALL thread interaction on a multi-CPU system is slow. Try 10 threads
> writing to adjacent memory in a shared array compared to 10 threads
> writing to memory in their own array. Not only is that a baseline for
> all concurrency techniques, but it can hinder performance in non-obvious
> areas.
>
> In my benchmarks, a 'synchronized' block has less overhead than all of
> the Java 1.5 lock mechanisms. If you must alter a set of data in a
> consistent state, it seems to be the way to go.

The demo code shows that although you can get pretty fast with
synchronized, there are faster alternatives. Especially the variant with
an AtomicReference (or a plain volatile) is dramatically faster. What to
choose depends on the situation; synchronized is not the one-size-fits-all
solution to concurrency issues.
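
Variant 3 can be sketched roughly like this (a minimal reconstruction
under my own names, not the actual demo code): readers grab the current
immutable snapshot without any lock, and the single writer swaps in a
fresh copy.

```java
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write configuration: the state lives in an immutable
// snapshot, and an AtomicReference points at the current one.
public final class ConfigHolder {
    static final class Snapshot {
        final long value;
        Snapshot(long value) { this.value = value; }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0L));

    // Readers pay only a volatile read, no lock at all.
    public long read() {
        return current.get().value;
    }

    // The single updater replaces the whole snapshot atomically.
    public void update(long v) {
        current.set(new Snapshot(v));
    }
}
```

Readers that need several fields in a consistent state read them off one
Snapshot reference, which is what makes the lock-free read safe.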

> Its downside is that a
> thread can use up its CPU quanta while holding the lock and everything
> else piles up for a while. Java 1.5 locks give you options for shared
> read locks and fail-fast lock acquisition that may reduce blocking.

Exactly.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From: Knute Johnson on
On 7/5/2010 2:36 PM, Robert Klemme wrote:
>> ALL thread interaction on a multi-CPU system is slow. Try 10 threads
>> writing to adjacent memory in a shared array compared to 10 threads
>> writing to memory in their own array. Not only is that a baseline for
>> all concurrency techniques, but it can hinder performance in non-obvious
>> areas.
>>
>> In my benchmarks, a 'synchronized' block has less overhead than all of
>> the Java 1.5 lock mechanisms. If you must alter a set of data in a
>> consistent state, it seems to be the way to go.
>
> The demo code shows that although you can get pretty fast with
> synchronized, there are faster alternatives. Especially the variant with
> an AtomicReference (or a plain volatile) is dramatically faster. What to
> choose depends on the situation; synchronized is not the one-size-fits-all
> solution to concurrency issues.
>
>> Its downside is that a
>> thread can use up its CPU quanta while holding the lock and everything
>> else piles up for a while. Java 1.5 locks give you options for shared
>> read locks and fail-fast lock acquisition that may reduce blocking.
>
> Exactly.
>
> Cheers
>
> robert
>

Just as there is a diminishing return to more threads.

--

Knute Johnson
email s/nospam/knute2010/

From: Lew on
Kevin McMurtrie wrote:
>>> ALL thread interaction on a multi-CPU system is slow. Try 10 threads
>>> writing to adjacent memory in a shared array compared to 10 threads
>>> writing to memory in their own array. Not only is that a baseline for
>>> all concurrency techniques, but it can hinder performance in non-obvious
>>> areas.
>>>
>>> In my benchmarks, a 'synchronized' block has less overhead than all of
>>> the Java 1.5 lock mechanisms. If you must alter a set of data in a
>>> consistent state, it seems to be the way to go.

Robert Klemme wrote:
>> The demo code shows that although you can get pretty fast with
>> synchronized, there are faster alternatives. Especially the variant with
>> an AtomicReference (or a plain volatile) is dramatically faster. What to
>> choose depends on the situation; synchronized is not the one-size-fits-all
>> solution to concurrency issues.

Kevin McMurtrie wrote:
>>> Its downside is that a
>>> thread can use up its CPU quanta while holding the lock and everything
>>> else piles up for a while. Java 1.5 locks give you options for shared
>>> read locks and fail-fast lock acquisition that may reduce blocking.

Knute Johnson wrote:
> Just as there is a diminishing return to more threads.

Or more CPUs on a board, or more nodes in a cluster, or more programmers on a
team, ...

--
Lew