From: "Peter "Firefly" Lund" on 14 Jan 2007 23:28 On Sun, 14 Jan 2007, Joe Seigh wrote: > Conventional locks scale really well under low contention, i.e. only > a single thread attempting to get the lock. I don't understand unless > they're using a different definition of scalability here. They are talking about the effort from the programmer necessary to understand what's going on. When the programs grow and the number of locks grow, the number of people who understand the locks falls drastically (what Larry McVoy called "the locking cliff"). In that sense, transactional memory is much, much more scalable. -Peter
From: "Peter "Firefly" Lund" on 14 Jan 2007 23:29 On Sun, 14 Jan 2007, Joe Seigh wrote: > Conventional locks scale really well under low contention, i.e. only > a single thread attempting to get the lock. I don't understand unless > they're using a different definition of scalability here. Oh, AFAIR, the benchmark apps (in Haskell) used by Simon Peyton-Jones et al actually performed slightly better when implemented in terms of transactional memory than in terms of "raw" locks. -Peter
From: Nick Maclaren on 15 Jan 2007 03:48

In article <M1Cqh.66307$QY6.46542(a)fe1.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk> writes:
|>
|> Besides, the fact that it's an "optimistic" model is a weakness. The
|> worst-case performance of optimistic transactional memory is truly awful.
|>
|> (Note that non-optimistic transactional models are also perfectly possible.)

Given that the problem is mathematically intractable, and there is some damn good evidence that many real problems approach the mathematically hard cases, looking for a free lunch is the sign of the feeble minded.

The trouble with improving primitives is that, while it is easy, it typically gives only a semi-constant factor improvement. Almost any set of primitives can be built up from almost any other ones. It may be worth while to use a different set, but it rarely solves complexity problems.

Regards,
Nick Maclaren.
From: Nick Maclaren on 15 Jan 2007 03:52

In article <Pine.LNX.4.61.0701150526310.16197(a)ask.diku.dk>,
"Peter \"Firefly\" Lund" <firefly(a)diku.dk> writes:
|> On Sun, 14 Jan 2007, Joe Seigh wrote:
|>
|> > Conventional locks scale really well under low contention, i.e. only
|> > a single thread attempting to get the lock. I don't understand unless
|> > they're using a different definition of scalability here.
|>
|> They are talking about the effort from the programmer necessary to
|> understand what's going on. When the programs grow and the number of
|> locks grows, the number of people who understand the locks falls
|> drastically (what Larry McVoy called "the locking cliff").
|>
|> In that sense, transactional memory is much, much more scalable.

Why? In competently designed and written codes, the hard interactions are all at levels above the simple locking one. They are of the nature of "will this process complete for ALL input data in a plausible time?" or "are there any realistic failure scenarios that lead to livelock?"

I agree that it may be much, much easier to use, but I can't see that it will be more than slightly more scalable.

Regards,
Nick Maclaren.
From: Terje Mathisen on 15 Jan 2007 03:48
Joe Seigh wrote:
> Anne & Lynn Wheeler wrote:
>>
>> recent news URL on (hardware) transactional memory
>>
>> Getting Serious About Transactional Memory
>> http://www.hpcwire.com/hpc/1196095.html
>>
> From the article:
> "Like locks, transactional memory is a construct for concurrency control
> that enables access to data shared by multiple threads. But unlike locks
> it is an optimistic model. It assumes that in most cases only a single
> thread will be contending for a given data item."
>
> Conventional locks scale really well under low contention, i.e. only
> a single thread attempting to get the lock. I don't understand unless
> they're using a different definition of scalability here.

Joe, I agree.

What this article really describes is something that looks awfully close to Read/Copy/Update (RCU): you encapsulate all the needed updates into a standalone memory structure, then change the global data view from the previous to the new version with a single atomic update of a pointer. If someone else has done the same thing while you were updating the new block/item, then you'll have to retry the actual update operation.

If we take a look under the hood of an actual "optimistic TM" implementation, it pretty much _has_ to look a lot like RCU, right?

OTOH, doing the same in a high-contention environment ("Terascale computing") means that you have to fall back on message passing and queuing.

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
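[A minimal Haskell sketch of the pattern Terje describes: build a complete replacement structure off to the side, then publish it by swinging a single pointer. The Snapshot type and the function names are made up for illustration; the compare-and-swap retry is the one that base's atomicModifyIORef' performs internally.]

import Data.IORef

-- Stand-in for whatever immutable structure is published to readers.
type Snapshot = [Int]

-- Readers just follow the pointer; they always see a complete version.
currentView :: IORef Snapshot -> IO Snapshot
currentView = readIORef

-- Writers build the replacement off to the side, then publish it with one
-- atomic pointer update. atomicModifyIORef' does the compare-and-swap: if
-- another writer swung the pointer first, the swap is retried against the
-- newly published version -- the retry Terje describes.
publish :: IORef Snapshot -> (Snapshot -> Snapshot) -> IO ()
publish ref update = atomicModifyIORef' ref (\old -> (update old, ()))

main :: IO ()
main = do
  view <- newIORef [1, 2, 3]
  publish view (0 :)           -- the "update" here just prepends an element
  currentView view >>= print   -- [0,1,2,3]

[Real RCU also has to keep the old version alive until the last reader holding it is finished; in Haskell the garbage collector takes care of that, which is why the sketch can skip grace periods.]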