From: Joe Seigh on 24 Oct 2006 08:32

Nick Maclaren wrote:
> In article <hvGdnXAAUoqlaqDYnZ2dnUVZ_rGdnZ2d(a)comcast.com>,
> Joe Seigh <jseigh_01(a)xemaps.com> writes:
> |> Transactional memory, aka magic at this point.
>
> Sigh. Indeed :-(
>
> Like 90% of the other problems with getting parallelism right in
> hardware, the problem is actually moving people to a coding paradigm
> where the problem is soluble. I don't see any problem with
> implementing that model for BSP, transactional databases etc.!
>
> But, to do it for C++ ....

They seem to think it will buy them composability which will solve
all concurrency problems. Like I said, magic.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
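For readers unfamiliar with the composability claim being dismissed
here: individually correct lock-based operations do not compose into
larger atomic operations, and that is the problem transactional memory
promises to remove. A minimal sketch of the problem in C with POSIX
threads (the account example and all names are illustrative additions,
not from the thread):

    #include <pthread.h>

    /* Each account is individually thread-safe. */
    typedef struct {
        pthread_mutex_t lock;   /* assume pthread_mutex_init'd */
        long balance;
    } account;

    void deposit(account *a, long amount) {
        pthread_mutex_lock(&a->lock);
        a->balance += amount;
        pthread_mutex_unlock(&a->lock);
    }

    void withdraw(account *a, long amount) {
        pthread_mutex_lock(&a->lock);
        a->balance -= amount;
        pthread_mutex_unlock(&a->lock);
    }

    /* Composition fails: transfer() is not atomic even though each
     * step is.  Another thread can observe the money "in flight",
     * and taking both locks inside transfer() instead risks deadlock
     * if two threads lock the accounts in opposite orders. */
    void transfer(account *from, account *to, long amount) {
        withdraw(from, amount);
        /* window: the total across both accounts is wrong here */
        deposit(to, amount);
    }

The transactional-memory pitch is that wrapping the two calls in an
atomic block would make the composite atomic without deadlock; whether
hardware can actually deliver that is what the thread goes on to question.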
From: Nick Maclaren on 24 Oct 2006 10:59

In article <vb6dnUlipoM3mqPYnZ2dnUVZ_vSdnZ2d(a)comcast.com>,
Joe Seigh <jseigh_01(a)xemaps.com> writes:
|> >
|> > Like 90% of the other problems with getting parallelism right in
|> > hardware, the problem is actually moving people to a coding paradigm
|> > where the problem is soluble. I don't see any problem with
|> > implementing that model for BSP, transactional databases etc.!
|> >
|> > But, to do it for C++ ....
|>
|> They seem to think it will buy them composability which will
|> solve all concurrency problems. Like I said, magic.

Well, it would certainly help a great deal! But I was thinking of
how to implement it, and couldn't see how to do it using physics.
So, as you say, magic ....

It's like so-called non-deterministic automata, so beloved of the
more clueless compscis. I have been told in all seriousness that
they are a realistic model for the complexity analysis of practical
parallel computation!

Regards,
Nick Maclaren.
From: Joe Seigh on 24 Oct 2006 12:18

Nick Maclaren wrote:
> In article <vb6dnUlipoM3mqPYnZ2dnUVZ_vSdnZ2d(a)comcast.com>,
> Joe Seigh <jseigh_01(a)xemaps.com> writes:
> |> >
> |> > Like 90% of the other problems with getting parallelism right in
> |> > hardware, the problem is actually moving people to a coding paradigm
> |> > where the problem is soluble. I don't see any problem with
> |> > implementing that model for BSP, transactional databases etc.!
> |> >
> |> > But, to do it for C++ ....
> |>
> |> They seem to think it will buy them composability which will
> |> solve all concurrency problems. Like I said, magic.
>
> Well, it would certainly help a great deal! But I was thinking of
> how to implement it, and couldn't see how to do it using physics.
> So, as you say, magic ....

My impression is it's a giant load reserved/store conditional but not using
registers, just memory. Cache seems to be the mechanism they're
playing with to do this, at least according to one Sun patent
I saw. The big problem is guaranteeing sufficient forward
progress to make it workable. E.g. instant retry if the underlying
reserve has been broken, or just making it an application of
scheduling if the data dependencies can be worked out.

But there's a limit to how much you can cover in a transaction, so I
don't know whether this will solve all of the problems. Some amount
of concurrency will still be exposed at the application level, and
you will need someone who knows how to deal with that.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
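A concrete rendering of the load-reserved/store-conditional analogy,
using C11 atomics: a sketch under the assumption that a weak
compare-and-swap, which may fail spuriously, is the portable stand-in
for a hardware reserve (compilers map it to ll/sc or ldrex/strex on
machines that have them). A transaction generalises this from one word
to a whole set of cache lines. Names are illustrative:

    #include <stdatomic.h>

    long fetch_and_add(atomic_long *target, long delta) {
        long old = atomic_load(target);          /* "load reserved" */
        /* "store conditional": fails if another processor broke the
         * reserve; on failure `old` is refreshed to the current
         * value, so simply retry with the new value. */
        while (!atomic_compare_exchange_weak(target, &old, old + delta))
            ;   /* nothing bounds the retries - this is the
                 * forward-progress problem mentioned above */
        return old;
    }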
From: Eric P. on 24 Oct 2006 12:19

Del Cecchi wrote:
>
> If I interpret this article http://www.linuxjournal.com/article/8211
> correctly, expecting those stores to not be reordered as seen from
> another cpu is unrealistic unless steps are taken in the software to
> make it so. It is realistic to expect them to occur in order as seen
> from the cpu where they originate.

You are correct. However, this article contains errors.

1) Under "Summary of Memory Ordering" it states that "Aligned simple
loads and stores are atomic." This is not true on processors without
byte and word loads and stores, such as the Alpha 21064 and MIPS.
What it should say is "Aligned simple loads and stores of atomic
sized data items, the definition of which is architecture and model
specific, are atomic."

2) It states that the x86 allows "Loads Reordered After Stores". He
does not define this non-standard terminology, but if it means what
it sounds like, then he is claiming that the x86 allows a later store
to bypass outstanding earlier loads. That is wrong. Since writes
"are only performed for instructions that have actually retired"
(Intel Vol 3, SysProg Guide, section 7.2.2, item 5), if a preceding
load is still outstanding then the store has not retired and its
write will not be performed.

Eric
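As an illustration of the "steps taken in the software" that Del
Cecchi mentions, here is a sketch of the standard publish/consume
idiom in C11 atomics (names illustrative). On x86, which keeps stores
in order as Eric notes, the release store compiles to a plain store;
on weaker machines it emits the write barrier the Linux Journal
article discusses:

    #include <stdatomic.h>

    int payload;                 /* plain data                   */
    atomic_int ready;            /* publication flag, initially 0 */

    void producer(void) {
        payload = 42;
        /* release: the payload store cannot appear to another CPU
         * to have been reordered after the flag store */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    void consumer(void) {
        /* acquire: once the flag is seen set, payload == 42 is
         * guaranteed to be seen as well */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;   /* spin until published */
    }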
From: Nick Maclaren on 24 Oct 2006 12:56
In article <956dnSONVbw5oaPYnZ2dnUVZ_v6dnZ2d(a)comcast.com>,
Joe Seigh <jseigh_01(a)xemaps.com> writes:
|>
|> [ transactional memory ]
|>
|> My impression is it's a giant load reserved/store conditional but not using
|> registers, just memory. Cache seems to be the mechanism they're
|> playing with to do this, at least according to one Sun patent
|> I saw. The big problem is guaranteeing sufficient forward
|> progress to make it workable. ...

Quite. That is a classic problem in queuing theory, and it is
well-known to be insoluble in the general case. Like SO many other
things, you push it beyond its limit and it jams or breaks.

|> Some amount of
|> concurrency will still be exposed at the application level and you
|> will need someone who knows how to deal with that.

Which is where all these silver bullets fail - if the programmer
can't shoot or is trying an impossible shot, it doesn't matter WHAT
his gun is loaded with ....

Regards,
Nick Maclaren.
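The usual engineering answer to the insoluble-in-general
forward-progress problem (an addition here, not something either
poster proposes) is to stop pretending: retry a bounded number of
times with randomised backoff, then fall back to a plain lock so
that at least one thread is guaranteed to proceed. A sketch, with
try_transaction() and run_under_lock() as hypothetical placeholders:

    #include <pthread.h>
    #include <stdlib.h>

    #define MAX_TRIES 8

    extern int  try_transaction(void);  /* hypothetical: 1 = committed */
    extern void run_under_lock(void);   /* same work, serialised       */
    static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;

    void do_work(void) {
        for (int i = 0; i < MAX_TRIES; i++) {
            if (try_transaction())
                return;                 /* committed */
            /* reserve broken: randomised backoff before retrying,
             * with the delay window doubling on each failure */
            for (volatile int spin = rand() % (1 << i); spin > 0; spin--)
                ;                       /* crude delay */
        }
        /* persistent contention: take the pessimistic path */
        pthread_mutex_lock(&fallback);
        run_under_lock();
        pthread_mutex_unlock(&fallback);
    }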