From: rohit.nadig@gmail.com on 17 Oct 2006 13:15

Pardon my intervening, but it seems like we are beating the point to
death. It is painful to see Dennis and Nick build a mountain of
animosity over nothing. Architect, micro-architect, I could care less.
It's the ideas, innovation, and "fun" that really matter.
From: Nick Maclaren on 17 Oct 2006 13:33

In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
kenney(a)cix.compulink.co.uk writes:
|>
|> > Much of the load/store atomicity is unspecified, especially
|> > with regard to multiple cores and data objects which need a
|> > lower alignment than their size. That is a common source of
|> > problem.
|>
|> Are you talking solely about factors that affect program
|> optimisation? I ask this because most high level languages
|> isolate the programmer from the architecture anyway. Come to that
|> so do operating systems with the HAL. Presumably this is most
|> important for compiler writers.

No. And I am afraid that they don't. Let's take a specific example
that has been discussed at length in the context of SC22WG21 (C++).

A threaded program has a global object that is read frequently
and updated (by simply storing into it) rarely. Under what
circumstances is it permissible to put a lock around the update
and leave the reads unlocked?

According to most architectures and most languages, the answer is
typically 'never', which has a MAJOR impact on the complexity and
performance of many codes, as well as being a very common source of
error. But, in practice, there are some pretty safe circumstances,
such as:

1) If the object has an alignment equal to its size, which is 1, 2, 4
or 8 bytes, and is being BOTH stored and read by 'simple' loads and
stores, then it is safe.

2) If the object is normally all zero, is stored once between resets
and is never reset to zero without a global lock, then it is safe
for the reads to test for it going non-zero. They still may need to
lock it to read the value, but the zero/non-zero test is safe unlocked.

3) If the new value stored has either a superset or a subset of the
bits set in the previous value, and similar conditions to (2) apply,
then a similar conclusion holds.

And it turns out that those three are enough to make a MAJOR difference,
so a huge number of parallel and interrupt handling codes rely on them,
even when the architectures (as such) don't specify them.

Regards,
Nick Maclaren.
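[Editor's note: pattern (2) above can be sketched in modern C. This is a hypothetical illustration — the names `set_once` and `read_if_set` are invented here, and C11 relaxed atomics stand in for the plain aligned loads and stores the 2006 discussion assumes.]

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uintptr_t shared;         /* zero means "not yet set" */

/* Rare update: taken under the global lock, stored at most once
 * between resets, and never reset to zero without the lock. */
void set_once(uintptr_t v)
{
    pthread_mutex_lock(&lock);
    if (atomic_load_explicit(&shared, memory_order_relaxed) == 0)
        atomic_store_explicit(&shared, v, memory_order_release);
    pthread_mutex_unlock(&lock);
}

/* Frequent read: the zero/non-zero test runs without the lock;
 * only fetching the actual value falls back to locking. */
uintptr_t read_if_set(void)
{
    if (atomic_load_explicit(&shared, memory_order_relaxed) == 0)
        return 0;
    pthread_mutex_lock(&lock);
    uintptr_t v = atomic_load_explicit(&shared, memory_order_relaxed);
    pthread_mutex_unlock(&lock);
    return v;
}
```

The asymmetry is the whole point: the rare writer pays for the lock every time, while the frequent readers almost never do — which is why the pattern is so attractive for read-mostly data.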
From: Del Cecchi on 17 Oct 2006 14:13

Nick Maclaren wrote:
> In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
> kenney(a)cix.compulink.co.uk writes:
> |> Are you talking solely about factors that affect program
> |> optimisation? [...]
>
> No. And I am afraid that they don't. Let's take a specific example
> that has been discussed at length in the context of SC22WG21 (C++).
>
> A threaded program has a global object that is read frequently
> and updated (by simply storing into it) rarely. Under what
> circumstances is it permissible to put a lock around the update
> and leave the reads unlocked?
>
> According to most architectures and most languages, the answer is
> typically 'never', which has a MAJOR impact on the complexity and
> performance of many codes, as well as being a very common source of
> error. But, in practice, there are some pretty safe circumstances,
> such as:
> [...]
>
> And it turns out that those three are enough to make a MAJOR difference,
> so a huge number of parallel and interrupt handling codes rely on them,
> even when the architectures (as such) don't specify them.

And they will work until they break. And if the application is
important enough, then their behaviour becomes a de facto part of the
architecture. This has been true for many years.

del

--
Del Cecchi
"This post is my own and doesn't necessarily represent IBM's
positions, strategies or opinions."
From: Nick Maclaren on 17 Oct 2006 14:31

In article <4pkkq2Fje65mU1(a)individual.net>,
Del Cecchi <cecchinospam(a)us.ibm.com> writes:
|> >
|> > According to most architectures and most languages, the answer is
|> > typically 'never', which has a MAJOR impact on the complexity and
|> > performance of many codes, as well as being a very common source of
|> > error. But, in practice, there are some pretty safe circumstances,
|> > such as: ...
|>
|> And they will work until they break. And if the application is
|> important enough then their behaviour becomes a de facto part of the
|> architecture. This has been true for many years.

To be precise, a good four decades ....

As you say, the problem isn't new. Whether it is practicable to
define an architecture so that NO program needs to rely on such
aspects is unclear. I am pretty sure that it has never been done
(even Java has had some problems in that respect), but whether it
could be done is another matter.

Regards,
Nick Maclaren.
From: Joe Seigh on 17 Oct 2006 14:50
Del Cecchi wrote:
> Nick Maclaren wrote:
>
>> In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
>> kenney(a)cix.compulink.co.uk writes:
>> [...]
>>
>> A threaded program has a global object that is read frequently
>> and updated (by simply storing into it) rarely. Under what
>> circumstances is it permissible to put a lock around the update
>> and leave the reads unlocked?
>>
>> According to most architectures and most languages, the answer is
>> typically 'never', which has a MAJOR impact on the complexity and
>> performance of many codes, as well as being a very common source of
>> error. But, in practice, there are some pretty safe circumstances,
>> such as:
>> [...]
>
> And they will work until they break. And if the application is
> important enough then their behaviour becomes a de facto part of the
> architecture. This has been true for many years.

On a different but related example of the same problem, RCU in Linux
uses dependent loads instead of load acquire for performance reasons.
To prevent it from breaking, they use a macro which is just a plain
load (w/ compiler optimization hints) on most architectures. So if
the architecture changes, RCU won't break. The hardware might be
considered broken at that point, since it will take a huge performance
hit on Linux anyway.
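[Editor's note: the dependent-load pattern Joe describes can be sketched with C11's memory_order_consume, the standard's later attempt to capture exactly this "the load of the pointer orders the load through it" guarantee (compilers in practice promote it to acquire). This is a simplified stand-in for what Linux's rcu_dereference() provides, not the kernel's actual definition.]

```c
#include <stdatomic.h>
#include <stddef.h>

struct config { int timeout; };

static _Atomic(struct config *) current_cfg;   /* NULL until published */

/* Publisher: the release store ensures the structure's fields are
 * initialized before the pointer becomes visible to readers. */
void publish(struct config *c)
{
    atomic_store_explicit(&current_cfg, c, memory_order_release);
}

/* Reader: on most architectures a plain load suffices, because the
 * load through the pointer carries an address dependency on the load
 * of the pointer; consume expresses that intent without hard-coding
 * any one architecture's rules. */
int read_timeout(void)
{
    struct config *c =
        atomic_load_explicit(&current_cfg, memory_order_consume);
    return c ? c->timeout : -1;
}
```

Alpha is the famous exception here — it can reorder even dependent loads — which is precisely why hiding the choice behind a macro, as Linux does, keeps the code from breaking if the architecture changes.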
AFAIK, none of the dependent load stuff is part of the formal memory
models, when those exist at all. That info is usually tucked away in
some performance documentation.

The performance stuff seems to me like something that can change over
an architecture's lifetime, necessitating a rewrite of some
performance-sensitive code from time to time. That it's required
reflects different viewpoints on ROI. Companies like Intel are
hardware oriented and view software as tied to specific models of
processors, usually at the driver and library level. The idea that
some software may have a lifetime of 10 years or more is alien to
them, since none of the driver code has an expected lifetime of
active development like that.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.