From: Robert Myers on 25 May 2010 16:24

On May 25, 1:59 pm, n...(a)cam.ac.uk wrote:
> In article <d8ea4097-64cf-42c4-af0c-8ab0a4d7b...(a)m4g2000vbl.googlegroups.com>,
> Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> >No one can be accused of being evil or lazy, or, at least, such
> >accusations won't help anyone. We all have different priorities, and
> >there is absolutely no way that meetings of committees can make the
> >world safe for the clueless. If your calculation is critically
> >dependent on the last few bits of precision, it's your problem, not
> >the problem of some language committee.
>
> That is true, and it is the books on how to avoid getting bitten by
> that that I was thinking of. I am sure that there must have been
> some.

It will sound like empty piety, but I don't think there is a substitute
for understanding the physics and the mathematics and for finding as
many different ways to probe the results as possible. For the kinds of
problems I'm most familiar with, inventing artificial test cases and
comparing with (for example) perturbation analysis has been invaluable
for things like: does the effect I just added even act with the right
sign? Starting with a problem I know the answer to and moving away from
it slowly is a great way to proceed if there is the time.

Someone asked fairly recently on a fluid mechanics board about
precision problems when using an implicit method with a large time
step. In that case, I think the precision problems were probably a
blessing to the modeler, who didn't understand that a method that is
stable isn't necessarily accurate. If he'd been working on an
infinite-precision computer, he might have blindly gone forward
grinding out nonsense.

How do you write books about that kind of stuff? In that particular
case, the matrix he was trying to invert was almost certainly
ill-conditioned, something you *do* learn about from a textbook. But
how do you know when you need to work harder at solving an
ill-conditioned matrix and when you have an ill-conditioned matrix
because something more fundamental is wrong?

Mostly people want to get as many test cases through as fast as they
can. What committee or textbook can fix that?

Robert.
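To make the ill-conditioning point concrete, here is a rough sketch,
not from the thread itself: it assumes Python with NumPy, and uses the
Hilbert matrix purely as a stock ill-conditioned example. The solver
returns an answer with a residual down at rounding level, yet the
answer is badly wrong, and the condition number is the textbook
warning sign Robert alludes to.

    import numpy as np

    def hilbert(n):
        # Stock ill-conditioned test matrix: H[i, j] = 1 / (i + j + 1).
        i, j = np.indices((n, n))
        return 1.0 / (i + j + 1)

    n = 12
    A = hilbert(n)
    x_true = np.ones(n)        # pretend we know the answer
    b = A @ x_true             # manufacture a consistent right-hand side

    x = np.linalg.solve(A, b)  # the solver happily returns *something*

    print("cond(A)      ~ %.1e" % np.linalg.cond(A))
    print("||A x - b||  ~ %.1e" % np.linalg.norm(A @ x - b))
    print("||x - true|| ~ %.1e" % np.linalg.norm(x - x_true))
    # Typical result: the residual looks fine, the error is many orders
    # of magnitude larger, and cond(A) ~ 1e16 is the tell. More
    # precision would only have postponed the surprise.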
From: nedbrek on 4 Jun 2010 08:36
Hello all,

"Anton Ertl" <anton(a)mips.complang.tuwien.ac.at> wrote in message
news:2010May17.213934(a)mips.complang.tuwien.ac.at...
> "nedbrek" <nedbrek(a)yahoo.com> writes:
>>2) More cores use more bandwidth.
>
> And a faster core also uses more bandwidth.
>
> However, the question is: If the cores work on a common job, do they
> need more bandwidth than a faster core that gives the same
> performance. Maybe, but not necessarily.
>
>>You can think of
>>OOO as a technique to get more performance per memory access.
>
> More than what? In this context it sounds like you are comparing with
> multi-threading, so let me react to that:
>
> I don't think so. Ok, one can let a multi-threaded program be less
> cache-friendly than a single-threaded program, but one can make it
> similarly cache-friendly. And I think once the number of cores is so
> high that many applications become bandwidth-limited (which assumes we
> have solved the problem of making use of many threads), programmers
> will develop techniques for utilizing the given bandwidth better.

It can be helpful to examine things thusly: imagine a machine with one
L1 memory port (8 bytes wide) at 1 GHz -> 8 GB/s. The on-die cache
filters, say, 90% of accesses (for some workload), so you need roughly
800 MB/s of off-die bandwidth. But an in-order machine is not going to
fully utilize that port (due to all the stalls). So either you are
overbuilding your off-die bandwidth (for some workloads), or you
underbuild it for others (out-of-order also smooths the performance
curve).

Scaling the number of cores is mostly orthogonal to this, although it
tends to make things worse. Your on-die cache is going to become less
effective due to the increased working set (different threads work on
different tasks). There is some beneficial sharing, but also
cache-to-cache transfers (which will stall your in-order cores).

I believe the ideal hardware is heterogeneous cores. Certain software
people seem adamantly against this...

Ned
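For what it's worth, the back-of-the-envelope arithmetic in Ned's post
works out as in the sketch below. It is not from the thread: only the
8-byte port, the 1 GHz clock, and the 90% on-die filtering come from
the post; the port-utilization figures are invented to illustrate the
in-order vs. out-of-order contrast.

    # Off-die bandwidth demand for one core, given its L1 port width,
    # clock, how well it keeps the port busy, and how much the on-die
    # caches filter.
    def offdie_bandwidth(port_bytes, freq_hz, port_util, ondie_filter):
        l1_demand = port_bytes * freq_hz * port_util   # traffic hitting L1
        return l1_demand * (1.0 - ondie_filter)        # traffic going off chip

    GHZ, MBPS = 1e9, 1e6

    # Fully utilized port (roughly the out-of-order case): 800 MB/s off die.
    print(offdie_bandwidth(8, 1 * GHZ, 1.0, 0.90) / MBPS, "MB/s")

    # An in-order core stalling much of the time, say 40% utilization: 320 MB/s.
    print(offdie_bandwidth(8, 1 * GHZ, 0.4, 0.90) / MBPS, "MB/s")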