From: Brett Davis on 25 Jul 2010 00:03

In article <En82o.12$EJ7.2(a)hurricane>,
 Robert Myers <rbmyersusa(a)gmail.com> wrote:
> On the Effects of Memory Latency and Bandwidth on Supercomputer
> Application Performance. Richard Murphy. Sandia National Laboratories.
> www.sandia.gov/~rcmurph/doc/latency.pdf

This paper has pretty graphs, but like most academic papers it has a
fatal flaw: they clearly did not scale the fetch ahead (prefetch
distance) to match the bandwidth, so not a single benchmark improved
with greater bandwidth. Newer PowerPCs are available with greater
bandwidth, and the performance of those chips clearly would not match
these graphs.

I know everyone wants reduced latency, but technology favors increasing
latency to get more bandwidth. At least until RRAM gets embedded on the
CPU die; then you will get reduced latency and even more bandwidth,
whether you want it or not. ;)

Of course, if your problem does not fit in the X gigabytes that are on
die, then you will have to wait for the 4k page to be streamed in. Can
you say really bad latency? Yes, I know you can. ;)

Brett
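To make concrete what "scaling the fetch ahead" means: to keep a stream
saturated, the prefetch distance has to cover the bandwidth-delay
product, so a simulation that doubles bandwidth without also doubling
the distance leaves the loop latency-bound and shows no gain. A minimal
sketch (mine, not the paper's, with made-up latency and bandwidth
numbers, using the GCC/Clang prefetch builtin):

  /* Illustration only: prefetch distance derived from the assumed
   * memory latency and the per-element service time implied by the
   * sustained bandwidth. */
  #include <stddef.h>

  #define MEM_LATENCY_NS    100.0   /* assumed memory latency */
  #define NS_PER_ELEMENT      2.0   /* set by sustained bandwidth */
  #define PREFETCH_DISTANCE ((size_t)(MEM_LATENCY_NS / NS_PER_ELEMENT))

  double sum_stream(const double *a, size_t n)
  {
      double sum = 0.0;
      for (size_t i = 0; i < n; i++) {
          /* Issue the fetch far enough ahead that the data arrives
           * by the time the loop reaches it. */
          if (i + PREFETCH_DISTANCE < n)
              __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
          sum += a[i];
      }
      return sum;
  }

If NS_PER_ELEMENT halves (bandwidth doubles) while PREFETCH_DISTANCE
stays fixed, the prefetches land too late and the extra bandwidth is
wasted, which is the effect being described above.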
From: Robert Myers on 25 Jul 2010 00:31

On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> In article <En82o.12$EJ7.2(a)hurricane>,
>  Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > Application Performance. Richard Murphy. Sandia National Laboratories.
> > www.sandia.gov/~rcmurph/doc/latency.pdf
>
> This paper has pretty graphs, but like most academic papers it has a
> fatal flaw. They clearly did not scale the fetch ahead to match the
> bandwidth, not a single benchmark was improved with greater bandwidth.
> Newer PowerPCs are available with greater bandwidth, and the
> performance of those chips clearly would not match these graphs.
>
> I know everyone wants reduced latency, but technology favors
> increasing latency to get more bandwidth. At least until RRAM gets
> embedded on the CPU die, then you will get reduced latency and
> even more bandwidth, whether you want it or not. ;)
>
> Of course if your problem does not fit in X gigabytes that are on die,
> then you will have to wait for the 4k page to be streamed in.
> Can you say really bad latency, yes I know you can. ;)

I was asked if I had ever looked at studies of the effects of
bandwidth by others, and the answer is that I have. Since I don't get
to sit in on the deliberations of the national labs as to how they
will turn money into carbon emissions, I consider worthwhile any kind
of evidence as to their thinking, no matter how superficial.

I agree. The plots are splendid. They always are.

Robert.
From: Brett Davis on 25 Jul 2010 01:27

In article
<57004bc1-201f-4cd1-8584-e59c01eb0e0e(a)e5g2000yqn.googlegroups.com>,
 Robert Myers <rbmyersusa(a)gmail.com> wrote:
> On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> > In article <En82o.12$EJ7.2(a)hurricane>,
> >  Robert Myers <rbmyers...(a)gmail.com> wrote:
> >
> > > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > > Application Performance. Richard Murphy. Sandia National Laboratories.
> > > www.sandia.gov/~rcmurph/doc/latency.pdf
> >
> > This paper has pretty graphs, but like most academic papers it has a
> > fatal flaw. They clearly did not scale the fetch ahead to match the
> > bandwidth, not a single benchmark was improved with greater bandwidth.
> > Newer PowerPCs are available with greater bandwidth, and the
> > performance of those chips clearly would not match these graphs.
>
> I was asked if I had ever looked at studies of the effects of
> bandwidth by others, and the answer is that I have.
>
> I agree. The plots are splendid. They always are.

Stanford online has a truly awesome related iTunes talk by Bill Dally:
http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.edu.3692287061.03692287064.3946505769?i=2123730674
From:
http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.edu.3692287061.03692287064
iTunes store search term "The future of throughput".
Track 13 for the course "Programming Massively Parallel Processors with
CUDA (CS193G)".

A related PDF:
http://cva.stanford.edu/people/dally/ISSCC2005.pdf

Brett
From: Robert Myers on 25 Jul 2010 23:12

On Jul 25, 1:27 am, Brett Davis <gg...(a)yahoo.com> wrote:
> In article
> <57004bc1-201f-4cd1-8584-e59c01eb0...(a)e5g2000yqn.googlegroups.com>,
>  Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> > On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> > > In article <En82o.12$EJ7.2(a)hurricane>,
> > >  Robert Myers <rbmyers...(a)gmail.com> wrote:
> > >
> > > > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > > > Application Performance. Richard Murphy. Sandia National Laboratories.
> > > > www.sandia.gov/~rcmurph/doc/latency.pdf
> > >
> > > This paper has pretty graphs, but like most academic papers it has a
> > > fatal flaw. They clearly did not scale the fetch ahead to match the
> > > bandwidth, not a single benchmark was improved with greater bandwidth.
> > > Newer PowerPCs are available with greater bandwidth, and the
> > > performance of those chips clearly would not match these graphs.
> >
> > I was asked if I had ever looked at studies of the effects of
> > bandwidth by others, and the answer is that I have.
> >
> > I agree. The plots are splendid. They always are.
>
> Stanford online has a truly awesome related iTunes talk by Bill Dally:
> http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.e...
> From: http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.e...
> iTunes store search term "The future of throughput".
> Track 13 for the course "Programming Massively Parallel Processors with CUDA (CS193G)".

I got stopped on Bill Dally's first slide, where he says
"efficiency=locality."

He claims that his slide tells you everything you need to know about
the future of computing, AND THAT BELIEF AND PROSELYTIZING FOR THAT
BELIEF IS WHY I HAVE USED UP SO MUCH SPACE HERE.

THE MOST INTERESTING PROBLEMS ARE NOT LOCAL, BECAUSE THE MOST
INTERESTING PROBLEMS ARE NOT LINEAR.

A problem is embarrassingly parallel IFF it is local IFF it is linear.
If a problem is linear, then there is a representation in which it is
both local and embarrassingly parallel. If a problem is not linear,
then there is, in general, no such representation. Does one need a
PhD from Cal Tech to understand this?

I suppose that it is inevitable that EEs should believe that
interesting things can happen in a linear universe, but, from a
mathematical point of view, linear systems are as interesting as the
aging of paintings hanging in the Louvre.

Robert.

> A related PDF:
> http://cva.stanford.edu/people/dally/ISSCC2005.pdf
>
> Brett
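The "linear implies a local, embarrassingly parallel representation"
claim is essentially the standard diagonalization argument. As a sketch
(the usual textbook reasoning for the constant-coefficient case, not
taken from the paper or the talk), in LaTeX notation:

  % Linear, constant-coefficient evolution: \partial_t u = L u.
  % Expanding in the eigenbasis (Fourier modes for translation-invariant L),
  %   u(x,t) = \sum_k \hat{u}_k(t) e^{ikx},
  % the equation decouples into independent scalar ODEs:
  \[
    \frac{d\hat{u}_k}{dt} = \lambda_k \hat{u}_k
    \quad\Longrightarrow\quad
    \hat{u}_k(t) = e^{\lambda_k t}\,\hat{u}_k(0),
  \]
  % so each mode evolves with no reference to any other mode: in that
  % representation the problem is both local and embarrassingly parallel.
  % A nonlinear term such as u\,\partial_x u becomes a convolution,
  \[
    \widehat{(u\,u_x)}_k = \sum_{p+q=k} \hat{u}_p\,(i q\,\hat{u}_q),
  \]
  % coupling every mode to every other, so no fixed change of basis
  % makes the problem local.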
From: Andy Glew on 26 Jul 2010 01:33
On 7/25/2010 8:12 PM, Robert Myers wrote:
> I got stopped on Bill Dally's first slide, where he says
> "efficiency=locality."
>
> He claims that his slide tells you everything you need to know about
> the future of computing, AND THAT BELIEF AND PROSELYTIZING FOR THAT
> BELIEF IS WHY I HAVE USED UP SO MUCH SPACE HERE.
>
> THE MOST INTERESTING PROBLEMS ARE NOT LOCAL, BECAUSE THE MOST
> INTERESTING PROBLEMS ARE NOT LINEAR.
>
> A problem is embarrassingly parallel IFF it is local IFF it is linear.
>
> If a problem is linear, then there is a representation in which it is
> both local and embarrassingly parallel. If a problem is not linear,
> then there is, in general, no such representation. Does one need a
> PhD from Cal Tech to understand this?

(1) Is there a proof for this? Not merely that there are non-linear
systems that are not embarrassingly parallel, but that no interesting
non-linear system is amenable to a parallel solution. E.g., what about
the N-body problem, gravitational dynamics?

(2) If what you and Dally say is true, Robert, then you may be tilting
at windmills: there may be no computationally efficient way of doing
what you want. I don't believe this, because I do not equate
"computationally efficient" with "embarrassingly parallel".

Also: even though locality really matters, we have nevertheless used it
as an excuse to pessimize non-local activities. At the very least we
can influence the constant factor by removing those pessimizations, as
I have attempted to do with my proposal for a scatter/gather based
memory subsystem.
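For readers who haven't seen the idea: a scatter/gather memory
subsystem lets software hand the memory system a whole vector of
arbitrary addresses and get the elements back as one dense bundle, so a
non-unit-stride access pattern can have all of its line fetches in
flight at once instead of paying full latency per element. A minimal
sketch of the programming model (my own illustration with a made-up
gather_f64() primitive, not Glew's actual proposal):

  #include <stddef.h>

  /* Stand-in for a hardware gather: vals[i] = base[idx[i]].  A real
   * scatter/gather subsystem would issue these as one bundled,
   * pipelined request; here the semantics are spelled out serially. */
  static void gather_f64(double *vals, const double *base,
                         const size_t *idx, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          vals[i] = base[idx[i]];
  }

  /* Sparse dot product sum_i a[col[i]] * v[i]: the kind of non-local
   * access pattern such a subsystem targets. */
  double sparse_dot(const double *a, const size_t *col,
                    const double *v, size_t n)
  {
      double buf[256];
      double sum = 0.0;
      for (size_t i = 0; i < n; i += 256) {
          size_t m = (n - i < 256) ? (n - i) : 256;
          gather_f64(buf, a, &col[i], m);   /* one bundled request */
          for (size_t j = 0; j < m; j++)
              sum += buf[j] * v[i + j];
      }
      return sum;
  }

The point is not the arithmetic but the interface: the index vector is
presented up front, so the memory system sees the whole non-local
pattern at once rather than discovering it one cache miss at a time.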