High-bandwidth computing interest group [Computer Architecture]


From: Robert Myers on 18 Jul 2010 18:02

I have lamented, at length, the proliferation of flops at the expense of
bytes-per-flop in what are now currently styled as supercomputers.

This subject came up recently on the Fedora User's Mailing List when
someone claimed that GPU's are just what the doctor ordered to make
high-end computation pervasively available. Even I have fallen into
that trap, in this forum, and I was quickly corrected. In the most
general circumstance, GPU's seem practically to have been invented to
expose bandwidth starvation.

At least one person on the Fedora list got it and says that he has
encountered similar issues in his own work (what is in short supply is
not flops, but bytes per flop). He also seems to understand that the
problem is fundamental and cannot be made to go away with an endless
proliferation of press releases, photographs of "supercomputers," and an
endless procession of often meaningless color plots.

Since the issue is only tangentially related to the list, he suggested a
private mailing list to pursue the issue further without annoying others
with a topic that most are manifestly not interested in.

The subject is really a mix of micro and macro computer architecture,
the physical limitations of hardware, the realities of what is ever
likely to be funded, and the grubby details of computational mathematics.

Since I have talked most about the subject here and gotten the most
valuable feedback here, I thought to solicit advice as to what kind of
forum would seem most plausible/attractive to pursue such a subject. I
could probably host a mailing list myself, but would that be the way to
go about it and would anyone else be interested?

Email me privately if you don't care to respond publicly.

Thanks.

Robert.

From: Edward Feustel on 19 Jul 2010 05:54

On Sun, 18 Jul 2010 18:02:49 -0400, Robert Myers
<rbmyersusa(a)gmail.com> wrote:

>I have lamented, at length, the proliferation of flops at the expense of
>bytes-per-flop in what are now currently styled as supercomputers.
>
---
>Since I have talked most about the subject here and gotten the most
>valuable feedback here, I thought to solicit advice as to what kind of
>forum would seem most plausible/attractive to pursue such a subject. I
>could probably host a mailing list myself, but would that be the way to
>go about it and would anyone else be interested?
>
>Email me privately if you don't care to respond publicly.
>
>Thanks.
>
>Robert.

This is an important subject. I would suggest that everything be
archived in a searchable environment. Keywords and tightly focused
discussion would be helpful (if possible). Please let me know if you
decide to do a wiki or e-mail list.

Ed Feustel
Dartmouth College

From: jacko on 19 Jul 2010 11:02

I might click a look see.

From: MitchAlsup on 19 Jul 2010 11:36

It seems to me that having less than 8 bytes of memory bandwidth per
flop leads to an endless series of cache exercises.**

It also seems to me that nobody is going to be able to put the
required 100 GB/s/processor pin interface on the part.*

Nor does it seem that it would have the latency needed to strip-mine main
memory continuously, were the required BW made available.

Thus, we are in essence screwed.

* Current bandwidths:
a) 3 GHz processors with 2 FP pipes, each running 128-bit (2-wide) DP
operations (a la SSE). This gives 12 GFlop/s per processor.
b) 12 GFlop/s per processor, at 8 bytes per flop, demands about
100 GByte/s per processor.
c) DDR3 can achieve 17 GByte/s per channel.
d) High-end PC processors can afford 2 memory channels.
e) Therefore we are screwed:
e.1) The memory system can supply only about 1/3 of what a single
processor wants.
e.2) There are 4 processors, and the number is growing.
e.3) Therefore the memory system can support only about 1/12 as much BW
as required.

Mitch

** The Ideal memBW/Flop is 3 memory operations per flop, and back in
the Cray-1 to XMP transition much of the vectorization gain occurred
from the added memBW and the better chaining.
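
The arithmetic in the starred estimate above can be checked with a short script. This is only a back-of-the-envelope sketch: the figures are the ones quoted in the post (3 GHz, 2 FP pipes, 2 DP lanes per 128-bit operation, 8 bytes per flop, 17 GByte/s per DDR3 channel, 2 channels, 4 processors), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the bandwidth estimate in the post.
# All figures are taken from the post; names are illustrative only.

CLOCK_GHZ = 3.0              # processor clock
FP_PIPES = 2                 # floating-point pipes per processor
DP_LANES = 2                 # 128-bit SSE op = 2 double-precision lanes
BYTES_PER_FLOP = 8           # bandwidth target from the post
DDR3_GB_S_PER_CHANNEL = 17.0
MEMORY_CHANNELS = 2
PROCESSORS = 4


def peak_gflops(clock_ghz, pipes, lanes):
    """Peak double-precision GFlop/s for one processor."""
    return clock_ghz * pipes * lanes


def supply_fraction(supply_gb_s, demand_gb_s_per_proc, processors):
    """Fraction of the demanded bandwidth the memory system can deliver."""
    return supply_gb_s / (demand_gb_s_per_proc * processors)


gflops = peak_gflops(CLOCK_GHZ, FP_PIPES, DP_LANES)      # 12 GFlop/s
demand = gflops * BYTES_PER_FLOP                         # 96 GByte/s (~100)
supply = DDR3_GB_S_PER_CHANNEL * MEMORY_CHANNELS         # 34 GByte/s

one_proc = supply_fraction(supply, demand, 1)
all_procs = supply_fraction(supply, demand, PROCESSORS)

print(f"peak: {gflops:.0f} GFlop/s, demand: {demand:.0f} GByte/s per processor")
print(f"supply: {supply:.0f} GByte/s")
print(f"fraction supplied, 1 processor:  {one_proc:.3f}")
print(f"fraction supplied, {PROCESSORS} processors: {all_procs:.3f}")
```

Without rounding the demand up to 100 GByte/s, the fractions come out slightly above 1/3 and 1/12, which does not change the conclusion.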

From: Thomas Womack on 19 Jul 2010 11:54

In article <QSK0o.10246$Zp1.7167(a)newsfe15.iad>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>I have lamented, at length, the proliferation of flops at the expense of
>bytes-per-flop in what are now currently styled as supercomputers.
>
>This subject came up recently on the Fedora User's Mailing List when
>someone claimed that GPU's are just what the doctor ordered to make
>high-end computation pervasively available. Even I have fallen into
>that trap, in this forum, and I was quickly corrected. In the most
>general circumstance, GPU's seem practically to have been invented to
>expose bandwidth starvation.

Yes, they've got a very low peak bandwidth:peak flops ratio; but the
peak bandwidth is reasonably high in absolute terms - the geforce 480
peak bandwidth is about that of a Cray T916.

(the chip has about 2000 balls on the bottom, 384 of which are memory
I/O running at 4GHz)

I don't think it makes sense to complain about low bw:flops ratios;
you could always make the ratio higher by removing ALUs, getting you a
machine which is less capable at the many jobs that can be made to
need flops but not bytes.

Tom
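
For scale, the peak figure implied by the parenthetical above can be worked out directly. This is a rough sketch that assumes each of the 384 memory I/O balls carries one data bit at 4 Gbit/s, which is one reading of "running at 4GHz" and not a datasheet value; the function name is hypothetical. The 34 GByte/s comparison figure is the two-channel DDR3 number from Mitch's post earlier in the thread.

```python
# Rough peak-bandwidth estimate from pin count and per-pin data rate.
# Assumption: every memory I/O ball is a data pin moving 4 Gbit/s.

def bus_bandwidth_gb_s(data_pins, gbit_per_s_per_pin):
    """Peak bus bandwidth in GByte/s (8 bits per byte)."""
    return data_pins * gbit_per_s_per_pin / 8.0


gpu_bw = bus_bandwidth_gb_s(384, 4.0)   # figures quoted in the post
cpu_bw = 2 * 17.0                       # 2 DDR3 channels at 17 GByte/s each

print(f"GPU peak memory bandwidth: {gpu_bw:.0f} GByte/s")
print(f"Ratio to a 2-channel DDR3 system: {gpu_bw / cpu_bw:.1f}x")
```

This only covers absolute bandwidth; the bytes-per-flop ratio also depends on the peak flop rate, which is not given in the thread.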
