From: Robert Myers on 30 Mar 2010 16:07

On Mar 29, 6:24 am, Terje Mathisen <terje.mathi...(a)tmsw.no> wrote:
>
> The seismic paper shows how they started with a straight-forward port,
> and got pretty much no speedup at all, then they went on to do more and
> more platform-specific optimizations, ending up with something which was
> 40X (afair) faster, but of course totally non-portable.
>
> I.e. the only real key to the speedups was to grok the mapping of the
> problem onto the available hardware.

And you can't do that if you leave optimization in the hands of the
applications programmer, who throws away information that could be used
for optimization and cuts off paths to optimization at lower levels
because of over-specificity.

That's the real key to programming in abstract, domain-specific
languages: they maximally preserve abstract,
non-implementation-dependent information, avoid premature optimization,
and don't close off possible optimization paths at lower levels.

Robert.
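[A minimal sketch in Python of the idea being argued here. The Stencil
class and apply_blocked backend are invented names, and cache blocking
stands in for whatever mapping a real system might choose; the point is
only that the declarative description leaves traversal order to the
backend instead of baking it into the application code.]

    # Toy sketch: a declarative, domain-specific description of a stencil
    # specifies *what* is computed, not *how*. A backend is then free to
    # block for cache, vectorize, or retarget to a GPU, because the
    # programmer has not foreclosed those options with a fixed loop nest.
    import numpy as np

    class Stencil:
        """Declarative 1-D stencil: new[i] = w[0]*old[i-1] + w[1]*old[i] + w[2]*old[i+1]."""
        def __init__(self, weights):
            self.weights = weights  # no traversal order is specified here

    def apply_blocked(stencil, grid, block=4096):
        """One possible backend: blocked traversal chosen by the system,
        not by the application programmer."""
        out = np.empty_like(grid)
        w = stencil.weights
        n = len(grid)
        for start in range(1, n - 1, block):      # cache blocking
            stop = min(start + block, n - 1)
            out[start:stop] = (w[0] * grid[start - 1:stop - 1] +
                               w[1] * grid[start:stop] +
                               w[2] * grid[start + 1:stop + 1])
        out[0], out[-1] = grid[0], grid[-1]       # boundaries pass through
        return out

    heat = Stencil([0.25, 0.5, 0.25])
    field = np.random.rand(1_000_000)
    field = apply_blocked(heat, field)  # the backend, not the app, chose the mapping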
From: Robert Myers on 30 Mar 2010 16:21

On Mar 28, 9:56 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
> On the Research channel, which I receive through Dish Network, they show
> some of the computer science colloquiums at the University of
> Washington. I recently watched a lecture by professor Pat Hanrahan of
> Stanford. The lecture is titled "Why are Graphics Systems so Fast?". It
> ties together some of the topics that have occurred in different recent
> threads in this group, including the highly parallel SIMT stuff and the
> need for appropriate domain specific languages to get the most out of
> the environment.
>
> You can watch the presentation at
>
> http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345
>
> There were several things that I thought were interesting and perhaps
> even promising.
>
> First is that the Folding(a)Home client has been rewritten to use a
> graphics card with a great speedup. The thing I thought that was
> significant about this is that protein folding is a more traditional HPC
> application than the more graphics oriented things like Photoshop
> effects that seem to be dominating the GPGPU scene.

Del asked me rhetorically in another thread how you could reach a
petaflop without delivering semi-trailer loads of money to national
laboratories or to one of my alma maters (UIUC)--and from those
stopovers, mostly to IBM.

Answer: @home networks are generally good at the same problems that
Blue Gene and its many cousins do well.

Folding(a)home is already doing it on PS3's alone, and on exactly the
same problem that Blue Gene was ostensibly designed for.

Downsides: no photo to put into the alumni bulletin, no huge salary
payout for a management structure mostly doing nothing more exotic than
is done at, say, an insurance company.

Conclusion: If your goal is new computational frontiers, you won't be
building or buying machines for jobs that could be done just as well on
a distributed platform. If you're a bureaucrat (or a senator), there is
no such thing as an installation that is too big, too grand, or too
photogenic, so can the PS3's.

Robert.
From: Del Cecchi` on 30 Mar 2010 21:21

Robert Myers wrote:
[snip]
> Conclusion: If your goal is new computational frontiers, you won't be
> building or buying machines for jobs that could be done just as well on
> a distributed platform. If you're a bureaucrat (or a senator), there is
> no such thing as an installation that is too big, too grand, or too
> photogenic, so can the PS3's.

Uh, aren't you the one decrying the lack of bandwidth in the network
interconnecting the nodes in the expensive supercomputers? And now you
are praising a configuration which has essentially zero bandwidth
connecting the nodes?

Which is it? Embarrassingly parallel or heavy duty bandwidth?
From: Robert Myers on 30 Mar 2010 21:48

On Mar 30, 9:21 pm, Del Cecchi` <delcec...(a)gmail.com> wrote:
[snip]
> Uh, aren't you the one decrying the lack of bandwidth in the network
> interconnecting the nodes in the expensive supercomputers? And now you
> are praising a configuration which has essentially zero bandwidth
> connecting the nodes?
>
> Which is it? Embarrassingly parallel or heavy duty bandwidth?

I must not have made my point clearly enough.

My point is that the current batch of "supercomputers" can perform
efficiently only on those jobs that can be made nearly embarrassingly
parallel, so that they require little global bandwidth. I'm suspicious
of whatever is being done to localize the protein-folding problem, but
the code is out there and the burden is on me to look at it. In any
case, you manifestly don't need a Blue Gene to do petaflop calculations
for the protein-folding problem.

If these are the kinds of computers we're going to restrict ourselves
to building, then, I ask, why build them at all? Maybe we should just
more aggressively marshal the computational resources we already have
and don't have to pay another penny for.

The problems that *can't* be done on a distributed platform, for the
most part, also can't be done efficiently on our current crop of
"supercomputers."

Now, if you're LLNL, you'll argue that it's all Q-clearance stuff and
has to be behind the barbed wire. So be it, but that's a consideration
that has nothing to do with science. So far as science is concerned,
they might just as well be spending the money on fuel-air munitions or
perhaps better-armored land vehicles. The implication that they are
advancing *anything* beyond their own institutional needs is pure
fraud.

If you don't have a classification excuse, then, if you're going to
put money into centralized, big computers, those computers should have
heavy-duty global bandwidth, so that they can do the problems that
*can't* be done on a distributed platform.

If I still haven't made myself clear, I'll be glad to try again:
localized, nearly embarrassingly parallel problems on distributed
platforms; big, centralized computers for those problems that need
good global communication.

Robert.
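[To put rough numbers on the two regimes being contrasted here, a
back-of-envelope sketch. The node count, per-node memory, and work-unit
size below are all assumptions picked for illustration, not measurements
of Blue Gene, Folding(a)home, or any other real system.]

    # Contrast: embarrassingly parallel work units vs. a global all-to-all.
    # All figures are illustrative assumptions.
    nodes = 4096
    mem_per_node = 2 * 2**30        # assume 2 GiB of problem state per node

    # Folding(a)home-style: each work unit runs independently; a node only
    # ships a small result summary home when it finishes.
    per_unit_result = 1 * 2**20     # assume ~1 MiB result per work unit
    embarrassing_traffic = nodes * per_unit_result

    # FFT-style: each transpose step is an all-to-all in which essentially
    # the whole dataset crosses the bisection of the machine.
    fft_traffic_per_step = nodes * mem_per_node

    print(f"embarrassingly parallel: {embarrassing_traffic / 2**30:.1f} GiB, total")
    print(f"global FFT transpose:    {fft_traffic_per_step / 2**40:.1f} TiB, per step")

On these assumed numbers the totals come out around 4 GiB versus 8 TiB
per step: three to four orders of magnitude, which is the gap between a
network of string and heavy-duty global bandwidth.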
From: Del Cecchi on 31 Mar 2010 13:04
Robert Myers wrote:
[snip]
> If I still haven't made myself clear, I'll be glad to try again:
> localized, nearly embarrassingly parallel problems on distributed
> platforms; big, centralized computers for those problems that need
> good global communication.
>
> Robert.

There certainly seems to be a spectrum of applications that can use a
spectrum of bandwidth, and it would be unnecessary to spend a lot of
money on interconnect and packaging if the problem can be solved with
PCs connected together with string. Whether all protein folding and
genomics software falls into this category is beyond me, but some seems
to.

There are other problems which require both lots of flops and lots of
bandwidth. Flops are cheap. Bandwidth, especially bandwidth in a
network with many nodes, is expensive. The various generations of Blue
Gene seem to have increased bandwidth, although I'm not sure whether it
has kept up with the flops. More bandwidth can be had, especially if
the nodes are quite fat, like Blue Waters. BTW, thanks for the pointer
to that; it was interesting.

I tend to think the technical folks are doing the best they can, whilst
you apparently think there is some kind of dominant hidden agenda, in
the form of sort of a farm program for supercomputers where the
government spends a lot of money on useless boxes good only for
publicity and IBM's bottom line.

Custom hardware is pretty expensive, so it seems that most supers are
built out of piece parts or technologies that were developed for
something else. For example, I think the PowerPC cores in BG/L were
405's left over from the embedded business. The interconnect physical
layer was also taken from some other project, but I forget which.
Likewise, the packaging, power, and cooling were derivatives of earlier
stuff with some new ideas.
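[One way to make the fat-node point concrete: in a nearest-neighbor 3-D
domain decomposition, off-node traffic scales with the surface of each
node's subdomain while flops scale with its volume, so fatter nodes
spend proportionally less on the network. A small sketch, assuming
cubical subdomains; the subdomain sizes are arbitrary illustrations.]

    # Surface-to-volume ratio of a cubical per-node subdomain: a proxy for
    # off-node communication per unit of local computation.
    def surface_to_volume(cells_per_node):
        side = round(cells_per_node ** (1 / 3))
        volume = side ** 3
        surface = 6 * side ** 2       # six faces exchanged with neighbors
        return surface / volume       # equals 6/side for a cube

    for cells in (32**3, 128**3, 512**3):   # thin node -> fat node
        print(f"{cells:>12} cells/node: surface/volume = {surface_to_volume(cells):.4f}")

A node 16x fatter on a side needs roughly 1/16th the network bandwidth
per cell per step, which is why fat nodes like Blue Waters' can get by
with a relatively cheaper interconnect on nearest-neighbor codes. Note
this argument does not help the all-to-all (FFT-like) problems discussed
earlier, where the traffic is global rather than surface-bound.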