From: Robert Myers on 8 Mar 2010 20:50

On Mar 8, 3:16 pm, Larry <lstewa...(a)gmail.com> wrote:
>
> Actually I think R. Myers' facts are wrong here. I downloaded the
> HPCC Challenge Results data from the UTK website, and added a column
> computing the ratio of global FFT performance to global HPL
> performance. Then I filtered away all systems not achieving at least
> 100 GFlops FFT performance.
>
> For systems achieving over a teraflop of FFT performance, BG/L with
> 32K cores is beaten <only> by the SX-9 in this figure of merit. For
> systems achieving over 100 GF, it is beaten by the SX-9, by the best
> Nehalem/Infiniband systems, by the Cray XT3, and by the largest
> SiCortex machine.
>
> BG/L is hard to program, and weird in many ways, but inability to do
> the FFT is a bum rap.
>
> Incidentally, only the SX-9 gets over 10% of HPL this way; the rest
> of the pack is in the 3-6% range. Global FFT is hard.
>
**sigh**

It's not a matter of global FFT being hard. It is that these machines,
which claim to be scalable, don't actually scale for the FFT because of
severely crippled global bandwidth. That the machines the DoE buys get
lousy actual/peak performance is something people have been complaining
about for years. As far as the FFT is concerned, we are making
essentially no progress. It is the result of budgetary and
architectural choices, and it is not an unsolvable problem.

> There is a very large market for FFT cycles, particularly in doing 2D
> and 3D FFTs for seismic processing. The data sets for these
> calculations get so large that there is no way to do substantial
> numbers of them in parallel, because the machines do not have enough
> memory to hold, say, a terabyte of data per run. If you try to do it
> any other way, you've succeeded in turning a cluster communications
> problem into a (worse) I/O problem.
>
If you're going to criticize me, at least read what I write. I am not
talking about problems where I/O would be the limiting factor. I'd be
surprised if seismic processing were done on anything but rack
clusters. Completely different world. Your example is completely
irrelevant to the problem that concerns me, as is your conclusion about
I/O.

Robert.
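[Editor's sketch: Larry's figure of merit above can be reproduced roughly as follows. The systems and performance numbers here are invented for illustration — the real HPCC results file from UTK has its own schema and values — but the filter-then-ratio procedure is the one he describes.]

```python
# Sketch of the HPCC figure-of-merit calculation described above:
# take (global FFT, global HPL) results, drop systems below a 100 GFlops
# FFT floor, and rank by the ratio FFT/HPL.  All figures are invented.

SAMPLE_RESULTS = [
    # (system, global FFT GFlops, global HPL GFlops) -- illustrative only
    ("NEC SX-9",       1500.0, 12000.0),
    ("BG/L 32K cores", 2300.0, 70000.0),
    ("Small cluster",    40.0,  1000.0),   # filtered out: FFT < 100 GFlops
]

def fft_hpl_ratios(results, fft_floor=100.0):
    """Return (system, FFT/HPL) for systems with FFT >= fft_floor GFlops,
    sorted best-ratio first."""
    kept = [(name, fft / hpl) for name, fft, hpl in results if fft >= fft_floor]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

for name, ratio in fft_hpl_ratios(SAMPLE_RESULTS):
    print(f"{name}: {ratio:.1%} of HPL")
```

With these made-up numbers the SX-9 lands above 10% of HPL and BG/L in the low single digits, which is the shape of the spread Larry reports.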
From: Robert Myers on 8 Mar 2010 23:52

On Mar 7, 10:48 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
> If there was some miracle architecture that would be much more
> effective than BG for the questions of interest that doesn't require
> unobtainium to fabricate, I think someone would have at least
> proposed it.
>
> Coherent allcache? KSR1?
>
> No cache, multithreaded? Tera MTA.
>
> Hottest microprocessor around in moderate numbers using
> state-of-the-art technology? Power7 boxes.
>
> AMD processors in a cluster with custom interconnect? (What did Cray
> call it? I am drawing a blank.)
>
I suppose this is an insulting response, but here goes.

You go to the drive-up window at McDonald's. There is a menu. You
pick. How smart do you have to be to do that?

Design a meal I've never thought of, that will be irresistibly
appealing and inexpensive. How many yahoos can do that? Who works at
IBM and at the national labs?

Yes, I intend to be provocative. I have to pay [name deleted] so he
can drop names and be a menu-chooser? Get out of town, all of you.

This is comp.arch, not the hotline for EEs who think the world owes
them.

Robert.
From: Terje Mathisen "terje.mathisen at tmsw.no" on 9 Mar 2010 02:46

Robert Myers wrote:
> If you're going to criticize me, at least read what I write. I am not
> talking about problems where I/O would be the limiting factor. I'd be
> surprised if seismic processing were done on anything but rack
> clusters. Completely different world.

Be surprised:

Seismic processing (at least in our installation) is split into two
parts, balanced to take approximately the same amount of wall clock
time, and so that the total cost of a full run is optimized.

The first part runs on an SMP Itanium box with lots of memory and very
good bandwidth; it is used to do the initial gridding of the problem
space, while the second half uses a pretty standard Linux cluster with
2K dual-core CPUs.

Afaik the smaller SMP machine did account for a significant part of the
setup.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on 9 Mar 2010 08:39

On Mar 9, 2:46 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Robert Myers wrote:
> > If you're going to criticize me, at least read what I write. I am
> > not talking about problems where I/O would be the limiting factor.
> > I'd be surprised if seismic processing were done on anything but
> > rack clusters. Completely different world.
>
> Be surprised:
>
> Seismic processing (at least in our installation) is split into two
> parts, balanced to take approximately the same amount of wall clock
> time, and so that the total cost of a full run is optimized.
>
> The first part runs on an SMP Itanium box with lots of memory and
> very good bandwidth; it is used to do the initial gridding of the
> problem space, while the second half uses a pretty standard Linux
> cluster with 2K dual-core CPUs.
>
> Afaik the smaller SMP machine did account for a significant part of
> the setup.
>
I don't know enough about seismic processing as it is currently done to
understand the logic of the setup you describe. My guess about seismic
processing was based on comments by others here who are in the
business, and on the assumption that speed is essential, so you don't
have much time to fiddle around with computer architecture: best just
to cobble together the latest and greatest and get moving so you can
find the good prospects first.

If you want to talk about how to sell lots of chips and boxes, you have
one problem and one set of considerations and constraints. That's the
position that IBM and Intel would mostly like to be in. They don't
care all that much whether you are doing seismic processing or computer
animation. They want the revenue, as do the people who are buying
their hardware.

National security spending in the U.S., and especially spending by
NASA, the NSF, and the bomb labs, *claims* to have other goals. Every
press release about a new machine claims to be advancing the state of
the art. In most cases, those claims are little different from
marketspeak coming from IBM or Intel. They have done whatever they
could with the money they had, and their goal is to advertise having
one of the baddest, fastest machines on earth.

A machine that achieves 5% efficiency doing a bog-standard problem has
little right to claim to be super in any respect. That's my mantra.

Those who are in commerce will continue to try to maximize standard
measures of business performance. It is not their job to advance basic
science. Various departments of the Federal government repeatedly
claim that they are advancing basic science, but, so far as
computational science is concerned, those claims are largely
fraudulent.

Now, to be sure, IBM and Intel would like to be able to gild the lily
by emphasizing their position on the Top 500 list. If they could be
key to a fundamental scientific breakthrough, they'd love to take
credit. But as far as actually investing in making computers more
powerful and useful for fundamental science, no one (except perhaps
DARPA) is in the business.

Robert.
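[Editor's sketch: the bandwidth complaint can be made concrete with back-of-the-envelope arithmetic. Every machine figure below is invented for illustration. The point it models: a 3D FFT's flop count grows only as N^3 log N, while each global transpose moves essentially the whole N^3 data set across the machine's bisection, so on a machine with weak global bandwidth the transposes, not the arithmetic, set the run time.]

```python
# Toy model (invented figures) of why a global 3D FFT is bandwidth-limited:
# compare the compute-bound time against the time to push the transpose
# traffic through the machine's bisection.
import math

N = 4096                   # grid points per dimension (so N^3 = 2^36 points)
PEAK_GFLOPS = 10_000.0     # aggregate peak, GFlop/s -- illustrative
BISECTION_GB_S = 100.0     # global bisection bandwidth, GB/s -- illustrative

points = N ** 3
bytes_total = 16 * points                 # complex double = 16 bytes/point
flops = 5 * points * math.log2(points)    # standard FFT operation count

compute_s = flops / (PEAK_GFLOPS * 1e9)
# Roughly half the data crosses the bisection per global transpose;
# assume two transposes for a slab (1-D) decomposition.
comm_s = 2 * (bytes_total / 2) / (BISECTION_GB_S * 1e9)

print(f"compute-bound: {compute_s:.2f} s, bandwidth-bound: {comm_s:.2f} s")
```

At these made-up numbers the transposes take roughly an order of magnitude longer than the arithmetic, which is the shape of the single-digit percent-of-HPL figures quoted upthread.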
From: Del Cecchi on 9 Mar 2010 22:13
Robert Myers wrote:
> On Mar 7, 10:48 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
>
>> If there was some miracle architecture that would be much more
>> effective than BG for the questions of interest that doesn't require
>> unobtainium to fabricate, I think someone would have at least
>> proposed it.
>>
>> Coherent allcache? KSR1?
>>
>> No cache, multithreaded? Tera MTA.
>>
>> Hottest microprocessor around in moderate numbers using
>> state-of-the-art technology? Power7 boxes.
>>
>> AMD processors in a cluster with custom interconnect? (What did Cray
>> call it? I am drawing a blank.)
>>
> I suppose this is an insulting response, but here goes.
>
> You go to the drive-up window at McDonald's. There is a menu. You
> pick. How smart do you have to be to do that?
>
> Design a meal I've never thought of, that will be irresistibly
> appealing and inexpensive. How many yahoos can do that? Who works
> at IBM and at the national labs?
>
> Yes, I intend to be provocative. I have to pay [name deleted] so he
> can drop names and be a menu-chooser? Get out of town, all of you.
>
> This is comp.arch, not the hotline for EEs who think the world owes
> them.
>
> Robert.
>
I am an EE. I don't think the world owes me. I was disabused of that
notion long ago.

If you want a meal that hasn't been done before, at least for very many
folks, there are restaurants and chefs that will do the job. El Bulli,
for example. You just have to pay the money.

Now, given the constraints of the technology that exists today, what
should the folks with the money and capability build? It won't be
inexpensive, since it is new and the time of skilled folks must be paid
for.