From: "Andy "Krazy" Glew" on 10 Mar 2010 01:59

Robert Myers wrote:
> A machine that achieves 5% efficiency doing a bog-standard problem has
> little right to claim to be super in any respect. That's my mantra.

I've been lurking and listening and learning, but, hold on: this is silly.

Utilization has nothing to do with "super-ness". Cost effectiveness is what matters.

When you say 5% efficient, I assume that you are talking about it as a fraction of peak flops.

But if the flops are cheap compared to the bandwidth, then it may very well make sense to add lots of flops. You might want to add more bandwidth, but if you can add a 1%-utilized flop and eke a little bit more out of the expensive interconnect...

What you really need to do is show that there are better architectures that deliver the bandwidth you want as well as getting better utilization.

OK, now I'll step back and listen some more.
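[Andy's distinction between utilization and cost effectiveness can be made concrete with a toy calculation. This is a sketch with invented numbers, not figures from the thread: utilization is just sustained/peak flops, and a machine with much lower utilization can still deliver each sustained flop more cheaply.]

```python
# Toy comparison (all numbers invented for illustration): utilization
# alone doesn't tell you which machine delivers flops more cheaply.

def utilization(sustained_gflops, peak_gflops):
    """Fraction of peak actually achieved on the workload."""
    return sustained_gflops / peak_gflops

def cost_per_sustained_gflop(node_cost, sustained_gflops):
    """Dollars paid per GFLOP/s actually delivered."""
    return node_cost / sustained_gflops

# Machine A: modest peak, high utilization.
a_util = utilization(sustained_gflops=40.0, peak_gflops=50.0)     # 80%
a_cost = cost_per_sustained_gflop(4000.0, 40.0)                   # $100

# Machine B: huge cheap peak, an "embarrassing" 5% utilization.
b_util = utilization(sustained_gflops=50.0, peak_gflops=1000.0)   # 5%
b_cost = cost_per_sustained_gflop(2000.0, 50.0)                   # $40

print(f"A: {a_util:.0%} utilized, ${a_cost:.0f}/sustained GFLOP")
print(f"B: {b_util:.0%} utilized, ${b_cost:.0f}/sustained GFLOP")
# B is far less "efficient" yet delivers each sustained GFLOP more cheaply.
```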
From: "Andy "Krazy" Glew" on 10 Mar 2010 02:00

nedbrek wrote:
> Hello all,
>
> "Del Cecchi" <delcecchinospamofthenorth(a)gmail.com> wrote in message
> news:7viusjFjaqU1(a)mid.individual.net...
>> Andy "Krazy" Glew wrote:
>>> If you are a computer architect, it's Intel in Oregon, Silicon Valley,
>>> Austin. Where else?
>> Perhaps IBM in Rochester MN or maybe even Mayo Clinic, Rochester. The
>> clinic does a lot of special stuff for medical equipment and Dr Barry
>> Gilbert had a group that did high speed stuff for Darpa.
>
> I sometimes fantasize about going to Intel Israel.

I think that no longer having the prospect of visiting Haifa is the only thing I really miss about Intel.
From: Terje Mathisen <"terje.mathisen at tmsw.no"> on 10 Mar 2010 02:16

Andy "Krazy" Glew wrote:
> Robert Myers wrote:
>> A machine that achieves 5% efficiency doing a bog-standard problem has
>> little right to claim to be super in any respect. That's my mantra.
>
> I've been lurking and listening and learning, but, hold on: this is silly.
>
> Utilization has nothing to do with "super-ness". Cost effectiveness is
> what matters.

Exactly; cf. my post about splitting seismic processing into two parts, on two different architectures.

> When you say 5% efficient, I assume that you are talking about it as a
> fraction of peak flops.

Probably.

> But if the flops are cheap compared to the bandwidth, then it may very
> well make sense to add lots of flops. You might want to add more
> bandwidth, but if you can add a 1%-utilized flop and eke a little bit
> more out of the expensive interconnect...

This is crucial: Intel (Larrabee) and others have shown that you can put an awful lot of flops on a single die while still staying within a reasonable power budget.

If having a teraflop available in a $200 chip makes it possible to get 10% better use out of the $2000 (per board) interconnect, then that's a good bargain.

> What you really need to do is show that there are better architectures
> that deliver the bandwidth you want as well as getting better utilization.
>
> OK, now I'll step back and listen some more.

As will I. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
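[Terje's $200-chip-versus-$2000-interconnect trade reduces to a simple break-even test. A sketch, using his hypothetical figures: adding cheap flops improves throughput-per-dollar exactly when the fractional throughput gain exceeds the fractional cost increase, since (1+g)/(base+addon) > 1/base simplifies to g > addon/base.]

```python
# Back-of-envelope check of the "$200 chip vs $2000 interconnect" trade
# (hypothetical figures from the thread). Adding cheap flops pays off
# whenever the fractional gain in delivered work exceeds the fractional
# increase in board cost.

def worth_adding(addon_cost, base_cost, throughput_gain):
    """True if throughput-per-dollar improves after the add-on.

    throughput_gain: e.g. 0.10 means 10% more work per unit time.
    """
    cost_increase = addon_cost / base_cost
    return throughput_gain > cost_increase

base = 2000.0   # per-board interconnect cost
chip = 200.0    # teraflop accelerator

print(worth_adding(chip, base, 0.10))   # 10% gain vs 10% cost: break-even
print(worth_adding(chip, base, 0.15))   # 15% gain: a genuine bargain
```

With these exact numbers a 10% improvement is only break-even; the trade becomes a clear win once the gain in interconnect utilization outruns the 10% cost bump.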
From: Robert Myers on 10 Mar 2010 11:51

On Mar 10, 2:16 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> Robert Myers wrote:
>>> A machine that achieves 5% efficiency doing a bog-standard problem has
>>> little right to claim to be super in any respect. That's my mantra.
>>
>> I've been lurking and listening and learning, but, hold on: this is silly.
>>
>> Utilization has nothing to do with "super-ness". Cost effectiveness is
>> what matters.
>
> Exactly; cf. my post about splitting seismic processing into two parts,
> on two different architectures.

A commercial customer can use cost-effectiveness as the sole criterion, but my argument, which applies specifically to the national security establishment of the US, is that the cost-effectiveness argument is leading us down a blind alley so far as basic science is concerned. Historically, basic science has been funded by sovereigns or others with no immediate commercial goal.

I focus on the FFT because there I am certain of my footing. In a broader sense that I can't defend with such specificity: if all your computer designs force localized computation, or favor it heavily, you will skew the kind of science that you can and will do.

>> When you say 5% efficient, I assume that you are talking about it as a
>> fraction of peak flops.
>
> Probably.

Some people use fraction of peak flops, but, to make things look less bad, fraction of Linpack flops is becoming more popular.

>> But if the flops are cheap compared to the bandwidth, then it may very
>> well make sense to add lots of flops. You might want to add more
>> bandwidth, but if you can add a 1%-utilized flop and eke a little bit
>> more out of the expensive interconnect...
>
> This is crucial: Intel (Larrabee) and others have shown that you can put
> an awful lot of flops on a single die while still staying within a
> reasonable power budget.
>
> If having a teraflop available in a $200 chip makes it possible to get
> 10% better use out of the $2000 (per board) interconnect, then that's a
> good bargain.

Maybe it is and maybe it isn't. If you've skewed the design of the entire architecture (which includes not just the nodes and the interconnect, but power budgets and supporting infrastructure, cooling, floor space, etc.) to favor beating the living daylights out of small local datasets, the burden is on you to show how free it really is.

My argument is that the scalability of current machines is a P.T. Barnum hoax with very little scientific payoff. Those huge buildings and staffs are expensive. Keeping a gigantic cluster running is itself hugely expensive, even with gigabit Ethernet as an interconnect. I'm not saying that we should stop building such machines altogether, but I am (and have been for a long time) waving a very big red flag. People may intuitively think that if you just pile up enough flops in one place you can do any kind of math you want, but that intuition is dangerous. When I start down this road, I will inevitably start to sputter, because I feel so outnumbered.

No such thing as a free lunch. If the interesting physics is in 512 nodes, then that's all you should bother with. If you think you can do very large nonlinear problems and get the physics right with localized computation, my position is that you are fooling only yourself.

>> What you really need to do is show that there are better architectures
>> that deliver the bandwidth you want as well as getting better utilization.
>>
>> OK, now I'll step back and listen some more.
>
> As will I. :-)

There's a chunk of this history I'm missing because I was not paying any attention at all at the time. People *appear* to have given up after the KSR failure, and one of the problems there (aside from accounting "irregularities") was with scalability being limited by the very interconnect that was supposed to allow the machine to scale. If the limit on interconnects is fundamental, I'd sure like to understand why.

Robert.
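[Robert's choice of the FFT as his test case rests on a concrete property: every output bin of a discrete Fourier transform depends on every input sample, so a distributed FFT forces all-to-all communication no matter how the data is partitioned across nodes. A minimal sketch of that dependence, using a naive O(n^2) DFT rather than a fast transform:]

```python
# Why the FFT stresses global bandwidth: each DFT output bin is a sum
# over ALL input samples, so perturbing one "local" sample changes
# every bin. On a cluster that data dependence becomes all-to-all
# traffic across the interconnect.

import cmath

def dft(x):
    """Naive discrete Fourier transform, O(n^2), stdlib only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

n = 8
x = [0.0] * n
base = dft(x)

# Perturb a single sample...
x[3] = 1.0
bumped = dft(x)

# ...and count how many of the n output bins change.
changed = sum(abs(b - a) > 1e-12 for a, b in zip(base, bumped))
print(f"{changed} of {n} bins affected")  # all 8 bins change
```

Stencil-style physics codes touch only neighboring points per step, which is why they thrive on machines with weak global bandwidth; the DFT's fully dense dependence pattern is exactly what such machines penalize.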
From: Robert Myers on 10 Mar 2010 12:02
On Mar 9, 10:13 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
> I am an EE. I don't think the world owes me. I have been disabused of
> that notion long ago. If you want a meal that hasn't been done before,
> at least for very many folks, there are restaurants and chefs that will
> do the job. El Bulli, for example. You just have to pay the money.

You're not an EE who feels entitled, and I'm not someone whose experience is so limited as never to have seen just how many smart people there are in the world, how hard they work, or what *really* smart really amounts to.

My read on computer architecture is that it is obsessed with latency and has been for a long time. Latency across the network bridge is what killed KSR scalability. You can, and we have, argued over just how effectively the memory latency problem has been addressed. My complaint about Blue Gene is that the latency for such a large cluster is amazing, but the global bandwidth is pathetic. You've already hinted that the limitation may be fundamental, at least for that design: the boxes were already stuffed full of wires.

> Now, given the constraints of the technology that exists today, what
> should the folks with the money and capability build? It won't be
> inexpensive, since it is new and the time of skilled folks must be paid for.

There is no doubt that it will be expensive. My argument is that, at this point, there is little reason to build more of the same, other perhaps than to gain energy efficiency.

Robert.