From: nmm1 on 4 Jan 2010 04:43

In article <7qd7bmFfo9U1(a)mid.individual.net>,
Del Cecchi <delcecchi(a)gmail.com> wrote:
>
>You could use the provided hardware scatter-gather if you were astute
>enough to use InfiniBand interconnect. :-)

And program right down on the bare metal! When I last investigated, there was a distinct shortage of support for InfiniBand's fancy features, including that one. As I read it, almost everyone used OpenIB for commodity clusters, and support for that feature was merely a glint in the eye of the developers. And had been for some years.

The trouble with InfiniBand is that it was designed by committee, and is complicated beyond all reason. Even more than SCSI, much of it is likely never to be implemented or used.

Regards,
Nick Maclaren.
From: Thomas Womack on 4 Jan 2010 04:56

In article <b7dd84e0-0cbc-41c6-8a85-b51197c2d960(a)e27g2000yqd.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Jan 3, 11:25 pm, wclod...(a)lost-alamos.pet (William Clodius) wrote:
>
>> I think Nick is saying that to improve locality it is essential to
>> transpose the dimensions of the array as you cycle through each
>> dimension in a multi-dimensional array in a multi-dimensional FFT.
>
>But it sure wasn't necessary on a Cray-1--a side benefit of not having
>cache and having easily worked-around limitations on an arbitrary
>stride in memory. But, of course, none of this matters any more,
>because no one has to bother with multi-dimensional FFT's. Or, at
>least, they'd better not.

I can't read this as other than obtuse; yes, you don't need these tricks if main memory is the same speed as your processor, and indeed if you're running on an i7 a job small enough to fit on a Cray-1 you probably don't need many of them - the level-3 cache is eight megabytes, the size of Cray-1 main memory, and the other caches are reasonably good at batching up accesses to L3.

The place I work for does 3D FFTs from morning until night, but since nobody knows how to grow large flawless protein crystals, the _size_ of the FFTs hasn't changed since it was difficult to fit them in a Cray-1; FFTs on data small enough to fit in the shared memory of a current cheap SMP are really quite fast and parallelise quite reasonably, and (much more importantly) Frigo & Johnson at MIT have done the parallelisation and FFTW is not expensive even for commercial users.

Tom
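[The transpose trick William Clodius alludes to can be sketched in a few lines of numpy - a hypothetical illustration, not anyone's production code: a 2-D FFT becomes unit-stride 1-D FFTs along rows, one explicit transpose, and a second set of row FFTs, so no pass ever strides through memory.]

```python
import numpy as np

def fft2_by_transpose(a):
    """2-D FFT via row FFTs + transpose + row FFTs (locality sketch)."""
    a = np.fft.fft(a, axis=1)        # 1-D FFTs over contiguous rows
    a = np.ascontiguousarray(a.T)    # explicit transpose: columns become rows
    a = np.fft.fft(a, axis=1)        # second set of unit-stride row FFTs
    return a.T                       # transpose back to the original layout

x = np.random.rand(64, 64) + 1j * np.random.rand(64, 64)
assert np.allclose(fft2_by_transpose(x), np.fft.fft2(x))
```

[On a Cray-1, with flat memory and hardware strided loads, the column pass could be done in place; on a cached machine the explicit transpose is what keeps every 1-D pass cache-friendly.]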
From: nmm1 on 4 Jan 2010 06:40

In article <CGc*A7g0s(a)news.chiark.greenend.org.uk>,
Thomas Womack <twomack(a)chiark.greenend.org.uk> wrote:
>
>The place I work for does 3D FFTs from morning until night, but since
>nobody knows how to grow large flawless protein crystals, the _size_
>of the FFTs hasn't changed since it was difficult to fit them in a
>Cray-1; FFTs on data small enough to fit in the shared memory of a
>current cheap SMP are really quite fast and parallelise quite
>reasonably, and (much more importantly) Frigo & Johnson at MIT have done
>the parallelisation and FFTW is not expensive even for commercial
>users.

When I last looked, the performance wasn't brilliant, but that was simply because FFTs are excellent memory stress testers. It was as good as you could expect, given the limitations of the memory subsystem.

I am merely one of many thousands of people who have tried to work out how to parallelise multi-dimensional FFTs when communication is the limit, and failed to think of anything new.

Regards,
Nick Maclaren.
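[Why communication is the limit can be seen in a serial sketch of a slab-decomposed distributed 3-D FFT - a hypothetical illustration, where `nranks` and the split/concatenate pair stand in for ranks and an MPI all-to-all: all the arithmetic is local, but the global transpose between the two FFT phases must move essentially the whole array across the interconnect.]

```python
import numpy as np

def fft3_slab(a, nranks=4):
    """Serial simulation of a slab-decomposed parallel 3-D FFT."""
    # Each "rank" owns a slab of planes along axis 0.
    slabs = np.split(a, nranks, axis=0)
    # Local work: 2-D FFTs over the two axes each rank holds entirely.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]
    # Global transpose -- in a real code this is the all-to-all that
    # dominates the cost once the local FFTs are fast enough.
    full = np.concatenate(slabs, axis=0)
    slabs = np.split(full, nranks, axis=1)
    # Remaining 1-D FFTs along the formerly distributed axis.
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    return np.concatenate(slabs, axis=1)

x = np.random.rand(16, 16, 16) + 1j * np.random.rand(16, 16, 16)
assert np.allclose(fft3_slab(x), np.fft.fftn(x))
```

[The flop count per element is O(log n) while the transpose moves every element once per redistribution, which is why no amount of cleverness in the local FFTs changes the communication-bound regime.]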
From: Anne & Lynn Wheeler on 4 Jan 2010 07:17

medusa was all about footprint, cooling (handling heat as more & more was compressed into smaller area), and interconnect ... and we were doing cluster scaleup as part of ha/cmp ... which also was heavily into availability
http://www.garlic.com/~lynn/subtopic.html#hacmp

in fact, when I was out marketing ha/cmp, I had coined the terms "disaster survivability" and "geographic survivability" to differentiate from disaster/recovery
http://www.garlic.com/~lynn/submain.html#available

primary person behind medusa was the engineer that I previously mentioned involved with 6000 serial-link-adaptor (SLA) and then worked in FCS committee ... recent reference:
http://www.garlic.com/~lynn/2009s.html#32 Larrabee delayed: anyone know what's happening?

misc. old medusa email &/or cluster scaleup
http://www.garlic.com/~lynn/lhwemail.html#medusa

this is old email (lots of stuff about working with LLNL &/or parties doing stuff for LLNL) ... possibly just hrs before being told it was being transferred and we weren't to work on anything with more than 4 processors
http://www.garlic.com/~lynn/2006x.html#email920129

and then a little over 2 weeks later, announcement (only for *scientific and technical*)
http://www.garlic.com/~lynn/2001n.html#6000clusters1

and then later that summer (about the time we were leaving) ... *clusters caught us by surprise*
http://www.garlic.com/~lynn/2001n.html#6000clusters2

other recent threads mentioning above:
http://www.garlic.com/~lynn/2009o.html#81 big iron mainframe vs. x86 servers
http://www.garlic.com/~lynn/2009p.html#54 big iron mainframe vs. x86 servers
http://www.garlic.com/~lynn/2009p.html#55 MasPar compiler and simulator
http://www.garlic.com/~lynn/2009p.html#85 Anyone going to Supercomputers '09 in Portland?
http://www.garlic.com/~lynn/2009q.html#27 Supercomputers Are Still Fast, but Less Super
http://www.garlic.com/~lynn/2009q.html#33 Check out Computer glitch to cause flight delays across U.S.
as previously mentioned ... some connection between *cluster scaleup* and *electronic commerce* ... reference to jan92 meeting about cluster scaleup for commercial dataprocessing
http://www.garlic.com/~lynn/95.html#13

two of the people in that meeting later leave and show up at a small client/server startup responsible for something called "commerce server", and we were brought in because they wanted to do payment transactions on the server. The "commerce server" started out as a collection of servers providing a multi-store "mall" paradigm implemented with a large oracle dbms backend. the startup had also invented this technology called "SSL" that they wanted to use.

Now before we had left ... besides ha/cmp, FCS, HIPPI and misc. other stuff ... we also played some in SCI. Part of the ha/cmp issue involved purity of "801" and simplified hardware ... and a major 801 simplification theme was no cache coherency (ruling out shared-memory multiprocessing). Since there was no cache coherency & multiprocessing ... that forced scaleup to a purely interconnect solution.

While doing electronic commerce ... we also talked some to Convex about their Exemplar SCI implementation (using HP risc chips). Then HP acquired both Verifone (a major point-of-sale terminal vendor that was looking at moving into the electronic commerce value chain) and Convex ... and we spent some amount of time at HP, talking to the respective responsible executives.

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: "Andy "Krazy" Glew" on 4 Jan 2010 08:42
> You could use the provided hardware scatter-gather if you were astute
> enough to use InfiniBand interconnect. :-)
>
> del
>
> you can lead a horse to water but you can't make him give up ethernet.

Del: What's the story on Infiniband?