Larrabee delayed: anyone know what's happening? [Computer Architecture]


From: nmm1 on 9 Jan 2010 13:32

In article <hiae8b$pg5$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
>I once worked with an IBM marketing guy who had some great "laws".
>Things like "There is always a worst bug". I once compiled a few and
>posted them on my office wall.

His name wasn't Zorn, was it?

Regards,
Nick Maclaren.

From: Stephen Fuld on 9 Jan 2010 14:34

nmm1(a)cam.ac.uk wrote:
> In article <hiae8b$pg5$1(a)news.eternal-september.org>,
> Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>> I once worked with an IBM marketing guy who had some great "laws".
>> Things like "There is always a worst bug". I once compiled a few and
>> posted them on my office wall.
>
> His name wasn't Zorn, was it?

No, it was Bob(?) Hallem. He called them Hallem's Laws.

Another was an extension of Disraeli's "Mendacity Index"

"Lies, Damned Lies, Statistics, Development Schedules"

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From: nmm1 on 9 Jan 2010 15:02

In article <hialo8$rs6$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
>>> I once worked with an IBM marketing guy who had some great "laws".
>>> Things like "There is always a worst bug". I once compiled a few and
>>> posted them on my office wall.
>>
>> His name wasn't Zorn, was it?
>
>No, it was Bob(?) Hallem. He called them Hallem's Laws.

It was a very bad joke, I admit.

>Another was an extension of Disraeli's "Mendacity Index"
>
>"Lies, Damned Lies, Statistics, Development Schedules"

Rather like the one I use:

"Lies, Damned Lies, Statistics, Official Publications"

Regards,
Nick Maclaren.

From: Anne & Lynn Wheeler on 10 Jan 2010 15:56

Kai Harrekilde-Petersen <khp(a)harrekilde.dk> writes:
> SCI started out as the grand be-all-end-all cache-coherent
> super-thing, but it could also do non-coherent transfers, and at
> Dolphin ICS, we sure had more success in attracting customers to use
> the noncoherent transfers for clustering (or as an IO extension bus)
> than doing the cache coherence stuff.
>
> On the cc-SCI side, we only had DG as a customer, with their Numaliine
> series.

a couple items from long ago and far away (following also includes
Dolphin/Convex/Exemplar announcement):

Date: January 23, 1991
Subject: DOLPHIN SERVER ABANDONS ECL 88000 RISC PLAN

Computergram - Norsk Data A/S affiliate Dolphin Server Technology
A/S is re-focussing its effort to build a 1,000 MIPS, multi-processing
server based on an ECL version of the Motorola 88000 RISC chip. The
project, known as Orion, was originally slated for completion late in
1992, but the ECL CPU development effort has run into trouble - similar
problems have already bedevilled other RISC projects, most recently MIPS
Computer Systems' R6000 ECL part. Dolphin and Motorola collaborated on
the design of the ECL part, and National Semiconductor was to fabricate
it. Central to Dolphin's long-term plan has been the use of SCI, the
Scalable Coherent Interface bus architecture, which is similar to the
Futurebus+ system in concept, but has attracted a good deal less
attention. Dolphin already has the 88000-based Triton 88 server under
its belt, and now plans to introduce an interim Triton SCI system early
in 1992. It will be a 300 MIPS multi-processor system combining SCI,
cache and memory components from the Orion, with Motorola's much
previewed 88110 RISC chip. The Orion - now not expected until 1993 -
will use Motorola's post-88110 100MHz BiCMOS technology, rather than the
Dolphin-designed CPU. Dolphin, which says it has "peeped behind the
curtain and seen what Motorola is up to," has both feet firmly in the
Motorola camp, and expects single-chip, multi-processors with 100m
transistors clocking at 300MHz from the firm by the late 1990s, with a
4,000 MIPS part by the year 2000. Dolphin is awaiting final ratification
of an SCI standard from the IEEE - expected later this year - and will
then go straight into production of the Triton SCI. Dolphin is
implementing SCI in a Token Ring-like formation, which it claims, offers
up to five times the throughput of Futurebus+. On the Triton SCI Dolphin
will offer bridges to VME-based systems, to other types of SCI systems,
and may also develop links to Futurebus+. Enhancements planned for the
Triton 88 this year include the addition of Unix System V.4, Novell and
Banyan Vines networking support, increased storage options and a new
plug-in CPU board with up to five 88000 processors. Following its OEM
deal with Thomson-CSF SA subsidiary Cetia SA, Dolphin says it is now
finalising a European distribution channel, and will also make a UK
announcement soon. Dolphin claims an installed base of 225 Triton 88s.

.... snip ...

above mentions SCI in T/R-like ... offers five times the throughput of
Futurebus+

following very long & heavily "snipped" ... totally unrelated, SLACVM
was the original website outside of CERN:
http://www.slac.stanford.edu/history/earlyweb/history.shtml

Date: Tue, 23 Jun 1992 22:11 -0800 (PST)
From: DBG(a)SLACVM.SLAC.Stanford.EDU
Subject: Some online SCI documents
To: Distribution

Current status: the base standard is now approved by the IEEE as
IEEE Std 1596-1992. It originally went out for official ballot in
late January 91. Voters approved it by a 92% affirmative vote that
ended April 15. Final corrections and polishing were done, and
the revised draft was recirculated to the voters again and passed.
Draft 2.00 was approved by the IEEE Standards board on
18 March 1992. Pre-publication copies of the standard are
available from the IEEE Service Center, Piscataway, NJ, (800)678-4333.

Commercial products to support and use SCI are already in final design
and simulation, so the support chips should be available soon, 3Q92.
-
SCI-related documents are available electronically via anonymous FTP
from HPLSCI.HPL.HP.COM, except for a few documents which are paper only.
Online formats are Macintosh Word 4 (Compacted,self expg) and PostScript.
The PostScript includes Unix compressed and uncompressed forms.
Paper documents can be ordered from Kinko's 24hr copy Service, Palo Alto,
California, (415)328-3381. Various payment forms can be arranged.
Newcomers should order the latest mailing plus the package NEW, which
contains the most essential documents from previous mailings.
SCI depends on the IEEE 1212 CSR Architecture as well, so you will
also need a copy of that, which is available from the IEEE Service Ctr.
-
Send your name, mailing address, phone number, fax number, email
address, to me and I will put you on a list of people to be notified
when new mailings are available; you will also be listed in an
occasional directory of people who are participating in or observing
SCI development.
-
Contact:
-
David B. Gustavson
IEEE P1596 Chairman
Stanford Linear Accelerator Center
Computation Research Group
P.O.Box 4349, Bin 88
Stanford, CA 94309
415-926-2863 or dbg(a)slacvm.slac.stanford.edu

An SCI Extensions Study Group has been formed to consider what SCI-related
extensions to pursue and how to organize them into standards.

Related standards projects:

1212: Control and Status Register Architecture. This specification
defines the I/O architecture for SCI, Futurebus+ (896.x) and SerialBus
(P1394). Chaired by David V. James, Apple Computer, dvj(a)apple.com,
408-974-1321, fax 408-974-0781. An approved standard as of December
1991. Being published by the IEEE.

P1596.1: SCI/VME Bridge. This specification defines a bridge
architecture for interfacing VME buses to an SCI node. This will provide
early I/O support for SCI systems via VME. Products are likely to be
available in 1992. Chaired by Bjorn Solberg, CERN, CH-1211 Geneva 23,
Switzerland. bsolberg(a)dsy-srv3.cern.ch, ++41-22-767-2677, fax
++41-22-782-1820.

P1596.2: Cache Optimizations for Large Numbers of Processors using the
Scalable Coherent Interface. Develop request combining, tree-structured
coherence directories and fast data distribution mechanisms that may be
important for systems with thousands of processors, compatible with the
base SCI coherence mechanism. Chaired by Ross Johnson, U of Wisconsin,
ross(a)cs.wisc.edu, 608-262-6617, fax 608-262-9777.

P1596.3: Low-Voltage Differential Interface for the Scalable Coherent
Interface. Specify low-voltage (less than 1 volt) differential signals
suitable for high speed communication between CMOS, GaAs and BiCMOS
logic arrays used to implement SCI. The object is to enable low-cost
CMOS chips to be used for SCI implementations in workstations and PCs,
at speeds of at least 200 MBytes/sec. This work seems to have converged
on a signal swing of 0.25 V centered on +1 V. Chairman is Stephen
Kempainen, National Semiconductor, 408-721-2836, fax 408-721-7218.
asdksc(a)tevm2.nsc.com

P1596.4: High-Bandwidth Memory Interface, based on SCI Signalling
Technology. Define a high-bandwidth interface that will permit access to
the large internal bandwidth already available in dynamic memory chips.
The goal is to increase the performance and reduce the complexity of
memory systems by using a subset of the SCI protocols. Started by Hans
Wiggers of Hewlett Packard, current chairman is David Gustavson,
Stanford Linear Accelerator Center, 415-961-3539, fax 415-961-3530.

P1596.5: Data Transfer Formats Optimized for SCI. This working group has
defined a set of data types and formats that will work efficiently on
SCI for transferring data among heterogeneous processors in a
multiprocessor SCI system. The working group has finished, voting to
send the draft out for sponsor ballot. Chairman is David V. James, Apple
Computer, dvj(a)apple.com, 408-974-1321, fax 408-974-0781.

CONVEX SELECTS DOLPHIN'S SCI INTERCONNECT TECHNOLOGY FOR USE IN FUTURE
PROCESSORS.

According to a technology transfer agreement announced today, Dolphin
SCI Technology A.S, a subsidiary of Dolphin Server Technology A.S, will
share its Scalable Coherent Interface (SCI) technology with Convex
Computer Corporation for use in Convex's future generation supercomputers
currently under development. The agreement is initially worth several
hundred thousand USD to Dolphin, and includes intentions of future
cooperation between the two companies.

Convex is continuously working to develop new machines to strengthen the
company's supercomputer market position. Convex manufactures systems
which solve many of today's most demanding applications such as climate
modelling, genetic sequencing and computational fluid dynamics. Convex
recently announced a relationship with Hewlett-Packard which will result
in Convex's adoption of HP's PA-RISC processor technology to build these
future high performance machines.

The Scalable Coherent Interface is an enabling technology for
multiprocessor systems. With the current rapidly increasing RISC
microprocessor power, even the best of today's interconnect - or "bus" -
systems can only support small multiprocessor configurations. Buses are
inherently bottlenecks, because only one processor "talks" at a time,
and clock rates are limited by the physics of tapped transmission lines
with variable loading. Buses also scale poorly with system size because
propagation delays limit handshake and arbitration speed.

.... snip ...

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970

From: Steven G. Johnson on 13 Jan 2010 11:20

On Jan 4, 11:08 am, Thomas Womack <twom...(a)chiark.greenend.org.uk>
wrote:
> The use-17-instead-of-16 tricks still work, since you can often get
> bank clashes inside the L2 cache; I did various benchmarks of
> [120..129]x[120..129]x[120..129] FFTs with FFTW, and was initially
> slightly surprised to find that 128x128x128 was among the slowest.
> As far as I know, FFTW doesn't let you specify the data layout so you
> can't tell it to do a 128x128x128 FFT on data stored in the top part
> of a 129x129x128 box;

Actually, FFTW does allow arbitrary data layouts. You can use the
"advanced" interface with the "nembed" parameter to specify a smaller
multidimensional array embedded in a larger one, or the guru interface
for even more general layouts.
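
To illustrate the embedded-layout idea, here is a minimal sketch in plain
Python rather than FFTW's C API. In the advanced interface the "nembed"
arrays give the dimensions of the enclosing array, and elements are
addressed in row-major order using those dimensions instead of the
logical transform size. The helper name embedded_offset and the small
sizes are assumptions made for illustration; this is not FFTW code.

```python
def embedded_offset(index, nembed, stride=1):
    """Flat offset of a multi-index in a row-major array whose physical
    dimensions are nembed (the enclosing box), not the logical FFT size.
    Hypothetical helper, for illustration only."""
    offset = 0
    for i, dim in zip(index, nembed):
        # Row-major accumulation: scale by this axis's physical extent.
        offset = offset * dim + i
    return offset * stride


n = (4, 4, 4)        # logical transform size (stand-in for 128x128x128)
nembed = (5, 5, 4)   # physical box (stand-in for 129x129x128)

# Distance between consecutive planes along the first axis: the physical
# extents give 5 * 4 = 20 elements, where a tightly packed 4x4x4 array
# would give 4 * 4 = 16.
plane_stride = embedded_offset((1, 0, 0), nembed) - embedded_offset((0, 0, 0), nembed)
```

Only the addressing changes: the transform still covers n points per
axis, while the padding elements are skipped over.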

Even if you use a 128x128x128 array, FFTW will in some cases do a
sequence of the discontiguous subtransforms by copying a few of them
at a time to a contiguous buffer, and similar tricks to avoid cache-
line conflicts.
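
A rough sketch of why a power-of-two row length can hurt, using a toy
model of a set-indexed cache. The cache parameters (64-byte lines, 512
sets) and the function name sets_touched are assumptions chosen for
illustration and do not describe any particular machine; elements are
taken to be 16-byte complex doubles.

```python
def sets_touched(row_len, n_rows, elem_bytes=16, line_bytes=64, n_sets=512):
    """Count the distinct cache sets hit when reading one element from
    each of n_rows rows (a column walk) of a row-major array.
    Toy model: set index = (byte address // line size) mod number of sets."""
    return len({
        (r * row_len * elem_bytes // line_bytes) % n_sets
        for r in range(n_rows)
    })


unpadded = sets_touched(128, 128)  # rows are 2048 bytes apart
padded = sets_touched(129, 128)    # rows are 2064 bytes apart
```

In this model the 128-wide walk lands on only 16 of the 512 sets, so the
128 lines compete for the same few sets, while the 129-wide walk spreads
over 128 distinct sets. Copying subtransforms into a contiguous buffer
sidesteps the same problem without changing the array's layout.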

> I believe an early version of the manual said
> that they had often found performance improvements by doing this but
> couldn't figure out how to exploit them in a product with a
> comprehensible interface.

What you're referring to is that at one point we found performance
improvements (I believe on an IBM RS/6000, if I remember correctly),
by inserting padding into the middle of a *one-dimensional* array
(again to avoid cache conflicts), but it didn't seem like there was a
sane interface for specifying a 1d array with padding in the middle.

Regards,
Steven G. Johnson
