From: Weng Tianxiang on
On Apr 1, 7:05 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:
> On 4/1/2010 11:07 AM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>  wrote:
> > (snip)
>
> >> I never really knew how the 360/91 did register renaming.
> >> I don't think it used a RAM style map.  I think it used CAMs.
>
> >> I actually asked Tomasulo this, but he never really answered
> >> the question.
>
> > Never having had anyone to ask, but only read about it in books,
> > that sounds about right.
>
> All I know is that I proposed having a separate pipestage to rename registers, using a RAM (SRAM) table indexed by
> logical register number returning physical register number, in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
> class - after he had taken us through Tomasulo and HPSm.
>
> I.e. I proposed eliminating the CAMs, replacing them by a RAM and an additional pipestage.
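
For readers following along, here is a minimal sketch of what a RAM-style rename map like the one described above might look like. It is only an illustration under assumptions: the class name `RenameMap`, the free-list policy, and the register counts are all hypothetical and not taken from any of the machines discussed. The point is that source lookup is a plain indexed read of a table, with no associative search.

```python
# Hypothetical sketch: RAM-style register rename map (names and sizes are assumptions).
class RenameMap:
    def __init__(self, num_logical, num_physical):
        assert num_physical >= num_logical
        # RAM table: index = logical register number, value = physical register number.
        self.table = list(range(num_logical))
        # Physical registers not currently mapped are available for allocation.
        self.free = list(range(num_logical, num_physical))

    def rename(self, dst, srcs):
        """Rename one instruction: read sources, then allocate a new destination."""
        # Sources are plain indexed reads of the table; no associative (CAM) search.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; rename stage must stall")
        new_phys = self.free.pop(0)
        old_phys = self.table[dst]  # would be freed when this instruction retires
        self.table[dst] = new_phys
        return new_phys, phys_srcs, old_phys


rm = RenameMap(num_logical=4, num_physical=8)
# r1 = r2 + r3, then r1 = r1 + r2: the second write to r1 gets a fresh physical register.
first = rm.rename(dst=1, srcs=[2, 3])
second = rm.rename(dst=1, srcs=[1, 2])
print(first, second)
```

The second instruction reads r1 through the table and so picks up the physical register allocated by the first, which is the dependence tracking that a CAM would otherwise have to find by searching.
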
>
> The idea seemed new to everyone who encountered it. It was not universally accepted as good.  Indeed, I remember arguing
> with Tom Olson of AMD (if memory serves), who said that spending an extra pipestage was not a good idea.
>
> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> > The explanation I have seen for the CDB, common data bus, was
> > that results come out broadcast to all possible destinations.
> > Those destinations expecting a result from that source accept it.
> > Possible destinations are registers, reservation stations
> > (for adders or multiply/divide), or to be written to main memory.
> > Sources are results from arithmetic units, or data read from
> > (750 ns, 16 way interleaved) main memory.
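
As a rough behavioral illustration of the broadcast described above, and not a model of the real 360/91 hardware, the sketch below has every waiting destination compare the broadcast tag against the tag it is waiting on. The names `Waiter` and `broadcast` and the tag strings are hypothetical.

```python
# Hypothetical sketch of a common-data-bus broadcast (all names are illustrative).
class Waiter:
    """A destination (register, reservation-station operand, store) awaiting a result."""
    def __init__(self, name, tag):
        self.name = name
        self.tag = tag      # which producer this destination is waiting on
        self.value = None   # filled in when the matching result is broadcast


def broadcast(waiters, tag, value):
    """Drive (tag, value) to every destination; each one compares tags itself."""
    captured = []
    for w in waiters:  # in hardware, all of these comparisons happen in parallel
        if w.value is None and w.tag == tag:
            w.value = value
            w.tag = None
            captured.append(w.name)
    return captured


waiters = [
    Waiter("F0", tag="ADD1"),
    Waiter("MUL1.op2", tag="ADD1"),
    Waiter("F2", tag="LOAD3"),
]
got = broadcast(waiters, "ADD1", 3.5)
print(got)
```

The loop over all waiters is the software stand-in for one comparator per destination, which is where the CAM and long-wire cost comes from.
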
>
> Many people say that the CDB was an important invention.  I think it was a bad idea - long wires, CAMs.
>
> Conceptually it is elegant, but implementation wise it is a bad idea.
>
> The important thing is taking that conceptually elegant CAM-ful idea, and implementing it in an efficient non-CAM manner.
>
> The modern style of register renaming accomplishes this - certainly for the registers, but also, depending on the
> system, for the reservation stations (if those are still being used).
>
> > Among the not so obvious ones, if you store to memory and then
> > refetch, register renaming will detect the same address is
> > being used and go directly to the source.  (No cache on the
> > 360/91, it originated on the 360/85.)
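
To make the store-then-refetch case concrete, here is a minimal sketch of forwarding from a pending store to a later load of the same address. This is an assumption-laden illustration of the general idea only: it does not claim to reproduce the 360/91's logic or anything covered by a patent, and `StoreBuffer` and its methods are hypothetical names.

```python
# Hypothetical sketch of store-to-load forwarding (not the 360/91's actual logic).
class StoreBuffer:
    def __init__(self, memory):
        self.memory = memory  # dict: address -> value, stands in for main memory
        self.pending = []     # stores not yet written back, oldest first

    def store(self, addr, value):
        self.pending.append((addr, value))

    def load(self, addr):
        """Return (value, forwarded); forwarded is True if a pending store supplied it."""
        for a, v in reversed(self.pending):  # youngest matching store wins
            if a == addr:
                return v, True
        return self.memory.get(addr, 0), False

    def drain(self):
        """Write all pending stores to memory in program order."""
        for a, v in self.pending:
            self.memory[a] = v
        self.pending.clear()


sb = StoreBuffer({0x100: 7})
sb.store(0x200, 42)
hit = sb.load(0x200)   # address matches a pending store: value comes from the buffer
miss = sb.load(0x100)  # no pending store: value comes from memory
print(hit, miss)
```
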
>
> I'd love to see a reference for this.
>
> I believe that a UWisc patent on this was one of the things that resulted in a big payment from Intel to UWisc.
>
> Myself, I thought it was obvious.

Hi Andy,
That is an insightful opinion.

Can you tell me the UWisc patent number or its title?

I have a design that is expected to work inside a core of a modern
multiprocessor, at clock rates above 3 GHz,
and its output drives one target.

The design can be implemented in two ways:
1. One source always drives the one target, which uses a lot of power;
2. 16 sources selectively share a common output bus to drive the
target, which uses much less power.

The output must be finished within 1 clock cycle.

Which implementation is wiser in the real world?

In other words, is having 16 sources selectively drive a common
output bus with one target
a sound implementation at more than 3 GHz?

Thank you.

Weng
From: MitchAlsup on
On Apr 3, 12:19 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:
> On 4/1/2010 7:48 PM, MitchAlsup wrote:
>
> > On Apr 1, 9:05 pm, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> > wrote:
> >> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> > Any chance you could complete this sentence?
>
> > Perhaps from {88100, 88110, 88120, crazy, insane, Asilomar
> > participants, Hot Chips participants, all of the preceding?}
>
> Got distracted, forgot to finish.  Wasn't exactly sure I remembered what you were working on.
>
> Remember the first time I met you, Mitch, and Willie Anderson? What were you working on?  Memory bandwidth spreadsheets
> for the 88110? SIMD vectors?  I remember we talked about DRAM bank structure, and you made your usual "If DRAMs were
> designed the way I want them to be designed..." speech.  I remember that you were interested in Linpack, while I was
> interested in OOO and GCC.

Willie was on the 88110.
Sounds like I was already on the 88120.
As to DRAM, see USPTO 5367494.

It was not so much that I was concentrating on Linpack. We (Shebanow
and I) were trying to build a machine that could perform as if it were
a vector machine on vectorizable codes (without vector instructions,
i.e. native 88100 instructions at 6 per cycle) and also perform well
on GCC-like spaghetti codes. Linpack (Matrix 300) was simply the
vector code example.

Mitch