From: "Andy "Krazy" Glew" on
On 4/1/2010 11:07 AM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
> (snip)
>
>> I never really knew how the 360/91 did register renaming.
>> I don't think it used a RAM style map. I think it used CAMs.
>
>> I actually asked Tomasulo this, but he never really answered
>> the question.
>
> Never having had anyone to ask, but only read about it in books,
> that sounds about right.

All I know is that I proposed having a separate pipestage to rename registers, using a RAM (SRAM) table indexed by
logical register number returning physical register number, in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
class - after he had taken us through Tomasulo and HPSm.

I.e. I proposed eliminating the CAMs, replacing them by a RAM and an additional pipestage.
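For readers who have not seen one, here is a minimal sketch of a RAM-style rename table of the kind described above. It is an illustration, not the design proposed in that class. The class name `RenameTable`, the free-list policy, and the one-instruction-per-call interface are all assumptions. The key point is that looking up a mapping is a plain array index by logical register number, with no associative search.

```python
from collections import deque


class RenameTable:
    """Hypothetical RAM-style register alias table: an array indexed by
    logical register number that returns a physical register number."""

    def __init__(self, num_logical: int, num_physical: int) -> None:
        assert num_physical >= num_logical
        # Initially logical register i is mapped to physical register i.
        self.table = list(range(num_logical))
        # The remaining physical registers are free for new destinations.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, srcs, dst):
        """Rename one instruction (assumption: one per call, no
        same-cycle dependencies between instructions).

        Returns (physical sources, new physical dest, old physical dest).
        The old dest would be returned to the free list at retirement.
        """
        # Read the sources BEFORE updating the destination, so that
        # "r1 = r1 + r2" reads the previous mapping of r1.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        new_dst = self.free_list.popleft()
        old_dst = self.table[dst]
        self.table[dst] = new_dst
        return phys_srcs, new_dst, old_dst
```

With 4 logical and 8 physical registers, renaming `r1 = r1 + r2` gives `([1, 2], 4, 1)`, and a following `r2 = r1 + r3` gives `([4, 3], 5, 2)`: the second instruction finds its dependency on the first by a simple table read. The cost, as noted above, is that the read-then-write of the table wants its own pipestage.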

The idea seemed new to everyone who encountered it. It was not universally accepted as good. Indeed, I remember arguing
with Tom Olson of AMD (if memory serves), who said that spending an extra pipestage was not a good idea.

I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the



> The explanation I have seen for the CDB, common data bus, was
> that results come out broadcast to all possible destinations.
> Those destinations expecting a result from that source accept it.
> Possible destinations are registers, reservation stations
> (for adders or multiply/divide), or to be written to main memory.
> Sources are results from arithmetic units, or data read from
> (750 ns, 16 way interleaved) main memory.

Many people say that the CDB was an important invention. I think it was a bad idea - long wires, CAMs.

Conceptually it is elegant, but implementation-wise it is a bad idea.
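To make the "long wires, CAMs" objection concrete, here is a rough software model of a CDB-style broadcast. The names `Operand`, `RSEntry`, and `cdb_broadcast` are invented for illustration, and only reservation stations are modeled (on the 360/91 the registers also watched the bus). The point is that every waiting operand in every station compares its tag against the broadcast tag; in hardware that loop is a set of parallel comparators, one per operand slot, all hanging off one bus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    tag: Optional[int]              # producer tag being waited on, None if ready
    value: Optional[float] = None


@dataclass
class RSEntry:
    op: str
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        return self.src1.tag is None and self.src2.tag is None


def cdb_broadcast(stations, tag, value):
    """Broadcast one (tag, value) result to every reservation station.

    Each waiting operand compares its stored tag with the broadcast tag
    (the CAM part) and captures the value on a match. The bus has to
    reach every entry (the long-wire part).
    """
    for entry in stations:
        for operand in (entry.src1, entry.src2):
            if operand.tag == tag:
                operand.tag = None
                operand.value = value
```

A single result can satisfy any number of waiting operands in one broadcast, which is the elegance; the price is that the number of comparators and the bus length grow with the number of stations.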

The important thing is taking that conceptually elegant CAM-ful idea, and implementing it in an efficient non-CAM manner.

The modern style of register renaming accomplishes this - certainly for the registers, but also, depending on the
system, for the reservation stations (if those are still being used).




> Among the not so obvious ones, if you store to memory and then
> refetch, register renaming will detect the same address is
> being used and go directly to the source. (No cache on the
> 360/91, it originated on the 360/85.)

I'd love to see a reference for this.

I believe that a UWisc patent on this was one of the things that resulted in a big payment from Intel to UWisc.

Myself, I thought it was obvious.
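What the quoted text describes is what is now usually called store-to-load forwarding. Whether the 360/91 did exactly this is the open question here, so the following is only a generic sketch of the idea, not the 360/91 mechanism. `StoreBuffer` and its methods are hypothetical, and it assumes all accesses are the same size and aligned, with every store address already known when a load arrives.

```python
class StoreBuffer:
    """Hypothetical buffer of pending stores (oldest first) that have
    not yet been written to memory."""

    def __init__(self) -> None:
        self.pending = []  # list of (address, value)

    def store(self, address: int, value: int) -> None:
        self.pending.append((address, value))

    def load(self, address: int, memory: dict) -> int:
        # Search youngest to oldest so the most recent store to this
        # address wins. In hardware this search is an address CAM.
        for st_addr, st_val in reversed(self.pending):
            if st_addr == address:
                return st_val  # forwarded: main memory is not accessed
        return memory.get(address, 0)

    def drain(self, memory: dict) -> None:
        # Write all pending stores to memory in program order.
        for addr, val in self.pending:
            memory[addr] = val
        self.pending.clear()
```

A load that follows a store to the same address gets the stored value straight from the buffer, without waiting for the (750 ns) memory round trip.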
From: "Andy "Krazy" Glew" on
On 4/1/2010 7:48 PM, MitchAlsup wrote:
> On Apr 1, 9:05 pm, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> wrote:
>> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> Any chance you could complete this sentence?
>
> Perhaps from {88100, 88110, 88120, crazy, insane, Asilomar
> participants, Hot Chips participants, all of the preceding?}

Got distracted, forgot to finish. Wasn't exactly sure I remembered what you were working on.

Remember the first time I met you, Mitch, and Willie Anderson? What were you working on? Memory bandwidth spreadsheets
for the 88110? SIMD vectors? I remember we talked about DRAM bank structure, and you made your usual "If DRAMs were
designed the way I want them to be designed..." speech. I remember that you were interested in Linpack, while I was
interested in OOO and GCC.
From: "Andy "Krazy" Glew" on
On 4/1/2010 9:31 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
> (snip)
>
>> All I know is that I proposed having a separate pipestage
>> to rename registers, using a RAM (SRAM) table indexed by
>> logical register number returning physical register number,
>> in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
>> class - after he had taken us through Tomasulo and HPSm.
>
>> I.e. I proposed eliminating the CAMs, replacing them by a
>> RAM and an additional pipestage.
>
> With the 360/91 system, though, values can easily have more than
> one destination. I suppose that could be done other ways,
> too, but it is especially convenient that way.

That's basically why P6 both renamed to physical registers, and had an RS with CAMs.

RAM style indexing for the big data structure.

CAMs for the relatively smaller RS, broadcast.

I've always regretted not totally eliminating the CAMs in the RS. Always meant to get around to it in P6 v2.0, but that
never happened.

(BTW, no, Willamette did not eliminate the CAMs. The bitmap scheduler is CAMs, but decoded CAMs rather than encoded CAMs.
Many people think that the term "CAM" only applies to encoded CAMs, but don't really have a name for the decoded CAMs,
e.g. 1-hots. Me, I think encoded vs. decoded is just a circuit trick.)
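A toy illustration of the encoded vs. decoded distinction, written for this post. It is not a model of the actual Willamette scheduler. In the encoded form each entry stores a binary tag and does a full compare. In the decoded form each entry stores a bit vector with one bit per possible producer, the broadcast is one-hot, and the "compare" collapses to ANDing a single bit. The function names are hypothetical. Both find the same entries, which is the sense in which the difference is a circuit trick.

```python
def encoded_wakeup(waiting_tags, broadcast_tag):
    """Encoded CAM: each entry holds a binary producer tag and compares
    all of its bits with the broadcast tag."""
    return [i for i, t in enumerate(waiting_tags) if t == broadcast_tag]


def decoded_wakeup(waiting_masks, broadcast_tag):
    """Decoded (1-hot) CAM: each entry holds one bit per possible
    producer; the broadcast drives exactly one line, so the match is
    a single AND of that line with the entry's bit."""
    one_hot = 1 << broadcast_tag
    return [i for i, m in enumerate(waiting_masks) if m & one_hot]


tags = [3, 5, 3]
masks = [1 << t for t in tags]  # same dependencies, decoded form
```

`encoded_wakeup(tags, 3)` and `decoded_wakeup(masks, 3)` both return `[0, 2]`. One side effect of the decoded form is that a single mask can record several producers at once, e.g. `(1 << 3) | (1 << 5)` for an entry waiting on both.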




>> The modern style of register renaming accomplishes this -
>> certainly for the registers, but also, depending on the
>> system, for the reservation stations (if those are still
>> being used).
>
> Logic was much more expensive then, than now, so the
> tradeoffs are likely different. If you used RAM tables
> with more than one entry for each source, you could do
> multiple destinations easily.

Right. The problem then has always been "how many destinations?", and "how do you handle exceeding the number of
destinations without (a) falling off a cliff, and (b) complexity?"
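A sketch of where the cliff comes from, under stated assumptions: a hypothetical RAM table with one row per producer tag and a fixed number of consumer slots per row (`ConsumerTable`, `slots`, and the stall-on-overflow policy are all invented for illustration). Wakeup is a direct row read with no tag compares, but the number of consumers per result is bounded by `slots`, and whatever happens past that bound is the hard part.

```python
class ConsumerTable:
    """Hypothetical RAM-indexed wakeup table: row = producer tag,
    at most `slots` consumers per row, no tag comparison on wakeup."""

    def __init__(self, num_tags: int, slots: int) -> None:
        self.slots = slots
        self.rows = [[] for _ in range(num_tags)]

    def add_consumer(self, producer_tag: int, consumer: tuple) -> bool:
        row = self.rows[producer_tag]
        if len(row) >= self.slots:
            # Out of destinations: the caller must stall or fall back to
            # some other mechanism. This is the cliff.
            return False
        row.append(consumer)
        return True

    def wakeup(self, producer_tag: int) -> list:
        # Index the row directly by tag, then clear it for reuse.
        woken = self.rows[producer_tag]
        self.rows[producer_tag] = []
        return woken
```

With `slots=2`, the first two consumers of tag 3 register fine and a third is refused, so the front end would have to stall even though the result itself is no harder to deliver. Making `slots` large enough to avoid that wastes RAM on the common case of one or two consumers.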



>
>>> Among the not so obvious ones, if you store to memory and then
>>> refetch, register renaming will detect the same address is
>>> being used and go directly to the source. (No cache on the
>>> 360/91, it originated on the 360/85.)
>
>> I'd love to see a reference for this.
>
> There is an issue of the IBM Journal of Research and
> Development pretty much devoted to the 91. I believe
> it is in there. The 91 is pretty much a favorite for
> books on pipelined processor design, mostly referencing
> that journal issue.

I practically memorized that issue. Not there that I remember. Likely we are talking about different things.