From: glen herrmannsfeldt on
In comp.arch.fpga Jason Zheng <Xin.Zheng(a)jpl.nasa.gov> wrote:
(snip)

> My favorite is the Translation Look-aside Buffers (TLB), of course
> invented by the IBM engineers. You have to appreciate the way it sounds
> (and its irrelevance to its true purpose).

If you read the IBM description of virtual storage, you would
first believe that it went to the segment and page tables for
each reference. That would make everything three times slower,
so there is the TLB to speed things up. Always interesting to
me is that the IBM name stuck, unlike many IBM names.
(Data set, IPL, to name two.)
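
To make the "three times slower" point concrete, here is a minimal Python
sketch, not IBM's actual mechanism. The names (Translator, PAGE_BITS,
SEG_SHIFT), the field widths, and the LRU replacement are all assumptions
for illustration. A miss costs two table references plus the data
reference; a hit costs only the data reference.

```python
from collections import OrderedDict

PAGE_BITS = 12   # assumed 4 KiB pages (illustrative, not from the thread)
SEG_SHIFT = 8    # assumed 256 pages per segment (illustrative)

class Translator:
    """Hypothetical two-level translation with a small LRU TLB in front."""

    def __init__(self, segment_table, tlb_entries=8):
        self.segment_table = segment_table  # segment number -> page table (dict)
        self.tlb = OrderedDict()            # virtual page number -> frame number
        self.tlb_entries = tlb_entries
        self.memory_refs = 0                # storage references made so far

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.tlb:
            # Hit: no table references needed.
            self.tlb.move_to_end(vpn)
            frame = self.tlb[vpn]
        else:
            # Miss: one reference to the segment table, one to the page table.
            seg = vpn >> SEG_SHIFT
            page = vpn & ((1 << SEG_SHIFT) - 1)
            page_table = self.segment_table[seg]
            frame = page_table[page]
            self.memory_refs += 2
            self.tlb[vpn] = frame
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict least recently used entry
        self.memory_refs += 1                 # the data reference itself
        return (frame << PAGE_BITS) | offset

    def purge(self):
        """Discard all cached translations."""
        self.tlb.clear()
```

Calling translate() twice on addresses in the same page bumps memory_refs
by 3 and then by 1; after purge() the next reference pays the full 3 again.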

The TLB is carefully documented by IBM, including the PTLB
instruction. (Purge TLB.) On the other hand, IBM doesn't
document much about the data and/or instruction cache, leaving
that up to the implementations to get right. Also, regarding
virtual storage, there is the STO (segment table origin) cache
that is also not documented by the architecture, but needed
to speed things up in the case of multiple address spaces.

-- glen

From: glen herrmannsfeldt on
In comp.arch.fpga "Andy \"Krazy\" Glew" <ag-news(a)patten-glew.net> wrote:
> The two hardware datastructures supporting out of order execution:

> Reservation stations.

> And, less beautifully, the register renaming map.

Both from the IBM 360/91, as far as I know.

S/360 has only four floating point registers, so register
renaming was pretty important for out-of-order execution.

OK, how about imprecise interrupts?

-- glen
From: nmm1 on
In article <houi8s$rdm$1(a)naig.caltech.edu>,
glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
>In comp.arch.fpga "Andy \"Krazy\" Glew" <ag-news(a)patten-glew.net> wrote:
>> The two hardware datastructures supporting out of order execution:
>
>> Reservation stations.
>
>> And, less beautifully, the register renaming map.
>
>Both from the IBM 360/91, as far as I know.
>
>S/360 has only four floating point registers, so register
>renaming was pretty important for out-of-order execution.
>
>OK, how about imprecise interrupts?

Not a problem, until you try to resume after trapping them :-)

And the reason they were a problem was that they DIDN'T have a
lot of data structure to support them ....

I like them, as a design methodology, but only if integrated into
a restartable code sequence design and/or NOT used for anything
that might need resumption. E.g. one of the Alpha's most stupid
mistakes was to try and merge them with the use of interrupts for
supporting IEEE's edge cases. The 8087 was just plain idiotic.


Regards,
Nick Maclaren.
From: glen herrmannsfeldt on
In comp.arch.fpga "Andy \"Krazy\" Glew" <ag-news(a)patten-glew.net> wrote:
(snip)

> I never really knew how the 360/91 did register renaming.
> I don't think it used a RAM style map. I think it used CAMs.

> I actually asked Tomasulo this, but he never really answered
> the question.

Never having had anyone to ask, but only read about it in books,
that sounds about right.

The explanation I have seen for the CDB, common data bus, was
that results come out broadcast to all possible destinations.
Those destinations expecting a result from that source accept it.
Possible destinations are registers, reservation stations
(for adders or multiply/divide), or to be written to main memory.
Sources are results from arithmetic units, or data read from
(750 ns, 16 way interleaved) main memory.
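
The broadcast-and-match idea can be sketched in a few lines of Python.
This is only a toy model under my own assumptions: the Sink class, the
broadcast() helper, and the tag strings are hypothetical, and nothing
here models the 91's actual timing or hardware.

```python
class Sink:
    """A possible destination: a register, a reservation-station operand,
    or a store buffer. It either holds a value or waits on a source tag."""

    def __init__(self, name, waiting_on=None, value=None):
        self.name = name
        self.waiting_on = waiting_on  # tag of the source it expects, or None
        self.value = value

    def snoop(self, tag, value):
        # Accept the result only if this sink was waiting on that source.
        if self.waiting_on is not None and self.waiting_on == tag:
            self.value = value
            self.waiting_on = None
            return True
        return False


def broadcast(sinks, tag, value):
    """Put (tag, value) on the common bus; every sink compares the tag.
    Returns the names of the sinks that accepted the result."""
    return [s.name for s in sinks if s.snoop(tag, value)]


sinks = [
    Sink("FPR2", waiting_on="ADD1"),       # a register awaiting the adder
    Sink("MUL1.src1", waiting_on="ADD1"),  # a reservation-station operand
    Sink("STORE0", waiting_on="MUL1"),     # a store awaiting the multiplier
    Sink("FPR4", value=1.0),               # already has its value
]
accepted = broadcast(sinks, "ADD1", 3.5)
```

The per-sink tag compare is the CAM-like part: every destination looks
at every broadcast, and one result can land in several places at once.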

Among the not so obvious ones, if you store to memory and then
refetch, register renaming will detect the same address is
being used and go directly to the source. (No cache on the
360/91, it originated on the 360/85.)
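
As a rough illustration of the store-then-refetch case, here is a toy
Python model. It is an assumption-laden sketch of the general idea
(match the load address against pending stores), not a description of
how the 91 actually did it; StoreBuffer and its methods are made-up names.

```python
class StoreBuffer:
    """Hypothetical buffer of stores that have not yet reached memory."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value
        self.pending = []     # (address, value) pairs in program order

    def store(self, addr, value):
        # The store is queued; main memory is not updated yet.
        self.pending.append((addr, value))

    def load(self, addr):
        # Compare the load address against pending stores, youngest first.
        for a, v in reversed(self.pending):
            if a == addr:
                return v, "forwarded"
        return self.memory.get(addr, 0), "memory"

    def drain(self):
        # Write the queued stores to memory in program order.
        for a, v in self.pending:
            self.memory[a] = v
        self.pending.clear()
```

A load that follows a store to the same address gets the value straight
from the buffer, without waiting for the slow memory to be written.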

-- glen
From: glen herrmannsfeldt on
In comp.arch.fpga "Andy \"Krazy\" Glew" <ag-news(a)patten-glew.net> wrote:
(snip)

> All I know is that I proposed having a separate pipestage
> to rename registers, using a RAM (SRAM) table indexed by
> logical register number returning physical register number,
> in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
> class - after he had taken us through Tomasulo and HPSm.

> I.e. I proposed eliminating the CAMs, replacing them by a
> RAM and an additional pipestage.
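
For readers who have not seen it, a RAM-style map of the kind described
above might look like the following Python sketch. The class name, the
register counts, and the free-list policy are my assumptions for
illustration; freeing physical registers at retirement is left out.

```python
class RenameMap:
    """Hypothetical rename table: a RAM indexed by logical register
    number that returns the current physical register number."""

    def __init__(self, n_logical, n_physical):
        self.table = list(range(n_logical))             # logical -> physical
        self.free = list(range(n_logical, n_physical))  # unused physical regs

    def rename(self, dest, srcs):
        # Read the source mappings first, so an instruction that reads
        # and writes the same logical register sees the old mapping.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; a real pipeline would stall")
        new_phys = self.free.pop(0)
        self.table[dest] = new_phys  # later readers of dest now see new_phys
        return new_phys, phys_srcs
```

The lookup is a plain indexed read rather than an associative compare,
which is the point of replacing the CAMs with a table.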

With the 360/91 system, though, values can easily have more than
one destination. I suppose that could be done other ways,
too, but it is especially convenient that way.

> The idea seemed new to everyone who encountered it. It was
> not universally accepted as good. Indeed, I remember arguing
> with Tom Olson of AMD (if memory serves), who said that
> spending an extra pipestage was not a good idea.

> Many people say that the CDB was an important invention.
> I think it was a bad idea - long wires, CAMs.

If the wires are too long, then add more pipeline stages along
the way. With 750 ns, 16-way interleaved core, though, the 91
wasn't going to get much faster than 60 ns.

> Conceptually it is elegant, but implementation wise it is a bad idea.

> The important thing is taking that conceptually elegant
> CAM-ful idea, and implementing it in an efficient non-CAM manner.

> The modern style of register renaming accomplishes this -
> certainly for the registers, but also, depending on the
> system, for the reservation stations (if those are still
> being used).

Logic was much more expensive then than now, so the
tradeoffs are likely different. If you used RAM tables
with more than one entry for each source, you could do
multiple destinations easily.

>> Among the not so obvious ones, if you store to memory and then
>> refetch, register renaming will detect the same address is
>> being used and go directly to the source. (No cache on the
>> 360/91, it originated on the 360/85.)

> I'd love to see a reference for this.

There is an issue of the IBM Journal of Research and
Development pretty much devoted to the 91. I believe
it is in there. The 91 is pretty much a favorite for
books on pipelined processor design, mostly referencing
that journal issue.

> I believe that a UWisc patent on this was one of the things
> that resulted in a big payment from Intel to UWisc.

> Myself, I thought it was obvious.

-- glen