From: "Andy "Krazy" Glew" on 30 Mar 2010 23:12

The two hardware datastructures supporting out of order execution:

Reservation stations.

And, less beautifully, the register renaming map.

But then I am biased.

--

Really, I do think that the reservation stations are beautiful. Even the naive CAM implementation. Especially since there are more efficient implementations that are logically equivalent.

I am also pretty high on bit matrix schedulers.
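For readers who have not met the "naive CAM implementation" mentioned above, here is a minimal software sketch of the idea, not a description of any particular machine. Every waiting operand holds the tag of its producer, and each result broadcast is compared against all of them. The names `Operand`, `Entry`, `ReservationStations`, `broadcast` and `ready_entries` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operand:
    # tag names the producer still being waited on; None means the value is present.
    tag: Optional[int] = None
    value: Optional[int] = None

    @property
    def ready(self) -> bool:
        return self.tag is None


@dataclass
class Entry:
    op: str
    dest_tag: int          # tag this entry will broadcast when it completes
    srcs: List[Operand]

    def ready(self) -> bool:
        return all(s.ready for s in self.srcs)


class ReservationStations:
    """Illustrative model only: hardware does the comparisons in parallel."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def insert(self, entry: Entry) -> None:
        self.entries.append(entry)

    def broadcast(self, tag: int, value: int) -> None:
        # The "CAM" step: every waiting operand compares its tag with the
        # broadcast tag and captures the value on a match.
        for entry in self.entries:
            for src in entry.srcs:
                if src.tag == tag:
                    src.value = value
                    src.tag = None

    def ready_entries(self) -> List[Entry]:
        return [e for e in self.entries if e.ready()]
```

The loop in `broadcast` is what a CAM does in one cycle with one comparator per operand slot, which is where the hardware cost comes from.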
From: "Andy "Krazy" Glew" on 30 Mar 2010 23:13

On 3/29/2010 7:39 PM, MitchAlsup wrote:
> The most memorable hardware structure is the vector indirect
> addressing mode.
>
> Mitch

Aagh! No!

Although work I did on that veered towards reservation stations, which I like.

Nvidia has shown that vector indirect is unnecessary on a SIMT.

Although^2, it turns out that very similar hardware is needed for SIMT scalar indirect.
From: "Andy "Krazy" Glew" on 1 Apr 2010 11:35

On 3/30/2010 9:15 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
>> The two hardware datastructures supporting out of order execution:
>
>> Reservation stations.
>
>> And, less beautifully, the register renaming map.
>
> Both from the IBM 360/91, as far as I know.
>
> S/360 has only four floating point registers, so register
> renaming was pretty important for out-of-order execution.
>
> OK, how about imprecise interrupts?
>
> -- glen

I never really knew how the 360/91 did register renaming. I don't think it used a RAM style map. I think it used CAMs.

I actually asked Tomasulo this, but he never really answered the question.
From: "Andy "Krazy" Glew" on 1 Apr 2010 11:46

> In article<houi8s$rdm$1(a)naig.caltech.edu>,
>> OK, how about imprecise interrupts?

Not a good idea.
From: "Andy "Krazy" Glew" on 1 Apr 2010 22:05

On 4/1/2010 11:07 AM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
> (snip)
>
>> I never really knew how the 360/91 did register renaming.
>> I don't think it used a RAM style map. I think it used CAMs.
>
>> I actually asked Tomasulo this, but he never really answered
>> the question.
>
> Never having had anyone to ask, but only read about it in books,
> that sounds about right.

All I know is that I proposed having a separate pipestage to rename registers, using a RAM (SRAM) table indexed by logical register number returning physical register number, in 1986 or 1987 - in Wen-mei Hwu's microprocessor design class - after he had taken us through Tomasulo and HPSm.

I.e. I proposed eliminating the CAMs, replacing them by a RAM and an additional pipestage.

The idea seemed new to everyone who encountered it. It was not universally accepted as good. Indeed, I remember arguing with Tom Olson of AMD (if memory serves), who said that spending an extra pipestage was not a good idea.

I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the

> The explanation I have seen for the CDB, common data bus, was
> that results come out broadcast to all possible destinations.
> Those destinations expecting a result from that source accept it.
> Possible destinations are registers, reservation stations
> (for adders or multiply/divide), or to be written to main memory.
> Sources are results from arithmetic units, or data read from
> (750 ns, 16 way interleaved) main memory.

Many people say that the CDB was an important invention. I think it was a bad idea - long wires, CAMs. Conceptually it is elegant, but implementation-wise it is a bad idea.

The important thing is taking that conceptually elegant CAM-ful idea, and implementing it in an efficient non-CAM manner.
The modern style of register renaming accomplishes this - certainly for the registers, but also, depending on the system, for the reservation stations (if those are still being used).

> Among the not so obvious ones, if you store to memory and then
> refetch, register renaming will detect the same address is
> being used and go directly to the source. (No cache on the
> 360/91, it originated on the 360/85.)

I'd love to see a reference for this. I believe that a UWisc patent on this was one of the things that resulted in a big payment from Intel to UWisc.

Myself, I thought it was obvious.
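
The RAM-style renaming described earlier in this post (a table indexed by logical register number that returns a physical register number) can be sketched roughly as follows. This is a toy model under stated assumptions, not the design of any real processor: the class `RenameMap`, its methods, the identity mapping at reset and the FIFO free list are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


class RenameMap:
    """Toy RAM-style rename table: index by logical register, read out physical."""

    def __init__(self, num_logical: int, num_physical: int) -> None:
        if num_physical < num_logical:
            raise ValueError("need at least one physical register per logical one")
        # Assumption: at reset, logical register i maps to physical register i.
        self.table: List[int] = list(range(num_logical))
        # Assumption: remaining physical registers sit in a simple FIFO free list.
        self.free: List[int] = list(range(num_logical, num_physical))

    def rename(self, dest: int, srcs: List[int]) -> Tuple[int, List[int], int]:
        # Sources are looked up BEFORE the destination is remapped, so an
        # instruction such as r1 = r1 + r1 reads the previous mapping of r1.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; the pipeline would stall")
        new_phys = self.free.pop(0)
        old_phys = self.table[dest]
        self.table[dest] = new_phys
        # old_phys is returned so the caller can free it once it is safe to do so.
        return new_phys, phys_srcs, old_phys

    def release(self, phys: int) -> None:
        self.free.append(phys)
```

Each lookup is a plain indexed read, with no tag comparison against every entry, which is the point of replacing the CAMs with a RAM at the cost of an extra pipestage.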