From: Weng Tianxiang on 2 Apr 2010 09:27

On Apr 1, 7:05 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> wrote:
> On 4/1/2010 11:07 AM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net> wrote:
> > (snip)
>
> >> I never really knew how the 360/91 did register renaming.
> >> I don't think it used a RAM style map. I think it used CAMs.
>
> >> I actually asked Tomasulo this, but he never really answered
> >> the question.
>
> > Never having had anyone to ask, but only read about it in books,
> > that sounds about right.
>
> All I know is that I proposed having a separate pipestage to rename registers, using a RAM (SRAM) table indexed by
> logical register number returning physical register number, in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
> class - after he had taken us through Tomasulo and HPSm.
>
> I.e. I proposed eliminating the CAMs, replacing them by a RAM and an additional pipestage.
>
> The idea seemed new to everyone who encountered it. It was not universally accepted as good. Indeed, I remember arguing
> with Tom Olson of AMD (if memory serves), who said that spending an extra pipestage was not a good idea.
>
> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> > The explanation I have seen for the CDB, common data bus, was
> > that results come out broadcast to all possible destinations.
> > Those destinations expecting a result from that source accept it.
> > Possible destinations are registers, reservation stations
> > (for adders or multiply/divide), or to be written to main memory.
> > Sources are results from arithmetic units, or data read from
> > (750 ns, 16 way interleaved) main memory.
>
> Many people say that the CDB was an important invention. I think it was a bad idea - long wires, CAMs.
>
> Conceptually it is elegant, but implementation wise it is a bad idea.
>
> The important thing is taking that conceptually elegant CAM-ful idea, and implementing it in an efficient non-CAM manner.
>
> The modern style of register renaming accomplishes this - certainly for the registers, but also, depending on the
> system, for the reservation stations (if those are still being used).
>
> > Among the not so obvious ones, if you store to memory and then
> > refetch, register renaming will detect the same address is
> > being used and go directly to the source. (No cache on the
> > 360/91, it originated on the 360/85.)
>
> I'd love to see a reference for this.
>
> I believe that a UWisc patent on this was one of the things that resulted in a big payment from Intel to UWisc.
>
> Myself, I thought it was obvious.

Hi Andy,

Your opinion is insightful. Can you tell me the UWisc patent number or its title?

I have a design that is expected to work in a core of a modern multiprocessor, at clock rates above 3 GHz, and its output drives one target. The design can have two implementations:

1. One source always drives the one target, and it uses a lot of power;
2. 16 sources can selectively use a common output bus to drive the target, with much less power.

The output must be finished within 1 clock cycle.

Which implementation is wiser in the real world? In other words, is having 16 sources selectively drive a common output bus with one target a wise implementation at more than 3 GHz?

Thank you.

Weng
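For readers following the RAM-versus-CAM point in the quoted text, here is a minimal Python sketch of the two lookup styles. It is only an illustration under stated assumptions: the names `RamRenameMap` and `cam_lookup`, the register counts, and the simple free list are all hypothetical, and nothing here is meant to describe how the 360/91 or any shipping processor was actually built. Physical registers are never returned to the free list in this sketch.

```python
class RamRenameMap:
    """RAM-style map: one entry per logical register, read by direct index.

    Hypothetical sketch; register counts and the free-list policy are assumptions.
    """

    def __init__(self, num_logical, num_physical):
        # Initially logical register i lives in physical register i.
        self.table = list(range(num_logical))
        # Remaining physical registers are available for new destinations.
        self.free = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources are read by indexing the table: no comparators involved.
        phys_srcs = [self.table[s] for s in srcs]
        # Allocate a fresh physical register for the destination.
        # Raises IndexError if the free list is empty (no recycling here).
        new_phys = self.free.pop(0)
        self.table[dest] = new_phys
        return new_phys, phys_srcs


def cam_lookup(entries, logical):
    """CAM-style lookup: compare the tag against every entry.

    `entries` is a list of (logical_tag, physical_reg) pairs in program
    order; the newest matching entry wins. Returns None if nothing matches.
    """
    match = None
    for tag, phys in entries:
        if tag == logical:  # in hardware, one comparator per entry
            match = phys
    return match


if __name__ == "__main__":
    m = RamRenameMap(num_logical=4, num_physical=8)
    print(m.rename(dest=1, srcs=[2, 3]))  # r1 <- r2 op r3
    print(m.rename(dest=2, srcs=[1, 1]))  # reads the renamed r1
    print(cam_lookup([(1, 4), (2, 5), (1, 6)], 1))
```

The indexed read costs one table access regardless of how many instructions are in flight, whereas the CAM-style search needs a comparison against every entry, which is the wiring and comparator cost the quoted text objects to.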
From: MitchAlsup on 3 Apr 2010 14:08

On Apr 3, 12:19 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> wrote:
> On 4/1/2010 7:48 PM, MitchAlsup wrote:
>
> > On Apr 1, 9:05 pm, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> > wrote:
> >> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> > Any chance you could complete this sentence?
>
> > Perhaps from {88100, 88110, 88120, crazy, insane, Asilomar
> > participants, Hot Chips participants, all of the preceding?}
>
> Got distracted, forgot to finish. Wasn't exactly sure I remembered what you were working on.
>
> Remember the first time I met you, Mitch, and Willie Anderson? What were you working on? Memory bandwidth spreadsheets
> for the 88110? SIMD vectors? I remember we talked about DRAM bank structure, and you made your usual "If DRAMs were
> designed the way I want them to be designed..." speech. I remember that you were interested in Linpack, while I was
> interested in OOO and GCC.

Willie was on 88110.
Sounds like I was already on 88120.
As to DRAM, see USPTO 5367494.

It was not so much that I was concentrating on Linpack. We (Shebanow and I) were trying to build a machine that could perform as if it were a vector machine on vectorizable codes (without vector instructions, i.e. native 88100 instructions at 6 per cycle) and also perform well on GCC-like spaghetti codes. Linpack (Matrix 300) was simply the vector code example.

Mitch