From: John Mashey on 21 Sep 2005 02:46

Peter Dickerson wrote:
> "glen herrmannsfeldt" <gah(a)ugcs.caltech.edu> wrote in message
> news:yM2dnfGDOKa1HK3eRVn-hw(a)comcast.com...
> > Don't most processors use register renaming when a register is
> > overwritten like this? There has to be some way to keep track
> > of which value is going where.
> >
> > Getting the right value into the real register for the interrupt
> > would be an extra challenge, though.
> >
> > -- glen
>
> I thought JM had explicitly included simple in-order pipelined processors.
> Such microarchitectures don't normally rename.

Yes, especially since:

1) "Most" distinct CPU designs are in-order issue, whether superscalar or not. The fraction of OOO designs is minuscule, although of course (due to X86) they account for a lot of $.

2) "Most" actual CPU chips are in-order, since embedded chips rarely use OOO, and outsell PC / system chips.

Also, for this newsgroup, I'd guess that if someone actually has a chance to participate in a CPU design, it is much more likely to be in-order (in an FPGA, or an SoC) than an OOO chip, as the latter are not done at very many places. The number of people on the planet who actually design OOO CPUs is a tiny fraction of the total who design CPUs.
From: Iain McClatchie on 27 Sep 2005 23:29

Mash> THE GOOD CASE
Mash> If the ISA semantics follow the rules I described earlier
Mash> a) FP DIV and FP MUL stall until they are sure they don't cause an
Mash> exception. Then they run to completion.

Back when I worked on this stuff ('92-'94), it seemed that most programs did not turn on any of the user-level exception triggers. The only common exception was caused by denormalized inputs requiring a trap to the kernel for emulation, since the hardware (R8000 in this case) didn't do denorms. These traps caused quite a lot of grief.

The R8000 had a 4-cycle multiply-add pipeline. I often wonder if we would have experienced less grief with a 5-cycle multiply-add pipe that could do input and output denorms without exceptions. Performance would have been lower, register pressure would have been higher... but every application run by folks that weren't sure whether they could turn on flush-to-zero mode would have gone faster anyway.

I never had exposure to data that would have told me if there were more folks (more sales dollars, really) in the can-flush-denorms-to-zero camp than in the don't-know and must-handle-denorms camps. But my guess is that handling denorms in hardware would have been the better choice. In hindsight, we probably had the area to do it, too. (You need an extra output shifter, IIRC.)
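
To make the two camps concrete, here is a minimal sketch of what "denorm" and "flush-to-zero" mean at the bit level. It assumes IEEE 754 single precision; the helper names `float32_bits`, `is_denormal`, and `flush_to_zero` are made up for illustration and say nothing about how the R8000 actually detected or handled these values.

```python
import struct

SIGN_MASK = 0x80000000      # sign bit of a binary32 value
EXP_MASK = 0x7F800000       # 8-bit biased exponent field
FRAC_MASK = 0x007FFFFF      # 23-bit fraction field


def float32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x (rounded to single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_denormal(bits: int) -> bool:
    """A denormal has an all-zero exponent field and a nonzero fraction."""
    return (bits & EXP_MASK) == 0 and (bits & FRAC_MASK) != 0


def flush_to_zero(bits: int) -> int:
    """Flush-to-zero: replace a denormal with a zero of the same sign."""
    if is_denormal(bits):
        return bits & SIGN_MASK
    return bits
```

A machine that traps on denormal operands is effectively running the `is_denormal` check on every input and bailing out to software when it fires; flush-to-zero mode trades the trap for a loss of the smallest values.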
From: Jan Vorbrüggen on 28 Sep 2005 02:52

> The R8000 had a 4-cycle multiply-add pipeline. I often wonder if
> we would have experienced less grief with a 5-cycle multiply-add
> pipe that could do input and output denorms without exceptions.

Would a single-cycle extension to the pipeline have been enough? Was the actual hardware (as built) already generating denorms, or did that cause an exception as well?

Jan
From: Bernd Paysan on 28 Sep 2005 07:24
Jan Vorbrüggen wrote:
>> The R8000 had a 4-cycle multiply-add pipeline. I often wonder if
>> we would have experienced less grief with a 5-cycle multiply-add
>> pipe that could do input and output denorms without exceptions.
>
> Would a single-cycle extension to the pipeline have been enough? Was the
> actual hardware (as built) already generating denorms, or did that
> cause an exception as well?

Handling denorms requires another barrel shift operation: you find out that your result doesn't fit into the required range, so you shift it right by the amount by which the exponent falls outside that range. If your MAC pipeline is multiply (carry save adder network), sum, shift, add, count leading zeros, shift (normalize), shift (denorms), it can take up to seven or eight cycles.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
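
The extra "shift (denorms)" stage described above can be sketched in software. This is only an illustration under stated assumptions: IEEE 754 single precision, round-to-nearest-even, and a result that has already been normalized. The function name `denormalize` and its argument convention are hypothetical, and a real datapath would fold the rounding into the shifter rather than do it as a separate step.

```python
EMIN = -126        # smallest normal exponent in binary32
FRAC_BITS = 23     # stored fraction bits in binary32


def denormalize(sig: int, exp: int) -> int:
    """Shift a normalized result right until its exponent reaches EMIN.

    sig is a 24-bit significand with the hidden bit set
    (2**23 <= sig < 2**24) and stands for sig * 2**(exp - 23).
    Returns the significand at exponent EMIN, rounded to nearest even;
    for a denormal result this is the 23-bit fraction field.
    """
    shift = EMIN - exp                  # how far the exponent is out of range
    if shift <= 0:
        return sig                      # result is normal, no extra shift
    if shift > FRAC_BITS + 1:
        return 0                        # every bit is shifted out, rounds to 0
    kept = sig >> shift                 # the barrel shift itself
    rem = sig & ((1 << shift) - 1)      # bits that fell off the right end
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (kept & 1)):
        kept += 1                       # round to nearest, ties to even
    return kept
```

For instance, `denormalize(0xC00000, -128)` shifts by two places and yields `0x300000`, which is the fraction field of 1.5 * 2**-128 in single precision. The shift amount is only known after normalization, which is why it needs a stage of its own at the end of the pipe.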