From: nedbrek on 13 Aug 2010 07:23

Hello,

"Brett Davis" <ggtgp(a)yahoo.com> wrote in message
news:ggtgp-6E8E47.23285012082010(a)news.isp.giganews.com...
> In article <i40k1g$ker$1(a)news.eternal-september.org>,
> "nedbrek" <nedbrek(a)yahoo.com> wrote:
>> "Brett Davis" <ggtgp(a)yahoo.com> wrote in message
>> > Most RISC chips implement a register move as an ALU binary OR with
>> > immediate value zero. Even the NOP is actually "OR r0 = r0, #0"
>> > which I remember from my Moto 68k days.
>>
>> In x86, it is encoded as "mov dst = src". Internally, this can be
>> converted to an "(x)or/add/sub imm0" uop, or there might be a mov uop.
>> Not sure what the tradeoffs are...
>>
>> > An ALU is always going to read two values and write one, even x86.
>>
>> Consider the "clear" operation (XOR AX ^= AX). There is one write, the
>> value 0. Or a "mov r = imm". These are usually executed at the ALU
>> port.
>
> That is two values going in, they just happen to be the same. AX = AX ^ AX

You might convert it to a "mov imm0" uop. Depending on how your register
read/bypass logic works, you might be able to rename this value to "use
the special index of the bypass logic (which always forwards 0)".

> This does bring up a big point, just because you have three ALUs does
> not mean that you need six read ports on the register file.
> Most of the time you get your values from the bypasses, if you can
> predict that with some accuracy you can do with fewer ports, saving
> die size and power.
> To really make this useful an ALU would have to keep values when it
> is idle, as you have many 3 cycle stalls waiting on L1.
> Grabbing values from the bypass saves power, as it is closer, and
> involves far fewer transistors to select.

Sure, there are papers on this (sadly, I've forgotten who and when...).
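To make the zero-idiom renaming idea concrete, here is a minimal sketch, assuming a renamer that reserves one physical register index (called `ZERO_PREG` here, a hypothetical name) which the read/bypass logic always supplies as 0. `Renamer` and its methods are illustrative only, not any real design; freeing old mappings at retire is omitted, and the flags that x86 `xor` also writes are ignored.

```python
# Sketch only: renaming the x86 zero idiom (xor r, r) onto a reserved
# "always zero" physical register. ZERO_PREG and Renamer are hypothetical.
ZERO_PREG = 0  # reserved index; the read/bypass logic always supplies 0 for it


class Renamer:
    def __init__(self, arch_regs, num_pregs):
        # Physical register 0 is never handed out, so allocation starts at 1.
        self.free = list(range(1, num_pregs))
        self.rat = {reg: self.free.pop(0) for reg in arch_regs}

    def rename(self, op, dst, src1, src2):
        """Return (uop, dst_preg, src_pregs) for a two-source ALU op."""
        if op == "xor" and src1 == src2 == dst:
            # Zero idiom: no ALU work and no new register; later readers
            # of dst are simply pointed at the zero register.
            self.rat[dst] = ZERO_PREG
            return ("eliminated", ZERO_PREG, [])
        src_pregs = [self.rat[src1], self.rat[src2]]
        new_preg = self.free.pop(0)  # reclaiming the old mapping at retire is omitted
        self.rat[dst] = new_preg
        return (op, new_preg, src_pregs)


r = Renamer(["ax", "bx"], num_pregs=8)
print(r.rename("xor", "ax", "ax", "ax"))  # ('eliminated', 0, [])
print(r.rename("add", "bx", "bx", "ax"))  # ('add', 3, [2, 0])
```

The design choice being illustrated is that the "clear" never reaches an ALU port: a later consumer of AX reads index 0, which in hardware would come from a constant-zero bypass source rather than a register file read port.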
The biggest problem with speculation in the scheduler (which includes
register caching, and even simple loads to some extent) is that when you
are wrong, you need to cancel all the dependent ops that have been
scheduled in the meantime. If your scheduler deallocates entries on pick
(frees an op's slot as soon as it is selected for issue), you then need
to get all these cancelled ops re-inserted...

Ned
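The cancel-and-reinsert problem above can be sketched as a small graph walk. This is a toy model under stated assumptions: op names, the `consumers` map, the `issued` set and the `capacity` limit are all hypothetical, and the producer that turned out wrong is a load that was scheduled as if it hit. With dealloc-on-pick, the cancelled ops have already lost their scheduler entries, so they must be put back, and there may not be room.

```python
from collections import deque


def ops_to_cancel(bad_producer, consumers, issued):
    """All speculatively issued ops that depend, directly or transitively,
    on bad_producer (e.g. a load that was scheduled as a hit but missed)."""
    cancel, work = set(), deque([bad_producer])
    while work:
        producer = work.popleft()
        for op in consumers.get(producer, []):
            # An op that never issued has not consumed a bad value, so the
            # walk stops there; its own consumers cannot have issued either.
            if op in issued and op not in cancel:
                cancel.add(op)
                work.append(op)
    return cancel


def reinsert(queue, cancelled, program_order, capacity):
    """Dealloc-on-pick: cancelled ops no longer own scheduler entries, so
    they are put back oldest-first. Ops that do not fit are returned; a real
    design would need to stall or hold them somewhere else."""
    overflow = []
    for op in sorted(cancelled, key=program_order.index):
        if len(queue) < capacity:
            queue.append(op)
        else:
            overflow.append(op)
    return overflow


# Hypothetical example: ld -> add -> sub -> st, plus an unrelated mul.
program_order = ["ld", "add", "sub", "mul", "st"]
consumers = {"ld": ["add"], "add": ["sub"], "sub": ["st"]}
issued = {"add", "sub", "mul"}   # picked while the load was assumed to hit
cancelled = ops_to_cancel("ld", consumers, issued)
print(sorted(cancelled))         # ['add', 'sub']
queue = ["st"]                   # st had not issued, so it is still waiting
print(reinsert(queue, cancelled, program_order, capacity=2))  # ['sub']
print(queue)                     # ['st', 'add']
```

Note that the unrelated `mul` survives; only the dependence chain is replayed. A scheduler that keeps entries until the speculation is confirmed avoids the overflow case, at the cost of holding slots longer.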
From: nedbrek on 13 Aug 2010 07:28
Hello all,

"Paul A. Clayton" <paaronclayton(a)embarqmail.com> wrote in message
news:115aae10-c46c-43fc-95f9-9e3547f8645f(a)f6g2000yqa.googlegroups.com...
>On Aug 12, 7:55 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
>[snip]
>> Right, the mystery is resolved using physical register numbers. The
>> renamer provides the number for each source. You would like there to
>> be one source at this point, although you could make the bypass logic
>> execute the cmov - this would require the renamer to produce 3 numbers
>> (remember the flags have a producer!).
>
> For a simple single small FLAGS register, the source operation could
> use its operation number (ROB number) plus one and the consuming
> operation its operation number. The trick is then to elide any
> intermediate names (for x86, intermediate names would probably be
> rare since FLAGS consumers usually immediately follow the source
> operation--correct?). Because FLAGS is small, replication is
> relatively inexpensive; because it is singular, special handling
> might be simpler and/or more cost-effective--such features should
> be exploitable. (For non-selected consuming operations, writing
> the FLAGS value into the operation might make sense.) One might
> choose to handle the nearby consumers differently, inserting
> FLAGS 'reassert' operations to wake up later consumers.

This assumes that physical registers are bound to ROB entries. That is
true for some machines, but not for all... as you increase ROB size, you
would like to be able to size the physical register file separately (a
lot of instructions don't need registers; stores and jumps are the most
common).

Of course, there are even more radical proposals (reference counted,
shared values) that will break this.

Ned
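The sizing argument for decoupling the register file from the ROB can be shown with simple counting. This is a sketch under stated assumptions: the instruction mix is made up, `NO_DEST` (hypothetical) treats only stores and jumps as having no register destination, flags are ignored, and the registers that hold committed architectural state in a merged register file are not counted.

```python
# Hypothetical op kinds with no register destination (flags ignored).
NO_DEST = {"store", "jump"}


def regs_for_window(window, bound_to_rob):
    """Result storage needed to keep every op in `window` in flight."""
    if bound_to_rob:
        # Each ROB entry carries its own result slot, whether or not it is used.
        return len(window)
    # Decoupled register file: only ops that produce a register result allocate one.
    return sum(1 for kind in window if kind not in NO_DEST)


def max_in_flight(stream, rob_size, num_regs):
    """Decoupled case: ops that can be allocated before either the ROB or
    the register file fills (committed architectural registers not counted)."""
    used = 0
    for count, kind in enumerate(stream):
        if count == rob_size:
            return count
        if kind not in NO_DEST:
            if used == num_regs:
                return count
            used += 1
    return len(stream)


window = ["load", "add", "store", "jump", "add", "store", "mul", "jump"]
print(regs_for_window(window, bound_to_rob=True))        # 8
print(regs_for_window(window, bound_to_rob=False))       # 4
print(max_in_flight(window * 4, rob_size=32, num_regs=16))  # 32
```

With this mix, a 32-entry ROB stays full using only 16 rename registers, whereas binding results to ROB entries would need 32 slots for the same window.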