CMov implementation (Was Re: What will Microsoft use its ARM license for?) [Computer Architecture]

From: Paul A. Clayton on 12 Aug 2010 20:50

On Aug 12, 7:55 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
[snip]
> Right, the mystery is resolved using physical register numbers. The renamer
> provides the number for each source. You would like there to be one source
> at this point, although you could make the bypass logic execute the cmov -
> this would require the renamer to produce 3 numbers (remember the flags have
> a producer!).

For a simple single small FLAGS register, the source operation could
use its operation number (ROB number) plus one and the consuming
operation its operation number. The trick is then to elide any
intermediate names (for x86, intermediate names would probably be
rare since FLAGS consumers usually immediately follow the source
operation--correct?). Because FLAGS is small, replication is
relatively inexpensive; because it is singular, special handling
might be simpler and/or more cost-effective--such features should
be exploitable. (For non-selected consuming operations, writing
the FLAGS value into the operation might make sense.) One might
choose to handle the nearby consumers differently, inserting
FLAGS 'reassert' operations to wake-up later consumers.

(The above is severely lacking in several areas, but I think it
contains a rough start for the creation of an idea.)
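
One possible reading of the naming scheme above, as a minimal Python sketch rather than a description of any real renamer. It assumes a FLAGS producer at ROB index p publishes its result under the name p + 1, and a consumer at ROB index c asks for the name c. The names `Op` and `flags_source_names` are hypothetical, and the "hops" count is just the number of intermediate names that would have to be elided.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One ROB entry in this toy model (hypothetical structure)."""
    name: str
    writes_flags: bool = False
    reads_flags: bool = False


def flags_source_names(ops):
    """Return (consumer ROB index, producer name, names to elide) per FLAGS reader.

    A producer at ROB index p publishes FLAGS under the name p + 1.
    A consumer at ROB index c asks for the name c, so the two match
    directly only when the consumer immediately follows the producer.
    """
    result = []
    last_name = None  # name of the youngest FLAGS value seen so far
    for idx, op in enumerate(ops):
        if op.reads_flags:
            hops = None if last_name is None else idx - last_name
            result.append((idx, last_name, hops))
        if op.writes_flags:
            last_name = idx + 1
    return result


rob = [
    Op("cmp", writes_flags=True),
    Op("cmovz", reads_flags=True),   # immediately follows the producer
    Op("mov"),
    Op("mov"),
    Op("jnz", reads_flags=True),     # three intermediate names to elide
]
for consumer, name, hops in flags_source_names(rob):
    print(consumer, name, hops)
```

In this model the common case (consumer directly after producer) needs no lookup at all, and only the later consumer needs some mechanism to map its own name back to the producer's.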

Paul A. Clayton
just a technophile

From: Terje Mathisen on 13 Aug 2010 01:47

Paul A. Clayton wrote:
> On Aug 12, 7:55 am, "nedbrek"<nedb...(a)yahoo.com> wrote:
> [snip]
>> Right, the mystery is resolved using physical register numbers. The renamer
>> provides the number for each source. You would like there to be one source
>> at this point, although you could make the bypass logic execute the cmov -
>> this would require the renamer to produce 3 numbers (remember the flags have
>> a producer!).
>
> For a simple single small FLAGS register, the source operation could
> use its operation number (ROB number) plus one and the consuming
> operation its operation number. The trick is then to elide any
> intermediate names (for x86, intermediate names would probably be
> rare since FLAGS consumers usually immediately follow the source

In compiled code, (intentional) FLAGS consumers will almost always follow
within the next 2-4 instructions.

In hand-optimized code, and even in 16-bit operating systems, you can
see FLAGS values that are used as part of the return value, i.e. they
have to survive for quite a while.

The TCP/IP copy/checksum code is an example:

next:
ADC EAX,EDX ;; Uses previous carry, sets new
MOV [EDI+ESI],EDX
first_iteration:
MOV EDX,[ESI]
LEA ESI,[ESI+4]
DEC ECX ;; Updates all flags EXCEPT carry
JNZ next

(The core loop is usually unrolled of course.)
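
To make the carry dependency concrete, here is a small Python model of the loop, offered as a sketch under stated assumptions and not as the actual stack code. The helper names `adc` and `copy_checksum` are hypothetical, the final `ADC EAX,0` fold is an assumption (it is not part of the loop shown), and the `counter_clobbers_carry` switch models what would happen if the loop counter were updated with something like `SUB ECX,1`, which writes CF, in place of `DEC`.

```python
MASK32 = 0xFFFFFFFF


def adc(a, b, carry):
    """32-bit add with carry in; returns (result, carry out)."""
    total = a + b + carry
    return total & MASK32, total >> 32


def copy_checksum(words, counter_clobbers_carry=False):
    """Copy 32-bit words while summing them with a carry chained across iterations."""
    eax, cf = 0, 0
    copied = []
    for edx in words:
        eax, cf = adc(eax, edx, cf)   # ADC EAX,EDX: uses previous carry, sets new
        copied.append(edx)            # the copy half of the loop
        if counter_clobbers_carry:
            # SUB ECX,1 with ECX >= 1 never borrows, so it would write CF = 0.
            cf = 0
        # DEC ECX leaves CF alone, so nothing happens to cf on that path.
    eax, _ = adc(eax, 0, cf)          # assumed final fold of the last carry
    return eax, copied


words = [0xFFFFFFFF, 0x00000002, 0x80000000, 0x80000001]
good, copied = copy_checksum(words)
bad, _ = copy_checksum(words, counter_clobbers_carry=True)
print(hex(good), hex(bad))
```

With the carry preserved, the result agrees with the sum of the words modulo 2^32 - 1; with the carry clobbered on every iteration, it does not. The point for the hardware is that the carry written by one `ADC` has to stay live across several unrelated instructions, including a partial FLAGS writer, before the next `ADC` reads it.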

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Paul A. Clayton on 13 Aug 2010 19:05

On Aug 13, 7:28 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
[snip]
> This assumes that physical registers are bound to ROB entries. That is true
> for some machines, but not for all... as you increase ROB size, you like to
> be able to size the physical register separately (a lot of instructions
> don't need regs - stores and jumps being the most common). Of course, there
> are even more radical proposals (reference counted, shared values) that will
> break this.

The proto-idea might be simplest when the FLAGS are associated
with the ROB entries (the ordinary registers are independently
renamed). However, it might be possible that the name is not
an address for storage. FLAGS (being small) could probably (?)
be written into the instruction payload or forwarded if selected
for execution immediately after the source instruction. A
future file could perhaps handle the dependent operations which
issue after the FLAGS producing operation has been selected for
execution. (Small can be beautiful.)
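
A toy Python model of the single-entry future file idea, as a sketch only. The class `FlagsFutureFile` and its method names are hypothetical. It assumes that each FLAGS producer is identified by a tag (for example its ROB number), that only the youngest producer's value is kept, and that a late consumer either reads the value or learns which tag it must wait for.

```python
class FlagsFutureFile:
    """Single-entry holder for the youngest speculative FLAGS value."""

    def __init__(self):
        self.tag = None     # tag of the youngest FLAGS producer renamed so far
        self.value = None   # its result, or None while it has not executed

    def rename_producer(self, tag):
        """A new FLAGS writer enters the machine and becomes the youngest producer."""
        self.tag = tag
        self.value = None

    def writeback(self, tag, value):
        """A producer executes; only the youngest one updates the entry."""
        if tag == self.tag:
            self.value = value

    def read(self):
        """A late consumer gets (tag, value); a value of None means wait on the tag."""
        return self.tag, self.value


ff = FlagsFutureFile()
ff.rename_producer(5)
ff.rename_producer(9)       # a younger FLAGS writer supersedes tag 5
ff.writeback(5, 0b0001)     # the older result is ignored by the future file
print(ff.read())            # consumer must still wait on tag 9
ff.writeback(9, 0b0100)
print(ff.read())            # value is now available
```

Consumers selected right behind the producer would take the forwarded value and never touch this structure; it only serves the ones that arrive later.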

Paul A. Clayton
just a technophile
