What will Microsoft use its ARM license for? [Computer Architecture]


From: Owen Shepherd on 11 Aug 2010 13:42

Terje Mathisen wrote:

> Kai Harrekilde-Petersen wrote:
>> Owen Shepherd<owen.shepherd(a)e43.eu> writes:
>>> A couple of simple examples from Thumb2:
>>> 1. Registers r0-r7 are preferred to r8-r12*, because most instructions
>>> only use 3 bits to encode each register (Thumb2 added a bunch of
>>> longer opcodes to make the upper registers more accessible, but
>>> they're 32-bit instructions)
>
> x86 prefers AL/AX/EAX for many instructions, since they have special,
> shorter encodings.
>
> It also prefers, in 64-bit mode, the 8 old registers vs the 8 new, since
> those new regs require an extra prefix byte.

x86 has a lot of preferences, yes, but they're not enforced. Preferring A is
probably pushing non-orthogonality too far from the compiler's perspective.

As for preferring the first 8 registers: This is untrue. It prefers the
first 8 registers for sub-64-bit operations, yes, however

>>> 2. ARM has an array of modes for the STM/LDM modes: increment before,
>>> increment after, decrement before, decrement after. Thumb only has
>>> STM decrement before (STMDB) and LDM increment after (LDMIA). This
>>> is not coincidentally the way the stack operates
>
> x86 has a real stack...

How is x86's stack any different from ARM's? In fact, ARM's is more flexible,
because you can push down as many registers as you want in a single
instruction
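
To make the four LDM/STM addressing modes concrete, here is a minimal Python model of them. It is only a sketch: the function names, the dict-based memory and the register numbers are made up for illustration, and it assumes 32-bit registers that are always transferred in ascending register order to ascending addresses. It shows STMDB followed by LDMIA behaving as a push/pop pair on a descending stack.

```python
WORD = 4  # bytes per 32-bit register (assumption: classic 32-bit ARM)

def block_addresses(base, count, mode):
    """Return (addresses, written-back base) for an LDM/STM of `count` registers.

    The mode only decides where the block sits relative to the base;
    within the block, addresses always ascend.
    """
    size = WORD * count
    if mode == "IA":      # increment after
        start, new_base = base, base + size
    elif mode == "IB":    # increment before
        start, new_base = base + WORD, base + size
    elif mode == "DA":    # decrement after
        start, new_base = base - size + WORD, base - size
    elif mode == "DB":    # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError(mode)
    return [start + WORD * i for i in range(count)], new_base

def stm(mem, regs, base, reglist, mode):
    """Store the listed registers; return the written-back base."""
    addrs, new_base = block_addresses(base, len(reglist), mode)
    for addr, r in zip(addrs, sorted(reglist)):
        mem[addr] = regs[r]
    return new_base

def ldm(mem, regs, base, reglist, mode):
    """Load the listed registers; return the written-back base."""
    addrs, new_base = block_addresses(base, len(reglist), mode)
    for addr, r in zip(addrs, sorted(reglist)):
        regs[r] = mem[addr]
    return new_base

# Hypothetical register contents and stack pointer, for illustration only.
regs = {4: 0x44, 5: 0x55, 6: 0x66}
mem = {}
sp = 0x1000
sp = stm(mem, regs, sp, [4, 5, 6], "DB")   # behaves like PUSH {r4-r6}
saved_sp = sp
regs = {4: 0, 5: 0, 6: 0}                  # clobber the registers
sp = ldm(mem, regs, sp, [4, 5, 6], "IA")   # behaves like POP {r4-r6}
```

The whole register list moves in one modelled instruction, which is the flexibility being argued about; the model makes no claim about speed.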

It may surprise you, but on x86 it's faster to do
sub $n, %rsp
mov %rax, 0(%rsp)
mov %rbx, 8(%rsp)
and so on, than to do it with pushes, if you're spilling quite a few
registers.

Plus, ARM gets the use of its instructions for all registers.

>>>
>>> Owen
>>>
>>> * Remember that r13=SP, r14=LR, r15=PC, so they're somewhat less useful
>>> from many perspectives
>>
>> Basically, they've traded orthogonal-ness for code density. In
>> small/cost-sensitive embedded designs, where the code footprint
>> determines the significant part of the IC area and thereby cost, this
>> could be just the right solution.
>
> I sort of accept all that, what I don't get is the fact that for more or
> less my entire IT career, I've been told that all the special x86
> instructions with fixed and/or implied register operands made it very
> hard/impossible to generate really good compiled code, and that this
> problem was solved by having more registers and an orthogonal instruction
> set, i.e. RISC. :-)
>
> (Personally I've never really understood what was so hard about x86,
> except for register pressure, mapping algorithms onto the
> register/instruction set have felt quite natural.)
>
> Terje
>

Register Allocation is a *hard* problem. When the architecture fixes
registers, it gets harder. ARM's 8 registers may be limited compared to the
usual 12, but it's not like it ever forces you to use a given register. This
helps quite a bit

- Owen

From: Morten Reistad on 11 Aug 2010 14:04

In article <lro9j7-38b.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Kai Harrekilde-Petersen wrote:
>> Owen Shepherd<owen.shepherd(a)e43.eu> writes:

>>> * Remember that r13=SP, r14=LR, r15=PC, so they're somewhat less useful
>>> from many perspectives
>>
>> Basically, they've traded orthogonal-ness for code density. In
>> small/cost-sensitive embedded designs, where the code footprint
>> determines the significant part of the IC area and thereby cost, this
>> could be just the right solution.
>
>I sort of accept all that, what I don't get is the fact that for more or
>less my entire IT career, I've been told that all the special x86
>instructions with fixed and/or implied register operands made it very
>hard/impossible to generate really good compiled code, and that this
>problem was solved by having more registers and an orthogonal instruction
>set, i.e. RISC. :-)
>
>(Personally I've never really understood what was so hard about x86,
>except for register pressure, mapping algorithms onto the
>register/instruction set have felt quite natural.)

The x86 is a little weird, but not overly so. The various instructions
have lots of implied associations, but nothing totally exotic. For
exotic, try the VAX, or the Prime 50-series.

There is an effect of code compression in the x86 because of all the
implicit associations, but we pay for it with register transfers. With
the cache-memory transfers being the limiting factor such compression
actually has merit.

It makes writing the compiler somewhat more involved, and the
linear equations for optimising code get a few more terms; with the
possibility of local optima that get in the way of optimisation.

The current state of the art seems to be to make "meta-operations"
in the compiler, map these to the x86 api, and then the hardware designers
decode the x86 code, make meta-operations and execute these.

-- mrr

From: Terje Mathisen on 11 Aug 2010 15:02

Owen Shepherd wrote:
>> It also prefers, in 64-bit mode, the 8 old registers vs the 8 new, since
>> those new regs require an extra prefix byte.
>
> x86 has a lot of preferences, yes, but they're not enforced. Preferring A is
> probably pushing non-orthogonality too far from the compiler's perspective.
>
> As for preferring the first 8 registers: This is untrue. It prefers the
> first 8 registers for sub-64-bit operations, yes, however

So basically it does prefer the first 8 regs, right?
:-)
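
For reference, the size difference under discussion can be checked with a small Python sketch that encodes a register-to-register MOV (opcode 0x89, register-direct ModRM). The helper name and the register numbering by index are illustrative assumptions; only this one instruction form is modelled, and byte-register special cases are ignored.

```python
def encode_mov_rr(dst, src, wide):
    """Encode `mov src, dst` (AT&T order) between two general registers.

    dst/src are register numbers 0-15 (0 = rax/eax, 3 = rbx/ebx, 8 = r8, ...).
    wide=True selects 64-bit operand size (REX.W), False selects 32-bit.
    """
    rex = 0x40
    if wide:
        rex |= 0x08          # REX.W: 64-bit operand size
    if src >= 8:
        rex |= 0x04          # REX.R: extends the ModRM reg field
    if dst >= 8:
        rex |= 0x01          # REX.B: extends the ModRM r/m field
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""   # plain 0x40 is not needed
    return prefix + bytes([0x89, modrm])

old_32 = encode_mov_rr(3, 0, wide=False)   # mov %eax, %ebx
new_32 = encode_mov_rr(9, 8, wide=False)   # mov %r8d, %r9d
old_64 = encode_mov_rr(3, 0, wide=True)    # mov %rax, %rbx
new_64 = encode_mov_rr(9, 8, wide=True)    # mov %r8, %r9
```

In this form the 32-bit move grows from 2 to 3 bytes when it touches r8-r15, while the 64-bit move is 3 bytes either way because the REX prefix is already required for the operand size.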

>
>>>> 2. ARM has an array of modes for the STM/LDM modes: increment before,
>>>> increment after, decrement before, decrement after. Thumb only has
>>>> STM decrement before (STMDB) and LDM increment after (LDMIA). This
>>>> is not coincidentally the way the stack operates
>>
>> x86 has a real stack...
>
> How is x86's stack any different from ARM's? In fact, ARM's is more flexible,
> because you can push down as many registers as you want in a single
> instruction

That's a load/store multi, it is mostly a win for program size, more
seldom an actual cpu speedup.
>
> It may surprise you, but on x86 it's faster to do
> sub $n, %rsp
> mov %rax, 0(%rsp)
> mov %rbx, 8(%rsp)
> and so on, than to do it with pushes, if you're spilling quite a few
> registers.

On many x86 models that is true, but not all afaik.
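
A rough Python sketch of why the two sequences can differ: both leave the same values at the same addresses, but the push sequence updates the stack pointer once per register, so each push depends on the one before it, while the sub/mov sequence updates it once. The function names and the update counter are illustrative assumptions, and the count is only a crude proxy; cores that track stack-pointer updates specially may hide this cost, which would fit the "not all" above.

```python
def spill_with_pushes(rsp, values):
    """Model `push` for each value; return (rsp, memory, number of rsp updates)."""
    mem, updates = {}, 0
    for v in values:
        rsp -= 8             # each push first decrements rsp...
        updates += 1
        mem[rsp] = v         # ...then stores at the new top of stack
    return rsp, mem, updates

def spill_with_sub_mov(rsp, values):
    """Model `sub $n, %rsp` followed by one `mov` per value."""
    mem = {}
    rsp -= 8 * len(values)   # single adjustment: sub $n, %rsp
    for i, v in enumerate(values):
        mem[rsp + 8 * i] = v  # mov %reg, 8*i(%rsp); stores are independent
    return rsp, mem, 1

# Hypothetical register contents standing in for rax, rbx, rcx.
values = [0xAAAA, 0xBBBB, 0xCCCC]

# Pushing in reverse order gives the same layout as the mov sequence.
push_result = spill_with_pushes(0x8000, list(reversed(values)))
mov_result = spill_with_sub_mov(0x8000, values)
```

Here `push_result` and `mov_result` agree on the final stack pointer and memory contents and differ only in the update count (3 versus 1).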

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Owen Shepherd on 11 Aug 2010 15:39

>> How is x86's stack any different from ARM's? In fact, ARM's is more
>> flexible, because you can push down as many registers as you want in a
>> single instruction
>
> That's a load/store multi, it is mostly a win for program size, more
> seldom an actual cpu speedup.

It depends. Load/Store multiple generally are slightly faster, if only
because it gives the CPU more opportunities to perform 64-bit (or bigger)
accesses.

From: Nick Maclaren on 11 Aug 2010 15:58

In article <s95bj7-ib7.ln1(a)laptop.reistad.name>,
Morten Reistad <first(a)last.name> wrote:
>In article <lro9j7-38b.ln1(a)ntp.tmsw.no>,
>Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>>Kai Harrekilde-Petersen wrote:
>>> Owen Shepherd<owen.shepherd(a)e43.eu> writes:
>
>>>> * Remember that r13=SP, r14=LR, r15=PC, so they're somewhat less useful
>>>> from many perspectives
>>>
>>> Basically, they've traded orthogonal-ness for code density. In
>>> small/cost-sensitive embedded designs, where the code footprint
>>> determines the significant part of the IC area and thereby cost, this
>>> could be just the right solution.
>>
>>I sort of accept all that, what I don't get is the fact that for more or
>>less my entire IT career, I've been told that all the special x86
>>instructions with fixed and/or implied register operands made it very
>>hard/impossible to generate really good compiled code, and that this
>>problem was solved by having more registers and an orthogonal instruction
>>set, i.e. RISC. :-)

I am afraid that you were taught by religious dogmatists :-(

>>(Personally I've never really understood what was so hard about x86,
>>except for register pressure, mapping algorithms onto the
>>register/instruction set have felt quite natural.)
>
>The x86 is a little weird, but not overly so. The various instructions
>have lots of implied associations, but nothing totally exotic. For
>exotic, try the VAX, or the Prime 50-series.

Yes. The same remarks were made by the same dogmatists about the
System/370 series, and they were even less justified. There were
some weird instructions, but they were used only by people who wrote
assembler procedures and run-time systems. The basic instruction set
was very simple, and that is all that almost all compilers used.

Regards,
Nick Maclaren.
