RISC load-store versus x86 Add from memory. [Computer Architecture]

Prev: Call for benchmarks: proposals by 30 June
Next: Vaporizing dust during chip manufacturing ?

From: MitchAlsup on 23 Jun 2010 15:19

On Jun 23, 10:42 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net>
wrote:
> On 6/22/2010 11:12 AM, Tim McCaffrey wrote:
>
> > 2) Easy to decode: reduces gate count, which reduces power consumption, and
> > potentially removes a pipeline stage (maybe). AFAICT, every x86 has a
> > limitation of only being able to decode/issue one instruction if it hasn't
> > been executed before. It appears all x86 implementations use the I-cache to
> > mark instruction boundaries for parallel decoding on the following passes.
<snip>
> AMD has long had this limit.

No, not quite. When the Athlon/Opteron processors fetch instructions
that have no marker bits, they decode 4 bytes per cycle. There can be
0, 1, 2, 3, or 4 instructions in those bytes, and the decode pipeline
is capable of doing 0, 1, 2, or 3 from there. A majority of the time,
the choice is from the set {0,1} due to boundary spanning.

Mitch
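
To make the boundary-spanning point concrete, here is a toy model in C. It
is only a sketch under assumptions that are not in the post: the byte stream
is cut into aligned 4-byte windows, an instruction is counted in the window
that holds its last byte, and every length is at least 1. The names
count_per_window and WINDOW_BYTES are invented for the illustration and do
not describe AMD's actual predecode logic.

```c
#include <stddef.h>

/* Hypothetical toy model, not AMD's real hardware: scan the instruction
 * stream in aligned 4-byte windows and count how many instructions END
 * in each window. */
enum { WINDOW_BYTES = 4 };

/* lengths[i] is the byte length of instruction i (assumed >= 1).
 * per_window[w] receives the number of instructions whose last byte
 * falls in window w. Returns the number of windows touched. */
size_t count_per_window(const unsigned char *lengths, size_t n_insns,
                        unsigned *per_window, size_t max_windows)
{
    size_t end = 0;   /* byte offset one past the current instruction */
    size_t used = 0;  /* highest window index seen, plus one */

    for (size_t w = 0; w < max_windows; w++)
        per_window[w] = 0;

    for (size_t i = 0; i < n_insns; i++) {
        end += lengths[i];
        size_t w = (end - 1) / WINDOW_BYTES;  /* window of the last byte */
        if (w >= max_windows)
            break;
        per_window[w]++;
        if (w + 1 > used)
            used = w + 1;
    }
    return used;
}
```

With lengths {1,1,1,1,6,3,2}, the first window completes four one-byte
instructions (one more than the pipeline described above can take in a
cycle), while the second window completes none because the 6-byte
instruction spans it.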

From: jacko on 23 Jun 2010 21:07

On Jun 23, 8:19 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jun 23, 10:42 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net>
> wrote:
>
> > On 6/22/2010 11:12 AM, Tim McCaffrey wrote:
>
> > > 2) Easy to decode: reduces gate count, which reduces power consumption, and
> > > potentially removes a pipeline stage (maybe). AFAICT, every x86 has a
> > > limitation of only being able to decode/issue one instruction if it hasn't
> > > been executed before. It appears all x86 implementations use the I-cache to
> > > mark instruction boundaries for parallel decoding on the following passes.
> <snip>
> > AMD has long had this limit.
>
> No, not quite. When the Athlon/Opteron processors fetch an instruction
> that has no marker bits, it decodes 4 bytes per cycle. There can be
> 0,1,2,3, or 4 instructions, and the decode pipeline is capable of
> doing 0,1,2,3 from there. A majority of the time, the choice is from
> the set {0,1} due to boundary spanning.
>
> Mitch

And I thought the point of CISC was to reduce the memory bandwidth
needed to load the cache, when actually, inside the cache, variable
instruction size costs extra marker bits, faults on double-meaning
tricks, and a need to issue 0 instructions due to misalignment. Are you
sure the bytes are not expanded so that they occupy a fixed-width
format, and take advantage of the 4-way associativity?

Cheers Jacko

From: Andy 'Krazy' Glew on 23 Jun 2010 22:44

On 6/23/2010 8:31 AM, Andy 'Krazy' Glew wrote:
> On 6/22/2010 9:24 AM, MitchAlsup wrote:
> I agree that bounds checking (buffer overflow) is the most important
> problem (that hardware has much chance of helping with [*]).

I forgot to add the footnote [*]:

My personal ranking for "hardware support that catches or prevents security bugs" is

1) buffer overflow
2) integer overflow

and then the rest bringing up the rear.

With a few bandaids - incomplete, but low hanging fruit - like stack shadowing, etc.

I'm not sure if crypto-obfuscated execution is on this list or not - it finds many bugs, but it is also an enabler of
much more advanced applications.

Anyway, the footnote: taint tracking, or poison propagation, aka Dynamic Information Flow Tracking (DIFT), e.g. Raksha.

Cool idea. Every young computer architect plays with it (at least, I did, in my 20s). Has the potential of catching a
completely different class of bug, like SQL injection and scripting. The Raksha papers also show it doing surprisingly
well catching buffer overflows. Especially attractive because conventional buffer overflow detection a la Milo
Martin's HardBound requires recompilation - the compiler has to indicate what the bounds of objects are. Whereas Raksha
DIFT tainting poison can work with legacy binaries - tainting at OS interfaces.
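
As a rough illustration of the idea (not Raksha's actual mechanism, which
works in hardware with tag bits on registers and memory), taint propagation
can be sketched in C as a value carrying one extra bit. The type tval and
the helper names below are hypothetical, chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* A value paired with a single taint bit (illustrative only). */
typedef struct {
    uint32_t value;
    bool tainted;
} tval;

/* Data arriving over an OS interface (network, file, argv) starts tainted. */
static tval from_untrusted(uint32_t v) { return (tval){ v, true }; }

/* Constants and program-generated data start clean. */
static tval from_trusted(uint32_t v) { return (tval){ v, false }; }

/* Propagation rule: the result is tainted if either operand is. */
static tval tv_add(tval a, tval b)
{
    return (tval){ a.value + b.value, a.tainted || b.tainted };
}

/* Policy check at a sensitive sink, e.g. before using the value as an
 * address or jump target: refuse anything still carrying taint. */
static bool may_use_as_address(tval t)
{
    return !t.tainted;
}
```

A base pointer from from_trusted() added to an offset from from_untrusted()
yields a tainted result, so may_use_as_address() rejects it. No bounds
information from the compiler is involved, which is why the approach can be
applied at OS interfaces to legacy binaries.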

Hard to make generic. Hard to make a scheme that works with several different users of taint propagation - performance,
security, etc. The proposals that I have seen all strike me as a "Bill Joy is death" 80% solution, only solving part of
the problem. Not inappropriate since Raksha is from Berkeley. But maybe good enough. I can imagine a startup pushing
Raksha, whereas it is hard to imagine anyone other than Intel or AMD pushing HardBound.

I usually prefer to have software demo a technology, and only add hardware support for performance (or atomicity, or
security). IMHO HardBound-like technology has been demoed in software. So has tainting - although I think that the jury
is still out on tainting, e.g. in Perl.

But I feel obliged to mention taint propagation. For that matter, think on it.

From: Andy 'Krazy' Glew on 23 Jun 2010 22:45

On 6/23/2010 11:29 AM, Terje Mathisen wrote:
> Andy 'Krazy' Glew wrote:
>> Whereas, if you use the normal behaviour of 2's complement integers
>> (signed - what does it mean to say that a 2's complement number is
>> unsigned?)
>>
>> #define sat_add(a,b)
>> ((typeof<a>(a+b)>(a))&&(typeof<a>(a+b)>(b))?(a+b):SAT_MAX)
>>
>> works for all 2's complement types. signed. and, yes, unsigned.
>
> Huh???
>
> What happens when both a and b are negative?
>
> (-1 + -2) is less than both -1 and -2, so both parts of that test will
> agree that the proper answer is SAT_MAX: Probably not what you want!
>
> The next issue is of course when you do (-100 + -100) with 8-bit values
> and end up with +56 instead of -200 or a saturated -128.
>
> Terje
>

I know. The expression got too long for me to write in the margins of comp.arch.
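
For reference, one way to get the two cases Terje raises right for 8-bit
values is to do the arithmetic in a wider type and clamp afterwards. This is
a minimal sketch, not the expression that was being hinted at; the function
names sat_add_i8 and sat_add_u8 are made up for the example.

```c
#include <stdint.h>

/* Signed 8-bit saturating add. The sum is computed in int, where it
 * cannot overflow (range is -256..254), then clamped to the int8_t range. */
static int8_t sat_add_i8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned 8-bit saturating add: only the upper bound can be exceeded. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

Here sat_add_i8(-1, -2) gives -3 and sat_add_i8(-100, -100) gives -128, the
two cases from the post above. Widening only works when a wider type exists,
so it does not generalize to the widest integer type.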

From: Mike Hore on 24 Jun 2010 02:16

Andy 'Krazy' Glew wrote:
> On 6/23/2010 11:29 AM, Terje Mathisen wrote:
>> Andy 'Krazy' Glew wrote:
>>> Whereas, if you use the normal behaviour of 2's complement integers
>>> (signed - what does it mean to say that a 2's complement number is
>>> unsigned?)
>>>
>>> #define sat_add(a,b)
>>> ((typeof<a>(a+b)>(a))&&(typeof<a>(a+b)>(b))?(a+b):SAT_MAX)
>>>
>>> works for all 2's complement types. signed. and, yes, unsigned.
>>
>> Huh???
>>
>> What happens when both a and b are negative?
>>
>> (-1 + -2) is less than both -1 and -2, so both parts of that test will
>> agree that the proper answer is SAT_MAX: Probably not what you want!
>>
>> The next issue is of course when you do (-100 + -100) with 8-bit values
>> and end up with +56 instead of -200 or a saturated -128.
>>
>> Terje
>>
>
>
> I know. The expression got too long for me to write in the margins of
> comp.arch.
>

Even assuming you fix the expression so it works, would it still survive
an optimization that takes "undefined on overflow" to mean "assume
overflow can't happen" - as we were discussing earlier?

Cheers, Mike.

(BTW, I've retired, so I never have to use C. Or hardly ever x86, for
that matter :-)

---------------------------------------------------------------
Mike Hore mike_horeREM(a)OVE.invalid.aapt.net.au
---------------------------------------------------------------
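
On Mike's question: a version that tests the operands before adding never
executes an overflowing signed addition, so an optimizer that assumes
"overflow can't happen" has nothing to remove. A minimal sketch in standard
C follows; the name sat_add_int is an assumption for the example.

```c
#include <limits.h>

/* Saturating add for int that never performs an overflowing addition.
 * The checks compare against INT_MAX - b and INT_MIN - b, which are
 * themselves always representable for the sign of b being tested. */
static int sat_add_int(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;   /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;   /* a + b would fall below INT_MIN */
    return a + b;         /* safe: result is known to be in range */
}
```

By contrast, a test of the form (a + b) > a computes the sum first, so for
signed types the overflow has already happened by the time the comparison
runs. GCC and Clang also provide __builtin_add_overflow, which reports
overflow without relying on wraparound, for code that does not need to be
strictly portable.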
