From: MitchAlsup on 23 Jun 2010 15:19

On Jun 23, 10:42 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:
> On 6/22/2010 11:12 AM, Tim McCaffrey wrote:
>
> > 2) Easy to decode: reduces gate count, which reduces power consumption, and
> > potentially removes a pipeline stage (maybe). AFAICT, every x86 has a
> > limitation of only being able to decode/issue one instruction if it hasn't
> > been executed before. It appears all x86 implementations use the I-cache to
> > mark instruction boundaries for parallel decoding on the following passes.
> <snip>
> AMD has long had this limit.

No, not quite. When the Athlon/Opteron processors fetch an instruction that has no marker bits, they decode 4 bytes per cycle. There can be 0, 1, 2, 3, or 4 instructions, and the decode pipeline is capable of doing 0, 1, 2, or 3 from there. A majority of the time, the choice is from the set {0,1} due to boundary spanning.

Mitch
From: jacko on 23 Jun 2010 21:07

On Jun 23, 8:19 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jun 23, 10:42 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net>
> wrote:
>
> > On 6/22/2010 11:12 AM, Tim McCaffrey wrote:
>
> > > 2) Easy to decode: reduces gate count, which reduces power consumption, and
> > > potentially removes a pipeline stage (maybe). AFAICT, every x86 has a
> > > limitation of only being able to decode/issue one instruction if it hasn't
> > > been executed before. It appears all x86 implementations use the I-cache to
> > > mark instruction boundaries for parallel decoding on the following passes.
> <snip>
> > AMD has long had this limit.
>
> No, not quite. When the Athlon/Opteron processors fetch an instruction
> that has no marker bits, it decodes 4 bytes per cycle. There can be
> 0,1,2,3, or 4 instructions, and the decode pipeline is capable of
> doing 0,1,2,3 from there. A majority of the time, the choice is from
> the set {0,1} due to boundary spanning.
>
> Mitch

And I thought CISC was meant to reduce the memory bandwidth needed to load the cache, when actually, in a cache, variable instruction size causes extra bits, faults on double-meaning tricks, and a need to issue 0 instructions due to misalignments.

Are you sure the bytes are not expanded so that they occupy a fixed-width format, and take advantage of the 4-way associativity?

Cheers
Jacko
From: Andy 'Krazy' Glew on 23 Jun 2010 22:44

On 6/23/2010 8:31 AM, Andy 'Krazy' Glew wrote:
> On 6/22/2010 9:24 AM, MitchAlsup wrote:
> I agree that bounds checking (buffer overflow) is the most important
> problem (that hardware has much chance of helping with [*]).

I forgot to add the footnote [*]:

My personal ranking for "hardware support that catches or prevents security bugs" is

1) buffer overflow
2) integer overflow

and then some making up the train. With a few bandaids - incomplete, but low hanging fruit - like stack shadowing, etc.

I'm not sure if crypto-obfuscated execution is on this list or not - it finds many bugs, but it is also an enabler of much more advanced applications.

Anyway, the footnote: taint tracking, or poison propagation, aka Dynamic Information Flow Tracking (DIFT), e.g. Raksha.

Cool idea. Every young computer architect plays with it (at least, I did, in my 20s). Has the potential of catching a completely different class of bug, like SQL injection and scripting. The Raksha papers also show it doing surprisingly well catching buffer overflows.

Especially attractive because conventional buffer overflow detection a la Milo Martin's HardBound requires recompilation - the compiler has to indicate what the bounds of objects are. Whereas Raksha DIFT tainting poison can work with legacy binaries - tainting at OS interfaces.

Hard to make generic. Hard to make a scheme that works with several different users of taint propagation - performance, security, etc.

The proposals that I have seen all strike me as a "Bill Joy is death" 80% solution, only solving part of the problem. Not inappropriate since Raksha is from Berkeley. But maybe good enough. I can imagine a startup pushing Raksha, whereas it is hard to imagine anyone other than Intel or AMD pushing HardBound.

I usually prefer to have software demo a technology, and only add hardware support for performance (or atomicity, or security). IMHO HardBound-like technology has been demoed in software. So has tainting - although I think that the jury is still out on tainting, e.g. in Perl.

But I feel obliged to mention taint propagation. For that matter, think on it.
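
[Editor's sketch] The taint-propagation idea described in the post above can be modelled in a few lines of C. This is only a software illustration under stated assumptions: the type `tagged_int` and every function name here are hypothetical, and the policy shown (taint on input, OR the tags on arithmetic, refuse tainted values at a sensitive use) is a generic DIFT outline, not Raksha's actual design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each value carries a one-bit taint tag,
 * standing in for the per-word tag a DIFT machine would keep. */
typedef struct {
    int  value;
    bool tainted;
} tagged_int;

/* Data arriving across an untrusted interface starts out tainted. */
tagged_int from_untrusted(int v) { return (tagged_int){ v, true }; }

/* Program constants and other trusted data start out clean. */
tagged_int from_trusted(int v) { return (tagged_int){ v, false }; }

/* Propagation rule: the result is tainted if either operand is.
 * (Small operands assumed; overflow handling is out of scope here.) */
tagged_int tagged_add(tagged_int a, tagged_int b)
{
    return (tagged_int){ a.value + b.value, a.tainted || b.tainted };
}

/* An explicit range check is treated as validation and clears the tag. */
tagged_int validate_range(tagged_int x, int lo, int hi)
{
    if (x.value >= lo && x.value <= hi)
        x.tainted = false;
    return x;
}

/* Check at a sensitive use (e.g. an array index): tainted data is refused. */
bool safe_to_use_as_index(tagged_int idx) { return !idx.tainted; }

/* Usage example. */
void taint_examples(void)
{
    tagged_int base   = from_trusted(16);
    tagged_int offset = from_untrusted(4);
    tagged_int index  = tagged_add(base, offset);

    assert(!safe_to_use_as_index(index));                         /* taint propagated */
    assert(safe_to_use_as_index(validate_range(index, 0, 63)));   /* checked, now clean */
}
```

The "hard to make generic" point shows up even in this toy: which operations propagate the tag, and which checks are allowed to clear it, are policy decisions that differ between users of the mechanism.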
From: Andy 'Krazy' Glew on 23 Jun 2010 22:45

On 6/23/2010 11:29 AM, Terje Mathisen wrote:
> Andy 'Krazy' Glew wrote:
>> Whereas, if you use the normal behaviour of 2's complement integers
>> (signed - what does it mean to say that a 2's complement number is
>> unsigned?)
>>
>> #define sat_add(a,b)
>> ((typeof<a>(a+b)>(a))&&(typeof<a>(a+b)>(b))?(a+b):SAT_MAX)
>>
>> works for all 2's complement types. signed. and, yes, unsigned.
>
> Huh???
>
> What happens when both a and b are negative?
>
> (-1 + -2) is less than both -1 and -2, so both parts of that test will
> agree that the proper answer is SAT_MAX: Probably not what you want!
>
> The next issue is of course when you do (-100 + -100) with 8-bit values
> and end up with +56 instead of -200 or a saturated -128.
>
> Terje
>

I know. The expression got too long for me to write in the margins of comp.arch.
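
[Editor's sketch] For the 8-bit cases Terje raises, one way to get saturation right is to do the addition in a wider type and clamp afterwards. The function names below are hypothetical and this is not the macro from the thread, just a minimal C illustration of the expected results for (-1 + -2) and (-100 + -100).

```c
#include <assert.h>
#include <stdint.h>

/* Signed 8-bit saturating add: add in int, then clamp to the int8_t range.
 * The int sum lies between -256 and 254, so it cannot overflow. */
int8_t sat_add_i8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned 8-bit saturating add: only the upper bound can be exceeded. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}

/* Usage example covering the cases from the post above. */
void sat_add_8bit_examples(void)
{
    assert(sat_add_i8(-1, -2) == -3);             /* no saturation needed */
    assert(sat_add_i8(-100, -100) == INT8_MIN);   /* -128, not the wrapped +56 */
    assert(sat_add_i8(100, 100) == INT8_MAX);     /* 127 */
    assert(sat_add_u8(200, 100) == UINT8_MAX);    /* 255 */
}
```

Widening only works while a wider type exists; for the widest integer type a different check is needed.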
From: Mike Hore on 24 Jun 2010 02:16
Andy 'Krazy' Glew wrote:
> On 6/23/2010 11:29 AM, Terje Mathisen wrote:
>> Andy 'Krazy' Glew wrote:
>>> Whereas, if you use the normal behaviour of 2's complement integers
>>> (signed - what does it mean to say that a 2's complement number is
>>> unsigned?)
>>>
>>> #define sat_add(a,b)
>>> ((typeof<a>(a+b)>(a))&&(typeof<a>(a+b)>(b))?(a+b):SAT_MAX)
>>>
>>> works for all 2's complement types. signed. and, yes, unsigned.
>>
>> Huh???
>>
>> What happens when both a and b are negative?
>>
>> (-1 + -2) is less than both -1 and -2, so both parts of that test will
>> agree that the proper answer is SAT_MAX: Probably not what you want!
>>
>> The next issue is of course when you do (-100 + -100) with 8-bit values
>> and end up with +56 instead of -200 or a saturated -128.
>>
>> Terje
>>
>
> I know. The expression got too long for me to write in the margins of
> comp.arch.
>

Even assuming you fix the expression so it works, would it still survive an optimization that takes "undefined on overflow" to mean "assume overflow can't happen" - as we were discussing earlier?

Cheers, Mike.

(BTW, I've retired, so I never have to use C. Or hardly ever x86, for that matter :-)

---------------------------------------------------------------
Mike Hore mike_horeREM(a)OVE.invalid.aapt.net.au
---------------------------------------------------------------
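
[Editor's sketch] On Mike's question: a test of the form "add first, then compare the sum with the operands" depends on signed wraparound, which C leaves undefined, so an optimizer is allowed to remove it. A version that tests the operands before adding never executes an overflowing addition, so there is nothing for such an optimization to exploit. The name `sat_add_int` is hypothetical; this is a minimal sketch for plain `int`, not a generic replacement for the macro in the thread.

```c
#include <assert.h>
#include <limits.h>

/* Saturating add for int that checks for overflow BEFORE adding.
 * INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) cannot overflow,
 * and a + b is only evaluated once it is known to be in range. */
int sat_add_int(int a, int b)
{
    if (b > 0 && a > INT_MAX - b) return INT_MAX;   /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return INT_MIN;   /* would go below INT_MIN */
    return a + b;
}

/* Usage example. */
void sat_add_int_examples(void)
{
    assert(sat_add_int(-1, -2) == -3);
    assert(sat_add_int(INT_MAX, 1) == INT_MAX);
    assert(sat_add_int(INT_MIN, -1) == INT_MIN);
    assert(sat_add_int(INT_MIN, INT_MAX) == -1);
}
```

Where compiler extensions are acceptable, GCC and Clang also provide `__builtin_add_overflow`, which reports overflow without relying on wraparound; the portable pre-check above avoids that dependency.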