From: Terje Mathisen "terje.mathisen at on 17 Jun 2010 07:02

Brett Davis wrote:
> RISC load-store verses x86 Add from memory.
> t = a->x + a->y;
>
> RISC
> load x,a[0]
> load y,a[1]
> add t = x,y
>
> x86
> load x,a[0]
> add t = x,a[1]

This should really be:

  mov eax,[a.x]
  add eax,[a.y]

> Same number of loads, so dont fall into the trap of saying
> more x86 instructions cause slow loads...
>
> What am I missing.

On an OoO machine, there is absolutely no difference between the two
methods: They both perform two load operations, allocate three (renamed)
target registers (one for each basic/micro operation), and have the same
total latency.

Assuming an L1 that can supply a single load per cycle with 1-cycle
latency, both models have 3 cycles of total latency; with a 2-cycle L1
we end up with 4 cycles.

If the L1 cache can supply two load ops/cycle, then we save one cycle of
latency for both CPU types.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
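The cycle counts Terje quotes can be reproduced with a very small model. This is only a sketch under stated assumptions: the function name and parameters are made up for illustration, the L1 is assumed to be pipelined (it accepts new loads every cycle), the add is assumed to take one cycle, and the add issues as soon as the last load has returned. Since both the RISC and the x86 sequence reduce to two loads feeding one add, the same function covers both.

```python
def total_latency(l1_latency, loads_per_cycle, add_latency=1, n_loads=2):
    """Cycles until the add result is ready for n_loads independent loads.

    Assumptions (not from the original post): the L1 is pipelined and
    accepts `loads_per_cycle` new loads each cycle, and the add issues
    in the cycle the last load returns.
    """
    # Loads issue in groups of `loads_per_cycle`, starting at cycle 0.
    last_issue = (n_loads - 1) // loads_per_cycle
    last_load_done = last_issue + l1_latency
    return last_load_done + add_latency


if __name__ == "__main__":
    for l1 in (1, 2):
        for ports in (1, 2):
            cycles = total_latency(l1, ports)
            print(f"L1 latency {l1}, {ports} load(s)/cycle: {cycles} cycles")
```

With one load per cycle the model gives 3 cycles for a 1-cycle L1 and 4 cycles for a 2-cycle L1; allowing two loads per cycle removes one cycle in each case.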
From: jacko on 17 Jun 2010 07:14

You also imply RISC has to use multiporting.

http://nibz.googlecode.com
From: MitchAlsup on 17 Jun 2010 10:34

On Jun 17, 12:33 am, Brett Davis <gg...(a)yahoo.com> wrote:
> What am I missing.

The measure of performance is not instructions, but the <lack of> time
it takes to execute the semantic contents of the program.

Given the program at hand, there will be more gate delays of logic to
execute the x86 program than to execute the RISC program. x86
instructions take more gates to parse and decode, the x86 data path has
an additional operand formatting multiplexer in the integer data path,
and some additional logic after integer computations to deal with
partial word writes in the register files. So, instead of having about
an 80 gate delay per instruction pipeline, the x86 has about a 100 gate
delay instruction pipeline. Add an extra pipe stage and you can make
this added complexity almost vanish.

Where x86 wins is that they (now Intel; AMD used to do this too) can
throw billions at FAB development technology (i.e. making faster
transistors and faster interconnect wire).

Secondarily, once you microarchitect a fully out-of-order processor, it
really does not matter what the native instruction set is. Reread that
previous sentence until you understand. The native instruction set no
longer matters once the microarchitecture has gone fully OoO!

Mitch
From: Anton Ertl on 17 Jun 2010 11:24

Stephen Sprunk <stephen(a)sprunk.org> writes:
>On 17 Jun 2010 00:33, Brett Davis wrote:
>> RISC load-store verses x86 Add from memory.
>> t = a->x + a->y;
>>
>> RISC
>> load x,a[0]
>> load y,a[1]
>> add t = x,y
>
>load r1, a[0]
>load r2, a[1]
>add r3, r1, r2
>store t, r3
>
>> x86
>> load x,a[0]
>> add t = x,a[1]
>
>load r1, a[0]
>add r1, a[1]
>store t, r1

If t is a local variable, decent C compilers will usually allocate it
into a register, and no store is needed.

>> RISC shows its superiority by being 50% more instructions and 50% slower...

It's just as easy to find an example where IA-32 and AMD64 have 100%
more instructions:

x = y+z;

where x, y, and z are locals that live in registers, and y and z are
alive after this statement.

On RISC:

add x<-y+z;

On IA-32/AMD64:

mov x<-y
add x<-x+z

So looking at one particular code fragment proves nothing.

As for speed, that depends on the actual CPU. Both the 386 and the
Phenom II can execute IA-32 code, yet they do it at vastly different
speeds; likewise, MIPS R2000 and Power7 are two RISC processors with
very different speeds.

>You are missing that a modern x86 chip is not a CISC chip; it is a RISC
>chip with a CISC decoder slapped on the front,

That statement does not make sense. CISC and RISC are instruction-set
styles. Modern IA-32/AMD64 chips only execute the IA-32, AMD64, and
maybe 8086 instruction sets, all of which are CISC instruction sets.

What you may be thinking of is that the microarchitectures of current
high-performance CISC and RISC CPUs are relatively similar, and quite
different from the microarchitectures of CISC and RISC CPUs when RISCs
were introduced.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
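Anton's register-only counterexample can be illustrated with a toy lowering routine. This is a minimal sketch, not how any real compiler works: the function name `lower_add`, its parameters, and the emitted mnemonics are hypothetical, and it only models the difference between a three-operand add and a destructive two-operand add where the destination must also be a source.

```python
def lower_add(dest, src1, src2, three_operand):
    """Return register-to-register instructions for dest = src1 + src2.

    three_operand=True models a RISC-style `add dest, src1, src2`.
    three_operand=False models a two-operand add that overwrites its
    first operand, so a copy is needed when both sources must survive.
    """
    if three_operand:
        return [f"add {dest}, {src1}, {src2}"]
    if dest == src1:
        return [f"add {dest}, {src2}"]
    if dest == src2:
        # Addition is commutative, so the operands can be swapped.
        return [f"add {dest}, {src1}"]
    # Both sources stay live: copy one into dest first, then add.
    return [f"mov {dest}, {src1}", f"add {dest}, {src2}"]


if __name__ == "__main__":
    for three_operand in (True, False):
        code = lower_add("x", "y", "z", three_operand)
        print(f"three_operand={three_operand}: {len(code)} instruction(s)")
        for line in code:
            print("   ", line)
```

For `x = y + z` with y and z still live, the three-operand form needs one instruction and the two-operand form needs two, which is the 100% difference in Anton's example. For `x = x + z` both forms need one, so the count depends entirely on the fragment chosen.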
From: Stephen Sprunk on 18 Jun 2010 01:08

On 17 Jun 2010 10:24, Anton Ertl wrote:
> Stephen Sprunk <stephen(a)sprunk.org> writes:
>> On 17 Jun 2010 00:33, Brett Davis wrote:
>>> RISC load-store verses x86 Add from memory.
>>> t = a->x + a->y;
>>>
>>> RISC
>>> load x,a[0]
>>> load y,a[1]
>>> add t = x,y
>>
>> load r1, a[0]
>> load r2, a[1]
>> add r3, r1, r2
>> store t, r3
>>
>>> x86
>>> load x,a[0]
>>> add t = x,a[1]
>>
>> load r1, a[0]
>> add r1, a[1]
>> store t, r1
>
> If t is a local variable, decent C compilers will usually allocate it
> into a register, and no store is needed.

True, but if you're going to talk about compiler optimizations, then
odds are the code is unlikely to resemble what you wrote in a HLL in the
first place except for the most trivial of programs.

The point I was trying to make is that x86 has no 3-operand add
instruction like the one he used in his example, nor does RISC allow a
memory address as the destination of an add instruction as he did in his
example. I corrected both to show a fairer comparison.

>> You are missing that a modern x86 chip is not a CISC chip; it is a RISC
>> chip with a CISC decoder slapped on the front,
>
> That statement does not make sense. CISC and RISC are instruction-set
> styles. Modern IA-32/AMD64 chips only execute the IA-32, AMD64, and
> maybe 8086 instruction sets, all of which are CISC instruction sets.
>
> What you may be thinking of is that the microarchitectures of current
> high-performance CISC and RISC CPUs are relatively similar, and quite
> different from the microarchitectures of CISC and RISC CPU when RISCs
> were introduced.

Alternately, one can look at a modern x86 chip as a core that runs a
model-specific RISC ISA hidden behind a decoder that translates x86 CISC
instructions into that ISA. That may offend purists, but IMHO it's
accurate enough for those of us who don't actually design CPUs.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
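The "decoder in front of a load-store core" picture can be sketched as a toy that cracks two-operand instructions with a memory source into load and ALU micro-ops. Everything here is an assumption made for illustration: the function name `crack`, the tuple format, and the temporary register name `tmp0` are invented, and real internal micro-op formats are model-specific and not modelled here.

```python
def crack(instr):
    """Split a toy 'op dst,src' instruction into micro-op tuples.

    A memory source operand (written in square brackets) becomes a
    separate load micro-op; the arithmetic itself becomes a
    register-only micro-op of the form (op, dst, src1, src2).
    """
    op, operands = instr.split(None, 1)
    dst, src = [s.strip() for s in operands.split(",")]
    if src.startswith("[") and src.endswith("]"):
        if op == "mov":
            # A plain load from memory is already a single micro-op.
            return [("load", dst, src)]
        tmp = "tmp0"  # hypothetical internal temporary register
        return [("load", tmp, src), (op, dst, dst, tmp)]
    if op == "mov":
        return [("mov", dst, src)]
    return [(op, dst, dst, src)]


if __name__ == "__main__":
    program = ["mov eax,[a.x]", "add eax,[a.y]"]
    uops = [u for instr in program for u in crack(instr)]
    for u in uops:
        print(u)
    print(len(uops), "micro-ops")
```

Under these assumptions the two-instruction x86 sequence from earlier in the thread yields two loads and one add, the same three operations as the three-instruction load-store sequence.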