From: rickman on 8 Jul 2010 16:50

On Jul 7, 12:33 am, Al Clark <acl...(a)danvillesignal.com> wrote:
> Raymond Toy <toy.raym...(a)gmail.com> wrote in news:i10bse$h90$1(a)news.eternal-september.org:
>
> > On 7/6/10 6:33 PM, HardySpicer wrote:
> >> For floating point arithmetic how much faster is an add/subtract than
> >> a multiply/accumulate? (percentage wise).
>
> > Probably depends on the chip. The last time I used a floating point dsp
> > (C30!) all floating point ops (add, sub, mul, mac) finished in a single
> > cycle. (I think.)
>
> > Ray
>
> I entered into the middle of this thread so unless I have the context
> wrong....
>
> On a SHARC, floating point multiply and floating add have the same cost - one
> instruction, actually you can do two each in SIMD with some constraints.
> Fixed point math also operates in one cycle.
>
> Instructions on a SHARC operate at the core clock, which can be as high as
> 450M. They all execute in 1 cycle.
>
> I assume that the TI floating point DSPs would be similar.
>
> Single cycle (1 instruction) processing is quite normal for DSPs. Algorithms
> that trade off multiplies for adds are not generally helpful with DSPs. OTOH,
> these techniques can be very useful for other type of devices such as FPGAs
> or GP microcontrollers.
>
> Al Clark www.danvillesignal.com

I haven't checked the specs on the SHARC, but aren't the TI floating point chips pipelined? I remember the fixed point chips are (or were; it's been a while since I've worked closely with them). A MAC operation takes multiple cycles, but you can start a new one on each CPU clock. Certainly it is possible to do floating point operations in purely combinatorial logic, but pipelining lets it run much faster with little added logic.

BTW, there are at least two types of floating point chips these days. TI has their C67x family, which is a barn burner with multiple compute engines. TI also has smaller, low cost chips which are built for control apps. But to be honest, I don't recall reading whether they are pipelined or not.

Rick
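To make Al's point about algorithms that trade multiplies for adds concrete, here is a sketch using one well-known example of such a trade, the three-multiply complex product (chosen for illustration; it is not mentioned in the thread). The function names and the `cost` model are hypothetical: the model charges each operation its cycle count with nothing overlapping, so it ignores MAC fusion, SIMD and pipelining.

```python
def cmul_4m(a, b, c, d):
    """(a + bi)(c + di) the textbook way: 4 multiplies, 2 adds/subtracts."""
    return (a * c - b * d, a * d + b * c)


def cmul_3m(a, b, c, d):
    """The same product with 3 multiplies and 5 adds/subtracts."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)


def cost(mults, adds, mul_cycles, add_cycles=1):
    """Toy cost model: operations run back to back, nothing overlaps."""
    return mults * mul_cycles + adds * add_cycles


# DSP-like part: multiply and add each take one cycle
print(cost(4, 2, mul_cycles=1), cost(3, 5, mul_cycles=1))  # 6 8
# Hypothetical part where a multiply costs as much as 4 adds
print(cost(4, 2, mul_cycles=4), cost(3, 5, mul_cycles=4))  # 18 17
```

With single-cycle multiplies, as Al describes for the SHARC, the three-multiply form is slower (8 operations against 6); it only pays off when a multiply costs noticeably more than an add, as it can on an FPGA or a small microcontroller.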
From: Al Clark on 8 Jul 2010 17:47

rickman <gnuarm(a)gmail.com> wrote in news:74e09402-9a68-4256-80cb-d087368c340c(a)b35g2000yqi.googlegroups.com:

> I haven't checked the specs on the SHARC, but aren't the TI floating
> point chips pipelined? I remember the fixed point chips are (or were,
> it's been a while since I've worked closely with them). To do a MAC
> operation takes multiple cycles, but you can start an new one on each
> CPU clock. Certainly it is possible to do floating point operations
> in purely combinatorial logic, but pipelining lets it run much faster
> with little added logic.

The TI DSPs are heavily pipelined. I think this is the main reason that assembly language programming is so difficult with them.

SHARC instructions execute promptly. You can easily write in either assembly (which looks a bit like C) or C.

Al Clark
www.danvillesignal.com
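Rick's distinction between latency (a MAC takes several cycles) and throughput (a new one can start every clock) can be captured by a toy timing model. This is a sketch under assumptions: a single in-order pipeline of fixed latency, one issue per clock, and a made-up latency of 4 cycles rather than the figure for any particular TI or ADI part.

```python
def cycles_independent(n_ops, latency):
    """n independent ops, one issued per clock: the latency is paid once."""
    if n_ops == 0:
        return 0
    return n_ops + latency - 1


def cycles_dependent(n_ops, latency):
    """n ops where each needs the previous result: no overlap at all."""
    return n_ops * latency


# Hypothetical 4-cycle floating point pipeline, 100 operations
print(cycles_independent(100, 4))  # 103
print(cycles_dependent(100, 4))    # 400
```

With a latency of 1 the two counts coincide, which matches the "everything executes in one cycle" view; with deeper pipelines the gap between them is what the rest of the thread is arguing about.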
From: steveu on 9 Jul 2010 02:45

>The TI DSPs are heavily pipelined. I think this is the main reason that
>assembly language programming is so difficult with them.
>
>SHARC instructions execute promptly. You can easily write either assembly
>(looks a bit like C) and C.

All high performance processors are heavily pipelined. The only alternative to taking a number of cycles to complete a floating point operation is to have a very low clock rate. Both the TI and ADI cores are deeply pipelined; the difference is in how much of the pipeline is exposed to or hidden from the programmer. If you need the result of one operation to feed into the next step of the calculation, you have to wait quite a few cycles, whether through explicit programmer action or through a hardware-controlled processor stall. In either case, if you don't want to waste cycles you have to do some serious work hand scheduling the flow.

Steve
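Steve's two cases, a hardware stall versus explicit programmer action, can be sketched with a small in-order scheduler. Assumptions: every instruction has the same result latency, one instruction issues per cycle, and `schedule` and its (dest, sources) program format are hypothetical. With `interlocked=True` the model stalls until operands are ready; with `interlocked=False` it keeps issuing and reports the instructions that would read a stale value, which on an exposed pipeline the programmer would have to fix by reordering or inserting NOPs.

```python
def schedule(program, latency, interlocked=True):
    """Issue instructions in order, at most one per cycle (toy model).

    program: list of (dest, sources) tuples, e.g. ("acc", ("acc", "x0")).
    A result can be read `latency` cycles after its instruction issues.
    Returns (cycles until the last result is ready, stall cycles, hazards),
    where hazards lists indices of instructions that would read a stale
    operand on a pipeline without interlocks.
    """
    ready = {}      # register -> first cycle its latest value can be read
    cycle = 0       # cycle in which the next instruction issues
    stalls = 0
    hazards = []
    finish = 0
    for i, (dest, srcs) in enumerate(program):
        need = max((ready.get(s, 0) for s in srcs), default=0)
        if need > cycle:
            if interlocked:
                stalls += need - cycle   # hardware holds issue until ready
                cycle = need
            else:
                hazards.append(i)        # programmer must reorder or add NOPs
        ready[dest] = cycle + latency
        finish = max(finish, cycle + latency)
        cycle += 1
    return finish, stalls, hazards


# acc = acc + x0; acc = acc + x1; acc = acc + x2 with a 3-cycle adder
chain = [("acc", ("acc", "x0")), ("acc", ("acc", "x1")), ("acc", ("acc", "x2"))]
print(schedule(chain, latency=3))                     # (9, 4, [])
print(schedule(chain, latency=3, interlocked=False))  # (5, 0, [1, 2])
```

Either way the dependent chain cannot finish sooner than 9 cycles with correct results; the interlocked core hides the wait, while the exposed one leaves it to the programmer.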
From: Vladimir Vassilevsky on 10 Jul 2010 10:48

Al Clark wrote:

> The TI DSPs are heavily pipelined. I think this is the main reason that
> assembly language programming is so difficult with them.

TI assembler is evil. Perhaps it was done that way on purpose.

> SHARC instructions execute promptly. You can easily write either assembly
> (looks a bit like C) and C.

SHARC is also pipelined, and the pipeline is not even fully interlocked. Delayed branches and multiple bugs illustrate that. AD processors give the impression of a nice concept, but the realization is lacking.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
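Vladimir cites delayed branches as a sign that the SHARC pipeline shows through to the programmer. The sketch below traces execution order for a toy instruction list in which the instructions in the delay slots after a branch still execute before the target. The number of delay slots is a parameter rather than a claim about any specific part, the `jmp N` syntax and the `trace` function are made up, and a branch placed inside a delay slot is not modelled.

```python
def trace(program, delay_slots, max_steps=50):
    """Return the instructions in the order a toy in-order core executes them.

    program: list of strings. "jmp N" is an unconditional branch to index N;
    anything else is an ordinary instruction. After a jmp, the next
    `delay_slots` instructions still execute before control reaches N.
    A jmp placed inside a delay slot is treated as an ordinary instruction.
    """
    executed = []
    pc = 0
    target = None       # branch target waiting for its delay slots to drain
    slots_left = 0
    while 0 <= pc < len(program) and len(executed) < max_steps:
        instr = program[pc]
        executed.append(instr)
        pc += 1
        if target is not None:              # this was a delay-slot instruction
            slots_left -= 1
            if slots_left == 0:
                pc, target = target, None
        elif instr.startswith("jmp"):
            target, slots_left = int(instr.split()[1]), delay_slots
            if slots_left == 0:             # ordinary (non-delayed) branch
                pc, target = target, None
    return executed


prog = ["a", "jmp 5", "b", "c", "d", "e"]
print(trace(prog, delay_slots=0))  # ['a', 'jmp 5', 'e']
print(trace(prog, delay_slots=2))  # ['a', 'jmp 5', 'b', 'c', 'e']
```

Code written as if branches took effect immediately would silently run `b` and `c` on a core with two delay slots, which is the kind of pipeline detail Vladimir says is not fully hidden.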
From: rickman on 12 Jul 2010 01:24

On Jul 9, 2:45 am, "steveu" <steveu(a)n_o_s_p_a_m.coppice.org> wrote:
> All high performance processors are heavily pipelined. The only alternative
> to taking a number of cycles to complete a floating point operation is to
> have a very low clock rate. Both the TI and ADI cores are deeply pipelined.
> The difference is in how much the pipeline is exposed to or hidden from the
> programmer. If you need an answer from one of these processors to feed into
> the next step of the calculation you need to wait quite a few cycles,
> whether it is by explicit programmer action, or by a hardware controlled
> processor stall. In either case, if you don't want to waste cycles you have
> to do some serious work hand scheduling the flow.
>
> Steve

That is not my experience. I recall now that the TI processors are pipelined, just as most modern CPUs are. But the only stalls occur when a branch instruction is executed, just as any pipelined processor stalls when it needs an out-of-line instruction fetch. There can also be stalls for data, but those should only happen when external memory is accessed or when simultaneous accesses are made to the same memory block, although some memory is dual-ported. DSP functions typically have no problem executing at full speed on these processors; they are designed to do that, and I don't recall having any particular trouble with it.

The TI C6x families are a bit trickier, mainly because it can be hard to keep all the execution units working at full speed. That comes from flexibility: the C6x gives you more of it, which makes it harder to use, while the SHARC devices have less flexibility but are a bit easier to use because of it. Then again, I have not worked much with the SHARC devices, so I am no expert on them.

Rick
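One common way to do the hand scheduling Steve describes, and to keep a pipelined adder busy as Rick wants on the C6x, is to split an accumulation into several independent partial sums. The sketch below shows the idea; `dot_multi_acc` and `est_cycles` are hypothetical names, and the cycle estimate assumes one add issued per cycle with a fixed add latency, ignoring loads, multiplies and the final combine.

```python
import math


def dot_multi_acc(x, y, n_acc):
    """Dot product with n_acc independent partial sums.

    Product i goes to accumulator i % n_acc, so consecutive adds never
    depend on each other and a pipelined adder can overlap them.
    """
    partial = [0.0] * n_acc
    for i, (xi, yi) in enumerate(zip(x, y)):
        partial[i % n_acc] += xi * yi
    return sum(partial)   # combine the partial sums at the end


def est_cycles(n, latency, n_acc):
    """Rough cycle count for the accumulation adds only (toy model).

    At most one add issues per cycle, and each accumulator can accept a new
    add only every `latency` cycles.
    """
    per_acc = math.ceil(n / n_acc)
    return max(n, per_acc * latency)


print(est_cycles(1024, latency=4, n_acc=1))  # 4096
print(est_cycles(1024, latency=4, n_acc=4))  # 1024
```

Once the number of accumulators reaches the add latency, the loop is limited by issue rate rather than by the dependency chain. Because floating point addition is not associative, the result can differ from a single running sum in the last bits, which matters if output must match a reference exactly.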