From: pnachtwey on
On Jul 6, 3:33 pm, HardySpicer <gyansor...(a)gmail.com> wrote:
> For floating point arithmetic how much faster is an add/subtract than
> a multiply/accumulate? (percentage wise).
>
> Hardy
I have experience with the C32 and C33. Multiplies and adds or
multiplies and subtracts can happen at a rate of one per clock cycle,
but that doesn't mean they complete in that time, as others have
mentioned. The C32 and C33 can sometimes do two floating point
operations per cycle, but usually one of them is a fetch from memory.
The big enemy is not multiplies, adds or subtracts but divides, along
with pipeline stalls due to loading and storing data to and from
memory. To estimate time I usually count memory cycles.

Peter Nachtwey
From: Vladimir Vassilevsky on


pnachtwey wrote:
> On Jul 6, 3:33 pm, HardySpicer <gyansor...(a)gmail.com> wrote:
>
>>For floating point arithmetic how much faster is an add/subtract than
>>a multiply/accumulate? (percentage wise).
>>
>>Hardy
>
> I have experience with the C32 and C33. Multiplies and adds or
> multiplies and subtracts can happen at a rate of one per clock cycle,
> but that doesn't mean they complete in that time, as others have
> mentioned. The C32 and C33 can sometimes do two floating point
> operations per cycle, but usually one of them is a fetch from memory.
> The big enemy is not multiplies, adds or subtracts but divides.

IIRC there is no penalty for floating point division in Intel P5+ CPUs;
with their huge pipelines all arithmetic operations have the same cost.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
From: Manny on
On Jul 6, 11:33 pm, HardySpicer <gyansor...(a)gmail.com> wrote:
> For floating point arithmetic how much faster is an add/subtract than
> a multiply/accumulate? (percentage wise).
>
> Hardy

It depends on your code, compiler, and architecture. Best practice is
to measure this statistically.

I tend to think about architecture all the time. Sometimes a
memory-to-memory instruction can make all the difference in the world,
and even with increased compiler smartness there is no substitute for
human prudence, because nothing but you can understand exactly what
you're after and what you can do with or without.

-Momo
From: Manny on
On Jul 7, 11:45 pm, Manny <mlou...(a)hotmail.com> wrote:
> On Jul 6, 11:33 pm, HardySpicer <gyansor...(a)gmail.com> wrote:
>
> > For floating point arithmetic how much faster is an add/subtract than
> > a multiply/accumulate? (percentage wise).
>
> > Hardy
>
> It depends on your code, compiler, and architecture. Best practice is
> to measure this statistically.
>
> I tend to think about architecture all the time. Sometimes a
> memory-to-memory instruction can make all the difference in the world,
> and even with increased compiler smartness there is no substitute for
> human prudence, because nothing but you can understand exactly what
> you're after and what you can do with or without.
>
> -Momo

Ah. Well, that was in reply to other posts rather than your original.
If you're building a case against something, power might be of
relevance here.

-Momo
From: bryant_s on
>For floating point arithmetic how much faster is an add/subtract than
>a multiply/accumulate? (percentage wise).
>
>
>Hardy
>

The previous replies are correct if your metric is programmable
processor clock cycles.

In the hardware - a floating point number consists of a mantissa
(normalized fractional portion) and an exponent (the power of 2 of the
number).

When multiplying, the two mantissas are multiplied in a fashion similar
to fixed-point multiplies. The exponents are added. The result's
exponent is then adjusted to re-normalize the mantissa. The mantissa is
generally normalized to lie on [0.5, 1.0), meaning two mantissas
multiplied together will range on [0.25, 1.0), so re-normalizing the
result back to the [0.5, 1.0) range can require an extra shift of 1 bit
(i.e. a decrement of the resultant exponent).
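As a rough illustration of those three steps (mantissa multiply,
exponent add, re-normalize), here is a minimal C sketch. It uses a toy
16-bit-mantissa format with the [0.5, 1.0) convention above; the struct
layout and field widths are illustrative assumptions rather than any
real hardware format, and signs, overflow and zero handling are left
out.

#include <stdint.h>
#include <stdio.h>
#include <math.h>

typedef struct {
    uint32_t mant;  /* mantissa, M = 16 bits, value in [0.5, 1.0)   */
    int      exp;   /* power-of-two exponent                        */
} toyfp;            /* represented value = (mant / 65536.0) * 2^exp */

static toyfp toy_mul(toyfp a, toyfp b)
{
    toyfp r;
    /* 1. multiply the mantissas fixed-point style, keep top 16 bits */
    uint32_t p = (uint32_t)(((uint64_t)a.mant * b.mant) >> 16);
    /* 2. add the exponents */
    r.exp = a.exp + b.exp;
    /* 3. re-normalize: the product lies in [0.25, 1.0), so at most
     *    one left shift of the mantissa and an exponent decrement   */
    if (p < 0x8000u) { p <<= 1; r.exp -= 1; }
    r.mant = p;
    return r;
}

int main(void)
{
    toyfp a = { 0xC000, 1 };  /* 0.75  * 2^1 = 1.5 */
    toyfp b = { 0xA000, 2 };  /* 0.625 * 2^2 = 2.5 */
    toyfp c = toy_mul(a, b);
    printf("%g\n", ldexp(c.mant / 65536.0, c.exp));  /* prints 3.75 */
    return 0;
}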

However - standard floating point multipliers also check for floating-point
overflow (exponent too large) and zero. This adds another level of logic
at the output.

So - the mantissa multiply will be roughly of (relative) complexity M^2,
where M = number of mantissa bits. The exponent add is of (relative)
complexity N, where N = number of exponent bits. The single-bit shift,
depending on how it's done, can be extremely simple, but let's call it
complexity N because of the exponent decrement.

For the addition - this requires that the two numbers be adjusted so they
have the same exponent. This requires a compare of the two exponents
(complexity N), a shift of the smaller number to match the larger number
(complexity 2M), then an add of the mantissas (complexity M), a small-shift
adjustment of the result (complexity N), plus the misc logic to check
overflow and zero.
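And a matching sketch of the add path (compare exponents, shift the
smaller operand, add, small-shift adjust), again only a sketch for the
same toy format and for positive operands, reusing the toyfp type from
the multiply example above:

static toyfp toy_add(toyfp a, toyfp b)
{
    toyfp r;
    /* 1. compare exponents; make 'a' the operand with the larger one */
    if (b.exp > a.exp) { toyfp t = a; a = b; b = t; }
    int diff = a.exp - b.exp;
    /* 2. shift the smaller mantissa right to align it with the larger */
    uint32_t bm = (diff < 16) ? (b.mant >> diff) : 0;
    /* 3. add the aligned mantissas; the sum can reach [0.5, 2.0)      */
    uint32_t sum = a.mant + bm;
    r.exp = a.exp;
    /* 4. small-shift adjustment: one right shift if the sum hit 1.0   */
    if (sum >= 0x10000u) { sum >>= 1; r.exp += 1; }
    r.mant = sum;
    return r;
}

With the same values as in main() above, toy_add(a, b) gives
0x8000 * 2^3 = 4.0, i.e. 1.5 + 2.5.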

So, ignoring the output checking logic, a VERY rough estimate is that a
floating point multiply is of complexity (M^2 + N + N). A floating point
add is of complexity (N + 2M + M + N). Based on your specific floating
point format, you can then calculate your percentage comparison.
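For example, picking single-precision-like widths of M = 24 mantissa
bits and N = 8 exponent bits (just to put numbers on it), the multiply
estimate comes to 24^2 + 8 + 8 = 592 and the add estimate to
8 + 48 + 24 + 8 = 88, so by this very crude measure the multiplier is
on the order of 6 to 7 times the adder.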

That being said, there are tricks to simplify this. For instance, the
final single-bit adjust of the multiply output can be incorporated into the
exponent add with some look ahead logic. I also add the caveat that I am
making a gross assumption that size / # of gates <--> delay.

Bryant Sorensen
DSP Platforms
Starkey Laboratories