Use of double type in float subtraction vs precision [C++]

From: LiloLilo on 26 Jun 2010 23:56

Hi,

can the double type represent exactly every result of subtracting two
floats? Example:

float a;
float b;

double result;

result = (double)a - (double)b;

Does result suffer from any rounding in this case, or will it
always represent the exact result?

Thank you all for your help.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Jens Schmidt on 27 Jun 2010 21:08

LiloLilo wrote:

> can the double type represent exactly every result of subtracting two
> floats?

> double result;
>
> result = (double)a - (double)b;
>
> Does result suffer from any rounding in this case, or will it
> always represent the exact result?

Because the range of exponents a float can take is usually far wider than
the number of significand bits in a double, the answer is no.
Example: use 1e+30 for a and 1e-30 for b.
--
Greetings,
Jens Schmidt

From: Walter Bright on 27 Jun 2010 21:09

LiloLilo wrote:
> can the double type represent exactly every result of subtracting two
> floats? Example:
>
> float a;
> float b;
>
> double result;
>
> result = (double)a - (double)b;
>
> Does result suffer from any rounding in this case, or will it
> always represent the exact result?

If the difference between the binary exponents of a and b exceeds the
number of additional significand bits in a double (29), you'll lose a
corresponding number of bits of precision.

The reason is that, in order to do the subtraction, the significands must
be aligned, meaning shifted left or right until the exponents of a and b
match. When significand bits get shifted off the end of the
representation, they're gone.

So, no, a double is not anywhere near a complete answer to float precision.

---
Walter Bright, Digital Mars
free C, C++, D programming language compilers
http://www.digitalmars.com


From: Pierre Asselin on 28 Jun 2010 05:39

LiloLilo <danilobrambilla(a)tiscali.it> wrote:

> can the double type represent exactly every result of subtracting two
> floats?

No.

> Example:

> float a;
> float b;

> double result;

> result = (double)a - (double)b;

> Does result suffer from any rounding in this case, or will it
> always represent the exact result?

float a= 1e20f;
float b= 1e-20f;
double result= (double)a - (double)b;

--
pa at panix dot com

From: Keith H Duggar on 28 Jun 2010 06:14

On Jun 27, 10:56 am, LiloLilo <danilobrambi...(a)tiscali.it> wrote:
> Hi,
>
> can the double type represent exactly every result of subtracting two
> floats? Example:
>
> float a;
> float b;
>
> double result;
>
> result = (double)a - (double)b;
>
> Does result suffer from any rounding in this case, or will it
> always represent the exact result?

No.

"What Every Computer Scientist Should Know About Floating-Point Arithmetic"
http://dlc.sun.com/pdf/800-7895/800-7895.pdf

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

KHD
