From: George Neuner on 26 Jun 2010 05:22

On Fri, 25 Jun 2010 07:11:54 CST, Andrew <marlow.andrew(a)googlemail.com>
wrote:

>On 25 June, 08:41, George Neuner <gneun...(a)comcast.net> wrote:
>>
>> The VC++ stream library has a known precision issue when writing and
>> reading back floating point values as text using different
>> representations.
>
>What issues are these? Can you give a reference please?

I don't have a cite ready, but problems similar to yours have been
discussed before in this group. The last one I remember was about
exponential notation: someone was reading in values like "0.0003" and
"3.e-4" and finding they don't compare equal. This is a known issue
with the VC++ library - it goes back at least to VS2002 and maybe
further.

>> The fallback answer in VC++ is to read FP data
>> using scanf if you don't know how it was written (scanf always works).
>
>Actually, I think the thing to do to ensure that the numbers have
>their exact string representation preserved is to keep them as strings
>when the file is read in. This was not done for memory space reasons.

Does the file have to be human readable? If not, store the exact bit
pattern in binary or in hexadecimal (16 chars).

>I can prove there is a problem. Here is a little program:
>
>#include <iostream>
>#include <iomanip>
>#include <sstream>
>#include <cmath>
>
>double convertStringToValue(const std::string input)
>{
>    double value;
>    std::stringstream str(input);
>    str >> value;
>    return value;
>}
>
>std::string formatValue(double value)
>{
>    std::stringstream str;
>
>    int prec = std::min<int>(15 - (int)log10(fabs(value)), 15);
>
>    str << std::fixed << std::setprecision(prec) << value;
>
>    return str.str();
>}
>
>int main()
>{
>    std::string input = "-937566.2364699869";
>    double value = convertStringToValue(input);
>    std::string converted = formatValue(value);
>
>    if (input == converted)
>        std::cout << input << " converted ok." << std::endl;
>    else
>    {
>        std::cout << "Conversion failed:" << std::endl
>                  << "Input: " << input << std::endl
>                  << "Converted: " << converted << std::endl;
>    }
>
>    return 0;
>}
>
>Here's what I get when I run it (built using GCC 4.4.2):-
>
>Conversion failed:
>Input: -937566.2364699869
>Converted: -937566.2364699868

I don't deny there is a discrepancy, but the question still is which
compiler is wrong.

I see that VC++ is giving back the input value, but it is by
coincidence. If you stop in the debugger and examine the value of the
double, you'll find it to be -937566.23646998685 ... on output VC++ is
rounding whereas G++ is truncating.

If you change the input string to "-937566.2364699868", the value of
the double is still -937566.23646998685 ... but in this case, G++
matches the input while VC++ does not.

For my money VC++ is the one doing it wrong.

George
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: Martin Vejnár on 26 Jun 2010 23:55

On 6/26/2010 10:21 AM, Andrew wrote:
> On 25 June, 13:11, SG<s.gesem...(a)gmail.com> wrote:
>> If you're interested in a lossless double->string->double roundtrip
>> you should use 17 decimal digits and high quality conversions.

And for the other roundtrip (string->double->string) to be guaranteed
to work, the original string must have at most 15 significant digits.

> See my sample program in this thread that uses the value
> -937566.2364699869. When GCC takes that string, converts it to a
> double, then converts the double back to a string, it gives
> -937566.2364699868. Adding an extra digit of precision gives
> -937566.23646998685. IFAICS this means it is doing the rounding
> incorrectly.

No, as mentioned before, the nearest double is -937566.2364699868485...
Rounding to 16 digits correctly returns -937566.2364699868. Rounding to
17 yields (also correctly) -937566.23646998685.

--
Martin
From: Seungbeom Kim on 27 Jun 2010 00:04

On 2010-06-26 01:21, Andrew wrote:
> On 25 June, 13:11, SG <s.gesem...(a)gmail.com> wrote:
>>
>> 937566.2364699869 =
>> 11100100111001011110.00111100100010010100110000001100001110...
>>
>> The closest representable number with an IEEE-754 64bit float is
>>
>> 11100100111001011110.001111001000100101001100000011000 =
>> 937566.2364699868485...
>>
>> The closest representable 16-digit decimal number is
>>
>> 937566.2364699868
>>
>> So, your program you compiled with GCC did a good job.
>
> I'm not convinced.

Why not? Consider a fictional system where numbers are internally
represented in increments of 1/3, i.e. { ..., -1, -2/3, -1/3, 0, 1/3,
2/3, 1, ... }. You give "0.4" as a string representation. The system
chooses 1/3 as the closest internal representation. It gets converted
back as "0.3", not "0.4", because 1/3 is better represented as "0.3"
than as "0.4". Does this make sense? I perceive this to be essentially
the same as SG's explanation above.

>> If you're interested in a lossless double->string->double roundtrip
>> you should use 17 decimal digits and high quality conversions.

Using long double instead of double in your sample program (mentioned
just below) indeed gives back the same input string.

> See my sample program in this thread that uses the value
> -937566.2364699869. When GCC takes that string, converts it to a
> double, then converts the double back to a string, it gives
> -937566.2364699868. Adding an extra digit of precision gives
> -937566.23646998685. IFAICS this means it is doing the rounding
> incorrectly.

Adding another extra digit of precision gives -937566.236469986849,
which is consistent with what SG has already demonstrated: the closest
representable number is -937566.2364699868485... AFAICS, it is correct
that this is converted to -937566.2364699868.

--
Seungbeom Kim
From: Seungbeom Kim on 27 Jun 2010 00:04

On 2010-06-25 06:11, Andrew wrote:
>
> I can prove there is a problem. Here is a little program:

That depends on what the "problem" really means, i.e. how it is
defined. See my other post in this thread.

> std::string formatValue(double value)
> {
>     std::stringstream str;
>
>     int prec = std::min<int>(15 - (int)log10(fabs(value)), 15);
>
>     str << std::fixed << std::setprecision(prec) << value;
>
>     return str.str();
> }

prec isn't always positive, and I'm not sure the behavior is defined
when the precision is non-positive. Why don't you just do:

    // relying on the default state of str:
    // (str.flags() & std::ios_base::floatfield) == 0
    str << std::setprecision(16) << value;

(The precision means the number of significant digits in the default
float format, while it means the number of digits after the decimal-
point character in the fixed format, so I used 16 instead of 15 here.)

By the way, you were asking for 16 digits, which is one more than
DBL_DIG=15. No wonder the value appears changed after the conversion.

--
Seungbeom Kim
From: George Neuner on 27 Jun 2010 00:12
On Sat, 26 Jun 2010 02:21:56 CST, Andrew <marlow.andrew(a)googlemail.com>
wrote:

>See my sample program in this thread that uses the value
>-937566.2364699869. When GCC takes that string, converts it to a
>double, then converts the double back to a string, it gives
>-937566.2364699868. Adding an extra digit of precision gives
>-937566.23646998685. IFAICS this means it is doing the rounding
>incorrectly.

The problem is that a floating point number is a finite APPROXIMATION
of a real number. The decimal fractional value you are offering is not
exactly representable in the binary FP format. When the in-between
value is read in, it is coerced to the nearest representable value.
There is absolutely nothing you can do about this other than to change
the value representation.

You could try IEEE decimal math. Only a few (brand new) CPUs - none
popular - implement IEEE 754-2008 decimal math, but Intel has a
software implementation available at
http://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library/
However, decimal 754 math suffers from some of the same finite
representation problems as binary 754 ... there still are values which
cannot be represented exactly.

You could also try an arbitrary-precision math library. Offhand I'm
not aware of a C++ library that offers floating point (like Java's
BigDecimal), but most integer libraries offer arbitrary-precision
fractions. Arbitrary-precision math takes a lot of memory, though.

George