From: Rune Allnor on 20 May 2010 16:25

On 20 Mai, 21:22, "Jan Simon" <matlab.THIS_Y...(a)nMINUSsimon.de> wrote:
> Dear Rune!
>
> > During this work, i have rewrite several mathlab function and check my implementation vs Mathlab output using a precision of 1e-6.
> > this criteria work well until the end of the program, because result of computation which are 'double' are dump into a file as 'int32'.
> > After Rune's charming advice to study the problem for some years,
>
> Only a fool would ridicule that advice.
>
> The OP "thinks he can control the mathlab accuracy to fit with C++/compiler/ia32 constraint". The OP has no clue what he is talking about.
> He will need a more practical advice than starting to study numerics for years:

Like it or not, that's what he needs to do if he wants to replicate Matlab's numerical results with some accuracy.

> You cannot control the accuracy of Matlab. If two (correct) implementations of an algorithm reply different values, this is caused by the limited precision of floating point calculations.

Sure. And those of us who actually have studied numerics for some non-trivial amount of time know that there is a multitude of tricks, methods and strategies one needs to be acutely aware of to get within a couple of orders of magnitude of the correct result.

Matlab is based on LINPACK. I used to read the companion literature to that library. You get the big picture quite quickly by reading a little about finite-precision binary formats and one book on linear algebra (the Golub - van Loan book). From there on, the remaining >99% of the literature treated the question of squeezing another couple of significant digits or bits out of the computations.

Rune
From: Alex Buisson on 24 May 2010 02:53

Thanks for your answers.

I just want to say that even if my experience with mathlab is very small, I have a huge (I hope) knowledge of programming (more than 10 years) in video coding, and I am not discovering the floating point problems.

I agree that using 1e-6 as a precision threshold to perform my validation isn't enough. I have made some changes to check my C/C++ accuracy against 2*DBL_EPSILON (from the standard math lib: 2.2204460492503131e-016 /* smallest such that 1.0+DBL_EPSILON != 1.0 */). At the same time I am taking a look at mathlab's EPS and at how to change my simple round instruction to a custom rounding at a given number of digits.

Thanks guys!
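As a rough illustration of the kind of check Alex describes (a relative comparison against a tolerance derived from eps, plus rounding to a fixed number of significant digits), a minimal Matlab sketch might look like the one below. The values and the roundsd helper are made up for this example and are not from the thread.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Minimal sketch of a relative-accuracy check; ref and test are stand-ins
% for a Matlab reference value and the corresponding C/C++ result.
ref  = pi;
test = pi + eps(pi);            % pretend the C/C++ port differs by one ulp

tol = 2*eps(max(abs(ref), abs(test)));   % tolerance scaled to the magnitude of the data
ok  = abs(ref - test) <= tol;
fprintf('diff = %g, tol = %g, pass = %d\n', abs(ref - test), tol, ok);

% Rounding to n significant digits (only meaningful for nonzero x):
roundsd = @(x, n) round(x .* 10.^(n - 1 - floor(log10(abs(x))))) ...
                  .* 10.^(1 + floor(log10(abs(x))) - n);
roundsd(2.2204460492503131e-16, 3)       % ans = 2.22e-16
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The point of the eps-based tolerance is that a fixed threshold such as 1e-6 is either far too loose or far too tight depending on the magnitude of the values being compared.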
From: Rune Allnor on 24 May 2010 06:00

On 24 Mai, 08:53, "Alex Buisson" <alex.buis...(a)gmail.com> wrote:
> Thanks for your answers.
>
> I just want to say that even if my experience with mathlab is very small, I have a huge (I hope) knowledge of programming (more than 10 years) in video coding, and I am not discovering the floating point problems.

I am not sure you are fully aware of what you are heading into.

Try the script below. It generates a vector of random data and sums the numbers in different orders. From an analytic standpoint the sums should be equal: in exact arithmetic addition is associative, (a + b) + c = a + (b + c), so the order of summation does not matter. With finite-precision arithmetic that is no longer true, (a + b) + c =/= a + (b + c), because every intermediate addition is rounded.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N = 100000;
x0 = randn(N,1);

s0 = 0;
for n = 1:N
    s0 = s0 + x0(n);
end

x1 = sort(x0);
s1 = 0;
for n = 1:N
    s1 = s1 + x1(n);
end

x2 = flipud(x1);
s2 = 0;
for n = 1:N
    s2 = s2 + x2(n);
end

e01 = abs(s0 - s1);
e02 = abs(s0 - s2);
e12 = abs(s1 - s2);

fprintf('s0 = %g \n', s0);
fprintf('s1 = %g \n', s1);
fprintf('s2 = %g \n', s2);
fprintf('e01 = %g \n', e01);
fprintf('e02 = %g \n', e02);
fprintf('e12 = %g \n', e12);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Output on my screen:

s0 = 185.971
s1 = 185.971
s2 = 185.971
e01 = 2.45421e-010
e02 = 2.57046e-010
e12 = 1.16245e-011

So even an operation as simple and trivial as summing a sequence of numbers introduces errors, depending on the order in which the numbers are summed. The effect is significant with a surprisingly low number of terms; just run the same script a few times with N = 5, N = 10, or N = 15.

The difference between a good and a poor implementation of the same numerical algorithm lies in the management of these kinds of errors and the control of the resulting damage.

It can be proved that the sum s1 above, sorting the numbers and summing them in ascending order, is the optimum strategy to minimize the error: the sum is, at any one time, dominated by the largest error term yet added, so one always adds the smallest remaining term to minimize the effect on the sum. However, this optimum strategy comes at a significant runtime cost.

There is stuff one can do to prevent these kinds of errors from creeping in (like sorting the numbers in the example above). The trick is to know *both* what to do *and* when to do it:

- How badly do errors introduced at *this* stage mess up the end result? Maybe the damage is small enough to be deemed acceptable and can go unhandled. But the rule of thumb is that one does what little one can at every stage.
- What can be done about it? The answer to this one might very well be 'nothing'.

Again, this is stuff that takes years to learn and master. Do not underestimate the task ahead of you.

Rune
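One standard example of the error-management techniques Rune alludes to, not shown in the thread itself, is compensated (Kahan) summation, which carries a running correction term instead of reordering the data. The sketch below is illustrative only; the function name kahan_sum is made up for this example and is assumed to be saved as kahan_sum.m.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function s = kahan_sum(x)
% Compensated (Kahan) summation sketch: the correction term c feeds the
% low-order bits lost in each addition back into the next one.
    s = 0;                   % running sum
    c = 0;                   % running compensation for lost low-order bits
    for n = 1:numel(x)
        y = x(n) - c;        % apply the correction to the next term
        t = s + y;           % low-order bits of y may be lost here
        c = (t - s) - y;     % algebraically zero; numerically the lost part
        s = t;
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Comparing kahan_sum(x0) against the plain loop sums s0, s1 and s2 from Rune's script, or against a quadruple-precision reference such as the XSum submission Jan mentions in the next post, gives an idea of how much of the ordering-dependent error the compensation removes.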
From: Jan Simon on 24 May 2010 10:51

Dear Rune!

> N = 100000;
> x0 = randn(N,1);
>
> s0 = 0;
> for n = 1:N
>     s0 = s0 + x0(n);
> end
>
> x1 = sort(x0);
> s1 = 0;
> for n = 1:N
>     s1 = s1 + x1(n);
> end
>
> It can be proved that the sum s1 above, sorting the numbers and summing them in ascending order, is the optimum strategy to minimize the error: the sum is, at any one time, dominated by the largest error term yet added, so one always adds the smallest remaining term to minimize the effect on the sum.

I disagree. Sorting normally distributed values decreases the accuracy of the sum: the first half of the sum accumulates the negative values, so the temporary result is a large negative number. Then some small positive numbers are added, without influencing the sum due to the round-off!

Sorting normally distributed values according to their absolute value is more accurate:

format long g
x0 = randn(1e5, 1);
s0 = sum(x0)
s1 = sum(sort(x0))
[dum, index] = sort(abs(x0));
s2 = sum(x0(index))

==> Compare with the sum calculated in quadruple precision:
http://www.mathworks.com/matlabcentral/fileexchange/26800
s3 = XSum(x0)

See e.g.: sum([1e20, 1, -1e20])

Standard sorting can help if all values are positive.

@OP: If you talk about such a complicated field as floating point properties, the terminology gets really important. So:
- "Matlab" is not called "mathlab".
- I still cannot see whether your "1e-6" means an absolute or a relative accuracy.
- The "precision" is the number of bits used internally to represent the floating point variables: 64 for DOUBLE, 32 for SINGLE. In addition, the processor can store intermediate values with 80 bits, see:
  feature('setprecision', 64)   % 80 bit
  feature('setprecision', 53)   % 64 bit
  Therefore the term "1e-6 precision" is meaningless. But it is obvious that you mean the "accuracy".

Kind regards, Jan
From: Rune Allnor on 24 May 2010 10:57
On 24 Mai, 16:51, "Jan Simon" <matlab.THIS_Y...(a)nMINUSsimon.de> wrote:
> Dear Rune!
>
> > N = 100000;
> > x0 = randn(N,1);
> >
> > s0 = 0;
> > for n = 1:N
> >     s0 = s0 + x0(n);
> > end
> >
> > x1 = sort(x0);
> > s1 = 0;
> > for n = 1:N
> >     s1 = s1 + x1(n);
> > end
> >
> > It can be proved that the sum s1 above, sorting the numbers and summing them in ascending order, is the optimum strategy to minimize the error: the sum is, at any one time, dominated by the largest error term yet added, so one always adds the smallest remaining term to minimize the effect on the sum.
>
> I disagree. Sorting normally distributed values decreases the accuracy of the sum: the first half of the sum accumulates the negative values, so the temporary result is a large negative number. Then some small positive numbers are added, without influencing the sum due to the round-off!
>
> Sorting normally distributed values according to their absolute value is more accurate:

Agreed.

Rune