From: Bruno Luong on 22 Jun 2010 17:15

Sorry Matt, in optimization the poor performance of the gradient method is one of the first lessons students learn in school. This is no surprise to anyone. Try this example: after 10 iterations your algorithm is still far from converging, whereas a simple conjugate gradient converges in 2 iterations.

A = [100 9; 9 1];
b = [1; 1];

%% gradient method
X = [0; 0];
numIter = 10;
absA = abs(A);
C = absA*sum(absA,2);
for ii = 1:numIter
    X = X - A.'*(A*X - b)./C;
end
A*X - b

% Conjugate gradient
niter = size(b,1); % 2
x = [0; 0];
r = b - A*x;
w = -r;
z = A*w;
a = (r'*w)/(w'*z);
x = x + a*w;
B = 0;
for i = 1:niter
    r = r - a*z;
    if norm(r) < 1e-10
        break;
    end
    B = (r'*z)/(w'*z);
    w = -r + B*w;
    z = A*w;
    a = (r'*w)/(w'*z);
    x = x + a*w;
end
A*x - b

% Bruno
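(As a cross-check on the hand-written CG above: A here is symmetric positive definite, so the same system can also be handed to MATLAB's built-in pcg. A minimal sketch, with the tolerance and iteration cap chosen arbitrarily:)

A = [100 9; 9 1]; b = [1; 1];
% pcg solves A*x = b directly for symmetric positive definite A
[x, flag, relres, iter] = pcg(A, b, 1e-10, size(A,1));
x, iter    % iter comes back as 2, matching the hand-written CG result above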
From: Matt J on 22 Jun 2010 19:03

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvr95c$lhj$1(a)fred.mathworks.com>...
> Sorry Matt, in optimization the poor performance of the gradient method is one of the first lessons students learn in school. This is no surprise to anyone. Try this example: after 10 iterations your algorithm is still far from converging, whereas a simple conjugate gradient converges in 2 iterations.
=================

Not an entirely applicable comparison, Bruno, for the following reasons:

(1) Your example favors CG mainly because the dimension N of the problem was small enough here to let you run a full N iterations, at which point theory says CG will attain the solution regardless of how well conditioned the problem is. Normally, however, you would have to stop CG after a number of iterations far fewer than N, in which case how close you come to the solution is more sensitive to the conditioning of the problem.

(2) It's not always true that gradient methods perform poorly. That depends on the choice of preconditioner. Maybe CG converged in 2 iterations, but had we used the preconditioner inv(A.'*A), the gradient method would have converged in only 1 iteration.

(3) As I said previously, CG does not modify well to handle constraints, whereas the proposed gradient method accommodates them easily. To modify CG for constraints, you have to employ restarts, which can limit its speed to that of gradient methods.

(4) As I said previously, and for reasons related to (2), the gradient method becomes more competitive when the Hessian is more diagonally concentrated. Competitiveness also depends on accuracy requirements. If you repeat your experiment with, say,

A = [100 1; 1 15];

you will find that the gradient method will converge to the solution to within a 3% error, and in less CPU time than CG.
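(A minimal sketch of the arithmetic behind point (2), reusing the 2x2 example from the post above. Forming inv(A.'*A) explicitly amounts to solving the system outright, so this is illustration only, not a practical preconditioner:)

A = [100 9; 9 1]; b = [1; 1];
P = inv(A.'*A);        % idealized preconditioner (forming it already does all the work)
X = [0; 0];
g = A.'*(A*X - b);     % gradient of 0.5*||A*X - b||^2 at X
X = X - P*g;           % one preconditioned gradient step
A*X - b                % residual is zero (to rounding) after a single iteration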
From: Bruno Luong on 22 Jun 2010 23:04

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvrffq$fvc$1(a)fred.mathworks.com>...
> "Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvr95c$lhj$1(a)fred.mathworks.com>...
> > Sorry Matt, in optimization the poor performance of the gradient method is one of the first lessons students learn in school. This is no surprise to anyone. Try this example: after 10 iterations your algorithm is still far from converging, whereas a simple conjugate gradient converges in 2 iterations.
> =================
>
> Not an entirely applicable comparison, Bruno, for the following reasons:
>
> (1) Your example favors CG mainly because the dimension N of the problem was small enough here to let you run a full N iterations, at which point theory says CG will attain the solution regardless of how well conditioned the problem is. Normally, however, you would have to stop CG after a number of iterations far fewer than N, in which case how close you come to the solution is more sensitive to the conditioning of the problem.

I can't see how an algorithm that already performs poorly in 2D would have a better chance in larger dimensions. At iteration #n, both the gradient and the conjugate gradient methods try to minimize the cost function projected onto the nth Krylov space; the difference is that one minimizes it exactly, while the other doesn't and needs many iterations to converge.

> (2) It's not always true that gradient methods perform poorly. That depends on the choice of preconditioner. Maybe CG converged in 2 iterations, but had we used the preconditioner inv(A.'*A), the gradient method would have converged in only 1 iteration.

Come on: preconditioning means finding an approximation of inv(A.'*A) that is *fast* to apply. Applying inv(A.'*A) itself is not doable (and by then you have already done all the work, btw).

> (3) As I said previously, CG does not modify well to handle constraints, whereas the proposed gradient method accommodates them easily. To modify CG for constraints, you have to employ restarts, which can limit its speed to that of gradient methods.

Well, you should take a look at pcg with projection. The gradient method is simply poor, and this is no secret; every optimization textbook I have read explains why no one uses the gradient method.

> (4) As I said previously, and for reasons related to (2), the gradient method becomes more competitive when the Hessian is more diagonally concentrated. Competitiveness also depends on accuracy requirements. If you repeat your experiment with, say,
>
> A = [100 1; 1 15];
>
> you will find that the gradient method will converge to the solution to within a 3% error, and in less CPU time than CG.

Why not choose A = eye(2) and show your method converging in 1 iteration? Oh, you already did, with the inv(A'*A) preconditioning thing.

Bruno
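(A small sketch of the Krylov-space remark above, under arbitrary choices of a 5x5 random symmetric positive definite matrix and k = 3: after k steps, the conjugate gradient loop from the first post lands on the minimizer of the quadratic over the k-dimensional Krylov space, which a direct subspace solve reproduces:)

n = 5; k = 3;
M = randn(n); A = M*M' + n*eye(n);   % a random SPD test matrix (illustration only)
b = randn(n,1);

% k steps of the CG loop from the first post, starting from x = 0
x = zeros(n,1);
r = b - A*x; w = -r; z = A*w;
a = (r'*w)/(w'*z); x = x + a*w;
for i = 1:k-1
    r = r - a*z;
    B = (r'*z)/(w'*z);
    w = -r + B*w; z = A*w;
    a = (r'*w)/(w'*z); x = x + a*w;
end

% Direct minimization of 0.5*x'*A*x - b'*x over span{b, A*b, A^2*b}
V = b;
for j = 2:k, V = [V, A*V(:,end)]; end
y = (V'*A*V)\(V'*b);
x_krylov = V*y;

norm(x - x_krylov)    % rounding-level difference: CG's kth iterate is the subspace minimizer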
From: Matt J on 23 Jun 2010 08:24

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvrtj5$e7m$1(a)fred.mathworks.com>...
> I can't see how an algorithm that already performs poorly in 2D would have a better chance in larger dimensions. At iteration #n, both the gradient and the conjugate gradient methods try to minimize the cost function projected onto the nth Krylov space; the difference is that one minimizes it exactly, while the other doesn't and needs many iterations to converge.
=================

I didn't say the gradient algorithm would perform better. I said CG will perform worse if it is not allowed to perform N full iterations, which is a realistic assumption, since in larger dimensional problems it is often not practical to iterate N times. That will put the two algorithms on a more level playing field.

As a test of this, you might rerun your code with the following higher dimensional data, which is also a lot less diagonally concentrated than my last example:

A = diag(1:7) + rand(7); A = A + A.';
N = size(A,1);
b = ones(N,1);
numIter = 5;
nIter = numIter; % for both the gradient algorithm and CG

You will see that the two algorithms perform quite similarly, and sometimes (depending on the output of rand(7)) the gradient algorithm performs better than CG.

Here's what I will agree with, however: for unconstrained, quadratic minimization, CG will probably always outperform gradient methods WHEN BOTH USE THE SAME PRECONDITIONER. For some reason, though, you have chosen here to race a preconditioned gradient method against unpreconditioned CG and claim that CG will always be better.

> > (2) It's not always true that gradient methods perform poorly. That depends on the choice of preconditioner. Maybe CG converged in 2 iterations, but had we used the preconditioner inv(A.'*A), the gradient method would have converged in only 1 iteration.
>
> Come on: preconditioning means finding an approximation of inv(A.'*A) that is *fast* to apply. Applying inv(A.'*A) itself is not doable (and by then you have already done all the work, btw).
===============

True, but my only point was that a well-preconditioned gradient algorithm can outperform a poorly preconditioned conjugate gradient algorithm.

> > (3) As I said previously, CG does not modify well to handle constraints, whereas the proposed gradient method accommodates them easily. To modify CG for constraints, you have to employ restarts, which can limit its speed to that of gradient methods.
>
> Well, you should take a look at pcg with projection. The gradient method is simply poor, and this is no secret; every optimization textbook I have read explains why no one uses the gradient method.
=====================

I don't see how pcg with projection can avoid the need to restart. The projection operation will kill the conjugacy of the sequence if the algorithm hits a constraint boundary. Once you restart, pcg becomes a lot like a normal gradient algorithm again.

In any case, I've never tried pcg with projection myself, but I know people who have, and they report that PCG gives no advantage over the gradient algorithms used for that application. This is possibly for the above reasons. Also, for large nonquadratic minimization, the line searches required by CG are often impractical. This has led a lot of people to opt for gradient methods over CG as well.

> Why not choose A = eye(2) and show your method converging in 1 iteration? Oh, you already did, with the inv(A'*A) preconditioning thing.
============

My example above for N=7 is a lot less diagonally concentrated. But why was this example so bad? Surely you wouldn't disagree that the Hessian often is diagonally concentrated. In any case, I already conceded that if PCG is given the advantage of the same preconditioner, whether it be C or inv(A'*A), it will likely do better.
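(A runnable sketch of that comparison, splicing the suggested data into the two loops from the first post; results vary from run to run with rand, and the draw is assumed to come out positive definite, which it usually does for this construction:)

A = diag(1:7) + rand(7); A = A + A.';   % Matt's suggested test data
N = size(A,1);
b = ones(N,1);
numIter = 5;                            % same iteration budget for both methods

% Scaled gradient method from the first post
absA = abs(A);
C = absA*sum(absA,2);
X = zeros(N,1);
for ii = 1:numIter
    X = X - A.'*(A*X - b)./C;
end
norm(A*X - b)                           % gradient-method residual after numIter passes

% Conjugate gradient from the first post, capped at the same budget
x = zeros(N,1);
r = b - A*x; w = -r; z = A*w;
a = (r'*w)/(w'*z); x = x + a*w;         % first CG step
for i = 1:numIter-1                     % numIter-1 more steps, numIter in total
    r = r - a*z;
    if norm(r) < 1e-10, break; end
    B = (r'*z)/(w'*z);
    w = -r + B*w; z = A*w;
    a = (r'*w)/(w'*z); x = x + a*w;
end
norm(A*x - b)                           % CG residual after (at most) numIter steps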
From: Bruno Luong on 23 Jun 2010 09:15

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvsud5$hdt$1(a)fred.mathworks.com>...
> "Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvrtj5$e7m$1(a)fred.mathworks.com>...
> > I can't see how an algorithm that already performs poorly in 2D would have a better chance in larger dimensions. At iteration #n, both the gradient and the conjugate gradient methods try to minimize the cost function projected onto the nth Krylov space; the difference is that one minimizes it exactly, while the other doesn't and needs many iterations to converge.
> =================
>
> I didn't say the gradient algorithm would perform better. I said CG will perform worse if it is not allowed to perform N full iterations, which is a realistic assumption, since in larger dimensional problems it is often not practical to iterate N times. That will put the two algorithms on a more level playing field.
>
> As a test of this, you might rerun your code with the following higher dimensional data, which is also a lot less diagonally concentrated than my last example:
>
> A = diag(1:7) + rand(7); A = A + A.';
> N = size(A,1);
> b = ones(N,1);
> numIter = 5;
> nIter = numIter; % for both the gradient algorithm and CG
>
> You will see that the two algorithms perform quite similarly, and sometimes (depending on the output of rand(7)) the gradient algorithm performs better than CG.
>
> Here's what I will agree with, however: for unconstrained, quadratic minimization, CG will probably always outperform gradient methods WHEN BOTH USE THE SAME PRECONDITIONER. For some reason, though, you have chosen here to race a preconditioned gradient method against unpreconditioned CG and claim that CG will always be better.

No, what I said is that the gradient method fails even in the *very* simple unconstrained (2 x 2) case with a condition number of about 500 (not large by any standard). After 10 iterations it provides a current solution of

% For A = [100 9; 9 1]; b = [1; 1]
X =
    0.0098
    0.0115

instead of

>> A\b
ans =
   -0.4211
    4.7895

So this method is good to go to the trash can (to me). I certainly won't use such a method if it already hits a wall at this level.

> True, but my only point was that a well-preconditioned gradient algorithm can outperform a poorly preconditioned conjugate gradient algorithm.

I'm not arguing about the benefit of preconditioning. What I'm saying is that the gradient method is bad by itself. And if you think the right diagonal preconditioner can do some miracle compensation, the (2 x 2) example shown above proves it won't. Of course you can always find cases where it works more or less, such as an identity matrix, or diagonal preconditioning of a diagonally dominant matrix, which makes it look like an identity matrix in the end. That's quite narrow IMO.

Bruno
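(To put a rough number on "hits a wall", a sketch that simply counts how many scaled-gradient passes this same 2 x 2 example needs before the residual drops below an arbitrary tolerance of 1e-6:)

A = [100 9; 9 1]; b = [1; 1];
absA = abs(A);
C = absA*sum(absA,2);
X = [0; 0];
k = 0;
while norm(A*X - b) > 1e-6 && k < 1e7   % iteration cap is an arbitrary safeguard
    X = X - A.'*(A*X - b)./C;
    k = k + 1;
end
k   % the count runs into the hundreds of thousands for this matrix: the slow mode
    % of the scaled iteration contracts by only roughly 1 - 3e-5 per pass here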