From: Matt J on 23 Jun 2010 10:47

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvt1d8$3dc$1(a)fred.mathworks.com>...
> No, what I said is that the gradient method fails even in the *very* simple case of a 2 x 2 matrix with a condition number of about 500 (not large by any standard) and no constraints.
> [snip]
> So this method is good to go to the trash can (to me). I certainly won't use such a method if it already hits a wall at this level.
==============

But what would your alternatives be? I showed you a 7-dimensional, significantly non-diagonal example where CG does just as badly or worse, and with a condition number of only 10 or so. If other algorithms hit even harder walls in higher dimensions and other more realistic situations, what are you left with?

> Of course you can always find cases where it works more or less, such as an identity matrix, or diagonal preconditioning on a diagonally dominant matrix, which makes it look like an identity matrix in the end. It's quite narrow IMO.
================

It's not. There are quite a few applications where the Hessian, even if not perfectly diagonal or even diagonally dominant, is as concentrated around the diagonal as in my N=7 example.
From: Jacob on 23 Jun 2010 10:58

Your discussion of the matter seems great, and I must admit your simple method works very well for my matrix (which is nearly diagonal). But I need more information about this method --- could you point me to some proofs regarding C?

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvr6u0$m83$1(a)fred.mathworks.com>...
> "Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvr2sl$odb$1(a)fred.mathworks.com>...
> > "Jacob " <mithunjacob_oohay(a)yahoo.com> wrote in message <hvqrvj$irq$1(a)fred.mathworks.com>...
> > > Could you provide some information about this method (name, paper, ...)? Also, if A is of size m x n, then sum(A,2) will be of size m x 1, so I don't understand that step.
> ==============
>
> Sorry, I meant to write
>
>     C = absA.' * sum(absA,2);   % note the transpose
>
> So, yes, sum(A,2) will be of size m x 1, but absA.' is compatibly sized for this multiplication, and C will be n x 1.
>
> > Matt's suggestion is nothing more than the simple (as opposed to conjugate) gradient method with empirical preconditioning. My professors told me never to use it except for academic purposes. In any case it certainly can't compete with LSQR.
> ==================
>
> No, the preconditioning is not empirical. This particular choice of preconditioner C ensures monotone descent of the algorithm without the need for line searches. You can find some information about it in the paper referenced at the bottom.
>
> How fast it descends compared to other algorithms depends a lot on the structure of A. One expects it to be faster for more diagonally concentrated A. The MATLAB documentation doesn't say what algorithm LSQR uses, but judging from a peek inside lsqr.m, the computation per iteration for LSQR seems much greater. So I don't think it is a priori obvious which algorithm will perform better.
>
> It's also much easier to add box constraints to this algorithm than to others. For example, positivity constraints can be added with the following simple modification. LSQR, apparently, cannot accommodate any constraints.
>
>     for ii = 1:numIter
>         X = X - A.'*(A*X-b)./C;
>         X(X<0) = 0;
>     end
>
> A relevant paper is:
>
> @ARTICLE{DePa,
>   author  = {A. R. {De Pierro}},
>   title   = {On the relation between the {ISRA} and the {EM} algorithm for positron emission tomography},
>   journal = {IEEE Transactions on Medical Imaging},
>   volume  = 12,
>   number  = 2,
>   pages   = {328--33},
>   month   = jun,
>   year    = 1993
> }
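To make the iteration above concrete, here is a minimal, self-contained MATLAB sketch of the preconditioned step with the optional nonnegativity projection. The problem sizes, test matrix, and iteration count below are arbitrary choices for illustration only; they do not come from the thread.

% Illustrative sketch of the preconditioned gradient iteration described above.
% The test data are made up for demonstration purposes.
m = 50; n = 7;                  % arbitrary problem size
A = rand(m,n);                  % arbitrary test matrix
b = A*rand(n,1);                % consistent right-hand side

absA = abs(A);
C = absA.' * sum(absA,2);       % C(j) = sum_i |A(i,j)| * sum_k |A(i,k)|, an n x 1 vector

X = zeros(n,1);
numIter = 200;                  % arbitrary iteration count
for ii = 1:numIter
    X = X - A.'*(A*X - b)./C;   % diagonally scaled gradient step, no line search needed
    X(X<0) = 0;                 % projection enforcing the positivity constraint
end
fprintf('Residual %f\n', norm(A*X - b))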
From: Matt J on 23 Jun 2010 11:52

"Jacob " <mithunjacob_oohay(a)yahoo.com> wrote in message <hvt7ds$ctr$1(a)fred.mathworks.com>...
> Your discussion of the matter seems great, and I must admit your simple method works very well for my matrix (which is nearly diagonal).
>
> But I need more information about this method --- could you point me to some proofs regarding C?
===============

I mentioned this paper earlier in the thread. Did you already check it?

@ARTICLE{DePa,
  author  = {A. R. {De Pierro}},
  title   = {On the relation between the {ISRA} and the {EM} algorithm for positron emission tomography},
  journal = {IEEE Transactions on Medical Imaging},
  volume  = 12,
  number  = 2,
  pages   = {328--33},
  month   = jun,
  year    = 1993
}
From: Bruno Luong on 23 Jun 2010 12:24

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvt6p8$hj$1(a)fred.mathworks.com>...
> But what would your alternatives be? I showed you a 7-dimensional, significantly non-diagonal example where CG does just as badly or worse, and with a condition number of only 10 or so....

That is not what I observe: CG consistently beats the gradient method, which performs poorly:

>> t
Exact solution X:
    0.2008    0.1093    0.0750    0.0633    0.0458    0.0199    0.0271

Gradient solution X:
    0.0978    0.0943    0.0861    0.0699    0.0522    0.0358    0.0373
Residual 0.347107:

CG solution X:
Residual x:
    0.2008    0.1094    0.0750    0.0633    0.0458    0.0199    0.0271
Residual 0.000587:

>> t
Exact solution X:
    0.3615    0.1135    0.0428    0.0410    0.0259    0.0286    0.0230

Gradient solution X:
    0.0950    0.0838    0.0809    0.0648    0.0516    0.0505    0.0458
Residual 0.474012:

CG solution X:
Residual x:
    0.3613    0.1144    0.0418    0.0416    0.0256    0.0287    0.0230
Residual 0.009334:

>> t
Exact solution X:
    0.2051    0.1161    0.0950    0.0448    0.0316    0.0400    0.0279

Gradient solution X:
    0.1020    0.0989    0.0903    0.0628    0.0515    0.0458    0.0382
Residual 0.352715:

CG solution X:
Residual x:
    0.2050    0.1163    0.0948    0.0449    0.0316    0.0400    0.0279
Residual 0.001528:

>> t
Exact solution X:
    0.2362    0.1138    0.0953    0.0552    0.0414    0.0255    0.0210

Gradient solution X:
    0.0884    0.1035    0.0961    0.0735    0.0583    0.0464    0.0406
Residual 0.360280:

CG solution X:
Residual x:
    0.2362    0.1138    0.0951    0.0556    0.0411    0.0256    0.0210
Residual 0.005527:

% Here is my code

A = diag(1:7) + rand(7);
A = A + A.';
N = size(A,1);
b = ones(N,1);

%%
fprintf('Exact solution X:\n')
disp(A\b);

%%
X = zeros(N,1);
numIter = 5;
absA = abs(A);
C = absA*sum(absA,2);
for ii = 1:numIter
    X = X - A.'*(A*X-b)./C;
end
fprintf('\nGradient solution X:\n')
disp(X);
fprintf('Residual %f:\n', norm(A*X-b))

% Conjugate gradient
niter = numIter;
x = zeros(N,1);
r = b - A*x;
w = -r;
z = A*w;
a = (r'*w)/(w'*z);
x = x + a*w;
B = 0;
for i = 1:niter
    r = r - a*z;
    if( norm(r) < 1e-10 )
        break;
    end
    B = (r'*z)/(w'*z);
    w = -r + B*w;
    z = A*w;
    a = (r'*w)/(w'*z);
    x = x + a*w;
end
fprintf('\nCG solution X:\n')
fprintf('Residual x:\n')
disp(x);
fprintf('Residual %f:\n', norm(A*x-b))

> It's not. There are quite a few applications where the Hessian is, even if not perfectly diagonal or even diagonally dominant, as concentrated around the diagonal as my N=7 example.

If you want to convince anyone that your algorithm works, you need a much more solid argument than showing one or two examples. So far I have seen zero examples where your algorithm really works.

Bruno
From: Matt J on 23 Jun 2010 15:43

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvtaj4$c1n$1(a)fred.mathworks.com>...
> "Jacob " <mithunjacob_oohay(a)yahoo.com> wrote in message <hvt7ds$ctr$1(a)fred.mathworks.com>...
> > Your discussion of the matter seems great, and I must admit your simple method works very well for my matrix (which is nearly diagonal).
> >
> > But I need more information about this method --- could you point me to some proofs regarding C?
> ===============
>
> I mentioned this paper earlier in the thread. Did you already check it?

Sorry, I checked that paper, and it's not precisely what you need. I'll give you my own quick derivation.

The objective function is

    f(x) = ||A*x - b||^2 / 2;

but for a fixed y it can also be written equivalently as

    f(x) = (x-y).'*(A.'*A)*(x-y)/2 + (A.'*(A*y-b)).'*(x-y) + ||A*y-b||^2/2;

Now consider the alternative function

    Q(x) = (x-y).'*diag(C)*(x-y)/2 + (A.'*(A*y-b)).'*(x-y) + ||A*y-b||^2/2;

Subtracting the two functions, we obtain

    Q(x) - f(x) = (x-y).' * (diag(C) - A.'*A) * (x-y)/2

I now claim that diag(C) - A.'*A is non-negative definite. It is a simple exercise to prove this using the definition of C and the Gerschgorin circle theorem:

http://en.wikipedia.org/wiki/Gerschgorin_disk_theorem

From the non-negative definiteness of diag(C) - A.'*A, the last equation tells us that Q(x) >= f(x) for all x, with equality at x = y. Since Q(y) = f(y), this means that

    f(y) - f(x) >= Q(y) - Q(x)   for all x.

This is sometimes called the "optimization transfer principle". It implies that if Q() is minimized at z, then

    f(y) - f(z) >= Q(y) - Q(z) >= 0,   and hence   f(z) <= f(y).

Thus, minimizing Q produces a point z that reduces the objective function relative to the starting point y. The formula for z is easily obtained by setting the gradient of Q to zero, which gives

    z = y - A.'*(A*y - b)./C

and this is precisely the recursion formula for the proposed algorithm. This proves that the algorithm is monotone for this choice of preconditioner C. Convergence can be established using the same theory as for gradient algorithms.
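As a quick numerical sanity check of the derivation above, the following MATLAB sketch verifies, on an arbitrary random test problem (the seed, sizes, iteration count, and tolerance are illustrative assumptions, not from the thread), that diag(C) - A.'*A has no negative eigenvalues and that the recursion decreases the objective at every step.

% Numerical check of the majorization argument above (arbitrary test data).
rng(0);                                    % fixed seed for reproducibility
A = randn(20,7);                           % arbitrary test matrix
b = randn(20,1);                           % arbitrary right-hand side

absA = abs(A);
C = absA.' * sum(absA,2);

% diag(C) - A.'*A should be non-negative definite (Gerschgorin argument)
fprintf('Smallest eigenvalue of diag(C) - A''*A: %g\n', min(eig(diag(C) - A.'*A)))

% The objective ||A*x - b||^2/2 should decrease monotonically
f = @(x) norm(A*x - b)^2/2;
x = zeros(7,1);
fprev = f(x);
for ii = 1:100
    x = x - A.'*(A*x - b)./C;              % the recursion derived above
    assert(f(x) <= fprev + 1e-12)          % monotone descent
    fprev = f(x);
end
fprintf('Final objective value: %g\n', fprev)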