From: Matt J on 23 Jun 2010 23:18

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvtcf4$epi$1(a)fred.mathworks.com>...
> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvt6p8$hj$1(a)fred.mathworks.com>...
> > But what would your alternatives be? I showed you a 7-dimensional, significantly non-diagonal example where CG does just as badly or worse, and with a condition number of only 10 or so....
>
> Not what I observe; consistently CG beats the gradient method, which performs poorly:

====================

OK. I had a typo in my code. But regardless, try the code below, which also plots the progress of each algorithm versus CPU time. I consistently find the gradient method to have a significantly faster initial convergence rate, for various choices of problem size N. So at the very least, it seems like using the gradient algorithm initially and switching to CG later would be a worthwhile hybrid.

N=10;
q=1./(1:N).^2;
W=toeplitz(q,q);
V=diag(sqrt(1:N));
A=V*W*V;
b=A*(N:-1:1).';

niter=30;
numIter=niter;
xInit=ones(N,1);

%% Gradient method (diagonally scaled)
X = xInit;
absA=abs(A);
C=absA.'*sum(absA,2);
AtA=A.'*A;
Atb=A'*b;
Z=eye(N)-diag(1./C)*AtA;
d=Atb./C;

tic;
for ii=1:numIter
    %X=X-(AtA*X-Atb)./C;
    X=Z*X+d;
    resMM(ii)=norm(A*X-b);
end
tMM=toc,

% Conjugate gradient
x = xInit;
r = b - A*x;
w = -r;
z = A*w;
a = (r'*w)/(w'*z);
x = x + a*w;
B = 0;

tic,
for i = 1:niter
    r = r - a*z;
    B = (r'*z)/(w'*z);
    w = -r + B*w;
    z = A*w;
    a = (r'*w)/(w'*z);
    x = x + a*w;
    resCG(i)=norm(A*x-b);
end
tCG=toc,

plot((1:numIter)/numIter*tMM,resMM,'o-', (1:niter)/niter*tCG,resCG,'*--');
legend('Gradient','CG')
xlabel 'CPU Time'
ylabel 'Cost'
figure(gcf)
From: Bruno Luong on 24 Jun 2010 02:33

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvuipd$qre$1(a)fred.mathworks.com>...
> I consistently find the gradient method to have a significantly faster initial convergence rate, for various choices of problem size N. So at the very least, it seems like using the gradient algorithm initially and switching to CG later would be a worthwhile hybrid.

Sigh... Drawing a hasty conclusion can be dangerous.

It has been proved that, for the same matrix, right-hand side, and starting point:
1. CG = GM at the first iteration
2. CG must be superior to GM from the 2nd iteration onward
3. The conjugacy built by CG is global over all iterations, even though only the last direction is explicitly enforced

So making a hybrid method will destroy the global conjugacy and degrade the performance of CG.

What you observe is the effect of preconditioning and not the superiority of GM; it is mathematically impossible for GM to provide a faster cost decrease, everything else being equal. That is the fact.

Now we can apply the same right diagonal preconditioning you propose for GM to CG, and CG will be superior to GM again; and there may be even better preconditioners out there, such as incomplete SOR or incomplete Cholesky factorization, etc., depending on the type of problem.

Bruno
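For reference, a minimal sketch of the diagonally preconditioned CG that Bruno describes, reusing A, b, xInit, niter, and C from the benchmark code above and assuming, as that code does, that A is symmetric positive definite. Taking M = diag(C) as the preconditioner is purely illustrative; this is a sketch, not a tuned implementation.

% Preconditioned CG on A*x = b with M = diag(C)
x = xInit;
r = b - A*x;                 % residual
zp = r./C;                   % apply M^{-1} = diag(1./C)
p = zp;
rz = r'*zp;
for k = 1:niter
    Ap = A*p;
    alpha = rz/(p'*Ap);
    x = x + alpha*p;
    r = r - alpha*Ap;
    zp = r./C;
    rzNew = r'*zp;
    p = zp + (rzNew/rz)*p;   % new search direction, A-conjugate to the previous one
    rz = rzNew;
    resPCG(k) = norm(A*x - b);
end

MATLAB's built-in pcg(A, b, tol, niter, diag(C)) should produce essentially the same iterates.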
From: Matt J on 24 Jun 2010 10:34

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hvuu7l$2ms$1(a)fred.mathworks.com>...
> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvuipd$qre$1(a)fred.mathworks.com>...
> > I consistently find the gradient method to have a significantly faster initial convergence rate, for various choices of problem size N. So at the very least, it seems like using the gradient algorithm initially and switching to CG later would be a worthwhile hybrid.
>
> Sigh... Drawing a hasty conclusion can be dangerous.
>
> It has been proved that, for the same matrix, right-hand side, and starting point:
> 1. CG = GM at the first iteration
> 2. CG must be superior to GM from the 2nd iteration onward
> 3. The conjugacy built by CG is global over all iterations, even though only the last direction is explicitly enforced
>
> So making a hybrid method will destroy the global conjugacy and degrade the performance of CG.
>
> What you observe is the effect of preconditioning and not the superiority of GM; it is mathematically impossible for GM to provide a faster cost decrease, everything else being equal. That is the fact.

===============

Assuming everything else to be equal is not practical, however. The CPU time of the two methods, in particular, would never be equal. Even though CG may have a faster rate of convergence, the lower computational cost per iteration of the gradient method helps it stay ahead of CG in the race initially. It is that, and not just the preconditioning, which accounts for the faster initial descent of the gradient method in these tests.

This becomes even more of a factor when you move away from unconstrained, quadratic problems. In nonquadratic problems, the line searches required by CG can no longer be done analytically and become far more CPU-time-consuming. Also, the faster descent of CG is only asymptotic, and doesn't necessarily occur at the initial iterations. (That, incidentally, is when the algorithm hybrid I was talking about becomes a logical thing to do.) Furthermore, even the asymptotic superiority of CG is only guaranteed by the theory if the solution is an interior point.
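For concreteness, a rough sketch of the hybrid described above, spliced together from the two loops in the earlier benchmark code. The switch-over point nGrad is an arbitrary illustrative choice, and Bruno's objection about losing global conjugacy still applies; this sketches the idea, nothing more.

nGrad = 5;                        % arbitrary switch-over point
x = xInit;
for ii = 1:nGrad                  % cheap diagonally scaled gradient steps
    x = x - (AtA*x - Atb)./C;
end
r = b - A*x;                      % (re)start CG from the gradient iterate
w = -r;
z = A*w;
a = (r'*w)/(w'*z);
x = x + a*w;
for i = 1:niter - nGrad           % remaining iterations done by CG
    r = r - a*z;
    B = (r'*z)/(w'*z);
    w = -r + B*w;
    z = A*w;
    a = (r'*w)/(w'*z);
    x = x + a*w;
    resHyb(i) = norm(A*x - b);
end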
From: Jacob on 1 Jul 2010 08:05

Thanks for the proof, but do you have any citable material? Papers/textbooks/etc.?

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvto48$8k$1(a)fred.mathworks.com>...
> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hvtaj4$c1n$1(a)fred.mathworks.com>...
> > "Jacob " <mithunjacob_oohay(a)yahoo.com> wrote in message <hvt7ds$ctr$1(a)fred.mathworks.com>...
> > > Your discussion regarding the matter seems great, and I must admit your simple method works very well for my matrix (which is very nearly diagonal).
> > >
> > > But I need more data about this method --- could you point me to some proofs regarding C?
> > ===============
> >
> > I mentioned this paper earlier in the thread. Did you already check it?
>
> Sorry, I checked that paper, and it's not precisely what you need. I'll give you my own quick derivation. The objective function is
>
> f(x) = ||A*x-b||^2/2;
>
> but for a fixed y it can also be written equivalently as
>
> f(x) = (x-y).'*(A.'*A)*(x-y)/2 + (A.'*(A*y-b)).'*(x-y) + ||A*y-b||^2/2;
>
> Now consider the alternative function
>
> Q(x) = (x-y).'*diag(C)*(x-y)/2 + (A.'*(A*y-b)).'*(x-y) + ||A*y-b||^2/2;
>
> Subtracting the two functions, we obtain
>
> Q(x) - f(x) = (x-y).' * (diag(C) - A.'*A) * (x-y)/2
>
> I now claim that diag(C) - A.'*A is non-negative definite. It is a simple exercise to prove this using the definition of C and the Gerschgorin circle theorem
>
> http://en.wikipedia.org/wiki/Gerschgorin_disk_theorem
>
> From the non-negative definiteness of diag(C) - A.'*A, the last equation tells us that
>
> Q(x) >= f(x)
>
> for all x, with equality at x=y. This means that
>
> f(y) - f(x) >= Q(y) - Q(x)
>
> for all x. This is sometimes called the "optimization transfer" principle. The above implies that if Q() is minimized at z, then
>
> f(y) - f(z) >= Q(y) - Q(z) >= 0
>
> and hence
>
> f(z) <= f(y)
>
> Thus, minimizing Q produces a point z that reduces the objective function from the starting point y. The formula for z is easily obtained by setting the gradient of Q to zero, and gives
>
> z = y - A.'*(A*y-b)./C
>
> which is precisely the recursion formula for the proposed algorithm.
>
> This proves that the algorithm is monotone for this choice of preconditioner C. Proof of convergence can be established using the same theory as for gradient algorithms.
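As a numerical sanity check (not a substitute for the proof), the two key claims above, that diag(C) - A.'*A is non-negative definite and that the update is monotone, can be verified in a few lines on the same test matrix used earlier in the thread:

N = 10;
q = 1./(1:N).^2;  W = toeplitz(q,q);  V = diag(sqrt(1:N));
A = V*W*V;  b = A*(N:-1:1).';

absA = abs(A);
C = absA.'*sum(absA,2);            % the proposed preconditioner

min(eig(diag(C) - A.'*A))          % should be >= 0, up to roundoff

y = ones(N,1);                     % arbitrary starting point
z = y - A.'*(A*y - b)./C;          % one step of the proposed recursion
[norm(A*y - b), norm(A*z - b)]     % the cost should not increase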
From: Matt J on 2 Jul 2010 10:53

"Jacob " <foobar(a)yahoo.com> wrote in message <i0i09k$99m$1(a)fred.mathworks.com>...
> Thanks for the proof, but do you have any citable material? Papers/textbooks/etc.?
==========

See Example 7 in the following article. It's the same algorithm, but it gives a different derivation than mine.

@ARTICLE{Lan00,
  author  = {K. Lange and D. R. Hunter and I. Yang},
  title   = {{Optimization} transfer using surrogate objective functions},
  journal = {J. Computational and Graphical Stat.},
  volume  = {9},
  number  = {1},
  pages   = {1--20},
  month   = mar,
  year    = {2000}
}