From: fabio freschi on
Hi everybody,
I discovered a curious behavior of lu from the computational time viewpoint.
I have a complex system to solve (I can provide it) and I tried the two possibilities: 'backslash' and 'lu'. I am actually interested in LU, since I need factors for further iterative computations.
The problem is that the timings of the two approaches are quite different:

>> tic; x = A\b; toc
Elapsed time is 22.144157 seconds.
>> tic; [L,U,p,q,R] = lu(A,'vector'); toc
Elapsed time is 34.637534 seconds.

I tried to investigate the algorithm settings, by using

spparms('spumoni',2)

The output is quite the same for both functions, also in terms of computational time:

*** A\b ***
....
symbolic + numeric CPU time (sec): 41.09
symbolic + numeric mflops (CPU time): 5336.13
symbolic + numeric wall clock time (sec): 21.62
symbolic + numeric mflops (wall clock): 10141.62
....

*** lu(A) ***
....
symbolic + numeric CPU time (sec): 40.88
symbolic + numeric mflops (CPU time): 5363.55
symbolic + numeric wall clock time (sec): 21.89
symbolic + numeric mflops (wall clock): 10016.53
....

my conclusion is that a lot of time is wasted in assembling the 5-outputs of LU. But this time is extremely long compared to the total time (13 additional seconds vs 22 for the solution).
Is my conclusion correct?

After that I tried to limit the number of outputs, by providing an external AMD preordering (as suggested by the verbose output)

tic
p = amd(A);
[L,U] = lu(A(p,p))
toc

but the last statement lasts forever...

Any suggestion to obtain an LU factorization as fast as the built-in backslash?
NB: memory is not an issue here.

Thanks in advance
Fabio
From: us on
"fabio freschi" <fabio.freschi(a)remove.gmail.com> wrote in message <i219hg$3ep$1(a)fred.mathworks.com>...
> Hi everybody,
> I discovered a curious behavior of lu from the computational time viewpoint.
> I have a complex system to solve (I can provide it) and I tried the two possibilities: 'backslash' and 'lu'. I am actually interested in LU, since I need factors for further iterative computations.
> The problem is that the timings of the two approaches are quite different:
>
> >> tic; x = A\b; toc
> Elapsed time is 22.144157 seconds.
> >> tic; [L,U,p,q,R] = lu(A,'vector'); toc
> Elapsed time is 34.637534 seconds.
>
> I tried to investigate the algorithm settings, by using
>
> spparms('spumoni',2)
>
> The output is quite the same for both functions, also in terms of computational time:
>
> *** A\b ***
> ...
> symbolic + numeric CPU time (sec): 41.09
> symbolic + numeric mflops (CPU time): 5336.13
> symbolic + numeric wall clock time (sec): 21.62
> symbolic + numeric mflops (wall clock): 10141.62
> ...
>
> *** lu(A) ***
> ...
> symbolic + numeric CPU time (sec): 40.88
> symbolic + numeric mflops (CPU time): 5363.55
> symbolic + numeric wall clock time (sec): 21.89
> symbolic + numeric mflops (wall clock): 10016.53
> ...
>
> my conclusion is that a lot of time is wasted in assembling the 5-outputs of LU. But this time is extremely long compared to the total time (13 additional seconds vs 22 for the solution).
> Is my conclusion correct?
>
> After that I tried to limit the number of outputs, by providing an external AMD preordering (as suggested by the verbose output)
>
> tic
> p = amd(A);
> [L,U] = lu(A(p,p))
> toc
>
> but the last statement lasts forever...
>
> Any suggestion to obtain an LU factorization as fast as the built-in backslash?
> NB: memory is not an issue here.
>
> Thanks in advance
> Fabio

which ML ver do you have(?)...
can you show an example of your input(?)...

us
From: fabio freschi on
Ops, I forgot these important information

1) Matlab R2009b (maci64, but the same happens under linux)
2) the system is sparse complex symmetric (not hermitian). I am not sure if this is enough to answer to your second question

Thanks us for the quick reply
Fabio
From: us on
"fabio freschi" <fabio.freschi(a)remove.gmail.com> wrote in message <i21alf$e6i$1(a)fred.mathworks.com>...
> Ops, I forgot these important information
>
> 1) Matlab R2009b (maci64, but the same happens under linux)
> 2) the system is sparse complex symmetric (not hermitian). I am not sure if this is enough to answer to your second question
>
> Thanks us for the quick reply
> Fabio

2) can you show simple ML code to simulate one of your typical A/B data sets...

us
From: fabio freschi on
Unfortunately it is not easy to reproduce the matrix. The problem comes from a finite element analysis of an electromagnetic structure. I can only provide the matrices for different meshes, but the complete code to assemble the stiffness matrix comes from many years of research and implementation. It is not easy to extract a way

I can guarantee that the matrix is relatively "good" from the conditioning point of view, since it is explicitly gauged. A 51000x51000 matrix has an estimated condition number (via condest) of 1.7971e+09, not too bad for this kind of simulations.
Fabio