From: Janne Blomqvist
Joost wrote:
> Also here, CGESV will be part of MKL (typically used on intel chips)
> and ACML (used on AMD), and should be relatively fast also with netlib
> lapack if combined with atlas or GOTO blas (available for both brands).

Going off on a tangent with respect to your "relatively fast" comment: a
while ago I stumbled upon a paper which mentioned that, when solving Ax=B
for large systems, more than 99.9% of the flops are actually spent in
*GEMM. That suggests that a really good *GEMM implementation is all
that matters, and that the speed of the *GESV code itself is not very
significant.
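A back-of-the-envelope flop count shows why *GEMM dominates. The sketch below is a minimal pure-Python model, assuming a simple right-looking blocked LU with a single block size b (b = 64 is a hypothetical choice, not taken from the paper): each block step factors an m-by-b panel, does a triangular solve for the block row, and updates the trailing matrix with one GEMM. Row swaps from pivoting cost no flops and are ignored, and real LAPACK *GETRF or SGI's solver may organise the work differently. Counts are for real arithmetic; complex arithmetic scales all three parts by roughly the same factor, so the shares carry over.

```python
def panel_flops(m, b):
    """Flops for an unblocked LU of an m-by-b panel: column scalings plus rank-1 updates."""
    total = 0
    for i in range(b):
        rows_below = m - i - 1                # entries below the pivot in column i
        cols_right = b - i - 1                # panel columns to the right of column i
        total += rows_below                   # divide the column by the pivot
        total += 2 * rows_below * cols_right  # one multiply + one subtract per entry
    return total


def blocked_lu_flops(n, b):
    """Return (panel, trsm, gemm) flop counts for a right-looking blocked LU of an n-by-n matrix."""
    panel = trsm = gemm = 0
    for j in range(0, n, b):
        m = n - j                     # order of the trailing matrix at this step
        bj = min(b, m)                # the last block may be narrower than b
        rest = m - bj                 # rows below / columns right of the block
        panel += panel_flops(m, bj)
        trsm += bj * (bj - 1) * rest  # U12 = inv(L11) @ A12, L11 unit lower triangular
        gemm += 2 * rest * rest * bj  # A22 -= L21 @ U12
    return panel, trsm, gemm


def unblocked_lu_flops(n):
    """Flops for plain unblocked LU; the blocked total must equal this."""
    return sum((n - i - 1) + 2 * (n - i - 1) ** 2 for i in range(n))


if __name__ == "__main__":
    b = 64  # hypothetical block size
    for n in (351, 1351, 3351, 100_000):
        panel, trsm, gemm = blocked_lu_flops(n, b)
        total = panel + trsm + gemm
        assert total == unblocked_lu_flops(n)  # same arithmetic, just reordered
        print(f"n={n:>6}: GEMM share = {100 * gemm / total:.3f} %")
```

To leading order the non-GEMM work is about b*n^2 flops out of roughly (2/3)n^3 in total, so the non-GEMM share behaves like 1.5*b/n. With b = 64 it only falls to about 0.1% once n is around 10^5, which is consistent with the paper's figure of N > 100000 for 99.9%.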

Of course, things change for a parallel solver, and the paper was
actually about SGI's implementation of LINPACK for their big
shared-memory computers. You can see it here:

http://amrit.ittc.ku.edu/tclark/europe2005/isc2005/www.supercomp.de/papers/panzi.pdf

and the slides here:

http://www.isc2005.org/download/cp/Panziera_Baron.pdf


--
Janne Blomqvist
From: Joost
Hi Janne,

I think that's right (even though they cite N > 100000 for the 99.9% figure).
I decided to give it a try on an Opteron using three combinations:

1) g95-compiled LAPACK + Goto BLAS
2) ifort + MKL (721/emt64)
3) pgf90 + ACML (the version that seems to ship with PGI 6.0-5)

(Note that the point is that ACML/MKL might be optimised, whereas g95
uses netlib blas. Of course, precise timings may vary with the version
of the library used and so on.)

For N=351
         goto/netlib   mkl     acml
CGEMM    0.042         0.046   0.045
CGESV    0.069         0.079   0.100

For N=1351
         goto/netlib   mkl     acml
CGEMM    2.21          2.48    2.29
CGESV    3.27          3.66    3.77

For N=3351
         goto/netlib   mkl     acml
CGEMM    32.8          37.2    34.3
CGESV    46.6          52.4    51.2

So indeed, a fast BLAS is enough in this case: the goto/netlib
combination, which has the fastest CGEMM, also gives the fastest CGESV
at every N.
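To check that reading against the numbers, the short Python sketch below (a hypothetical helper, not part of any benchmark code from this thread) takes the timings exactly as posted, computes the CGESV/CGEMM time ratio for each library, and reports which library is fastest for each routine. No units are assumed; only ratios and orderings are used.

```python
# Timings exactly as posted, per library, for each matrix order N.
LIBS = ("goto/netlib", "mkl", "acml")
TIMINGS = {
    351:  {"CGEMM": (0.042, 0.046, 0.045), "CGESV": (0.069, 0.079, 0.100)},
    1351: {"CGEMM": (2.21, 2.48, 2.29),    "CGESV": (3.27, 3.66, 3.77)},
    3351: {"CGEMM": (32.8, 37.2, 34.3),    "CGESV": (46.6, 52.4, 51.2)},
}


def gesv_to_gemm_ratios(timings):
    """Return {N: {library: CGESV time / CGEMM time}}."""
    return {
        n: {lib: t["CGESV"][i] / t["CGEMM"][i] for i, lib in enumerate(LIBS)}
        for n, t in timings.items()
    }


def fastest(timings, routine):
    """Return {N: library with the smallest posted time for `routine`}."""
    return {
        n: LIBS[min(range(len(LIBS)), key=lambda i: t[routine][i])]
        for n, t in timings.items()
    }


if __name__ == "__main__":
    for n, r in gesv_to_gemm_ratios(TIMINGS).items():
        print(n, {lib: round(v, 2) for lib, v in r.items()})
    print("fastest CGEMM:", fastest(TIMINGS, "CGEMM"))
    print("fastest CGESV:", fastest(TIMINGS, "CGESV"))
```

On the posted numbers, goto/netlib has the smallest time for both routines at every N, and at N=3351 the CGESV/CGEMM ratio is about 1.4 to 1.5 for all three libraries.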

Joost

From: Joost
> (notice that the point is that acml/mkl might be optimised, whereas g95
> uses netlib blas)
That should read "netlib lapack", not "netlib blas".

Joost

From: Victor Eijkhout
Gordon Sande <g.sande(a)worldnet.att.net> wrote:

> The fastest way to invert matrices is to not invert matrices.

One reason is that explicit inversion is not numerically stable,
whereas solving with a factored matrix is.
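The "factor, then solve" pattern fits in a few lines. This is a minimal pure-Python illustration (the names lu_factor and lu_solve are hypothetical, not LAPACK routines) of what LAPACK's *GESV does by calling *GETRF, an LU factorization with partial pivoting, followed by *GETRS, which performs the triangular solves. No inverse is ever formed, and the same factors can be reused for further right-hand sides. It also accepts complex entries, as in CGESV.

```python
def lu_factor(a):
    """LU with partial pivoting on a copy of `a`: P A = L U.

    Returns (lu, piv): L (unit lower, multipliers below the diagonal) and U
    share `lu`; piv[i] is the original row index now stored in row i.
    """
    n = len(a)
    lu = [list(row) for row in a]
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest-magnitude entry of column k up.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in the L part
            f = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= f * lu[k][j]  # eliminate below the pivot
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors from lu_factor (no inverse formed)."""
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]     # apply the permutation: P b
    for i in range(n):                    # forward substitution, L y = P b
        for k in range(i):
            y[i] -= lu[i][k] * y[k]
    x = y
    for i in reversed(range(n)):          # back substitution, U x = y
        for k in range(i + 1, n):
            x[i] -= lu[i][k] * x[k]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    A = [[1 + 1j, 2], [3, 4 - 1j]]
    lu, piv = lu_factor(A)                # factor once ...
    for rhs in ([1 + 3j, 4 + 4j], [1, 0]):
        print(lu_solve(lu, piv, rhs))     # ... then solve for each right-hand side
```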

Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu
ph: 512 471 5809