From: Janne Blomqvist on 12 Jan 2006 11:37

Joost wrote:
> Also here, CGESV will be part of MKL (typically used on intel chips)
> and ACML (used on AMD), and should be relatively fast also with netlib
> lapack if combined with atlas or GOTO blas (available for both brands).

Going off on a tangent wrt your "relatively fast" comment: a while ago I stumbled upon a paper which mentioned that when solving Ax=B for large systems, >99.9 % of the flops are actually spent in *GEMM. That would suggest that having a really good *GEMM implementation is all that matters; the speed of the *GESV code itself isn't really significant.

Of course, for a parallel solver things change, and the paper was actually about SGI's implementation of linpack for their big shared memory computers. You can see it here:

http://amrit.ittc.ku.edu/tclark/europe2005/isc2005/www.supercomp.de/papers/panzi.pdf

and the slides here:

http://www.isc2005.org/download/cp/Panziera_Baron.pdf

--
Janne Blomqvist
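The ">99.9 % of the flops in *GEMM" figure can be made plausible with a rough flop tally. The sketch below is not taken from the paper; it assumes a simple right-looking blocked LU in which each step factors a panel with unblocked code, does a triangular solve for the block row of U, and updates the trailing submatrix with one matrix-matrix multiply. The function names and the block size of 64 are illustrative assumptions, and only leading-order flop counts are kept.

```python
def blocked_lu_flops(n, nb):
    """Rough flop tally for a right-looking blocked LU of an n x n matrix.

    Assumed model: per step, an unblocked panel factorization, a triangular
    solve, and a GEMM update of the trailing submatrix. Lower-order terms
    are ignored, so the numbers are estimates only.
    """
    panel = trsm = gemm = 0.0
    for start in range(0, n, nb):
        b = min(nb, n - start)   # width of this panel
        m = n - start            # rows in this panel
        rest = m - b             # order of the trailing submatrix
        panel += m * b * b - b ** 3 / 3.0   # unblocked LU of an m x b panel
        trsm += b * b * rest                # b x b triangular solve, rest columns
        gemm += 2.0 * rest * b * rest       # (rest x b) times (b x rest) update
    return panel, trsm, gemm


def gemm_fraction(n, nb=64):
    """Fraction of the modelled LU flops that land in the GEMM update."""
    panel, trsm, gemm = blocked_lu_flops(n, nb)
    return gemm / (panel + trsm + gemm)


if __name__ == "__main__":
    for n in (351, 1351, 3351, 100000):
        print(f"N={n}: GEMM share of LU flops = {gemm_fraction(n):.4f}")
```

In this model the non-GEMM share shrinks roughly like 1.5*nb/N, so the GEMM share only passes 99.9 % once N is around 100000 for a block size of 64; for N of a few thousand it is in the high nineties. Real LAPACK implementations choose their own block sizes and panel algorithms, so the exact numbers will differ.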
From: Joost on 12 Jan 2006 13:14

Hi Janne,

I think that's right (even though they cite N>100000 for 99.9%). I decided to give it a try on an Opteron using 3 combinations:

1) g95 compiled LAPACK + Goto BLAS
2) ifort + mkl (721/emt64)
3) pgf90 + acml (the version that seems to come with pgi 6.0-5)

(notice that the point is that acml/mkl might be optimised, whereas g95 uses netlib blas. Of course, precise timings might change depending on the version of the library used and so on)

For N=351
         goto/netlib   mkl      acml
CGEMM    0.042         0.046    0.045
CGESV    0.069         0.079    0.100

For N=1351
         goto/netlib   mkl      acml
CGEMM    2.21          2.48     2.29
CGESV    3.27          3.66     3.77

For N=3351
         goto/netlib   mkl      acml
CGEMM    32.8          37.2     34.3
CGESV    46.6          52.4     51.2

So, indeed, a fast blas is enough in this case.

Joost
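One way to read the timings above is to divide each CGESV time by the CGEMM time for the same library and N. The sketch below only does that arithmetic on the numbers quoted in the post (whose units are not stated there); the dictionary layout and the helper name `gesv_over_gemm` are illustrative, not part of any library.

```python
# Timings copied from the post above; the post does not state the units.
LIBS = ("goto/netlib", "mkl", "acml")
TIMINGS = {
    351: {"CGEMM": (0.042, 0.046, 0.045), "CGESV": (0.069, 0.079, 0.100)},
    1351: {"CGEMM": (2.21, 2.48, 2.29), "CGESV": (3.27, 3.66, 3.77)},
    3351: {"CGEMM": (32.8, 37.2, 34.3), "CGESV": (46.6, 52.4, 51.2)},
}


def gesv_over_gemm(n, lib):
    """Ratio of the CGESV time to the CGEMM time for one library and one N."""
    col = LIBS.index(lib)
    return TIMINGS[n]["CGESV"][col] / TIMINGS[n]["CGEMM"][col]


if __name__ == "__main__":
    for n in sorted(TIMINGS):
        ratios = ", ".join(f"{lib}: {gesv_over_gemm(n, lib):.2f}" for lib in LIBS)
        print(f"N={n}: CGESV/CGEMM  {ratios}")
```

For N=3351 the ratio falls between roughly 1.4 and 1.5 for all three combinations, which is consistent with the conclusion that the CGESV time tracks the speed of the underlying BLAS rather than the LAPACK build.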
From: Joost on 12 Jan 2006 13:56

> (notice that the point is that acml/mkl might be optimised, whereas g95
> uses netlib blas)

Correction: for "netlib blas" read "netlib lapack".

Joost
From: Victor Eijkhout on 26 Jan 2006 17:57

Gordon Sande <g.sande(a)worldnet.att.net> wrote:
> The fastest way to invert matrices is to not invert matrices.

One of the reasons for that being that inversion is not numerically stable, whereas solution with a factored matrix is.

Victor.

--
Victor Eijkhout -- eijkhout at tacc utexas edu
ph: 512 471 5809
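As an illustration of "solution with a factored matrix", here is a minimal pure-Python sketch that factors A once with partial pivoting and then solves Ax=b by forward and back substitution, without ever forming the inverse. In LAPACK this corresponds to the *GETRF / *GETRS pair that *GESV wraps. The function names `lu_factor` and `lu_solve` below are local to this sketch, and the code is meant for small examples only, not as a replacement for a tuned library.

```python
def lu_factor(a):
    """LU-factor a square matrix (list of rows) with partial pivoting.

    Returns (lu, piv): lu stores L below the diagonal (unit diagonal implied)
    and U on and above it; piv[k] is the row swapped with row k at step k.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    piv = []
    for k in range(n):
        # Pick the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv.append(p)
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                  # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]   # eliminate below the pivot
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = [float(v) for v in b]
    for k, p in enumerate(piv):        # apply the recorded row swaps to b
        x[k], x[p] = x[p], x[k]
    for i in range(n):                 # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):       # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    lu, piv = lu_factor([[2.0, 1.0], [1.0, 3.0]])
    print(lu_solve(lu, piv, [3.0, 5.0]))
```

The factorization is done once and can be reused for any number of right-hand sides, which is also why a solve is cheaper than computing the inverse and multiplying by it.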