From: allchemist on 11 Apr 2010 07:23

> What are those?

Different neural network architectures. Some of them need plenty of
matrix multiplication.

> Wow. I presume you mean that "speed (or lack thereof) is responsible for
> 90% of the time taken on some tasks". I take it that you place the
> magical (sleep 1) statements liberally in your code?
> And what determines how much time the other tasks take?

The profiler does the job perfectly. In my current (slow) implementation
that operation takes more than 90% of the total time. I expect to speed
it up about ten times (bringing it down to roughly 50/50).
From: Nicolas Neuss on 11 Apr 2010 10:12

allchemist <hohlovivan(a)gmail.com> writes:

>> What are those?
>
> different neural network architectures. some of them need plenty of
> matrix multiplication

Which size are those matrices? Are they densely populated? Do all these
multiplications involve new factors?

Nicolas
From: allchemist on 12 Apr 2010 09:12

> Which size are those matrices? Are they densely populated? Do all
> these multiplications involve new factors?

They vary widely, from small (about 10 elements in the matrix) to very
large (about 10^7 elements), and they are densely populated. Since LLA
avoids copying the matrix to foreign memory and back, it seems to be a
good fit.
From: Liam Healy on 12 Apr 2010 11:16

allchemist <hohlovivan(a)gmail.com> writes:

>> Currently, LLA works on SBCL only, where it does _not_ copy arrays,
>> either before or after the function call (except when arrays contain
>> multiple matrices, but if I remember correctly, that happens only for
>> xGELS calls).
>>
>> When I port it to other implementations which don't have pinned
>> arrays, they will be copied.
>
> Then, in SBCL matrix multiplications are rather quick. After some
> tests, LLA does it several times quicker than GSLL (and several times
> slower than a straight foreign funcall of sgemm). But it uses some
> kind of Lisp arrays and provides BLAS and LAPACK, so it is very good!
> I'll use it in my code till I have enough time to try to do the same
> with native arrays with pinning.

GSLL is several times slower than LLA, which is several times slower
than directly calling sgemm? I don't understand why. Here is the macro
expansion of #'matrix-product on SBCL:

(DEFMETHOD MATRIX-PRODUCT
    ((A MATRIX-SINGLE-FLOAT) (B MATRIX-SINGLE-FLOAT)
     &OPTIONAL C (ALPHA 1.0) (BETA 1.0)
     (TRANSA :NOTRANS) (TRANSB :NOTRANS)
     &AUX (CARR (OR C
                    (MAKE-MARRAY 'SINGLE-FLOAT
                                 :DIMENSIONS (MATRIX-PRODUCT-DIMENSIONS A B)
                                 :INITIAL-ELEMENT 0))))
  (DECLARE (TYPE SINGLE-FLOAT ALPHA)
           (TYPE SINGLE-FLOAT BETA))
  (SB-SYS:WITH-PINNED-OBJECTS ((C-ARRAY:ORIGINAL-ARRAY CARR)
                               (C-ARRAY:ORIGINAL-ARRAY A)
                               (C-ARRAY:ORIGINAL-ARRAY B))
    (LET ((#:CRETURN
           (FOREIGN-FUNCALL "gsl_blas_sgemm"
                            CBLAS-TRANSPOSE TRANSA
                            CBLAS-TRANSPOSE TRANSB
                            :FLOAT ALPHA
                            :POINTER (MPOINTER A)
                            :POINTER (MPOINTER B)
                            :FLOAT BETA
                            :POINTER (MPOINTER CARR)
                            :INT)))
      (CHECK-GSL-STATUS #:CRETURN 'MATRIX-PRODUCT)
      (VALUES CARR))))

Why would this cause a several-times-several slowdown over directly
calling sgemm? There is a generic function dispatch of course, and
maybe making a marray (but to be fair, that could be pre-made).

Liam
From: Mario S. Mommer on 12 Apr 2010 11:28
Liam Healy <lnp(a)healy.washington.dc.us> writes:

> GSLL is several times slower than LLA, which is several times slower
> than directly calling sgemm? I don't understand why. Here is the
> macro expansion of #'matrix-product on SBCL:
[...]
> Why would this cause a several-times-several slowdown over directly
> calling sgemm? There is a generic function dispatch of course,
> and maybe making a marray (but to be fair, that could be
> pre-made).

Just guessing here, but if your matrix is small, this overhead will be
significant.