From: allchemist on 11 Apr 2010 07:23

> What are those?

Different neural network architectures. Some of them need plenty of
matrix multiplication.

> Wow. I presume you mean that "speed (or lack thereof) is responsible for
> 90% of the time taken on some tasks". I take it that you place the
> magical (sleep 1) statements liberally in your code?
> And what determines how much time the other tasks take?

The profiler does the job perfectly. In my current (slow) implementation
that operation takes more than 90% of the total time. I expect to speed
it up about ten times (bringing it down to roughly 50/50).
From: Nicolas Neuss on 11 Apr 2010 10:12

allchemist <hohlovivan(a)gmail.com> writes:

>> What are those?
>
> different neural network architectures. some of them need plenty of
> matrix multiplication

Which size are those matrices? Are they densely populated? Do all these
multiplications involve new factors?

Nicolas
From: allchemist on 12 Apr 2010 09:12

> Which size are those matrices? Are they densely populated? Do all
> these multiplications involve new factors?

They vary widely, from small (about 10 elements in the matrix) to very
large (about 10^7 elements), and they are densely populated. Since LLA
avoids copying the matrix to foreign memory and back, it seems to be a
good fit.
From: Liam Healy on 12 Apr 2010 11:16

allchemist <hohlovivan(a)gmail.com> writes:

>> Currently, LLA works on SBCL only, where it does _not_ copy arrays,
>> either before or after the function call (except when arrays contain
>> multiple matrices, but if I remember correctly, that happens only for
>> xGELS calls).
>>
>> When I port it to other implementations which don't have pinned
>> arrays, they will be copied.
>
> Then, in SBCL matrix multiplications are rather quick. After some
> tests, LLA does it several times quicker than GSLL (and several times
> slower than a straight foreign funcall of sgemm). But it uses some
> kind of Lisp arrays and provides BLAS and LAPACK, so it is very good!
> I'll use it in my code till I have enough time to try to do the same
> with native arrays with pinning.

GSLL is several times slower than LLA, which is several times slower
than directly calling sgemm? I don't understand why. Here is the macro
expansion of #'matrix-product on SBCL:

(DEFMETHOD MATRIX-PRODUCT
    ((A MATRIX-SINGLE-FLOAT) (B MATRIX-SINGLE-FLOAT)
     &OPTIONAL C (ALPHA 1.0) (BETA 1.0)
     (TRANSA :NOTRANS) (TRANSB :NOTRANS)
     &AUX (CARR (OR C
                    (MAKE-MARRAY 'SINGLE-FLOAT
                                 :DIMENSIONS (MATRIX-PRODUCT-DIMENSIONS A B)
                                 :INITIAL-ELEMENT 0))))
  (DECLARE (TYPE SINGLE-FLOAT ALPHA)
           (TYPE SINGLE-FLOAT BETA))
  (SB-SYS:WITH-PINNED-OBJECTS ((C-ARRAY:ORIGINAL-ARRAY CARR)
                               (C-ARRAY:ORIGINAL-ARRAY A)
                               (C-ARRAY:ORIGINAL-ARRAY B))
    (LET ((#:CRETURN
           (FOREIGN-FUNCALL "gsl_blas_sgemm"
                            CBLAS-TRANSPOSE TRANSA
                            CBLAS-TRANSPOSE TRANSB
                            :FLOAT ALPHA
                            :POINTER (MPOINTER A)
                            :POINTER (MPOINTER B)
                            :FLOAT BETA
                            :POINTER (MPOINTER CARR)
                            :INT)))
      (CHECK-GSL-STATUS #:CRETURN 'MATRIX-PRODUCT)
      (VALUES CARR))))

Why would this cause a several-times-several slowdown over directly
calling sgemm? There is a generic function dispatch of course, and
maybe making a marray (but to be fair, that could be pre-made).

Liam
From: Mario S. Mommer on 12 Apr 2010 11:28
Liam Healy <lnp(a)healy.washington.dc.us> writes:

> GSLL is several times slower than LLA, which is several times slower
> than directly calling sgemm? I don't understand why. Here is the
> macro expansion of #'matrix-product on SBCL:
[...]
> Why would this cause a several-times-several slowdown over directly
> calling sgemm? There is a generic function dispatch of course,
> and maybe making a marray (but to be fair, that could be
> pre-made).

Just guessing here, but if your matrix is small, this overhead will be
significant.