From: allchemist on
> I share your dissatisfaction with arrays represented outside CL,
> that's why LLA objects are just thin wrappers around SIMPLE-ARRAYs
> (matrices are stored in column-major order, though).
Am I right that in LLA you copy Lisp arrays to foreign memory, apply
the foreign funcalls, and copy the result back into a Lisp array? I
hope there is a way to avoid that, such as sbcl's 'vector-sap' with
'sap-alien', but for two dimensions.
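For a matrix one could store the elements in a one-dimensional
specialized array and pin that; something like this sketch (sbcl-specific,
and the actual foreign binding, e.g. to sgemm, is omitted -- only the
pinning pattern is shown):

```lisp
;; SBCL-specific sketch: hand a matrix stored in a 1D specialized
;; array to a foreign routine without copying, by pinning it.
;; FN is any function taking a raw system-area pointer (SAP).
(defun call-with-pinned-matrix (fn matrix)
  ;; MATRIX is e.g. a (simple-array single-float (*)) holding the
  ;; elements in column-major order.
  (sb-sys:with-pinned-objects (matrix)
    (funcall fn (sb-sys:vector-sap matrix))))
```

Inside the WITH-PINNED-OBJECTS extent the GC will not move the array,
so the SAP stays valid for the duration of the foreign call.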
Also, is it possible to convert simple arrays to LLA arrays and vice
versa without creating new objects? The problem is that I have a
number of subroutines
(http://github.com/allchemist/annil/blob/master/matrix.lisp) and some
code using them. I would like to use LLA, but then I would need to
rewrite it all.

> The algorithms you mention above are theoretically interesting, but
> ill-behaved numerically.  AFAIK they are not used very much in
> practice.  Matrix multiplication is an operation that is usually
> constrained by memory bandwidth, hence the optimization that makes
> most sense is to make element access cache-friendly.  Google for
> "cache matrix multiplication".  With known element types and
> simple-arrays, I guess you could write a fairly fast matrix
> multiplication routine in CL.
I have tried this kind of optimization (per-block multiplication),
but with practically no effect. Does sbcl really control the
processor cache?
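For reference, the blocked version I tried looks roughly like this (a
sketch, not my exact code; it assumes square matrices stored row-major
in 1D specialized arrays, and the block size is a guess to be tuned):

```lisp
;; Cache-blocked multiplication C += A * B for n x n matrices stored
;; as 1D (simple-array single-float (*)) in row-major order.  The
;; i/k/j loop order keeps the innermost accesses to B and C sequential.
(defun blocked-gemm (a b c n &optional (block-size 64))
  (declare (type (simple-array single-float (*)) a b c)
           (type fixnum n block-size)
           (optimize (speed 3)))
  (loop for ii of-type fixnum from 0 below n by block-size do
    (loop for kk of-type fixnum from 0 below n by block-size do
      (loop for jj of-type fixnum from 0 below n by block-size do
        ;; Multiply one block of A by one block of B into a block of C.
        (loop for i of-type fixnum from ii below (min n (+ ii block-size)) do
          (loop for k of-type fixnum from kk below (min n (+ kk block-size)) do
            (let ((aik (aref a (+ (* i n) k))))
              (loop for j of-type fixnum
                    from jj below (min n (+ jj block-size)) do
                (incf (aref c (+ (* i n) j))
                      (* aik (aref b (+ (* k n) j)))))))))))
  c)
```

Blocking only pays off once the matrices no longer fit in cache, which
may be why I saw no effect on smaller sizes.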

From: Tamas K Papp on
On Sat, 10 Apr 2010 07:13:54 -0700, allchemist wrote:

>> I share your dissatisfaction with arrays represented outside CL, that's
>> why LLA objects are just thin wrappers around SIMPLE-ARRAYs (matrices
>> are stored in column-major order, though).
> Am I right that in LLA you copy Lisp arrays to foreign memory, apply the
> foreign funcalls, and copy the result back into a Lisp array? I hope
> there is a way to avoid that, such as sbcl's 'vector-sap' with
> 'sap-alien', but for two dimensions.

Currently, LLA works on SBCL only, where it does _not_ copy arrays,
either before or after the function call (except when arrays contain
multiple matrices, but if I remember correctly, that happens only for
xGELS calls).

When I port it to other implementations that don't have pinned arrays,
the arrays will be copied.

> Also, is it possible to convert simple arrays to LLA-arrays and vice
> versa without creating new objects? The problem is that I have a number

Yes. You can use the accessor ELEMENTS to get back your array. It
will be a one-dimensional array even for a matrix; 2D indexing is done
by LLA when needed. You can use MAKE-NV* and MAKE-MATRIX* to create
LLA objects from Lisp arrays, but you are responsible for making them
of the appropriate element type (see NV-ELEMENT-TYPE).
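For example, the round trip would look something like this (a sketch
only: the lambda list of MAKE-MATRIX* and the element-type designator
below are guesses from the names, not the real signatures -- check the
source):

```lisp
;; Hypothetical round trip: wrap an existing 1D element vector in an
;; LLA matrix, then get the very same vector back via ELEMENTS.
;; The MAKE-MATRIX* argument order here is a guess, not the real API.
(let* ((elements (make-array 6 :element-type 'double-float
                               :initial-element 0d0))
       (matrix (make-matrix* 2 3 :double elements)))
  ;; If no copy is made, ELEMENTS returns the identical array:
  (eq (elements matrix) elements))
```

The point is just that the wrapper holds your array by reference, so
code that already works on 1D specialized arrays keeps working on it.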

But be advised that even though LLA is pretty stable now, I still keep
experimenting with it. And nothing but SBCL will be supported for a
while, until I have some time to write the other wrappers.

>> The algorithms you mention above are theoretically interesting, but
>> ill-behaved numerically.  AFAIK they are not used very much in
>> practice.  Matrix multiplication is an operation that is usually
>> constrained by memory bandwidth, hence the optimization that makes most
>> sense is to make element access cache-friendly.  Google for "cache
>> matrix multiplication".  With known element types and simple-arrays, I
>> guess you could write a fairly fast matrix multiplication routine in
>> CL.
> I have tried this kind of optimization (per-block multiplication), but
> with practically no effect. Does sbcl really control the processor cache?

Not that I know of.

Anyhow, I am surprised that you are so concerned about speed; I
generally choose not to worry about these things. For example, SBCL
is nice because it has pinned arrays, but copying an array is a very
cheap operation anyway. For me, the linear algebra costs (SVD, LU,
etc.) pretty much dominate everything else, so I don't worry too much
about stuff like this. Matrix multiplication is cheap, no matter how
you do it.

Tamas
From: allchemist on
> Currently, LLA works on SBCL only, where it does _not_ copy arrays,
> either before or after the function call (except when arrays contain
> multiple matrices, but if I remember correctly, that happens only for
> xGELS calls).
>
> When I port it to other implementation which don't have pinned arrays,
> they will be copied.
Then matrix multiplication in sbcl is rather quick. After some tests,
LLA does it several times faster than GSLL (and several times slower
than a straight foreign funcall of sgemm). But it uses some kind of
Lisp arrays and provides BLAS and LAPACK, so it is very good! I'll
use it in my code until I have enough time to try doing the same with
native arrays and pinning.

>
> Anyhow, I am surprised that you are so concerned about speed, I
> generally choose not to worry about these things.  For example, SBCL
> is nice because it has pinned arrays, but copying an array is a very
> cheap operation anyway.  For me, the linear algebra costs (SVD, LU,
> etc) pretty much dominate everything else, so I don't worry too much
> about stuff like this.  Matrix multiplication is cheap, no matter how
> you do it.
>
It is O(n^3), which is not very cheap, since LU(P) costs the same
O(n^3) and SVD costs O(mn^2). Speed is very important for me, cause
it limitates up to 9/10 of all time on some tasks.

BTW, thank you for your answers. You helped me very much.
From: Nicolas Neuss on
allchemist <hohlovivan(a)gmail.com> writes:

> It is O(n^3), which is not very cheap, since LU(P) costs the same
> O(n^3) and SVD costs O(mn^2). Speed is very important for me, cause it
> limitates up to 9/10 of all time on some tasks.

What are those?

Nicolas
From: Rupert Swarbrick on
allchemist <hohlovivan(a)gmail.com> writes:
> Speed is very important for me, cause it
> limitates up to 9/10 of all time
> on some tasks.

Wow. I presume you mean that "speed (or lack thereof) is responsible for
90% of the time taken on some tasks". I take it that you place the
magical (sleep 1) statements liberally in your code? And what determines
how much time the other tasks take?

Rupert