From: Yves on
Dear all,

I encountered a similar problem where matlab's matrix multiplication function turned out to be inefficient. The problem was the following:

I needed to compute y=A*x where A is a large square matrix with non-zero elements only in the first row and first column. I noticed that coding this by hand was a lot more efficient than using matlab's multiplication function. Here's the code to show you:

------------------------
clear all

N = 10000;
gk = 0.01;

A = zeros(N,N);

x = zeros(N,1);
x(1) = 1;

A(1,2:end) = ones(1,N-1)*gk;
A(2:end,1) = -ones(1,N-1)*gk;

%multiplication method 1
tic
y1=A*x;
toc

%multiplication method 2
y2=zeros(N,1);
tic
y2(1) = A(1,:)*x;
y2(2:end) = A(2:end,1)*x(1);
toc
----------------------

It turns out that on my computer for N=10000 the first method is more than 100 times slower than the second method. As James I was under the impression that matlab was highly optimized when it comes to matrix operations. Does anybody know what's going on here?

Best,
Yves



As James I thought that this function was highly optimized, and I am very surprised



"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hpd015$9j7$1(a)fred.mathworks.com>...
> "Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpbh1e$adv$1(a)fred.mathworks.com>...
> > Hello,
> >
> > I've had a problem that's been bothering me for a couple of weeks now and wanted to see other people's opinions,
> >
> > Apart from cases like this: http://www.mathworks.com/matlabcentral/newsreader/view_thread/255653#664380
> >
> > where the for loop is actually more efficient than all vectorized alternatives,
> > as a rule of thumb I think most people would agree that MATLAB's matrix multiplication is much faster than doing the summation in a for loop.
> >
> > The problem I have is that when the matrices become extremely large, even the sparse representation takes up >100GB of memory, so its multiplication by a vector still ends up being slow.
> >
> > If I were to do the summation in a for loop instead of using matrix-vector multiplication,
> > I could delete elements of the summation as they are added to the total sum ...
> > thereby saving on the storage costs of storing ALL elements of the sum in a big matrix.
> >
> > In other words, the matrix-vector multiplication way stores ALL elements of the sum, even after some of them might not be needed anymore - while doing the summation in a for loop allows one to generate the elements on the fly, and delete them after they've been added to the total sum.
> > -----
> > In matlab, is the matrix-vector multiplication not itself just a for loop that is implemented efficiently since it was compiled to machine code ?
> >
> > If that's true, then if I code a for loop summation and compile it to machine code (using FORTRAN or C) would it be just as fast as matrix-vector multiplication ? or is there something else that makes the matrix-vector multiplication more efficient ?
> >
> > In each iteration of the for loop I might have to be calling an external function, or reading something from a file in order to figure out what is going to be added to the total sum, would this slow down my code considerably ?
> >
> > I'm hoping there's a way to code a for loop that's just as fast as matrix-vector multiplication but doesn't store unnecessary elements
>
> I have re-read your post several times now and am still not sure what your exact question is. Are you asking specifically about the operation A* x where A is a 2D matrix and x is a 1d vector? And are you asking how does MATLAB do this and could you do it more efficiently (or as efficiently) by hand coding your own and compiling it? If so, the answer generally is no. For full matrices, MATLAB calls a 3rd party highly optimized BLAS library for this (and similar) operations and it is unlikely you will be able to meet or beat this library for speed. Presumably the writers of this library have taken into account the cache size etc. of the machine to optimize memory access etc. for the operation. I doubt there is any significant large memory wasting code in the library. My experience in writing my mtimesx submission have shown only some limited cases involving complex variables with
> conjugates and transposes that are not coded optimally in MATLAB and can be beat by hand written code ... particularly hand written dot product code. So I don't know what you mean by this part of your post:
>
> > In other words, the matrix-vector multiplication way stores ALL elements of the sum, even after some of them might not be needed anymore
>
> It seems you are implying that the MATLAB BLAS library does this for the A * x operation and you have some evidence of it. What exact operation in particular do you think stores all elements of a sum in memory before adding them together?
>
> If you want to try it, there are several implementations of the BLAS library routines for full matrices available for free on the web, both Fortran and C source code. You can download them and try them out. e.g., the Fortran ones can be found at www.netlib.org. I seriously doubt that you will meet or beat the MATLAB BLAS library for any of these self-compiled codes, however (excepting my own hand written code in mtimesx for some specialized cases).
>
> James Tursa
>
> (You can store sparse matrices > 100GB in memory on your computer?)
From: Yi Cao on
"Yves " <yves.physicist(a)yahoo.com> wrote in message <hqc8ki$3t5$1(a)fred.mathworks.com>...
> Dear all,
>
> I encountered a similar problem where matlab's matrix multiplication function turned out to be inefficient. The problem was the following:
>
> I needed to compute y=A*x where A is a large square matrix with non-zero elements only in the first row and first column. I noticed that coding this by hand was a lot more efficient than using matlab's multiplication function. Here's the code to show you:
>
> ------------------------
> clear all
>
> N = 10000;
> gk = 0.01;
>
> A = zeros(N,N);
>
> x = zeros(N,1);
> x(1) = 1;
>
> A(1,2:end) = ones(1,N-1)*gk;
> A(2:end,1) = -ones(1,N-1)*gk;
>
> %multiplication method 1
> tic
> y1=A*x;
> toc
>
> %multiplication method 2
> y2=zeros(N,1);
> tic
> y2(1) = A(1,:)*x;
> y2(2:end) = A(2:end,1)*x(1);
> toc
> ----------------------
>
> It turns out that on my computer for N=10000 the first method is more than 100 times slower than the second method. As James I was under the impression that matlab was highly optimized when it comes to matrix operations. Does anybody know what's going on here?
>
> Best,
> Yves
>
>
>
> As James I thought that this function was highly optimized, and I am very surprised
>

Your matrix A is sparse, but you treated it as full matrix when you do y=A*x. In your second method, your calculation used the advantage of sparsity. You can try

B = sparse(A);
tic
y3 = B*x;
toc

to see how efficient on sparse matrix-vector multiplication.

HTH
Yi