From: Juliette Salexa on 4 Apr 2010 22:14

Hello,

I've had a problem that's been bothering me for a couple of weeks now and wanted to see other people's opinions.

Apart from cases like this: http://www.mathworks.com/matlabcentral/newsreader/view_thread/255653#664380 where the for loop is actually more efficient than all vectorized alternatives, as a rule of thumb I think most people would agree that MATLAB's matrix multiplication is much faster than doing the summation in a for loop.

The problem I have is that when the matrices become extremely large, even the sparse representation takes up >100GB of memory, so its multiplication by a vector still ends up being slow.

If I were to do the summation in a for loop instead of using matrix-vector multiplication, I could delete elements of the summation as they are added to the total sum ... thereby saving on the cost of storing ALL elements of the sum in a big matrix.

In other words, the matrix-vector multiplication way stores ALL elements of the sum, even after some of them might not be needed anymore, while doing the summation in a for loop allows one to generate the elements on the fly and delete them after they've been added to the total sum.

-----

In MATLAB, isn't the matrix-vector multiplication itself just a for loop that is implemented efficiently because it was compiled to machine code? If that's true, then if I coded a for loop summation and compiled it to machine code (using Fortran or C), would it be just as fast as matrix-vector multiplication? Or is there something else that makes the matrix-vector multiplication more efficient?

In each iteration of the for loop I might have to call an external function, or read something from a file, in order to figure out what is going to be added to the total sum; would this slow down my code considerably?

I'm hoping there's a way to code a for loop that's just as fast as matrix-vector multiplication but doesn't store unnecessary elements.
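[Editor's note: a minimal sketch of the streaming idea described above, in Python/NumPy for illustration since the idea is language-agnostic. Each row of the matrix is generated on the fly, its contribution to the product is accumulated, and the full matrix is never held in memory. The `make_row` callable is a hypothetical stand-in for however the elements are actually produced.]

```python
import numpy as np

def streaming_matvec(make_row, x, n_rows):
    """Compute y = M @ x without ever materializing M.

    make_row(i) returns row i of M as a 1-D array; each row is
    discarded as soon as its contribution has been accumulated.
    """
    y = np.empty(n_rows)
    for i in range(n_rows):
        row = make_row(i)   # generate the elements on the fly
        y[i] = row @ x      # one dot product; row is then free to be collected
    return y

# Tiny check against the dense product.
rng = np.random.default_rng(0)
M = rng.random((4, 3))
x = rng.random(3)
y = streaming_matvec(lambda i: M[i], x, 4)
assert np.allclose(y, M @ x)
```

Peak memory is one row instead of the whole matrix; the trade-off, as the thread below discusses, is the per-iteration cost of generating each row.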
From: Bruno Luong on 5 Apr 2010 03:43 "Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpbh1e$adv$1(a)fred.mathworks.com>... [full quote of the original post trimmed]
> > I'm hoping there's a way to code a for loop that's just as fast as matrix-vector multiplication but doesn't store unnecessary elements

A for loop by itself is not slow; what is slow is the overhead of the code inside the iteration body. Statements such as calling another function, creating a matrix, creating a scalar, or deleting them all cost time, let alone reading from a file. The for-loop in C or Fortran is fast because it usually manipulates the basic type of the computer: a bare, large chunk of numeric data (a MATLAB array has an overhead before this bare array can be accessed directly by lower-level code).

If you need to read a file to build your matrix elements, then building the elements will surely cost more time than any matrix x vector multiplication, regardless of whether the latter is carried out with or without a for-loop - by hand or using built-in function(s).

In any case, using the PROFILER is a recommended strategy to detect what takes time in your code (note that the profiler only provides an approximation of what's going on, but that is largely enough to start with).

Bruno
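[Editor's note: Bruno's point - that the loop is cheap and the body is what costs - can be demonstrated with a small timing sketch (Python here for illustration; the same effect shows up in MATLAB's profiler). Both loops compute the identical sum; the second adds a function call and a throwaway container on every iteration.]

```python
import timeit

# Plain accumulation: the loop body is a single cheap scalar operation.
def plain_loop(n):
    s = 0.0
    for i in range(n):
        s += i * 0.5
    return s

# Same arithmetic, but with per-iteration overhead: a function call
# and a throwaway list created on every pass.
def heavy_loop(n):
    def term(i):
        return [i * 0.5][0]   # deliberately wasteful
    s = 0.0
    for i in range(n):
        s += term(i)
    return s

n = 100_000
t_plain = timeit.timeit(lambda: plain_loop(n), number=20)
t_heavy = timeit.timeit(lambda: heavy_loop(n), number=20)

# Identical result, very different cost: the gap is pure per-iteration
# overhead, which is exactly what a profiler would point at.
assert plain_loop(n) == heavy_loop(n)
```

On typical hardware the heavy version runs several times slower even though the arithmetic is identical.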
From: Matt J on 5 Apr 2010 06:49 "Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpbh1e$adv$1(a)fred.mathworks.com>... > The problem I have is that when the matrices become extremely large, even the sparse representation takes up >100GB of memory, so its multiplication by a vector still ends up being slow. ======== It might be worth describing the application, so that we can assess if there is an unnecessary inefficiency here. I've never seen a situation requiring anything close to a 100GB sparse matrix.
From: James Tursa on 5 Apr 2010 11:36 "Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpbh1e$adv$1(a)fred.mathworks.com>... [full quote of the original post trimmed]
> > I'm hoping there's a way to code a for loop that's just as fast as matrix-vector multiplication but doesn't store unnecessary elements

I have re-read your post several times now and am still not sure what your exact question is. Are you asking specifically about the operation A*x, where A is a 2-D matrix and x is a 1-D vector? And are you asking how MATLAB does this, and whether you could do it more efficiently (or as efficiently) by hand-coding your own and compiling it?

If so, the answer generally is no. For full matrices, MATLAB calls a 3rd-party, highly optimized BLAS library for this (and similar) operations, and it is unlikely you will be able to meet or beat this library for speed. Presumably the writers of this library have taken into account the cache size etc. of the machine to optimize memory access for the operation. I doubt there is any significantly memory-wasting code in the library. My experience in writing my mtimesx submission has shown only some limited cases, involving complex variables with conjugates and transposes, that are not coded optimally in MATLAB and can be beaten by hand-written code ... particularly hand-written dot product code.

So I don't know what you mean by this part of your post:

> In other words, the matrix-vector multiplication way stores ALL elements of the sum, even after some of them might not be needed anymore

It seems you are implying that the MATLAB BLAS library does this for the A*x operation and that you have some evidence of it. What operation in particular do you think stores all elements of a sum in memory before adding them together?

If you want to try it, there are several implementations of the BLAS library routines for full matrices available for free on the web, both Fortran and C source code. You can download them and try them out. E.g., the Fortran ones can be found at www.netlib.org.
I seriously doubt that you will meet or beat the MATLAB BLAS library for any of these self-compiled codes, however (excepting my own hand written code in mtimesx for some specialized cases). James Tursa (You can store sparse matrices > 100GB in memory on your computer?)
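[Editor's note: the routine James refers to is DGEMV, the Level-2 BLAS matrix-vector product. A reference-style version in Python/NumPy, mirroring the column-oriented loop of the netlib Fortran source, shows what the library computes; note there is no intermediate array of all summands - contributions are accumulated directly into the output. Optimized BLAS implementations compute the same thing but add cache blocking, vectorization, and threading.]

```python
import numpy as np

def dgemv_ref(alpha, A, x, beta, y):
    """Reference-style DGEMV: returns alpha*A@x + beta*y.

    Mirrors the textbook loop from the netlib reference BLAS: the
    matrix is traversed column by column and each column's contribution
    is accumulated into the result, so no array of all partial products
    is ever formed.
    """
    m, n = A.shape
    out = beta * np.asarray(y, dtype=float).copy()
    for j in range(n):                 # column-wise, as in the Fortran source
        out += alpha * x[j] * A[:, j]  # accumulate; nothing else is stored
    return out

# Check against NumPy's own (BLAS-backed) product.
rng = np.random.default_rng(2)
A = rng.random((4, 3))
x = rng.random(3)
y = rng.random(4)
assert np.allclose(dgemv_ref(2.0, A, x, 0.5, y), 2.0 * A @ x + 0.5 * y)
```

This is the sense in which the multiply itself wastes no memory: the >100GB cost in the original question is storing A at all, not anything the BLAS routine allocates.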
From: Juliette Salexa on 6 Apr 2010 16:02
"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message > Are you asking specifically about the operation A* x where A is a 2D matrix and x is a 1d vector? Thank you very much James, Yes, I apologize for not being clear > It seems you are implying that the MATLAB BLAS library does this for the A * x operation and you have some evidence of it. What exact operation in particular do you think stores all elements of a sum in memory before adding them together? I realize that I was very unclear in my initial post (sorry!) In order to multiply a (10^5) x (10^5) matrix by a (10^5 x 1) vector, I need to have that matrix and vector stored in matlab in the first place, which would require more than 1TB of RAM. Even if it wasn't stored in matlab, but just stored in a file on my hard drive, (which would of course slow me down significantly), I wouldn't want to store all of the elements of the matrix-vector multiplication at the same time. Instead, let's say I did the matrix-vector multiplication by my own forloop. Let's say M(i,j) is my matrix and V(k) is my vector I create the elements of M(i,j) on the fly as they are needed, multiply them by the appropriate V(k) values, and then delete them when they are no longer needed. For example, M(1,1)=rand; sum=M(1,1)*V(1); clear(M); M(1,2)=rand; sum=sum + M(1,2)*V(2); clear(M); M(1,3)=rand; sum=sum + M(1,3)*V(3); clear(M); ..... looped .... *Of course M(1,1) could just have been M, in the above example, but I put the indices for illustrative purposes* In this way, I don't need to store *all elements at the same time* , which would take a lot of space, but instead I can just store the ones I need at the moment. ------------------------- Although using matlab's internal matrix-vector multiplication function would require storage of huge arrays, the summation itself would be done extremely fast. 
The disadvantage of my above for loop is that although the memory requirement is negligible, the computation is very slow (because the elements are not generated by RAND but by a more complicated set of instructions).

I was wondering, if I did my above for loop in Fortran instead of MATLAB, whether the speed of my summation could compete with MATLAB's speed for matrix-vector multiplication. That way I could save on memory cost without too much sacrifice in speed. *I hope I'm making sense ... =)*

> And are you asking how does MATLAB do this and could you do it more efficiently (or as efficiently) by hand coding your own and compiling it? If so, the answer generally is no. For full matrices, MATLAB calls a 3rd party highly optimized BLAS library for this (and similar) operations and it is unlikely you will be able to meet or beat this library for speed. Presumably the writers of this library have taken into account the cache size etc. of the machine to optimize memory access etc. for the operation.

Is it possible for the BLAS library to have optimized code for my matrix-vector multiplication, where the elements are created and deleted only when necessary?

> > (You can store sparse matrices > 100GB in memory on your computer?)

Not my own computer, but I submit my jobs to a cluster: 16 CPUs, each with 32GB of RAM!

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message

> It might be worth describing the application, so that we can assess if there is an unnecessary inefficiency here. I've never seen a situation requiring anything close to a 100GB sparse matrix.

Thank you Matt J. It's a Feynman integral: a summation over 49^12 paths, each with a different amplitude, with no symmetry or cancellations. I've thought about it for a while, and people have written papers about this since the early 1990s, and the 100GB matrix is pretty much as small as it gets.

Also, thank you Bruno for your comments; those were helpful too.

Juliette
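[Editor's note: a common middle ground between the two extremes discussed in this thread is to generate the matrix in blocks of rows: each block is small enough to fit in memory, but large enough that its multiply is done by an optimized library call rather than element by element. A sketch in Python/NumPy; `make_block` is a hypothetical stand-in for the "complicated set of instructions" that produces the elements.]

```python
import numpy as np

def blocked_matvec(make_block, x, n_rows, block=1024):
    """Compute y = M @ x, generating M in row blocks discarded after use.

    make_block(start, stop) returns rows start..stop-1 of M as a 2-D
    array. Memory peaks at one block; each block is consumed by a
    single fast library-level matrix-vector product.
    """
    y = np.empty(n_rows)
    for start in range(0, n_rows, block):
        stop = min(start + block, n_rows)
        B = make_block(start, stop)   # build this slice of M on the fly
        y[start:stop] = B @ x         # one BLAS-backed product per block
    return y

# Tiny check against the dense product.
rng = np.random.default_rng(3)
M = rng.random((10, 4))
x = rng.random(4)
y = blocked_matvec(lambda a, b: M[a:b], x, 10, block=3)
assert np.allclose(y, M @ x)
```

The block size is a tuning knob: larger blocks amortize the per-call overhead that Bruno describes, smaller blocks cap the memory footprint, and the same pattern maps directly onto a compiled Fortran or C loop.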