question regarding matrix multiplication [Matlab]


From: James Tursa on 5 Apr 2010 12:41

"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hpb54p$6vs$1(a)fred.mathworks.com>...
> "Paul" <pauldotjackson(a)jhuapl.edu> wrote in message <hpahke$p31$1(a)fred.mathworks.com>...
> >
> > James,
> >
> > For this simple case, it's straightforward to generate a mex file using the emlmex command, if the dimensions of the inputs C and E are known a priori. Not sure if this is the case for the OP. Supposing it is, I'm curious how your mex code (or whatever you would rewrite to take advantage of the known input dimensions) compares with that generated by emlmex from an efficiency/speed standpoint. In general, do you have an opinion about how well emlmex works for those of us who aren't as proficient at rolling our own mex files?
> >
> > Paul
>
> That is an interesting question in general. I am aware of emlmex and have used it in the past, but the fixed size restriction generally is too limiting for my typical uses so I end up writing my own. I don't know how an emlmex version of the m-code loop will compare to my mex routine but I will make some tests. As to your more specific question, there will be *no* speed improvement in my mex routine if the dimensions are fixed and hard coded. The only time savings would be an extremely minor one in that the loop to calculate the size of each slice (imax) could be eliminated. That time savings is negligible and not even worth considering. Other than multi-threading, I don't know any way the speed could be improved.

I ran some tests using emlmex ... the result was only a 5% speed improvement over the m-code. So at least in this particular case, it is evident that emlmex did not do the array slices efficiently. I don't use emlmex much except for sporadic tests like this so I don't know if there is some particular syntax for m-code that it likes and can optimize better than others. So this may not be a fair test of emlmex.

As for advice about using emlmex for people with little to no C / Fortran coding skills, I guess I would just try it and see for particular functions. It might save you some time and it might not.

Generally, what I would advise is to get *very* familiar with how MATLAB stores and accesses arrays in memory. Since MATLAB stores arrays in column order in memory, once you get used to thinking that way you will start to naturally design your arrays such that your algorithms access them by columns rather than by rows. This in and of itself will greatly improve the efficiency of your m-code, and column-based algorithms lend themselves to vectorizing much more easily than row-based ones. It also becomes easier to visualize where the bottlenecks in the code are. For example, just one look at the loop posted by OP in his original post and I was able to tell that MATLAB would be doing a huge amount of temporary copying and storing of data that could be eliminated in a mex routine. In general, MATLAB does not do array slicing efficiently in hand-coded equations, which is one reason why calling vectorized functions that work internally with array slices (rather than creating temporary copies of them) is often much faster, and one of the main reasons mex routines can beat MATLAB dramatically for these types of cases.
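
To make the column-order point concrete, here is a minimal sketch that sums the same matrix once by columns and once by rows. The matrix size and the variable names (n, A, s_col, s_row, t_col, t_row) are arbitrary choices for illustration, not anything from the OP's code, and the timings will vary by machine and MATLAB version.

```matlab
% Sketch: column-wise vs. row-wise traversal of the same matrix.
% n is an arbitrary illustrative size.
n = 2000;
A = rand(n);

% Column-wise: each A(:,j) is one contiguous block of memory.
tic;
s_col = 0;
for j = 1:n
    s_col = s_col + sum(A(:,j));
end
t_col = toc;

% Row-wise: each A(i,:) gathers elements that sit n doubles apart,
% so MATLAB has to copy strided data into a temporary row vector.
tic;
s_row = 0;
for i = 1:n
    s_row = s_row + sum(A(i,:));
end
t_row = toc;

fprintf('column-wise: %.4f s, row-wise: %.4f s\n', t_col, t_row);
```

Both loops compute the same total (up to floating-point rounding); only the memory access pattern differs. The column-wise loop is typically the faster of the two, but it is worth measuring on your own setup.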

Another piece of advice is to try different functions that can do the same thing for you ... sometimes one approach will work better than others. For example, take a simple dot product on two large complex vectors:

>> a = rand(1000000,1)+rand(1000000,1)*i;
>> b = rand(1000000,1)+rand(1000000,1)*i;
>> tic;dot(a,b);toc
Elapsed time is 0.040879 seconds.
>> tic;a'*b;toc
Elapsed time is 0.016309 seconds.

Would you have thought that the dot function could be that much slower than the matrix product operator? I wouldn't have, but there it is.
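
Single tic/toc readings like the ones above can be noisy. As a sketch, assuming a MATLAB release that ships the timeit function (it did not exist when this thread was written), the same comparison can be repeated with automatic warm-up and averaging. The variable names t_dot, t_mtimes, d1 and d2 are illustrative.

```matlab
% Sketch: compare dot(a,b) with a'*b using timeit instead of tic/toc.
a = rand(1000000,1) + 1i*rand(1000000,1);
b = rand(1000000,1) + 1i*rand(1000000,1);

% Both forms conjugate the first argument, so they compute the same value.
d1 = dot(a,b);
d2 = a'*b;

% timeit calls each function handle several times and returns a typical time.
t_dot    = timeit(@() dot(a,b));
t_mtimes = timeit(@() a'*b);

fprintf('dot(a,b): %.6f s, a''*b: %.6f s\n', t_dot, t_mtimes);
```

The ratio between the two will depend on the MATLAB version and the BLAS library behind it, so the 2010 numbers above should not be expected to reproduce exactly.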

Another example is the cross function. If you are working with single 3-vectors rather than arrays of them, it is 2x faster to write your own code than to call the built-in cross function. If this calculation is part of a large loop, it can make a significant difference in the overall run time.
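
A hand-written cross product for a single pair of 3-vectors might look like the sketch below. This is one straightforward way to write it, not necessarily the exact code used for the 2x figure; the vectors a3 and b3 are arbitrary test values.

```matlab
% Sketch: hand-coded cross product of two 3-element column vectors.
a3 = [1; 2; 3];
b3 = [4; 5; 6];

% Written out component by component, skipping the input checking and
% dimension handling that the general-purpose cross function performs.
c3 = [a3(2)*b3(3) - a3(3)*b3(2);
      a3(3)*b3(1) - a3(1)*b3(3);
      a3(1)*b3(2) - a3(2)*b3(1)];

% Sanity check against the built-in.
assert(norm(c3 - cross(a3, b3)) < 1e-12);
```

Inside a tight loop, the three expressions can be placed inline so that no function call overhead remains at all.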

Things like this can often be found by the profiler and then you know where in your code it will make sense to spend your time rewriting / optimizing.
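
A minimal profiling session could be sketched as follows. The workload here (repeated matrix products on an arbitrary 500-by-500 matrix) is only a stand-in; in practice you would run your own function between profile on and profile off.

```matlab
% Sketch: profile a block of code and inspect where the time goes.
profile on

x = rand(500);          % stand-in workload; replace with your own code
for k = 1:50
    y = x * x';
end

profile off

p = profile('info');    % struct holding the collected statistics
profile viewer          % opens the interactive report
```

The report lists time per function and per line, which is usually enough to decide which calls are worth rewriting.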

James Tursa
