Accelerating the code by running a loop in C/Fortran? [Matlab]

From: mikin Nede on 3 Apr 2010 06:31

Hi guys,

I am doing a large simulation and at each iteration I am (among other things) evaluating a Gaussian-weighted integral over R^s. I use a simple (crude) MC approximation of the integral by drawing 5000 points. The code below takes the bulk of the time (around 14s per iteration). I would be grateful if anyone could tell me whether I can speed it up by running this part in Fortran/C, and whether it is worthwhile doing so. I haven't coded in Fortran/C before and I'd certainly like to learn, but since I'm under a time constraint, the efficiency gain is my biggest concern.

Thanks a lot in advance,
Nik

In the code, say n=300; s=5; S=5000, and beta (of size s-by-S) holds the evaluation points of the integral:

E = zeros(n*(n-1)/2,1);
LW = zeros(n*(n-1)/2,s);
J_temp = zeros(S,1);

% compute prelim
iii=1;
for i=1:n-1
    for ii=i+1:n
        E(iii)=((Y(i)<=qt(i))-qq)*Z(i)*((Y(ii)<=qt(ii))-qq)*Z(ii);
        LW(iii,:)=W(i,:)-W(ii,:);
        iii=iii+1;
    end
end
%%% approx the integral
jj=1;
while jj<=S
    LW_temp = (LW*beta(:,jj));
    L = f(LW_temp);
    V = 4*sum((L.^2).*Z);
    J_temp(jj) = (2*sum(E.*L))/sqrt(V);
    jj=jj+1;
end

From: James Tursa on 3 Apr 2010 16:10

"mikin Nede" <mnedelj(a)gmail.com> wrote in message <hp75dn$c5l$1(a)fred.mathworks.com>...
>
> I am doing a large simulation and at each iteration I am (among other things) evaluating a Gaussian-weighted integral over R^s. I use a simple (crude) MC approximation of the integral by drawing 5000 points. The code below takes the bulk of the time (around 14s per iteration). I would be grateful if anyone could tell me whether I can speed it up by running this part in Fortran/C, and whether it is worthwhile doing so.

Maybe. But it looks like you could do some vectorization and rearranging of your m-code to get some speed improvements without resorting to a mex routine. (Do you even have a Fortran compiler that MATLAB supports?)

> % compute prelim
> iii=1;
> for i=1:n-1
> for ii=i+1:n
> E(iii)=((Y(i)<=qt(i))-qq)*Z(i)*((Y(ii)<=qt(ii))-qq)*Z(ii);

For example, in the line above, the calculation Y(i)<=qt(i) does not depend on the ii loop. In fact, Y and qt don't appear to change anywhere in this double loop, so you repeat the same calculation over and over again with Y(ii)<=qt(ii). So why not vectorize this calculation and move it out of the loops entirely? E.g., compute Yqt = (Y<=qt) outside the loop and then use Yqt inside it. After you clean up things like this and vectorize your calculations, you may find that it runs quite a bit faster and that you don't need to resort to a mex routine.

James Tursa
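
A minimal sketch of the vectorization suggested above, not the poster's actual code. It assumes Y, qt and Z are n-by-1 column vectors, qq is a scalar and W is n-by-s; the thread does not state these shapes. The names a, I and II are hypothetical. find() on the strictly lower-triangular mask lists the pairs in the same order as the double loop.

```matlab
% Hypothetical test data; the shapes are assumptions, not from the thread.
n = 6; s = 3; qq = 0.5;
Y = randn(n,1); qt = randn(n,1); Z = rand(n,1); W = randn(n,s);

% Loop-invariant part, computed once for all observations.
a = ((Y <= qt) - qq) .* Z;

% All pairs (i,ii) with i < ii, in the order the double loop visits them:
% find() walks the mask column by column, so the column index is i.
[II, I] = find(tril(true(n), -1));

E  = a(I) .* a(II);        % n*(n-1)/2-by-1
LW = W(I,:) - W(II,:);     % n*(n-1)/2-by-s
```

The index vectors depend only on n, so they can be built once and reused across simulation iterations.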

From: mikin Nede on 6 Apr 2010 06:57

"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hp87b1$rt9$1(a)fred.mathworks.com>...
> "mikin Nede" <mnedelj(a)gmail.com> wrote in message <hp75dn$c5l$1(a)fred.mathworks.com>...
> >
Hi James,

Thanks a lot for your suggestion. You are right, I can try to speed it up via vectorization. For the first part, simple vectorization and use of the upper triangular part will do the job. However, the majority of the time is spent in the second loop (the first loop took 0.85s; with vectorization it takes 0.04s):
jj=1;
while jj<=S
LW_temp = (LW*beta(:,jj));
L = f(LW_temp);

Here, if I vectorize, I either run out of memory, since I have to deal with a 44850x5000 matrix (or larger), or, if I do it in blocks, I get a similar time, since function f (my own function, a trimmed quadratic) takes time on the large matrix.

Now, if I understood you correctly, there is no guarantee that running the loop and evaluating the function will be faster in mex? An alternative would be to evaluate the integral analytically without relying on MC (and 5000 points), but as far as I know 'int' does not support vectors?

Thanks again
Nik
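
The memory problem described above can be checked with a little arithmetic. This is only a sketch: the 256 MiB budget and the names budgetBytes and B are hypothetical, and a function like f will create several temporaries of the same size, so the real peak is a multiple of the figure below. With n=300 the number of pairs is n*(n-1)/2 = 44850.

```matlab
n = 300; S = 5000;
N = n*(n-1)/2;                     % number of pairs: 44850
bytesFull = N * S * 8;             % a double takes 8 bytes
gibFull = bytesFull / 2^30;        % roughly 1.67 GiB for one N-by-S matrix

budgetBytes = 256 * 2^20;          % hypothetical budget per temporary matrix
B = floor(budgetBytes / (N * 8));  % columns of beta that fit in one block
```

Under these assumptions a block holds 748 columns of beta, so the 5000 draws would take 7 blocks.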

From: Jan Simon on 6 Apr 2010 07:48

Dear Mik!

> I am doing a large simulation and at each iteration I am (among other things) evaluating a Gaussian-weighted integral over R^s. I use a simple (crude) MC approximation of the integral by drawing 5000 points. The code below takes the bulk of the time (around 14s per iteration). I would be grateful if anyone could tell me whether I can speed it up by running this part in Fortran/C, and whether it is worthwhile doing so. I haven't coded in Fortran/C before and I'd certainly like to learn, but since I'm under a time constraint, the efficiency gain is my biggest concern.
>
> In the code, say n=300; s=5; S=5000, and beta (of size s-by-S) holds the evaluation points of the integral:
>
> E = zeros(n*(n-1)/2,1);
> LW = zeros(n*(n-1)/2,s);
> J_temp = zeros(S,1);
>
> % compute prelim
> iii=1;
> for i=1:n-1
> for ii=i+1:n
> E(iii)=((Y(i)<=qt(i))-qq)*Z(i)*((Y(ii)<=qt(ii))-qq)*Z(ii);
> LW(iii,:)=W(i,:)-W(ii,:);
> iii=iii+1;
> end
> end
> %%% approx the integral
> jj=1;
> while jj<=S
> LW_temp = (LW*beta(:,jj));
> L = f(LW_temp);
> V = 4*sum((L.^2).*Z);
> J_temp(jj) = (2*sum(E.*L))/sqrt(V);
> jj=jj+1;
> end

You can simplify the 2nd loop also:
> V = 4 * sum((L.^2).*Z);
> J_temp(jj) = (2*sum(E.*L))/sqrt(V);

This is usually faster when written with dot products:
Zt = transpose(Z); % Before the loop!
Et = transpose(E);
...
J_temp(jj) = (Et * L) / sqrt(Zt * (L.*L)); % "2 / sqrt(4)" vanishes
Or swap the operands of the dot products according to the dimensions of L and Z.

Nevertheless, I assume "L = f(LW_temp)" takes the most time! Please show us its code.

Good luck, Jan
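
A small check of the rewrite above, as a sketch with hypothetical random data. It assumes E, L and Z are column vectors of equal length, so the transposed (row) vector goes on the left of each product.

```matlab
N = 1000;
E = randn(N,1); L = rand(N,1); Z = rand(N,1);   % hypothetical column vectors

% Original form.
V = 4 * sum((L.^2) .* Z);
J_sum = (2 * sum(E .* L)) / sqrt(V);

% Dot-product form; 2/sqrt(4*x) equals 1/sqrt(x), so both constants drop out.
Et = E.'; Zt = Z.';                             % transpose once, before the loop
J_dot = (Et * L) / sqrt(Zt * (L .* L));
```

The two values agree up to floating-point rounding.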

From: mikin Nede on 6 Apr 2010 11:53

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <hpf71k$5i3$1(a)fred.mathworks.com>...
> Dear Mik!

Hello Jan!

Thanks a lot for the tip, it does speed things up by ~1.5s, but as you also said, it is
LW_temp = (LW*beta(:,jj));
L = f(LW_temp);

that takes most of the time. Function f() is simple:

f=@(u)(1-(0.2*(u.^2))).*(u.^2<5);

and it takes time whether I evaluate it within the loop or vectorize and apply it pointwise to, say, a 2000x5000 matrix (anything larger and I run out of memory). Needless to say, I am grateful for any suggestions you might have.

Cheers,
Mik
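
One possible way to combine the suggestions in this thread, offered as a sketch and not timed here. It processes beta in blocks of B columns, squares u only once, and replaces the mask with max(): since 1-0.2*u.^2 is positive exactly when u.^2<5, max(1-0.2*u.^2, 0) gives the same values as f. The sizes, the block width B and the pair-length weight vector Zp are assumptions for illustration; the thread does not say how Z is sized in the second loop.

```matlab
% Hypothetical small problem; Zp is an assumed weight vector with one entry per pair.
N = 500; s = 5; S = 40; B = 16;
LW = randn(N,s); beta = randn(s,S);
E = randn(N,1); Zp = rand(N,1);

J_temp = zeros(S,1);
Et = E.'; Zt = Zp.';                    % transpose once, outside the loop
for j0 = 1:B:S
    idx = j0:min(j0+B-1, S);            % columns of beta in this block
    U2 = LW * beta(:,idx);              % N-by-numel(idx)
    U2 = U2 .* U2;                      % u.^2, computed once
    L  = max(1 - 0.2*U2, 0);            % same values as (1-0.2*u.^2).*(u.^2<5)
    J_temp(idx) = (Et * L) ./ sqrt(Zt * (L .* L));   % one value per column
end
```

Whether this beats the column-by-column loop depends on the block width and available memory, so it is worth profiling with a few values of B before considering a mex routine.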
