From: Aino on
Hello all!

I am again struggling with the same problem I have run into time and time again without any luck solving it: how to speed up my code. Here is a small function that calculates Sample Entropy and its error. It is basically one for-loop inside another (and it is itself called inside several for-loops in the calling program):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [SampEnt,Error]=SampEntropy4(cop,M,r)
% Sample entropy of cop for template length M and tolerance r,
% together with an estimate of its error.

N=length(cop);
CIM=zeros(1,2);                 % CIM(1): m = M, CIM(2): m = M+1

for m=[M,M+1]
    Cim=NaN(1,N-m);

    parfor i=1:N-m
        count=0;
        seg_i=cop(i:i+m-1);     % template starting at i

        for j=1:N-m
            seg_j=cop(j:j+m-1); % template starting at j

            % match if the largest element-wise difference is below r
            if max(abs(seg_i-seg_j))<r
                count=count+1;
            end
        end

        Cim(i)=(count-1)/(N-m-1); % subtract 1 to exclude the self-match
    end
    CIM(m-M+1)=sum(Cim)/(N-m);
end

B=CIM(1)*((N-M-1)*(N-M));
%A=CIM(2)*((N-M-1)*(N-M));
CP=CIM(2)/CIM(1);

SampEnt=-log(CP);

%Error estimation:
KbOneround=2*M-2;       %Overlaps with one template of length m (one i segment)
Kb=(N-M)*KbOneround;    %Overlaps with all matches with templates of length m

KaOneround=2*(M+1)-2;
Ka=(N-M-1)*KaOneround;

sigma=sqrt(CP*(1-CP)/B+1/B^2*(Ka-Kb*CP^2));

Error=max(sigma/CP,sigma/(SampEnt*CP)); %Should be <0.05
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
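For reference, the quantities the function computes can be written out as below, with x = cop and templates starting at i = 1, ..., N-m for each m. This is read directly off the code; it has not been checked against the published SampEn error estimator.

```latex
n_i^{(m)} = \#\Big\{\, j \in \{1,\dots,N-m\} : \max_{0 \le k < m} |x_{i+k} - x_{j+k}| < r \,\Big\}

C^{(m)} = \frac{1}{N-m} \sum_{i=1}^{N-m} \frac{n_i^{(m)} - 1}{N-m-1}, \qquad m \in \{M,\, M+1\}

CP = \frac{C^{(M+1)}}{C^{(M)}}, \qquad \mathrm{SampEnt} = -\ln CP, \qquad B = C^{(M)}\,(N-M-1)(N-M)

K_B = (N-M)(2M-2), \qquad K_A = (N-M-1)\cdot 2M

\sigma = \sqrt{\frac{CP\,(1-CP)}{B} + \frac{K_A - K_B\,CP^2}{B^2}}, \qquad
\mathrm{Error} = \max\!\left(\frac{\sigma}{CP},\ \frac{\sigma}{\mathrm{SampEnt}\cdot CP}\right)
```

The "-1" in the numerator of C^(m) removes the self-match, since j = i always satisfies the tolerance test.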

At the moment this code takes 22 seconds to run on my computer with cop=rand(1,1500), M=2 (M is always small) and r=0.2*std(cop). The Profiler showed that the line taking the most time is "if max(abs(seg_i-seg_j))<r", with 60% of the total, and the second biggest consumer is "seg_j=cop(j:j+m-1);", with 32%. By my calculation it would take 10 days just to get the first set of results with this code. Please help me: the code should take no more than ~5 seconds to run so that I can actually use it!
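Both lines the Profiler flags sit in the inner j-loop, which runs about (N-m)^2, i.e. roughly 2.2 million times for each of the two values of m when N = 1500. A minimal sketch of one way to remove that loop, assuming the same inputs as in the post: build an embedding matrix X whose row i is the template cop(i:i+m-1), then compare template i with all templates in a single bsxfun call. The names X, nT, k and d are hypothetical, introduced only for this sketch. The comparisons are the same as in the original, so the Cim values should match it.

```matlab
% Test input matching the post (hypothetical; any row vector works)
cop = rand(1,1500);
M = 2;
r = 0.2*std(cop);

N = length(cop);
CIM = zeros(1,2);                 % CIM(1) for m = M, CIM(2) for m = M+1
for k = 1:2
    m = M + k - 1;
    nT = N - m;                   % templates i = 1..N-m, as in the original loop
    % Embedding matrix: row i holds the template cop(i:i+m-1)
    X = zeros(nT, m);
    for c = 1:m
        X(:,c) = cop(c:nT+c-1).';
    end
    Cim = zeros(1, nT);
    for i = 1:nT
        % Chebyshev distance from template i to every template in one call
        % (replaces the whole inner j-loop of the original)
        d = max(abs(bsxfun(@minus, X, X(i,:))), [], 2);
        Cim(i) = (sum(d < r) - 1)/(nT - 1);   % minus 1 drops the self-match
    end
    CIM(k) = sum(Cim)/nT;
end
CP = CIM(2)/CIM(1);
SampEnt = -log(CP);
```

Inside the original function this would replace the m-loop; the error-estimation lines can stay as they are, since they only use CIM, N and M. bsxfun also works in releases that predate implicit expansion.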

Here is what I have been thinking:

1) Preallocation: I believe I have preallocated everything I can (basically just Cim; CIM is only used twice). Preallocating seg_i and seg_j made no difference.

2) Vectorization: The line "if max(abs(seg_i-seg_j))<r" is already in vector form, but I don't see how to vectorize the other for-loops. This would probably be the easiest way to speed up the code, but even though I understand the simple vectorization examples, I don't see how to put this code in vector form. Is it possible? Could you point me in the right direction?

3) Parfor-loops: As you can see, there is one parfor in my code. It cut about 5 s off the run time. I hope I have used it correctly. :)

4) Mex-files: The trouble is that it has been years since I last coded anything in C, and I have tried this unsuccessfully about three times in the past. Back then I also had some help; now there is a 7-hour time difference between me and my usual support group.

5) Batch: I am not really familiar with batch. Could it be used to save some time? Does it only work with scripts?

6) More computing power: The last resort. Might be an option if all other options fail.
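On question 2, the outer loop can be vectorized too, at the cost of memory. A sketch under the assumption that an (N-m)-by-(N-m) matrix fits comfortably in memory (for N = 1500 that is about 2.2 million doubles, roughly 18 MB per matrix): build the full matrix of Chebyshev distances between all templates, looping only over the m positions inside a template, then count matches per row. D, x, nT and counts are hypothetical names for this sketch.

```matlab
% Test input matching the post (hypothetical; any row vector works)
cop = rand(1,1500);
M = 2;
r = 0.2*std(cop);

N = length(cop);
CIM = zeros(1,2);                     % CIM(1) for m = M, CIM(2) for m = M+1
for k = 1:2
    m = M + k - 1;
    nT = N - m;                       % templates i = 1..N-m, as in the original
    % D(i,j) = max over template positions of |cop(i+c-1) - cop(j+c-1)|
    D = zeros(nT);
    for c = 1:m                       % only loop left: the m positions in a template
        x = cop(c:nT+c-1).';          % column: c-th element of every template
        D = max(D, abs(bsxfun(@minus, x, x.')));
    end
    counts = sum(D < r, 2);           % matches per template, self-match included
    CIM(k) = sum((counts - 1)/(nT - 1))/nT;
end
CP = CIM(2)/CIM(1);
SampEnt = -log(CP);
```

Because memory grows with N^2, much longer series would need the rows processed in blocks instead. This has not been timed against the loop versions here.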

So, thank you for reading this post. Where would you start? Is it possible to make small changes to the code, or should I just go for Mex-files or more computing power? Is there perhaps some method I haven't thought of?

Thank You,
Aino



From: Aino on
Oh no, now I see there is some "slicing" problem with seg_j. I'm a first-time parfor user. Maybe correcting this would help speed up the code.

-Aino
From: Jan Simon on
Dear Aino!

> if max(abs(seg_i-seg_j))<r

You could try using anyExceed to avoid the explicit creation of the temporary vectors in this line:
http://www.mathworks.com/matlabcentral/fileexchange/27857

E.g.:
mr = -r;
...
if anyExceed(seg_i - seg_j, mr, r)

Good luck, Jan
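The idea behind this suggestion (avoiding the temporaries seg_i, seg_j and seg_i-seg_j) can also be sketched in plain MATLAB, which may be convenient since M is small: compare the template elements one at a time and stop at the first element that is out of tolerance. A pair counts as a match only when every difference lies strictly inside (-r, r), so an "any element exceeds" test has to be negated before counting; how anyExceed treats the limits themselves has not been checked here. This is a separate sketch, not anyExceed's implementation, shown for one hypothetical template i = 7. Whether it beats the vectorized versions depends on the MATLAB release's JIT, so it is worth profiling.

```matlab
% Test input matching the post (hypothetical)
cop = rand(1,1500);
M = 2;
r = 0.2*std(cop);

N = length(cop);
m = M;
nT = N - m;

i = 7;                                % one template, chosen only for illustration
count = 0;
for j = 1:nT
    isMatch = true;
    for c = 0:m-1                     % compare template elements one at a time
        if abs(cop(i+c) - cop(j+c)) >= r
            isMatch = false;          % first out-of-tolerance element: no match,
            break                     % so skip the remaining elements
        end
    end
    count = count + isMatch;          % no seg_i, seg_j or seg_i-seg_j temporaries
end
```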
From: Aino on
Dear Jan,

thank you for your help. I tried the anyExceed code, but unfortunately it only slowed my code down. Perhaps this would be different if M were bigger? I should also add to my previous post that since adding the parfor, the Profiler says the parfor line takes 90% of all time.

I think I'm using parfor wrong; I cannot figure out how to make seg_i a sliced variable. All the sliced variables in the examples I have found are indexed like A(i)=C(...), whereas mine is seg_i=cop(..).

I don't know where to start..

Best Regards,
Aino
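A sketch of a parfor version in which the per-iteration input is a sliced variable, assuming slicing is what the Code Analyzer is pointing at. In parfor, a variable is sliced when one subscript is the loop variable (optionally plus or minus a constant) and the others are colons or constants, as in Xi(i,:) below; a range such as cop(i:i+m-1) does not have that form, so cop is treated as a broadcast variable and sent whole to every worker. With only 1500 elements that transfer is likely cheap, so most of the gain is expected to come from vectorizing the inner comparison rather than from the slicing itself. X and Xi are hypothetical names: Xi is a second copy of X that is only ever indexed as Xi(i,:), so it can be sliced while X stays broadcast.

```matlab
% Test input matching the post (hypothetical)
cop = rand(1,1500);
M = 2;
r = 0.2*std(cop);

N = length(cop);
m = M;
nT = N - m;

% Row i of X is the template cop(i:i+m-1)
X = zeros(nT, m);
for c = 1:m
    X(:,c) = cop(c:nT+c-1).';
end
Xi = X;           % extra copy, only ever indexed as Xi(i,:) so parfor can slice it

Cim = zeros(1, nT);
parfor i = 1:nT
    seg_i = Xi(i,:);                                  % sliced input
    d = max(abs(bsxfun(@minus, X, seg_i)), [], 2);    % X is broadcast (used whole)
    Cim(i) = (sum(d < r) - 1)/(nT - 1);               % sliced output
end
```

Without an open pool, parfor simply runs the iterations serially, so the sketch gives the same Cim either way.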