From: Jason on
I have a mex function that operates on a vector. In matlab i call this function for 3600 columns in a matrix in a for loop. This takes about 40 seconds. In order to test moving the loop intto c I altered the mex function to run 3600 times (i call the mex function with only 1 column in the matrix, so I did not change the memory overhead). However now the code takes about 90 seconds to complete.

At first I thought the problem might be that there is some memory leak in the c code that does not effect the performance when I call it 3600 times from matlab. So I checked the allocation/deallocation code which seems to be fine.

Can anyone suggest why the c loop idea would not work? It seems to me that, ignoring the overhead incurred by passing the entire 3600 column matrix into the mex function, the looping through the entire matrix in c should be much faster than looping in matlab.

Any help is greatly appreciated.
From: dpb on
Jason wrote:
....

> Can anyone suggest why the c loop idea would not work? ...

Think one would have to see the basics of the code and the test case to
have any ideas...my crystal ball is murky, anyway.

--
From: Jason on
dpb <none(a)non.net> wrote in message <i0e9mm$d4m$1(a)news.eternal-september.org>...
> Jason wrote:
> ...
>
> > Can anyone suggest why the c loop idea would not work? ...
>
> Think one would have to see the basics of the code and the test case to
> have any ideas...my crystal ball is murky, anyway.
>
> --

The length and complexity of the code really precludes me from posting anything more useful than what i already wrote. But since your crystal ball doesn't work i guess ill give it a try.

Matlab loop:
[M,N]=size(img);
sig=std(img);
for cols=1:N
[a,b]=mexFunction(double(img(:,cols)),double(sig),a,T,3);
end

MexFunction:
void
mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{

/* VARIABLES DECLARATION */
double sigma2;
double a;
double T;
int *Iext;
int K;
int i=0,j=0;
int M ;
int MinSegLen;
double *Wext;
double *y; //array containing noisy signal to find the coefficients
int *outIext;
double *outWext;

/* Getting Input parameters */
y=mxGetPr(prhs[0]);
sigma2=mxGetScalar(prhs[1]);
a=mxGetScalar(prhs[2]);
T=mxGetScalar(prhs[3]);
MinSegLen=(int)mxGetScalar(prhs[4]);

M=mxGetNumberOfElements(prhs[0]);

K=FUNCTION(y,M,&sigma2,a,T,MinSegLen,&Iext,&Wext);

// Preparing output results
i=0;
plhs[i]= mxCreateNumericMatrix(1,K+2,mxUINT32_CLASS,mxREAL); //I
outIext= (int*) mxGetPr(plhs[i++]);

plhs[i]=mxCreateDoubleMatrix(1,K+1,mxREAL); //w
outWext= mxGetPr(plhs[i++]);

for (i=0;i<K+1;i++){
outIext[i]=Iext[i];
outWext[i]=Wext[i];
}
outIext[K+1]=Iext[K+1];
mxAssert(Iext[K+1]==M,"Problem in the Mexfile");
}


FUNCTION:
int FUNCTION (double *y,
int M,
double *Psigma2,
double a,
double T,
int MinSegLen,
int **pIext,
double **pWext
//int *pK
)
{
int i,K;
int debug=0; //I should convert that into a parameter
double mxaux;
double sigma2;
double ymean;
double tol,maxalpha,b;
double *alpha,*Wext,*tn,*aux;
int *Iext;
int maxit;
int NumEMsteps;

sigma2=*Psigma2;

tn=mxCalloc(M,sizeof(double));
for(i=0;i<M;i++)
tn[i]=y[i];


//Mean removal
ymean=0;
for(i=0;i<M;++i)
ymean+=tn[i];
ymean=ymean/M;
for(i=0;i<M;++i)
tn[i]=tn[i]-ymean;


K=M-1;
Iext=mxCalloc(K+2,sizeof(int)); //
Wext=mxCalloc(K+1,sizeof(double)); //
alpha=mxCalloc(K,sizeof(double));
aux=mxCalloc(K,sizeof(double));

//Initialize
for (i=0;i<K;i++)
alpha[i]=0.0;
for (i=0;i<=K;i++)
Wext[i]=0.0;
for (i=0;i<=K+1;i++)
Iext[i]=i;

//Freeing memory
mxFree(alpha);
mxFree(tn);
mxFree(aux);

Wext=mxRealloc(Wext,(K+1)*sizeof(double));
Wext[0]=ymean;

FUNCTION2(Wext,Iext,&K,sigma2,T,MinSegLen);

Iext=mxRealloc(Iext,(K+2)*sizeof(int));
Wext=mxRealloc(Wext,(K+1)*sizeof(double));

*pIext=Iext;
*pWext=Wext;
*Psigma2=sigma2;

mxFree(Iext);
mxFree(Wext);

return K;
)

Note that there are some things that I left out so it may seem as though some of the variables are not used.
From: Rune Allnor on
On 30 Jun, 04:14, "Jason " <jtoka...(a)optonline.net> wrote:

> Can anyone suggest why the c loop idea would not work? It seems to me that, ignoring the overhead incurred by passing the entire 3600 column matrix into the mex function, the looping through the entire matrix in c should be much faster than looping in matlab.

What is the question? Why do you think the c function
'does not work'? Does it not produce the desired result?
If it does, it works.

As for speed, it does not come for free, even with C or C++.
The programmer needs to know what he is doing - there is
nothing to prevent people from writing inefficient code
even in these languages.

You need to understand the way C(++) works, and the options
available to you, to obtain speed. To be specific, the
dereference call

double array[1000];
double x;
int i;

for (i = 0; i < 1000; i++)
x = array[i];

is a bit more complex than you might think:

- Fetch the value of i
- Fetch the base adress of array
- Add i to the base address
- Fetch the number from the location
- Increment i

In contrast, the alternative formulation

double array[1000];
double x;
double *p;
int i;

p = array;
for (i = 0; i < 1000; ++i)
x = *p++;

In this case, the chain of events is something like
Init:
- Fetch the base address of the array, and store in p

Inside the loop:
- Fetch the number from the location p points to
- Increment p
- Fetch i
- Increment i

When the loop is this tiny, this altered, and shorter,
chain of events might easily mean a factor 30% or more
on run-time. The more work is done inside the loop,
the less the relative saving, but in that case you will
need to work in a similar manner through the operations
that take place inside the loop.

And of course, there is that other detail I changed,
that I will not comment on, that in this case might
mean another 10-20% run-time.

If you want speed from C or C++ code, you will need to
work through your code on this miniscule level, knowing
the options for each operation, testing out most of them
with the help of a decent profiler.

Rune
From: James Tursa on
"Jason " <jtokayer(a)optonline.net> wrote in message <i0edu3$7j8$1(a)fred.mathworks.com>...
>
> The length and complexity of the code really precludes me from posting anything more useful than what i already wrote. But since your crystal ball doesn't work i guess ill give it a try.
>
> Matlab loop:
> [M,N]=size(img);
> sig=std(img);
> for cols=1:N
> [a,b]=mexFunction(double(img(:,cols)),double(sig),a,T,3);
> end

I think you are missing an important point about how to make a mex routine more efficient than MATLAB. As I stated in an earlier post, MATLAB is poor when it comes to array slice manipulation. To work with img(:,cols), for instance, MATLAB has to copy this array slice into a brand new memory location and *then* it can work with it. That is precisely what you can avoid in a mex routine, but you haven't avoided it at all in your example! Instead of passing in img(:,cols), which forces MATLAB to copy the data and pass in the address of this copy, you *should* be passing in img and cols separately and then using mxGetPr or mxGetData inside the mex routine to get the address of the data and then work with the data directly *without* resorting to an extra copy. If you really need to convert from an integer type to double class, do it inside the mex routine, as this will avoid the overhead of
creating the mxArray structure at the MATLAB level. etc. etc. etc.

:
>
> Wext=mxRealloc(Wext,(K+1)*sizeof(double));
> Wext[0]=ymean;
>
> FUNCTION2(Wext,Iext,&K,sigma2,T,MinSegLen);
>
> Iext=mxRealloc(Iext,(K+2)*sizeof(int));
> Wext=mxRealloc(Wext,(K+1)*sizeof(double));

FYI, for later versions of MATLAB the mxRealloc function does not work when going from larger blocks to smaller blocks ... no memory is recovered. If that is what you are doing then you may need to write your own version of memory reallocation.

James Tursa