From: James Tursa on
"ErasmusMC " <z.rijnen(a)erasmusmc.nl> wrote in message <hp4o8f$9on$1(a)fred.mathworks.com>...
> Hi,
> I would like to multiply a 1 dimensional vector "C" of certain length,say 12 with a 5 dimensional array "E" without using for-loops. The last dimension of this 5 dimensional array has the same length as vector C.
>
> So I want to do this:
>
> for i=1:12
> A(:,:,:,:,i)=C(i).*E(:,:,:,:,i)
> end

FYI, depending on how large E is and how critical speed is to you, this can be done more efficiently in a mex routine. Basically, every time you form E(:,:,:,:,i) MATLAB has to copy the entire contents of this slice to a new temporary variable which then gets multiplied by C(i), creating another large temporary variable with new memory allocation requirements, which then gets copied into A and then the temporary variable is cleared. This happens for each of your 12 iterations. In a mex routine all of the temporary allocation and copying can be avoided. e.g., I coded up a simple C mex routine to do the above calculation for a size (30,30,30,30,12) complex array and the routine ran 3x as fast as the above equivalent loop in MATLAB. This was a stressing 150MB array test and may not be applicable to your needs, however. But if you are interested I can post the code.

There are a lot of calculations like this that can be done more efficiently in a mex routine. That being said, I would usually opt for the MATLAB m-code over C mex code for readability and maintainability reasons unless there are critical memory and/or speed concerns you need to address.

James Tursa
From: ErasmusMC on
Hi James, i'm more than interested in your mex routine, since i'm also using these large sets of data. So if you can post it,than please do.

Thanks in advance,
ErasmusMC


"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hp5do4$fbc$1(a)fred.mathworks.com>...
> "ErasmusMC " <z.rijnen(a)erasmusmc.nl> wrote in message <hp4o8f$9on$1(a)fred.mathworks.com>...
> > Hi,
> > I would like to multiply a 1 dimensional vector "C" of certain length,say 12 with a 5 dimensional array "E" without using for-loops. The last dimension of this 5 dimensional array has the same length as vector C.
> >
> > So I want to do this:
> >
> > for i=1:12
> > A(:,:,:,:,i)=C(i).*E(:,:,:,:,i)
> > end
>
> FYI, depending on how large E is and how critical speed is to you, this can be done more efficiently in a mex routine. Basically, every time you form E(:,:,:,:,i) MATLAB has to copy the entire contents of this slice to a new temporary variable which then gets multiplied by C(i), creating another large temporary variable with new memory allocation requirements, which then gets copied into A and then the temporary variable is cleared. This happens for each of your 12 iterations. In a mex routine all of the temporary allocation and copying can be avoided. e.g., I coded up a simple C mex routine to do the above calculation for a size (30,30,30,30,12) complex array and the routine ran 3x as fast as the above equivalent loop in MATLAB. This was a stressing 150MB array test and may not be applicable to your needs, however. But if you are interested I can post the code.
>
> There are a lot of calculations like this that can be done more efficiently in a mex routine. That being said, I would usually opt for the MATLAB m-code over C mex code for readability and maintainability reasons unless there are critical memory and/or speed concerns you need to address.
>
> James Tursa
From: James Tursa on
"ErasmusMC " <z.rijnen(a)erasmusmc.nl> wrote in message <hp72qp$8us$1(a)fred.mathworks.com>...
> Hi James, i'm more than interested in your mex routine, since i'm also using these large sets of data. So if you can post it,than please do.
>
> Thanks in advance,
> ErasmusMC

Here it is. I cleaned it up a bit from the quick & dirty version I wrote earlier. It will handle either real or complex double arguments in any order, one an nD array and the other a vector with size matching the last dimension of the nD array. Spot checks on large arrays show about a 3x speed improvement over the MATLAB loop. In addition to this standalone C mex function, I have decided to incorporate this capability into my mtimesx submission on the FEX, which has more general multi-dimensional matrix multiply capabilities but did not have this one in particular. The new version of mtimesx will probably be uploaded to the FEX sometime next week. Enjoy.

James Tursa

--------------------------------------------------------------------------------------------

// Function C = ndscalarmultiply(s,A) or
// C = ndscalarmultiply(A,s)
// s = vector of scalars (double real or complex)
// A = nD array (double real or complex)
// The last dimension of A must match the size of s
//
// Multiplies each nD slice of A (everything except the last dimension)
// by the corresponding scalar in s, putting the result in C.
//
// Programmer: James Tursa
// Date: April 3, 2010

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize ndim;
register mwSize i, k, imax, kmax;
mwSize *dims;
register double sr, si;
register double *Apr, *Api, *Cpr, *Cpi, *spr, *spi;
mxArray *C, *A, *s;

// Check for errors

if( nrhs != 2 ) {
mexErrMsgTxt("Need exactly two inputs.");
}
if( nlhs > 1 ) {
mexErrMsgTxt("Too many outputs.");
}
if( mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ) {
mexErrMsgTxt("Sparse matrices not supported.");
}
if( !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ) {
mexErrMsgTxt("Inputs must be double.");
}
if( mxGetM(prhs[0]) == 1 || mxGetN(prhs[0]) == 1 ) {
s = prhs[0];
A = prhs[1];
} else if( mxGetM(prhs[1]) == 1 || mxGetN(prhs[1]) == 1 ) {
s = prhs[1];
A = prhs[0];
} else {
mexErrMsgTxt("One of the inputs must be a vector.");
}

// Get pointers to the dimensions and data areas

ndim = mxGetNumberOfDimensions(A);
dims = mxGetDimensions(A);
Apr = mxGetPr(A);
Api = mxGetPi(A);
spr = mxGetPr(s);
spi = mxGetPi(s);

// Calculate size of each nD slice to be multiplied by a scalar

kmax = dims[ndim-1];
imax = 1;
for( k=0; k<ndim-1; k++ ) {
imax *= dims[k];
}

// Check that vector is the correct size

if( mxGetNumberOfElements(s) != kmax ) {
mexErrMsgTxt("Vector size does not match last dimension of nD array.");
}

// Create the output array and get pointers to the data areas

if( Api != NULL || spi != NULL ) {
C = plhs[0] = mxCreateNumericArray(ndim, dims, mxDOUBLE_CLASS, mxCOMPLEX);
} else {
C = plhs[0] = mxCreateNumericArray(ndim, dims, mxDOUBLE_CLASS, mxREAL);
}
Cpr = mxGetPr(C);
Cpi = mxGetPi(C);

// Do the (scalar) * (slice) multiplies

if( Api != NULL ) {
if( spi != NULL ) { // (complex scalar) * (complex slices)
for( k=0; k<kmax; k++ ) {
sr = *spr++;
si = *spi++;
for( i=0; i<imax; i++ ) {
*Cpr++ = sr * (*Apr ) - si * (*Api );
*Cpi++ = sr * (*Api++) + si * (*Apr++);
}
}
} else { // (real scalar) * (complex slices)
for( k=0; k<kmax; k++ ) {
sr = *spr++;
for( i=0; i<imax; i++ ) {
*Cpr++ = sr * (*Apr++);
*Cpi++ = sr * (*Api++);
}
}
}
} else {
if( spi != NULL ) { // (complex scalar) * (real slices)
for( k=0; k<kmax; k++ ) {
sr = *spr++;
si = *spi++;
for( i=0; i<imax; i++ ) {
*Cpr++ = sr * (*Apr);
*Cpi++ = si * (*Apr++);
}
}
} else { // (real scalar) * (real slices)
for( k=0; k<kmax; k++ ) {
sr = *spr++;
for( i=0; i<imax; i++ ) {
*Cpr++ = sr * (*Apr++);
}
}
}
}

}
From: Paul on
"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hp844a$gt9$1(a)fred.mathworks.com>...
> "ErasmusMC " <z.rijnen(a)erasmusmc.nl> wrote in message <hp72qp$8us$1(a)fred.mathworks.com>...
> > Hi James, i'm more than interested in your mex routine, since i'm also using these large sets of data. So if you can post it,than please do.
> >
> > Thanks in advance,
> > ErasmusMC
>
> Here it is. I cleaned it up a bit from the quick & dirty version I wrote earlier. It will handle either real or complex double arguments in any order, one an nD array and the other a vector with size matching the last dimension of the nD array. Spot checks on large arrays show about a 3x speed improvement over the MATLAB loop. In addition to this standalone C mex function, I have decided to incorporate this capability into my mtimesx submission on the FEX, which has more general multi-dimensional matrix multiply capabilities but did not have this one in particular. The new version of mtimesx will probably be uploaded to the FEX sometime next week. Enjoy.
>
> James Tursa
>

James,

For this simple case, it's straightforward to generate a mex file using the emlmex command, if the dimensions of the inputs C and E are known a priori. Not sure if this is the case for the OP. Supposing it is, I'm curious how your mex code (or whatever you would rewrite to take advantage of the known input dimensions) compares with that generated by emlmex from an efficiency/speed standpoint. In general, do you have an opinion about how well emlmex works for those of us who aren't as proficient at rolling our own mex files?

Paul
From: James Tursa on
"Paul" <pauldotjackson(a)jhuapl.edu> wrote in message <hpahke$p31$1(a)fred.mathworks.com>...
>
> James,
>
> For this simple case, it's straightforward to generate a mex file using the emlmex command, if the dimensions of the inputs C and E are known a priori. Not sure if this is the case for the OP. Supposing it is, I'm curious how your mex code (or whatever you would rewrite to take advantage of the known input dimensions) compares with that generated by emlmex from an efficiency/speed standpoint. In general, do you have an opinion about how well emlmex works for those of us who aren't as proficient at rolling our own mex files?
>
> Paul

That is an interesting question in general. I am aware of emlmex and have used it in the past, but the fixed size restriction generally is too limiting for my typical uses so I end up writing my own. I don't know how an emlmex version of the m-code loop will compare to my mex routine but I will make some tests. As to your more specific question, there will be *no* speed improvement in my mex routine if the dimensions are fixed and hard coded. The only time savings would be an extremely minor one in that the loop to calculate the size of each slice (imax) could be eliminated. That time savings is negligible and not even worth considering. Other than multi-threading, I don't know any way the speed could be improved.

James Tursa