From: Armin Mueller on
Dear NG,

I'm measuring color intensity with scanned sheets of paper. Matrix size,
memory consumption and computation time are quite considerable.
Everything should work within a GUI, so response time should be well
below one second.

I was trying to speed up computation and save some memory using integer
data types. However, my success is limited. Please try to profile the
attached matlab code. I'm getting for example:

time   calls   mem              line
1.33   1       128m/0b/128m     7   trust_region_dbl =
1.03   1       64.1m/0b/64.1m   12  trust_region_i32 =
0.94   1       16m/0b/16m       17  trust_region_i8 =

Any chance to get this better?

Cheers
Armin
From: Jan Simon on
Dear Armin,

> This is a multi-part message in MIME format.
> --------------080601090808060308050506
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit

What a strange header.

> I was trying to speed up computation and save some memory using integer
> data types. However, my success is limited. Please try to profile the
> attached matlab code. I'm getting for example:
>
> time   calls   mem              line
> 1.33   1       128m/0b/128m     7   trust_region_dbl =
> 1.03   1       64.1m/0b/64.1m   12  trust_region_i32 =
> 0.94   1       16m/0b/16m       17  trust_region_i8 =
>
> Any chance to get this better?

Yes. What is "mem"? The memory used by the output array? But the memory used during the computations should be considered also.

> value = rand(4096, 4096);
> trust_region_dbl = -1*(value<0.1) ...
> -1*(value<0.2) ...
> +1*(value>0.8) ...
> +1*(value>0.9);
>
> trust_region_i32 = int32(-1)*int32(value<0.1) ...
> + int32(-1)*int32(value<0.2) ...
> + int32(+1)*int32(value>0.8) ...
> + int32(+1)*int32(value>0.9);

"(value < 0.1)" returns a LOGICAL array, which uses 1 byte per element. Converting this to an INT32 wastes time and memory.
Calculations with LOGICALs convert them to DOUBLEs automatically, so you can omit the "1 *":
trust_region_dbl = -(value<0.1) -(value<0.2) +(value>0.8) +(value>0.9);
This saves 20% (measured with RAND(1024, 1024) instead, because otherwise my tiny RAM would be exhausted).
But it is much faster to reduce the amount of intermediately used memory:
trust_region = zeros(4096, 4096, 'int8');
trust_region(value < 0.2) = -1;
trust_region(value < 0.1) = -2;
trust_region(value > 0.8) = 1;
trust_region(value < 0.9) = 2;
This is 70% faster than the original DOUBLE method.
Using a C-mex would be even faster, because it could avoid testing all 4 conditions but stop after a match was found.

Good luck, Jan
From: Jan Simon on
Dear Armin,

> value = rand(4096);
> trust_region = zeros(4096, 4096, 'int8');
> trust_region(value < 0.2) = -1;
> trust_region(value < 0.1) = -2;
> trust_region(value > 0.8) = 1;
> trust_region(value < 0.9) = 2;
> This is 70% faster than the original DOUBLE method.
> Using a C-mex would be even faster, because it could avoid testing all 4
> conditions but stop after a match was found.

I've tried it: The C-mex is a further 80% faster than the above method:
---------------- start: TrustReg.c --------------------------------------------
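// Jan Simon, 04-Aug-2010
#include "mex.h"
#include "tmwtypes.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  int8_T *Y;       // signed 8 bit, because the output contains -2 and -1
  double *XP, X;
  mwSize i, n;

  // Expect exactly one real DOUBLE array as input:
  if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0])) {
    mexErrMsgTxt("TrustReg: Input must be a real DOUBLE array.");
  }

  // Output: INT8 array of the same size as the input, filled with zeros:
  plhs[0] = mxCreateNumericArray(
              mxGetNumberOfDimensions(prhs[0]),
              mxGetDimensions(prhs[0]), mxINT8_CLASS, mxREAL);

  XP = mxGetPr(prhs[0]);
  Y  = (int8_T *) mxGetData(plhs[0]);   // mxGetPr is for DOUBLE data only
  n  = mxGetNumberOfElements(prhs[0]);
  for (i = 0; i < n; i++) {
    X = *XP++;
    if (X < 0.1) {            // Stop testing after the first match
      Y[i] = -2;
    } else if (X < 0.2) {
      Y[i] = -1;
    } else if (X > 0.9) {
      Y[i] = 2;
    } else if (X > 0.8) {
      Y[i] = 1;
    }                         // otherwise Y[i] keeps its initial 0
  }
  return;
}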
---------------- end: TrustReg.c ----------------------------------------------

This takes 4% of the time of your "-1*(value<0.1)..." approach (Matlab 2009a, 32bit, WinXP, MSVC, SSE2 flags, 1.5GHz Pentium-M). And no intermediate memory is allocated.
This problem could be parallelized easily for multi-core processors. I guess you could be another 50% faster on a quad-core by starting 4 threads.
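
A minimal sketch of how such a parallel version might look in plain C, assuming a compiler with OpenMP support (for example GCC with -fopenmp). It is an illustration only and has not been timed in this thread: the names classify_parallel and trust_region_of are hypothetical, and the thresholds are simply the ones used in TrustReg.c. Without OpenMP the pragma is ignored and the loop runs serially, so the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one sample to its trust region in [-2, 2].
 * Same thresholds and test order as in TrustReg.c:
 * the chain stops at the first matching condition. */
static int8_t trust_region_of(double x)
{
    if (x < 0.1) return -2;
    if (x < 0.2) return -1;
    if (x > 0.9) return 2;
    if (x > 0.8) return 1;
    return 0;
}

/* Hypothetical helper (not part of the original code):
 * fills y[0..n-1] with the trust region of x[0..n-1]. */
void classify_parallel(const double *x, int8_t *y, size_t n)
{
    ptrdiff_t i;   /* signed loop index, accepted by all OpenMP versions */

    assert(n == 0 || (x != NULL && y != NULL));

    /* Iteration i reads only x[i] and writes only y[i], so the
     * iterations are independent and can be split into equally
     * sized chunks, one per thread. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (ptrdiff_t) n; i++) {
        y[i] = trust_region_of(x[i]);
    }
}
```

Inside a mex function the two pointers would come from mxGetPr(prhs[0]) and mxGetData(plhs[0]). Whether 4 threads really pay off has to be measured, because the loop does very little work per element and may be limited by memory bandwidth.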

Although the C-mex is fancy, the other ideas can be applied in other situations as well:
- No unneeded casting of LOGICALs to DOUBLE/INT32/INT8 etc.
- Pre-allocate the output and process the limits one after the other using logical indexing. This is usually faster than creating a single line statement containing a complicated condition.

Good luck, Jan
From: Jan Simon on
Dear Armin,

> > value = rand(4096);
> > trust_region = zeros(4096, 4096, 'int8');
> > trust_region(value < 0.2) = -1;
> > trust_region(value < 0.1) = -2;
> > trust_region(value > 0.8) = 1;
> > trust_region(value < 0.9) = 2;

*Typo*:
trust_region(value > 0.9) = 2;

And? Does it work? Is 96% speedup with the C-mex enough?

Kind regards, Jan
From: Armin Mueller on
Jan Simon wrote:

> What a strange header.

What is wrong with that?!

> Yes. What is "mem"? The memory used by the output array? But the memory
> used during the computations should be considered also.

This is what you get when you are using "profile -memory on".

> "(value < 0.1)" returns a LOGICAL array, which uses 1 byte per element.
> Converting this to an INT32 wastes time and memory.

Yes, this was a bad example; it shows how *not* to do it.

> Calculations with LOGICALs convert them to DOUBLEs automatically, so you
> can omit the "1 *":

OK. I was hoping that the Matlab JIT compiler would be clever enough to
optimize that. Obviously not...

Cheers,
Armin