How to achieve ultimate summing speed? [Matlab]

From: Rune Allnor on 5 Mar 2010 14:54

On 5 Mar, 20:13, "James Tursa"
<aclassyguy_with_a_k_not_...(a)hotmail.com> wrote:

> s = mxCalloc(m, sizeof(*s));
> for( i=0; i<m; i++ ) {
> pr[i] = s[i];
> }
> mxFree(s);

Those calloc/free calls are expensive. What about
letting s be a local scalar and moving the result to
pr[i] after treating each row?

I wouldn't be surprised if you end up saving >> 50%
by doing that.

Rune

From: James Tursa on 5 Mar 2010 16:43

Rune Allnor <allnor(a)tele.ntnu.no> wrote in message <44ae8909-03ae-4219-a6b7-14d9329ebabf(a)z35g2000yqd.googlegroups.com>...
> Those calloc/free calls are expensive. What about
> letting s be a local scalar and moving the result to
> pr[i] after treating each row?

I can try that. My thought was to traverse the R array only once, hence the s allocation. If I use only one scalar then I have to traverse the R array m times, which I expect to be slower than allocating s, but I haven't actually tried it yet. I will give it a shot ...

James Tursa
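A minimal sketch of the single-pass layout described above (read R once, keep one counter per row) might look like the portable C below. It is an illustrative reconstruction, not the code from the earlier post, which is only partly quoted in this thread. The name `count_below_cols` is hypothetical, and standard `calloc`/`free` stand in for `mxCalloc`/`mxFree` so the sketch compiles outside MATLAB. R is assumed to be an m-by-n array of doubles in column-major order, which is how MATLAB stores it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: for each row i of the m-by-n column-major array R,
   count the elements less than k and store the count in out[i].
   R is read once, in storage order, column by column.
   Returns 0 on success, -1 if the counter buffer cannot be allocated. */
int count_below_cols(const double *R, size_t m, size_t n, double k, double *out)
{
    size_t i, j;
    /* One zeroed counter per row (stands in for mxCalloc in a MEX file). */
    size_t *s = calloc(m > 0 ? m : 1, sizeof *s);
    if (s == NULL) return -1;

    for (j = 0; j < n; j++) {            /* outer loop over columns */
        const double *col = R + j * m;   /* column j is contiguous in memory */
        for (i = 0; i < m; i++) {        /* inner loop reads sequentially */
            if (col[i] < k) ++s[i];
        }
    }
    for (i = 0; i < m; i++) out[i] = (double) s[i];
    free(s);                             /* stands in for mxFree */
    return 0;
}
```

Reading R in storage order keeps the memory access sequential; the price is the m-element counter buffer. Since `mxCreateDoubleMatrix` initializes its data to zero, a MEX version could also accumulate straight into `pr` and skip the buffer, though whether that is faster would need timing.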

From: James Tursa on 5 Mar 2010 16:56

"James Tursa" <aclassyguy_with_a_k_not_a_c(a)hotmail.com> wrote in message <hmrtu8$go7$1(a)fred.mathworks.com>...
> I can try that. My thought was to traverse the R array only once, hence the s allocation.

Here is the result, about 70% slower than my previous post using an allocated s. This is about what I would have expected given the multiple traverses of R involved. All that redundant memory access just kills the running times.

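#include "mex.h"

/* Row-wise count: pr[i] = number of elements in row i of R that are < k.
   R is column-major, so walking along a row steps m elements at a time. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize i, j, m, n;
    mwSize s;                   /* count for the current row */
    double *pr, *R, *R0;
    double k;

    if( nrhs != 2 ) {
        mexErrMsgTxt("Two inputs required: R and k.");
    }
    if( !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) ) {
        mexErrMsgTxt("R must be a real double matrix.");
    }
    m = mxGetM(prhs[0]);
    n = mxGetN(prhs[0]);
    R0 = mxGetPr(prhs[0]);
    k = mxGetScalar(prhs[1]);
    plhs[0] = mxCreateDoubleMatrix(m, 1, mxREAL);
    pr = mxGetPr(plhs[0]);
    for( i=0; i<m; i++ ) {
        R = R0++;               /* R -> element (i,0); advance R0 to the next row */
        s = 0;
        for( j=0; j<n; j++ ) {
            if( *R < k ) {
                ++s;
            }
            if( j < n-1 ) R += m;   /* next column; guard avoids stepping past the end */
        }
        pr[i] = (double) s;
    }
}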

James Tursa
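MATLAB stores an m-by-n matrix column-major, so the zero-based element (i, j) sits at offset i + j*m. The row-wise loop above therefore reads doubles that are m elements apart, and for large m each read is likely to land on a different cache line, which plausibly accounts for much of the slowdown reported. Below is a minimal sketch of the same row-wise count written with explicit indexing, which also drops the `j < n-1` guard; `colmajor_index` and `count_below_rows` are hypothetical names, and plain C is used so it builds without MATLAB.

```c
#include <stddef.h>

/* Zero-based offset of element (i, j) in an m-row column-major array. */
static size_t colmajor_index(size_t i, size_t j, size_t m)
{
    return i + j * m;
}

/* Hypothetical helper: same result as the MEX loop above, one local
   counter per row. Consecutive reads in the inner loop are m doubles apart. */
void count_below_rows(const double *R, size_t m, size_t n, double k, double *out)
{
    size_t i, j;
    for (i = 0; i < m; i++) {
        size_t s = 0;                                  /* local scalar accumulator */
        for (j = 0; j < n; j++) {
            if (R[colmajor_index(i, j, m)] < k) ++s;   /* stride of m elements */
        }
        out[i] = (double) s;
    }
}
```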

From: Rune Allnor on 5 Mar 2010 17:07

On 5 Mar, 22:56, "James Tursa"
<aclassyguy_with_a_k_not_...(a)hotmail.com> wrote:
> Here is the result, about 70% slower than my previous post using an allocated s. This is about what I would have expected given the multiple traverses of R involved. All that redundant memory access just kills the running times.

You traverse the rows? I didn't see that. I assumed
you traversed the columns. Traversing the columns, one
could use the scalar local variable, and at the same time
there would be no need to traverse the array more than
once.

Rune
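The column-wise alternative works when each count runs down a column. Assuming the data is stored as the transpose of R, i.e. an n-by-m column-major array `Rt` whose column i holds row i of R, each count reads a contiguous block with one local scalar, needs no temporary buffer, and visits the data exactly once. `count_below_transposed` is a hypothetical name; the sketch is plain C.

```c
#include <stddef.h>

/* Hypothetical helper: Rt is n-by-m, column-major, so column i (row i of
   the original R) occupies Rt[i*n] .. Rt[i*n + n - 1].
   One scalar per column, contiguous reads, no temporary buffer. */
void count_below_transposed(const double *Rt, size_t n, size_t m, double k, double *out)
{
    size_t i, j;
    for (i = 0; i < m; i++) {
        const double *col = Rt + i * n;   /* contiguous column */
        size_t s = 0;                     /* local scalar accumulator */
        for (j = 0; j < n; j++) {
            if (col[j] < k) ++s;
        }
        out[i] = (double) s;
    }
}
```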

From: James Tursa on 5 Mar 2010 18:17

Rune Allnor <allnor(a)tele.ntnu.no> wrote in message <8f55e079-2b76-4694-842f-5f8d21da55af(a)q21g2000yqm.googlegroups.com>...
> You traverse the rows? I didn't see that. I assumed
> you traversed the columns. Traversing the columns, one
> could use the scalar local variable, and at the same time
> there would be no need to traverse the array more than
> once.

Yes. That issue was brought up earlier by another poster. If the OP could store his data transposed, then one could do what you suggest.

James Tursa
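Transposing inside the MEX file on every call would not remove the strided pass, because a straightforward out-of-place transpose reads one array sequentially and writes the other with a stride. The hypothetical helper below, in plain C, shows this; it assumes the caller supplies an output buffer of m*n doubles. A transposed layout therefore pays off mainly when the data can be produced in that form in the first place.

```c
#include <stddef.h>

/* Hypothetical helper: out-of-place transpose of the m-by-n column-major
   array R into the n-by-m column-major array Rt. Reads of R are sequential;
   writes to Rt jump n elements apart, so one side is always strided. */
void transpose_colmajor(const double *R, size_t m, size_t n, double *Rt)
{
    size_t i, j;
    for (j = 0; j < n; j++) {
        for (i = 0; i < m; i++) {
            Rt[j + i * n] = R[i + j * m];   /* element (i, j) -> (j, i) */
        }
    }
}
```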
