From: Andy on
Some food for thought:


a=rand(10000,1);

tic
b=reshape(a,100,100);
toc

clear b;
tic
b=reshape(a,100,[]);
toc

clear b;
tic
b=reshape(a,[],100);
toc

%{
displays:
Elapsed time is 0.000015 seconds.
Elapsed time is 0.000328 seconds.
Elapsed time is 0.000313 seconds.
%}

Implicit arguments to reshape take more than 20 times longer. Also:

a=rand(10000,1);

clear b;
tic
b=reshape(a,1,[]); % this is what you did
toc

tic
b=reshape(a,1,10000); % without implicit arguments
toc

clear b;
tic
b=a.'; % fastest and clearest
toc

%{
displays:
Elapsed time is 0.001034 seconds.
Elapsed time is 0.000019 seconds.
Elapsed time is 0.000012 seconds.
%}

Using the transpose operator is about 86 times faster.
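
(For what it's worth, tic/toc is pretty noisy at these sub-millisecond scales. If you have timeit -- e.g. Steve Eddins' timeit from the File Exchange -- something like this averages over many runs; it's the same comparison, just steadier numbers:)

a=rand(10000,1);
t1 = timeit(@() reshape(a,100,100))   % explicit sizes
t2 = timeit(@() reshape(a,100,[]))    % implicit size
t3 = timeit(@() a.')                  % transpose
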
From: Matt Fig on
Walter Roberson <roberson(a)hushmail.com> wrote in message
> Each intermediate step in a computation creates an (unnamed) variable that is
> used for the computation purposes. If that variable is the result of the last
> step in the computation and it is being assigned to a plain variable, then any
> previous plain variable by that name is discarded and the name is associated
> with the header for that unnamed variable _without copying the data_. If the
> unnamed variable is not the last step in the computation, then at some point
> (probably before the next line) the unnamed variable is released.
>
> Thus, there is virtually no cost to creating a temporary variable of
> intermediate results and clear'ing that variable once it is no longer needed.
> Trivial extra memory is used: all that happens is that a name slot gets
> associated with the data block that is already there anyhow.
>
> The one disadvantage of this approach is that sometimes Matlab has fast
> routines for some more complex operations -- for example A*B+C can be done
> more quickly than T1 = A * B; T2 = T1 + C; clear T1

What about using the same variable name?

E = sum(reshape(A,size(A,1)/N^2,N^2),2);
E = bsxfun(@times,E.',I);
E = E(:);
From: Andy on
"Matt Fig" <spamanon(a)yahoo.com> wrote in message <i3ppmd$art$1(a)fred.mathworks.com>...
> What about using the same variable name?
>
> E = sum(reshape(A,size(A,1)/N^2,N^2),2);
> E = bsxfun(@times,E.',I);
> E = E(:);

Note: it was mentioned that this line is being called many times in a loop. Assuming A is not changing size on each iteration, call size(A,1) once outside the loop to save a little more time.
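
Something like this, just as a sketch (the loop bound and variable names here are placeholders):

rowsA = size(A,1);              % looked up once, before the loop
nCols = N^2;
for k = 1:numIterations         % numIterations is a placeholder
    E = sum(reshape(A, rowsA/nCols, nCols), 2);
    E = bsxfun(@times, E.', I);
    E = E(:);
    % ... use E here ...
end
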
From: Matt Fig on
"Andy " <myfakeemailaddress(a)gmail.com> wrote in message
> Note: it was mentioned that this line is being called many times in a loop. Assuming A is not changing size on each iteration, call size(A,1) once outside the loop to save a little more time.

Ah yes, I meant that as a placeholder only. If A is always the same size, these values should be put in directly instead of either calculating them explicitly as I did, or using [] as the OP did.
From: Juliette Salexa on
Walter Roberson <roberson(a)hushmail.com> wrote in message
> Could you confirm that that size() you gave is N^12 and not N^2 ?

Yes. That's not a typo.

> That you
> are, in other words, expecting that innermost reshape to result in a matrix
> which is N^10 by N^2 and that summing along the second dimension would thus
> result in a column vector of N^10 elements? Which you then flip over to be a
> row vector 1 x N^10 and use bsxfun to produce the product of all pairs of that
> row vector together with the column vector I which is N^12 x 1, thus producing
> a matrix which is N^12 by N^10 ?

I'm VERY sorry, I made a crucial mistake in my description.
size(I) = [N^2, N^10]

so the size of A never changes. The summation makes it a factor of N^2 smaller, then the bsxfun returns it to its original size. This happens repeatedly.
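
(For concreteness, taking N = 2: A has N^12 = 4096 elements; the reshape and sum leave N^10 = 1024 values; multiplying those against I, which is N^2-by-N^10 = 4-by-1024, gives N^12 = 4096 values again.)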

> In your "for" loop, what is varying? I'm wondering about the possibility of
> pre-calculating some of the results. Also, is that exact output order
> important? If not, then it might be possible to eliminate one or two of the
> transposes.

The for loop looks like:

for i = 1:100
    A = reshape(bsxfun(@times,reshape(sum(reshape(A,[],N^2),2),1,[]),I),[],1);
    % B(i) = (a summation of certain elements of A)
end

I'm not exactly sure what can be precalculated.
Perhaps powers of I?
I'm not sure if that helps, since each power of I has to be independently multiplied by the summed values of A anyway.

As for the transposes, Doug Schwarz once told me on this newsgroup:

"Reshape uses negligible time. All it does is change the array header that stores how many rows and columns there are. It doesn't move any data. It is always very fast no
matter how large the argument. Always. (Did I emphasize that enough?)"

so I'm not sure how much time could be saved by eliminating transposes.
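
One thing I could do is time the pieces separately on a smaller case (just a sketch; N = 4 is assumed only so it fits comfortably in memory):

N = 4;
A = rand(N^12, 1);
I = rand(N^2, N^10);

tic; s = sum(reshape(A, [], N^2), 2); tSum     = toc;
tic; r = reshape(s, 1, []);           tReshape = toc;
tic; P = bsxfun(@times, r, I);        tBsxfun  = toc;
[tSum tReshape tBsxfun]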

==================

"Sean " <sean.dewolski(a)nospamplease.umit.maine.edu> wrote in message

> What size does this line result in?
>>reshape(sum(reshape(A,[],N^2),2),1,[])
> I've been trying it and ending up with scalars which removes the need for bsxfun
> entirely. I may be (am probably ;-) ) creating bad data though. Can you produce a
> small amount of sample data?

It should just be the original size of A divided by N^2 (and transposed).

If you try: reshape(sum(reshape(ones(256,1),[],4),2),1,[])
you'll get a 1-by-64 row vector (not a scalar).
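
(Quick way to see the shape:)

size(reshape(sum(reshape(ones(256,1),[],4),2),1,[]))   % returns [1 64]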

==================
Walter Roberson <roberson(a)hushmail.com> wrote in message

> for example A*B+C can be done
> more quickly than T1 = A * B; T2 = T1 + C; clear T1

If I have 32GB of memory, and A, B, and C are 10GB matrices (so 30GB is already in use and there is no room for a fourth 10GB array),

I would hope that doing
A=A*B+C

would not create any new arrays taking up memory. The 10GB worth of data in A should just get modified, twice.

Doing T1 = A * B; T2 = T1 + C; clear T1 would be impossible: even though T1 is meant to be cleared very soon, the computer would run out of memory before it ever got the chance.

Are you saying that at each step of:

A=reshape(bsxfun(@times,reshape(sum(reshape(A,[],N^2),2),1,[]),I),[],1);

a temporary variable is created?

I think you're right.
I'm convinced I remember seeing a blog post by Loren Shure that described how to avoid this, but for some reason I can't find it anymore =(

I'm hoping there's a way to avoid that temporary variable: theoretically it isn't necessary (the data in A could just be modified instead of keeping both A and the result), and given the sizes of A and I, my computer will run out of memory if an unnecessary temporary variable is created.
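
Maybe something block-wise is what I'm after? Just a sketch (the block size is made up, and this leans on the column sums s being only N^10 elements, a factor of N^2 smaller than A): compute s first, then overwrite A a stretch at a time, so the only full-size array that ever exists is A itself.

s = sum(reshape(A, [], N^2), 2);      % N^10-by-1, much smaller than A
blk = 1024;                           % placeholder block size; tune for available memory
n = numel(s);                         % = N^10
for j0 = 1:blk:n
    j1 = min(j0 + blk - 1, n);
    A((j0-1)*N^2+1 : j1*N^2) = reshape(bsxfun(@times, s(j0:j1).', I(:, j0:j1)), [], 1);
end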
=====================
"Andy " <myfakeemailaddress(a)gmail.com> wrote in message

> Using the transpose operator is about 86 times faster.

But the bsxfun and sum take about 5000 seconds in total, so I don't think using [] versus precalculating the sizes will make a difference.

And reshape was recommended to me over transpose because reshape doesn't move data, while I've been told that transpose is a nightmare for large matrices since MATLAB likes to store things column-wise (but I don't know this stuff as well as I wish I did!)
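
(I suppose the matrix case is easy enough to check, as a rough sketch with an assumed size:)

X = rand(5000);                        % ~200 MB of doubles, just for the test
tic; X2 = reshape(X, 2500, []); toc    % only the size header changes
tic; X3 = X.';                  toc    % this actually permutes the data in memory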



====
thanks again to everyone who replied,
I'll keep looking for that blog post which I'm convinced I've seen before.