Getting indexes of rows of matrix with more than n repetitions [Matlab]


From: Jan Simon on 6 Jan 2010 14:50

Dear Matt!

> > [foo,foo,foo,goo,goo]=f();
> I know you use this approach, as I do. I was asking why other folks prefer to create a dummy variable in the workspace.

The dummy is created with this also, but immediately overwritten!

If I have a gigantic vector, which nearly fills my memory, and I want to sort it with:
[dummy, index] = sort(Array);
or
[index, index] = sort(Array);
in both cases the sorted array is created (any different opinions?)!
In the case of cell strings, the sorted array contains only shared data copies, fortunately. But if I never need the sorted array, it would be nice to have a SORTIND, which returns just the index vector. Is anybody willing to publish a MEX wrapper for a quicksort???

That [index] and [index] have the same address is not really surprising. It is the same variable. You do not need a FORMAT DEBUG for that.

I failed: Matlab 6.5 does not have a sort.c, only sortcellchar.c. This also exists in Matlab 7.8, but is not documented. Luckily this solves my question for cell strings.
The MEX function sortrowsc.c is not an alternative, because it is 6 times slower than SORT if applied to a single column (why?!).

If I show "[index, index] = sort(Array)" to a Matlab beginner, I have to explain that I assume the output arguments of a function are assigned from left to right.
If I show "[dummy, index] = sort(Array)", I do not have to explain anything.
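
The two spellings can be compared directly. This is only a small sketch: the names index1, index2 and index3 are illustrative, the repeated-output form relies on the left-to-right assignment assumed above, and the tilde placeholder is only available from R2009b on.

```matlab
% Sketch: three ways to obtain only the sorting index.
Array = [30 10 20];

[dummy, index1]  = sort(Array);   % explicit dummy variable stays in the workspace
[index2, index2] = sort(Array);   % assumes outputs are assigned left to right
[~, index3]      = sort(Array);   % R2009b and later: output is discarded

% All three variables should hold the same index vector
disp(isequal(index1, index2, index3));
```

Only the first form leaves an extra variable behind that has to be cleared afterwards.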

Kind regards, Jan

BTW: Has the OP tested my attempt to solve the problem faster than UNIQUE/HISTC/ISMEMBC?

From: us on 6 Jan 2010 16:01

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hi2kg4$b73$1(a)fred.mathworks.com>...
> "Matt Fig" <spamanon(a)yahoo.com> wrote in message <hi2im0$fbd$1(a)fred.mathworks.com>...
> > "us " <us(a)neurol.unizh.ch> wrote in message
> > > this has been shown many a times in CSSM...
> > > - i'll always use
> > > [foo,foo,foo,goo,goo]=f();
> > > approach
> >
> >
> > I know you use this approach, as I do. I was asking why other folks prefer to create a dummy variable in the workspace.
>
> Matt and us, I for one prefer the "dummy" approach (followed by a "clear" statement). It's just more readable to me (and I force myself to use different names for different variables). Any eloquent argument to convince me otherwise?
>
> Bruno

bruno - NO: here we go...
- i know: EVAL(!)... and nested TRY/CATCH...
- but YOU will easily understand...
- wintel sys ic2/2*2.6ghz/2gb/winxp.sp3.32/r2009b...

% create FOO.M
function foo(varargin)
% 1) subroutine GOO returns 2 vars in diff memory locations
% 2) fill FOO WS with 2 vars
% 3) if memory overflows...
% 4) clear all WS vars
% 5) fill FOO WS with 1 var at same memory address

    nt=10^8; % <- #var to create...
    if ~nargin
        n=1000000; %#ok (r2009b)
    else
        n=varargin{1}; %#ok (r2009b)
    end
% fill FOO WS with individual var A_xxx/B_xxx
    try
        for i=1:nt
            com=sprintf('[a_%d,b_%d]=goo(n);',i,i);
            eval(com);
        end
    catch %#ok
        w=whos('b*'); % <- only count B_xxx
        disp(sprintf('ERROR at index %5d %5d',i,numel(w)));
        clear a* b*; % <- CLEAR VARS
% fill FOO WS with individual var B_xxx
        try
            for i=1:nt
                com=sprintf('[b_%d,b_%d]=goo(n);',i,i);
                eval(com);
            end
        catch %#ok
            w=whos('b*'); % <- only count B_xxx
            disp(sprintf('ERROR at index %5d %5d',i,numel(w)));
        end
    end
end
function [a,b]=goo(varargin) %#ok
    a=zeros(1,varargin{1},'double');
    b=a;
    b(1)=b(1)+1; % <- allocate new memory...
end

% at the command prompt
clear all; % <- !!!!!
foo(100000)
%{
ERROR at index 214 213
ERROR at index 427 426
%}
foo(1000000)
%{
ERROR at index 17 16
ERROR at index 33 32
%}

hence, [DUMMY,VAR] is taxing the WS, whilst [VAR,VAR] is NOT (as much)...
just a thought...
urs

From: Jan Simon on 6 Jan 2010 16:07

Dear Bruno!

> Jan, how much do you estimate an inplace sorting (e.g., on large double array) would save time?

It is clear that the creation of the sort index is the demanding part of SORT, while copying the input in sorted order is secondary, usually. Unfortunately one of the 2 DIMM ports of my computer is damaged and I have to live with 512MB RAM. But saving temporarily used memory is always useful, even on a 16 GB machine. Nevertheless, timing already matters for an 8MB array:

For the estimation of the speed gain:
x = rand(1e6, 1);
tic; y = sort(x); toc        % ==> 0.26 sec
tic; [y, s] = sort(x); toc   % ==> 0.40 sec
tic; y = x(s); toc           % ==> 0.14 sec

So I assume I could save about 35% of the computing time (0.14 of 0.40 sec) if the sorted copy were not created.
I assume that SORT sorts the values in place and creates the sorting index simultaneously if 2 outputs are used.
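
The estimate is simply the ratio of the copy time to the two-output time. A minimal sketch for repeating the measurement on another machine; the names tBoth and tCopy are illustrative, and a single tic/toc run is noisy, so the printed share will vary.

```matlab
% Sketch: share of a two-output SORT that the sorted copy would account for.
x = rand(1e6, 1);

tic; [y, s] = sort(x); tBoth = toc;   % sorted values and sorting index
tic; y2 = x(s);        tCopy = toc;   % rebuild the sorted copy from the index

fprintf('two outputs: %.2f sec, copy: %.2f sec, share: %.0f%%\n', ...
        tBoth, tCopy, 100 * tCopy / tBoth);
```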
Sorting a UINT16 array is much faster:
x = uint16(x * 32000);
tic; y = sort(x); toc        % ==> 0.16 sec
tic; [y, s] = sort(x); toc   % ==> 0.31 sec
Creating the sorting index needs an additional 0.15 sec, as in the DOUBLE case above.

If the returned index vector could be a UINT32 array, this would be even nicer:
ints = uint32(s);
tic; y = x(ints); toc        % ==> 0.09 sec (instead of 0.14 sec)

> I'm quite happy with the performance of matlab SORT, except I wish to have a sorting routine where the comparison operator can be customized - but I guess such feature is very inefficient in Matlab due to overhead.

SORT is really fine, you are right! But this is not a reason to avoid improving it.
If you created a MEX which returns the sorting index (perhaps in a selectable type), which uses the standard comparison or optionally a user-defined operator, and which can compete in speed with single-output SORT -- *I* would download it! Promised.
I assume calling a user-defined operator through mexCallMATLAB would be a great brake.
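
To make the wished-for interface concrete, here is a rough sketch in plain Matlab, not a MEX. The name sortind_sketch and the lessThan argument are hypothetical. The custom-operator branch is a simple insertion sort with O(n^2) comparisons, so it shows the calling convention only and cannot compete with SORT in speed.

```matlab
function idx = sortind_sketch(x, lessThan)
% SORTIND_SKETCH  Return only the sorting index of vector x (hypothetical helper).
%   idx = sortind_sketch(x)            uses the built-in SORT.
%   idx = sortind_sketch(x, lessThan)  uses a user-defined comparison, where
%   lessThan(a, b) returns true if a must come before b.
x = x(:);
if nargin < 2
    [unused, idx] = sort(x); %#ok<ASGLU> the sorted copy is still created here
    return
end
n   = numel(x);
idx = (1:n).';
% Stable insertion sort that moves indices only, never the data
for k = 2:n
    cur = idx(k);
    j   = k - 1;
    while j >= 1 && lessThan(x(cur), x(idx(j)))
        idx(j + 1) = idx(j);   % shift larger entries one slot to the right
        j = j - 1;
    end
    idx(j + 1) = cur;
end
end
```

For example, sortind_sketch([3 1 2], @(a, b) a > b) gives the index for descending order.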

Kind regards, Jan

From: us on 6 Jan 2010 16:18

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <hi2phr$5nj$1(a)fred.mathworks.com>...
> Dear Matt!
>
> > > [foo,foo,foo,goo,goo]=f();
> > I know you use this approach, as I do. I was asking why other folks prefer to create a dummy variable in the workspace.

> If I have a gigantic vector, which nearly fills my memory, and I want to sort it with:
> [dummy, index] = sort(Array);
> or
> [index, index] = sort(Array);
> in both cases the array is created (any different opinions?)!

yes, see my reply (including FOO test program) to bl...
the point is
- if INDEX|1,2 is very large within the function, it will fail...
- if INDEX|1,2 is large and adds to the callers WS, it will fail a bit later...

> That [index] and [index] have the same address is not really surprising. It is the same variable. You do not need a FORMAT DEBUG for that.

no, of course, but it is nice to convince some ML agnostics by sheer command window output...

> If I show "[index, index] = sort(Array)" to a Matlab beginner, I have to explain, that I assume, that the output arguments of a function are assigned from the left to the right.
> If I show "[dummy, index] = sort(Array)", I do not have to explain anything.

well... NO mercy on this one: i know you've (probably) tried to explain to a C-novice an inscrutable pointer-construct...
they just have to learn...

SO - old CSSMers will stick with the
[foo,foo,foo,goo,goo]=f();
syntax...

:-)
urs

From: Bruno Luong on 6 Jan 2010 16:25

But us, isn't the test unfair when the variable a_i is not properly cleared?

I added the "clear" command and both syntaxes fail at the same place (see the new foo below).

%%%%%%%

function foo(varargin)
% 1) subroutine GOO returns 2 vars in diff memory locations
% 2) fill FOO WS with 2 vars
% 3) if memory overflows...
% 4) clear all WS vars
% 5) fill FOO WS with 1 var at same memory address

    nt=10^8; % <- #var to create...
    if ~nargin
        n=1000000; %#ok (r2009b)
    else
        n=varargin{1}; %#ok (r2009b)
    end
% fill FOO WS with individual var A_xxx/B_xxx
    try
        for i=1:nt
            com=sprintf('[a_%d,b_%d]=goo(n);',i,i);
            eval(com);
            % Added by Bruno
            com=sprintf('clear a_%d',i);
            eval(com);
        end
    catch %#ok
        w=whos('b*'); % <- only count B_xxx
        disp(sprintf('ERROR at index %5d %5d',i,numel(w)));
        clear a* b*; % <- CLEAR VARS
% fill FOO WS with individual var B_xxx
        try
            for i=1:nt
                com=sprintf('[b_%d,b_%d]=goo(n);',i,i);
                eval(com);
            end
        catch %#ok
            w=whos('b*'); % <- only count B_xxx
            disp(sprintf('ERROR at index %5d %5d',i,numel(w)));
        end
    end
end
function [a,b]=goo(varargin) %#ok
    a=zeros(1,varargin{1},'double');
    b=a;
    b(1)=b(1)+1; % <- allocate new memory...
end

% Command line
>> foo(1000000)
ERROR at index 176 175
ERROR at index 176 175
>> foo(10000000)
ERROR at index 16 15
ERROR at index 16 15
>>
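
Outside of the EVAL test, the same pattern looks like the sketch below. The variable names are illustrative. Once the dummy is cleared, only the input and the index remain in the workspace, although the sorted copy still exists for a moment between the two statements.

```matlab
% Sketch: "dummy" approach followed by CLEAR.
x = rand(1e5, 1);

[dummy, index] = sort(x);   % sorted copy is created ...
clear dummy                 % ... and released again right away

w = whos;                   % list what is left in the workspace
disp({w.name});
```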
