From: Juliette Salexa on
Hello,

I've recently been using Matt Fig's NPERMUTEK file quite a lot.
It's very useful and coded very well.

I noticed that for large arrays it becomes slow:
---------------------------------------------------
for i=10:13
tic;npermutek(1:4,i);toc;
end

% Elapsed time is 1.249711 seconds.
% Elapsed time is 2.566134 seconds.
% Elapsed time is 11.564763 seconds.
% Elapsed time is 72.920755 seconds. 7GB of RAM
---------------------------------------------------
I used the i=13 result for a larger computation which only took 119s including the call to npermutek, so npermutek.m is taking 60% of the time!

Since I'll be using my program >1000 times over the next few years it would be nice to reduce that percentage =)

One thing I noticed was:
---------------------------------------------------
for i=10:14
tic;npermutek(cast(1:4,'single'),i);toc;
end

%i=10:13
% Elapsed time is 0.818169 seconds.
% Elapsed time is 3.002963 seconds.
% Elapsed time is 11.828072 seconds.
% Elapsed time is 48.118251 seconds.
% Elapsed time is 288.286617 seconds.. >26GB of RAM
---------------------------------------------------

So using single precision instead of double precision speeds things up a lot.

Since 1:4 are all integers, I think using int8(1:4) instead of cast(1:4,'single')
would speed things up even more, but the problem is that

CUMSUM() does not work for integers.

(1) Is there an alternative to CUMSUM() that works for integers ?

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Also, my code will use npermutek(1:4,i) for progressively larger i,

If I already have n13=npermutek(1:4,13) stored, The calculations of npermutek(1:4,13) should not be so costly if it already uses the information contained in n13 !

For example, I could do:
n13=npermutek(1:4,13);
a=kron(n13,ones(4,1)); b=repmat((1:4)',size(n13,1),1);
n14=horzcat(a,b);

in which I have used n13 to help me calculate n14.

This is of course slower than npermutek(1:4,14) because KRON is a very slow function ... it's designed to do a lot of complicated things in order to work for very general cases, while kron(A,ones(4,1)) is in theory a very simple task.

(2) Does anyone know if there's an alternative to KRON for this basic case where the second argument is ones(4,1) ?

(3) Or is there another way to calculate npermutek(1:4,i) iteratively in such a way that the array for the previous value of i is used in order to calculate the array for the next value of i ?

(1) Is there an alternative to CUMSUM() that works for integers ?

Any suggestions or discussions would be very greatly appreciated!
From: Matt Fig on
I submitted COMBINATOR (on the FEX) which includes a MEX file for working with non-(double/single) integers, and also handles three other combinatorial problems.

Jan Simon also wrote some pure MEX codes which may or may not handle other data types too, and may be faster. Look for them on the FEX.

Part of the problem is that you are working with large arrays.
size(npermutek(1:4,14)) is going to be 4^14 -by- 4. And you want to go bigger? If you are going to be using large results many times over a period of years, would it be beneficial to save them to disk and load as needed?
From: Juliette Salexa on
Thanks for the quick response!

Yes for now what I've been doing is loading saved arrays that were precalculated, but in some cases the array that is needed is something like npermutek([1 2 4 9],i) in which case I have to calculate the permutations again (and I'll have to keep calculating new permutations every time the first argument changes)

I just thought there might be a way to speed up npermutek for larger cases,
since I will often have N=npermutek(a,n) already calculated, and (I think) npermutek doesn't use that N to calculate npermutek(a,n+1), but rather, starts from scratch again.

If npermutek(a,17) was calculated by first calculating npermutek(a,14) and then iteratively building on npermutek(a,14), it might be faster than calculating npermutek(a,17) from scratch.

My idea using KRON uses N to calculate npermutek(a,n+1), but is slow because KRON isn't optimized for the case where the second argument is a vector of ones.

==

I will try the alternatives that you suggested though!

==

I was also wondering if there might be able to reduce the memory requirements here. This might be very farfetched, but npermutek(a,14) will have parts that look like:

1 1 1 1 1 1 1 1 1 1 1 | 1 1 1
1 1 1 1 1 1 1 1 1 1 1 | 1 1 2
1 1 1 1 1 1 1 1 1 1 1 | 1 1 3
1 1 1 1 1 1 1 1 1 1 1 | 1 1 4
1 1 1 1 1 1 1 1 1 1 1 | 1 2 1
.... |
1 1 1 1 1 1 1 1 1 1 1 | 4 4 4

where everything left of the vertical bar is the same.
Perhaps insetead of storing all 64 of those 1's, perhaps matlab could just represent that whole area as a block of 1's ? [I'm really not sure if MATLAB can do this]

Also, the double precision number 11111111111111 takes up less space than 14 different single precision representations of '1' , so perhaps each line could be stored as one double precision number instead of 14 single precision numbers ? In that way only 4^14 numbers need to be stored rather than 4^14 x 14 ?

Again, I'm not sure if matlab can do that though..

If anyone has any ideas I would be eager to learn more. Thanks!
From: Matt Fig on
You shouldn't have to recalculate every time the first arg changes!

T = npermutek([1 2],5); % The 'base' case.
B = [6 7]; % A different 'first' argument.
T2 = npermutek(B,5); % First argument changed.
% Now see if there is another way.
isequal(B(T),T2) % We can get T2 without calling npermutek again!
From: Juliette Salexa on
That's a good point!

Well, another reason why I wanted to be able to use npermutek(1:4,n) to calculate npermutek(1:4,n+1) is that, for example:

INDICES=npermutek(1:4,2)=
1 1
1 2
1 3
1 4
2 1
2 2
2 3
2 4
3 1
...
4 4

*I will associate a scalar to each of those lines. (the scalar's specific value depends on the numbers on that line)*

Some of these numbers will end up being small enough that I ignore them, so I can remove those lines from INDICES and end up with something like:

indices2 =
1 1
1 3
2 1
2 2
2 4
3 1
...
4 4

And then in the next iteration, intead of assigning scalars to each line of npermutek(1:4,3), I can define them only on:

indices3 =
1 1 1
1 1 2
1 1 3
1 1 4

1 3 1
1 3 2
1 3 3
1 3 4

2 1 1
2 1 2
2 1 3
2 1 4

....
....
4 4 4

I'm guessing:
a=kron(indices2,ones(4,1)); b=repmat((1:4)',size(indices2,1),1);
indices3=horzcat(a,b);

is the only obvious way to generate indices3 from indices2, but the KRON function is slow because of the fact that its designed for more general arguments

But if there's a faster way than using KRON, for this slightly modified version of NPERMUTEK, a similar technique could probably be used to speed up NPERMUTEK itself, since npermutek(1:4,14) could now take advantage of knowing npermutek(1:4,13), and 12 and 11 and iteratively so on.

*not sure if what I said made any sense..