index of npermutek [Matlab]

Prev: Curve fitting MIMO grey-box model using pem and freq. data
Next: (web + urlread) empty page from google scholar

From: Juliette Salexa on 2 May 2010 19:33

Hello =)

I'm trying to calculate the (i,j)th element of npermutek(1:4,N) without calculating npermutek(1:4,N).

This is because npermutek(1:4,50) is too big to store, yet calculating the (i,j)th element is doable.

a=mod(ceil(i/4^j),4);
if a==0; a=4; end
disp(a)

If there is any clever way to speed up this algorithm, I would be very eager to learn it.

I'm particularly not happy with:
(1) the IF statement [i don't want matlab to have to go through the logic in the if statement a billions of times as I run this algorithm repetetively]

(2) ceil(i/4^j) since i/4 will in general be double precision even though ideally for a calculation like this I'd only be working with int(8) numbers . And although CEIL is built-in, calling CEIL might have more overhead than necessary

Any advice or suggestions would be *greatly* appreciated since this I'll be running this algorithm billions of times.

Thank you!

From: Juliette Salexa on 2 May 2010 19:45

I've considered the possibility to storing 4^j for all j, and just reading them, instead of recalculating them in each iteration, but the advantage of this varies depending on whether I'm using a for loop or using indexing:

====================
FOR LOOP
====================
for k=0:11; a(k+1)=4^k; end

tic; % USING STORED 4^j **FASTER**
for i=1:4^11; for j=0:10;
ceil(i/a(j+1)); end end toc;

tic; % CALCULATING 4^j EACH TIME **SLOWER**
for i=1:4^11; for j=0:10;
ceil(i/4^j); end end toc;

% Elapsed time is 166.944637 seconds.
% Elapsed time is 176.749439 seconds.
====================

In the above example, using stored 4^j values is faster,
In the below example, it's slower !!

====================
INDEXING
====================
h=1:4^13;
tic;m(:,1)=ceil(h(:)/4^1)+ceil(h(:)/4^2)+ceil(h(:)/4^11);toc;
tic;y(:,1)=ceil(h(:)/a(2))+ceil(h(:)/a(3))+ceil(h(:)/a(12));toc;
isequal(y,m) % YES

% Elapsed time is 12.303705 seconds.
% Elapsed time is 13.384584 seconds.

==================
I find it strange that recalculating 4^j each time is detrimental in the for loop, but beneficial in the indexed case ?? That's very strange, does anyone have any idea why that might be the case ?

From: Walter Roberson on 2 May 2010 20:35

Juliette Salexa wrote:

> tic;m(:,1)=ceil(h(:)/4^1)+ceil(h(:)/4^2)+ceil(h(:)/4^11);toc;

In order to time properly, each of the commands should be on a line by
itself, and the whole thing should be placed into a function, which you
should run several times in order to get minimum and average timings.

The Just In Time (JIT) compiler does not operate at all from the command
line or from scripts. This is by design.

The JIT also compiles things differently (usually less efficiently) if
you have multiple commands on the same line. I haver never seen an
explanation for why that happens.

From: Juliette Salexa on 3 May 2010 11:16

Thank you Walter,

I don't think I am understanding you though.
I didn't use the JIT compiler at all (at least I don't think I did, I don't even know what it is)

Also, are you saying that it would be faster to split the line m(:,1)=ceil(h(:)/4^1)+ceil(h(:)/4^2)+ceil(h(:)/4^11);
into three separate lines:

m(:,1)=ceil(h(:)/4^1)
m=m+ceil(h(:)/4^2)
m=m+ceil(h(:)/4^11)

??

Or that I should be doing something like:

c=h(:)/4^1; d=c/4 ; e=d/4^9;
c1=ceil(c), d2=ceil(d); e2=ceil(e);
m(:,1)=c1+c2+c3

I'm worried that splitting it this way would cause me to run out of memory (h has 4^11 elements. If I split the computation, I'm using triple the amount of space)

Is this what you meant by coding it line-by-line ?

From: Walter Roberson on 3 May 2010 13:00

Juliette Salexa wrote:

> I don't think I am understanding you though.
> I didn't use the JIT compiler at all (at least I don't think I did, I
> don't even know what it is)

The Just In Time compiler (JIT) is always active for functions (but not
scripts or the command line), unless you specifically turn it off. The
JIT parses the human-readable code and creates an in-memory version of
the code that is in an internal binary language. The JIT cross-analyzes
the code across multiple lines and tries to figure out the most
efficient way of handling the function as a whole. When it can deduce
enough about what is going on, it will make calls to the most specific
routine that it knows about to handle the work. If, however, the code is
too complex for it, or if it has not seen enough information to know
what the optimal routine would be, then instead it uses more general
(slower) internal routines.

As a small example: there are routines that are optimized for adding
together integer data-types (e.g., uint32), and there are routines that
are optimized for adding together double precision numbers. If the JIT
is able to deduce that two arrays being added are double precision, it
can directly call the double-precision routine, and if it can deduce
that they are particular kinds of integers, then it can directly call
the appropriate integer routine, but if the JIT does not have enough
information, it will have to use code that first has to figure out what
kinds of numbers are being added -- which will be slower than optimal.

As I mentioned, the JIT does not come into effect for things done at the
command line or in scripts: Matlab always uses the most general routine
in that case, and thus (barring bugs in the JIT), scripts and the
command line will fairly like be slower than the same code in functions.

> Also, are you saying that it would be faster to split the line
> m(:,1)=ceil(h(:)/4^1)+ceil(h(:)/4^2)+ceil(h(:)/4^11);
> into three separate lines:

You had a line that started with tic, then a semi-colon, then a command,
then a semi-colon, then a toc. Splitting that up into three lines,

tic
the command
toc

is better, because the JIT is known to have difficulty optimizing the
commands when there are several different commands on the same line.

| Next | Last
Pages: 1 2
Prev: Curve fitting MIMO grey-box model using pem and freq. data
Next: (web + urlread) empty page from google scholar