MATLAB indexing issue (was Re: Use of MATLAB fftshift) [DSP]


From: Rune Allnor on 27 Jun 2010 14:16

On 27 Jun, 20:06, robert bristow-johnson <r...(a)audioimagination.com>
wrote:
> On Jun 27, 8:09 am, Rune Allnor <all...(a)tele.ntnu.no> wrote:
>
> > On 27 Jun, 07:56, eric.jacob...(a)ieee.org (Eric Jacobsen) wrote:
>
> > > IMHO the most plausible explanation for why this has never been
> > > addressed is that a conscious decision has been made within the
> > > MathWorks that Matlab should not have flexible or zero-based indexing
> > > capability.
>
> > The reason why it has never been addressed is that MATLAB is
> > an acronym for MATrix LABoratory. In linear algebra the numerical
> > arrays are - like it or not - indexed base 1.
>
> whether it's 0 or 1 is not the issue. whether the user can define it
> for a particular array *is* the issue.

In that case, why can't you just use the OO capabilities of
matlab and declare your own array class, say, RBJarray, and
overload the () operator?

Ought to be a piece of cake, provided matlab's OO is half decent.
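
As a rough illustration of the wrapper idea (sketched in Python rather than MATLAB, with the hypothetical class name OffsetVector standing in for RBJarray), the class below keeps ordinary storage plus an origin and translates indices on every read and write. Python's __getitem__ and __setitem__ play the role that an overloaded () operator would play in MATLAB; none of this is an existing MATLAB API.

```python
class OffsetVector:
    """Vector whose first element has user-visible index `origin` (hypothetical)."""

    def __init__(self, values, origin=1):
        self._data = list(values)   # stored contiguously, storage position 0 first
        self.origin = origin        # user-visible index of the first element

    def _position(self, index):
        # Translate a user-visible index into a storage position, with a bounds check.
        position = index - self.origin
        if not 0 <= position < len(self._data):
            last = self.origin + len(self._data) - 1
            raise IndexError(f"index {index} outside [{self.origin}, {last}]")
        return position

    def __getitem__(self, index):
        return self._data[self._position(index)]

    def __setitem__(self, index, value):
        self._data[self._position(index)] = value

    def __len__(self):
        return len(self._data)


x = OffsetVector([10, 20, 30], origin=-1)   # valid indices are -1, 0, 1
x[0] = 25                                   # writes the middle element
```

Only indexing is covered here; arithmetic, concatenation and index-returning functions would each need their own overloads.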

Rune

From: Walter Roberson on 4 Jul 2010 12:42

robert bristow-johnson wrote:
> On Jun 27, 8:09 am, Rune Allnor <all...(a)tele.ntnu.no> wrote:

>> Changing this would undermine nearly 30 years worth of code base.

> no it wouldn't. it could be perfectly backward compatible, because
> newly-created arrays would have default origin of 1 for every
> dimension of the array. the user would have to call a not-yet-
> existing function to change the origin.

Robert, your enthusiasm for this idea has led you to neglect thinking
the problems through.

Suppose I have a mex file written to the current API. One or more of the
arguments to the function are array indices. Now introduce your proposed
new indexing scheme and have the user create such an array and pass it
in to the existing mex file.

Your claim that the proposed change "could be perfectly backwards
compatible" implies that the indices the user passes in must not rely on
the new indexing scheme, because "perfectly backwards compatible" means
that old code must work UNCHANGED. Now *without changing the API for
existing routines*, how is Matlab going to know if a (say) 5 being
passed in as a numeric value is already adjusted to be 1-relative or
needs to be silently re-biased by Matlab to the appropriate basis in
order to preserve backwards compatibility? What if the dimension to be
indexed is itself a parameter, so that the unbiasing that needs to
take place is not constant? What if the dimension number is not an
_obvious_ parameter, such as if you had some encoding such as the mesh
encoding that mixes control values and data values in the same array?
What if the indices that have to be rebiased have been packed, such as
two 8-bit indices numerically jammed into a 16 bit number -- how is
Matlab going to know the jamming algorithm to know how to rebias and
construct the appropriate new index?

As long as indices are computable data then in order to support
different index biases you MUST break backwards compatibility, in that
the existing code would have to be enhanced to know about and take into
account the new indexing scheme for any parameter that is not provably
an old-style matrix.
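
The packed-index case can be made concrete with a small Python sketch. The packing scheme (row in the high byte, column in the low byte) and the function names are assumptions chosen for illustration; the point is only that a runtime which sees one opaque number cannot rebias it correctly without knowing the caller's packing rule.

```python
def pack(row, col):
    """Jam two 8-bit indices into one 16-bit number (the caller's private convention)."""
    if not (0 <= row < 256 and 0 <= col < 256):
        raise ValueError("indices must fit in 8 bits")
    return (row << 8) | col


def unpack(packed):
    """Recover (row, col) from a value produced by pack()."""
    return packed >> 8, packed & 0xFF


def naive_rebias(value, shift):
    """All a runtime can do if it only knows 'this number is an index'."""
    return value + shift


packed = pack(5, 7)                             # 1-based row 5, column 7
wrong = unpack(naive_rebias(packed, -1))        # shifts the opaque number as a whole
right = tuple(i - 1 for i in unpack(packed))    # shifts each field; needs the packing rule
```

Shifting the packed number moves only the low field, so the naive result disagrees with the per-field result.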

From: robert bristow-johnson on 4 Jul 2010 20:43

On Jul 4, 12:42 pm, Walter Roberson <rober...(a)hushmail.com> wrote:
> robert bristow-johnson wrote:
> > On Jun 27, 8:09 am, Rune Allnor <all...(a)tele.ntnu.no> wrote:
> >> Changing this would undermine nearly 30 years worth of code base.
> >
> > no it wouldn't. it could be perfectly backward compatible, because
> > newly-created arrays would have default origin of 1 for every
> > dimension of the array. the user would have to call a not-yet-
> > existing function to change the origin.
>
> Robert, your enthusiasm for this idea has led you to neglect thinking
> the problems through.

that might be a premature judgment. you don't know how much i have
thought this through for more than a decade. if you can get Google
Groups to search adequately, you might find the places where the
objections (similar to yours) were brought up and i swatted them
down. even Cleve eventually admitted that it was, from the strict
definition of the term, backward compatible.

> Suppose I have a mex file written to the current API.

as long as no one applies your MEX function (written under the old
assumptions) to any array with any origin not 1, there would be no
problem. still backward compatible. no one's code breaks.

if this extension or enhancement to MATLAB were adopted and you wanted
your .mex function to work with arrays of origin different than 1, you
would have to modify the .mex function to look for the origins (using
the new API). otherwise your function would work as if all of the
origins were 1 when they may not be.

it's no different than with any other extension or enhancement to a
language. the issue you brought up is not a violation of backward
compatibility. old code (with old .mex files) would still work the
same way they did before.

r b-j

From: Steven Lord on 6 Jul 2010 15:38

"robert bristow-johnson" <rbj(a)audioimagination.com> wrote in message
news:2f984955-7c2c-4414-8b5b-5843e314a8d7(a)b35g2000yqi.googlegroups.com...
> On Jul 4, 12:42 pm, Walter Roberson <rober...(a)hushmail.com> wrote:
> > robert bristow-johnson wrote:
> > > On Jun 27, 8:09 am, Rune Allnor <all...(a)tele.ntnu.no> wrote:
> > >> Changing this would undermine nearly 30 years worth of code base.
> > >
> > > no it wouldn't. it could be perfectly backward compatible, because
> > > newly-created arrays would have default origin of 1 for every
> > > dimension of the array. the user would have to call a not-yet-
> > > existing function to change the origin.
> >
> > Robert, your enthusiasm for this idea has led you to neglect thinking
> > the problems through.
>
> that might be a premature judgment. you don't know how much i have
> thought this through for more than a decade. if you can get Google
> Groups to search adequately, you might find the places where the
> objections (similar to yours) were brought up and i swatted them
> down. even Cleve eventually admitted that it was, from the strict
> definition of the term, backward compatible.
>
> > Suppose I have a mex file written to the current API.
>
> as long as no one applies your MEX function (written under the old
> assumptions) to any array with any origin not 1, there would be no
> problem. still backward compatible. no one's code breaks.

Except for the person who DOES apply an older MEX function to a non-1 based
array. People often need/want to reuse their old code, and I've found
people tend to get a wee bit upset if you tell them "Sorry, don't do that."
unless there's a major benefit they can immediately see (and sometimes even
then.) While some (including yourself) may see 0-based indexing as such a
major benefit, there are also many who would not.

> if this extension or enhancement to MATLAB were adopted and you wanted
> your .mex function to work with arrays of origin different than 1, you
> would have to modify the .mex function to look for the origins (using
> the new API). otherwise your function would work as if all of the
> origins were 1 when they may not be.

And that is an incompatibility. Let's take a look at a few others.

1)
Let's say that a user had a function that needs to loop over the columns of
their matrix. This is not a contrived example; I've seen plenty of code
that does something similar.

for whichcolumn = 1:size(A, 2)
    % process column A(:, whichcolumn)
end

If A is a 1-based array, then everything works as it did before. If A was a
0-based array, then this would error with an error message like "Index
exceeds matrix dimension" or something similar. Code that used to work no
longer works. That's an incompatibility.

Even worse, let's say the user's code, for whatever reason, only needed to
process all but the final column of A.

for whichcolumn = 1:size(A, 2)-1
    % process column A(:, whichcolumn)
end

Now this code can SILENTLY GIVE THE WRONG ANSWER if A is a 0-based array, as
it will process all but the _first_ column of A.
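
Both failure modes can be simulated with a short Python sketch. The function legacy_loop and its origin parameter are hypothetical stand-ins for a legacy loop running against an array whose first column has a user-chosen index; this is a model of the scenario, not MATLAB behavior.

```python
def legacy_loop(num_columns, origin, last):
    """Storage positions touched by the legacy loop `for c = 1:last`.

    `origin` is the user-visible index of the first column (hypothetical
    attribute); positions are 0-based offsets into the stored columns.
    """
    visited = []
    for c in range(1, last + 1):            # MATLAB's 1:last
        position = c - origin
        if not 0 <= position < num_columns:
            raise IndexError(f"index {c} exceeds matrix dimensions")
        visited.append(position)
    return visited


n = 4
all_but_last_1based = legacy_loop(n, origin=1, last=n - 1)   # positions 0, 1, 2
all_but_last_0based = legacy_loop(n, origin=0, last=n - 1)   # positions 1, 2, 3
```

With origin 0 the "all but the last column" loop skips the first stored column instead and raises nothing, while the full loop (last=n) raises IndexError.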

2)
What happens if I SAVE a 0-based array in a MAT-file and LOAD it in a
version prior to the introduction of this change? Would you expect that to
work?

3)
Suppose A is a 0-based array and B is a 1-based array. What happens if I
ask for A+B? A.*B? [Assume all the sizes match; just consider the
based-ness for this scenario.]

I'm guessing you're going to say that A+B and A.*B both error; operating
under that assumption, let's take a look at one more example. What should
the command pi+A do? Does the pi built-in function return a 0-based array
or a 1-based array? Seems to me that users would expect pi+A to add pi to
each element of A -- but for backwards compatibility, and to agree with your
own proposal from earlier in this thread, PI would _have_ to return a
1-based array and so pi+A would ERROR. If you say that PI should be "smart
enough" to know with what it's going to be combined and return the
appropriate-indexed scalar:

x = pi;
y = A+x;
z = B+x;

_One_ of the latter two operations must error unless scalars are treated as
a special case. If they are treated specially, x = 1:10; will encounter the
same problem assuming A and B are both 10-element vectors, since x must have
a specified base when it is created.

> it's no different than with any other extension or enhancement to a
> language. the issue you brought up is not a violation of backward
> compatibility. old code (with old .mex files) would still work the
> same way they did before.

Robert, when you've brought this type of system up in the past I've
seriously thought about how it could work, but based on the potential
problems I called out above, speaking personally I do not think it would be
a good idea to modify MATLAB indexing to use your proposed system.

I think your best option would be to create your own 0-based (or
variable-based) object and overload those functions with which you want to
work as well as subscripted indexing and assignment -- that way you can
control the behavior of indexing for your object. You can even make use of
the built-in functions (rather than having to reimplement them) by using the
BUILTIN function to call them on the 1-based data that's stored inside the
array as a private data member, and adjust the indices afterward (if
necessary.) Indeed, Nabeel posted an object to do just this to the File
Exchange several years ago:

http://www.mathworks.com/matlabcentral/fileexchange/1168-varbase

If you wanted to write a classdef-based version of this object then the
Object-Oriented Programming section of the documentation will contain the
information you'll need to become familiar with this style of object. In
particular, since indexing will be a major component of this object, the
following page will be of special interest:

http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_oop/br09eqz.html

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

From: robert bristow-johnson on 6 Jul 2010 18:19

On Jul 6, 3:38 pm, "Steven Lord" <sl...(a)mathworks.com> wrote:
> "robert bristow-johnson" <r...(a)audioimagination.com> wrote in message
> news:2f984955-7c2c-4414-8b5b-5843e314a8d7(a)b35g2000yqi.googlegroups.com...
>
>
>
> > On Jul 4, 12:42 pm, Walter Roberson <rober...(a)hushmail.com> wrote:
> > > robert bristow-johnson wrote:
> > > > On Jun 27, 8:09 am, Rune Allnor <all...(a)tele.ntnu.no> wrote:
> > > >> Changing this would undermine nearly 30 years worth of code base.
>
> > > > no it wouldn't. it could be perfectly backward compatible, because
> > > > newly-created arrays would have default origin of 1 for every
> > > > dimension of the array. the user would have to call a not-yet-
> > > > existing function to change the origin.
>
> > > Robert, your enthusiasm for this idea has led you to neglect thinking
> > > the problems through.
>
> > that might be a premature judgment. you don't know how much i have
> > thought this through for more than a decade. if you can get Google
> > Groups to search adequately, you might find the places where the
> > objections (similar to yours) were brought up and i swatted them
> > down. even Cleve eventually admitted that it was, from the strict
> > definition of the term, backward compatible.
>
> > > Suppose I have a mex file written to the current API.
>
> > as long as no one applies your MEX function (written under the old
> > assumptions) to any array with any origin not 1, there would be no
> > problem. still backward compatible. no one's code breaks.
>
> Except for the person who DOES apply an older MEX function to a non-1 based
> array.

the definition of "backward compatible" that i am using is (from
Wikipedia): "a product or a technology is said to be backward
compatible when it is able to fully take the place of an older
product... Backward compatibility is a relationship between two
components, rather than being an attribute of just one of them. More
generally, a new component is said to be backward compatible if it
provides all of the functionality of the old component."

what i am and had been proposing for a decade is backward compatible
in that meaning. if people try to use the "new feature" (non-1 based
arrays) and misuse them and get error messages, that does not mean
that it fails backward compatibility.

if a MEX function was unaware of this (and old ones *would* be
unaware), it would treat any array argument as if it had origin 1 for
every dimension. it would be wrong, but that would be a misuse (to
use a function never written to deal with other origins on a non-1
based array). and nothing would blow up.

> Robert, when you've brought this type of system up in the past I've
> seriously thought about how it could work, but based on the potential
> problems I called out above, speaking personally I do not think it would be
> a good idea to modify MATLAB indexing to use your proposed system.
>
> I think your best option would be to create your own 0-based (or
> variable-based) object and overload those functions with which you want to
> work as well as subscripted indexing and assignment -- that way you can
> control the behavior of indexing for your object.

here's the deal: i am a DSP person, not an OOP person. MATLAB has
marketed itself (falsely) making claims (in the v4 and v5 user
manuals) as:

"MATLAB integrates numerical analysis, matrix computation, signal
processing, and graphics in an easy-to-use environment where problems
and solutions are expressed just as they are written mathematically - ..."

"MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar
mathematical notation."

note the phrases in claims: "familiar mathematical notation" and "just
as they are written mathematically". i submit that those claims are
false in a sense that is salient particularly for those of us who use
MATLAB for (digital) signal processing. and i suspect the claims are
false for some other users that also deal with data that are naturally
sequenced with subscripts or indices that are non-positive.

now, what i want is for MATLAB to live up to that. MATLAB is not C++
where, if i complain that C doesn't give me a complex variable type,
the first response from someone is that i should use C++ and a library
with a complex class. MATLAB *does* have an OOP capability, and
those two subscript calls (i think they're called subsref() and
subsasgn()) make it possible to insert a shim that intercepts the
indices. so, what you say Steven is true, a class can be done, but
what i need to use MATLAB (or Octave) for is to do DSP. i should not
have to become an OOPs programmer just to do basic DSP with equations
that are recognizable in the DSP lit.

i remember that Thomas Krauss once told me (a decade ago) that he
brought this up early before the Sig Proc Toolbox (the very first
MATLAB toolbox) was completed and released. this is really when you
guys should have fixed the problem. you have forced people to adopt
non-standard non-natural indexing for the problems they work on.
besides the FFT (where it should be obvious) i think TMW blew it with
the definitions of polyval() and related functions like polyfit().
they should have used 0-based arrays of coefficients with a(n) being
the coefficient for x^n. similarly TMW also screwed up the
definitions of conv() and related functions. when polynomials are
multiplied to each other, we know it's like their coefficient
sequences are convolved and the FFT can be used to convolve large
coefficient sequences. but the order of coefficients (and the index
values associated) are just plain wrong in MATLAB. besides missing
the proper 0-based counting, TMW put the order wrong.
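
For comparison, here is a small Python sketch of the convention being argued for, where a[n] is the coefficient of x^n and indexing starts at 0. The function names are made up for illustration. MATLAB's polyval() and conv() take coefficients in descending powers, so this is not how those functions behave.

```python
def poly_multiply(a, b):
    """Convolve coefficient lists where a[n] is the coefficient of x**n."""
    product = [0] * (len(a) + len(b) - 1)
    for n, a_n in enumerate(a):
        for m, b_m in enumerate(b):
            product[n + m] += a_n * b_m   # x**n * x**m lands at index n + m
    return product


def poly_eval(a, x):
    """Evaluate sum(a[n] * x**n) with Horner's rule."""
    result = 0
    for coefficient in reversed(a):
        result = result * x + coefficient
    return result


p = [1, 2]        # 1 + 2x
q = [3, 0, 1]     # 3 + x^2
pq = poly_multiply(p, q)
```

With this ordering the index of each product coefficient is simply the sum of the indices of its factors, which is the property the convolution view relies on.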

> You can even make use of
> the built-in functions (rather than having to reimplement them) by using the
> BUILTIN function to call them on the 1-based data that's stored inside the
> array as a private data member, and adjust the indices afterward (if
> necessary.)

if we were to do it that way, what the object converter (i would call
it "rebase" or "reorigin") should do is check to see if the origins
for all dimensions in the output array are all 1; if so, it should
return a regular MATLAB matrix variable (and not one of this "rebase"
type) so that all of the existing MATLAB functions can work on this
1-based array.

> Indeed, Nabeel posted an object to do just this to the File
> Exchange several years ago:

i remember. and i remember i was unable to use it (because much more
needed to be done).

> http://www.mathworks.com/matlabcentral/fileexchange/1168-varbase
>
> If you wanted to write a classdef-based version of this object then the
> Object-Oriented Programming section of the documentation will contain the
> information you'll need to become familiar with this style of object. In
> particular, since indexing will be a major component of this object, the
> following page will be of special interest:
>
> http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_oop/br09...

again, you're requiring me to be an OOPs programmer when what i need
MATLAB for is to do signal processing. this is something *you*guys*
should fix. i would be happy to help in specification (and i had and
i'll dig up the old text) but this is a deficit in MATLAB and it's
sorta non-responsible for TMW to require users to fix such deficits in
their product.

that varbase class should be renamed "reorigin" and, to begin with,
the following operators should be overloaded according to the
following spec, and if a resulting "reorigin" object happens to have
only 1 for the origins of every dimension, then a regular 1-based
MATLAB variable should be returned. i know it's a little bit to read,
but i would ask that you read it. to make that class work, this is
the minimum that is needed to make it work usefully and correctly (and
backward compatible). i am, of course, guessing at what the C-like
data structure is for a MATLAB variable, but whatever it is, the
pointer to the origin[] vector should be appended to the end or use an
existing *unused* field. then it would be backward compatible with
MEX files.

r b-j

________________________________________________

enum MATLAB_class {text, real, complex};
// I don't wanna cloud the issue considering other classes.

typedef struct
{
    void* data;              // pointer to actual array data
    char* name;              // pointer to the variable's name
    enum MATLAB_class type;  // class of MATLAB variable (real, complex,...)
    int num_dimensions;      // number of array dimensions >= 2
    long* size;              // points to a vector with the number of rows, columns, etc.
} MATLAB_variable;

char name[32];               // suppose MATLAB names are unique to 31 chars
long size[num_dimensions];

if (type == text)
{
    char data[size[0]*size[1]*...*size[num_dimensions-1]];
}
else if (type == real)
{
    double data[size[0]*size[1]*...*size[num_dimensions-1]];
}
else if (type == complex)
{
    double data[size[0]*size[1]*...*size[num_dimensions-1]][2];
}

The above is sorta C-like pseudocode. I'm writing it as if the
declarations can allocate storage the way malloc() does.

Currently, when an element, A(n,k), of a 2 dimensional MATLAB array A is
accessed, first n and k are confirmed to be integer valued (not a problem
in C), then confirmed to be at least 1 and less than or equal to size[0]
and size[1], respectively. If those constraints are satisfied, the value
of that element is accessed as:

data[(n-1)*size[0] + (k-1)];

For a 3 dimensional array, A(m,n,k), it would be the same but now:

data[((m-1)*size[1] + (n-1))*size[0] + (k-1)];

What is proposed is to first add a new member to the MATLAB variable
structure called "origin" which is a vector of the very same length
(num_dimensions) as the "size" vector. The default value for all
elements of the origin[] vector would be 1 with only the exceptions
outlined below. This is what makes this backwards compatible, in the
strictest sense of the term.

typedef struct
{
    void* data;
    char* name;
    enum MATLAB_class type;
    int num_dimensions;
    long* size;
    long* origin;            // points to a vector with index origin for each dimension
} MATLAB_variable;

char name[32];
long size[num_dimensions];
long origin[num_dimensions];

Now before each index is used, it is checked against the bounds for that
dimension ( origin[dim] <= index < size[dim]+origin[dim] where
0 <= dim < num_dimensions). Since the default for origin[dim] is 1, this
will have no effect, save for the teeny amount of processing time needed
to look up the origin, on existing MATLAB legacy code.

So to access a single element of an array A, this C array index would
look like:

data[(n-origin[1])*size[0] + (k-origin[0])];

For a 3 dimensional array, A(m,n,k), it would look like:

data[((m-origin[2])*size[1] + (n-origin[1]))*size[0] + (k-origin[0])];
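
The bounds check and the offset formulas can be folded into one routine. The following is a Python sketch under the same guessed layout as the pseudocode; the function name linear_offset is made up. Note that in the formulas as written, size[0] and origin[0] belong to the last subscript, which varies fastest, and the sketch follows that convention.

```python
def linear_offset(subscripts, size, origin):
    """Offset into `data` for subscripts given slowest-first, e.g. (m, n, k).

    size[0] and origin[0] describe the LAST subscript (fastest varying),
    matching the formulas in the post.
    """
    ndim = len(size)
    if len(subscripts) != ndim or len(origin) != ndim:
        raise ValueError("subscripts, size and origin must have equal length")
    offset = 0
    for position, index in enumerate(subscripts):
        dim = ndim - 1 - position          # last subscript is dimension 0
        if not origin[dim] <= index < size[dim] + origin[dim]:
            raise IndexError(f"subscript {index} out of range for dimension {dim}")
        offset = offset * size[dim] + (index - origin[dim])
    return offset
```

With every origin equal to 1 this reduces to the original (n-1)*size[0] + (k-1) form, which is the backward-compatibility argument in miniature.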

Okay, how someone like myself would use this to do something different
is that there would be at least two new MATLAB facilities similar to
size() and reshape() that I might call "origin()" and "reorigin()",
respectively. Just like the MATLAB size() function returns the contents
of the size[] vector, origin() would return, in MATLAB format, the
contents of the origin[] vector. And just like reshape() changes (under
proper conditions) the contents of the size[] vector, reorigin() would
change the contents of the origin[] vector. Since reorigin() does not
exist in legacy MATLAB code (oh, I suppose someone could have created a
function named that, but that's a naming problem that need not be
considered here), then there is no way for existing MATLAB programs to
change the origins from their default values of 1, making this fix
perfectly backward compatible.

Now, just as there are dimension compatibility rules that exist now for
MATLAB operations, there would be a few natural rules that would be
added so that "reorigined" MATLAB arrays could have operations applied
to them in a sensible way.

ARRAY ADDITION ("+") and SUBTRACTION ("-") and element-by-element ARRAY
MULTIPLICATION (".*"), DIVISION ("./"), POWER (".^"), and ELEMENTARY
FUNCTIONS:

Currently MATLAB insists that the number of dimensions are equal and the
size of each dimension are equal (that is the same "shape") before
adding or subtracting matrices or arrays. The one exception to that is
adding a scalar to an array, in which a hypothetical array of equal size
and shape with all elements equal to the scalar, is added to the array.
The resulting array has the same size and shape as the argument arrays.

The proposed system would, of course, continue this constraint and add a
new constraint in that index origins for each dimension (the origin[]
vector) would have to be equal for two arrays to be added or subtracted
or element-by-element multiplied, etc. The resulting array would have
the same shape and origin[] vector as the input arrays.
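
A minimal Python sketch of this rule for one-dimensional arrays follows. Arrays are modeled as (values, origin) pairs and the function name add is a placeholder; treating a bare number as a scalar that takes on the array's origin is this sketch's reading of the scalar exception above.

```python
def add(a, b):
    """Element-wise sum of two (values, origin) pairs, or of a pair and a scalar."""
    if isinstance(b, (int, float)):            # scalar: broadcast, keep a's origin
        values, origin = a
        return [v + b for v in values], origin
    (a_values, a_origin), (b_values, b_origin) = a, b
    if len(a_values) != len(b_values):
        raise ValueError("sizes differ")
    if a_origin != b_origin:
        raise ValueError("origins differ")
    return [x + y for x, y in zip(a_values, b_values)], a_origin
```

Under this rule two arrays of equal length but different origins are rejected rather than silently aligned.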

"MATRIX" MULTIPLICATION ("*"):

A = B*C;

Currently MATLAB appropriately insists that the number of columns of B
are equal to the number of rows of C (we shall call that number K). The
resulting array has the number of rows of B and the number of columns of
C. The value of a particular element of A would be:

              K
    A(m,n) = SUM{ B(m,k) * C(k,n) }
             k=1

The proposed system would, of course, continue this constraint and add a
new constraint in that index origins must be equal for each dimension
where the lengths must be equal. That is the number of columns of B are
equal to the number of rows of C and the origin index of the columns of
B are equal to the origin index of the rows of C. The resulting array
has the number of rows of B and the number of columns of C and the
origin index of the rows of B and the origin index of the columns of C.
The value of a particular element of A would be:

             org-1+K
    A(m,n) = SUM{ B(m,k) * C(k,n) }
             k=org

where org = origin[0] for the B array and origin[1] for the C array,
which must be the same number. In the same manner as in the present
MATLAB, K would be size[0] for the B array and size[1] for the C array
and would have to be the same number, otherwise a diagnostic error would
be returned.
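
The extended multiply can be sketched in Python as follows. Matrices are modeled as (rows, row_origin, col_origin) triples, and the names matmul, row_origin and col_origin are illustrative only, chosen to avoid committing to which slot of origin[] is rows and which is columns.

```python
def matmul(b, c):
    """Multiply matrices given as (rows, row_origin, col_origin) triples."""
    b_rows, b_row0, b_col0 = b
    c_rows, c_row0, c_col0 = c
    k_count = len(b_rows[0])                 # K: number of columns of B
    if k_count != len(c_rows):
        raise ValueError("columns of B must equal rows of C")
    if b_col0 != c_row0:
        raise ValueError("column origin of B must equal row origin of C")
    org = b_col0
    result = []
    for m in range(len(b_rows)):
        row = []
        for n in range(len(c_rows[0])):
            # k runs over the user-visible indices org .. org-1+K
            total = sum(b_rows[m][k - org] * c_rows[k - org][n]
                        for k in range(org, org + k_count))
            row.append(total)
        result.append(row)
    # Result takes the row origin of B and the column origin of C.
    return result, b_row0, c_col0
```

The numeric result is the same as an ordinary matrix product; only the compatibility check and the origins attached to the result are new.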

Both of these definitions are degenerations of the more general case
where:

             +inf
    A(m,n) = SUM{ B(m,k) * C(k,n) }
             k=-inf

where here you consider B and C to be zero-extended to infinity in all
four directions (up, down, left, and right). It's just that the zero
element pairs do not have to be multiplied and summed.

Matrix powers and exponentials (on square matrices) can be defined
to be consistent with this extension of the matrix multiply.

MATRIX DIVISION ("/" and "\"):

Like the current matrix division, it would invert the operation of

A = B*C

that is

C = B\A

and

B = A/C

The same size (or shape) requirements and index origin requirements of
the "*" operator would apply to "\" and "/". Given A and B, whatever
shape and origin requirement for B and C and whatever shape and origin
returned in A in the statement:

A = B*C;

would be the same requirements for B and A and define the resulting
shape and origin for C in the statement:

C = B\A

CONCATENATION:

This would also be a simple and straightforward extension of how MATLAB
presently concatenates arrays. When we say:

A = [B C];

The number of rows of B and C must be equal, but the number of columns
of B and C can be anything. The first columns of A are identical with
the columns of B and then also must the indices of those columns. And
independent of what the column indices of C are, they just pick up where
the column index of B left off. This rule extension defaults to what
MATLAB presently does if B and C are both arrays with origin 1. A
similar rule extension can be made for A = [B ; C]; In all cases the
upper left corner of A is identical to the upper left corner of B, both
in value but also in subscripts (so A(1,1) becomes B(1,1) just like it
does now in MATLAB).
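
A Python sketch of the [B C] rule, again using (rows, row_origin, col_origin) triples with made-up names. The text says the column indices of C are ignored; ignoring C's row origin as well is an assumption of this sketch, since the post does not say whether the row origins must match.

```python
def hconcat(b, c):
    """[B C] for (rows, row_origin, col_origin) triples; A inherits B's origins."""
    b_rows, b_row0, b_col0 = b
    c_rows, _c_row0, _c_col0 = c      # C's own origins are ignored (assumption for rows)
    if len(b_rows) != len(c_rows):
        raise ValueError("B and C must have the same number of rows")
    rows = [b_row + c_row for b_row, c_row in zip(b_rows, c_rows)]
    return rows, b_row0, b_col0


B = ([[1], [2]], 0, -2)               # one column, with column index -2
C = ([[3, 4], [5, 6]], 1, 1)
A = hconcat(B, C)                     # C's columns get indices -1 and 0 in A
```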

FUNCTIONS THAT RETURN INDICES (min(), max(), find(), sort(), ind2sub(),
and any others that I don't know about):

It must be internally consistent (and certainly can be made to be). The
indices returned would be exactly like the 1-based indices returned
presently in MATLAB except that the origin for the corresponding
dimension (that defaults to 1) would be added to each C-like index. That
is, just like now in MATLAB:

[max_value, max_index] = max(A);

This must mean that A(max_index) is equal to max_value.

I think that this is easy enough to define. The only hard part is to
identify all MATLAB functions that search through an array and modify
them to start and end at indices that might be different than 1 and
size[dim] as are the search bounds today. It would instead search from
origin[dim] to size[dim]+origin[dim]-1 which would default to the
current operation if the origin equals 1.
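
For a vector, the consistency requirement can be sketched in Python as below. The function name max_with_index is made up; the returned index is the 0-based storage position plus the origin, so that indexing the array with it gives back the maximum.

```python
def max_with_index(values, origin=1):
    """Return (max_value, max_index), with max_index in user-visible terms."""
    if not values:
        raise ValueError("empty array")
    # 0-based position of the first maximal element
    position = max(range(len(values)), key=values.__getitem__)
    return values[position], position + origin


values = [3, 9, 4]
value_1, index_1 = max_with_index(values)              # default origin of 1
value_0, index_0 = max_with_index(values, origin=0)
```

With the default origin the result matches what a 1-based max() would report, and in every case values[index - origin] is the reported maximum.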

FOR ALL OTHER MATLAB OPERATIONS, until a reasonable extended definition
for arrays with origins not 1 is thought up, MATLAB could either bomb
out with an illegal operation error if the base is not 1 or could,
perhaps, ignore the origin. Either way, it's still backwards compatible.
