MATLAB indexing issue [Matlab]

Prev: Possible to store results as a matrix?
Next: Saving/Loading The Workspace From a Structure Within .mat

From: robert bristow-johnson on 7 Jul 2010 13:20

because i copied some of this stuff from a decade old post, and
evidently, Google Groups has changed the number of characters per line
and we cannot control word wrapping in our posts in Google Groups, i
have edited the line endings (and we'll let the lines wrap
naturally). this should be much more readable than my previous post.

Steven Lord and others at TMW, can you *please* read this carefully,
with enough of an open mind, and *then* respond with questions, legit
criticisms, suggestions and fixes. and please be straight with us
about the definition of "backward compatible". if *old* code works
with this, (without making use of any calls to "reorigin()" or
"rebase()" or whatever it would be named), it *is* "backward
compatible", no?

________________________________________________________________

(here are the quotes from old MATLAB manuals that *should* be the
uncompromized vision of the use of the product. just because TMW no
longer prints these exact statements in their current manuals does not
remove that vision for what MATLAB should be.)

"MATLAB integrates numerical analysis, matrix computation, signal
processing, and graphics in an easy-to-use environment where problems
and solutions are expressed just as they are written
mathematically..."

"MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-
use environment where problems and solutions are expressed in familiar
mathematical notation."

note the phrases in claims: "familiar mathematical notation" and "just
as they are written mathematically". i submit that those claims are
false in a sense that is salient particularly for those of use who use
MATLAB for (digital) signal processing. and i suspect the claims are
false for some other users that also deal with data that are naturally
sequenced with subscripts or indices that are non-positive.

________________________________________________________________

(this is the foundation of the solution to this indexing problem that
i am asking TMW people to read carefully and not reject due to red
herrings.)

enum MATLAB_class {text, real, complex};
// I don't wanna cloud the issue considering other classes.

typedef struct
{
void* data; // pointer to actual array data
char* name; // pointer to the variable's name
enum MATLAB_class type; // class of MATLAB variable (real, etc.
int num_dimensions; // number of array dimensions >= 2
long* size; // vector with the number of rows, columns, etc.
} MATLAB_variable;

char name[32]; // suppose MATLAB names are unique to 31 chars
long size[num_dimensions];

if (type == text)
{
char data[size[0]*size[1]*...*size[num_dimensions-1]];
}
else if (type == real)
{
double data[size[0]*size[1]*...*size[num_dimensions-1];
}
else if (type == complex)
{
double data[size[0]*size[1]*...*size[num_dimensions-1][2];
}

The above is sorta C-like pseudocode. I'm writing it as if the
declarations can allocated like malloc() does.

Currently, when an element, A(n,k), of a 2-dimensional MATLAB array A
is accessed, first n and k are confirmed to be integer value (not a
problem in C), then confirmed to be at least 1 and less than or equal
to size[0] and size[1], respectively. If those constraints are
satisfied, the value of that element is accessed as:

data[(n-1)*size[0] + (k-1)];

For a 3 dimensional array, A(m,n,k), it would be the same but now:

data[((m-1)*size[1] + (n-1))*size[0] + (k-1)];

What is proposed is to first add a new member to the MATLAB variable
structure called "origin" which is a vector of the very same length
(num_dimensions) as the "size" vector. The default value for all
elements of the origin[] vector would be 1 with only the exceptions
outlined below. This is what makes this backwards compatible, in the
strictest sense of the term.

typedef struct
{
void* data;
char* name;
enum MATLAB_class type;
int num_dimensions;
long* size;
long* origin; // points to a vector with origin for each dimension
} MATLAB_variable;

char name[32];
long size[num_dimensions];
long origin[num_dimensions];

Now before each index is used, it is checked against the bounds for
that dimension ( origin[dim] <= index < size[dim]+origin[dim] where 0
<= dim < num_dimensions). Since the default for origin[dim] is 1,
this will have no effect, save for the teeny amount of processing time
need to look up the origin, on existing MATLAB legacy code.

So to access a single element of an array, A(n,k), this C array index
would look like:

data[(n-origin[1])*size[0] + (k-origin[0])];

For a 3 dimensional array, A(m,n,k), it would look like:

data[((m-origin[2])*size[1] + (n-origin[1])*size[0] + (k-
origin[0])];

Okay, how someone like myself would use this to do something different
is that there would be at least two new MATLAB facilities similar to
size() and reshape() that I might call "origin()" and "reorigin()",
respectively. Just like the MATLAB size() function returns the
contents of the size[] vector, origin() would return, in MATLAB
format, the contents of the origin[] vector.

And just like reshape() changes (under proper conditions) the contents
of the size[] vector, reorigin() would change the contents of the
origin[] vector. Since reorigin() does not exist in legacy MATLAB
code (oh, I suppose someone could have created a function named that,
but that's a naming problem that need not be considered here), then
there is no way for existing MATLAB programs to change the origins
from their default values of 1 making this fix perfectly backward
compatible.

Now, just as there are dimension compatibility rules that exist now
for MATLAB operations, there would be a few natural rules that would
be added so that "reorigined" MATLAB arrays could have operations
applied to them in a sensible way.

ARRAY ADDITION ("+") and SUBTRACTION ("-") and element-by-element
ARRAY MULTIPLICATION (".*"), DIVISION ("./"), POWER (".^"), and
ELEMENTARY FUNCTIONS:

Currently MATLAB insists that the number of dimensions are equal and
the size of each dimension are equal (that is the same "shape") before
adding or subtracting matrices or arrays. The one exception to that
is adding a scaler to an array, in which a hypothetical array of equal
size and shape with all elements equal to the scaler, is added to the
array. The resulting array has the same size and shape as the
argument arrays.

The proposed system would, of course, continue this constraint and add
a new constraint in that index origins for each dimension (the
origin[] vector) would have to be equal for two arrays to be added or
subtracted or element-by-element multiplied, etc. The resulting array
would have the same size[] vector and origin[] vector as the input
arrays.

"MATRIX" MULTIPLICATION ("*"):

A = B*C;

Currently MATLAB appropriately insists that the number of columns of B
are equal to the number of rows of C (we shall call that number K).
The resulting array has the number of rows of B and the number of
columns of C. The value of a particular element of A would be:

K
A(m,n) = SUM{ B(m,k) * C(k,n) }
k=1

The proposed system would, of course, continue this constraint and add
a new constraint in that index origins must be equal for each
dimension where the lengths must be equal. That is the number of
columns of B are equal to the number of rows of C and the origin index
of the columns of B is equal to the origin index of the rows of C.
The resulting array, A, has the number of rows of B and the number of
columns of C and the origin index of the rows of B and the origin
index of the columns of C would also be the same as for A. The value
of a particular element of A would be:

org-1+K
A(m,n) = SUM{ B(m,k) * C(k,n) }
k=org

where org = origin[0] for the B array and origin[1] for the C array
which must be the same number. In the same manner as the present
MATLAB K would be size[0] for the B array and size[1] for the C array
and would have to be the same number, otherwise a diagnostic error
would be returned.

Both of these definitions are degenerations of the more general case
where:

+inf
A(m,n) = SUM{ B(m,k) * C(k,n) }
k=-inf

where here you consider B and C to be zero-extended to infinity in all
four directions (up, down, left, and right). It's just that the zero
element pairs do not have to be multiplied and summed.

Matrix powers and exponentials (on square matrices) can be defined to
be consistent with this extension of the matrix multiply.

MATRIX DIVISION ("/" and "\"):

Like the current matrix division, it would invert the operation of

A = B*C

that is

C = B\A

and

B = A/C

The same size (or shape) requirements and index origin requirements of
the "*" operator would apply to "\" and "/". Given A and B, whatever
shape and origin requirement for B and C and whatever shape and origin
returned in A in the statement:

A = B*C;

would be the same requirements for B and A and define the resulting
shape and origin for C in the statement:

C = B\A

CONCATINATION:

This would also be a simple and straight-forward extension of how
MATLAB presently concatinates arrays. When we say:

A = [B C];

The number of rows of B and C must be equal, but the number of columns
of B and C can be anything. The first columns of A are identical with
the columns of B and then also must the indices of those columns. And
independent of what the column indices of C are, they just pick up
where the column index of B left off. This rule extension defaults to
what MATLAB presently does if B and C are both arrays with origin 1.
A similar rule extension can be made for A = [B ; C]; In all cases
the upper left corner of A is identical to the upper left corner of B,
both in value but also in subscripts (so A(1,1) becomes B(1,1) just
like it does now in MATLAB).

FUNCTIONS THAT RETURN INDICES (min(), max(), find(), sort(),
ind2sub(), and any others that I don't know about):

It must be internally consistent (and certainly can be made to be).
The indices returned would be exactly like the 1-based indices
returned presently in MATLAB except that the origin for the
corresponding dimension (that defaults to 1) would be added to each C-
like (0-based) index. That is, just like now in MATLAB:

[max_value, max_index] = max(A);

This must mean that A(max_index) is equal to max_value.

I think that this is easy enough to define. The only hard part is to
identify all MATLAB functions that search through an array and modify
them to start and end at indices that might be different than 1 and
size[dim] as are the search bounds today. It would instead search
from origin[dim] to size[dim]+origin[dim]-1 which would default to the
current operation if the origin equals 1.

FOR ALL OTHER MATLAB OPERATIONS, until a reasonable extended
definition for arrays with origins not 1 is thought up, MATLAB could
either bomb out with an illegal operation error if the base is not 1
or could, perhaps, ignore the origin. Either way, it's still
backwards compatible.

From: James Tursa on 8 Jul 2010 14:45

robert bristow-johnson <rbj(a)audioimagination.com> wrote in message <f3e6e4ef-8e06-4d8b-a4e6-392ed00594a4(a)d37g2000yqm.googlegroups.com>...
>
(snip)
>
> Steven Lord and others at TMW, can you *please* read this carefully ...
>
(snip)

To TMW: If you guys ever *do* take the time to read through this and seriously consider doing something like this, I hope you put your ideas out ahead of time for some type of pier review to ferret out potential usage problems before implementing. e.g., Fortran had to do something similar when they went from strictly 1-based indexing to any-based indexing.

James Tursa

|
Pages: 1
Prev: Possible to store results as a matrix?
Next: Saving/Loading The Workspace From a Structure Within .mat