From: Seebs on
On 2010-05-10, Nathan <nathancbaker(a)gmail.com> wrote:
> Then let's just start with Juha's original code and see where some
> simple improvements can be made, shall we?

> Given, the original...

> ,---
> Value_t* MyClass::findValue(const Value_t& value)
> {
> for(size_t xInd = 0; xInd < data.size(); ++xInd)
> for(size_t yInd = 0; yInd < data[xInd].size(); ++yInd)
> for(size_t zInd = 0; zInd < data[xInd][yInd].size(); +
> +zInd)
> {
> if(data[xInd][yInd][zInd] == value)
> return &data[xInd][yInd][zInd];
> }
>
> return 0;
> }
> `---

> First, we can remove the nested loops like so...

> ,---
> Value_t* MyClass::findValue(const Value_t& value)
> {
> size_t xyzMax = data.size() + data[xInd].size() + data[xInd]
> [yInd].size();
> for(size_t xInd = 0; xInd < xyzMax; ++xInd)
> {
> if(data[xInd] == value)
> return &data[xInd];
> }
>
> return 0;
> }
> `---

Note the crossposts. Over here in C-land, we don't have the option of
overriding data[] in that way. Actually, I'm not even sure you do in
C++ land. Imagine that the array is a 10x10x10 array. What is
data[5]? What is data[15]? How does data[5] know that data[5].size()
should be the size of the sixth row, but that (data[5] == value) should
be testing against the sixth element of the first row? Also, where are
xInd and yInd being declared, that we can use data[xInd][yInd].size()?

In short, this is just plain wrong. You're using undefined values. You
haven't established that the rows are the same size -- the original code
works even if each row is a different length!

So far as I can tell, for a 10x10x10 array, you seem to be calculating
the value 30, and then indexing the first 30 rows of the array, only 10
of which exist, and comparing them to values. That seems... very confused.

> Next, we remove the nested exit point by changing it into a signal to
> exit the loop...

> ,---
> Value_t* MyClass::findValue(const Value_t& value)
> {
> size_t xyzMax = data.size() + data[xInd].size() + data[xInd]
> [yInd].size();
> result = 0;
> for(size_t xInd = 0; xInd < xyzMax; ++xInd)
> {
> if(data[xInd] == value)
> result = &data[xInd];
> xInd = xyzMax;
> }
>
> return result;
> }
> `---

I would never, ever, approve this code. Even apart from all the issues
above, modifying loop variables like this is a worse control structure
than anything this side of longjmp().

Modifying loop variables by, say, incrementing them when you grab the next
data item can make sense. Setting the loop variable to a magic value is
crazy.

Also, you omitted {}, so this will always exit on the first call, nevermind
the various other problems.

> Juha asked for an alternative that didn't make use of multiple exit
> points. This is one way to achieve that -- fold the inner loops
> (those indices are meaningless) and signal the end to the 'for' loop.

Except you've not established that those indices are "meaningless".

> Simple, isn't it?

Simple, but completely wrong. Even if it were adjusted to compile, it'd
still be wrong; you haven't addressed the fact that the original supported
a grid in which different elements can be of different sizes.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
From: Seebs on
On 2010-05-10, Richard Heathfield <rjh(a)see.sig.invalid> wrote:
> Seebs wrote:
>> On 2010-05-10, Richard Heathfield <rjh(a)see.sig.invalid> wrote:
>>> Yes. But, for me, SESE is idiomatic.

>> That's not an idiom, that's a policy.

> Yes, but it's an idiomatic policy. A high-level idiom, if you like.

Yes. But high-level idioms don't provide the same chunking benefit that
literal idioms do. The reason to use idioms is that they convert a large
number of data into a single datum, allowing the brain to use much less
short-term memory on the idiomatic part while studying the rest.

>> An idiom is a very specific way of writing a loop,

> That definition of "idiom" is far too restrictive. For example, it
> excludes if((fp = fopen(filename, filemode)) == NULL), which is
> certainly not a very specific way of writing a loop, and yet it is most
> definitely idiomatic.

I did not say every specific way of writing a loop is an idiom.

An A is a B does not imply that all B are A.

> I understand the point behind your reply, of course. It's just that I
> don't agree with it. (As with many style issues, there is no one right
> answer that works for everybody.)

But there is, in the case of idioms, a right answer that works enough
better for a sufficiently large majority of the population that it should
probably be adopted. The cognitive load of a condition which isn't a
trivial building block is high enough to massively outweigh, in most cases,
the cost of considering the possibility that something inside the loop
terminates the loop.

In short, I think this would be a much better argument for a language other
than C. C evolved for multiple exits, and this remains the way of using it
that the largest number of people will read without errors or mistakes
introduced by the extra complexity.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
From: Seebs on
On 2010-05-10, Ben Bacarisse <ben.usenet(a)bsb.me.uk> wrote:
> [I've trimmed the groups. Sorry if this is the wrong thing to do]

No worries, I don't have two of those groups either.

> You can't assume that the data is contiguous or, to be a little more
> general, that a single [] operation means the same as three of them.
> This is, after all, C++ not C.

In C++ it's at least conceivable that someone could come up with some
exceptionally sneaky overloading of [] to make it viable. In C, it's
just plain wrong. In general, you cannot assume that
data[x][y][z]
and
((type *) data)[(x*YSIZE*ZSIZE) + (y + ZSIZE) + z]

are the same.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
From: Nathan Baker on

"Thomas J. Gritzan" <phygon_antispam(a)gmx.de> wrote in message
news:hs981d$ot3$1(a)newsreader5.netcologne.de...
>
> This code uses only 1 index operation to data. data[xInd] still has 2
> dimensions and can't be compared to the scalar 'value'; the types don't
> match.

A computer's RAM has only one dimension. Let us consider the following 2-D
array:

x | 0 | 1 | 2
0 | a | b | c
1 | d | e | f
2 | g | h | i

In RAM, it looks like this:

(0,0) a (0,1) b (0,2) c (1,0) d (1,1) e (1,2) f (2,0) g (2,1) h (2,2) i

....which we can also index as...

(0) a (1) b (2) c (3) d (4) e (5) f (6) g (7) h (8) i

....so, you can easily see that 'data[2][1]' and 'data[7]' reference the same
value.

Using one index, rather than three, decreases the complexity of our algo.

Nathan.


From: Seebs on
On 2010-05-10, Nathan Baker <nathancbaker(a)gmail.com> wrote:
> "Thomas J. Gritzan" <phygon_antispam(a)gmx.de> wrote in message
> news:hs981d$ot3$1(a)newsreader5.netcologne.de...
>> This code uses only 1 index operation to data. data[xInd] still has 2
>> dimensions and can't be compared to the scalar 'value'; the types don't
>> match.

> A computer's RAM has only one dimension.

You are making a flawed assumption.

> Let us consider the following 2-D
> array:

> x | 0 | 1 | 2
> 0 | a | b | c
> 1 | d | e | f
> 2 | g | h | i
>
> In RAM, it looks like this:
>
> (0,0) a (0,1) b (0,2) c (1,0) d (1,1) e (1,2) f (2,0) g (2,1) h (2,2) i

Maybe.

It could be that it's not really a 2D array, but an array of pointers to
non-contiguous blocks.

> ...which we can also index as...
>
> (0) a (1) b (2) c (3) d (4) e (5) f (6) g (7) h (8) i

> ...so, you can easily see that 'data[2][1]' and 'data[7]' reference the same
> value.

Wrong.

Let's look more closely at data[1][0] and data[3]. data[3] refers to the
block of memory just after 'i'. data[1][0] refers to 'b'.

See, indexing doesn't magically change from one type to another without any
intervention; it follows the type of the thing you're indexing. If
data[2].size() tells you the size of the 'cfi' row, then data[2] is not 'b'.

> Using one index, rather than three, decreases the complexity of our algo.

It would if it worked, but it is not always the case that it would work, and
this is one of the cases where it's very unlikely to.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!