From: David Brown on
On 07/03/2010 01:56, Jon Kirwan wrote:
> I needed to write some very simple c routines (tempted to use
> assembly, but not for this application at this time) to
> handle the following extended hamming code cases:
>
> [ 8, 4, 4]
> [16, 11, 4]
> [32, 26, 4]
>
> (These are the total number of bits, number of data bits, and
> the guaranteed Hamming distance between valid symbols.)
>
> As indicated by the last number (the hamming distance between
> valid symbols) these routines are able to correct single bit
> errors and detect double-bit errors (SECDED.)
>
> There are many different ways to perform such Hamming codes
> and I've chosen a particular one among many. There are
> others that also work. Although I've chosen a fairly well
> documented form, it's certainly not the only possible form
> and so it is that comparisons of the encoded results with
> other algorithms may, or may not, match up bit for bit. But
> the function remains and it does conform to some I've seen.
>
> The code I've written uses no tables at all and is very
> short. As such, it may be appropriate for small micro
> embedded use. For example, the 16-bit Hamming version, on
> the MSP430 using IAR and the standard "release" code
> generation, yields 144 words of code. With the "debug" code
> generation, it's 175 words, instead. For 32-bit Hamming
> codes, "release" again, that nearly doubles to 248 words. But
> that gives an idea. Cycle counts vary on inputs.
>
> I've isolated these into three short modules -- one for each
> of the three data types required. There's an encode and
> decode routine in each. If anyone is interested in the
> results, here's a link to the code.
>
> http://www.infinitefactors.org/misc/hamm.zip
>
> A very short demo program that tests the routines and
> illustrates their use is also included. There is a small
> file included where the specific data sizes can be specified
> for a particular compiler.
>
> They've been tested using a couple of compilers, Microsoft's
> VC++ 1.52C DOS compiler and IAR's MSP430 (Workbench 5.0)
> compiler, but not so thoroughly tested that I'm confident
> they will compile well with all of them. The design reuses
> code, where it seemed possible, and does NOT require special
> data typedefs.
>
> Much better could be had with assembly, but these satisfied
> the need for now and, of course, assembly would require many
> different instances to be coded. These routines can act as a
> template, though, for deductions to specific assembly coding.
>
> I will gladly modify them based upon useful input, as well.
>
> Jon

Hi Jon,

May I give you some constructive criticism on the code? I haven't
actually gone through the details of the implementation itself as yet,
to see if I can see any way to improve on the size or speed of the code.
I'll do that if I can find the time, mainly for the fun of it. But I
have a few more general comments on the code.

First off, before I criticise anything, let me thank you for sharing the
code with others - this is a kind of coding that many people find hard
to get right. And of course, my comments here are based on my own
personal opinions, preferences, and rants - when I write "you should
never do this", I really mean "/I/ would never do this" :-) I also
don't expect you to agree with everything I write, though I hope you'll
agree with some of it.


Why did you pick the LGPL? It basically means that people here can use
your code in test code, but can't (legally) use it in any "production" code.

You should drop the "auto" keyword. It has been unnecessary in C for
decades, conflicts with newer C++ standards, and if there are in fact
any compilers that pay attention to it, it will generate worse code
(since "auto" is then a hint to the compiler to put the variable on the
stack, hindering the optimiser).

You have a "hamm_def.h" include file to define your own non-standard
sized integer types that are not actually portable or accurate (at the
very least, use "short int" for 16-bit - it's not universal, but it is
more portable than plain "int"). Since you are clearly aiming for
portable code, you would be much better using "#include <stdint.h>" and
using modern sized integer types. It is perfectly reasonable to expect
people to have <stdint.h> these days - anyone who needs to use your code
with ancient tools will simply have to put in their own typedefs.
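
If you want to keep a compatibility shim for the older tools, a
common pattern is something like the following - the fallback
typedefs here are only illustrative guesses on my part, not taken
from your hamm_def.h:

  /* Use the standard C99 types where available; guess for C90 tools. */
  #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
  #include <stdint.h>
  #else
  typedef unsigned char  uint8_t;   /* 8 bits on virtually all targets     */
  typedef unsigned short uint16_t;  /* 16 bits on most embedded compilers  */
  typedef unsigned long  uint32_t;  /* at least 32 bits; exactly 32 on
                                       most embedded targets               */
  #endif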

Having first defined these size-specific types, you then ignore them in
the code and use that most hideous of illogical expressions, the
"unsigned char". Until someone can draw me two letter "a"s and explain
to me why one of these is "signed" and the other "unsigned", I will only
use "char" for holding a character. When I want an 8-bit number, I'll
use the logically-named standard types for it - uint8_t or int8_t. The
fact that these boil down to "signed char" and "unsigned char" is one of
the uglinesses of C that we can't entirely escape - but we should do our
best.

Another failing of C is its lack of a proper Boolean type. But for a
very long time, <stdbool.h> has given us something close - and people
writing clear code have used their own typedefs, #define's, and enums
for bools before that. If you are writing a function that returns a
success or failure, it should return a "bool" with values "true" and
"false". That way you are saying exactly what you mean - giving both
the user and the compiler the best use of the code (a "bool" is faster
than an "int" on an AVR, for example).

In general, functions that take pointers and return a status indication
are inefficient, and can often be a source of error (people pass them
pointers to inappropriate types). On small microcontrollers,
particularly 8-bit devices, using pointers like this means a lot more
code and time compared to passing data directly in registers. Even on
the msp430, which has very good pointer support, it is slower. There
are many alternative ways to handle this sort of interface. For
example, you could have a global "hammingDecodeError" flag that is set
on failure (yes, I really mean using a global - in almost all realistic
uses of these functions, it would be perfectly safe and faster than
anything else). You could pass the value directly, with a pointer to an
error flag as a separate parameter. Or you could also define a specific
return value as an error indicator. You've picked the "traditional" C
approach, though it is perhaps the most inefficient.
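
To make that concrete, here are the three alternatives side by
side - these signatures are invented purely for illustration, they
are not your actual function names:

  /* 1. Status through a global flag - fast, and safe enough in
        typical single-threaded embedded code.                        */
  extern int hammingDecodeError;   /* set non-zero on a double-bit error */
  unsigned int hamming_decode16_a(unsigned int codeword);

  /* 2. Value returned directly, error reported via a separate pointer. */
  unsigned int hamming_decode16_b(unsigned int codeword, int *error);

  /* 3. One reserved return value doubles as the error indicator.      */
  #define HAMM_DECODE_FAILED 0xFFFFu
  unsigned int hamming_decode16_c(unsigned int codeword);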

You use a lot of identifiers starting with double underscores - these
are reserved by the C standard for the compiler and standard library
implementation. They are certainly not to be used on the main API call
interface for your functions.

#including a ".c" file is, IMHO, horrible. One .c file compiles to one
.o file. Just because C and the preprocessor do not provide a proper
module concept does not mean you should use any old file structure. At
the very least, the included hamm_xx file should be a .h (or a .inc, to
distinguish it from
proper header files). I would rate copy-and-paste above including a
".c" file.

Best regards,

David

From: steve_schefter on
Hi David.

> Why did you pick the LGPL?  It basically means that people here can use
> your code in test code, but can't (legally) use it in any "production" code.

This isn't my impression of LGPL. Can I ask what leads you to that
conclusion?

Steve
From: Jon Kirwan on
On Sun, 07 Mar 2010 13:16:44 +0100, David Brown
<david(a)westcontrol.removethisbit.com> wrote:

>On 07/03/2010 01:56, Jon Kirwan wrote:
>> I needed to write some very simple c routines (tempted to use
>> assembly, but not for this application at this time) to
>> handle the following extended hamming code cases:
>>
>> [ 8, 4, 4]
>> [16, 11, 4]
>> [32, 26, 4]
>>
>> (These are the total number of bits, number of data bits, and
>> the guaranteed Hamming distance between valid symbols.)
>>
>> As indicated by the last number (the hamming distance between
>> valid symbols) these routines are able to correct single bit
>> errors and detect double-bit errors (SECDED.)
>>
>> There are many different ways to perform such Hamming codes
>> and I've chosen a particular one among many. There are
>> others that also work. Although I've chosen a fairly well
>> documented form, it's certainly not the only possible form
>> and so it is that comparisons of the encoded results with
>> other algorithms may, or may not, match up bit for bit. But
>> the function remains and it does conform to some I've seen.
>>
>> The code I've written uses no tables at all and is very
>> short. As such, it may be appropriate for small micro
>> embedded use. For example, the 16-bit Hamming version, on
>> the MSP430 using IAR and the standard "release" code
>> generation, yields 144 words of code. With the "debug" code
>> generation, it's 175 words, instead. For 32-bit Hamming
>> codes, "release" again, that nearly doubles to 248 words. But
>> that gives an idea. Cycle counts vary on inputs.
>>
>> I've isolated these into three short modules -- one for each
>> of the three data types required. There's an encode and
>> decode routine in each. If anyone is interested in the
>> results, here's a link to the code.
>>
>> http://www.infinitefactors.org/misc/hamm.zip
>>
>> A very short demo program that tests the routines and
>> illustrates their use is also included. There is a small
>> file included where the specific data sizes can be specified
>> for a particular compiler.
>>
>> They've been tested using a couple of compilers, Microsoft's
>> VC++ 1.52C DOS compiler and IAR's MSP430 (Workbench 5.0)
>> compiler, but not so thoroughly tested that I'm confident
>> they will compile well with all of them. The design reuses
>> code, where it seemed possible, and does NOT require special
>> data typedefs.
>>
>> Much better could be had with assembly, but these satisfied
>> the need for now and, of course, assembly would require many
>> different instances to be coded. These routines can act as a
>> template, though, for deductions to specific assembly coding.
>>
>> I will gladly modify them based upon useful input, as well.
>>
>> Jon
>
>Hi Jon,
>
>May I give you some constructive criticism on the code?

Please do. I always enjoy learning how others view things
and how I might do better, as well. Your thoughts help me a
great deal and I appreciate it a lot.

>I haven't
>actually gone through the details of the implementation itself as yet,
>to see if I can see any way to improve on the size or speed of the code.
> I'll do that if I can find the time, mainly for the fun of it. But I
>have a few more general comments on the code.

Thanks.

>First off, before I criticise anything, let me thank you for sharing the
>code with others - this is a kind of coding that many people find hard
>to get right. And of course, my comments here are based on my own
>personal opinions, preferences, and rants - when I write "you should
>never do this", I really mean "/I/ would never do this" :-) I also
>don't expect you to agree with everything I write, though I hope you'll
>agree with some of it.

Okay.

>Why did you pick the LGPL? It basically means that people here can use
>your code in test code, but can't (legally) use it in any "production" code.

Partly, it was exactly because of your reaction. We've
recently had a modest discussion about licensing and I have
frankly found that my reading of specific licenses muddled my
understanding more than clarified it. So much to take in,
and it still hasn't gelled in my mind, yet.

Partly, I think, because I'm not yet even sure what I want.

I'm very much open to ANY thoughts here.

>You should drop the "auto" keyword. It has been unnecessary in C for
>decades, conflicts with newer C++ standards, and if there are in fact
>any compilers that pay attention to it, it will generate worse code
>(since "auto" is then a hint to the compiler to put the variable on the
>stack, hindering the optimiser).

Using 'auto' is an idiom of mine, I suppose, much like writing
LET in BASIC. It helps _me_ by placing a written word that
explicitly calls my attention to the variable's automatic,
function-local nature, and it harms nothing at all.

I use a lot of languages, not just c, and I have used so many
more. Everything from RPG II and COBOL and Algol and various
assembly languages, starting around 1970 or so through BASIC
a few years later and c in '78 (I remember, because it was
with Unix v6 kernel code) and c++ in about '87, or so.

Because of the breadth of languages I still continue to use,
it also saves me from just that very tiny extra moment of
thought about c, in particular. Without the 'auto' keyword,
the same line, outside a function, works but has a storage
type, life span, and visibility that is markedly different, and
I don't like having to pause even a moment. The 'auto'
keyword takes a fraction of a moment to type and it pays,
time and time again later on for me.

I don't know if this is a habit I want to shed. However, I'd
like to know more about why you wrote "conflicts with newer
c++ standards." Can you provide a standards document
reference I can go to on this subject? Or is this simply a
"new standard" that is developing within the gcc community
that I need to learn about?

Also, a final point. It used to have no cost and no effect on
the compiler's behavior -- certainly that remains true with c,
which is the language I chose here. Had I been using c++ for
this code, which is a different language, as I am continually
reminded, I would have written it as a template class allowing
instancing as needed. Do you argue that when I write c, not
c++, I need to be aware of your comments regarding this "new
standard"?

>You have a "hamm_def.h" include file to define your own non-standard
>sized integer types that are not actually portable or accurate (at the
>very least, use "short int" for 16-bit - it's not universal, but it is
>more portable than plain "int"). Since you are clearly aiming for
>portable code, you would be much better using "#include <stdint.h>" and
>using modern sized integer types. It is perfectly reasonable to expect
>people to have <stdint.h> these days - anyone who needs to use your code
>with ancient tools will simply have to put in their own typedefs.

Thanks for that. I just checked a couple of c compiler
implementations I have here and am still supporting,
actively, for customers. There is no stdint.h file in their
include directories. Doesn't exist. For example, it's not
present in Lattice c or in Microsoft's VC 1.52c (which is the
actual target for this code.) And I still support systems
written in both of these compiler tools, that are used in
modern FABs today for IC manufacturing. If you can believe
it.

<stdint.h> is a C99 addition, not C90. I honestly wish that I could
rely upon C99 (despite some sour comments I've read from at
least one compiler author, who I trust to know what he speaks
of, that C99 is a terrible standard... an opinion I'm not
well enough informed about to argue either way), but that
simply isn't the reality of the embedded compiler tools I'm
exposed to, even the modern ones (and I'm not suggesting that VC++
1.52C is modern, so don't mistake me here.) Only a very few
of the compiler tools I have access to attempt much of C99,
though I've no doubt that stdint.h is an easy one for them.

This leaves me arguing that I would prefer to rely upon a
base I _can_ fairly require, which is C90.

I'm still open to an argument here or other ideas.

>Having first defined these size-specific types, you then ignore them in
>the code and use that most hideous of illogical expressions, the
>"unsigned char".

No, the specified size is on purpose, Dave (setting aside for
now what you address in a moment regarding my use of
unsigned).

In this case, the check values will NEVER exceed that size.
Because of that fact, I can insist on it. There is no need
for the user of this code to worry about that detail, because
I have. An 8-bit check value, overall parity bit included,
covers code words up to 128 bits. For those interested in
exceeding that with my
code, I suppose they can modify the code to deal with it. (I
might have created a separate typedef, I suppose.)
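
(The arithmetic behind that claim: r Hamming check bits address a
code word of up to 2^r - 1 bits, leaving 2^r - r - 1 data bits;
the extra overall parity bit then extends it to SECDED. So r = 3,
4 and 5, plus parity, give the [8,4], [16,11] and [32,26] layouts
above, and r = 7 plus parity -- 8 check bits in all -- stretches
to a 128-bit code word carrying 120 data bits.)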

Do you have a good way I might be able to use sizeof() in a
#if? I seem to recall that doesn't work.
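
The closest thing I know of is the old negative-array-size trick
-- not a #if, but it does break the build when an assumption
fails. A rough sketch (the typedef name and the particular
condition here are arbitrary examples):

  #include <limits.h>

  /* The array size goes negative, and compilation fails, if the
     condition is false - here, that unsigned short is 16 bits.  */
  typedef char hamm_check_16[(sizeof(unsigned short) * CHAR_BIT == 16) ? 1 : -1];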

>Until someone can draw me two letter "a"s and explain
>to me why one of these is "signed" and the other "unsigned", I will only
>use "char" for holding a character.

Well, then we have a serious problem, Dave. There is quite a
difference between c's handling of signed and unsigned in
terms of conversions and in this particular case, it becomes
important. A shift operation is only one such reason.

If you are interested in exploring what happens, go ahead and
change the unsigned chars to just char and modify the 1U to a
1, instead, in hamm_xx.c and recompile and check the results.

c allows quite a range of implementation-defined behavior when
you simply use plain char. One has no idea whether it is signed
or unsigned on a particular implementation. Operations such as
shift are well defined for unsigned types but not for signed
ones. The conversion of signed to unsigned is well defined over
the entire range, but unsigned to signed is implementation-
defined when the value doesn't fit -- there is a real hole in
the specification there.

Had I decided to use char here, I would have been worrying
about conversions here and there, across many
different compilers and targets -- conversions I don't have
to worry about as written. When I wrote this code, I avoided
the pitfalls entirely, this way. Had I used char, I would
have had to carefully consider each and every operation in
the context of all the compilers I've had experience with.
That's too much to add when writing something quick.
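
A tiny illustration of the sort of trap I mean -- what this
prints depends on whether plain char happens to be signed on a
given compiler:

  #include <stdio.h>

  int main(void)
  {
      char          c  = (char)0x90;  /* value is implementation-defined
                                         if plain char is signed         */
      unsigned char uc = 0x90;

      /* If plain char is unsigned, both shifts print 9.  If it is
         signed, c is negative, and right-shifting a negative value
         is implementation-defined (commonly -7 on two's-complement
         machines with an arithmetic shift).                         */
      printf("%d %d\n", (int)(c >> 4), (int)(uc >> 4));
      return 0;
  }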

>When I want an 8-bit number, I'll
>use the logically-named standard types for it - uint8_t or int8_t.

Ah. So I could have used those in hamm_def.h, then, and
you'd have felt better about that choice, at least.

>The
>fact that these boil down to "signed char" and "unsigned char" is one of
>the uglinesses of C that we can't entirely escape - but we should do our
>best.

Well, I will await your discussion of the above before
continuing on.

>Another failing of C is its lack of a proper Boolean type.

But I'm writing in c. Oh, well.

>But for a
>very long time, <stdbool.h> has given us something close - and people
>writing clear code have used their own typedefs, #define's, and enums
>for bools before that.

Sorry, no stdbool.h, either. Not in the target or in some of
the other compilers I've used in the past. It is, yet again,
a C99 introduction. Not C90. I cannot rely upon it.

>If you are writing a function that returns a
>success or failure, it should return a "bool" with values "true" and
>"false". That way you are saying exactly what you mean - giving both
>the user and the compiler the best use of the code (a "bool" is faster
>than an "int" on an AVR, for example).

I completely agree, in general. However, it has been LONG
practice in c to use 'int' for this purpose and to return 0
or 1. Doing that continues a well-worn usage and shouldn't
cause anyone any difficulties at all. Relying on a C99 feature
would cause difficulties in some cases that I can readily avoid
with this usage.

>In general, functions that take pointers and return a status indication
>are inefficient, and can often be a source of error (people pass them
>pointers to inappropriate types).
><snip>

But we aren't speaking "in general" here. In this case, it
is an explicit nature of SECDED that you can _detect_ an
error. That's the whole point of the idea. A Hamming code
with a Hamming distance of only 3 cannot detect double-bit
errors while also correcting single-bit ones. So there, you
might have a point. But this
is for a Hamming distance of 4, with the outright desire of
doing so being that errors _can_ be detected and reported.

So we aren't talking a "general case."

You've put in some more time, as well. And I appreciate it.
But let's cover these before completing the rest. I'd enjoy
the discussion very much. But this is already enough to
grapple with and it may be better to find agreement in styles
here before moving much further.

Thanks,
Jon
From: David Brown on
steve_schefter(a)hotmail.com wrote:
> Hi David.
>
>> Why did you pick the LGPL? It basically means that people here can use
>> your code in test code, but can't (legally) use it in any "production" code.
>
> This isn't my impression of LGPL. Can I ask what leads you to that
> conclusion?
>
> Steve

To use LGPL'ed code with your own code, anyone receiving your code must
have access to the LGPL'ed code, be able to change and recompile that
code, and be able to relink it against the rest of your binary and run
the modified software instead. For non-embedded systems (or "big" embedded systems),
that typically means the LGPL'ed code is part of a dynamic library. To
use LGPL code in a (typically statically linked) embedded system means
that the end user will need access to your source code, or at least
compiled and linkable object files. Since that is impractical or
impossible for most embedded systems, the rough conclusion is that pure
LGPL code is very seldom suitable for embedded systems.

Of course, in some cases that is exactly what the code supplier is
intending - the end user is able to use the code for evaluation and
testing, and in non-commercial systems (and also for internal use), but
for a commercial system they would get a separate license from the code
supplier. That's a valid way of licensing code to developers. But I
don't think that was Jon's intention here.

This has been covered endlessly in previous threads in this newsgroup -
please let's not have another one unless someone has new points that
were not covered recently.
From: Frank Buss on
Jon Kirwan wrote:

> Partly, I think, because I'm not yet even sure what I want.

That's the first thing you have to solve :-)

Maybe a short explanation can help: LGPL allows other developers to build a
library from your source code and this library can be used in closed source
programs, but the library source and all changes to the library have to be
made public. If it is GPL, anything that uses it has to be public and GPL,
too.

For most public projects, I prefer the BSD license: Do whatever you want
with it, but credit the original author in your documentation.

--
Frank Buss, fb(a)frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de