From: David Brown on 7 Mar 2010 16:08 Jon Kirwan wrote: > On Sun, 07 Mar 2010 13:16:44 +0100, David Brown > <david(a)westcontrol.removethisbit.com> wrote: > >> On 07/03/2010 01:56, Jon Kirwan wrote: >>> I needed to write some very simple c routines (tempted to use >>> assembly, but not for this application at this time) to >>> handle the following extended hamming code cases: >>> >>> [ 8, 4, 4] >>> [16, 11, 4] >>> [32, 26, 4] >>> >>> (These are the total number of bits, number of data bits, and >>> the guaranteed Hamming distance between valid symbols.) >>> >>> As indicated by the last number (the hamming distance between >>> valid symbols) these routines are able to correct single bit >>> errors and detect double-bit errors (SECDED.) >>> >>> There are many different ways to perform such Hamming codes >>> and I've chosen a particular one among many. There are >>> others that also work. Although I've chosen a fairly well >>> documented form, it's certainly not the only possible form >>> and so it is that comparisons of the encoded results with >>> other algorithms may, or may not, match up bit for bit. But >>> the function remains and it does conform to some I've seen. >>> >>> The code I've written uses no tables at all and is very >>> short. As such, it may be appropriate for small micro >>> embedded use. For example, the 16-bit Hamming version, on >>> the MSP430 using IAR and the standard "release" code >>> generation, yields 144 words of code. With the "debug" code >>> generation, it's 175 words, instead. For 32-bit Hamming >>> codes, "release" again, that nearly doubles to 248 words. But >>> that gives an idea. Cycle counts vary on inputs. >>> >>> I've isolated these into three short modules -- one for each >>> of the three data types required. There's an encode and >>> decode routine in each. If anyone is interested in the >>> results, here's a link to the code. 
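[For readers who want the flavour of the algorithm without downloading the zip, here is a rough sketch of the [8,4,4] case. This is a minimal illustration, not Jon's implementation, and it uses one common bit layout (parity bits at power-of-two positions, overall parity in bit 0); as Jon notes, other variants need not match this bit for bit.]

```c
/* Extended Hamming [8,4,4] SECDED sketch.  Bit layout (LSB first):
 * bit 0 = overall parity, bits 1,2,4 = Hamming parity p1,p2,p4,
 * bits 3,5,6,7 = data bits d0..d3.  One common layout among many. */

unsigned char hamm8_encode(unsigned char data)
{
    unsigned char d0 = (unsigned char)((data >> 0) & 1u);
    unsigned char d1 = (unsigned char)((data >> 1) & 1u);
    unsigned char d2 = (unsigned char)((data >> 2) & 1u);
    unsigned char d3 = (unsigned char)((data >> 3) & 1u);
    unsigned char p1 = (unsigned char)(d0 ^ d1 ^ d3);  /* covers positions 1,3,5,7 */
    unsigned char p2 = (unsigned char)(d0 ^ d2 ^ d3);  /* covers positions 2,3,6,7 */
    unsigned char p4 = (unsigned char)(d1 ^ d2 ^ d3);  /* covers positions 4,5,6,7 */
    unsigned char code = (unsigned char)((p1 << 1) | (p2 << 2) | (d0 << 3)
                       | (p4 << 4) | (d1 << 5) | (d2 << 6) | (d3 << 7));
    unsigned char par = 0, i;

    for (i = 1; i < 8; i++)          /* overall parity over bits 1..7 */
        par ^= (unsigned char)((code >> i) & 1u);
    return (unsigned char)(code | par);
}

/* Returns 0 if decoding succeeded (correcting a single-bit error if
 * present), -1 if a double-bit error was detected.  *data gets the
 * 4 data bits. */
int hamm8_decode(unsigned char code, unsigned char *data)
{
    unsigned char s = 0, par = 0, i;

    for (i = 1; i < 8; i++)
        if ((code >> i) & 1u)
            s ^= i;                  /* syndrome = XOR of set-bit positions */
    for (i = 0; i < 8; i++)
        par ^= (unsigned char)((code >> i) & 1u);

    if (s != 0 && par == 0)
        return -1;                   /* nonzero syndrome, even parity: double error */
    if (s != 0)
        code ^= (unsigned char)(1u << s);   /* single error: syndrome names the bit */

    *data = (unsigned char)(((code >> 3) & 1u) | (((code >> 5) & 1u) << 1)
          | (((code >> 6) & 1u) << 2) | (((code >> 7) & 1u) << 3));
    return 0;
}
```

Any single bit flip is corrected, any two flips are reported as uncorrectable, which is the SECDED behaviour Jon describes.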
>>> >>> http://www.infinitefactors.org/misc/hamm.zip >>> >>> A very short demo program that tests the routines and >>> illustrates their use is also included. There is a small >>> file included where the specific data sizes can be specified >>> for a particular compiler. >>> >>> They've been tested using a couple of compilers, Microsoft's >>> VC++ 1.52C DOS compiler and IAR's MSP430 (Workbench 5.0) >>> compiler, but not so thoroughly tested that I'm confident >>> they will compile well with all of them. The design reuses >>> code, where it seemed possible, and does NOT require special >>> data typedefs. >>> >>> Much better could be had with assembly, but these satisfied >>> the need for now and, of course, assembly would require many >>> different instances to be coded. These routines can act as a >>> template, though, for deductions to specific assembly coding. >>> >>> I will gladly modify them based upon useful input, as well. >>> >>> Jon >> Hi Jon, >> >> May I give you some constructive criticism on the code? > > Please do. I always enjoy learning how others view things > and how I might do better, as well. Your thoughts help me a > great deal and I appreciate it a lot. > >> I haven't >> actually gone through the details of the implementation itself as yet, >> to see if I can see any way to improve on the size or speed of the code. >> I'll do that if I can find the time, mainly for the fun of it. But I >> have a few more general comments on the code. > > Thanks. > >> First off, before I criticise anything, let me thank you for sharing the >> code with others - this is a kind of coding that many people find hard >> to get right. And of course, my comments here are based on my own >> personal opinions, preferences, and rants - when I write "you should >> never do this", I really mean "/I/ would never do this" :-) I also >> don't expect you to agree with everything I write, though I hope you'll >> agree with some of it. > > Okay. > >> Why did you pick the LGPL? 
It basically means that people here can use >> your code in test code, but can't (legally) use it in any "production" code. > > Partly, it was exactly because of your reaction. We've > recently had a modest discussion about licensing and I have > frankly found my reading of specific licenses to have muddled > my understanding more than clarified it. So much to take in > and it still hasn't gelled in my mind, yet. > > Partly, I think, because I'm not yet even sure what I want. > Fair enough. Releasing the code now under the LGPL does not limit you from releasing it under other licenses later. > I'm very much open to ANY thoughts here. > I'd go for a FreeRTOS-style modified GPL saying that the original "library", and any changes made to it, must be released under the same license, while other code linked to the "library" that only uses the library's public interface is not bound by the license. eCos has a similar license: <http://ecos.sourceware.org/license-overview.html> Alternatively, given that this is a fairly small amount of code and there is no realistic chance that you will ever sue someone for stealing it or abusing whatever license you pick, you could just put a notice saying "Please retain the above copyright notice, provide a copy of this file to anyone who asks, and publish any modifications and enhancements to this code in a public forum such as comp.arch.embedded". Honest users will respect your wishes, dishonest ones will ignore whatever license you use. >> You should drop the "auto" keyword. It has been unnecessary in C for >> decades, conflicts with newer C++ standards, and if there are in fact >> any compilers that pay attention to it, it will generate worse code >> (since "auto" is then a hint to the compiler to put the variable on the >> stack, hindering the optimiser). > > I also use LET when writing BASIC code. It's an idiom of > mine, I suppose. 
But it helps _me_ by placing a written word > that explicitly calls my attention to its unique nature in > functions and it harms nothing at all. > Generally, I'm all for "explicit" rather than "implicit". I consider the "implicit int" in C to be one of the language's many failings. But the keyword "auto" is so archaic that its only purpose is to confuse programmers with less than twenty years of experience. And it conflicts directly with modern C++ - that's an indication of how rarely it is used in any C code you'd find these days. Unnecessary and gratuitous conflicts with C++ are, I would say, harmful. (I would say the opposite too - for example, in the little C++ I write, a function taking no parameters is void foo(void), which is valid in both languages.) > I use a lot of languages, not just c, and I have used so many > more. Everything from RPG II and COBOL and Algol and various > assembly languages, starting around 1970 or so through BASIC > a few years later and c in '78 (I remember, because it was > with Unix v6 kernel code) and c++ in about '87, or so. > Yet you have clearly written this code for the benefit of others, not yourself - thus you have to balance your idiosyncrasies with standard practice. I too program in many languages, and have particular ways of writing my code - I would change my style for code published for general use. > Because of the breadth of languages I still continue to use, > it also saves me from just that very tiny extra moment of > thought about c, in particular. Without the 'auto' keyword, > the same line, outside a function, works but has a storage > type, life span, and visibility that is markedly different and > I don't like having to pause even a moment. The 'auto' > keyword takes a fraction of a moment to type and it pays, > time and time again later on for me. > > I've been known to use LET in BASIC code, too. > > I don't know if this is a habit I want to shed. 
However, I'd > like to know more about why you wrote "conflicts with newer > c++ standards." Can you provide a standards document > reference I can go to on this subject? Or is this simply a > "new standard" that is developing within the gcc community > that I need to learn about? > <http://en.wikipedia.org/wiki/C%2B%2B0x#Type_inference> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1984.pdf> <http://gcc.gnu.org/projects/cxx0x.html> Basically, in C++0x (a standard started over ten years ago, and still not fully implemented - who said the computer world moved quickly?), you can use "auto" as the type in a variable declaration when the compiler can infer the true type from the initialiser. Thus "auto x = 1;" makes x an int (the type of "1"). Its main use is when using or defining templates, where variable type declarations can get seriously messy - the programmer knows what they mean, but getting the ugly details right is hard work. > Also, a final point. It used to have no cost and have no > change in the compiler behavior -- certainly that remains > true with c, which is the language I chose here. Had I been > using c++ for this code, which is a different language as I > am continually reminded of, I would have written it as a > template class allowing instancing, as needed. Do you argue > that when I write c, not c++, that I need to be aware of your > comments regarding this "new standard?" > Yes, I argue that. C and C++ are different languages - but they are close enough, and have enough in common, that it is silly to write code that does not work as expected in the other language unless you have good reason. There are many incompatibilities between C and C++ in theory - but there are none where the C code would be worsened by making it work equally well with C++. Treat C++ as though it were yet another C standard. 
You wouldn't write code that worked with C90 but failed with C99 (though you may write code that worked with C99 and not with C90, taking advantage of the newer functionality and ignoring the obsolete standard). Don't write code that works with C and not C++ without good reason - and as far as I know, there are no good reasons. >> You have a "hamm_def.h" include file to define your own non-standard >> sized integer types that are not actually portable or accurate (at the >> very least, use "short int" for 16-bit - it's not universal, but it is >> more portable than plain "int"). Since you are clearly aiming for >> portable code, you would be much better using "#include <stdint.h>" and >> using modern sized integer types. It is perfectly reasonable to expect >> people to have <stdint.h> these days - anyone who needs to use your code >> with ancient tools will simply have to put in their own typedefs. > > Thanks for that. I just checked a couple of c compiler > implementations I have here and am still supporting, > actively, for customers. There is no stdint.h file in their > include directories. Doesn't exist. For example, it's not > present in Lattice c or in Microsoft's VC 1.52c (which is the > actual target for this code.) And I still support systems > written in both of these compiler tools, that are used in > modern FABs today for IC manufacturing. If you can believe > it. > Sometimes in this industry, you have to deal with ancient systems or ancient tools - I know that well. But you have to draw a line and say that the code you write today is for the systems available today, and for use with today's software and today's tools. You take advantage of the modern tools and their benefits. If it turns out that you need to retrofit your new code to work with an old system, /that/ is the time to go out of your way to make it work there. 
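[David's "treat C++ as another C standard" advice boils down to a few cheap habits. A small illustration, not taken from Jon's code, of the two friction points raised so far: C's storage-class meaning of "auto" versus C++0x's type inference, and the void* cast that C++ demands but C merely tolerates:]

```c
#include <stdlib.h>

int auto_demo(void)
{
    auto int x = 1;   /* In C, "auto" is a redundant storage-class
                         specifier.  C++0x reuses the keyword for type
                         inference ("auto x = 1;"), which is exactly the
                         conflict under discussion. */
    return x;
}

/* C++ requires a cast when converting void* to another pointer type;
 * writing the cast costs nothing in C and keeps the line legal in
 * both languages. */
unsigned char *make_buffer(size_t n)
{
    unsigned char *p = (unsigned char *)malloc(n);
    if (p != NULL)
        p[0] = 0u;
    return p;
}
```

The other cheap habits in the same spirit: declare functions with prototypes, write (void) for empty parameter lists, and avoid C++ keywords such as "new" or "class" as identifiers.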
For example, I use // comments because they are neater and clearer than /* */ comments (for short comments), even though I have a few compilers that don't support them. But I won't limit myself just for the sake of a couple of projects I haven't touched for years. And of course in this case, writing your own "stdint.h" for your old C90 compiler is not exactly difficult. Note that this is different from using new features or being incompatible just for the sake of it - there has to be some benefit in it. > It is a C99 standard, not C90. I honestly wish that I could > rely upon C99 (despite some sour comments I've read from at > least one compiler author, who I trust to know what he speaks > of, that C99 is a terrible standard... an opinion I'm not > well enough informed about to argue either way), but that C is a terrible language, and C99 is a terrible standard - but they are the best we've got in this industry. > simply isn't even the modern world of compiler tools for > embedded use I'm exposed to (and I'm not suggesting that VC++ > 1.52C is modern, so don't mistake me here.) Only a very few > of the compiler tools I have access to attempt much of C99, > though I've no doubt that stdint.h is an easy one for them. > Most modern compilers (less than 5 years old, say) implement a fair amount of C99. Few (if any) implement it all. But the missing features are seldom relevant - that's why the compiler writers haven't bothered to implement them. See for example: <http://gcc.gnu.org/c99status.html> There is nothing "missing" that is worth bothering about for most programs. The compiler industry is slow, but after ten years anything that is worth implementing, and that programmers want to use, has been implemented. > This leaves me arguing that I would prefer to rely upon a > base I _can_ fairly require, which is C90. 
> You can rely on modern compilers implementing the important parts of C99 in the same way as you can rely on any compiler to generate correct code - i.e., you are normally safe but sometimes you'll get caught out. > I'm still open to an argument here or other ideas. > >> Having first defined these size-specific types, you then ignore them in >> the code and use that most hideous of illogical expression, the >> "unsigned char". > > No, the specified size is on purpose, Dave (setting aside for > a moment what you will address in a moment regarding my use > of unsigned.) > > In this case, the check values will NEVER exceed that size. > Because of that fact, I can insist on it. There is no need > for the user of this code to worry about that detail, because > I have. Check values, plus parity, of 8 bits will cover 128 > bit words. For those interested in exceeding that with my > code, I suppose they can modify the code to deal with it. (I > might have created a separate typedef, I suppose.) > I'm not arguing against your use of __HAMM_VALUE_T for size-specific types - far from it (though I argue against your use of "__", and I forgot to complain about the use of all-caps being ugly to read). I'm arguing against using "unsigned char" when what you really mean is "unsigned 8-bit integer", generally written "uint8_t". > Do you have a good way I might be able to use sizeof() in a > #if? I seem to recall that doesn't work. > No, you can't use sizeof in an #if. But I can't see why you might want to do that here - perhaps we are misunderstanding each other? >> Until someone can draw me two letter "a"s and explain >> to me why one of these is "signed" and the other "unsigned", I will only >> use "char" for holding a character. > > Well, then we have a serious problem, Dave. There is quite a > difference between c's handling of signed and unsigned in > terms of conversions and in this particular case, it becomes > important. A shift operation is only one such reason. 
> > If you are interested in exploring what happens, go ahead and > change the unsigned chars to just char and modify the 1U to a > 1, instead, in hamm_xx.c and recompile and check the results. > I know all about C's handling of "signed char" and "unsigned char" - that's not the issue. But a "char" is a "character", i.e., a letter or character in a string. Regardless of what K&R thought, it makes no logical sense to talk about a "signed" letter - just as it makes no sense to talk about an upper-case number. "signed char" is a meaningless concept in itself - and using any sort of "char" for numbers is dangerous because people think they can use plain "char", and then get into trouble with signed/unsigned issues. It is true that the original C had no better standard for 8-bit data, but C99 does have better types - they are called "uint8_t" and "int8_t". And if you have to use a compiler without <stdint.h>, then you define these types yourself. > c allows quite a range of unspecified behavior when simply > using char. One has no idea whether or not it is signed or > unsigned on a particular implementation. Operations, such as > shift, are defined well for unsigned but not so with signed. > The conversion of signed to unsigned is well defined over the > entire range, but unsigned to signed is not well defined -- > there is a huge hole in the specification there. > > Had I decided to use char here, I would have been worrying > about conversions here and there, across thoughts of many > different compilers and targets -- conversions I don't have > to worry about as written. When I wrote this code, I avoided > the pitfalls entirely, this way. Had I used char, I would > have had to carefully consider each and every operation in > the context of all the compilers I've had experience with. > That's too much to add when writing something quick. > You've missed my point entirely - using "char" is even worse than using "unsigned char". 
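[The implementation-defined corner Jon is steering around is easy to show. A small illustration (not from hamm_xx.c): with unsigned char the shift behaves identically everywhere, while on a compiler where plain char is signed the value promotes to a negative int and the right shift drags the sign bit along on most implementations:]

```c
/* With unsigned char, the shifted-in bits are always zero: */
unsigned char half_u(unsigned char c)
{
    return (unsigned char)(c >> 4);   /* 0xF0 >> 4 is 0x0F on every compiler */
}

/* If plain char happens to be signed, the result of right-shifting a
 * negative value is implementation-defined; most compilers shift
 * arithmetically, turning the 0xF0 bit pattern into 0xFF: */
signed char half_s(signed char c)
{
    return (signed char)(c >> 4);
}
```

The same trap is behind Jon's "change the 1U to a 1" remark: 1 is a signed int, so shifting it into the sign bit is undefined, while 1U is safe for the full width.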
The best is to use "uint8_t" - second best is to use your own __UINT08 typedef. >> When I want an 8-bit number, I'll >> use the logically-named standard types for it - uint8_t or int8_t. > > Ah. So I could have used those in hamm_def.h, then, and > you'd have felt better about that choice, at least. > Yes, you should have used the standard typedefs rather than inventing your own almost-standard versions. >> The >> fact that these boil down to "signed char" and "unsigned char" is one of >> the uglinesses of C that we can't entirely escape - but we should do our >> best. > > Well, I will await your discussion of the above before > continuing on. > >> Another failing of C is its lack of a proper Boolean type. > > But I'm writing in c. Oh, well. > Yes, C is at heart a broken language. But many of its failings are fixable, and many have been fixed by later standards. Just because the original C language does not have a Boolean type or sized integers does not mean that modern C does not have them, or that you can't use them in your own code. To write good C requires discipline to turn it into a good language, not to limit yourself to the lowest common denominator. >> But for a >> very long time, <stdbool.h> has given us something close - and people >> writing clear code have used their own typedefs, #define's, and enums >> for bools before that. > > Sorry, no stdbool.h, either. Not in the target or in some of > the other compilers I've used in the past. It is, yet again, > a C99 introduction. Not C90. I cannot rely upon it. > Then let me give you a C90-compatible <stdbool.h> for free. You can relicense it as you want (such as for inclusion in hamm.zip), and use it to give your ancient compilers a little boost: /* stdbool.h */ #ifndef __stdbool_h__ #define __stdbool_h__ typedef unsigned char bool; #define true 1 #define false 0 #endif There, that wasn't too hard. You could add a similar stdint.h. 
>> If you are writing a function that returns a >> success or failure, it should return a "bool" with values "true" and >> "false". That way you are saying exactly what you mean - giving both >> the user and the compiler the best use of the code (a "bool" is faster >> than an "int" on an AVR, for example). > > I completely agree, in general. However, it has been LONG > practice in c to use 'int' for this purpose and to return 0 > or 1. Doing that continues a well-worn usage and shouldn't > cause anyone any difficulties at all. Using a C99 standard > would cause some difficulties in cases I can readily avoid by > this usage. > It is certainly true that using "int" and "0" and "1" instead of booleans has long been standard practice in C. That does not mean it is a /good/ practice. >> In general, functions that take pointers and return a status indication >> are inefficient, and can often be a source of error (people pass them >> pointers to inappropriate types). >> <snip> > > But we aren't speaking "in general" here. In this case, it > is an explicit nature of SECDED that you can _detect_ an > error. That's the whole point of the idea. A Hamming code > with a Hamming distance of only 3 has no error detection, > only correction. So there, you might have a point. But this > is for a Hamming distance of 4, with the outright desire of > doing so being that errors _can_ be detected and reported. > > So we aren't talking a "general case." > More specifically, you are looking for a function that will decode hammed messages quickly and efficiently, correcting correctable mistakes and indicating uncorrectable mistakes. Passing pointers to your values rather than the values themselves makes the code bigger and slower, contrary to your requirements. There are better ways to structure your code to avoid those inefficiencies. > You've put in some more time, as well. And I appreciate it. 
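[One restructuring David may have in mind for the pointer-out-parameter question: return a small struct by value, so the status and the corrected data come back together and nothing is passed by pointer. On most compilers a struct this small is returned in registers. The type and names below are hypothetical, for illustration only, not Jon's API:]

```c
typedef struct {
    unsigned int value;   /* decoded word (corrected if possible)          */
    int status;           /* 0 = clean, 1 = single error corrected,
                             -1 = double error detected (uncorrectable)    */
} decode_result;

/* Demonstration of the pattern only: the caller supplies a received
 * word plus an already-computed syndrome and overall-parity flag, and
 * gets status and value back in one struct. */
decode_result decode_status_demo(unsigned int raw, int syndrome, int parity_odd)
{
    decode_result r;
    if (syndrome != 0 && !parity_odd) {
        r.status = -1;                      /* double error: report it    */
        r.value = 0u;
    } else if (syndrome != 0) {
        r.status = 1;                       /* single error: flip the bit */
        r.value = raw ^ (1u << syndrome);
    } else {
        r.status = 0;                       /* clean word                 */
        r.value = raw;
    }
    return r;
}
```

The detection requirement of SECDED is preserved; only the calling convention changes.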
These little discussions are always fun - even if we don't manage to change each other's opinions, it's still good to challenge our assumptions and be forced to think about /why/ we like to code the way we do. > But let's cover these before completing the rest. I'd enjoy > the discussion very much. But this is already enough to > grapple with and it may be better to find agreement in styles > here before moving much further. > Fair enough. > Thanks, > Jon mvh., David (it's "David", by the way, not "Dave")
From: Tim Wescott on 7 Mar 2010 16:13 Tim Wescott wrote: > David Brown wrote: >> On 07/03/2010 01:56, Jon Kirwan wrote: > -- snip -- > >>> I've isolated these into three short modules -- one for each >>> of the three data types required. There's an encode and >>> decode routine in each. If anyone is interested in the >>> results, here's a link to the code. >>> >>> http://www.infinitefactors.org/misc/hamm.zip >>> >>> A very short demo program that tests the routines and >>> illustrates their use is also included. There is a small > -- snip some more -- >>> Jon >> >> Hi Jon, >> > -- snip even more -- > >> Why did you pick the LGPL? It basically means that people here can >> use your code in test code, but can't (legally) use it in any >> "production" code. > > -- and yet more -- > > Uh, no. The "GPL" part means that if someone incorporates it into a bit > of production hardware they need to publish the parts of Jon's code they > used. The "L" part means that they _don't_ have to publish any code > that they wrote to link against Jon's code (unlike the "full" GPL, which > includes the source of anything you link against). > > There are products out there that ship under GPL and LGPL -- witness the > older Linksys routers, which ran Linux and for which there is a web page > with full source --> for the 'regular' Linux parts <-- available. > > Granted, many companies aren't willing to include _anything_ that has a > GPL or LGPL license, because they don't want to go to the trouble of > sorting out what's what and publishing the "gotta publish" bits without > publishing the "we want this a secret" bits. > > Change "legally" to "for most commercial purposes given today's > corporate culture" and you'll be 100% correct. > > Were Jon willing to go to the trouble and pain _himself_, this would be > a prime candidate for a dual licensing scheme. > Whoops -- reading other comments I'm reminded that the LGPL may 'poison' the licensing of this code a bit more than I had thought. 
What of licenses such as that for eCos or other open-source RTOS's? Those also must be linked against, but have licenses that one hopes are well thought out. -- Tim Wescott Control system and signal processing consulting www.wescottdesign.com
From: Hans-Bernhard Bröker on 7 Mar 2010 17:11 Jon Kirwan wrote: > In this case, though, the files stand alone. So all the code > is there and anyone is able to change and recompile it. > > I'm still confused, I suppose. Yes. You're confusing the effects of the licence on _your_ code with those that it has on _other_people's_ code. If other people use your LGPL'ed code, they have to distribute their code in a shape that allows updates to your code to be applied to it. For a smallish embedded application, which is statically linked, that would mean to offer to distribute their code as one or more linkable object files, instead of the usual raw hex file (or entire device). People with a closed-source background somehow tend to consider that a deal breaker.
From: Jon Kirwan on 7 Mar 2010 17:47 On Sun, 07 Mar 2010 13:56:26 -0600, Vladimir Vassilevsky <nospam(a)nowhere.com> wrote: >Jon Kirwan wrote: >> On Sun, 07 Mar 2010 08:58:17 +0100, whygee <yg(a)yg.yg> wrote: >> >> >>>Jon Kirwan wrote: >>> >>>>Vladimir Vassilevsky wrote: >>>> >>>>>If your goal is the absolutely the smallest code, Meggit decoder is a >>>>>way to go. It is going to be rather slow though. It only takes ~10 lines >>>>>of code to decode either of the codes; the decoder is going to be >>>>>identical except for a couple of constants. >> >> >>>I've never heard of Meggit, however SECDED is often good enough >>>when speed is needed, for example in real-time communications >>>(radio or long wires anyone ?) >> >> >> I've not heard of Meggit, either. Perhaps Vladimir can >> discuss it, since he has put it on the table for us. > >Actually, it is a Meggitt decoder. The idea is to calculate the syndromes >on the fly rather than store them in a LUT. This kind of decoder could >be easily applied for *any* cyclic code; its advantage is simplicity. > >For those who are interested in the classic error correction math, the >good book is R. Blahut "Theory and practice of the error control codes" ><snip of code> Thanks, Vladimir. That helps a lot. This isn't Hamming, but is related to polynomial methods and moves into the early 1960's time frame and beyond. Jon
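[A sketch of the idea Vladimir describes, for the (7,4) cyclic Hamming code with generator g(x) = x^3 + x + 1. This is not his snipped code, and a true Meggitt decoder runs the received word serially through a shift register; the version below expresses the same principle in plain C: the syndrome is the remainder mod g(x), and instead of a lookup table we recompute, on the fly, the syndrome each single-bit error would produce and correct when one matches.]

```c
/* Remainder of the 7-bit polynomial v(x) modulo g(x) = x^3 + x + 1 (0x0B). */
static unsigned char mod_g(unsigned char v)
{
    int i;
    for (i = 6; i >= 3; i--)
        if (v & (1u << i))
            v ^= (unsigned char)(0x0Bu << (i - 3));  /* cancel x^i with x^(i-3)*g(x) */
    return v;   /* 3-bit remainder */
}

/* Correct at most one bit error in a 7-bit received word r. */
unsigned char cyclic7_correct(unsigned char r)
{
    unsigned char s = mod_g(r);
    int i;
    if (s == 0)
        return r;                                    /* valid codeword */
    for (i = 0; i < 7; i++)
        if (mod_g((unsigned char)(1u << i)) == s)    /* syndrome of an error at bit i */
            return (unsigned char)(r ^ (1u << i));   /* flip the offending bit */
    return r;
}
```

Valid codewords are the multiples of g(x); for example, (x^3 + x) * g(x) gives the codeword 0x4E, and flipping any single bit of it is corrected. The trade Vladimir mentions is visible here: almost no storage, at the cost of recomputing syndromes in a loop.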
From: Jon Kirwan on 7 Mar 2010 17:50
On Sun, 7 Mar 2010 20:45:21 +0100, Frank Buss <fb(a)frank-buss.de> wrote: >Jon Kirwan wrote: > >> So what does this mean? If I put out some routines that get >> directly linked into the code as a separate project file >> within a project, it's not really part of a "library" per se >> but really part of the project as a whole, just like any >> other module file may be. Does this mean the entire project >> must be disclosed? If not, why not? > >I'm not a lawyer, but I think the idea is that the library has to be >provided as some dynamic link library, By "dynamic link library" do you mean this in the way that Windows used it -- something loaded by the operating system as a separate file of routines without its own stack, but accessible to a calling program? If so, that pretty much excludes all embedded systems I work on. >so in theory the user can replace it >easily. Well, that would certainly be true for DLLs under Windows if the user had all the necessary source code required to regenerate the DLL (or VxD, I suppose.) >Linking statically to a program is not allowed with LGPL Somehow, although I've seen that discussed here, I simply have missed this in reading the license. But it is probably because I'm confused. >(I think >then the resulting program is GPL'ed). E.g. that's one reason why Qt is >provided as LGPL and as a commercial license: With the LGPL license you >have to deploy DLLs, the commercial license allows you to link all in one >executable-file. For details read the LGPL text, but maybe you'll need a >lawyer to understand it. Yeah. I tried to read it more carefully today and I'm still not "getting it." Jon