Prev: Announcement: New version of UML state machine code generator available
Next: CFP with Extended Deadline of Mar. 21, 2010: The 2010 International Conference on Computer Design (CDES'10), USA, July 2010
From: David Brown on 7 Mar 2010 07:16 On 07/03/2010 01:56, Jon Kirwan wrote: > I needed to write some very simple c routines (tempted to use > assembly, but not for this application at this time) to > handle the following extended hamming code cases: > > [ 8, 4, 4] > [16, 11, 4] > [32, 26, 4] > > (These are the total number of bits, number of data bits, and > the guaranteed Hamming distance between valid symbols.) > > As indicated by the last number (the hamming distance between > valid symbols) these routines are able to correct single bit > errors and detect double-bit errors (SECDED.) > > There are many different ways to perform such Hamming codes > and I've chosen a particular one among many. There are > others that also work. Although I've chosen a fairly well > documented form, it's certainly not the only possible form > and so it is that comparisons of the encoded results with > other algorithms may, or may not, match up bit for bit. But > the function remains and it does conform to some I've seen. > > The code I've written uses no tables at all and is very > short. As such, it may be appropriate for small micro > embedded use. For example, the 16-bit Hamming version, on > the MSP430 using IAR and the standard "release" code > generation, yields 144 words of code. With the "debug" code > generation, it's 175 words, instead. For 32-bit Hamming > codes, "release" again, that nearly doubles to 248 words. But > that gives an idea. Cycle counts vary on inputs. > > I've isolated these into three short modules -- one for each > of the three data types required. There's an encode and > decode routine in each. If anyone is interested in the > results, here's a link to the code. > > http://www.infinitefactors.org/misc/hamm.zip > > A very short demo program that tests the routines and > illustrates their use is also included. There is a small > file included where the specific data sizes can be specified > for a particular compiler. > > They've been tested using a couple of compilers, Microsoft's > VC++ 1.52C DOS compiler and IAR's MSP430 (Workbench 5.0) > compiler, but not so thoroughly tested that I'm confident > they will compile well with all of them. The design reuses > code, where it seemed possible, and does NOT require special > data typedefs. > > Much better could be had with assembly, but these satisfied > the need for now and, of course, assembly would require many > different instances to be coded. These routines can act as a > template, though, for deductions to specific assembly coding. > > I will gladly modify them based upon useful input, as well. > > Jon Hi Jon, May I give you some constructive criticism on the code? I haven't actually gone through the details of the implementation itself as yet, to see if I can see any way to improve on the size or speed of the code. I'll do that if I can find the time, mainly for the fun of it. But I have a few more general comments on the code. First off, before I criticise anything, let me thank you for sharing the code with others - this is a kind of coding that many people find hard to get right. And of course, my comments here are based on my own personal opinions, preferences, and rants - when I write "you should never do this", I really mean "/I/ would never do this" :-) I also don't expect you to agree with everything I write, though I hope you'll agree with some of it. Why did you pick the LGPL? It basically means that people here can use your code in test code, but can't (legally) use it in any "production" code. You should drop the "auto" keyword. It has been unnecessary in C for decades, conflicts with newer C++ standards, and if there are in fact any compilers that pay attention to it, it will generate worse code (since "auto" is then a hint to the compiler to put the variable on the stack, hindering the optimiser). You have a "hamm_def.h" include file to define your own non-standard sized integer types that are not actually portable or accurate (at the very least, use "short int" for 16-bit - it's not universal, but it is more portable than plain "int"). Since you are clearly aiming for portable code, you would be much better using "#include <stdint.h>" and using modern sized integer types. It is perfectly reasonable to expect people to have <stdint.h> these days - anyone who needs to use your code with ancient tools will simply have to put in their own typedefs. Having first defined these size-specific types, you then ignore them in the code and use that most hideous of illogical expression, the "unsigned char". Until someone can draw me two letter "a"s and explain to me why one of these is "signed" and the other "unsigned", I will only use "char" for holding a character. When I want an 8-bit number, I'll use the logically-named standard types for it - uint8_t or int8_t. The fact that these boil down to "signed char" and "unsigned char" is one of the uglinesses of C that we can't entirely escape - but we should do our best. Another failing of C is its lack of a proper Boolean type. But for a very long time, <stdbool.h> has given us something close - and people writing clear code have used their own typedefs, #define's, and enums for bools before that. If you are writing a function that returns a success or failure, it should return a "bool" with values "true" and "false". That way you are saying exactly what you mean - giving both the user and the compiler the best use of the code (a "bool" is faster than an "int" on an AVR, for example). In general, functions that take pointers and return a status indication are inefficient, and can often be a source of error (people pass them pointers to inappropriate types). On small microcontrollers, particularly 8-bit devices, using pointers like this means a lot more code and time compared to passing data directly in registers. Even on the msp430, which has very good pointer support, it is slower. There are many alternative ways to handle this sort of interface. For example, you could have a global "hammingDecodeError" flag that is set on failure (yes, I really mean using a global - in almost all realistic uses of these functions, it would be perfectly safe and faster than anything else). You could pass the value directly, with a pointer to an error flag as a separate parameter. Or you could also define a specific return value as an error indicator. You've picked the "traditional" C approach, though it is perhaps the most inefficient. You use a lot of identifiers starting with double underscores - these are supposed to be restricted to internal identifiers associated with the compiler and standard libraries. They are certainly not to be used on the main API call interface for your functions. #including a ".c" file is, IMHO, horrible. One .c file compiles to one ..o file. Just because C and the preprocessor does not provide a proper module concept, does not mean you should use any old file structure. At the very least, hamm_xx should be .h (or .inc, to distinguish it from proper header files). I would rate copy-and-paste above including a ".c" file. mvh., David
From: steve_schefter on 7 Mar 2010 14:01 Hi David. > Why did you pick the LGPL? It basically means that people here can use > your code in test code, but can't (legally) use it in any "production" code. This isn't my impression of LGPL. Can I ask what leads you to that conclusion? Steve
From: Jon Kirwan on 7 Mar 2010 14:01 On Sun, 07 Mar 2010 13:16:44 +0100, David Brown <david(a)westcontrol.removethisbit.com> wrote: >On 07/03/2010 01:56, Jon Kirwan wrote: >> I needed to write some very simple c routines (tempted to use >> assembly, but not for this application at this time) to >> handle the following extended hamming code cases: >> >> [ 8, 4, 4] >> [16, 11, 4] >> [32, 26, 4] >> >> (These are the total number of bits, number of data bits, and >> the guaranteed Hamming distance between valid symbols.) >> >> As indicated by the last number (the hamming distance between >> valid symbols) these routines are able to correct single bit >> errors and detect double-bit errors (SECDED.) >> >> There are many different ways to perform such Hamming codes >> and I've chosen a particular one among many. There are >> others that also work. Although I've chosen a fairly well >> documented form, it's certainly not the only possible form >> and so it is that comparisons of the encoded results with >> other algorithms may, or may not, match up bit for bit. But >> the function remains and it does conform to some I've seen. >> >> The code I've written uses no tables at all and is very >> short. As such, it may be appropriate for small micro >> embedded use. For example, the 16-bit Hamming version, on >> the MSP430 using IAR and the standard "release" code >> generation, yields 144 words of code. With the "debug" code >> generation, it's 175 words, instead. For 32-bit Hamming >> codes, "release" again, that nearly doubles to 248 words. But >> that gives an idea. Cycle counts vary on inputs. >> >> I've isolated these into three short modules -- one for each >> of the three data types required. There's an encode and >> decode routine in each. If anyone is interested in the >> results, here's a link to the code. >> >> http://www.infinitefactors.org/misc/hamm.zip >> >> A very short demo program that tests the routines and >> illustrates their use is also included. There is a small >> file included where the specific data sizes can be specified >> for a particular compiler. >> >> They've been tested using a couple of compilers, Microsoft's >> VC++ 1.52C DOS compiler and IAR's MSP430 (Workbench 5.0) >> compiler, but not so thoroughly tested that I'm confident >> they will compile well with all of them. The design reuses >> code, where it seemed possible, and does NOT require special >> data typedefs. >> >> Much better could be had with assembly, but these satisfied >> the need for now and, of course, assembly would require many >> different instances to be coded. These routines can act as a >> template, though, for deductions to specific assembly coding. >> >> I will gladly modify them based upon useful input, as well. >> >> Jon > >Hi Jon, > >May I give you some constructive criticism on the code? Please do. I always enjoy learning how others view things and how I might do better, as well. Your thoughts help me a great deal and I appreciate it a lot. >I haven't >actually gone through the details of the implementation itself as yet, >to see if I can see any way to improve on the size or speed of the code. > I'll do that if I can find the time, mainly for the fun of it. But I >have a few more general comments on the code. Thanks. >First off, before I criticise anything, let me thank you for sharing the >code with others - this is a kind of coding that many people find hard >to get right. And of course, my comments here are based on my own >personal opinions, preferences, and rants - when I write "you should >never do this", I really mean "/I/ would never do this" :-) I also >don't expect you to agree with everything I write, though I hope you'll >agree with some of it. Okay. >Why did you pick the LGPL? It basically means that people here can use >your code in test code, but can't (legally) use it in any "production" code. Partly, it was exactly because of your reaction. We've recently had a modest discussion about licensing and I have frankly found my reading of specific licenses to have muddled by understanding more than clarified it. So much to take in and it still hasn't gelled in my mind, yet. Partly, I think, because I'm not yet even sure what I want. I'm very much open to ANY thoughts here. >You should drop the "auto" keyword. It has been unnecessary in C for >decades, conflicts with newer C++ standards, and if there are in fact >any compilers that pay attention to it, it will generate worse code >(since "auto" is then a hint to the compiler to put the variable on the >stack, hindering the optimiser). I also use LET when writing BASIC code. It's an idiom of mine, I suppose. But it helps _me_ by placing a written word that explicitly calls my attention to its unique nature in functions and it harms nothing at all. I use a lot of languages, not just c, and I have used so many more. Everything from RPG II and COBOL and Algol and various assembly languages, starting around 1970 or so through BASIC a few years later and c in '78 (I remember, because it was with Unix v6 kernel code) and c++ in about '87, or so. Because of the breadth of languages I still continue to use, it also saves me from just that very tiny extra moment of thought about c, in particular. Without the 'auto' keyword, the same line, outside a function, works but has a storage type, life span, and visibility that is marked different and I don't like having to pause even a moment. The 'auto' keyword takes a fraction of a moment to type and it pays, time and time again later on for me. I've been known to use LET in BASIC code, too. I don't know if this is a habit I want to shed. However, I'd like to know more about why you wrote "conflicts with newer c++ standards." Can you provide a standards document reference I can go to on this subject? Or is this simply a "new standard" that is developing within the gcc community that I need to learn about? Also, a final point. It used to have no cost and have no change in the compiler behavior -- certainly that remains true with c, which is the language I chose here. Had I been using c++ for this code, which is a different language as I am continually reminded of, I would have written it as a template class allowing instancing, as needed. Do you argue that when I write c, not c++, that I need to be aware of your comments regarding this "new standard?" >You have a "hamm_def.h" include file to define your own non-standard >sized integer types that are not actually portable or accurate (at the >very least, use "short int" for 16-bit - it's not universal, but it is >more portable than plain "int"). Since you are clearly aiming for >portable code, you would be much better using "#include <stdint.h>" and >using modern sized integer types. It is perfectly reasonable to expect >people to have <stdint.h> these days - anyone who needs to use your code >with ancient tools will simply have to put in their own typedefs. Thanks for that. I just checked a couple of c compiler implementations I have here and am still supporting, actively, for customers. There is no stdint.h file in their include directories. Doesn't exist. For example, it's not present in Lattice c or in Microsoft's VC 1.52c (which is the actual target for this code.) And I still support systems written in both of these compiler tools, that are used in modern FABs today for IC manufacturing. If you can believe it. It is a C99 standard, not C90. I honestly wish that I could rely upon C99 (despite some sour comments I've read from at least one compiler author, who I trust to know what he speaks of, that C99 is a terrible standard... an opinion I'm not well enough informed about to argue either way), but that simply isn't even the modern world of compiler tools for embedded use I'm exposed to (and I'm not suggesting that VC++ 1.52C is modern, so don't mistake me here.) Only a very few of the compiler tools I have access to attempt much of C99, though I've no doubt that stdint.h is an easy one for them. This leaves me arguing that I would prefer to rely upon a base I _can_ fairly require, which is C90. I'm still open to an argument here or other ideas. >Having first defined these size-specific types, you then ignore them in >the code and use that most hideous of illogical expression, the >"unsigned char". No, the specified size is on purpose, Dave (setting aside for a moment what you will address in a moment regarding my use of unsigned.) In this case, the check values will NEVER exceed that size. Because of that fact, I can insist on it. There is no need for the user of this code to worry about that detail, because I have. Check values, plus parity, of 8 bits will cover 128 bit words. For those interested in exceeding that with my code, I suppose they can modify the code to deal with it. (I might have created a separate typedef, I suppose.) Do you have a good way I might be able to use sizeof() in a #if? I seem to recall that doesn't work. >Until someone can draw me two letter "a"s and explain >to me why one of these is "signed" and the other "unsigned", I will only >use "char" for holding a character. Well, then we have a serious problem, Dave. There is quite a difference between c's handling of signed and unsigned in terms of conversions and in this particular case, it becomes important. A shift operation is only one such reason. If you are interested in exploring what happens, go ahead and change the unsigned chars to just char and modify the 1U to a 1, instead, in hamm_xx.c and recompile and check the results. c allows quite a range of unspecified behavior when simply using char. One has no idea whether or not it is signed or unsigned on a particular implementation. Operations, such as shift, are defined well for unsigned but not so with signed. The conversion of signed to unsigned is well defined over the entire range, but unsigned to signed is not well defined -- there is a huge hole in the specification there. Had I decided to use char here, I would have been worrying about conversions here and there, across thoughts of many different compilers and targets -- conversions I don't have to worry about as written. When I wrote this code, I avoided the pitfalls entirely, this way. Had I used char, I would have had to carefully considered each and every operation in the context of all the compilers I've had experience with. That's too much to add when writing something quick. >When I want an 8-bit number, I'll >use the logically-named standard types for it - uint8_t or int8_t. Ah. So I could have used those in hamm_def.h, then, and you'd have felt better about that choice, at least. >The >fact that these boil down to "signed char" and "unsigned char" is one of >the uglinesses of C that we can't entirely escape - but we should do our >best. Well, I will await your discussion of the above before continuing on. >Another failing of C is its lack of a proper Boolean type. But I'm writing in c. Oh, well. >But for a >very long time, <stdbool.h> has given us something close - and people >writing clear code have used their own typedefs, #define's, and enums >for bools before that. Sorry, no stdbool.h, either. Not in the target or in some of the other compilers I've used in the past. It is, yet again, a C99 introduction. Not C90. I cannot rely upon it. >If you are writing a function that returns a >success or failure, it should return a "bool" with values "true" and >"false". That way you are saying exactly what you mean - giving both >the user and the compiler the best use of the code (a "bool" is faster >than an "int" on an AVR, for example). I completely agree, in general. However, it has been LONG practice in c to use 'int' for this purpose and to return 0 or 1. Doing that continues a well-worn usage and shouldn't cause anyone any difficulties at all. Using a C99 standard would cause some difficulties in cases I can readily avoid by this usage. >In general, functions that take pointers and return a status indication >are inefficient, and can often be a source of error (people pass them >pointers to inappropriate types). ><snip> But we aren't speaking "in general" here. In this case, it is an explicit nature of SECDED that you can _detect_ an error. That's the whole point of the idea. A Hamming code with a Hamming distance of only 3 has no error detection, only correction. So there, you might have a point. But this is for a Hamming distance of 4, with the outright desire of doing so being that errors _can_ be detected and reported. So we aren't talking a "general case." You've put in some more time, as well. And I appreciate it. But let's cover these before completing the rest. I'd enjoy the discussion very much. But this is already enough to grapple with and it may be better to find agreement in styles here before moving much further. Thanks, Jon
From: David Brown on 7 Mar 2010 14:23 steve_schefter(a)hotmail.com wrote: > Hi David. > >> Why did you pick the LGPL? It basically means that people here can use >> your code in test code, but can't (legally) use it in any "production" code. > > This isn't my impression of LGPL. Can I ask what leads you to that > conclusion? > > Steve To use LGPL'ed code with your own code, anyone receiving your code must have access to the LGPL'ed code, be able to change and recompile that code, then link it along with the rest of the binary and run that software instead. For non-embedded systems (or "big" embedded systems), that typically means the LGPL'ed code is part of a dynamic library. To use LGPL code in a (typically statically linked) embedded system means that the end user will need access to your source code, or at least compiled and linkable object files. Since that is impractical or impossible for most embedded systems, the rough conclusion is that pure LGPL code is very seldom suitable for embedded systems. Of course, in some cases that is exactly what the code supplier is intending - the end user is able to use the code for evaluation and testing, and in non-commercial systems (and also for internal use), but for a commercial system they would get a separate license from the code supplier. That's a valid way of licensing code to developers. But I don't think that was Jon's intention here. This has been covered endlessly in previous threads in this newsgroup - please let's not have another one unless someone has new points that were not covered recently.
From: Frank Buss on 7 Mar 2010 14:26
Jon Kirwan wrote: > Partly, I think, because I'm not yet even sure what I want. That's the first thing you have to solve :-) Maybe a short explanation can help: LGPL allows other developers to build a library from your source code and this library can be used in closed source programs, but the library source and all changes to the library has to be made public. If it is GPL, anything which uses it has to be public and GPL, too. For most public projects, I prefer the BSD license: Do whatever you want with it, but add some credits of the original author to your documentation. -- Frank Buss, fb(a)frank-buss.de http://www.frank-buss.de, http://www.it4-systems.de |