Saving C structures to file -- best practices [Embedded]

From: D Yuniskis on 24 Mar 2010 15:30

Hi Dave,

Dave Boland wrote:
> I'm doing some work on a 16-bit micro and one of the things the software
> does is store data in a structure for easy access. Currently, I save
> the entire structure to file and get it back from file when the program
> is restarted. No problem so far.
>
> However, if I need to change the structure (add a member, remove one,
> change one, etc.), the data, when read in from file, gets corrupted as
> one may expect. In addition, this code may be ported to a 32-bit
> controller, which means the alignment of the data in the structure will
> be different. So even if the structure doesn't change, when reading
> data created by a 16-bit processor I would expect the 32-bit processor
> to corrupt the data.
>
> So here is my question -- what are the best practices for saving data
> from a C structure so it is universally readable -- even if the
> structure definition changes? I'm leaning toward a character array
> (buffer) that uses name-value pairs, but that is a lot more work, and
> may be over-kill. Any thoughts?

If space is no object (in the code to write *and* read -- as well
as the bulk medium itself), then push everything in ASCII and
pull it back the same way. Add a version number to the file
format so your code can look at that and adapt to the specifics
of what it should expect to encounter.
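A minimal sketch of the ASCII-plus-version idea, assuming a made-up settings structure, made-up key names, and a target whose C library provides snprintf and sscanf. It formats into a memory buffer, so getting the text to and from the file is left to whatever file API the target offers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical settings structure; the field names are illustrative only. */
struct settings {
    int baud;      /* present since format version 1 */
    int contrast;  /* added in format version 2 */
};

#define FORMAT_VERSION 2

/* Write the settings as text lines.
   Returns the text length, or -1 if buf is too small. */
int settings_to_text(const struct settings *s, char *buf, size_t len)
{
    int n = snprintf(buf, len, "version=%d\nbaud=%d\ncontrast=%d\n",
                     FORMAT_VERSION, s->baud, s->contrast);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Parse the text form.
   Returns 0 on success, -1 on a missing or unsupported version. */
int settings_from_text(const char *text, struct settings *s)
{
    int version = 0;
    const char *p;

    if (sscanf(text, "version=%d", &version) != 1 ||
        version < 1 || version > FORMAT_VERSION)
        return -1;

    /* Defaults for anything the file does not supply. */
    s->baud = 9600;
    s->contrast = 50;

    if ((p = strstr(text, "\nbaud=")) != NULL)
        sscanf(p, "\nbaud=%d", &s->baud);
    /* Only version 2 and later files carry a contrast line. */
    if (version >= 2 && (p = strstr(text, "\ncontrast=")) != NULL)
        sscanf(p, "\ncontrast=%d", &s->contrast);
    return 0;
}
```

Because the reader looks each key up by name and falls back to a default, a version 1 file without a contrast line still loads on newer code.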

As with any input, validate each parameter before using it
(since it can be corrupted -- even intentionally). If you
need to ensure the contents are not corrupted, sign them
with a secure hash.
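As a rough illustration of the validation step only (not the secure hash), a stored value can be range-checked and replaced with a safe default when it is implausible. The parameter and its limits here are invented for the example.

```c
/* Hypothetical limits for an illustrative parameter. */
#define BAUD_MIN     300L
#define BAUD_MAX     115200L
#define BAUD_DEFAULT 9600L

/* Return the stored value if it is plausible, otherwise a safe default. */
long validated_baud(long stored)
{
    if (stored < BAUD_MIN || stored > BAUD_MAX)
        return BAUD_DEFAULT;
    return stored;
}
```

A range check like this catches garbage values but says nothing about tampering; that still needs the hash mentioned above.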

If you are limited on space, do something like the way TIFF
files are encoded -- tag each parameter with a unique *numeric*
identifier. (TIFF would be good to study in this regard
as it makes this pretty clear; you might also look at the
way NCD X Terminals store their NVRAM parameters). If you
decide you need a new parameter -- or need to entirely
change its syntax -- you can just "invent" another identifier.
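A sketch of the numeric-tag idea, assuming a simplified record layout of one tag byte, one length byte, then the value bytes. This is not the actual TIFF directory format, and the tag numbers are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag numbers; once assigned, a tag is never reused. */
enum { TAG_BAUD = 1, TAG_CONTRAST = 2 };

/* Append one record: tag byte, length byte, then the value bytes.
   Returns the new write offset, or 0 if the record does not fit. */
size_t put_record(uint8_t *buf, size_t off, size_t cap,
                  uint8_t tag, const uint8_t *val, uint8_t len)
{
    size_t i;

    if (off > cap || cap - off < (size_t)len + 2u)
        return 0;
    buf[off++] = tag;
    buf[off++] = len;
    for (i = 0; i < len; i++)
        buf[off++] = val[i];
    return off;
}

/* Find a record by tag, stepping over records that carry other tags.
   Returns a pointer to the value and stores its length, or NULL. */
const uint8_t *find_record(const uint8_t *buf, size_t size,
                           uint8_t tag, uint8_t *len_out)
{
    size_t off = 0;

    while (size - off >= 2u) {
        uint8_t t = buf[off];
        uint8_t len = buf[off + 1];

        off += 2u;
        if (size - off < len)
            return NULL;        /* truncated record */
        if (t == tag) {
            *len_out = len;
            return buf + off;
        }
        off += len;             /* unknown or unwanted tag: skip it */
    }
    return NULL;
}
```

Since every record carries its own length, old code can step over tags it has never heard of, which is what makes "inventing" a new identifier safe.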

In either case, either push a flag into the file to indicate
byte ordering (if you want to store "raw binary" and need
to be able to read it back into a different architecture)
or force a particular byte ordering (e.g., analogous to
"network byte order").
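The flag variant might look roughly like this: the writer stores a known 16-bit marker in its native order, and the reader swaps every multi-byte value if the marker comes back reversed. The marker value and function names are assumptions for the sketch.

```c
#include <stdint.h>

#define ORDER_MARK 0x1234u   /* hypothetical marker written by the saver */

uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFul) << 24) | ((v & 0x0000FF00ul) << 8) |
           ((v & 0x00FF0000ul) >> 8)  | ((v & 0xFF000000ul) >> 24);
}

/* Inspect the marker as read natively from the file.
   Returns 0 if the writer used our byte order, 1 if every multi-byte
   value needs swapping, -1 if the marker is not recognised at all. */
int order_check(uint16_t mark_as_read)
{
    if (mark_as_read == ORDER_MARK)
        return 0;
    if (mark_as_read == swap16(ORDER_MARK))
        return 1;
    return -1;
}

/* Convert a raw 32-bit value from the file to native order. */
uint32_t fix32(uint32_t raw, int need_swap)
{
    return need_swap ? swap32(raw) : raw;
}
```

The marker must not be a byte-symmetric value (0x0000, 0x1212, ...), or the two orders cannot be told apart.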

Avoid just pushing things like floats, doubles, etc. as
binary unless you are willing to adopt the same encoding
universally in their representation (e.g., IEEE 754).

Also remember that the way a struct is represented in
memory varies from compiler to compiler. So, instead of
pushing the bytes that the struct occupies, push the
*values* contained within the struct as if they were
individual "parameters" (possibly *within* a single
parameter -- delimited with something contextually
appropriate).
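For the binary case, pushing the values rather than the struct's bytes could be sketched as below. The structure and its fields are invented, and little-endian is an arbitrary choice for the file order; the point is that the file image has a fixed size and layout regardless of the compiler's padding.

```c
#include <stdint.h>

/* Hypothetical in-memory structure; padding is the compiler's business. */
struct config {
    uint16_t id;
    uint32_t interval_ms;
    int16_t  offset;
};

#define CONFIG_IMAGE_SIZE 8   /* 2 + 4 + 2 bytes, no padding in the file */

/* Fixed little-endian encoding, independent of the CPU's byte order. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static void put_u32(uint8_t *p, uint32_t v)
{
    put_u16(p, (uint16_t)(v & 0xFFFFu));
    put_u16(p + 2, (uint16_t)(v >> 16));
}

static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)get_u16(p) | ((uint32_t)get_u16(p + 2) << 16);
}

/* Write each member's value at a fixed offset in the file image. */
void config_pack(const struct config *c, uint8_t out[CONFIG_IMAGE_SIZE])
{
    put_u16(out + 0, c->id);
    put_u32(out + 2, c->interval_ms);
    put_u16(out + 6, (uint16_t)c->offset);
}

/* Read the members back; assumes two's complement for the signed field. */
void config_unpack(struct config *c, const uint8_t in[CONFIG_IMAGE_SIZE])
{
    c->id = get_u16(in + 0);
    c->interval_ms = get_u32(in + 2);
    c->offset = (int16_t)get_u16(in + 6);
}
```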

From: David Brown on 25 Mar 2010 04:27

On 24/03/2010 16:54, Rich Webb wrote:
> On Wed, 24 Mar 2010 11:16:13 -0400, Dave Boland<dboland9(a)fastmail.fm>
> wrote:
>
>> I'm doing some work on a 16-bit micro and one of the things the software
>> does is store data in a structure for easy access. Currently, I save
>> the entire structure to file and get it back from file when the program
>> is restarted. No problem so far.
>>
>> However, if I need to change the structure (add a member, remove one,
>> change one, etc.), the data, when read in from file, gets corrupted as
>> one may expect. In addition, this code may be ported to a 32-bit
>> controller, which means the alignment of the data in the structure will
>> be different. So even if the structure doesn't change, when reading
>> data created by a 16-bit processor I would expect the 32-bit processor
>> to corrupt the data.
>>
>> So here is my question -- what are the best practices for saving data
>> from a C structure so it is universally readable -- even if the
>> structure definition changes? I'm leaning toward a character array
>> (buffer) that uses name-value pairs, but that is a lot more work, and
>> may be over-kill. Any thoughts?
>
> Add to the structure a version and a size member in a fixed location
> (most easily at the top of the structure). That gives your application
> code at least some chance of doing the right thing if it knows, e.g.,
> that member X is only present in version Y and later.
>
> Once the layout has mostly settled down, only add new members at the
> end. Never remove (or change) old members, just mark them as "reserved"
> or "optional" and fill them with benign values.
>

That is the way I usually handle such data (mostly in eeprom rather than
a file, but the principle is the same). It's compact and efficient, and
you can handle smooth upgrades (and even downgrades).
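A rough sketch of the version-plus-size scheme, with an invented parameter block. It assumes the writer and reader share byte order and that the struct has no padding (every member here is a uint16_t), which is what the raw-struct approach relies on anyway.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical version-2 layout: header first, new members only at the end. */
struct params {
    uint16_t version;
    uint16_t size;       /* number of bytes the writer actually stored */
    uint16_t baud_div;   /* since version 1 */
    uint16_t reserved;   /* retired member, kept so later offsets stay put */
    uint16_t contrast;   /* added in version 2 */
};

/* Load an image written by any version of the code.
   Returns 0 on success, -1 if the image is malformed. */
int params_load(struct params *out, const uint8_t *image, size_t image_len)
{
    struct params hdr;
    const size_t hdr_len = 2 * sizeof(uint16_t);
    size_t n;

    if (image_len < hdr_len)
        return -1;
    memcpy(&hdr, image, hdr_len);   /* only version and size are valid here */
    n = hdr.size;
    if (hdr.version == 0 || n < hdr_len || n > image_len)
        return -1;

    /* Start from defaults, then overlay whatever the writer stored. */
    memset(out, 0, sizeof *out);
    out->contrast = 50;
    if (n > sizeof *out)
        n = sizeof *out;            /* newer writer: ignore the extra tail */
    memcpy(out, image, n);

    /* The in-memory copy is now in the current layout. */
    out->version = 2;
    out->size = (uint16_t)sizeof *out;
    return 0;
}
```

An image from older code is shorter, so the members it lacks keep their defaults; an image from newer code is longer, and the tail this build does not know about is ignored. That covers both the upgrade and the downgrade case.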

> For alignment, make the first element of the structure a 32-bit
> quantity. The standard requires that a pointer to the structure also
> points to the first element (and vice versa, of course). That should
> "square up" the overall alignment between 16- and 32-bit environments.
>

Aligning the first element may not be enough. On an 8-bit or 16-bit
machine, struct { uint32_t a; uint16_t b; uint32_t c; } will take 10
bytes and c will be unaligned. Some 32-bit cpus can work with unaligned
values, but others cannot.

So make sure that elements are aligned to their "natural" alignments.
And of course use only sized types.

As a check, simply compile the struct typedef on a 32-bit compiler and
check its size is the same as on the 16-bit compiler.
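The size check can also be left in the source so that every build repeats it. A sketch, assuming a C11 compiler (for _Static_assert), 8-bit bytes, and an explicit padding member whose name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* The example struct with explicit filler, so that every member sits at
   its natural alignment on 8-, 16- and 32-bit targets alike. */
struct record {
    uint32_t a;
    uint16_t b;
    uint16_t pad;   /* explicit padding; hypothetical member name */
    uint32_t c;
};

/* The build fails on any compiler that lays the struct out differently. */
_Static_assert(sizeof(struct record) == 12, "unexpected struct size");
_Static_assert(offsetof(struct record, b) == 4, "member b moved");
_Static_assert(offsetof(struct record, c) == 8, "member c moved");
```

On pre-C11 compilers the usual substitute is a typedef of an array whose size goes negative when the condition is false.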

From: Rich Webb on 25 Mar 2010 10:19

On Thu, 25 Mar 2010 09:27:46 +0100, David Brown
<david(a)westcontrol.removethisbit.com> wrote:

>Aligning the first element may not be enough. On an 8-bit or 16-bit
>machine, struct { uint32_t a; uint16_t b; uint32_t c; } will take 10
>bytes and c will be unaligned. Some 32-bit cpus can work with unaligned
>values, but others cannot.
>
>So make sure that elements are aligned to their "natural" alignments.
>And of course use only sized types.
>
>As a check, simply compile the struct typedef on a 32-bit compiler and
>check its size is the same as on the 16-bit compiler.

Yes, I've been meaning to pull the thread mentioned in this, er, thread:
<http://groups.google.com/group/osdeve_mirror_tcpip_uip/browse_thread/thread/1b087fa6027122ae/8720f5401f9f2c2d?q=byte+ordering+problem+with+arm7>

Ran across it after I had ported uip to an MSP430-series board and was
thinking about a port to one of NXP's LPC2xxx chips.

From the link above:
"As Patrick pointed out, it's more like an alignment problem, on arm
that's a heavy issue. On arm7, an unaligned access doesn't generate a
trap, instead, it's accepted on a very specific arm way that internally
rotates the bits not in the way one would expect. The compiler assumes
you know what you're doing, and will not try to cope with unaligned
accesses unless the data type (or the attributes, such as packed)
implies that."

I really haven't taken the time to suss it out in detail, though. Maybe
this weekend ...

--
Rich Webb Norfolk, VA

From: Boudewijn Dijkstra on 25 Mar 2010 10:49

Op Thu, 25 Mar 2010 15:19:51 +0100 schreef Rich Webb
<bbew.ar(a)mapson.nozirev.ten>:
> On Thu, 25 Mar 2010 09:27:46 +0100, David Brown
> <david(a)westcontrol.removethisbit.com> wrote:
> <http://groups.google.com/group/osdeve_mirror_tcpip_uip/browse_thread/thread/1b087fa6027122ae/8720f5401f9f2c2d?q=byte+ordering+problem+with+arm7>
>
> From the link above:
> "As Patrick pointed out, it's more like an alignment problem, on arm
> that's a heavy issue. On arm7, an unaligned access doesn't generate a
> trap, [but rotates the data]

Only with word-size transfers in ARM mode. In Thumb mode or with halfword
transfers, it is worse; the result is UNPREDICTABLE.
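One portable way to sidestep the problem, sketched here with invented helper names, is to never dereference a possibly-unaligned pointer and to copy the bytes with memcpy instead. Compilers generally turn this into whatever access sequence is safe for the target.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);   /* byte copy: no alignment requirement */
    return v;
}

/* Store a 32-bit value to an address that may not be 4-byte aligned. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

This only addresses alignment; the value still travels in the CPU's native byte order.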

--
Gemaakt met Opera's revolutionaire e-mailprogramma:
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)

From: Nobody on 25 Mar 2010 19:04

On Wed, 24 Mar 2010 11:16:13 -0400, Dave Boland wrote:

> So here is my question -- what are the best practices for saving data
> from a C structure so it is universally readable -- even if the
> structure definition changes? I'm leaning toward a character array
> (buffer) that uses name-value pairs, but that is a lot more work, and
> may be over-kill. Any thoughts?

It depends upon how much inefficiency you can tolerate, in terms of data
size, code size and execution time.

A self-describing text format such as XML, JSON, YAML, etc. will provide
maximum flexibility at the cost of efficiency.

The most efficient solution is likely to be a binary structure with a
version field. However, code size can suffer if the structure changes
frequently and the code needs to be able to handle all known versions.

I wouldn't seriously consider XML unless you actually need to be able to
manipulate the data with external tools. JSON, YAML, etc. will provide the
same level of flexibility with a fraction of the code size (there's no
point in using XML unless you're going to write a 100%-conformant parser,
which is harder than it looks).

Or you can use a binary equivalent of something like JSON, where keys are
unique integers instead of strings, and values consist of a type byte
followed by binary data appropriate to the type.

From a time-efficiency perspective, writing numbers in hex rather than
decimal can provide a significant gain.
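To illustrate why hex is cheap: each digit is a 4-bit mask and shift, with no division or multiplication by ten. A sketch with invented function names, assuming an ASCII character set and fixed-width four-digit fields:

```c
#include <stdint.h>

/* Write v as exactly four uppercase hex digits plus a terminating NUL. */
void u16_to_hex(uint16_t v, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";
    int i;

    for (i = 3; i >= 0; i--) {
        out[i] = digits[v & 0xFu];   /* low nibble selects the digit */
        v >>= 4;
    }
    out[4] = '\0';
}

/* Parse exactly four hex digits (either case).
   Returns 0 on success, -1 on a non-hex character. */
int hex_to_u16(const char *s, uint16_t *out)
{
    uint16_t v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        char c = s[i];
        uint16_t d;

        if (c >= '0' && c <= '9')
            d = (uint16_t)(c - '0');
        else if (c >= 'A' && c <= 'F')
            d = (uint16_t)(c - 'A' + 10);
        else if (c >= 'a' && c <= 'f')
            d = (uint16_t)(c - 'a' + 10);
        else
            return -1;
        v = (uint16_t)((v << 4) | d);
    }
    *out = v;
    return 0;
}
```

On a small micro without a hardware divider, the decimal equivalent needs a divide or repeated subtraction per digit, which is where the time goes.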
