A bit resistant to disruption [General Programming]

Prev: CFP: SAS'2010 - 17th International Static Analysis Symposium, Perpignan, France
Next: Get 101 Question on C, C++, C++ under Windows, MFC, DLL, COM/DOM

From: Thad Smith on 12 Feb 2010 23:49

Francois Grieu wrote:
> /*
> Disclaimer: This is a hard problem, only minimally disguised into
> something on-topic for comp.lang.c

comp.programming and comp.arch.embedded added to the list.

> You are programming a C99 conforming freestanding implementation.
> Disruption (reset or power off/on cycle) can occur at *ANY* time without
> advance warning, and stops whatever operation is underway. The program
> is re-executed from main() after a disruption.
>
> The objective is to implement a single bit resistant to disruption,
> accessed using two procedures to be written:
> int bRead(void); // Return 0 or 1 according to bit state.
> void bToggle(void); // Change the state of the bit.
>
> These two properties shall be met:
> a) The bit is stable in the absence of Toggle. That is, the result given
> by any two undisrupted bRead() is the same unless bToggle() was invoked
> in-between.
> b) The bit is changed by Toggle. That is, the result given by any two
> undisrupted bRead() is different if bToggle() was invoked exactly once
> in-between and no disruption occurred during that execution of bToggle().
....
> An EEPROM cell can be left neither erased nor programmed by a disrupted
> eErase unless that cell was already erased, or by a disrupted eProgram.
> For a cell in such halfway state, eRead returns a value specified only
> to be 0 or 1. That otherwise unspecified value may in particular change
> after disruption (due e.g. to uncontrollable random noise and/or
> temperature and/or voltage drift in the hardware used by eRead), even
> though the sate of cells is unchanged by disruption or/and eRead.
>
> Note: if a cell is not erased (left in a halfway state or programmed),
> an eProgram of that cell is prohibited. A workaround is to use the

First, this is an uninteresting problem because if a power failure occurs during
a toggle attempt, the restored state gives no information at all.

Consider the following alternate problem that guarantees to restore some
information.

Let a variable have 3 stable states s1, s2, and s3. A stable state is preserved
across power losses. Assume that the state always changes s1 to s2, s2 to s3,
and s3 to s1 (mod 3 counter). Since transitions are not instantaneous, there
are 3 transient states: t12, t23, and t31. How many bits, using your rules, are
required to, on power restoration, recover either the preceding stable state, or
if failure during transition, eith the preceding of succeeding stable state?
This explicitly prohibits restoring the third state.

This can be done with 3 independent bits:
B3B2B1
s1 0 0 1
t12 0 1 1
s2 0 1 0
t23 1 1 0
s3 1 0 0
t31 1 0 1

Transition from s1 to s2: write B2, then erase B1, etc.

Recovery from reset:
If in a stable state, erase all zero bits. This makes the stable state robust.
If in a transient state, erase the bit to revert to the previous stable state.

The only remaining problem is to getting a robust first state, since if the
power is lost when first attempting to write state s1 and the B1 is a weak 1, it
will not be restored. You thus need a robust initialization to the first state.

Proof: There is either one or two one bits set, since once two bits are set an
erasure precedes writing the other bit. If failure during a transition before
the written bit becomes readable, it is erased on recovery, making it stable.
If the transition state is read on restart, the new bit may be weak. It is
erased when the previous state is restored. If a failure occurs during a the
second step of the transition when the erasure was not complete, but intially
read as zero, the rule for zero erasure makes the new state robust.

This can be extended, of course, to a variable with additional states. All real
EEPROM that I am familiar with erases and writes more than one bit at a time, so
that would affect the design of an optimal algorithm for storing more information.

--
Thad

From: Daniel Giaimo on 13 Feb 2010 00:15

On 2/12/2010 11:49 PM, Thad Smith wrote:
> Francois Grieu wrote:
>> /*
>> Disclaimer: This is a hard problem, only minimally disguised into
>> something on-topic for comp.lang.c
>
> comp.programming and comp.arch.embedded added to the list.
>
>> You are programming a C99 conforming freestanding implementation.
>> Disruption (reset or power off/on cycle) can occur at *ANY* time
>> without advance warning, and stops whatever operation is underway. The
>> program is re-executed from main() after a disruption.
>>
>> The objective is to implement a single bit resistant to disruption,
>> accessed using two procedures to be written:
>> int bRead(void); // Return 0 or 1 according to bit state.
>> void bToggle(void); // Change the state of the bit.
>>
>> These two properties shall be met:
>> a) The bit is stable in the absence of Toggle. That is, the result
>> given by any two undisrupted bRead() is the same unless bToggle() was
>> invoked in-between.
>> b) The bit is changed by Toggle. That is, the result given by any two
>> undisrupted bRead() is different if bToggle() was invoked exactly once
>> in-between and no disruption occurred during that execution of bToggle().
> ...
> > An EEPROM cell can be left neither erased nor programmed by a disrupted
> > eErase unless that cell was already erased, or by a disrupted eProgram.
> > For a cell in such halfway state, eRead returns a value specified only
> > to be 0 or 1. That otherwise unspecified value may in particular change
> > after disruption (due e.g. to uncontrollable random noise and/or
> > temperature and/or voltage drift in the hardware used by eRead), even
> > though the sate of cells is unchanged by disruption or/and eRead.
> >
> > Note: if a cell is not erased (left in a halfway state or programmed),
> > an eProgram of that cell is prohibited. A workaround is to use the
>
> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no information
> at all.
>
> Consider the following alternate problem that guarantees to restore some
> information.
>
> Let a variable have 3 stable states s1, s2, and s3. A stable state is
> preserved across power losses. Assume that the state always changes s1
> to s2, s2 to s3, and s3 to s1 (mod 3 counter). Since transitions are not
> instantaneous, there are 3 transient states: t12, t23, and t31. How many
> bits, using your rules, are required to, on power restoration, recover
> either the preceding stable state, or if failure during transition, eith
> the preceding of succeeding stable state? This explicitly prohibits
> restoring the third state.
>
> This can be done with 3 independent bits:
> B3B2B1
> s1 0 0 1
> t12 0 1 1
> s2 0 1 0
> t23 1 1 0
> s3 1 0 0
> t31 1 0 1
>
> Transition from s1 to s2: write B2, then erase B1, etc.
>
> Recovery from reset:
> If in a stable state, erase all zero bits. This makes the stable state
> robust.
> If in a transient state, erase the bit to revert to the previous stable
> state.
>
> The only remaining problem is to getting a robust first state, since if
> the power is lost when first attempting to write state s1 and the B1 is
> a weak 1, it will not be restored. You thus need a robust initialization
> to the first state.
>
> Proof: There is either one or two one bits set, since once two bits are
> set an erasure precedes writing the other bit. If failure during a
> transition before the written bit becomes readable, it is erased on
> recovery, making it stable. If the transition state is read on restart,
> the new bit may be weak. It is erased when the previous state is
> restored. If a failure occurs during a the second step of the transition
> when the erasure was not complete, but intially read as zero, the rule
> for zero erasure makes the new state robust.

I think there is a flaw here. Suppose you are moving from transition
state t12 to s2 when you are interrupted. In this case bit b1 will be
in a halfway state when you come back up. Suppose when you come back up
you read the state as t12. You then follow your procedure to revert to
state s1 by erasing b2. You are then interrupted again. When you come
back, you read all bits 0 because bit b1 was actually in a halfway
state.

--
Dan Giaimo

From: Moi on 13 Feb 2010 07:49

On Sat, 13 Feb 2010 00:15:47 -0500, Daniel Giaimo wrote:

> On 2/12/2010 11:49 PM, Thad Smith wrote:
>> Francois Grieu wrote:
>>> /*
>>> Disclaimer: This is a hard problem, only minimally disguised into
>>> something on-topic for comp.lang.c
>>
>> comp.programming and comp.arch.embedded added to the list.
>>
>>> You are programming a C99 conforming freestanding implementation.
>>> Disruption (reset or power off/on cycle) can occur at *ANY* time
>>> without advance warning, and stops whatever operation is underway. The
>>> program is re-executed from main() after a disruption.
>>>
>>> The objective is to implement a single bit resistant to disruption,
>>> accessed using two procedures to be written: int bRead(void); //
>>> Return 0 or 1 according to bit state. void bToggle(void); // Change
>>> the state of the bit.
>>>
>>> These two properties shall be met:
>>> a) The bit is stable in the absence of Toggle. That is, the result
>>> given by any two undisrupted bRead() is the same unless bToggle() was
>>> invoked in-between.
>>> b) The bit is changed by Toggle. That is, the result given by any two
>>> undisrupted bRead() is different if bToggle() was invoked exactly once
>>> in-between and no disruption occurred during that execution of
>>> bToggle().
>> ...
>> > An EEPROM cell can be left neither erased nor programmed by a
>> > disrupted eErase unless that cell was already erased, or by a
>> > disrupted eProgram. For a cell in such halfway state, eRead returns
>> > a value specified only to be 0 or 1. That otherwise unspecified
>> > value may in particular change after disruption (due e.g. to
>> > uncontrollable random noise and/or temperature and/or voltage drift
>> > in the hardware used by eRead), even though the sate of cells is
>> > unchanged by disruption or/and eRead.
>> >
>> > Note: if a cell is not erased (left in a halfway state or
>> > programmed), an eProgram of that cell is prohibited. A workaround is
>> > to use the
>>
>> First, this is an uninteresting problem because if a power failure
>> occurs during a toggle attempt, the restored state gives no information
>> at all.
>>
>> Consider the following alternate problem that guarantees to restore
>> some information.
>>
>> Let a variable have 3 stable states s1, s2, and s3. A stable state is
>> preserved across power losses. Assume that the state always changes s1
>> to s2, s2 to s3, and s3 to s1 (mod 3 counter). Since transitions are
>> not instantaneous, there are 3 transient states: t12, t23, and t31. How
>> many bits, using your rules, are required to, on power restoration,
>> recover either the preceding stable state, or if failure during
>> transition, eith the preceding of succeeding stable state? This
>> explicitly prohibits restoring the third state.
>>
>> This can be done with 3 independent bits: B3B2B1
>> s1 0 0 1
>> t12 0 1 1
>> s2 0 1 0
>> t23 1 1 0
>> s3 1 0 0
>> t31 1 0 1
>>
>> Transition from s1 to s2: write B2, then erase B1, etc.
>>
>> Recovery from reset:
>> If in a stable state, erase all zero bits. This makes the stable state
>> robust.
>> If in a transient state, erase the bit to revert to the previous stable
>> state.
>>
>> The only remaining problem is to getting a robust first state, since if
>> the power is lost when first attempting to write state s1 and the B1 is
>> a weak 1, it will not be restored. You thus need a robust
>> initialization to the first state.
>>
>> Proof: There is either one or two one bits set, since once two bits are
>> set an erasure precedes writing the other bit. If failure during a
>> transition before the written bit becomes readable, it is erased on
>> recovery, making it stable. If the transition state is read on restart,
>> the new bit may be weak. It is erased when the previous state is
>> restored. If a failure occurs during a the second step of the
>> transition when the erasure was not complete, but intially read as
>> zero, the rule for zero erasure makes the new state robust.
>
> I think there is a flaw here. Suppose you are moving from transition
> state t12 to s2 when you are interrupted. In this case bit b1 will be
> in a halfway state when you come back up. Suppose when you come back up
> you read the state as t12. You then follow your procedure to revert to
> state s1 by erasing b2. You are then interrupted again. When you come
> back, you read all bits 0 because bit b1 was actually in a halfway
> state.

I had exactly the same coding, and I am suffering exactly the same problems at recovery.

The thing that counts is, that (given that at any time *only one bit can be halfway*)
- for the "stable" patters with one 1 bit the 1 bit can be trusted, and one of the 0s may
actually be halfway
- for the "transient" patterns with two 1s, only the 0 can be trusted, and one
of the 1s may actually be halfway.

At recovery: in the t31 (101) case above, the intended state is s3 (100),
and the way to get there is obviously to erase the rightmost bit (followed
by an additional erase of the middle bit)
This will work fine if the rightmost bit was actually halfway, but
if the leftmost bit was halfway, the next state *may* become 000,
(which in my implementation is a "forbidden" uninitialized state,
which steps forward to s1 (001), which is not what we want.
I am a bit hesitating to set the middle bit, since that is the only trusted bit
in the t31 (101) state.

So the real problem becomes: there are two bits can be halfway, but not both,
and we don't know which one. To be sure we need to touch them both, in the
right order (avoiding unwanted states, which, after a crash in our recovery process,
might be picked up by a second recovery)

AvK

From: Thad Smith on 13 Feb 2010 10:13

Daniel Giaimo wrote:

>> > An EEPROM cell can be left neither erased nor programmed by a
>> disrupted
>> > eErase unless that cell was already erased, or by a disrupted
>> eProgram.
>> > For a cell in such halfway state, eRead returns a value specified only
>> > to be 0 or 1. That otherwise unspecified value may in particular
>> change
>> > after disruption (due e.g. to uncontrollable random noise and/or
>> > temperature and/or voltage drift in the hardware used by eRead), even
>> > though the sate of cells is unchanged by disruption or/and eRead.
>> >
>> > Note: if a cell is not erased (left in a halfway state or programmed),
>> > an eProgram of that cell is prohibited. A workaround is to use the
>>
>> First, this is an uninteresting problem because if a power failure
>> occurs during a toggle attempt, the restored state gives no information
>> at all.
>>
>> Consider the following alternate problem that guarantees to restore some
>> information.
>>
>> Let a variable have 3 stable states s1, s2, and s3. A stable state is
>> preserved across power losses. Assume that the state always changes s1
>> to s2, s2 to s3, and s3 to s1 (mod 3 counter). Since transitions are not
>> instantaneous, there are 3 transient states: t12, t23, and t31. How many
>> bits, using your rules, are required to, on power restoration, recover
>> either the preceding stable state, or if failure during transition, eith
>> the preceding of succeeding stable state? This explicitly prohibits
>> restoring the third state.
>>
>> This can be done with 3 independent bits:
>> B3B2B1
>> s1 0 0 1
>> t12 0 1 1
>> s2 0 1 0
>> t23 1 1 0
>> s3 1 0 0
>> t31 1 0 1
>>
>> Transition from s1 to s2: write B2, then erase B1, etc.
>>
>> Recovery from reset:
>> If in a stable state, erase all zero bits. This makes the stable state
>> robust.
>> If in a transient state, erase the bit to revert to the previous stable
>> state.
>>
>> The only remaining problem is to getting a robust first state, since if
>> the power is lost when first attempting to write state s1 and the B1 is
>> a weak 1, it will not be restored. You thus need a robust initialization
>> to the first state.
>>
>> Proof: There is either one or two one bits set, since once two bits are
>> set an erasure precedes writing the other bit. If failure during a
>> transition before the written bit becomes readable, it is erased on
>> recovery, making it stable. If the transition state is read on restart,
>> the new bit may be weak. It is erased when the previous state is
>> restored. If a failure occurs during a the second step of the transition
>> when the erasure was not complete, but intially read as zero, the rule
>> for zero erasure makes the new state robust.
>
> I think there is a flaw here. Suppose you are moving from transition
> state t12 to s2 when you are interrupted. In this case bit b1 will be
> in a halfway state when you come back up. Suppose when you come back up
> you read the state as t12. You then follow your procedure to revert to
> state s1 by erasing b2. You are then interrupted again. When you come
> back, you read all bits 0 because bit b1 was actually in a halfway
> state.

Good catch, Dan. Actually what I would really do is to violate the rule against
reprogramming. If it is read as t12 on powerup I would reprogram B2, then erase
B1. I use the rationale that it is a rare occurrance (power must be lost in
brief transition period) and I expect reprogramming to reduce the lifetime
cycles of the cell, rather than immediately kill it. Also, I would add more
transition states in eliminate erasures for stable states on powerup, since
erasures reduce cell life, as well.

--
Thad

From: Thad Smith on 13 Feb 2010 19:16

Thad Smith wrote:
> Francois Grieu wrote:
>> /*
>> Disclaimer: This is a hard problem, only minimally disguised into
>> something on-topic for comp.lang.c
>
> comp.programming and comp.arch.embedded added to the list.
>
>> You are programming a C99 conforming freestanding implementation.
>> Disruption (reset or power off/on cycle) can occur at *ANY* time
>> without advance warning, and stops whatever operation is underway. The
>> program is re-executed from main() after a disruption.
>>
>> The objective is to implement a single bit resistant to disruption,
>> accessed using two procedures to be written:
>> int bRead(void); // Return 0 or 1 according to bit state.
>> void bToggle(void); // Change the state of the bit.
>>
>> These two properties shall be met:
>> a) The bit is stable in the absence of Toggle. That is, the result
>> given by any two undisrupted bRead() is the same unless bToggle() was
>> invoked in-between.
>> b) The bit is changed by Toggle. That is, the result given by any two
>> undisrupted bRead() is different if bToggle() was invoked exactly once
>> in-between and no disruption occurred during that execution of bToggle().
> ....
>> An EEPROM cell can be left neither erased nor programmed by a disrupted
>> eErase unless that cell was already erased, or by a disrupted eProgram.
>> For a cell in such halfway state, eRead returns a value specified only
>> to be 0 or 1. That otherwise unspecified value may in particular change
>> after disruption (due e.g. to uncontrollable random noise and/or
>> temperature and/or voltage drift in the hardware used by eRead), even
>> though the sate of cells is unchanged by disruption or/and eRead.
>>
>> Note: if a cell is not erased (left in a halfway state or programmed),
>> an eProgram of that cell is prohibited. A workaround is to use the
>
> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no information
> at all.

I'm taking another stab at the problem, based on more analysis and an
understanding of possible use. Thanks to Daniel Giaimo for finding a problem in
the earlier proposal.

Let's assume the variable has three state categories: s1, s2, and t. s1 and s2
are stable states (will not change spontaneously). Other values are transition
values. If the variable is read in a state t, the variable is then changed to
s1 or s2.

This can be used to perform a consistent database state update:
Assume initially database copy A is consistent and dbstate=s1 (database A is
valid). To update, make an updated copy B. At this point both A and B are
consistent. Set dbstate=s2 (database B is valid). Move B to A. Set dbstate=s1.

If power is lost while dbstate is being changed, both database copies are valid,
so restoring the state to either stable state results in a consistent database.
We just need to make sure that s1 and s2 are always stable when an
modification is in progress on copy A or B.

In the following list of sequential states, each of the four bits are
independently erasable and writable (normally implemented in separate bytes).

Bit
dbstate dcba Stable bits
s1 0001 ca
t1a 0011 da
t1b 0111 db
t1c 0110 db
s2 0010 ba
t2a 1010 cb
t2b 1011 dc
t2c 1001 ca
(s1) 0001 ca

To go from state s1 to s2 or s2 to s1, we first re-erase the last bit erased,
then sequentially write the variable to the following transition and stable
states, changing a single bit at a time.

When reading dbstate, if value is s1 or s2, return the value.
If value is txa or txb, erase and rewrite the last bit set, then precede to the
next stable state. If value is txc, re-erase the last bit erased, then precede
to the next stable state.

If power fails when in a transition state, a reading of the state with the above
rules will restore dbstate to a stable value. Multiple interrupts during a
restoration can cause the state values to go from t1b to t1a to s1, but it
doesn't revert any further (similar for t2b to t2a to s2).

Assume power fails during the t1c-s2 or t2c-s1 transition such that the powerup
state reads s2 or s1 with a partially erased bit (d for s1, c for s2). If the
partially erased bit is read as zero, dbstate will be considered stable, even
though there is an unstable bit. The state may revert to the previous
transition state, which, when read, will be advanced to the same stable state.
Assume d is a weak zero in s1 when dbstate starts to advance. The first
operation on advance is to re-erase d, eliminating the weak state, then start
transition to s2.

--
Thad

| Next | Last
Pages: 1 2 3
Prev: CFP: SAS'2010 - 17th International Static Analysis Symposium, Perpignan, France
Next: Get 101 Question on C, C++, C++ under Windows, MFC, DLL, COM/DOM