From: David Abrahams on 2 Sep 2005 10:39

"Nicola Musatti" <nicola.musatti(a)gmail.com> writes:

> David Abrahams wrote:
> [...]
>> _Programmer errors_ lead to precondition failures which invoke
>> undefined behavior
>
> Ok. What you are saying is you don't know where you are, so your best
> option is to give up immediately, lest you cause additional damage,
> correct?

No, at least not in that sentence I'm not. All I'm saying there is what I
wrote, and not anything about how to respond.

Stopping quickly and as gracefully as possible is usually the best option.
Sometimes you can't afford that, though. For example, if your program is
running the stage lights at a rock concert, you don't want them to stop
flashing. That would be weird. However, you ought to be thinking about
alternatives, like "can I reboot the system quickly enough?" And if you're
writing critical systems like life support, you ought to be thinking about
having backup hardware in place that can take over while you shut this
hardware down.

> I see two issues that arise from this point of view: how to implement
> "giving up" and what kind of recovery can be performed.
>
> Ideally one would want to collect as much information as possible on
> what went wrong for diagnostic purposes. On Unix systems generating a
> core dump is a convenient option; on other systems it might not be so
> easy. On the system I work on

Which one, please?

> I don't have core dumps, but my debugger breaks on throws,

Unconditionally? That could severely impair debuggability of some kinds of
legitimate code.

> so at least in development builds I implement assertions by throwing
> exceptions.

If that's your only option, it's your only option. What can I say? It might
be better to invoke the debugger directly, if that's possible, though.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
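A minimal sketch of what "invoke the debugger directly" from an assertion
might look like in a development build, assuming common platform trap
facilities; the MY_ASSERT and debug_break names are invented for
illustration, not taken from the thread:

    #if defined(_MSC_VER)
    #  include <intrin.h>    // __debugbreak
    #endif
    #include <csignal>       // std::raise, SIGTRAP (POSIX)
    #include <cstdlib>       // std::abort

    // Stop right at the point of detection, under the debugger if one
    // is attached; otherwise terminate immediately.
    inline void debug_break()
    {
    #if defined(_MSC_VER)
        __debugbreak();          // MSVC intrinsic: hardware breakpoint
    #elif defined(SIGTRAP)
        std::raise(SIGTRAP);     // stops in the debugger on POSIX systems
    #else
        std::abort();            // fallback when no trap is available
    #endif
    }

    // Development-build assertion that traps instead of throwing.
    #define MY_ASSERT(cond) \
        do { if (!(cond)) debug_break(); } while (false)

The point of trapping rather than throwing is that no unwinding happens
before the debugger gets control, so the full state at the point of
detection is preserved.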
From: Bob Bell on 2 Sep 2005 12:39

Gerhard Menzl wrote:
> Bob Bell wrote:
> No, I am not trying to invalidate your point in any way. When I point
> out what I perceive as inconsistencies I do so in order to increase
> my understanding and, hopefully, achieve mutual agreement on a more
> refined level.

That's entirely reasonable; sorry if I seemed a little touchy.

> When you say that "the function shouldn't be allowed to execute a single
> instruction more", logging and asserting would be impossible.

Well, maybe I overstated things a bit there.

> "Executing as little code as possible", on the other hand, sounds
> reasonable to me and eliminates the contradiction. I still cannot
> reconcile this guideline with Dave's point that unwinding the stack is
> (almost always) wrong, but starting a separate undo/recovery mechanism
> isn't. This may be due to misunderstanding.

To me, "execute as little code as possible" and "don't throw an exception"
are completely consistent with each other, due to the fact that "throw an
exception" is equivalent to "allow any code to run." A separate recovery
mechanism is OK because it can be constrained to "execute as little code as
possible" before the system shuts down. (As for basing this recovery
mechanism on an undo mechanism, I don't have a strong opinion.)

>> If you mean that you want to avoid crashes/ungraceful shutdowns when
>> end-users use the system, I agree.
>
> That is what I am concerned about, and it's not just because of trying
> to be nice to users. The software I am currently working on is part of a
> larger system that has an abysmal record. The team was recently
> confronted with an ultimatum set by the customer along the lines of:
> you've got six more weeks until the final test; if there's one single
> crash (and from a customer's view, this includes assertions and the
> like), you're out - the end of a contract of several million dollars as
> well as dozens of jobs. From this perspective, one cannot help eyeing
> statements like "terminating the program is good because it helps
> debugging" with a certain reserve. You don't have to tell me that there
> has to be something seriously wrong with the development process to get
> into a situation like this in the first place, but unfortunately,
> development in the real world does not always take the perfect (or even
> reasonably sound) process path.

Ouch. I sympathize; sometimes what we should do is not what we're allowed
to do. I've been trying to keep the discussion on the level of what we
should do. If I were working for an employer that said "one more crash and
you're fired," I'd probably be bending over backward to make sure that, no
matter what, the system didn't crash, despite the fact that I would likely
do nothing to make the system more stable in any real sense.

>> Perhaps you should try it before deciding that it doesn't work.
>
> I am sorry if I should have created the impression that I have decided
> the approach doesn't work.

I shouldn't have put that last sentence in; I was out of line.

> I *do* make liberal use of assertions. But you have to take into account
> that the more liberally you use assertions, the more likely it is that
> you err on the other side, i.e. that you designate a condition as a
> result of a bug when in reality it is a possible, if exotic, program
> state.

There are always risks of mistakes no matter what you do, so I don't
disagree with this point.
But I weigh it against the alternative, and find the alternative worse; the
risk of not detecting bugs by leaving out assertions is worse to me than
the risk of mistaking a tolerable condition as a bug.

Keep in mind also that the more assertions there are, the more likely you
are to find a bug close to its cause.

In any case, if an assertion fires on a condition that should be tolerated,
you've still detected a bug -- the incorrect assertion. ;-)

> I am well aware that certain technical terms mean different things in
> different parts of the software engineering community, but I also think
> that redefining terms gratuitously should be avoided.

Sure, but gratuitous similarities should be avoided as well. The notion of
"class" is significantly different in C++ than it is in, say, CLOS, but
that doesn't prevent the C++ community from using the term productively.
I'm not too concerned with (what I think are) minor differences between the
term "precondition" in C++ and Eiffel. The important thing (as with
"class") is to come up with a definition that works for C++.

> How about an example? Suppose you have a telephony application with a
> phone book. The phone book module uses std::binary_search on a
> std::vector. A precondition for this algorithm is that the range be
> sorted. A bug causes an unsorted range to be passed. Leaving aside the
> fact that detecting the violation of this precondition may be a bit
> costly, how would you expect the application to react? Abort and thus
> disconnect the call in progress although it's only the phone book that
> is broken?

The phrase "although it's only the phone book that is broken" adds an
unjustified bias to the question: how do you know that only the phone book
is broken? Until you debug it, you're just hoping. I would "abort and thus
disconnect the call in progress" and then attempt to fix the bug.

I typically turn assertions off when a program is delivered, so this is
unlikely to happen to an end-user. What will happen, though, I cannot
predict. Maybe his call will work. Or maybe he'll be billed at ten times
his normal rate.

> Notify the user of the problem but let him finish the call? Would the
> latter cause the precondition to cease to be a precondition?

It would cease to be useful to call "vector must be sorted" a precondition.
If "precondition" means "a condition which must be true when a function is
called in order for a function to work", then any condition which fails but
still allows the function to work is not a precondition. (I'm including any
normal return or a thrown exception in "work".)

Roughly speaking, I can partition "conditions before a function is called"
into three groups:

1) conditions which allow a function to "succeed"
2) conditions which lead to a function "not succeeding"
3) conditions which are the result of programmer errors

The important thing is that a system must gracefully tolerate 1) and 2);
there is no requirement (in general, there can't be) for a system to
gracefully tolerate 3). We can reason about the system as long as we only
have 1) and 2); we can draw conclusions about its correctness, and make
predictions about what it will do and what new states it will enter. With
3), we cannot reason reliably about the system, and cannot predict what it
will do.

Thus, I think it's very important to distinguish 1) and 2) from 3).

Do you agree that this is a useful distinction to make (regardless of the
term used to label that distinction)?

When it turns out that the phone book vector is not sorted, we have 3).
Trying to continue running is treating it like 1) or 2).

> As for fuzzy thinking, "something's amiss somewhere" sounds more fuzzy
> to me than "something's amiss in this module/function". Sure, a bug can
> surface at a point far from its source: writing to arbitrary locations
> in memory is an example. But is it feasible always to react as if this
> were the case, although in the majority of cases the cause is probably
> to be found locally?

In my experience, yes. Most of the time, you're right; the bug is local,
and usually quite simple. Sometimes, the bug is quite nasty and takes a bit
longer; sometimes, the problem is not local at all. In any case, I stop the
program and debug it.

I don't really have a problem with deciding a priori that a bug is local or
far-reaching in scope. What I have a problem with is starting with "the bug
is probably local" as a premise and concluding "we can probably just let
the program keep running for a while."

>> One other pragmatic reason to stop the program and fix the bug the
>> moment the bug is detected is that you never know when the bug is
>> going to recur and you'll get another opportunity.
>
> What kind of scenario do you have in mind?

Any bug that is difficult to reproduce. For example, a memory corruption
bug; these often appear intermittently, and can be quite difficult to track
down. If such a bug is detected, it's better to stop now and fix it,
because who knows when it will recur again?

> If your program aborts at a remote user site, no immediate fixing is
> going to take place.

True, but what's the alternative?

> I fully agree that masking bugs and just plodding on is bad practice, I
> just doubt that aborting is the only means of preventing it.

There are several means of preventing bugs, and they all complement each
other. Assertions are just one part of the process. Assertions combined
with rigorous testing will uncover a _lot_ of bugs. (Even assertions plus
minimal testing is better than nothing.)

Not aborting when a bug is detected makes it much harder to fix.

Suppose you modify the program such that it can tolerate the bug (e.g., you
throw an exception when the bug is detected, and some caller responds to
the exception by successfully isolating the affected parts of the system,
perhaps reinitializing them). Is the condition really a bug anymore? I
don't think so; the condition, and the system's response, becomes part of
the well-defined behavior of the system. If this is the case, it doesn't
make sense to call the condition a precondition anymore.

>> Precondition failures indicate bugs, and the right thing to do is fix
>> the bug; just about the worst thing you could do is throw an
>> exception, since throwing an exception is tantamount to ignoring the
>> bug.
>
> Why do you equate throwing exceptions with ignoring bugs?

Because it allows the system to continue running.

> In my application, the top level exception handler tries to write as
> much information as possible to a log file. It then informs the user
> that a serious error has happened, that the application may be in a
> shaky state and had better be terminated, where the log file is, and
> that an administrator should be called to collect the information and
> forward it to the vendor. Admittedly, the user could choose to ignore
> the notice and carry on. But then he could also restart an aborted
> application and carry on. Or are you concerned about sloppy exception
> handling practices in larger teams?

Sloppy exception handling definitely makes things worse, but even excellent
exception handling can allow the system to continue.

In your scenario, the bug certainly isn't ignored by the user. From the
point of view of the code, however, it essentially was ignored, because
after doing all the logging and so forth, the program goes back to doing
what it did before.

If you throw an exception, it's possible for any code path in the system to
be executed, even though you know at least one path is broken, and you
don't know how extensive the damage is.

Long-windedly yours,

Bob
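To make the phone book example above concrete, here is a rough sketch of
how the sortedness precondition could be asserted where the algorithm is
used; the function name and types are invented for illustration, and
std::is_sorted is the C++11 spelling of the check (an equivalent test can
be written with std::adjacent_find in C++98):

    #include <algorithm>   // std::binary_search, std::is_sorted
    #include <cassert>
    #include <string>
    #include <vector>

    // Precondition: 'entries' is sorted in ascending order.
    // A violation here is a bug in the caller (case 3 above), not a
    // runtime condition the function is expected to tolerate.
    bool phone_book_contains(const std::vector<std::string>& entries,
                             const std::string& name)
    {
        assert(std::is_sorted(entries.begin(), entries.end()));
        return std::binary_search(entries.begin(), entries.end(), name);
    }

Whether the is_sorted check is affordable outside debug builds is exactly
the cost question Gerhard raises; with NDEBUG defined, the check disappears
along with the assert.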
From: Gerhard Menzl on 8 Sep 2005 10:05

Bob Bell wrote:

> There are always risks of mistakes no matter what you do, so I don't
> disagree with this point. But I weigh it against the alternative, and
> find the alternative worse; the risk of not detecting bugs by leaving
> out assertions is worse to me than the risk of mistaking a tolerable
> condition as a bug.
>
> Keep in mind also that the more assertions there are, the more likely
> you are to find a bug close to its cause.
>
> In any case, if an assertion fires on a condition that should be
> tolerated, you've still detected a bug -- the incorrect assertion. ;-)

No qualms about this in non-shipping code (see also my response to David in
this and all other regards). Spurious assertions at the customer's site can
do a lot of damage, though.

> The phrase "although it's only the phone book that is broken" adds an
> unjustified bias to the question: how do you know that only the phone
> book is broken? Until you debug it, you're just hoping. I would "abort
> and thus disconnect the call in progress" and then attempt to fix the
> bug.
>
> I typically turn assertions off when a program is delivered, so this
> is unlikely to happen to an end-user. What will happen, though, I
> cannot predict. Maybe his call will work. Or maybe he'll be billed at
> ten times his normal rate.

If you had stated this from the beginning, we could have saved quite some
effort and bandwidth. :-) My concerns have always been related to what
happens at the user's site. Interestingly though, the practice of turning
off assertions in shipping code has been condemned here many times.

> If "precondition" means "a condition which must be true when a
> function is called in order for a function to work", then any
> condition which fails but still allows the function to work is not a
> precondition. (I'm including any normal return or a thrown exception
> in "work".)
>
> Roughly speaking, I can partition "conditions before a function is
> called" into three groups:
>
> 1) conditions which allow a function to "succeed"
> 2) conditions which lead to a function "not succeeding"
> 3) conditions which are the result of programmer errors
>
> The important thing is that a system must gracefully tolerate 1) and
> 2); there is no requirement (in general, there can't be) for a system
> to gracefully tolerate 3). We can reason about the system as long as
> we only have 1) and 2); we can draw conclusions about its correctness,
> and make predictions about what it will do and what new states it will
> enter. With 3), we cannot reason reliably about the system, and cannot
> predict what it will do.
>
> Thus, I think it's very important to distinguish 1) and 2) from 3).
>
> Do you agree that this is a useful distinction to make (regardless of
> the term used to label that distinction)?

Absolutely. The dispute has always been about how to handle situations of
type 3.

> In my experience, yes. Most of the time, you're right; the bug is
> local, and usually quite simple. Sometimes, the bug is quite nasty and
> takes a bit longer; sometimes, the problem is not local at all. In any
> case, I stop the program and debug it.

Under lab conditions, you can. In the field, you can't, and the trade-offs
are often different.

> I don't really have a problem with deciding a priori that a bug is
> local or far-reaching in scope. What I have a problem with is starting
> with "the bug is probably local" as a premise and concluding "we can
> probably just let the program keep running for a while."

This is not and has never been my premise. Exceptions may be *abused* to
make a program behave like this, but I contest the argument that this is in
their nature. It depends on your exception handling concept. If you don't
have a good concept, your program will be hard to debug and maintain.

>> If your program aborts at a remote user site, no immediate fixing is
>> going to take place.
>
> True, but what's the alternative?

Trying to log as much as possible, notifying the user, and giving him the
chance to shut down himself and remain in charge of the situation, as
opposed to humiliating the user by pretending that a myopic piece of code
is a better judge. Yes, I am being polemic here. I am aware that sometimes
a piece of code *is* a better judge; it's just that I see a tendency among
technical people to assume this is the case in general.

> There are several means of preventing bugs, and they all complement
> each other. Assertions are just one part of the process. Assertions
> combined with rigorous testing will uncover a _lot_ of bugs. (Even
> assertions plus minimal testing is better than nothing.)
>
> Not aborting when a bug is detected makes it much harder to fix.

Under lab conditions, sure.

> Sloppy exception handling definitely makes things worse, but even
> excellent exception handling can allow the system to continue.
>
> In your scenario, the bug certainly isn't ignored by the user. From
> the point of view of the code, however, it essentially was ignored,
> because after doing all the logging and so forth, the program goes
> back to doing what it did before.
>
> If you throw an exception, it's possible for any code path in the
> system to be executed, even though you know at least one path is
> broken, and you don't know how extensive the damage is.

That depends a lot on the type of application and how you handle
exceptions. Also note that if you turn assertions off in shipping code,
your program doesn't just go back to doing what it did before, it goes on
following the broken path willingly! Surely this is worse than aborting the
broken path and taking the bet that at least backing away from the bug zone
works.

--
Gerhard Menzl

#dogma int main ()

Humans may reply by replacing the thermal post part of my e-mail address
with "kapsch" and the top level domain part with "net".
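A rough sketch of the "log as much as possible, notify the user, and leave
the shutdown decision with the user" policy described above, at the
outermost level of an application; the helpers write_crash_log,
notify_user_and_wait, and run_application are invented placeholders for
whatever logging, UI, and application code a real program would have:

    #include <exception>
    #include <iostream>
    #include <stdexcept>

    // Placeholder hooks; a real application would write a detailed log
    // file and present a proper dialog instead of using std::cerr/cin.
    void write_crash_log(const char* what)
    {
        std::cerr << "LOG: " << what << '\n';
    }

    void notify_user_and_wait(const char* what)
    {
        std::cerr << "A serious error occurred: " << what
                  << "\nThe application should be closed."
                  << " Press Enter to exit.\n";
        std::cin.get();   // the user, not the program, decides when to stop
    }

    // Stands in for the real program; throws to demonstrate the handler.
    int run_application()
    {
        throw std::runtime_error("precondition violated in phone book module");
    }

    int main()
    {
        try
        {
            return run_application();
        }
        catch (const std::exception& e)
        {
            write_crash_log(e.what());
            notify_user_and_wait(e.what());
        }
        catch (...)
        {
            write_crash_log("unknown exception");
            notify_user_and_wait("unknown exception");
        }
        return 1;
    }

Note that this is precisely the kind of translation layer David discusses
below: whether it reads as "keeping the user in charge" or as "masking a
bug" is the crux of the disagreement.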
From: Gerhard Menzl on 8 Sep 2005 10:08

David Abrahams wrote:

> If you find my point to be in contradiction with the goal of
> "executing as little code as possible," then there probably has been a
> misunderstanding. At the point you detect a violated precondition,
> there can usually only be partial recovery. You should do just enough
> work to avoid total catastrophe, if you can. At the point you detect
> a violated precondition, you don't have any way to ensure that the
> actions taken by stack unwinding will be minimal. On the other hand,
> if you have a separate mechanism for registering critical recovery
> actions -- and an agreement among components to use it -- you can
> invoke that, and avoid any noncritical actions.

I think one reason why I am having difficulties coming to terms with your
position (and why this discussion keeps on going) is that, to me, the
distinction between what *has already happened* when a violated
precondition is detected and what *is going to happen* when the function
continues nevertheless is still somewhat blurred.

There is no dispute that continuing would cause hell to break loose. After
all, that's why the author of the function specified the precondition in
the first place. However, I cannot agree with the general assumption that
hell *has already broken loose* at the point of detection, and that nothing
and nobody can be trusted anymore. Sure, it means that there is a bug that
prevents the function from achieving its goal. But does it also mean that
the same bug will interfere with the operations performed during unwinding?
I think I know what your answer to this is going to be:

> you don't know that ;-)

Do I, or better: does the original function have to? If another operation
performed during unwinding relies on the same condition, surely it will
have specified the same precondition, and if it doesn't, it should not be
affected. Admittedly, this raises the thorny issue of exceptions thrown
during unwinding, but if an exception should really leave a destructor as
an effect of a situation like this, terminate would be called, which
conforms to what you advocate anyway.

Precondition specifications aren't normally about complex global states,
they demand that certain local conditions of limited scope be met. They
don't say "the stack is uncorrupted", they say: "this particular vector
must be sorted". If it isn't, it's usually because the author of the client
forgot to sort the vector, or called another function after the sort that
push_backs an element. In well-designed programs that exhibit a high degree
of encapsulation and low coupling, this should never affect code that
doesn't rely on the sorting of the vector - unless the violation is a mere
effect of a greater mess, like a buffer overrun. But in that case, starting
a separate recovery mechanism is acting "on a wing and a prayer" as well.

Ultimately, your bet is: the precondition does not hold, great evil is
afoot, I will just manage to perform a number of critical operations, but
performing the non-critical ones would awaken more evil things, hence I
skip them and bail out. Now I will readily agree that there is a wide range
of applications for which this is just the right bet. With many types of
programs, it will be even better not to try any recovery at all. I am not
convinced, however, that this should be a general guideline, regardless of
the concrete problem domain and application type.
You have brought up the stage lighting example yourself - perhaps we just
differ on the question how rare or how frequent these applications are. And
if I remember correctly, it was a statement of mine along the lines of
having different strategies for different applications that started the
whole discussion.

> The other reason that unwinding is almost always wrong is that it is
> very prone to losing the information that a bug was detected, and
> allowing execution to proceed as though full recovery has occurred.
> All it takes is passing through a layer like this one:
>
>     try
>     {
>         ...something that detects a precondition violation...
>     }
>     catch(e1& x)
>     {
>         translate_or_report(x);
>     }
>     catch(e2& x)
>     {
>         translate_or_report(x);
>     }
>     ...
>     catch(...)
>     {
>         translate_or_report_unknown_error();
>     }
>
> which often occurs at subsystem boundaries.

I fully agree that ignoring errors and masking bugs is a bad thing and a
reason for concern. But what you are saying here is that because exceptions
might be suppressed or handled in a light-hearted way, they should not be
thrown in the first place. In other words, a function that detects a
precondition violation cannot trust a top-level component. How then can it
trust the assertion/termination mechanism? After all, it might have been
defined like this:

    void do_assert(char const* expr, char const* file, unsigned int line)
    {
        write_log(expr, file, line);
        // don't feel like terminating
    }

I don't think that the mere theoretical possibility of a component screwing
up justifies not giving it a chance. The way I understand Design by
Contract, it's a methodology that pervades the entire program. It's not a
local measure. Abstracting from the other reasons you have brought forward
against throwing exceptions on detecting precondition violations, the
handler would have to look like:

    try
    {
        // something that detects a precondition violation
    }
    catch (precondition_violation& pv)
    {
        // do whatever is appropriate for the type of application:
        // log, display a message, abort, whatever
    }

If someone gets this wrong, they are likely to get the separate violation
handler wrong as well.

This also raises the question at what level it is appropriate to decide how
to react to a violated precondition. If the proper reaction depends on the
type of application (and by bringing up your stage lighting example you
admit that it does), the decision can only be taken at a higher level, at
any rate not in a general purpose function that isn't aware of the type of
application it resides in. Otherwise, it would not even be possible for the
stage lighting controller to carry on and project random colours.

Talking of application-independent, low-level code, how are (especially
third-party) libraries supposed to handle a violated precondition? Note
that I am referring to the actual implementation here, not the interface
documentation. You can't throw an exception, because you would have to
document it, and then there wouldn't be a precondition anymore. assert() or
terminate()? Carry on and let the client taste the full consequences of its
negligence? How do you handle this at Boost?

> Your best option, if you have time for it -- and if a clean emergency
> shutdown will not be interpreted as a crash -- is to institute a
> recovery subsystem for critical things that must happen during
> emergency shutdown.

This is perfectly ok for technical people like you and me. Most customers
(those I referred to, at any rate), however, don't care about this sort of
thing.
There is no such thing as a graceful shutdown for them. If the program
stops working, it's a crash. They have little esteem even for the most
elaborate and elegant shutdown mechanism. Of course the anger will be less
if their documents get saved, compared to a real crash, where they aren't,
but they are still angry. And you know what? They are right!

I really find myself wearing two hats here: as a developer, I always take
the worst case into consideration and want the world to come to a
standstill whenever a bug is detected, but as a user's advocate and, more
still, as a user myself I don't want to be bugged by programs that decide
to drop dead.

A good example is the Mozilla family of Web browsers. Every now and again,
the otherwise much loved monster will declare that an error has happened,
and that the application will be terminated, and would I be so kind to fill
out the quality feedback form. It then dies and takes everything that is
not saved (such as things you have just typed into a Web form) with it. I
have never looked at the source of the Mozilla project, but this behaviour
looks suspiciously like abort-on-contract-breach to me. Every time this
happens, my developer's admiration for the refined bug reporting mechanism
is quickly extinguished by my user's rage. It's technology-centric
behaviour. It humiliates users. Yes, there is a theoretical possibility
that unwinding, notifying me and offering me the chance to close the
browser myself might wake a sleeping demon that goes and formats my hard
disk. But that danger is probably much higher with applications that don't
bother with DbC in the first place. In all likelihood, the worst thing that
would happen is garbage on the display.

> In non-shipping code, asserts should immediately invoke the debugger,
> and then invoke emergency recovery and shutdown. In shipping code,
> obviously, there's no debugger.

I am grateful that you make the distinction between non-shipping and
shipping code here. Let me emphasize that from the beginning of this
exchange my reservations have been related exclusively to the latter. With
non-shipping code, i.e. under lab conditions, my practice has always been
to stop immediately, so no objections there. After all, that's the standard
behaviour of the C standard library assert() on most platforms.
Automatically invoking the debugger is nice if your platform supports it
(mine causes the debugger itself to freeze in nine out of ten cases), but
that's a detail.

That leaves the question what to do in shipping code. Standard C practice
(in the sense of what most platforms seem to do - I don't know what the C
Standard says) is to let the preprocessor suppress the test and boldly
stomp into what may be disastrous. Incidentally, the Eiffel practice
(thanks for the link, by the way) seems to be similar: assertion monitoring
is usually turned off in shipping code. This is in stark contrast to what
has been frequently advocated in this newsgroup. The standard argument is:
disabling assertions in shipping code is like leaving the life jackets
ashore when you set sail. I find this metaphor rather misleading -
assertions are more like self-destruction devices than life jackets - yet
the argument cannot be dismissed so easily. What is your position on this?
Should assertions in shipping code do nothing, do the same as in
non-shipping code, or do something else? Ironically, one of the suggestions
I remember having read here is that they should throw exceptions. :-)

> Good Luck.

Thanks, but the ordeal has already been passed. I may now build in crashes
again. *g*

> That can only happen if you assert some condition that isn't in the
> called function's set of documented preconditions. If the assertion
> matches the function's documentation, then it *is* catching a bug.

I was referring to preconditions/assertions that aren't, i.e. the kind of
error where you think something always holds, only to discover there are
situations where it legitimately doesn't. In other words, the bug is in
your analysis.

> As far as I can tell, the Eiffel camp has a similar understanding.
> Because the throw-in-response-to-precondition-violation behavior can
> be turned on and off globally, you basically can't count on it. Think
> of it as one possible expression of undefined behavior.

According to the description in the Eiffel tutorial you pointed me to, the
behaviour can be specified at the class level, with higher level defaults.
It is not up to the individual function to decide. Thus, at least within
the bounds of the code you maintain and compile yourself, you tell the
runtime what to do when a contract is violated. That is, you *can* count on
it. A typical, simple strategy is to turn all checks on during development
and turn them off before you ship.

> In some languages, throwing an exception is basically the only way to
> get a debuggable stack trace. If that's the case in Eiffel, it would
> explain why they have the option to throw: it's as close as possible
> to invoking the debugger (perhaps it even does so).
>
> I should also point out that there's some variation among languages
> (and even among C++ compilers) in _when_ stack unwinding actually
> occurs. For example, in C++, if the exception is never caught, there
> may not ever be any unwinding (it's up to the implementation). In
> Python, no unwinding happens until the exception backtrace is
> _explicitly_ discarded or the next exception is thrown. I don't know
> about the details of Eiffel's exception mechanism, but all of these
> variations can have a major impact on the danger of throwing in
> response to a precondition violation. In other words, you may have to
> look a lot deeper to understand the proper relationship of Eiffel to
> C++.

Certainly. The default effect of an exception in Eiffel seems to be
termination. Whether this involves unwinding or not I cannot tell. On my
platform (.NET), an exception is a convenient way to get a stack trace.

> Absolutely. But I don't think there are as many different definitions
> as you seem to think there are. Have you found *any* definitions of
> "precondition" other than the Wikipedia one? I'm not talking about
> meanings of the word you infer from seeing it used in context. I'm
> talking about _definitions_.

I think we have reached agreement on the definition; it's the resulting
conclusions and practices, and their applicability to different types of
software, where doubts remain.

--
Gerhard Menzl

#dogma int main ()

Humans may reply by replacing the thermal post part of my e-mail address
with "kapsch" and the top level domain part with "net".
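As a concrete rendering of the "what should assertions do in shipping
code" question, one common arrangement is a build-time policy switch; the
macro names below are invented for illustration, and the three branches
correspond to the three options just listed (do the same as in development,
do something else such as log-only, or do nothing):

    #include <cstdio>
    #include <cstdlib>

    #if defined(DEVELOPMENT_BUILD)
        // Non-shipping code: stop immediately, as close to the bug as possible.
    #   define APP_ASSERT(cond) \
            do { if (!(cond)) { \
                std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                             #cond, __FILE__, __LINE__); \
                std::abort(); } } while (false)
    #elif defined(SHIPPING_LOG_ONLY)
        // Shipping code, variant: record the violation but keep going.
    #   define APP_ASSERT(cond) \
            do { if (!(cond)) \
                std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                             #cond, __FILE__, __LINE__); } while (false)
    #else
        // Shipping code, classic C practice: the check is compiled out.
    #   define APP_ASSERT(cond) ((void)0)
    #endif

Which branch is right for a given program is exactly the policy question
debated in this thread; the sketch only shows that the choice can be made
in one place rather than at each call site.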
From: David Abrahams on 12 Sep 2005 13:27
Gerhard Menzl <gerhard.menzl(a)hotmail.com> writes:

> David Abrahams wrote:
>
>> If you find my point to be in contradiction with the goal of
>> "executing as little code as possible," then there probably has been a
>> misunderstanding. At the point you detect a violated precondition,
>> there can usually only be partial recovery. You should do just enough
>> work to avoid total catastrophe, if you can. At the point you detect
>> a violated precondition, you don't have any way to ensure that the
>> actions taken by stack unwinding will be minimal. On the other hand,
>> if you have a separate mechanism for registering critical recovery
>> actions -- and an agreement among components to use it -- you can
>> invoke that, and avoid any noncritical actions.
>
> I think one reason why I am having difficulties coming to terms with
> your position (and why this discussion keeps on going) is that, to
> me, the distinction between what *has already happened* when a
> violated precondition is detected and what *is going to happen* when
> the function continues nevertheless is still somewhat blurred.

Of course it is blurry, because you don't know anything about the severity
of the problem once a violated precondition is detected. Your options are
to be conservative and take emergency shutdown measures, continue --
whether by going forward or unwinding to some other place in the code --
and hope the breakage is not too bad, or sit somewhere in between, placing
bets on a case-by-case basis.

> There is no dispute that continuing would cause hell to break loose.
> After all, that's why the author of the function specified the
> precondition in the first place. However, I cannot agree with the
> general assumption that hell *has already broken loose* at the point
> of detection, and that nothing and nobody can be trusted anymore.

It is certainly true that hell has already broken loose. By the time the
violation is detected, somebody somewhere has already done something they
were told not to do. Whether or not anything can be trusted is a matter of
opinion; you can make your own judgements.

> Sure, it means that there is a bug that prevents the function from
> achieving its goal. But does it also mean that the same bug will
> interfere with the operations performed during unwinding? I think I
> know what your answer to this is going to be:
>
>> you don't know that ;-)

Exactly.

> Do I, or better: does the original function have to?

If it is going to make guarantees about robustness in the face of these
conditions, then yes. That's where we started this discussion: you wanted
to provide documented guarantees of behavior in the face of violated
preconditions.

If the original function is going to guess about the brokenness of the
context in which it was called, then no, it doesn't need to know. However,
as I've repeatedly said, the called function usually has little or no
knowledge about the context in which it is called, so it's very difficult
to make an educated guess.

> Precondition specifications aren't normally about complex global states,
> they demand that certain local conditions of limited scope be met.

Exactly. That's what makes the detecting code particularly unsuited to
making educated guesses about severity.

> They don't say "the stack is uncorrupted", they say: "this
> particular vector must be sorted". If it isn't, it's usually because
> the author of the client forgot to sort the vector, or called
> another function after the sort that push_backs an element.

On what do you base that assessment?
Do you have data, or is it just intuition?

> In well-designed programs that exhibit a high degree of
> encapsulation and low coupling, this should never affect code that
> doesn't rely on the sorting of the vector - unless the violation is
> a mere effect of a greater mess, like a buffer overrun.

True, but what makes you think that sortedness is not part of some much
larger global invariant? The sortedness of the vector might be fundamental
to the operation of most of the program.

> But in that case, starting a separate recovery mechanism is acting
> "on a wing and a prayer" as well.

Exactly. At that point, everything is a shot in the dark. I bet on the
recovery mechanism avoiding total catastrophe because it's the best I can
do.

> Ultimately, your bet is: the precondition does not hold, great evil
> is afoot, I will just manage to perform a number of critical
> operations, but performing the non-critical ones would awaken more
> evil things, hence I skip them and bail out.

Right.

> Now I will readily agree that there is a wide range of applications
> for which this is just the right bet. With many types of programs,
> it will be even better not to try any recovery at all. I am not
> convinced, however, that this should be a general guideline,
> regardless of the concrete problem domain and application type. You
> have brought up the stage lighting example yourself - perhaps we
> just differ on the question how rare or how frequent these
> applications are.

Maybe, maybe not. Of course it's a matter of degree.

Programmers in general seldom make the distinction carefully between
violated preconditions and conditions that are known to be recoverable. You
yourself seem to have had that problem. The pull to throw from a violated
precondition, and hope that code somewhere else can deal with the problem,
is quite strong. We're loath to admit that the program is broken, so we bet
that something can be done about it elsewhere.

Once you start trying to unwind-and-continue from a violated precondition,
you -- or someone on your team -- will typically begin to add code for
defensive programming (which has a high development cost and often doesn't
actually work), because you now have to make the program "work" even in a
broken state.

When I say, "it's almost always a mistake to throw from a violated
precondition," I am addressing that problem: I want people to think much
more carefully about the consequences and be much more conservative about
the idea of doing so. If you determine, for whatever reason, that your
application is better off betting that things "aren't broken too badly,"
you should still design the program as though preconditions are never
actually violated. In other words, the program should not count on these
exceptions and expect to respond to them in useful ways. Anything else
leads to a mess.

> And if I remember correctly, it was a statement of mine along the
> lines of having different strategies for different applications that
> started the whole discussion.

If you did make such a remark, that wasn't what prompted me to get
involved. It was the blurring of the notion of precondition that incited my
interest.

>> The other reason that unwinding is almost always wrong is that it is
>> very prone to losing the information that a bug was detected, and
>> allowing execution to proceed as though full recovery has occurred.
>> All it takes is passing through a layer like this one:
>>
>>     try
>>     {
>>         ...something that detects a precondition violation...
>>     }
>>     catch(e1& x)
>>     {
>>         translate_or_report(x);
>>     }
>>     catch(e2& x)
>>     {
>>         translate_or_report(x);
>>     }
>>     ...
>>     catch(...)
>>     {
>>         translate_or_report_unknown_error();
>>     }
>>
>> which often occurs at subsystem boundaries.
>
> I fully agree that ignoring errors and masking bugs is a bad thing and a
> reason for concern. But what you are saying here is that because
> exceptions might be suppressed or handled in a light-hearted way, they
> should not be thrown in the first place. In other words, a function that
> detects a precondition violation cannot trust a top-level component.

No, that's not the issue. In general, catch blocks like the one above do
the right thing. They're not swallowing errors. In this case the
precondition violation gets treated like a recoverable error simply because
its exception passes through a translation layer. At such a language or
subsystem boundary, even if the programmer can anticipate the "violated
precondition" exception type thrown by low level code, what would you have
him do? What sort of response would you deem "trustworthy?"

> I don't think that the mere theoretical possibility of a component
> screwing up justifies not giving it a chance. The way I understand
> Design by Contract, it's a methodology that pervades the entire program.
> It's not a local measure. Abstracting from the other reasons you have
> brought forward against throwing exceptions on detecting precondition
> violations, the handler would have to look like:
>
>     try
>     {
>         // something that detects a precondition violation
>     }
>     catch (precondition_violation& pv)
>     {
>         // do whatever is appropriate for the type of application:
>         // log, display a message, abort, whatever
>     }

These layers occur in libraries, where we might not know what's appropriate
for the application.

> This also raises the question at what level it is appropriate to decide
> how to react to a violated precondition. If the proper reaction depends
> on the type of application (and by bringing up your stage lighting
> example you admit that it does), the decision can only be taken at a
> higher level, at any rate not in a general purpose function that isn't
> aware of the type of application it resides in. Otherwise, it would not
> even be possible for the stage lighting controller to carry on and
> project random colours.

Yes. See BOOST_ASSERT, which I think uses a sensible approach.

> Talking of application-independent, low-level code, how are (especially
> third-party) libraries supposed to handle a violated precondition? Note
> that I am referring to the actual implementation here, not the interface
> documentation. You can't throw an exception, because you would have to
> document it, and then there wouldn't be a precondition anymore. assert()
> or terminate()? Carry on and let the client taste the full consequences
> of its negligence? How do you handle this at Boost?

Ditto. :)

>> Your best option, if you have time for it -- and if a clean emergency
>> shutdown will not be interpreted as a crash -- is to institute a
>> recovery subsystem for critical things that must happen during
>> emergency shutdown.
>
> This is perfectly ok for technical people like you and me. Most
> customers (those I referred to, at any rate), however, don't care about
> this sort of thing. There is no such thing as a graceful shutdown for
> them. If the program stops working, it's a crash.

Okay, so in your case a clean emergency shutdown will be interpreted as a
crash.
I once added that strategy to a program that was historically unstable, and
my customers were immensely grateful that their work wasn't being lost. Of
course I fixed a lot of bugs too, but didn't manage to nail all of them.
However, the clean emergency shutdown, along with users sending me their
files and a description of the action that caused the crash, allowed me to
go on fixing more problems until the program was in pretty good shape.

> They have little esteem even for the most elaborate and elegant
> shutdown mechanism. Of course the anger will be less if their
> documents get saved, compared to a real crash, where they aren't,
> but they are still angry. And you know what? They are right!

Sure. The way to avoid that is to eliminate bugs from the program, not to
try to hobble along anyway when a bug is detected. The customer will
usually be just as angry when the program doesn't behave as expected
because some internal assumption is violated. And you know what? They are
right!

Hobbling along can even be dangerous for the customer's data, since for
them, too, the pull not to admit something is really broken is strong. Who
knows what effect the next attempted editing operation or transaction will
actually have?

> I really find myself wearing two hats here: as a developer, I always
> take the worst case into consideration and want the world to come to
> a standstill whenever a bug is detected, but as a user's advocate
> and, more still, as a user myself I don't want to be bugged by
> programs that decide to drop dead.

Dropping dead is not the same as a clean emergency shutdown.

> A good example is the Mozilla family of Web browsers. Every now and
> again, the otherwise much loved monster will declare that an error
> has happened, and that the application will be terminated, and would
> I be so kind to fill out the quality feedback form. It then dies and
> takes everything that is not saved (such as things you have just
> typed into a Web form) with it. I have never looked at the source of
> the Mozilla project, but this behaviour looks suspiciously like
> abort-on-contract-breach to me. Every time this happens, my
> developer's admiration for the refined bug reporting mechanism is
> quickly extinguished by my user's rage. It's technology-centric
> behaviour. It humiliates users. Yes, there is a theoretical
> possibility that unwinding, notifying me and offering me the chance
> to close the browser myself might wake a sleeping demon that goes
> and formats my hard disk. But that danger is probably much higher
> with applications that don't bother with DbC in the first place. In
> all likelihood, the worst thing that would happen is garbage on the
> display.

Maybe. You don't really know that, do you?

Anyway, browsers are an unusual case, since they're primarily viewers. If
they don't contain some carefully-crafted bit of the user's work that will
be lost on a crash, it's probably okay... hmm, but wait: there's webmail.
So I could lose this whole message unless it gets written to disk and I get
the chance to start over. Oh, and there are all kinds of scam sites that
masquerade as secure and trustworthy, which might be easily mistaken for
legit if the user begins to overlook garbage on the screen. As a matter of
fact, security is a big deal for web browsers. They're used in all kinds of
critical applications, including banking. Oh, and there are plugins, which
could be malicious and might be the cause of the violation detected.

No, I don't think we want the browser pressing ahead when it detects a bug,
not at all. I think this is a perfect case in point.

> That leaves the question what to do in shipping code. Standard C
> practice (in the sense of what most platforms seem to do - I don't
> know what the C Standard says) is to let the preprocessor suppress
> the test and boldly stomp into what may be disastrous. Incidentally,
> the Eiffel practice (thanks for the link, by the way) seems to be
> similar: assertion monitoring is usually turned off in shipping
> code.

That can be a good policy, because programmers concerned about efficiency
will never be deterred from writing assertions on the basis of slowing down
the shipping program.

> This is in stark contrast to what has been frequently advocated in
> this newsgroup. The standard argument is: disabling assertions in
> shipping code is like leaving the life jackets ashore when you set
> sail.

One or two vocal advocates of that approach do not a consensus make. I've
never agreed with it.

> I find this metaphor rather misleading - assertions are more like
> self-destruction devices than life jackets - yet the argument cannot
> be dismissed so easily. What is your position on this? Should
> assertions in shipping code do nothing, do the same as in
> non-shipping code, or do something else?

The correct policy depends on the application and your degree of confidence
in the code.

> Ironically, one of the suggestions I remember having read here is
> that they should throw exceptions. :-)

There are lots of different opinions out there. I'm sure you could find an
advocate for anything if you look hard enough.

>> That can only happen if you assert some condition that isn't in the
>> called function's set of documented preconditions. If the assertion
>> matches the function's documentation, then it *is* catching a bug.
>
> I was referring to preconditions/assertions that aren't, i.e. the kind
> of error where you think something always holds only to discover there
> are situations where it legitimately doesn't. In other words, the bug is
> in your analysis.

Yeah, I was accounting for that possibility. It's still a bug.

>> Have you found *any* definitions of "precondition" other than the
>> Wikipedia one? I'm not talking about meanings of the word you
>> infer from seeing it used in context. I'm talking about
>> _definitions_.
>
> I think we have reached agreement on the definition;

Well, that's progress, anyway.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
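For reference, the BOOST_ASSERT approach David points to above puts the
policy decision at the application level: when the project defines
BOOST_ENABLE_ASSERT_HANDLER, a failed BOOST_ASSERT calls a user-supplied
boost::assertion_failed instead of the standard assert machinery, so the
library that detects the violation does not choose the response. A minimal
sketch of such a handler follows; the abort policy and the lookup example
are just one choice an application might make, not a recommendation from
the thread:

    #define BOOST_ENABLE_ASSERT_HANDLER   // normally set project-wide
    #include <boost/assert.hpp>
    #include <cstdio>
    #include <cstdlib>

    namespace boost
    {
        // Called by BOOST_ASSERT(expr) whenever 'expr' evaluates to false.
        // The application, not the library that detected the violation,
        // decides here what a broken precondition should mean.
        void assertion_failed(char const* expr, char const* function,
                              char const* file, long line)
        {
            std::fprintf(stderr, "precondition violated: %s in %s (%s:%ld)\n",
                         expr, function, file, line);
            std::abort();   // or: emergency shutdown, debugger break, logging
        }
    }

    void lookup(int* p)
    {
        BOOST_ASSERT(p != 0);   // dispatches to the handler above on failure
        // ...
    }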