From: David Abrahams on 2 Sep 2005 10:39

"Nicola Musatti" <nicola.musatti(a)gmail.com> writes:

> David Abrahams wrote:
> [...]
>> _Programmer errors_ lead to precondition failures which invoke
>> undefined behavior
>
> Ok. What you are saying is you don't know where you are, so your best
> option is to give up immediately, lest you cause additional damage,
> correct?

No, at least not in that sentence I'm not. All I'm saying there is what I
wrote, and not anything about how to respond.

Stopping quickly and as gracefully as possible is usually the best option.
Sometimes you can't afford that, though. For example, if your program is
running the stage lights at a rock concert, you don't want them to stop
flashing. That would be weird. However, you ought to be thinking about
alternatives, like "can I reboot the system quickly enough?" And if you're
writing critical systems like life support, you ought to be thinking about
having backup hardware in place that can take over while you shut this
hardware down.

> I see two issues that arise from this point of view: how to implement
> "giving up" and what kind of recovery can be performed.
>
> Ideally one would want to collect as much information as possible on
> what went wrong for diagnostic purposes. On Unix systems generating a
> core dump is a convenient option; on other systems it might not be so
> easy. On the system I work on

Which one, please?

> I don't have core dumps, but my debugger breaks on throws,

Unconditionally? That could severely impair debuggability of some kinds of
legitimate code.

> so at least in development builds I implement assertions by throwing
> exceptions.

If that's your only option, it's your only option. What can I say? It might
be better to invoke the debugger directly, if that's possible, though.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
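A minimal sketch of what "invoke the debugger directly" from an assertion
might look like in a development build, assuming common platform trap
facilities; the MY_ASSERT and debug_break names are invented for
illustration, not taken from the thread:

    #if defined(_MSC_VER)
    #  include <intrin.h>    // __debugbreak
    #endif
    #include <csignal>       // std::raise, SIGTRAP (POSIX)
    #include <cstdlib>       // std::abort

    // Stop right at the point of detection, under the debugger if one
    // is attached; otherwise terminate immediately.
    inline void debug_break()
    {
    #if defined(_MSC_VER)
        __debugbreak();          // MSVC intrinsic: hardware breakpoint
    #elif defined(SIGTRAP)
        std::raise(SIGTRAP);     // stops in the debugger on POSIX systems
    #else
        std::abort();            // fallback when no trap is available
    #endif
    }

    // Development-build assertion that traps instead of throwing.
    #define MY_ASSERT(cond) \
        do { if (!(cond)) debug_break(); } while (false)

The point of trapping rather than throwing is that no unwinding happens
before the debugger gets control, so the full state at the point of
detection is preserved.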
From: Bob Bell on 2 Sep 2005 12:39

Gerhard Menzl wrote:
> Bob Bell wrote:
> No, I am not trying to invalidate your point in any way. When I point
> out what I perceive as inconsistencies I do so in order to increase
> my understanding and, hopefully, achieve mutual agreement on a more
> refined level.

That's entirely reasonable; sorry if I seemed a little touchy.

> When you say that "the function shouldn't be allowed to execute a single
> instruction more", logging and asserting would be impossible.

Well, maybe I overstated things a bit there.

> "Executing as little code as possible", on the other hand, sounds
> reasonable to me and eliminates the contradiction. I still cannot
> reconcile this guideline with Dave's point that unwinding the stack is
> (almost always) wrong, but starting a separate undo/recovery mechanism
> isn't. This may be due to misunderstanding.

To me, "execute as little code as possible" and "don't throw an exception"
are completely consistent with each other, due to the fact that "throw an
exception" is equivalent to "allow any code to run." A separate recovery
mechanism is OK because it can be constrained to "execute as little code as
possible" before the system shuts down. (As for basing this recovery
mechanism on an undo mechanism, I don't have a strong opinion.)

>> If you mean that you want to avoid crashes/ungraceful shutdowns when
>> end-users use the system, I agree.
>
> That is what I am concerned about, and it's not just because of trying
> to be nice to users. The software I am currently working on is part of a
> larger system that has an abysmal record. The team was recently
> confronted with an ultimatum set by the customer along the lines of:
> you've got six more weeks until the final test; if there's one single
> crash (and from a customer's view, this includes assertions and the
> like), you're out - the end of a contract of several million dollars as
> well as dozens of jobs. From this perspective, one cannot help eyeing
> statements like "terminating the program is good because it helps
> debugging" with a certain reserve. You don't have to tell me that there
> has to be something seriously wrong with the development process to get
> into a situation like this in the first place, but unfortunately,
> development in the real world does not always take the perfect (or even
> reasonably sound) process path.

Ouch. I sympathize; sometimes what we should do is not what we're allowed
to do. I've been trying to keep the discussion on the level of what we
should do. If I were working for an employer that said "one more crash and
you're fired," I'd probably be bending over backward to make sure that, no
matter what, the system didn't crash, despite the fact that I would likely
do nothing to make the system more stable in any real sense.

>> Perhaps you should try it before deciding that it doesn't work.
>
> I am sorry if I should have created the impression that I have decided
> the approach doesn't work.

I shouldn't have put that last sentence in; I was out of line.

> I *do* make liberal use of assertions. But you have to take into account
> that the more liberally you use assertions, the more likely it is that
> you err on the other side, i.e. that you designate a condition as a
> result of a bug when in reality it is a possible, if exotic, program
> state.

There are always risks of mistakes no matter what you do, so I don't
disagree with this point.
But I weigh it against the alternative, and find the alternative worse; the
risk of not detecting bugs by leaving out assertions is worse to me than
the risk of mistaking a tolerable condition as a bug.

Keep in mind also that the more assertions there are, the more likely you
are to find a bug close to its cause.

In any case, if an assertion fires on a condition that should be tolerated,
you've still detected a bug -- the incorrect assertion. ;-)

> I am well aware that certain technical terms mean different things in
> different parts of the software engineering community, but I also think
> that redefining terms gratuitously should be avoided.

Sure, but gratuitous similarities should be avoided as well. The notion of
"class" is significantly different in C++ than it is in, say, CLOS, but
that doesn't prevent the C++ community from using the term productively.
I'm not too concerned with (what I think are) minor differences between the
term "precondition" in C++ and Eiffel. The important thing (as with
"class") is to come up with a definition that works for C++.

> How about an example? Suppose you have a telephony application with a
> phone book. The phone book module uses std::binary_search on a
> std::vector. A precondition for this algorithm is that the range be
> sorted. A bug causes an unsorted range to be passed. Leaving aside the
> fact that detecting the violation of this precondition may be a bit
> costly, how would you expect the application to react? Abort and thus
> disconnect the call in progress although it's only the phone book that
> is broken?

The phrase "although it's only the phone book that is broken" adds an
unjustified bias to the question: how do you know that only the phone book
is broken? Until you debug it, you're just hoping. I would "abort and thus
disconnect the call in progress" and then attempt to fix the bug.

I typically turn assertions off when a program is delivered, so this is
unlikely to happen to an end-user. What will happen, though, I cannot
predict. Maybe his call will work. Or maybe he'll be billed at ten times
his normal rate.

> Notify the user of the problem but let him finish the call? Would the
> latter cause the precondition to cease to be a precondition?

It would cease to be useful to call "vector must be sorted" a precondition.
If "precondition" means "a condition which must be true when a function is
called in order for a function to work", then any condition which fails but
still allows the function to work is not a precondition. (I'm including any
normal return or a thrown exception in "work".)

Roughly speaking, I can partition "conditions before a function is called"
into three groups:

1) conditions which allow a function to "succeed"
2) conditions which lead to a function "not succeeding"
3) conditions which are the result of programmer errors

The important thing is that a system must gracefully tolerate 1) and 2);
there is no requirement (in general, there can't be) for a system to
gracefully tolerate 3). We can reason about the system as long as we only
have 1) and 2); we can draw conclusions about its correctness, and make
predictions about what it will do and what new states it will enter. With
3), we cannot reason reliably about the system, and cannot predict what it
will do.

Thus, I think it's very important to distinguish 1) and 2) from 3).

Do you agree that this is a useful distinction to make (regardless of the
term used to label that distinction)?

When it turns out that the phone book vector is not sorted, we have 3).
Trying to continue running is treating it like 1) or 2).

> As for fuzzy thinking, "something's amiss somewhere" sounds more fuzzy
> to me than "something's amiss in this module/function". Sure, a bug can
> surface at a point far from its source: writing to arbitrary locations
> in memory is an example. But is it feasible always to react as if this
> were the case, although in the majority of cases the cause is probably
> to be found locally?

In my experience, yes. Most of the time, you're right; the bug is local,
and usually quite simple. Sometimes, the bug is quite nasty and takes a bit
longer; sometimes, the problem is not local at all. In any case, I stop the
program and debug it.

I don't really have a problem with deciding a priori that a bug is local or
far-reaching in scope. What I have a problem with is starting with "the bug
is probably local" as a premise and concluding "we can probably just let
the program keep running for a while."

>> One other pragmatic reason to stop the program and fix the bug the
>> moment the bug is detected is that you never know when the bug is
>> going to recur and you'll get another opportunity.
>
> What kind of scenario do you have in mind?

Any bug that is difficult to reproduce. For example, a memory corruption
bug; these often appear intermittently, and can be quite difficult to track
down. If such a bug is detected, it's better to stop now and fix it,
because who knows when it will recur again?

> If your program aborts at a remote user site, no immediate fixing is
> going to take place.

True, but what's the alternative?

> I fully agree that masking bugs and just plodding on is bad practice, I
> just doubt that aborting is the only means of preventing it.

There are several means of preventing bugs, and they all complement each
other. Assertions are just one part of the process. Assertions combined
with rigorous testing will uncover a _lot_ of bugs. (Even assertions plus
minimal testing is better than nothing.)

Not aborting when a bug is detected makes it much harder to fix.

Suppose you modify the program such that it can tolerate the bug (e.g., you
throw an exception when the bug is detected, and some caller responds to
the exception by successfully isolating the affected parts of the system,
perhaps reinitializing them). Is the condition really a bug anymore? I
don't think so; the condition, and the system's response, becomes part of
the well-defined behavior of the system. If this is the case, it doesn't
make sense to call the condition a precondition anymore.

>> Precondition failures indicate bugs, and the right thing to do is fix
>> the bug; just about the worst thing you could do is throw an
>> exception, since throwing an exception is tantamount to ignoring the
>> bug.
>
> Why do you equate throwing exceptions with ignoring bugs?

Because it allows the system to continue running.

> In my application, the top level exception handler tries to write as
> much information as possible to a log file. It then informs the user
> that a serious error has happened, that the application may be in a
> shaky state and had better be terminated, where the log file is, and
> that an administrator should be called to collect the information and
> forward it to the vendor. Admittedly, the user could choose to ignore
> the notice and carry on. But then he could also restart an aborted
> application and carry on. Or are you concerned about sloppy exception
> handling practices in larger teams?

Sloppy exception handling definitely makes things worse, but even excellent
exception handling can allow the system to continue.

In your scenario, the bug certainly isn't ignored by the user. From the
point of view of the code, however, it essentially was ignored, because
after doing all the logging and so forth, the program goes back to doing
what it did before.

If you throw an exception, it's possible for any code path in the system to
be executed, even though you know at least one path is broken, and you
don't know how extensive the damage is.

Long-windedly yours,

Bob
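To make the phone book example above concrete, here is a rough sketch of
how the sortedness precondition could be asserted where the algorithm is
used; the function name and types are invented for illustration, and
std::is_sorted is the C++11 spelling of the check (an equivalent test can
be written with std::adjacent_find in C++98):

    #include <algorithm>   // std::binary_search, std::is_sorted
    #include <cassert>
    #include <string>
    #include <vector>

    // Precondition: 'entries' is sorted in ascending order.
    // A violation here is a bug in the caller (case 3 above), not a
    // runtime condition the function is expected to tolerate.
    bool phone_book_contains(const std::vector<std::string>& entries,
                             const std::string& name)
    {
        assert(std::is_sorted(entries.begin(), entries.end()));
        return std::binary_search(entries.begin(), entries.end(), name);
    }

Whether the is_sorted check is affordable outside debug builds is exactly
the cost question Gerhard raises; with NDEBUG defined, the check disappears
along with the assert.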
From: Gerhard Menzl on 8 Sep 2005 10:05

Bob Bell wrote:

> There are always risks of mistakes no matter what you do, so I don't
> disagree with this point. But I weigh it against the alternative, and
> find the alternative worse; the risk of not detecting bugs by leaving
> out assertions is worse to me than the risk of mistaking a tolerable
> condition as a bug.
>
> Keep in mind also that the more assertions there are, the more likely
> you are to find a bug close to its cause.
>
> In any case, if an assertion fires on a condition that should be
> tolerated, you've still detected a bug -- the incorrect assertion. ;-)

No qualms about this in non-shipping code (see also my response to David in
this and all other regards). Spurious assertions at the customer's site can
do a lot of damage, though.

> The phrase "although it's only the phone book that is broken" adds an
> unjustified bias to the question: how do you know that only the phone
> book is broken? Until you debug it, you're just hoping. I would "abort
> and thus disconnect the call in progress" and then attempt to fix the
> bug.
>
> I typically turn assertions off when a program is delivered, so this
> is unlikely to happen to an end-user. What will happen, though, I
> cannot predict. Maybe his call will work. Or maybe he'll be billed at
> ten times his normal rate.

If you had stated this from the beginning, we could have saved quite some
effort and bandwidth. :-) My concerns have always been related to what
happens at the user's site. Interestingly though, the practice of turning
off assertions in shipping code has been condemned here many times.

> If "precondition" means "a condition which must be true when a
> function is called in order for a function to work", then any
> condition which fails but still allows the function to work is not a
> precondition. (I'm including any normal return or a thrown exception
> in "work".)
>
> Roughly speaking, I can partition "conditions before a function is
> called" into three groups:
>
> 1) conditions which allow a function to "succeed"
> 2) conditions which lead to a function "not succeeding"
> 3) conditions which are the result of programmer errors
>
> The important thing is that a system must gracefully tolerate 1) and
> 2); there is no requirement (in general, there can't be) for a system
> to gracefully tolerate 3). We can reason about the system as long as
> we only have 1) and 2); we can draw conclusions about its correctness,
> and make predictions about what it will do and what new states it will
> enter. With 3), we cannot reason reliably about the system, and cannot
> predict what it will do.
>
> Thus, I think it's very important to distinguish 1) and 2) from 3).
>
> Do you agree that this is a useful distinction to make (regardless of
> the term used to label that distinction)?

Absolutely. The dispute has always been about how to handle situations of
type 3.

> In my experience, yes. Most of the time, you're right; the bug is
> local, and usually quite simple. Sometimes, the bug is quite nasty and
> takes a bit longer; sometimes, the problem is not local at all. In any
> case, I stop the program and debug it.

Under lab conditions, you can. In the field, you can't, and the trade-offs
are often different.

> I don't really have a problem with deciding a priori that a bug is
> local or far-reaching in scope. What I have a problem with is starting
> with "the bug is probably local" as a premise and concluding "we can
> probably just let the program keep running for a while."

This is not and has never been my premise. Exceptions may be *abused* to
make a program behave like this, but I contest the argument that this is in
their nature. It depends on your exception handling concept. If you don't
have a good concept, your program will be hard to debug and maintain.

>> If your program aborts at a remote user site, no immediate fixing is
>> going to take place.
>
> True, but what's the alternative?

Trying to log as much as possible, notifying the user, and giving him the
chance to shut down himself and remain in charge of the situation, as
opposed to humiliating the user by pretending that a myopic piece of code
is a better judge. Yes, I am being polemic here. I am aware that sometimes
a piece of code *is* a better judge; it's just that I see a tendency among
technical people to assume this is the case in general.

> There are several means of preventing bugs, and they all complement
> each other. Assertions are just one part of the process. Assertions
> combined with rigorous testing will uncover a _lot_ of bugs. (Even
> assertions plus minimal testing is better than nothing.)
>
> Not aborting when a bug is detected makes it much harder to fix.

Under lab conditions, sure.

> Sloppy exception handling definitely makes things worse, but even
> excellent exception handling can allow the system to continue.
>
> In your scenario, the bug certainly isn't ignored by the user. From
> the point of view of the code, however, it essentially was ignored,
> because after doing all the logging and so forth, the program goes
> back to doing what it did before.
>
> If you throw an exception, it's possible for any code path in the
> system to be executed, even though you know at least one path is
> broken, and you don't know how extensive the damage is.

That depends a lot on the type of application and how you handle
exceptions. Also note that if you turn assertions off in shipping code,
your program doesn't just go back to doing what it did before, it goes on
following the broken path willingly! Surely this is worse than aborting the
broken path and taking the bet that at least backing away from the bug zone
works.

--
Gerhard Menzl

#dogma int main ()

Humans may reply by replacing the thermal post part of my e-mail address
with "kapsch" and the top level domain part with "net".
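A rough sketch of the "log as much as possible, notify the user, and leave
the shutdown decision with the user" policy described above, at the
outermost level of an application; the helpers write_crash_log,
notify_user_and_wait, and run_application are invented placeholders for
whatever logging, UI, and application code a real program would have:

    #include <exception>
    #include <iostream>
    #include <stdexcept>

    // Placeholder hooks; a real application would write a detailed log
    // file and present a proper dialog instead of using std::cerr/cin.
    void write_crash_log(const char* what)
    {
        std::cerr << "LOG: " << what << '\n';
    }

    void notify_user_and_wait(const char* what)
    {
        std::cerr << "A serious error occurred: " << what
                  << "\nThe application should be closed."
                  << " Press Enter to exit.\n";
        std::cin.get();   // the user, not the program, decides when to stop
    }

    // Stands in for the real program; throws to demonstrate the handler.
    int run_application()
    {
        throw std::runtime_error("precondition violated in phone book module");
    }

    int main()
    {
        try
        {
            return run_application();
        }
        catch (const std::exception& e)
        {
            write_crash_log(e.what());
            notify_user_and_wait(e.what());
        }
        catch (...)
        {
            write_crash_log("unknown exception");
            notify_user_and_wait("unknown exception");
        }
        return 1;
    }

Note that this is precisely the kind of translation layer David discusses
below: whether it reads as "keeping the user in charge" or as "masking a
bug" is the crux of the disagreement.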
From: Gerhard Menzl on 8 Sep 2005 10:08

David Abrahams wrote:

> If you find my point to be in contradiction with the goal of
> "executing as little code as possible," then there probably has been a
> misunderstanding. At the point you detect a violated precondition,
> there can usually only be partial recovery. You should do just enough
> work to avoid total catastrophe, if you can. At the point you detect
> a violated precondition, you don't have any way to ensure that the
> actions taken by stack unwinding will be minimal. On the other hand,
> if you have a separate mechanism for registering critical recovery
> actions -- and an agreement among components to use it -- you can
> invoke that, and avoid any noncritical actions.

I think one reason why I am having difficulties coming to terms with your
position (and why this discussion keeps on going) is that, to me, the
distinction between what *has already happened* when a violated
precondition is detected and what *is going to happen* when the function
continues nevertheless is still somewhat blurred.

There is no dispute that continuing would cause hell to break loose. After
all, that's why the author of the function specified the precondition in
the first place. However, I cannot agree with the general assumption that
hell *has already broken loose* at the point of detection, and that nothing
and nobody can be trusted anymore. Sure, it means that there is a bug that
prevents the function from achieving its goal. But does it also mean that
the same bug will interfere with the operations performed during unwinding?
I think I know what your answer to this is going to be:

> you don't know that ;-)

Do I, or better: does the original function have to? If another operation
performed during unwinding relies on the same condition, surely it will
have specified the same precondition, and if it doesn't, it should not be
affected. Admittedly, this raises the thorny issue of exceptions thrown
during unwinding, but if an exception should really leave a destructor as
an effect of a situation like this, terminate would be called, which
conforms to what you advocate anyway.

Precondition specifications aren't normally about complex global states,
they demand that certain local conditions of limited scope be met. They
don't say "the stack is uncorrupted", they say: "this particular vector
must be sorted". If it isn't, it's usually because the author of the client
forgot to sort the vector, or called another function after the sort that
push_backs an element. In well-designed programs that exhibit a high degree
of encapsulation and low coupling, this should never affect code that
doesn't rely on the sorting of the vector - unless the violation is a mere
effect of a greater mess, like a buffer overrun. But in that case, starting
a separate recovery mechanism is acting "on a wing and a prayer" as well.

Ultimately, your bet is: the precondition does not hold, great evil is
afoot, I will just manage to perform a number of critical operations, but
performing the non-critical ones would awaken more evil things, hence I
skip them and bail out. Now I will readily agree that there is a wide range
of applications for which this is just the right bet. With many types of
programs, it will be even better not to try any recovery at all. I am not
convinced, however, that this should be a general guideline, regardless of
the concrete problem domain and application type.
You have brought up the stage lighting example yourself - perhaps we just
differ on the question how rare or how frequent these applications are. And
if I remember correctly, it was a statement of mine along the lines of
having different strategies for different applications that started the
whole discussion.

> The other reason that unwinding is almost always wrong is that it is
> very prone to losing the information that a bug was detected, and
> allowing execution to proceed as though full recovery has occurred.
> All it takes is passing through a layer like this one:
>
>     try
>     {
>         ...something that detects a precondition violation...
>     }
>     catch(e1& x)
>     {
>         translate_or_report(x);
>     }
>     catch(e2& x)
>     {
>         translate_or_report(x);
>     }
>     ...
>     catch(...)
>     {
>         translate_or_report_unknown_error();
>     }
>
> which often occurs at subsystem boundaries.

I fully agree that ignoring errors and masking bugs is a bad thing and a
reason for concern. But what you are saying here is that because exceptions
might be suppressed or handled in a light-hearted way, they should not be
thrown in the first place. In other words, a function that detects a
precondition violation cannot trust a top-level component. How then can it
trust the assertion/termination mechanism? After all, it might have been
defined like this:

    void do_assert(char const* expr, char const* file, unsigned int line)
    {
        write_log(expr, file, line);
        // don't feel like terminating
    }

I don't think that the mere theoretical possibility of a component screwing
up justifies not giving it a chance. The way I understand Design by
Contract, it's a methodology that pervades the entire program. It's not a
local measure. Abstracting from the other reasons you have brought forward
against throwing exceptions on detecting precondition violations, the
handler would have to look like:

    try
    {
        // something that detects a precondition violation
    }
    catch (precondition_violation& pv)
    {
        // do whatever is appropriate for the type of application:
        // log, display a message, abort, whatever
    }

If someone gets this wrong, they are likely to get the separate violation
handler wrong as well.

This also raises the question at what level it is appropriate to decide how
to react to a violated precondition. If the proper reaction depends on the
type of application (and by bringing up your stage lighting example you
admit that it does), the decision can only be taken at a higher level, at
any rate not in a general purpose function that isn't aware of the type of
application it resides in. Otherwise, it would not even be possible for the
stage lighting controller to carry on and project random colours.

Talking of application-independent, low-level code, how are (especially
third-party) libraries supposed to handle a violated precondition? Note
that I am referring to the actual implementation here, not the interface
documentation. You can't throw an exception, because you would have to
document it, and then there wouldn't be a precondition anymore. assert() or
terminate()? Carry on and let the client taste the full consequences of its
negligence? How do you handle this at Boost?

> Your best option, if you have time for it -- and if a clean emergency
> shutdown will not be interpreted as a crash -- is to institute a
> recovery subsystem for critical things that must happen during
> emergency shutdown.

This is perfectly ok for technical people like you and me. Most customers
(those I referred to, at any rate), however, don't care about this sort of
thing.
There is no such thing as a graceful shutdown for them. If the program
stops working, it's a crash. They have little esteem even for the most
elaborate and elegant shutdown mechanism. Of course the anger will be less
if their documents get saved, compared to a real crash, where they aren't,
but they are still angry. And you know what? They are right!

I really find myself wearing two hats here: as a developer, I always take
the worst case into consideration and want the world to come to a
standstill whenever a bug is detected, but as a user's advocate and, more
still, as a user myself I don't want to be bugged by programs that decide
to drop dead.

A good example is the Mozilla family of Web browsers. Every now and again,
the otherwise much loved monster will declare that an error has happened,
and that the application will be terminated, and would I be so kind to fill
out the quality feedback form. It then dies and takes everything that is
not saved (such as things you have just typed into a Web form) with it. I
have never looked at the source of the Mozilla project, but this behaviour
looks suspiciously like abort-on-contract-breach to me. Every time this
happens, my developer's admiration for the refined bug reporting mechanism
is quickly extinguished by my user's rage. It's technology-centric
behaviour. It humiliates users. Yes, there is a theoretical possibility
that unwinding, notifying me and offering me the chance to close the
browser myself might wake a sleeping demon that goes and formats my hard
disk. But that danger is probably much higher with applications that don't
bother with DbC in the first place. In all likelihood, the worst thing that
would happen is garbage on the display.

> In non-shipping code, asserts should immediately invoke the debugger,
> and then invoke emergency recovery and shutdown. In shipping code,
> obviously, there's no debugger.

I am grateful that you make the distinction between non-shipping and
shipping code here. Let me emphasize that from the beginning of this
exchange my reservations have been related exclusively to the latter. With
non-shipping code, i.e. under lab conditions, my practice has always been
to stop immediately, so no objections there. After all, that's the standard
behaviour of the C standard library assert() on most platforms.
Automatically invoking the debugger is nice if your platform supports it
(mine causes the debugger itself to freeze in nine out of ten cases), but
that's a detail.

That leaves the question what to do in shipping code. Standard C practice
(in the sense of what most platforms seem to do - I don't know what the C
Standard says) is to let the preprocessor suppress the test and boldly
stomp into what may be disastrous. Incidentally, the Eiffel practice
(thanks for the link, by the way) seems to be similar: assertion monitoring
is usually turned off in shipping code. This is in stark contrast to what
has been frequently advocated in this newsgroup. The standard argument is:
disabling assertions in shipping code is like leaving the life jackets
ashore when you set sail. I find this metaphor rather misleading -
assertions are more like self-destruction devices than life jackets - yet
the argument cannot be dismissed so easily. What is your position on this?
Should assertions in shipping code do nothing, do the same as in
non-shipping code, or do something else? Ironically, one of the suggestions
I remember having read here is that they should throw exceptions. :-)

> Good Luck.

Thanks, but the ordeal has already been passed. I may now build in crashes
again. *g*

> That can only happen if you assert some condition that isn't in the
> called function's set of documented preconditions. If the assertion
> matches the function's documentation, then it *is* catching a bug.

I was referring to preconditions/assertions that aren't, i.e. the kind of
error where you think something always holds, only to discover there are
situations where it legitimately doesn't. In other words, the bug is in
your analysis.

> As far as I can tell, the Eiffel camp has a similar understanding.
> Because the throw-in-response-to-precondition-violation behavior can
> be turned on and off globally, you basically can't count on it. Think
> of it as one possible expression of undefined behavior.

According to the description in the Eiffel tutorial you pointed me to, the
behaviour can be specified at the class level, with higher level defaults.
It is not up to the individual function to decide. Thus, at least within
the bounds of the code you maintain and compile yourself, you tell the
runtime what to do when a contract is violated. That is, you *can* count on
it. A typical, simple strategy is to turn all checks on during development
and turn them off before you ship.

> In some languages, throwing an exception is basically the only way to
> get a debuggable stack trace. If that's the case in Eiffel, it would
> explain why they have the option to throw: it's as close as possible
> to invoking the debugger (perhaps it even does so).
>
> I should also point out that there's some variation among languages
> (and even among C++ compilers) in _when_ stack unwinding actually
> occurs. For example, in C++, if the exception is never caught, there
> may not ever be any unwinding (it's up to the implementation). In
> Python, no unwinding happens until the exception backtrace is
> _explicitly_ discarded or the next exception is thrown. I don't know
> about the details of Eiffel's exception mechanism, but all of these
> variations can have a major impact on the danger of throwing in
> response to a precondition violation. In other words, you may have to
> look a lot deeper to understand the proper relationship of Eiffel to
> C++.

Certainly. The default effect of an exception in Eiffel seems to be
termination. Whether this involves unwinding or not I cannot tell. On my
platform (.NET), an exception is a convenient way to get a stack trace.

> Absolutely. But I don't think there are as many different definitions
> as you seem to think there are. Have you found *any* definitions of
> "precondition" other than the Wikipedia one? I'm not talking about
> meanings of the word you infer from seeing it used in context. I'm
> talking about _definitions_.

I think we have reached agreement on the definition; it's the resulting
conclusions and practices, and their applicability to different types of
software, where doubts remain.

--
Gerhard Menzl

#dogma int main ()

Humans may reply by replacing the thermal post part of my e-mail address
with "kapsch" and the top level domain part with "net".
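As a concrete rendering of the "what should assertions do in shipping
code" question, one common arrangement is a build-time policy switch; the
macro names below are invented for illustration, and the three branches
correspond to the three options just listed (do the same as in development,
do something else such as log-only, or do nothing):

    #include <cstdio>
    #include <cstdlib>

    #if defined(DEVELOPMENT_BUILD)
        // Non-shipping code: stop immediately, as close to the bug as possible.
    #   define APP_ASSERT(cond) \
            do { if (!(cond)) { \
                std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                             #cond, __FILE__, __LINE__); \
                std::abort(); } } while (false)
    #elif defined(SHIPPING_LOG_ONLY)
        // Shipping code, variant: record the violation but keep going.
    #   define APP_ASSERT(cond) \
            do { if (!(cond)) \
                std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                             #cond, __FILE__, __LINE__); } while (false)
    #else
        // Shipping code, classic C practice: the check is compiled out.
    #   define APP_ASSERT(cond) ((void)0)
    #endif

Which branch is right for a given program is exactly the policy question
debated in this thread; the sketch only shows that the choice can be made
in one place rather than at each call site.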
From: David Abrahams on 12 Sep 2005 13:27
Gerhard Menzl <gerhard.menzl(a)hotmail.com> writes:

> David Abrahams wrote:
>
>> If you find my point to be in contradiction with the goal of
>> "executing as little code as possible," then there probably has been a
>> misunderstanding. At the point you detect a violated precondition,
>> there can usually only be partial recovery. You should do just enough
>> work to avoid total catastrophe, if you can. At the point you detect
>> a violated precondition, you don't have any way to ensure that the
>> actions taken by stack unwinding will be minimal. On the other hand,
>> if you have a separate mechanism for registering critical recovery
>> actions -- and an agreement among components to use it -- you can
>> invoke that, and avoid any noncritical actions.
>
> I think one reason why I am having difficulties coming to terms with
> your position (and why this discussion keeps on going) is that, to
> me, the distinction between what *has already happened* when a
> violated precondition is detected and what *is going to happen* when
> the function continues nevertheless is still somewhat blurred.

Of course it is blurry, because you don't know anything about the severity
of the problem once a violated precondition is detected. Your options are
to be conservative and take emergency shutdown measures, continue --
whether by going forward or unwinding to some other place in the code --
and hope the breakage is not too bad, or sit somewhere in between, placing
bets on a case-by-case basis.

> There is no dispute that continuing would cause hell to break loose.
> After all, that's why the author of the function specified the
> precondition in the first place. However, I cannot agree with the
> general assumption that hell *has already broken loose* at the point
> of detection, and that nothing and nobody can be trusted anymore.

It is certainly true that hell has already broken loose. By the time the
violation is detected, somebody somewhere has already done something they
were told not to do. Whether or not anything can be trusted is a matter of
opinion; you can make your own judgements.

> Sure, it means that there is a bug that prevents the function from
> achieving its goal. But does it also mean that the same bug will
> interfere with the operations performed during unwinding? I think I
> know what your answer to this is going to be:
>
>> you don't know that ;-)

Exactly.

> Do I, or better: does the original function have to?

If it is going to make guarantees about robustness in the face of these
conditions, then yes. That's where we started this discussion: you wanted
to provide documented guarantees of behavior in the face of violated
preconditions.

If the original function is going to guess about the brokenness of the
context in which it was called, then no, it doesn't need to know. However,
as I've repeatedly said, the called function usually has little or no
knowledge about the context in which it is called, so it's very difficult
to make an educated guess.

> Precondition specifications aren't normally about complex global states,
> they demand that certain local conditions of limited scope be met.

Exactly. That's what makes the detecting code particularly unsuited to
making educated guesses about severity.

> They don't say "the stack is uncorrupted", they say: "this
> particular vector must be sorted". If it isn't, it's usually because
> the author of the client forgot to sort the vector, or called
> another function after the sort that push_backs an element.

On what do you base that assessment?
Do you have data, or is it just intuition?

> In well-designed programs that exhibit a high degree of
> encapsulation and low coupling, this should never affect code that
> doesn't rely on the sorting of the vector - unless the violation is
> a mere effect of a greater mess, like a buffer overrun.

True, but what makes you think that sortedness is not part of some much
larger global invariant? The sortedness of the vector might be fundamental
to the operation of most of the program.

> But in that case, starting a separate recovery mechanism is acting
> "on a wing and a prayer" as well.

Exactly. At that point, everything is a shot in the dark. I bet on the
recovery mechanism avoiding total catastrophe because it's the best I can
do.

> Ultimately, your bet is: the precondition does not hold, great evil
> is afoot, I will just manage to perform a number of critical
> operations, but performing the non-critical ones would awaken more
> evil things, hence I skip them and bail out.

Right.

> Now I will readily agree that there is a wide range of applications
> for which this is just the right bet. With many types of programs,
> it will be even better not to try any recovery at all. I am not
> convinced, however, that this should be a general guideline,
> regardless of the concrete problem domain and application type. You
> have brought up the stage lighting example yourself - perhaps we
> just differ on the question how rare or how frequent these
> applications are.

Maybe, maybe not. Of course it's a matter of degree.

Programmers in general seldom make the distinction carefully between
violated preconditions and conditions that are known to be recoverable. You
yourself seem to have had that problem. The pull to throw from a violated
precondition, and hope that code somewhere else can deal with the problem,
is quite strong. We're loath to admit that the program is broken, so we bet
that something can be done about it elsewhere.

Once you start trying to unwind-and-continue from a violated precondition,
you -- or someone on your team -- will typically begin to add code for
defensive programming (which has a high development cost and often doesn't
actually work), because you now have to make the program "work" even in a
broken state.

When I say, "it's almost always a mistake to throw from a violated
precondition," I am addressing that problem: I want people to think much
more carefully about the consequences and be much more conservative about
the idea of doing so. If you determine, for whatever reason, that your
application is better off betting that things "aren't broken too badly,"
you should still design the program as though preconditions are never
actually violated. In other words, the program should not count on these
exceptions and expect to respond to them in useful ways. Anything else
leads to a mess.

> And if I remember correctly, it was a statement of mine along the
> lines of having different strategies for different applications that
> started the whole discussion.

If you did make such a remark, that wasn't what prompted me to get
involved. It was the blurring of the notion of precondition that incited my
interest.

>> The other reason that unwinding is almost always wrong is that it is
>> very prone to losing the information that a bug was detected, and
>> allowing execution to proceed as though full recovery has occurred.
>> All it takes is passing through a layer like this one:
>>
>>     try
>>     {
>>         ...something that detects a precondition violation...
>>     }
>>     catch(e1& x)
>>     {
>>         translate_or_report(x);
>>     }
>>     catch(e2& x)
>>     {
>>         translate_or_report(x);
>>     }
>>     ...
>>     catch(...)
>>     {
>>         translate_or_report_unknown_error();
>>     }
>>
>> which often occurs at subsystem boundaries.
>
> I fully agree that ignoring errors and masking bugs is a bad thing and a
> reason for concern. But what you are saying here is that because
> exceptions might be suppressed or handled in a light-hearted way, they
> should not be thrown in the first place. In other words, a function that
> detects a precondition violation cannot trust a top-level component.

No, that's not the issue. In general, catch blocks like the one above do
the right thing. They're not swallowing errors. In this case the
precondition violation gets treated like a recoverable error simply because
its exception passes through a translation layer. At such a language or
subsystem boundary, even if the programmer can anticipate the "violated
precondition" exception type thrown by low level code, what would you have
him do? What sort of response would you deem "trustworthy?"

> I don't think that the mere theoretical possibility of a component
> screwing up justifies not giving it a chance. The way I understand
> Design by Contract, it's a methodology that pervades the entire program.
> It's not a local measure. Abstracting from the other reasons you have
> brought forward against throwing exceptions on detecting precondition
> violations, the handler would have to look like:
>
>     try
>     {
>         // something that detects a precondition violation
>     }
>     catch (precondition_violation& pv)
>     {
>         // do whatever is appropriate for the type of application:
>         // log, display a message, abort, whatever
>     }

These layers occur in libraries, where we might not know what's appropriate
for the application.

> This also raises the question at what level it is appropriate to decide
> how to react to a violated precondition. If the proper reaction depends
> on the type of application (and by bringing up your stage lighting
> example you admit that it does), the decision can only be taken at a
> higher level, at any rate not in a general purpose function that isn't
> aware of the type of application it resides in. Otherwise, it would not
> even be possible for the stage lighting controller to carry on and
> project random colours.

Yes. See BOOST_ASSERT, which I think uses a sensible approach.

> Talking of application-independent, low-level code, how are (especially
> third-party) libraries supposed to handle a violated precondition? Note
> that I am referring to the actual implementation here, not the interface
> documentation. You can't throw an exception, because you would have to
> document it, and then there wouldn't be a precondition anymore. assert()
> or terminate()? Carry on and let the client taste the full consequences
> of its negligence? How do you handle this at Boost?

Ditto. :)

>> Your best option, if you have time for it -- and if a clean emergency
>> shutdown will not be interpreted as a crash -- is to institute a
>> recovery subsystem for critical things that must happen during
>> emergency shutdown.
>
> This is perfectly ok for technical people like you and me. Most
> customers (those I referred to, at any rate), however, don't care about
> this sort of thing. There is no such thing as a graceful shutdown for
> them. If the program stops working, it's a crash.

Okay, so in your case a clean emergency shutdown will be interpreted as a
crash.
I once added that strategy to a program that was historically unstable, and
my customers were immensely grateful that their work wasn't being lost. Of
course I fixed a lot of bugs too, but didn't manage to nail all of them.
However, the clean emergency shutdown, along with users sending me their
files and a description of the action that caused the crash, allowed me to
go on fixing more problems until the program was in pretty good shape.

> They have little esteem even for the most elaborate and elegant
> shutdown mechanism. Of course the anger will be less if their
> documents get saved, compared to a real crash, where they aren't,
> but they are still angry. And you know what? They are right!

Sure. The way to avoid that is to eliminate bugs from the program, not to
try to hobble along anyway when a bug is detected. The customer will
usually be just as angry when the program doesn't behave as expected
because some internal assumption is violated. And you know what? They are
right!

Hobbling along can even be dangerous for the customer's data, since for
them, too, the pull not to admit something is really broken is strong. Who
knows what effect the next attempted editing operation or transaction will
actually have?

> I really find myself wearing two hats here: as a developer, I always
> take the worst case into consideration and want the world to come to
> a standstill whenever a bug is detected, but as a user's advocate
> and, more still, as a user myself I don't want to be bugged by
> programs that decide to drop dead.

Dropping dead is not the same as a clean emergency shutdown.

> A good example is the Mozilla family of Web browsers. Every now and
> again, the otherwise much loved monster will declare that an error
> has happened, and that the application will be terminated, and would
> I be so kind to fill out the quality feedback form. It then dies and
> takes everything that is not saved (such as things you have just
> typed into a Web form) with it. I have never looked at the source of
> the Mozilla project, but this behaviour looks suspiciously like
> abort-on-contract-breach to me. Every time this happens, my
> developer's admiration for the refined bug reporting mechanism is
> quickly extinguished by my user's rage. It's technology-centric
> behaviour. It humiliates users. Yes, there is a theoretical
> possibility that unwinding, notifying me and offering me the chance
> to close the browser myself might wake a sleeping demon that goes
> and formats my hard disk. But that danger is probably much higher
> with applications that don't bother with DbC in the first place. In
> all likelihood, the worst thing that would happen is garbage on the
> display.

Maybe. You don't really know that, do you?

Anyway, browsers are an unusual case, since they're primarily viewers. If
they don't contain some carefully-crafted bit of the user's work that will
be lost on a crash, it's probably okay... hmm, but wait: there's webmail.
So I could lose this whole message unless it gets written to disk and I get
the chance to start over. Oh, and there are all kinds of scam sites that
masquerade as secure and trustworthy, which might be easily mistaken for
legit if the user begins to overlook garbage on the screen. As a matter of
fact, security is a big deal for web browsers. They're used in all kinds of
critical applications, including banking. Oh, and there are plugins, which
could be malicious and might be the cause of the violation detected.

No, I don't think we want the browser pressing ahead when it detects a bug,
not at all. I think this is a perfect case in point.

> That leaves the question what to do in shipping code. Standard C
> practice (in the sense of what most platforms seem to do - I don't
> know what the C Standard says) is to let the preprocessor suppress
> the test and boldly stomp into what may be disastrous. Incidentally,
> the Eiffel practice (thanks for the link, by the way) seems to be
> similar: assertion monitoring is usually turned off in shipping
> code.

That can be a good policy, because programmers concerned about efficiency
will never be deterred from writing assertions on the basis of slowing down
the shipping program.

> This is in stark contrast to what has been frequently advocated in
> this newsgroup. The standard argument is: disabling assertions in
> shipping code is like leaving the life jackets ashore when you set
> sail.

One or two vocal advocates of that approach do not a consensus make. I've
never agreed with it.

> I find this metaphor rather misleading - assertions are more like
> self-destruction devices than life jackets - yet the argument cannot
> be dismissed so easily. What is your position on this? Should
> assertions in shipping code do nothing, do the same as in
> non-shipping code, or do something else?

The correct policy depends on the application and your degree of confidence
in the code.

> Ironically, one of the suggestions I remember having read here is
> that they should throw exceptions. :-)

There are lots of different opinions out there. I'm sure you could find an
advocate for anything if you look hard enough.

>> That can only happen if you assert some condition that isn't in the
>> called function's set of documented preconditions. If the assertion
>> matches the function's documentation, then it *is* catching a bug.
>
> I was referring to preconditions/assertions that aren't, i.e. the kind
> of error where you think something always holds only to discover there
> are situations where it legitimately doesn't. In other words, the bug is
> in your analysis.

Yeah, I was accounting for that possibility. It's still a bug.

>> Have you found *any* definitions of "precondition" other than the
>> Wikipedia one? I'm not talking about meanings of the word you
>> infer from seeing it used in context. I'm talking about
>> _definitions_.
>
> I think we have reached agreement on the definition;

Well, that's progress, anyway.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
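For reference, the BOOST_ASSERT approach David points to above puts the
policy decision at the application level: when the project defines
BOOST_ENABLE_ASSERT_HANDLER, a failed BOOST_ASSERT calls a user-supplied
boost::assertion_failed instead of the standard assert machinery, so the
library that detects the violation does not choose the response. A minimal
sketch of such a handler follows; the abort policy and the lookup example
are just one choice an application might make, not a recommendation from
the thread:

    #define BOOST_ENABLE_ASSERT_HANDLER   // normally set project-wide
    #include <boost/assert.hpp>
    #include <cstdio>
    #include <cstdlib>

    namespace boost
    {
        // Called by BOOST_ASSERT(expr) whenever 'expr' evaluates to false.
        // The application, not the library that detected the violation,
        // decides here what a broken precondition should mean.
        void assertion_failed(char const* expr, char const* function,
                              char const* file, long line)
        {
            std::fprintf(stderr, "precondition violated: %s in %s (%s:%ld)\n",
                         expr, function, file, line);
            std::abort();   // or: emergency shutdown, debugger break, logging
        }
    }

    void lookup(int* p)
    {
        BOOST_ASSERT(p != 0);   // dispatches to the handler above on failure
        // ...
    }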