From: Gordon Sande on 6 Feb 2010 10:21

On 2010-02-06 11:02:25 -0400, George White <aa056(a)chebucto.ns.ca> said:

> On Thu, 4 Feb 2010, Ron Shepard wrote:
>
>> In article
>> <de9d0fd6-33f8-44e4-8e03-20428eaf7451(a)21g2000yqj.googlegroups.com>,
>> deltaquattro <deltaquattro(a)gmail.com> wrote:
>>
>>> When writing code which performs a numerical task such as, for example,
>>> interpolation, would you add the checks on the input data inside the
>>> sub itself, or would you delegate them to other code to be called before
>>> the interpolation subroutine? For example, in 1D linear interpolation
>>> you have to look up an ordered table for the position of a value x
>>> before performing the interpolation. Should the check for x falling
>>> inside or outside the interval spanned by the table be done inside or
>>> outside the interpolation sub? I'd say outside, but then when I reuse
>>> the code I risk forgetting the call to the checking sub...
>>
>> There isn't a universal answer to this. If the checks are relatively
>> inexpensive (time, memory, cache, etc.), then you should add them in the
>> low-level routine so they are always active. If they are expensive, or
>> in the gray area between, then you have options. You might write two
>> versions of the routine, one with and one without the checks. You would
>> call the fast one when speed is critical (e.g. inside innermost
>> do-loops) and where it makes sense to test the arguments outside the
>> low-level routine (e.g. outside of the innermost do-loops), and you
>> would call the safe one when speed is not critical or where nothing is
>> saved by testing outside of the call. Or, you might have a single
>> version of the routine, but do the internal tests conditionally based on
>> the value of an argument (or the existence of an optional argument).
>> This is one of the common uses for optional arguments.
>
> If the chances of bad values are small, sometimes it is better to just
> go ahead and compute garbage, then catch the bad value outside the
> routine. Many functions are designed to return a special value in case
> of errors; e.g., a function whose value should be positive can return
> negative values to signal various types of errors. When using lookup
> tables it is often simple to add two extra bins for the out-of-range
> values and adjust the code to map the out-of-range values to those bins.
> This is generally much easier to handle in parallel processing than
> putting error handling in low-level code -- you end up with 99 values
> that were computed quickly but the program stalls waiting on the one
> value that triggered an error handler. This has been a problem with some
> implementations of basic math libraries where exp(-bignum) triggers a
> slow handler to decide whether to underflow and return 0 or trigger an
> exception.

It is often convenient to classify both users and applications as
sophisticated or not. The advice you are offering is for sophisticated
users in sophisticated applications and is perfectly reasonable. For the
other three combinations it is not so suitable. A user who has to ask has
pretty clearly indicated that they are not a sophisticated user with a
sophisticated application. Recall the standard comment on the pricing of
yachts.

The first lesson toward a driver's license is rarely, if ever, given in a
Formula 1 car on a closed track. There are specialized schools for those
who want to try their hand at high-performance vehicles, but they assume
considerable prior experience.
>> The same issue applies also within the routine regarding what to do if
>> it detects an error. Should it return an error code, or should it abort
>> internally? You can write a single routine with an optional return
>> argument that handles both situations. If the argument is present, then
>> it can be set with an appropriate error code and return to the caller;
>> otherwise the routine can abort on the spot.
>
> In some situations it is useful to keep a count of the errors so you
> can provide a table. In my work (remote sensing) you have a data set
> with >10^6 records, many with missing/invalid data. You want to compute
> what makes sense for each record, and keep statistics for the various
> data problems so you can generate a summary table. The Slatec xerror
> package provides this capability.
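[To make Ron Shepard's optional-argument pattern concrete, here is a
minimal sketch of a checked linear-interpolation routine along those
lines. The module, routine, and argument names and the error code are
illustrative assumptions, not code from the thread: if the caller
supplies istat, an out-of-range x is reported through it; if not, the
routine aborts on the spot.]

module interp_m
   implicit none
contains
   subroutine interp1d(xArr, yArr, x, y, istat)
      real, intent(in)  :: xArr(:), yArr(:)  ! table; xArr ascending
      real, intent(in)  :: x
      real, intent(out) :: y
      integer, intent(out), optional :: istat
      integer :: j

      ! Range check: report through istat if the caller asked for a
      ! code, abort on the spot otherwise.
      if (x < xArr(1) .or. x > xArr(size(xArr))) then
         if (present(istat)) then
            istat = 1
            y = 0.0
            return
         end if
         stop 'interp1d: x outside table range'
      end if
      if (present(istat)) istat = 0

      ! A linear search is enough for a sketch; a production routine
      ! would bisect, as in the Locate function posted later in the
      ! thread.
      j = 1
      do while (x > xArr(j+1))
         j = j + 1
      end do
      y = yArr(j) + (yArr(j+1) - yArr(j)) * (x - xArr(j)) &
                    / (xArr(j+1) - xArr(j))
   end subroutine interp1d
end module interp_m

[Then "call interp1d(xa, ya, x, y)" aborts on bad input, while
"call interp1d(xa, ya, x, y, ierr)" leaves the error policy to the
caller, with a single routine serving both uses.]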
From: Dan Nagle on 6 Feb 2010 11:47

Hello,

On 2010-02-06 10:02:25 -0500, George White <aa056(a)chebucto.ns.ca> said:

> Many functions are designed to return a special value in case of errors;
> e.g., a function whose value should be positive can return negative
> values to signal various types of errors.

The use of magic numbers should be shunned. It is all too easy to bring
an out-of-range value into range when use of the software is generalized,
as it surely will be eventually.

This advice applies to derived types as well as intrinsic types. Add a
valid/invalid flag to your derived type instead of guessing future use.

Just because the current programmer is sophisticated doesn't imply that
all future programmers will be.

--
Cheers!

Dan Nagle
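[A minimal sketch of the flag Dan Nagle describes; the module, type, and
component names are made up for illustration:]

module measurement_m
   implicit none
   type :: measurement
      real    :: value = 0.0      ! the datum itself
      logical :: valid = .false.  ! explicit flag, not a magic number in value
   end type measurement
end module measurement_m

[Because the sentinel no longer lives in value at all, a later maintainer
who rescales or reinterprets value cannot accidentally bring a flag value
into the legitimate range.]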
From: Richard Maine on 6 Feb 2010 12:56

Gordon Sande <g.sande(a)worldnet.att.net> wrote:

> On 2010-02-06 11:02:25 -0400, George White <aa056(a)chebucto.ns.ca> said:
>
>> If the chances of bad values are small, sometimes it is better to just
>> go ahead and compute garbage, then catch the bad value outside the
>> routine. Many functions are designed to return a special value in case
>> of errors,
>
> It is often convenient to classify both users and applications as
> sophisticated or not. The advice you are offering is for sophisticated
> users in sophisticated applications and is perfectly reasonable. For the
> other three combinations it is not so suitable. A user who has to ask
> has pretty clearly indicated that they are not a sophisticated user with
> a sophisticated application. Recall the standard comment on the pricing
> of yachts.

Yes. I have trouble imagining ever recommending the return of such flag
values. Maybe there might be cases where it is a good idea, but if
someone is asking, I'd say that pretty much rules them out as a good
candidate. Of course, there are also cases where people do it without
asking and shouldn't have either, but it is harder for me to do much
about those.

One of the "war stories" that I'm sure I have repeated here before
relates to exactly such a poor decision, made by people who allege to
have been professionals. In processing data from the first Space Shuttle
entry, we received a tape (yes, that was the usual data transfer means of
the day) with what was alleged to be the "best estimated trajectory" for
the entry. It took only a casual glance at the data to see that it was
ludicrous. For a large portion of the entry, the values were many orders
of magnitude from plausible. It turns out that there were three errors.

1. An error in algorithm. Atmospheric data was gathered at a handful of
points roughly along the expected flight path. One needed to interpolate
from those points to estimate the data at any particular vehicle
location. Interpolating between sparse and irregularly positioned points
on the surface of the earth can be non-trivial. They chose an algorithm
that might have sounded reasonable, but had a serious flaw. In
particular, they intended to interpolate between the two points that most
closely bracketed the current estimated vehicle location. They selected
the two points by taking the two that were closest to the current
estimated location. Unfortunately, that algorithm is only reasonable when
the spacing is at least somewhat uniform. There was one data station in
Hawaii and another at Vandenberg (on the Pacific coast of California).
After that, the stations were a *LOT* more closely spaced. (This was for
a west-coast landing.) I think the next one was in Bakersfield, which is
maybe a hundred miles from Vandenberg. Slightly over halfway between
Hawaii and Vandenberg, the two closest stations became Vandenberg and
Bakersfield. Oops.

2. The use of flag values in tables. The Bakersfield data was not
expected to be used until after the Shuttle had passed Vandenberg. No
data was gathered for Bakersfield at the higher altitudes that the
Shuttle was at over the Pacific. Those table locations were filled with
999. flag values. (I don't recall what machines were used, but they
certainly didn't have IEEE floating point or anything else that had
hardware flag values like NaNs.)

3. The failure to do any plausibility checking at all, inside the
interpolation routine, outside of it, or in the final product.
It turns out that, for example, 999 is not a very reasonable value for
atmospheric density in slugs per cubic foot. The standard value at sea
level is something like 0.002378, if my memory serves me. (It has been a
long time since I needed that number; I'm surprised it popped into my
mind at all; I wouldn't advise using it without checking.) Not only did
they not check the value from the table, they evidently never even
glanced at the final results on their output tape. Using 999 for density
results in pretty "impressive" dynamic pressure values, for an example
that I recall pointing out.

When I complained about the implausible values, they explained that no,
they didn't check for plausibility because they were "computer people"
instead of aero engineers. They just computed the numbers, and it was up
to us to tell them whether the numbers looked reasonable. The concept
that they might check their own work was foreign to them. I resolved
never again to use any data from the group in question. I'm sure they
thought of themselves as professionals; people like that give a bad name
to the profession.

--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |                       -- Mark Twain
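[The plausibility check whose absence point 3 describes costs only a few
lines. A minimal sketch; the only figure taken from the post is the
~0.002378 slugs/ft**3 sea-level value, and the bounds themselves are
deliberately generous illustrative assumptions:]

subroutine check_density(rho)
   implicit none
   real, intent(in) :: rho            ! interpolated density, slugs/ft**3
   ! Sea level is about 0.002378, so anything near 999. is nonsense.
   real, parameter  :: rho_max = 0.01 ! generous illustrative upper bound
   if (rho < 0.0 .or. rho > rho_max) then
      write (*,*) 'implausible atmospheric density: ', rho
      stop 'check_density: did a table flag value leak through?'
   end if
end subroutine check_density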
From: deltaquattro on 8 Feb 2010 09:48

On 6 Feb, 16:02, George White <aa...(a)chebucto.ns.ca> wrote:

> On Thu, 4 Feb 2010, Ron Shepard wrote:

[..]

> If the chances of bad values are small, sometimes it is better to just
> go ahead and compute garbage, then catch the bad value outside the
> routine. Many functions are designed to return a special value in case
> of errors; e.g., a function whose value should be positive can return
> negative values to signal various types of errors. When using lookup
> tables it is often simple to add two extra bins for the out-of-range
> values and adjust the code to map the out-of-range values to those bins.
> This is generally much easier to handle in parallel processing than
> putting error handling in low-level code -- you end up with 99 values
> that were computed quickly but the program stalls waiting on the one
> value that triggered an error handler. This has been a problem with some
> implementations of basic math libraries where exp(-bignum) triggers a
> slow handler to decide whether to underflow and return 0 or trigger an
> exception.

Hi, George,

Great! The idea of using two extra bins for the out-of-range values fits
perfectly into my actual code structure. My Locate function searches the
lookup table for the index of the array element xArr(i) (if any) whose
value is less than x (the interpolation point). So I can assign i=0 if
x < min(xArr), and i=n if x > max(xArr). This way I have been able to
keep Locate as a function, without having to redefine it as a subroutine
just to return the error code.

>> The same issue applies also within the routine regarding what to do if
>> it detects an error. Should it return an error code, or should it abort
>> internally? You can write a single routine with an optional return
>> argument that handles both situations. If the argument is present, then
>> it can be set with an appropriate error code and return to the caller;
>> otherwise the routine can abort on the spot.
>
> In some situations it is useful to keep a count of the errors so you can
> provide a table. In my work (remote sensing) you have a data set with
> >10^6 records, many with missing/invalid data. You want to compute what
> makes sense for each record, and keep statistics for the various data
> problems so you can generate a summary table. The Slatec xerror package
> provides this capability.

Interesting, I will look into that. Thanks,

Best Regards,

deltaquattro

> --
> George White <aa...(a)chebucto.ns.ca> <g...(a)acm.org>
> 189 Parklea Dr., Head of St. Margarets Bay, Nova Scotia B3Z 2G6
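[For concreteness, here is one way a caller might consume the 0 and n
sentinels described above, using the Locate function that deltaquattro
posts later in the thread. The clamping policy and the data are
assumptions for the sketch, not something prescribed in the discussion;
one could just as well extrapolate or raise an error at the sentinels.]

program demo_locate
   use locate_m              ! the module posted later in the thread
   implicit none
   real    :: xArr(4) = [0.0, 1.0, 2.0, 4.0]
   real    :: yArr(4) = [1.0, 3.0, 2.0, 5.0]
   real    :: x, y
   integer :: j, n

   n = size(xArr)
   x = 1.5
   j = locate(xArr, x)
   if (j == 0) then
      y = yArr(1)            ! x below the table: clamp to the first entry
   else if (j == n) then
      y = yArr(n)            ! x above the table: clamp to the last entry
   else                      ! 1 <= j <= n-1: ordinary linear interpolation
      y = yArr(j) + (yArr(j+1) - yArr(j)) * (x - xArr(j)) &
                    / (xArr(j+1) - xArr(j))
   end if
   write (*,*) 'x =', x, '  y =', y
end program demo_locate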
From: deltaquattro on 8 Feb 2010 11:28
On 6 Feb, 17:47, Dan Nagle <danna...(a)verizon.net> wrote:

> Hello,
>
> On 2010-02-06 10:02:25 -0500, George White <aa...(a)chebucto.ns.ca> said:
>
>> Many functions are designed to return a special value in case of errors;
>> e.g., a function whose value should be positive can return negative
>> values to signal various types of errors.
>
> The use of magic numbers should be shunned.
>
> It is all too easy to bring an out-of-range value into range when use
> of the software is generalized, as it surely will be eventually.
>
> This advice applies to derived types as well as intrinsic types. Add a
> valid/invalid flag to your derived type instead of guessing future use.
>
> Just because the current programmer is sophisticated doesn't imply
> that all future programmers will be.
>
> --
> Cheers!
>
> Dan Nagle

Hi, Dan,

I understand your concern, as also expressed by Gordon and Richard. As a
general rule, you're surely right. However, in the specific case of my
Locate function called by LinearInterp, as modified using George's idea,
I think it's safe to rely on out-of-range values:

MODULE LOCATE_M
IMPLICIT NONE
CONTAINS

INTEGER FUNCTION locate(xArr,x)
  IMPLICIT NONE
  REAL, DIMENSION(:), INTENT(IN) :: xArr
  REAL, INTENT(IN) :: x
  ! Given a sorted array xArr(1:n) and a value x, locate is the index j
  ! such that x is in [xArr(j),xArr(j+1)] if xArr is ascending, otherwise
  ! in [xArr(j+1),xArr(j)].
  ! If x < xArr(1) (x > xArr(1)) for xArr ascending (descending), then j = 0.
  ! If x > xArr(n) (x < xArr(n)) for xArr ascending (descending), then j = n.
  INTEGER jl,jm,ju,n
  LOGICAL ascend

  ! Initialization
  n = size(xArr)
  jl = 0
  ju = n+1
  ascend = (xArr(n) >= xArr(1))
  if (x == xArr(1)) then
    locate = 1
  else if (x == xArr(n)) then
    locate = n-1
  else
    ! Bisection search
    do while (ju-jl > 1)
      jm = (ju+jl)/2
      if (ascend .eqv. (x >= xArr(jm))) then
        jl = jm
      else
        ju = jm
      endif
    end do
    locate = jl
  endif
END FUNCTION locate

END MODULE LOCATE_M

I think it's safe to add the two out-of-range bins as 0 and n (with n
being the length of xArr). The reason is that, because of the way xArr is
declared, I know that the indices of xArr go from 1 to n, and that jl is
forced to be between 1 and n-1 if x is not outside [min(xArr),max(xArr)].
So I can use 0 and n as out-of-range values.

Clearly, in general it's highly risky, or even downright impossible, to
rely on out-of-range values to return an error flag from a function. How
could I do that without having to rewrite all my functions as
subroutines, in order to allow for two output parameters? Which solution
do you choose in your codes? Do you define the function result as a
derived type?

Thanks, Best Regards,

deltaquattro
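[One hedged way to answer that closing question without turning every
function into a subroutine is exactly the derived-type flag Dan Nagle
suggests, applied to the function result. A minimal sketch, restricted to
ascending tables for brevity; all names are illustrative:]

module locate_checked_m
   implicit none

   ! Function result carrying both the index and an explicit validity
   ! flag, instead of overloading the index with sentinel values.
   type :: locate_t
      integer :: j  = 0        ! bracketing index; meaningful only when ok
      logical :: ok = .false.  ! .false. when x lies outside the table
   end type locate_t

contains

   function locate_checked(xArr, x) result(res)
      real, intent(in) :: xArr(:)   ! ascending table assumed in this sketch
      real, intent(in) :: x
      type(locate_t)   :: res
      integer :: jl, jm, ju, n

      n  = size(xArr)
      jl = 0
      ju = n + 1
      ! Same bisection as in locate above, ascending case only
      do while (ju - jl > 1)
         jm = (ju + jl) / 2
         if (x >= xArr(jm)) then
            jl = jm
         else
            ju = jm
         end if
      end do
      if (x == xArr(n)) jl = n - 1  ! keep the endpoint inside the table
      res%j  = jl
      res%ok = (jl >= 1 .and. jl <= n - 1)
   end function locate_checked

end module locate_checked_m

[The caller writes "r = locate_checked(xArr, x)" and tests "r%ok" before
using "r%j"; unlike the 0 and n sentinels, the flag can never be mistaken
for a legitimate index, no matter how the table is later resized.]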