From: deltaquattro on 9 Jun 2010 12:31

Hi,

this is really more of a "numerical computing" question, so I cross-post
to sci.math.num.analysis too. I decided to post on comp.lang.fortran
anyway, because it is full of computational scientists and some aspects
of the issue are specific to the Fortran language.

The problem is this: I am modifying a legacy code, and I need to compute
some REAL values which I then store in large arrays. Sometimes it's
impossible to compute these values: for example, when interpolating a
table at a given abscissa, the abscissa may fall outside the curve
boundaries. I have code which checks for this possibility, and if this
happens the interpolation is not performed. However, now I must "store"
somewhere the information that interpolation was not possible for that
array element, and inform the user of it. Since the values can be either
positive or negative, I cannot use tricks like initializing the array
element to a negative value.

I'm sure this has happened to you before: which solution did you use?
Basically, I can think of three ways:

1. For each REAL array, I declare a LOGICAL array of the same shape,
which contains 0 for correct values and 1 for missing values. I guess
that's the cleanest way, but I have a lot of arrays and I'd rather not
declare an extra array for each of them. I know it's not a memory issue
(LOGICAL arrays don't occupy a lot of space, even if they are big in my
case!), but to me it seems like I'm adding redundant code. It would be
better to declare arrays of a derived type, each element containing a
REAL and a LOGICAL (sketched below), but this would force me to modify
the code in all the places where the arrays are used, and it's quite a
big code.

2. I initialize a missing value to an extremely large positive or
negative value, like 9e99. I think that's how the problem is usually
solved in practice, isn't it? I'm a bit worried that this is not
entirely "clean", since such values could in theory also result from the
interpolation. However, since reasonable values of all the interpolated
quantities are usually in the range -100/100, when this happens it is
usually related to errors in the interpolation table data. So most
likely it indicates an error which must be signaled to the user.

3. One could initialize the "missing" values to NaN. However, I then
have to test for the array element being a NaN when I produce my output
for the user. From what I remember about Fortran and NaN, there's (or
there was) no portable way to do this... am I wrong?

I would really appreciate your help on this issue, since I really don't
know which way to choose and currently I'm stuck! Thanks in advance,

Best Regards

Sergio Rossi
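A minimal sketch of the derived-type variant from option 1 might look
like the following; the type and component names are invented for
illustration and are not taken from the actual code:

    program checked_demo
      implicit none

      type :: checked_real
         real    :: value = 0.0
         logical :: valid = .false.   ! stays .false. for "missing" entries
      end type checked_real

      type(checked_real) :: results(5)
      integer :: i

      ! pretend entry 3 could not be interpolated
      do i = 1, 5
         if (i /= 3) then
            results(i)%value = real(i)
            results(i)%valid = .true.
         end if
      end do

      do i = 1, 5
         if (results(i)%valid) then
            print *, i, results(i)%value
         else
            print *, i, ' (missing)'
         end if
      end do
    end program checked_demo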
From: Ian Bush on 9 Jun 2010 13:58

On 9 June, 17:31, deltaquattro <deltaquat...(a)gmail.com> wrote:
>
> 3. One could initialize the "missing" values to NaN. However, I then
> have to test for the array element being a NaN when I produce my
> output for the user. From what I remember about Fortran and NaN,
> there's (or there was) no portable way to do this... am I wrong?
>
Well in f2003 there is the ieee_is_nan function. You'll need the
ieee_arithmetic module (I think) to use it. I must admit, however, I
haven't used this and have no feel for how widely implemented it is yet,

Ian
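For what it's worth, a minimal sketch of how that could look under
F2003, assuming the compiler ships the intrinsic ieee_arithmetic module
(the array and its contents are just for illustration):

    program nan_demo
      use, intrinsic :: ieee_arithmetic, only: &
           ieee_value, ieee_quiet_nan, ieee_is_nan
      implicit none
      real :: y(4)

      y = 1.0
      y(2) = ieee_value(y(2), ieee_quiet_nan)   ! mark element 2 as "missing"

      ! ieee_is_nan is elemental, so it can be applied to the whole array
      print *, 'missing entries:', count(ieee_is_nan(y))
    end program nan_demo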
From: glen herrmannsfeldt on 9 Jun 2010 14:23

In comp.lang.fortran deltaquattro <deltaquattro(a)gmail.com> wrote:
(snip on applicability)

> The problem is this: I am modifying a legacy code, and I need to
> compute some REAL values which I then store in large arrays. Sometimes
> it's impossible to compute these values: for example, when
> interpolating a table at a given abscissa, the abscissa may fall
> outside the curve boundaries. I have code which checks for this
> possibility, and if this happens the interpolation is not performed.
> However, now I must "store" somewhere the information that
> interpolation was not possible for that array element, and inform the
> user of it. Since the values can be either positive or negative, I
> cannot use tricks like initializing the array element to a negative
> value.
>
> I'm sure this has happened to you before: which solution did you use?
> Basically, I can think of three ways:
>
> 1. For each REAL array, I declare a LOGICAL array of the same shape,
> which contains 0 for correct values and 1 for missing values. I guess
> that's the cleanest way, but I have a lot of arrays and I'd rather not
> declare an extra array for each of them. I know it's not a memory
> issue (LOGICAL arrays don't occupy a lot of space, even if they are
> big in my case!),

In Fortran, the default LOGICAL is the same size as default REAL.
You may have a smaller size available, though.

> but to me it seems like I'm adding redundant code. It would be better
> to declare arrays of a derived type, each element containing a REAL
> and a LOGICAL, but this would force me to modify the code in all the
> places where the arrays are used, and it's quite a big code.

It is hard to say. There are some cache issues, as well as readability.

> 2. I initialize a missing value to an extremely large positive or
> negative value, like 9e99. I think that's how the problem is usually
> solved in practice, isn't it? I'm a bit worried that this is not
> entirely "clean", since such values could in theory also result from
> the interpolation.

Well, you could check for the (unlikely) accidental occurrence and
substitute a different (nearby) value. Note that 9e99 is too big for
single precision REAL on most systems. I believe that before IEEE,
this was the usual solution. Likely it still is, as long as non-IEEE
machines are around.

> However, since reasonable values of all the interpolated quantities
> are usually in the range -100/100, when this happens it is usually
> related to errors in the interpolation table data. So most likely it
> indicates an error which must be signaled to the user.

In general, testing for a specific floating point value isn't a good
idea, but it is likely done in this case.

> 3. One could initialize the "missing" values to NaN. However, I then
> have to test for the array element being a NaN when I produce my
> output for the user. From what I remember about Fortran and NaN,
> there's (or there was) no portable way to do this... am I wrong?

You have to test in all cases, so I don't see the difference. Many
systems will print out a readable value, such as NaN, in the NaN case,
so that you don't have to test before printing. Portable NaN testing is
fairly new to Fortran. This is probably the best solution going forward.

> I would really appreciate your help on this issue, since I really
> don't know which way to choose and currently I'm stuck! Thanks in
> advance,
I don't recommend the LOGICAL variable method, unless it is necessary
for every REAL value to be legal data. If you need portability to older
compilers, you could do conditional compilation on a test for NaN or for
a sentinel such as 9.9999e30 (fits in single precision on most machines)
or 9.9999e99 (fits in double precision on many machines).

-- glen
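A sketch of the sentinel variant for compilers without the F2003 IEEE
module might look like this (module, constant and function names are
made up; the constant should be chosen to fit the REAL kind actually
used):

    module missing_value
      implicit none
      real, parameter :: missing = 9.9999e30   ! representable in default REAL
    contains
      elemental logical function is_missing(x)
        real, intent(in) :: x
        ! exact comparison is fine here: "missing" entries are assigned
        ! exactly this constant, never computed
        is_missing = (x == missing)
      end function is_missing
    end module missing_value

If all filling and reporting code refers only to missing and
is_missing(), switching to a NaN-based test later means changing this
one module.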
From: Thomas Koenig on 9 Jun 2010 15:22

On 2010-06-09, deltaquattro <deltaquattro(a)gmail.com> wrote:

> 1. For each REAL array, I declare a LOGICAL array of the same shape,

Default logical has the same size as default real. You could check
whether your compiler has a smaller kind. gfortran, for example,
supports logical(kind=1), which occupies one byte.

> which contains 0 for correct values and 1 for missing values.

A nit: logical arrays only contain .TRUE. and .FALSE. How they are
implemented internally is processor dependent.

> 2. I initialize a missing value to an extremely large positive or
> negative value, like 9e99.

I don't like in-line signalling much. It is likely to bite you in the
one place in your program where you didn't think to check for it.
Murphy dictates that there will be at least one such place ;-)

> 3. One could initialize the "missing" values to NaN.

Probably the best way. You could isolate the check in a module, in a
single, system-dependent function. Chances are most compilers will have
either isnan() or the F2003 IEEE feature.

There is a fourth method, which depends on the way you process your
data. If you walk through it linearly and have few invalid points, you
could keep a list of the invalid points, and check for the presence of
each point on that list.
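A rough sketch of that fourth method, assuming the invalid points are
identified by their array index (names invented; uses F2003's
move_alloc to grow the list):

    module invalid_points
      implicit none
      integer, allocatable :: bad_index(:)
      integer :: n_bad = 0
    contains
      subroutine mark_invalid(i)
        integer, intent(in) :: i
        integer, allocatable :: tmp(:)
        if (.not. allocated(bad_index)) allocate (bad_index(16))
        if (n_bad == size(bad_index)) then       ! grow the list when full
           allocate (tmp(2*size(bad_index)))
           tmp(1:n_bad) = bad_index(1:n_bad)
           call move_alloc(tmp, bad_index)
        end if
        n_bad = n_bad + 1
        bad_index(n_bad) = i
      end subroutine mark_invalid

      logical function is_invalid(i)
        integer, intent(in) :: i
        is_invalid = .false.
        if (n_bad > 0) is_invalid = any(bad_index(1:n_bad) == i)
      end function is_invalid
    end module invalid_points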
From: Harald Anlauf on 9 Jun 2010 16:02
On Jun 9, 6:31 pm, deltaquattro <deltaquat...(a)gmail.com> wrote:

> 2. I initialize a missing value to an extremely large positive or
> negative value, like 9e99. I think that's how the problem is usually
> solved in practice, isn't it? I'm a bit worried that this is not
> entirely "clean", since such values could in theory also result from
> the interpolation. However, since reasonable values of all the
> interpolated quantities are usually in the range -100/100, when this
> happens it is usually related to errors in the interpolation table
> data. So most likely it indicates an error which must be signaled to
> the user.

Some external data formats like NetCDF have the concept of a "missing
value" that you define for a variable or an arbitrary-rank array, and
which can be inquired when reading in the data. It need not be the same
fixed value for all variables in a file. I would therefore recommend
using a variable which you compare against, instead of a certain "magic"
number.

Cheers,
Harald
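In other words, something along these lines, where the comparison value
lives in a variable rather than being hard-wired into the code (the
names are illustrative; with NetCDF the value could be read from the
variable's "_FillValue" attribute, e.g. via nf90_get_att, instead of
being set in the program):

    program fill_demo
      implicit none
      real :: fill_value
      real :: temp(5)
      integer :: n_missing

      ! in real code fill_value would come from the file's metadata
      fill_value = -9999.0
      temp = [1.0, 2.0, fill_value, 4.0, fill_value]

      n_missing = count(temp == fill_value)
      if (n_missing > 0) print *, n_missing, 'points could not be interpolated'
    end program fill_demo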