Intel fortran (em64t) very low-accuracy results on x86_64

From: Steve Lionel on 26 Dec 2009 20:01

On 12/26/2009 5:33 PM, News user wrote:

> I have access to 3 different types of Intel-CPU Linux computers
> running either Intel ifort (versions 9, 10 and 11) or g77:
> (a) Pentium 4 using Intel 32-bit compilers
> (b) Dual-core Itanium 2 (Linux ia64) using Intel 64 compilers
> (c) Intel Xeon 5300 quad-core processor (Linux x86_64) using Intel
> em64t compilers

> Machines (a) and (b) produce results identical to 9 significant digits
> (s.d.), but machine (c) produces results with only 4 s.d. Note that my
> code uses double precision and the systems are well-conditioned, and
> thus I expect very limited round-off error and, in particular,
> agreement close to 14 s.d.

Please post and provide more details in our user forum (link below). In
particular, please tell us which compiler options you are using. If you
can provide a test case, that would be best as otherwise all we can do
is speculate.

It would be interesting for you to take the executable from system a and
run it on system c to see if it gets different answers. For this test,
be sure you are not using the "-ax" option that does automatic CPU type
dispatching.

--
Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Software Development Products Support
http://software.intel.com/sites/support/
My Fortran blog
http://www.intel.com/software/drfortran

From: Tim Prince on 26 Dec 2009 20:05

News user wrote:

>
> Note that on machine (c) g77 works ok - it is the
> intel ifort em64t compiler which produces the low 4 s.d. agreement.
Besides, when using ifort, you should be using options such as
-assume protect_parens,minus0,byterecl -prec-div -prec-sqrt
for compatibility with standards and other compilers. Your failure to
discuss this indicates you are likely using inconsistent options.
g77 32-bit defaults to x87 extended 80-bit format for evaluation of
expressions; I never had much luck with g77 64-bit.
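
A minimal sketch of why an option like -assume protect_parens can matter: floating-point addition is not associative, so a compiler that is allowed to regroup an expression can change its value. The snippet below is Python rather than Fortran and only illustrates the arithmetic; the values a, b and c are made up for the demonstration and are not taken from the poster's code.

```python
# Floating-point addition is not associative, so regrouping changes results.
# The values below are illustrative only.
a, b, c = 1.0e20, -1.0e20, 1.0

as_written = (a + b) + c   # cancellation happens first, then 1.0 is added
regrouped = a + (b + c)    # 1.0 is absorbed into -1e20 before the cancellation

print(as_written)  # 1.0
print(regrouped)   # 0.0
```

Python always honors the parentheses, so both groupings are written out by hand here; an optimizing compiler that ignores parentheses could silently turn the first form into the second.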

From: News user on 27 Dec 2009 00:01

Dear Steve and Tim:

Many thanks for your answers.

Using ifort I compiled my code using:
    ifort -O2 program.f -o program
or via MKL with:
    ifort [-openmp] -O2 program.f -L$MKLPATH \
        -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a \
        $MKLPATH/libmkl_intel_thread.a \
        $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread \
        -o program

i.e. without any floating-point options, thinking that the same version
of the compiler would have the same default FP options on the different
machines.

Compiling again with -mp, or the faster -fp-model precise, added before
-O2 in the commands above, I manage to get at least 9 s.d. agreement on
all types of machines.

As Steve suggested, I transferred the executable from machine (a) to
machine (c) and got results identical to the 14 s.d. I print out.
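
For readers who want to quantify "agreement to N s.d." between two runs, here is a rough sketch in Python. The function name matching_digits and the two sample values are hypothetical and were not part of the original code; the measure used is simply the negative base-10 logarithm of the relative difference.

```python
import math

def matching_digits(x, y):
    """Approximate count of leading significant decimal digits shared by x and y."""
    if x == y:
        return math.inf
    rel_diff = abs(x - y) / max(abs(x), abs(y))
    return -math.log10(rel_diff)

# Hypothetical outputs of the same computation on two machines.
reference = 0.123456789012345
suspect = 0.123412345678901

print(round(matching_digits(reference, suspect), 1))
```

Printing results with 15 or more digits on each machine and feeding pairs of values to a helper like this makes it easier to see which quantities lose accuracy first.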

Using -assume protect_parens,minus0,byterecl -prec-div -prec-sqrt
(without -mp or -fp-model precise) did not improve the results
on machine (c), i.e. I still got 4 s.d. agreement.
Tim, may I suppose that the options you suggested need to be combined
with some -fp-model option?

Based on the above I have the following two questions:

(a) What floating-point options (or any options affecting precision)
does Intel use in compiling MKL?

(b) If somebody wants maximum compatibility or FP precision, which
options should be used when compiling with ifort?

Many thanks again!

From: Tim Prince on 27 Dec 2009 01:28

-fp-model source (or precise) include the options
-assume protect_parens -prec-div -prec-sqrt -no-ftz
and disable optimization of sum reduction.
-mp produces a mixture of SSE/SSE2 (source precision evaluation) and x87
(double precision plus extended range evaluation) code. Its success in
your case tends to indicate you need the extra accuracy or range of
double precision.
I assumed you could accept abrupt underflow (-ftz), as it is normally in
use on ia64. If it makes as much difference as you are reporting now,
you must have many operations on values of magnitude < 1e-32 if you have
inadvertently kept some single precision, or < 1e-296 if double.
MKL compile options aren't documented but definitely include full
vectorization and little or no x87. For the most part, the abrupt or
gradual underflow setting would be inherited from the setting in your
main program, unless you use ieee_set_underflow_mode().
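
To make the difference between gradual and abrupt underflow concrete, here is a small Python sketch. It only simulates abrupt underflow in software with a hypothetical helper, flush_to_zero; it does not change the processor's actual underflow mode, and the values x and y are made up for the demonstration.

```python
import sys

TINY = sys.float_info.min  # smallest positive normal double, about 2.2e-308

def flush_to_zero(x):
    """Mimic abrupt underflow: nonzero results below the normal range become 0.0."""
    return 0.0 if 0.0 < abs(x) < TINY else x

# Two distinct values whose difference falls in the subnormal range.
x = 3.0e-308
y = 2.5e-308

gradual = x - y                # subnormal result, kept under gradual underflow
abrupt = flush_to_zero(x - y)  # same operation under the simulated abrupt mode

print(gradual != 0.0)  # True
print(abrupt)          # 0.0
```

Under gradual underflow x - y is nonzero whenever x and y differ; with abrupt underflow that guarantee is lost, which only matters if the computation really does produce values this small.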

From: Terence on 27 Dec 2009 02:35

On Dec 27, 10:18 am, News user <ara...(a)gmail.com> wrote:

> I realized that I forgot to mention some details on my code:
>
> I use f77 fortran and I have double-checked my code with ftnchek;
> and yes I always write very carefully using double precision constants
> (e.g. 3.5d0) and all my real variables are defined to be real*8.
> (The code is quite complicated and long to be included in a message.)
>
> Note that on machine (c) g77 works ok - it is the
> intel ifort em64t compiler which produces the low 4 s.d. agreement.

I wonder if you are specifying a single-precision square root function
in an environment where there are reserved names for the various
levels of desired precision?
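
As a rough illustration of what a stray single-precision square root would cost, the Python sketch below rounds the argument and the result to IEEE single precision and compares against the double-precision value. The helpers to_single and sqrt_single are hypothetical names for this demonstration; this is a simulation of the arithmetic, not a claim about what the poster's code or any compiler actually does.

```python
import math
import struct

def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sqrt_single(x):
    """Simulate a single-precision square root: single input, single result."""
    return to_single(math.sqrt(to_single(x)))

exact = math.sqrt(2.0)      # double-precision result
single = sqrt_single(2.0)   # simulated single-precision result

# Relative error of the single-precision result.
print(abs(single - exact) / exact)
```

A relative error of this size corresponds to roughly 7 to 8 significant digits, so one single-precision intrinsic in an otherwise double-precision calculation would limit accuracy to about that level.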
