Loss of efficiency of operator overloading [Fortran]


From: Hifi-Comp on 26 May 2010 22:37

I am writing code that uses the operator overloading feature of F90/95.
The basic idea is to replace every real(8) with a type named DUAL_NUM
that contains two real(8) components, and to overload the corresponding
operations for the new type. Based on the operation counts, the
computation should take no more than three times as long as the
real(8) version. However, my simple test shows that the DUAL_NUM
computation is almost nine times more expensive. I hope some of you
knowledgeable Fortran experts can help me figure out where the
efficiency is lost and how I can make the code more efficient.
Thanks a lot!

TYPE, PUBLIC :: DUAL_NUM
   REAL(8) :: x_ad_
   REAL(8) :: xp_ad_
END TYPE DUAL_NUM

PUBLIC OPERATOR (+)
INTERFACE OPERATOR (+)
   MODULE PROCEDURE ADD_DD   ! dual + dual, ELEMENTAL
END INTERFACE

PUBLIC OPERATOR (*)
INTERFACE OPERATOR (*)
   MODULE PROCEDURE MULT_DD  ! dual * dual, ELEMENTAL
END INTERFACE

ELEMENTAL FUNCTION ADD_DD(u,v) RESULT(res)
   TYPE (DUAL_NUM), INTENT(IN) :: u, v
   TYPE (DUAL_NUM) :: res
   res%x_ad_  = u%x_ad_ + v%x_ad_
   res%xp_ad_ = u%xp_ad_ + v%xp_ad_
END FUNCTION ADD_DD

ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
   TYPE (DUAL_NUM), INTENT(IN) :: u, v
   TYPE (DUAL_NUM) :: res
   res%x_ad_  = u%x_ad_*v%x_ad_
   res%xp_ad_ = u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
END FUNCTION MULT_DD

The segment of the original code:
REAL(8):: x, y, z,f
x=1.0d0;y=2.0d0;z=0.3d0

!**********************************
DO i=1,50000000
f=x-y*z
ENDDO
!**********************************

The do loop runs for 0.516 seconds.

The corresponding overloaded code:
TYPE(DUAL_NUM):: x,y,z,f

x=DUAL_NUM(1.0d0,1.0D0);
y=DUAL_NUM(2.0d0,1.0D0);
z=DUAL_NUM(0.3d0,0.0D0)

!**********************************
DO i=1,50000000
f=X-y*z
ENDDO
!*********************************
The do loop runs for 4.513 seconds.

Supposedly, for DUAL_NUM, subtraction needs twice as many operations
as for REAL, and multiplication needs three times as many. That is,
the computation should take no more than three times as long as the
computation for REAL. However, the overall time is almost nine times
longer. What else takes the extra time?
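
The posted fragment defines only (+) and (*), while the test loop uses
(-). A minimal sketch of a complete module might look like the
following. The module name dual_num_mod, the kind parameter dp, and the
procedure SUB_DD are assumptions added for illustration; only ADD_DD
and MULT_DD appear in the post.

```fortran
! Sketch only: dual_num_mod, dp and sub_dd are hypothetical names that
! do not appear in the original post.
module dual_num_mod
  implicit none
  private

  integer, parameter :: dp = kind(1.0d0)   ! stands in for REAL(8)

  type, public :: dual_num
    real(dp) :: x_ad_    ! first component
    real(dp) :: xp_ad_   ! second component
  end type dual_num

  public :: operator(+), operator(-), operator(*)

  interface operator(+)
    module procedure add_dd
  end interface

  interface operator(-)
    module procedure sub_dd
  end interface

  interface operator(*)
    module procedure mult_dd
  end interface

contains

  ! dual + dual: two additions
  elemental function add_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  + v%x_ad_
    res%xp_ad_ = u%xp_ad_ + v%xp_ad_
  end function add_dd

  ! dual - dual: two subtractions (assumed by analogy with add_dd)
  elemental function sub_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  - v%x_ad_
    res%xp_ad_ = u%xp_ad_ - v%xp_ad_
  end function sub_dd

  ! dual * dual: three multiplications and one addition
  elemental function mult_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_*v%x_ad_
    res%xp_ad_ = u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
  end function mult_dd

end module dual_num_mod
```

With the values from the post, x = DUAL_NUM(1,1), y = DUAL_NUM(2,1) and
z = DUAL_NUM(0.3,0), the expression x - y*z evaluates to (0.4, 0.7) up
to rounding.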

From: glen herrmannsfeldt on 27 May 2010 00:35

Hifi-Comp <wenbinyu.heaven(a)gmail.com> wrote:

(snip of code)

I will guess that it is function call overhead. Also, the function
call might flush any pipelining that the processor might otherwise
be doing to speed up the calculation.

If you are really in a hurry, you might try to do it
with C preprocessor macros instead of operator overloading.
That is a little less convenient (you write it like function calls),
but maybe not so bad.

-- glen

From: Richard Maine on 27 May 2010 01:13

glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:

> I will guess that it is function call overhead.

I'd guess the same (not having really studied the code much, but it
seems likely on the surface). Estimating computation time by counting
floating point operations was once at least reasonable as a first
guess. But "once" means several decades ago.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

From: steve on 27 May 2010 01:38

On May 26, 10:13 pm, nos...(a)see.signature (Richard Maine) wrote:
> glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> > I will guess that it is function call overhead.
>
> I'd guess the same (not having really studied the code much, but it
> seems likely on the surface). Estimating computation time by counting
> floating point operations was once at least reasonable as a first
> guess. But "once" means several decades ago.
>

Once I fix the code and add the missing pieces, I get a
timing of 2 microseconds for the do-loop involving the REAL(8)
entities and 1 microsecond for the do-loop with the user
defined type.
laptop:kargl[205] gfc4x -o z -O3 -fwhole-program gh.f90
laptop:kargl[206] ./z
2.00001523E-06
1.00000761E-06

The first number is the REAL(8) loop timing in seconds and
the 2nd is the user defined type loop.

gfc4x is gfortran from gcc version 4.6.0 20100520.

Basically, common subexpression elimination lifts the expression
outside the loops. The loops are then eliminated because they are
empty.

--
steve

From: m_b_metcalf on 27 May 2010 05:29

As Steve has observed, an optimizing compiler recognizes that the
result that you're calculating is unused, and so removes the
calculation completely. If I take your code, fix it up, add a line
'ftot = ftot - f' into each loop, add a 'print*, ftot', and increase
the number of iterations by a factor of ten, then both versions run in
1.0s on my machine @ 2GHz using the Intel compiler with full
optimization. For what it's worth.

Regards,

Mike Metcalf
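
The change described above (accumulate each result into ftot and print
it, so the compiler cannot discard the loops) can be sketched as
follows. This is an illustration under assumptions, not the code that
was actually timed: the names dual_bench_mod, dp and SUB_DD, the
variable names in the driver, and the use of CPU_TIME are all
hypothetical additions, and the timings will depend on compiler and
flags.

```fortran
! Sketch only: dual_bench_mod, dp, sub_dd and the driver variable names
! are hypothetical; they do not come from the thread.
module dual_bench_mod
  implicit none
  private

  integer, parameter, public :: dp = kind(1.0d0)   ! stands in for REAL(8)

  type, public :: dual_num
    real(dp) :: x_ad_
    real(dp) :: xp_ad_
  end type dual_num

  public :: operator(-), operator(*)

  interface operator(-)
    module procedure sub_dd
  end interface

  interface operator(*)
    module procedure mult_dd
  end interface

contains

  elemental function sub_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  - v%x_ad_
    res%xp_ad_ = u%xp_ad_ - v%xp_ad_
  end function sub_dd

  elemental function mult_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_*v%x_ad_
    res%xp_ad_ = u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
  end function mult_dd

end module dual_bench_mod

program dual_bench
  use dual_bench_mod
  implicit none
  integer, parameter :: n = 500000000   ! ten times the original count
  real(dp) :: x, y, z, f, ftot
  type(dual_num) :: xd, yd, zd, fd, ftotd
  real :: t0, t1
  integer :: i

  ! REAL(8) version: the running total keeps the loop from being empty
  x = 1.0d0; y = 2.0d0; z = 0.3d0
  ftot = 0.0d0
  call cpu_time(t0)
  do i = 1, n
    f = x - y*z
    ftot = ftot - f
  end do
  call cpu_time(t1)
  print *, 'real(8) loop  (s):', t1 - t0, ' ftot =', ftot

  ! DUAL_NUM version: same loop through the overloaded operators
  xd = dual_num(1.0d0, 1.0d0)
  yd = dual_num(2.0d0, 1.0d0)
  zd = dual_num(0.3d0, 0.0d0)
  ftotd = dual_num(0.0d0, 0.0d0)
  call cpu_time(t0)
  do i = 1, n
    fd = xd - yd*zd
    ftotd = ftotd - fd
  end do
  call cpu_time(t1)
  print *, 'DUAL_NUM loop (s):', t1 - t0, ' ftot =', ftotd%x_ad_, ftotd%xp_ad_
end program dual_bench
```

Printing ftot after each loop makes the accumulated value observable,
so the loop has to run. An optimizer may still hoist the
loop-invariant x - y*z out of the loop, which means this measures the
accumulation more than the expression itself.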
