From: Hifi-Comp on 26 May 2010 22:37

I am writing a code using the operator overloading feature of F90/95. The basic math is to replace all real(8) with two real(8) contained in a type named DUAL_NUM, and overloading the corresponding calculations according to the newly defined data. Based on the coding, the computing should take no more than three times as long as computing for real(8). However, my simple test shows that computing for DUAL_NUM is almost nine times more expensive. I hope some of you knowledgeable Fortran experts can help me figure out the loss of efficiency and how I can make the code more efficient. Thanks a lot!

TYPE,PUBLIC:: DUAL_NUM
  REAL(8)::x_ad_
  REAL(8)::xp_ad_
END TYPE DUAL_NUM

PUBLIC OPERATOR (+)
INTERFACE OPERATOR (+)
  MODULE PROCEDURE ADD_DD ! dual+ dual, ELEMENTAL
END INTERFACE

PUBLIC OPERATOR (*)
INTERFACE OPERATOR (*)
  MODULE PROCEDURE MULT_DD ! dual*dual, ELEMENTAL
END INTERFACE

ELEMENTAL FUNCTION ADD_DD(u,v) RESULT(res)
  TYPE (DUAL_NUM), INTENT(IN)::u,v
  TYPE (DUAL_NUM)::res
  res%x_ad_ = u%x_ad_+v%x_ad_
  res%xp_ad_ = u%xp_ad_+v%xp_ad_
END FUNCTION ADD_DD

ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
  TYPE (DUAL_NUM), INTENT(IN)::u,v
  TYPE (DUAL_NUM)::res
  res%x_ad_ = u%x_ad_*v%x_ad_
  res%xp_ad_= u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
END FUNCTION MULT_DD

The segment of the original code:

REAL(8):: x, y, z,f
x=1.0d0;y=2.0d0;z=0.3d0

!**********************************
DO i=1,50000000
  f=x-y*z
ENDDO
!**********************************

The do loop runs for 0.516 seconds.

The corresponding overloaded code:

TYPE(DUAL_NUM):: x,y,z,f

x=DUAL_NUM(1.0d0,1.0D0);
y=DUAL_NUM(2.0d0,1.0D0);
z=DUAL_NUM(0.3d0,0.0D0)

!**********************************
DO i=1,50000000
  f=X-y*z
ENDDO
!*********************************

The do loop runs for 4.513 seconds.

Supposedly, for DUAL_NUM, the operations needed for minus are twice those needed for REAL, and the operations needed for times are three times those needed for REAL. That is, the time needed for the computation should be no more than three times that for REAL. However, the overall time is almost nine times more. What else takes more time?
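
The operation count behind this estimate can be written out explicitly. This is only a sketch, writing a DUAL_NUM as a pair (u, u') of value and derivative part; the subtraction rule is an assumption by analogy with ADD_DD, since the posted module defines only + and *:

```latex
\begin{align*}
(u, u') \pm (v, v') &= (u \pm v,\; u' \pm v')
  && \text{2 additions, versus 1 for REAL} \\
(u, u') \cdot (v, v') &= (u v,\; u' v + u v')
  && \text{3 multiplications and 1 addition, versus 1 multiplication for REAL}
\end{align*}
```

Counted this way, f = x - y*z costs 6 floating point operations for DUAL_NUM against 2 for REAL(8), which is where the factor of three comes from.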
From: glen herrmannsfeldt on 27 May 2010 00:35

Hifi-Comp <wenbinyu.heaven(a)gmail.com> wrote:

> I am writing a code using the operator overloading feature of F90/95. The
> basic math is to replace all real(8) with two real(8) contained in a
> type named DUAL_NUM, and overloading the corresponding calculations
> according to the newly defined data. Based on the coding, the computing
> should take no more than three times as long as computing for real(8).
> However, my simple test shows that computing for DUAL_NUM is almost
> nine times more expensive. I hope some of you knowledgeable Fortran
> experts can help me figure out the loss of efficiency and how I can
> make the code more efficient. Thanks a lot!

(snip of code)

I will guess that it is function call overhead. Also, the function call might flush any pipelining that the processor might otherwise be doing to speed up the calculation.

If you are really in a hurry, you might try to do it with C preprocessor macros instead of operator overloading. That is a little less convenient (you write it like function calls), but maybe not so bad.

-- glen
From: Richard Maine on 27 May 2010 01:13

glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:

> Hifi-Comp <wenbinyu.heaven(a)gmail.com> wrote:
(snip)
> I will guess that it is function call overhead.

I'd guess the same (not having really studied the code much, but it seems likely on the surface). Estimating computation time by counting floating point operations was once at least reasonable as a first guess. But "once" means several decades ago.

--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
From: steve on 27 May 2010 01:38

On May 26, 10:13 pm, nos...(a)see.signature (Richard Maine) wrote:

> glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> > Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
(snip)
> > I will guess that it is function call overhead.
>
> I'd guess the same (not having really studied the code much, but it seems
> likely on the surface). Estimating computation time by counting
> floating point operations was once at least reasonable as a first
> guess. But "once" means several decades ago.

Once I fix the code and add the missing pieces, I get a timing of 2 microseconds for the do-loop involving the REAL(8) entities and 1 microsecond for the do-loop with the user defined type.

laptop:kargl[205] gfc4x -o z -O3 -fwhole-program gh.f90
laptop:kargl[206] ./z
   2.00001523E-06   1.00000761E-06

The first number is the REAL(8) loop timing in seconds and the second is the user defined type loop. gfc4x is gfortran from gcc version 4.6.0 20100520.

Basically, common sub-expression elimination lifts the expression outside the loops. The loops are then eliminated because they are empty loops.

-- steve
From: m_b_metcalf on 27 May 2010 05:29

On May 27, 4:37 am, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:

(snip of code and timings)
> Supposedly, for DUAL_NUM, the operations needed for minus are twice
> those needed for REAL, and the operations needed for times are three
> times those needed for REAL. That is, the time needed for the computation
> should be no more than three times that for REAL. However,
> the overall time is almost nine times more. What else takes more time?

As Steve has observed, an optimizing compiler recognizes that the result that you're calculating is unused, and so removes the calculation completely. If I take your code, fix it up, add a line 'ftot = ftot - f' into each loop, add a 'print*, ftot', and increase the number of iterations by a factor of ten, then both versions run in 1.0s on my machine @ 2GHz using the Intel compiler with full optimization. For what it's worth.

Regards,

Mike Metcalf
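
The fix-up described above might look roughly like the following. This is only a sketch of the DUAL_NUM variant, not the code that was actually timed: the module name dual_mod, the SUB_DD procedure for the minus operator (missing from the posted module), the kind parameter dp, and the use of CPU_TIME are all assumptions made for illustration.

```fortran
! Sketch only. dual_mod, sub_dd, dp and the timing calls are assumed names,
! not taken from the thread; the arithmetic follows the posted ADD_DD/MULT_DD.
module dual_mod
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)

  type, public :: dual_num
     real(dp) :: x_ad_    ! value
     real(dp) :: xp_ad_   ! derivative part
  end type dual_num

  public :: operator(-), operator(*)

  interface operator(-)
     module procedure sub_dd    ! dual - dual (assumed; absent from the post)
  end interface
  interface operator(*)
     module procedure mult_dd   ! dual * dual
  end interface

contains

  elemental function sub_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  - v%x_ad_
    res%xp_ad_ = u%xp_ad_ - v%xp_ad_
  end function sub_dd

  elemental function mult_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_*v%x_ad_
    res%xp_ad_ = u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
  end function mult_dd

end module dual_mod

program bench
  use dual_mod
  implicit none
  integer, parameter :: n = 500000000   ! ten times the original count
  type(dual_num) :: x, y, z, f, ftot
  real(dp) :: t0, t1
  integer :: i

  x = dual_num(1.0d0, 1.0d0)
  y = dual_num(2.0d0, 1.0d0)
  z = dual_num(0.3d0, 0.0d0)
  ftot = dual_num(0.0d0, 0.0d0)

  call cpu_time(t0)
  do i = 1, n
     f = x - y*z
     ftot = ftot - f   ! running total, so the loop result is actually used
  end do
  call cpu_time(t1)

  ! Printing the total keeps the compiler from discarding the loop.
  print *, 'ftot    =', ftot%x_ad_, ftot%xp_ad_
  print *, 'seconds =', t1 - t0
end program bench
```

Because x, y and z never change, a compiler may still move f = x - y*z out of the loop, so what remains in the timing is mostly the accumulation into ftot. Reading the operands from an array or from input would be one way to keep the multiplication inside the loop as well.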