From: Hifi-Comp on 19 Jun 2010 00:52

On Jun 19, 12:24 am, steve <kar...(a)comcast.net> wrote:
> (snip)
>
> echo "Case 3"
> gfc4x -flto -O3 -march=native -fwhole-file -ffast-math \
>   -funroll-loops -ftree-vectorize -c c.f90
> gfc4x -flto -O3 -march=native -fwhole-file -ffast-math \
>   -funroll-loops -ftree-vectorize -c d.f90
> gfc4x -flto -O3 -march=native -fwhole-program -ffast-math \
>   -funroll-loops -ftree-vectorize -o z b.f90 d.o c.o
> ./z
>
> (snip)
>
> Case 3
> Analysis Runs for 7.59999976E-02 Seconds.
> -19999999.999023605
> DNAD Runs for 7.50000030E-02 Seconds.
> -19999999.999023605 24999999.999999996
>
> --
> steve

I am using gfortran on Windows. -flto is not available for win32 gcc version 4.6.0 20100524.

From: Hifi-Comp on 18 Jun 2010 23:22

This revisits a problem I posted some time ago. However, the problem
is not completely resolved.
I have a code as follows to test the efficiency of OO.

MODULE CPUTime
  IMPLICIT NONE
  PRIVATE
  PUBLIC TIC, TOC
  INTEGER::start, rate, finish
CONTAINS
  SUBROUTINE TIC
    CALL SYSTEM_CLOCK(start,rate)
  END SUBROUTINE TIC

  FUNCTION TOC() RESULT(sec)
    REAL::sec
    CALL SYSTEM_CLOCK(finish)
    IF(finish>start) THEN
      sec=REAL(finish-start)/REAL(rate)
    ELSE
      sec=0.0
    ENDIF
  END FUNCTION TOC
END MODULE CPUTime

MODULE DNAD
  IMPLICIT NONE
  PRIVATE

  TYPE,PUBLIC:: DUAL_NUM
    REAL(8)::x_ad_
    REAL(8)::xp_ad_
  END TYPE DUAL_NUM

  PUBLIC OPERATOR (-)
  INTERFACE OPERATOR (-)
    MODULE PROCEDURE MINUS_DD
  END INTERFACE

  PUBLIC OPERATOR (*)
  INTERFACE OPERATOR (*)
    MODULE PROCEDURE MULT_DD
  END INTERFACE

  PUBLIC OPERATOR (/)
  INTERFACE OPERATOR (/)
    MODULE PROCEDURE DIV_DD
  END INTERFACE

CONTAINS
  ELEMENTAL FUNCTION MINUS_DD(u,v) RESULT(res)
    TYPE (DUAL_NUM), INTENT(IN)::u,v
    TYPE (DUAL_NUM)::res
    res%x_ad_ = u%x_ad_-v%x_ad_
    res%xp_ad_= u%xp_ad_-v%xp_ad_
  END FUNCTION MINUS_DD

  ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
    TYPE (DUAL_NUM), INTENT(IN)::u,v
    TYPE (DUAL_NUM)::res
    res%x_ad_ = u%x_ad_*v%x_ad_
    res%xp_ad_= u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
  END FUNCTION MULT_DD

  ELEMENTAL FUNCTION DIV_DD(u,v) RESULT(res)
    TYPE (DUAL_NUM), INTENT(IN)::u,v
    REAL(8)::tmp
    TYPE (DUAL_NUM)::res
    tmp=1.D0/v%x_ad_
    res%x_ad_ = u%x_ad_*tmp
    res%xp_ad_ =(u%xp_ad_- res%x_ad_*v%xp_ad_)*tmp
  END FUNCTION DIV_DD
END MODULE DNAD

PROGRAM Test
  USE DNAD
  USE CPUTime
  IMPLICIT NONE
  REAL(8):: x_,y_,z_,f_,ftot_
  TYPE(DUAL_NUM):: x,y,z,f,ftot
  INTEGER:: I

  x_=1.0d0;y_=2.0d0;z_=0.3d0
  ftot_=0.0d0
  CALL TIC

  DO i=1,50000000
    f_=x_-y_*z_/x_
    ftot_ = ftot_ - f_
  ENDDO
  WRITE(*,*)'Analysis Runs for ', TOC(),' Seconds.'
  write(*,*)ftot_

  x=DUAL_NUM(1.0d0,0.1D0);y=DUAL_NUM(2.0d0,0.2D0);z=DUAL_NUM(0.3d0,0.3D0)
  ftot=DUAL_NUM(0.0d0,0.0D0)
  CALL TIC

  DO i=1,50000000
    f=x-y*z/x
    ftot = ftot - f
  ENDDO
  WRITE(*,*)'DNAD Runs for ', TOC(),' Seconds.'
  write(*,*)ftot
END PROGRAM Test

When I put all the code in one single file named test.f90 and compile
it using gfortran -O3 -ffast-math -march=native -fwhole-file
test.f90, I obtain excellent efficiency for operator overloading:
Analysis runs for 0.141 sec and DNAD runs for 0.062 sec.

However, when I split it into three separate files (put
program test in main.f90, CPUtime module in CPUtime.f90, and DNAD
module in DNAD.f90), and use the following series of commands:

gfortran -c -O3 -ffast-math -march=native -fwhole-file CPUtime.f90
gfortran -c -O3 -ffast-math -march=native -fwhole-file DNAD.f90
gfortran -c -O3 -ffast-math -march=native -fwhole-file main.f90

gfortran -o -O3 -ffast-math -march=native -fwhole-file CPUtime.o DNAD.o main.o

I lost much of the efficiency; now the Analysis runs for 0.156 sec and
DNAD runs for 1.25 sec.

Any hints on how to optimize the multiple-file code are greatly
appreciated.

From: yaqi on 18 Jun 2010 23:57

On Jun 18, 9:22 pm, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
> This revisits a problem I posted some time ago. However, the problem
> is not completely resolved.
> I have a code as follows to test the efficiency of OO.
>
> (Code elided)
>
> When I put all the code in one single file named test.f90 and compile
> it using gfortran -O3 -ffast-math -march=native -fwhole-file
> test.f90, I obtain excellent efficiency for operator overloading:
> Analysis runs for 0.141 sec and DNAD runs for 0.062 sec.
>
> However, when I split it into three separate files (put
> program test in main.f90, CPUtime module in CPUtime.f90, and DNAD
> module in DNAD.f90), and use the following series of commands:
> gfortran -c -O3 -ffast-math -march=native -fwhole-file CPUtime.f90
> gfortran -c -O3 -ffast-math -march=native -fwhole-file DNAD.f90
> gfortran -c -O3 -ffast-math -march=native -fwhole-file main.f90
>
> gfortran -o -O3 -ffast-math -march=native -fwhole-file CPUtime.o
> DNAD.o main.o
>
> I lost much of the efficiency; now the Analysis runs for 0.156 sec and
> DNAD runs for 1.25 sec.
>
> Any hints on how to optimize the multiple-file code are greatly
> appreciated.

Hi Hifi-Comp,

I tested your code with Intel Visual Fortran. What matters is setting
interprocedural optimization to multi-file (/Qipo); single-file
optimization does not give good performance. With /Qipo the time is
0.125 s; without it, 2.45 s.

I am not sure whether gfortran can do something similar. If not, you may
want to consider switching to another compiler. In any case, this shows
that compilers are able to perform this optimization.

yaqi

From: glen herrmannsfeldt on 19 Jun 2010 00:00

Hifi-Comp <wenbinyu.heaven(a)gmail.com> wrote:
(big snip)

> When I put all the code in one single file named test.f90 and compile
> it using gfortran -O3 -ffast-math -march=native -fwhole-file
> test.f90, I obtain excellent efficiency for operator overloading:
> Analysis runs for 0.141 sec and DNAD runs for 0.062 sec.

> However, when I split it into three separate files (put
> program test in main.f90, CPUtime module in CPUtime.f90, and DNAD
(snip)

> I lost much of the efficiency; now the Analysis runs for 0.156 sec and
> DNAD runs for 1.25 sec.

> Any hints on how to optimize the multiple-file code are greatly
> appreciated.

There is a story about a guy who goes to the doctor, complaining
"It hurts if I go like this, what should I do?" The doctor says,
"Don't go like that."

Optimizing over the whole program allows the compiler to inline the
call. Without it, there is at least the subroutine call overhead.
There is no way to inline the called routine if the compiler can't
see it at compile time, no matter how many times you ask.

Some calling conventions are more efficient than others, but none
are more efficient than not doing a call.

-- glen

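To make the inlining point concrete, here is a minimal sketch of the DNAD loop from the original post with the bodies of MULT_DD, DIV_DD and MINUS_DD written out by hand. This is roughly the form the optimizer can reach once it sees the operator functions and inlines them. The program name and the variable names (xv/xd for value and derivative parts, and so on) are illustrative assumptions, not part of the posted code.

```fortran
! Sketch only: the overloaded-operator loop from the thread, inlined by hand.
! Names such as xv/xd, pv/pd, totv/totd are hypothetical.
PROGRAM inlined_dnad
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0d0)
  INTEGER, PARAMETER :: n = 50000000
  REAL(dp) :: xv, xd, yv, yd, zv, zd   ! value and derivative parts of x, y, z
  REAL(dp) :: pv, pd, qv, qd, fv, fd, tmp
  REAL(dp) :: totv, totd
  INTEGER :: i

  xv = 1.0_dp; xd = 0.1_dp
  yv = 2.0_dp; yd = 0.2_dp
  zv = 0.3_dp; zd = 0.3_dp
  totv = 0.0_dp; totd = 0.0_dp

  DO i = 1, n
     ! p = y*z   (body of MULT_DD)
     pv = yv*zv
     pd = yd*zv + yv*zd
     ! q = p/x   (body of DIV_DD)
     tmp = 1.0_dp/xv
     qv = pv*tmp
     qd = (pd - qv*xd)*tmp
     ! f = x - q (body of MINUS_DD)
     fv = xv - qv
     fd = xd - qd
     ! ftot = ftot - f (MINUS_DD again)
     totv = totv - fv
     totd = totd - fd
  END DO

  WRITE(*,*) totv, totd
END PROGRAM inlined_dnad
```

With no procedure calls left in the loop body, the compiler is free to hoist the loop-invariant arithmetic and keep everything in registers, which is consistent with the single-file timings reported in the thread. The two totals it prints should be close to -2.0E7 and 2.5E7.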
From: steve on 19 Jun 2010 00:24

On Jun 18, 8:22 pm, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
> This revisits a problem I posted some time ago. However, the problem
> is not completely resolved.
> I have a code as follows to test the efficiency of OO.

Oddly, you haven't shown a -O0 result.

(Code elided)

> When I put all the code in one single file named test.f90 and compile
> it using gfortran -O3 -ffast-math -march=native -fwhole-file
> test.f90, I obtain excellent efficiency for operator overloading:
> Analysis runs for 0.141 sec and DNAD runs for 0.062 sec.
> However, when I split it into three separate files (put
> program test in main.f90, CPUtime module in CPUtime.f90, and DNAD
> module in DNAD.f90), and use the following series of commands:
> gfortran -c -O3 -ffast-math -march=native -fwhole-file CPUtime.f90
> gfortran -c -O3 -ffast-math -march=native -fwhole-file DNAD.f90
> gfortran -c -O3 -ffast-math -march=native -fwhole-file main.f90
>
> gfortran -o -O3 -ffast-math -march=native -fwhole-file CPUtime.o
> DNAD.o main.o
>
> I lost much of the efficiency; now the Analysis runs for 0.156 sec and
> DNAD runs for 1.25 sec.
>
> Any hints on how to optimize the multiple-file code are greatly
> appreciated.

Read the documentation?

Hopefully, Google Groups doesn't mess up the formatting.

% cat run
#! /bin/csh

echo "Case 1"
gfc4x -O3 -march=native -fwhole-program -ffast-math \
  -funroll-loops -ftree-vectorize -o z a.f90
./z

echo "Case 2"
gfc4x -O3 -march=native -fwhole-file -ffast-math \
  -funroll-loops -ftree-vectorize -c c.f90
gfc4x -O3 -march=native -fwhole-file -ffast-math \
  -funroll-loops -ftree-vectorize -c d.f90
gfc4x -O3 -march=native -fwhole-program -ffast-math \
  -funroll-loops -ftree-vectorize -o z b.f90 d.o c.o
./z

echo "Case 3"
gfc4x -flto -O3 -march=native -fwhole-file -ffast-math \
  -funroll-loops -ftree-vectorize -c c.f90
gfc4x -flto -O3 -march=native -fwhole-file -ffast-math \
  -funroll-loops -ftree-vectorize -c d.f90
gfc4x -flto -O3 -march=native -fwhole-program -ffast-math \
  -funroll-loops -ftree-vectorize -o z b.f90 d.o c.o
./z

% ./run
Case 1
Analysis Runs for 7.50000030E-02 Seconds.
-19999999.999023605
DNAD Runs for 7.59999976E-02 Seconds.
-19999999.999023605 24999999.999999996
Case 2
Analysis Runs for 7.50000030E-02 Seconds.
-19999999.999023605
DNAD Runs for 1.5360000 Seconds.
-19999999.999023605 24999999.999999996
Case 3
Analysis Runs for 7.59999976E-02 Seconds.
-19999999.999023605
DNAD Runs for 7.50000030E-02 Seconds.
-19999999.999023605 24999999.999999996

--
steve
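As a sanity check on the totals printed in the runs above, the expected values can be worked out by hand from the operator definitions in the DNAD module. The following is a sketch under assumed notation: each dual number is written as a pair (value, derivative), corresponding to the components x_ad_ and xp_ad_, and $N$ is the loop count.

```latex
\begin{align*}
x &= (1,\ 0.1), \qquad y = (2,\ 0.2), \qquad z = (0.3,\ 0.3), \qquad N = 5\times 10^{7} \\
y \cdot z &= \bigl(2\cdot 0.3,\ \ 0.2\cdot 0.3 + 2\cdot 0.3\bigr) = (0.6,\ 0.66) \\
\frac{y \cdot z}{x} &= \Bigl(\frac{0.6}{1},\ \ \frac{0.66 - 0.6\cdot 0.1}{1}\Bigr) = (0.6,\ 0.6) \\
f &= x - \frac{y \cdot z}{x} = (1 - 0.6,\ \ 0.1 - 0.6) = (0.4,\ -0.5) \\
f_{\mathrm{tot}} &= -N f = \bigl(-2\times 10^{7},\ \ 2.5\times 10^{7}\bigr)
\end{align*}
```

These agree with the printed -19999999.999023605 and 24999999.999999996 up to accumulated floating-point rounding, so all three build variants compute the same result and differ only in run time.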