From: Colin Paul Gloster on 16 Feb 2010 11:50

On Mon, 15 Feb 2010, S. J. W. posted:
|---------------------------------------------------------------------------|
|"[..] |
|> |
|> You see now what's happening. With the gnatn switch the |
|> compiler is smart enough to call the Log just once, rather |
|> than 10**6 times. |
|> |
|> If you remove the -gnatn or -gnatN switches, then it runs in |
|> 0m0.024s again. |
| |
|The trouble is that that benchmark does something other than Colin's!" |
|---------------------------------------------------------------------------|

That is not the problem. The code which I posted at the beginning of
this thread was not a means in itself, but was intended for timing
performances of implementations of logarithm functions in the base of
ten in a manner representative of real code which I use. The real code
is not dedicated to calculating something approximately equal to
6.3E+08. I could have written 500 * 1_000_000 calls or 3.14 * 1000
calls or a single call. A single call might have been overwhelmed by
overhead unrelated to the logarithm function. In the case of the C++
version when using a particular compilation switch, I failed in the
task because the hardcoded arguments I provided resulted in a trivial
and dramatic optimization which would not happen in the real code.
While it is unfortunate for Ada code in general that Ada compilers fail
to mimic this optimization of G++'s, that particular optimization would
not benefit the usage of logarithms in the real code I mentioned.

Dr. Jonathan Parker is free to pursue this problem in a subthread or
with vendors.
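The point about hardcoded arguments can be illustrated with a small C++ sketch. It is not the code from this thread: the names sum_log10 and seconds_for are hypothetical, and routing the inputs through volatile is only one assumed way of keeping an optimizer from treating the arguments as compile-time constants. Whether a given compiler still simplifies the loop would have to be checked in the generated code.

```cpp
#include <chrono>
#include <cmath>

// Sums log10 over `inner` arguments start, start + step, ..., repeated
// `outer` times. The inputs are read through volatile objects (an
// assumption of this sketch, not the thread's method) so the compiler
// cannot assume their values at compile time.
double sum_log10(double start, double step, int inner, int outer) {
    volatile double v_start = start;
    volatile double v_step = step;
    double answer = 0.0;
    for (int i = 0; i < outer; ++i) {
        double x = v_start;         // re-read on every outer pass
        const double dx = v_step;   // re-read on every outer pass
        for (int j = 0; j < inner; ++j) {
            answer += std::log10(x);
            x += dx;
        }
    }
    return answer;
}

// Wall-clock seconds spent in sum_log10 for arguments 0.1, 0.2, ...;
// the sum itself is stored through `result` so it is actually used.
double seconds_for(int inner, int outer, double* result) {
    const auto t0 = std::chrono::steady_clock::now();
    *result = sum_log10(0.1, 0.1, inner, outer);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling seconds_for(500, 1000000, &r) would correspond to the 500 * 1_000_000 calls mentioned above; smaller counts are enough to check that the time scales with the number of calls rather than collapsing to almost nothing.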
|---------------------------------------------------------------------------|
|"This might be a more accurate translation: |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with Text_IO; use Text_IO; |
|procedure Log_Bench_0 is |
|   type Real is digits 15; |
|   package Math is new Ada.Numerics.Generic_Elementary_Functions |
|(Real); |
|   use Math; |
|   Answer : Real := 0.0; |
|   Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; |
|begin |
|   for I in 1 .. 1_000_000 loop |
|      declare |
|         X : Real := 0.1; |
|      begin |
|         for J in 1 .. 500 loop |
|            Answer := Answer + Log_Base_10_Of_E * Log (X); |
|            X := X + 0.1; |
|         end loop; |
|      end; |
|   end loop; |
|   Put (Real'Image(Answer)); |
|end Log_Bench_0; |
| |
|I've tried inlining GNAT's implementation (GCC 4.5.0, x86_64-apple- |
|darwin10.2.0) and even just calling up the C log10 routine using an |
|inline. None was very impressive compared to the g++ result. |
| |
|Colin's time: 37s |
|Jonathan's time (-O3 -ffast-math -gnatp): 16s |
|Jonathan's time (-O3 -ffast-math -gnatp -gnatN -funroll-loops): 14s |
|Jonathan's time (same opts, but using C log10()): 11s" |
|---------------------------------------------------------------------------|

That ordering does not necessarily hold...

GCC4.2.4...

gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp
time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp
6.34086408606382E+08
real 0m14.328s
user 0m14.329s
sys 0m0.000s

gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08
real 0m14.346s
user 0m14.341s
sys 0m0.004s

GCC4.4.3 (slower than GCC4.2.4 for this program)...
gnatmake -O3 -ffast-math Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math
time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m14.713s
user 0m14.689s
sys 0m0.000s

gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp
time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp
6.34086408606382E+08
real 0m14.691s
user 0m14.693s
sys 0m0.000s

gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08
real 0m14.690s
user 0m14.689s
sys 0m0.000s

|---------------------------------------------------------------------------|
|"so we still have 3 orders of magnitude to go to get to the g++ result: |
|0.02s |
| |
|This is my final version, with the inlined GNAT implementation too: |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with System.Machine_Code; use System.Machine_Code; |
|with Text_IO; use Text_IO; |
|procedure Log_Bench is |
|   type Real is digits 15; |
|   package Math is new Ada.Numerics.Generic_Elementary_Functions |
|(Real); |
|   use Math; |
|   Answer : Real := 0.0; |
|   Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; |
|   function LogM (X : Real) return Real; |
|   pragma Inline_Always (LogM); |
|   function LogM (X : Real) return Real is |
|      Result : Real; |
|      NL : constant String := ASCII.LF & ASCII.HT; |
|   begin |
|      Asm (Template => |
|           "fldln2 " & NL |
|         & "fxch " & NL |
|         & "fyl2x " & NL, |
|           Outputs => Real'Asm_Output ("=t", Result), |
|           Inputs => Real'Asm_Input ("0", X)); |
|      return Result; |
|   end LogM; |
|   function LogL (X : Real) return Real; |
|   pragma Import (C, LogL, "log10"); |
|begin |
|   for I in 1 .. 1_000_000 loop |
|      declare |
|         X : Real := 0.1; |
|      begin |
|         for J in 1 .. 500 loop |
|--          Answer := Answer + Log_Base_10_Of_E * LogM (X); |
|            Answer := Answer + LogL (X); |
|            X := X + 0.1; |
|         end loop; |
|      end; |
|   end loop; |
|   Put (Real'Image(Answer)); |
|end Log_Bench; |
| |
|[..]" |
|---------------------------------------------------------------------------|

Not all of those switches would yield fair proxies for timings of
logarithms in the real code which inspired this thread, but anyway...

64bit GCC4.2.4...

gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math
6.34086408606382E+08
real 0m34.497s
user 0m34.494s
sys 0m0.004s

gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp
6.34086408606382E+08
real 0m34.503s
user 0m34.506s
sys 0m0.000s

gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08
real 0m34.547s
user 0m34.546s
sys 0m0.004s

64bit GCC4.4.3...
gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m34.257s
user 0m34.258s
sys 0m0.000s

gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp
6.34086408606382E+08
real 0m34.474s
user 0m34.478s
sys 0m0.000s

gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6
time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08
real 0m34.188s
user 0m34.182s
sys 0m0.004s
From: Colin Paul Gloster on 16 Feb 2010 12:33

On Mon, 15 Feb 2010, William Findlay posted:
|------------------------------------------------------------------------|
|"On 15/02/2010 10:58, in article |
|alpine.LNX.2.00.1002151055530.17315(a)Bluewhite64.example.net, "Colin Paul|
|Gloster" <Colin_Paul_Gloster(a)ACM.org> wrote: |
| |
|> Of the two programs shown, the fastest C++ implementation on one test |
|> platform took less than one millisecond and the fastest Ada |
|> implementation took one minute and 31 seconds and 874 milliseconds on |
|> the same platform. Both g++ and gnatmake were from the same |
|> installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). |
| |
|Is that 1 millisecond for 1e6 calls?" |
|------------------------------------------------------------------------|

No, that was less than one millisecond for 500 * 10**6 C++ calls.

|------------------------------------------------------------------------|
|" This implies 1ns per call in C++. |
|I find it incredible that a log function could be so fast. |
|I think the loop body must be evaluated at compile-time in C++." |
|------------------------------------------------------------------------|

The C++ compiler did manage to eliminate almost everything.

|------------------------------------------------------------------------|
|"On my system your Ada code gives: |
| |
|6.34086408536266E+08 |
| |
|real 0m33.918s |
|user 0m33.864s |
|sys 0m0.025s |
| |
|And your original C++ code gives: |
| |
|6.34086e+08 |
|real 0m0.110s |
|user 0m0.003s |
|sys 0m0.003s |
| |
|But if I replace the C++ loop body by: |
| |
|   for(int j=1; j<=500; ++j) |
|      answer += std::log10(j*0.100000000000000000000); |
|It now gives: |
| |
|6.34086e+08 |
|real 0m18.112s |
|user 0m18.082s |
|sys 0m0.015s |
| |
|This less than twice as fast as the more generalized Ada code. |
| |
|[..]" |
|------------------------------------------------------------------------|

Thank you for exposing this flaw in the C++ code.

with Ada.Numerics.Generic_Elementary_Functions;
with Interfaces.C;
with Ada.Text_IO;
procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop is
   answer : Interfaces.C.double := 0.0;
   package double_library is new
     Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double);
   package double_output_library is new
     Ada.Text_IO.Float_IO(Interfaces.C.double);
begin
   for I in 1 .. 1_000_000 loop
      for J in 1 .. 500 loop
         answer := Interfaces.C."+"(
                      answer,
                      double_library.log(
                         Interfaces.C."*"(
                            Interfaces.C.double(J),
                            0.100000000000000000000
                         )
                         ,
                         10.0
                      )
                   );
      end loop;
   end loop;
   double_output_library.Put(answer);
end;

gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m31.091s
user 0m31.090s
sys 0m0.004s
time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m31.094s
user 0m31.094s
sys 0m0.004s

gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3
6.34086408606382E+08
real 0m31.388s
user 0m31.378s
sys 0m0.008s

g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.388s
user 0m38.390s
sys 0m0.000s
time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.547s
user 0m38.546s
sys 0m0.000s

g++ -O3 logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3
time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3
6.34086e+08
real 0m38.428s
user 0m38.426s
sys 0m0.004s
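
The C++ source timed above (logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc) is not reproduced in this thread. What follows is only a sketch of what its core might look like, built around the loop body Findlay quoted; the function name findlay_loop and the `outer` parameter are assumptions made here so that the pass count can be kept small when trying it out.

```cpp
#include <cmath>

// One "Findlay loop" pass sums log10(j * 0.1) for j = 1 .. 500, so the
// argument depends on the loop index instead of on a running X := X + 0.1.
// `outer` is the number of passes (1_000_000 in the timings of the thread).
double findlay_loop(int outer) {
    double answer = 0.0;
    for (int i = 1; i <= outer; ++i) {
        for (int j = 1; j <= 500; ++j) {
            answer += std::log10(j * 0.100000000000000000000);
        }
    }
    return answer;
}
```

A main function that prints findlay_loop(1000000) would presumably correspond to the program timed above, but that is a guess about a file which was not posted.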

with Ada.Numerics.Generic_Elementary_Functions;
with Interfaces.C;
with Ada.Text_IO;
procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is
   Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant
     Interfaces.C.Double := 0.434_294_481_903_251_827_651_129;

   answer : Interfaces.C.double := 0.0;
   package double_library is new
     Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double);
   package double_output_library is new
     Ada.Text_IO.Float_IO(Interfaces.C.double);
begin
   for I in 1 .. 1_000_000 loop
      for J in 1 .. 500 loop
         answer := Interfaces.C."+"
                   (
                      answer, Interfaces.C."*"
                      (
                         double_library.Log
                         (
                            Interfaces.C."*"
                            (
                               Interfaces.C.double(J),
                               0.100000000000000000000
                            )
                         )
                         ,
                         Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm
                      )
                   );
      end loop;
   end loop;
   double_output_library.Put(answer);
end;

gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m14.434s
user 0m14.433s
sys 0m0.004s
-bash bluewhite64 /home/Colin_Paul/logarithms $
-bash bluewhite64 /home/Colin_Paul/logarithms $ gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3
time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3
6.34086408606382E+08
real 0m14.450s
user 0m14.453s
sys 0m0.000s

From: Jeffrey R. Carter on 16 Feb 2010 14:01

Colin Paul Gloster wrote:
>
> |---------------------------------------------------------------------------------|
> |"Note that suppressing runtime checks (-gnatp) is needed to be sort of equivalent|
> |to C++." |
> |---------------------------------------------------------------------------------|
>
> Thanks for the tip, but I do not program in Ada to really program in
> C++ with Ada syntax.

I would hope not. But when comparing execution times between Ada and a
language like C++, it's important not to try to compare apples to
lugnuts.

--
Jeff Carter
"I don't know why I ever come in here. The flies get the best of
everything."
Never Give a Sucker an Even Break
102

From: Colin Paul Gloster on 17 Feb 2010 05:25

On Tue, 16 Feb 2010, Jeffrey R. Carter sent:
|------------------------------------------------------------------|
|"[..] |
| |
|[..] when comparing execution times between Ada and a language |
|like C++, it's important not to try to compare apples to lugnuts."|
|------------------------------------------------------------------|

Fair enough, but when I say Ada is better than C++ I am not comparing
an apple with an apple.

Anyway, as I mentioned in
news:alpine.LNX.2.00.1002161654110.21651(a)Bluewhite64.example.net
in response to Bill Findlay, G++ has produced much slower code than
GNAT (the GNATism is in standard Ada, merely the obvious way to do it
in standard Ada is different)...

gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08
real 0m14.434s
user 0m14.433s
sys 0m0.004s

g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.388s
user 0m38.390s
sys 0m0.000s

From: Colin Paul Gloster on 24 Feb 2010 05:07

On Tue, 16 Feb 2010, Colin Paul Gloster alleged:
|------------------------------------------------------------------------------------------------------------------------|
|"[..] |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with Interfaces.C; |
|with Ada.Text_IO; |
|procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is |
|   Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant |
|Interfaces.C.Double := 0.434_294_481_903_251_827_651_129; |
| |
|   answer : Interfaces.C.double := 0.0; |
|   package double_library is new |
|Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); |
|   package double_output_library is new |
|Ada.Text_IO.Float_IO(Interfaces.C.double); |
|begin |
| |
|   for I in 1 .. 1_000_000 loop |
|      for J in 1 .. 500 loop |
|         answer := Interfaces.C."+" |
|                   ( |
|                      answer, Interfaces.C."*" |
|                      ( |
|                         double_library.Log |
|                         ( |
|                            Interfaces.C."*" |
|                            ( |
|                               Interfaces.C.double(J),|
|                               0.100000000000000000000|
|                            ) |
|                         ) |
|                         , |
|                         Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm |
|                      ) |
|                   ); |
|      end loop; |
|   end loop; |
| |
|   double_output_library.Put(answer); |
|end; |
| |
|[..]" |
|------------------------------------------------------------------------------------------------------------------------|

Actually this is not a GNATism. I have noticed that
"*"(Left=>variable, Right=>Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm)
is faster than
log(X=>variable, Base=>10.0)
on a number of other compilers.
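
The replacement discussed here rests on the identity log10(x) = ln(x) * log10(e). A small C++ sketch of that identity follows, under the assumption that std::log and std::log10 stand in for the Ada Log with and without a Base argument; the names kLog10OfE, log10_via_ln and max_gap are hypothetical. It only checks that the two forms agree numerically and says nothing about which one is faster on a given compiler.

```cpp
#include <algorithm>
#include <cmath>

// log10(e), the same constant that appears in the quoted Ada code.
constexpr double kLog10OfE = 0.434294481903251827651129;

// Base-10 logarithm obtained from the natural logarithm and one multiply.
double log10_via_ln(double x) {
    return std::log(x) * kLog10OfE;
}

// Largest gap between std::log10(x) and log10_via_ln(x) over
// x = 0.1, 0.2, ..., n * 0.1, measured relative to max(1, |log10(x)|)
// so that the value near x = 1 (where log10 is zero) is handled sanely.
double max_gap(int n) {
    double worst = 0.0;
    for (int j = 1; j <= n; ++j) {
        const double x = j * 0.1;
        const double direct = std::log10(x);
        const double scale = std::max(1.0, std::fabs(direct));
        worst = std::max(worst, std::fabs(direct - log10_via_ln(x)) / scale);
    }
    return worst;
}
```

Calling max_gap(500) covers the same 500 arguments as the inner loop of the benchmark; a result within a few units of rounding error is what the identity predicts.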