From: Olivier Scalbert on 7 Sep 2009 03:45

Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> Georg Bauhaus wrote:
>>
>>> Ludovic Brenta wrote:
>>>> Apparently, passing unconstrained strings to procedure Write involves
>>>> allocations on the secondary stack which account for 20% of the entire
>>>> execution time. That's hot spot #1.
>>> Indeed, and this particular hot spot had been cooled down twice:
>>> Step 1 - we replaced Bounded_String with our own Bounded_String
>>> Step 2 - we replaced this new Bounded_String with plain
>>> constrained strings of suitable fixed length (using generics)
>> I should add that the current program spends much of its time
>> in equality comparison of fragment strings,
>> and then some in the hash function.
>> So not only are the bounded_strings gone;
>> Jonathan has also contributed a highly efficient hashing
>> function and a cute string equality function.
>>
>> (As mentioned, to actually see the effects (of the current
>> program), String_Fragments."=" should be a renaming of Equals.
>> Operator subprograms seem to confuse the profiling programs,
>> or am I missing some setting?)
>
> So I gather that Olivier was profiling an old version of the program.
> Correct?
>
> --
> Ludovic Brenta.

Oops, sorry about that ...

Today I can provide a profile for the latest version on:
- 32 bits - Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz - gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
- 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4 (Debian 4.3.4-1)

Would that help?

Olivier

From: Georg Bauhaus on 7 Sep 2009 05:19

Olivier Scalbert wrote:
> Today I can provide profile for the last version on:
> - 32 bits - Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz - gcc
> version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
> - 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
> (Debian 4.3.4-1)
>
> Can it help ?

Yes, as we have few measurements of what happens on 4-core and 1-core
CPUs, and for GCC 4.3.3.

If you like, arrange the task starts in a different order:
placing Work_On_1.Writer.Set (1) first seems to be a must.
The following two (12, 18) have run longest.

From: Olivier Scalbert on 7 Sep 2009 09:31

Georg Bauhaus wrote:
> Olivier Scalbert wrote:
>
>> Today I can provide profile for the last version on:
>> - 32 bits - Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz - gcc
>> version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>> - 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
>> (Debian 4.3.4-1)
>>
>> Can it help ?
>
> Yes, as we have few measurements of what happens on 4core and 1core
> CPUs, and for GCC 4.3.3.
>
> If you like, arrange the task starts in different order:
> placing Work_On_1.Writer.Set (1) first seems to be a must.
> The following two (12, 18) have run longest.

Here are the results!

On 32 bits - Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz - gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
---------------------------------------
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real 0m14.607s
user 0m32.798s
sys 0m0.568s
$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes ../knucleotide.gnat_run < fasta/fasta25m.dat

see: http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.32bits.1

--

On 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4 (Debian 4.3.4-1)
---------------------------------------
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real 1m10.190s
user 1m9.252s
sys 0m0.724s
$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes ../knucleotide.gnat_run < fasta/fasta25m.dat

see: http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1

Olivier

From: jonathan on 7 Sep 2009 10:38

On Sep 7, 2:31 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com> wrote:
> On 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
> (Debian 4.3.4-1)
> $ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
> $ time ./knucleotide.gnat_run < fasta/fasta250m.dat
> real 1m10.190s
> user 1m9.252s
> sys 0m0.724s
> $ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes
> ./knucleotide.gnat_run < fasta/fasta25m.dat
>
> see:http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1
>
> Olivier

This last test is worrisome. Are you sharing the machine with other
processes? Here is what I get when I have 8 cores to myself
(using GNAT 4.3.4 (GPL2009)):

time ./knucleotide.gnat_run < /tmp/fasta250.dat

real 0m6.647s
user 0m17.273s
sys 0m0.448s

and here is what I get when I share with another (heavy)
user of the machine:

time ./knucleotide.gnat_run < /tmp/fasta250.dat

real 0m25.475s
user 0m24.766s
sys 0m0.708s

Jonathan

From: Olivier Scalbert on 7 Sep 2009 11:03

jonathan wrote:
> This last test is worrisome. Are you sharing the machine with other
> processes? Here is what I get when I have 8 cores to myself
> (using GNAT 4.3.4 (GPL2009)):
>
> time ./knucleotide.gnat_run < /tmp/fasta250.dat
>
> real 0m6.647s
> user 0m17.273s
> sys 0m0.448s
>
> and here is what I get when I share with another (heavy)
> user of the machine:
>
> time ./knucleotide.gnat_run < /tmp/fasta250.dat
>
> real 0m25.475s
> user 0m24.766s
> sys 0m0.708s
>
>
> Jonathan

My 64-bit machine is an "old" single-core Athlon (512 KB cache size).
So perhaps that is normal!

Olivier
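
Editorial note: one way to read the four `time` results in this thread is to compare CPU time (user + sys) with wall-clock time (real). A ratio near 1 suggests the program effectively ran on one core; a ratio well above 1 suggests several tasks ran in parallel. The Python sketch below is only an illustration. The helper names parse_time and cpu_ratio and the run labels are made up for this note and are not part of the knucleotide program; the timings are copied from the posts above.

```python
import re

def parse_time(text):
    """Parse bash `time` output into seconds, e.g. {'real': 70.19, ...}."""
    result = {}
    pattern = r"(real|user|sys)\s+(\d+)m([\d.]+)s"
    for name, minutes, seconds in re.findall(pattern, text):
        result[name] = int(minutes) * 60 + float(seconds)
    return result

def cpu_ratio(times):
    """CPU seconds consumed per wall-clock second (about 1.0 on one core)."""
    return (times["user"] + times["sys"]) / times["real"]

# Timings quoted from the posts above; the labels are hypothetical.
runs = {
    "Olivier, Core 2 Quad": "real 0m14.607s user 0m32.798s sys 0m0.568s",
    "Olivier, Athlon 64":   "real 1m10.190s user 1m9.252s sys 0m0.724s",
    "Jonathan, 8 cores":    "real 0m6.647s user 0m17.273s sys 0m0.448s",
    "Jonathan, shared":     "real 0m25.475s user 0m24.766s sys 0m0.708s",
}

for label, text in runs.items():
    print(f"{label}: {cpu_ratio(parse_time(text)):.2f}")
```

On these numbers the Athlon 64 run and Jonathan's shared run both come out at about 1.0, while the quad-core and 8-core runs come out between 2 and 3. That fits Olivier's explanation: a ratio of about 1 is what a single-core machine would produce, so the 1m10s run does not by itself indicate a problem in the program.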