Prev: Review of a safer memory management approach for C++?
Next: Static variable in template function
From: albert kao on 5 Jun 2010 05:50 I want to optimize the following small integer program in speed. My g++ compiler version is 4.3.4 on 32 bit linux. Please suggest or comment the following ideas. Some ideas are: 1. use compile flag such as -O3 2. rewrite the bigger function with function object #include <algorithm> #include <cstdlib> // for abs() using namespace std; const int N = 50; const int D = 20; bool bigger (int i) { return abs(i) >= D; } int main() { // initialize seq // ... adjacent_difference(seq, seq + N, diff); int count = count_if(diff + 1, diff + N, bigger); // ... return 0; } -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Saeed Amrollahi on 6 Jun 2010 07:15 On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote: > I want to optimize the following small integer program in speed. > My g++ compiler version is 4.3.4 on 32 bit linux. > Please suggest or comment the following ideas. > Some ideas are: > 1. use compile flag such as -O3 > 2. rewrite the bigger function with function object > > #include <algorithm> > #include <cstdlib> // for abs() > using namespace std; > > const int N = 50; > const int D = 20; > bool bigger (int i) { return abs(i) >= D; } > > int main() > { > // initialize seq > // ... > adjacent_difference(seq, seq + N, diff); > int count = count_if(diff + 1, diff + N, bigger); > // ... > return 0; > > } { quoted banner removed; please do it yourself. -mod } Hi Albert 1. I compiled and ran the following program with O, O1, O2 and O3 optimization options: #include <algorithm> #include <cstdlib> #include <numeric> // for adjacent_difference #include <ctime> #include <iostream> using namespace std; const int N = 50 * 10000; const int D = 20; bool bigger(int i) { return abs(i) >= D; } int main() { clock_t t = clock(); // initialize sequence int seq[N]; for (int i = 0; i < N; ++i) seq[i] = rand(); int diff[N]; adjacent_difference(seq, seq + N, diff); int count = count_if(diff + 1, diff + N, bigger); cout << '\n' << count << '\n'; t = clock() - t; cout << t << '\n'; return 0; } You have to include <numeric>, also the N should be a large number like 50,000, for optimization make sense. As you see, I use the clock_t and time difference. Under gcc 4.4.0, the results for -O and -O1 is same and the results for -O2 and -O3 is equal. The -O2 and -O3 reduce the program execution time about %20 percents, so I think -O3 is the best option for optimizing your program. 2. the adjacent_difference and count_if algorithms aren't complex per se. The complexity of count_if is constant: last - first applications of predicate (in this case bigger() call). Also the complexity of adjacent_difference is constant: last -first + 1 applications of binary function. 3. There is no performance gain by rewriting bigger() with function object. Calling member function is almost as efficient as calling global function. Regards, -- Saeed Amrollahi -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Keith H Duggar on 6 Jun 2010 17:14 On Jun 6, 6:15 pm, Saeed Amrollahi <amrollahi.sa...(a)gmail.com> wrote: > On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote: > 3. There is no performance gain by rewriting bigger() with function > object. Calling member function is almost as efficient as > calling global function. That is wrong. For most implementations of the STL you will find that function objects can be /faster/ than passing free function references/pointers. In the ops case struct Bigger { bool operator ( ) ( int i ) const { return abs(i) >= D ; } } ; ... count += count_if(diff + 1, diff + N, Bigger()); is likely to be much faster than bool bigger(int i) { return abs(i) >= D; } ... count += count_if(diff + 1, diff + N, bigger); Also, on many platforms struct Bigger { bool operator ( ) ( int i ) const { return i >= D || i <= - D ; } } ; will be faster still. Give it a try and you will see. By the way, if you put the count_if into a loop you will get better sampling and higher signal to noise. For example: int count = 0 ; for ( int k = 0 ; k < 1024 ; ++k ) { count += count_if(diff + 1, diff + N, Bigger()) ; } KHD -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Ulrich Eckhardt on 7 Jun 2010 00:02 Note up front: This has _very_ little to do with C++, as such things always depend on the actual implementation etc. albert kao wrote: > I want to optimize the following small integer program in speed. [...] > // initialize seq > // ... > adjacent_difference(seq, seq + N, diff); > int count = count_if(diff + 1, diff + N, bigger); You are computing a sequence of values from another sequence and then count the number of values in the generated sequence fitting certain criteria. If the sequences are large, a significant amount of time will be used by the CPU/MMU to transfer the values between RAM and the various cache levels. If you generate and count in one loop, the pressure on the memory IO system will be lower. In particular you then don't generate data, push it to RAM only to later load it again from RAM. Uli -- Sator Laser GmbH Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932 [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Saeed Amrollahi on 7 Jun 2010 00:09 On Jun 7, 11:14 am, Keith H Duggar <dug...(a)alum.mit.edu> wrote: > On Jun 6, 6:15 pm, Saeed Amrollahi <amrollahi.sa...(a)gmail.com> wrote: > > > On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote: > > 3. There is no performance gain by rewriting bigger() with function > > object. Calling member function is almost as efficient as > > calling global function. > > That is wrong. For most implementations of the STL you will find > that function objects can be /faster/ than passing free function > references/pointers. In the ops case > > struct Bigger > { > bool operator ( ) ( int i ) const { return abs(i) >= D ; } > } ; > ... > count += count_if(diff + 1, diff + N, Bigger()); > > is likely to be much faster than > > bool bigger(int i) { return abs(i) >= D; } > ... > count += count_if(diff + 1, diff + N, bigger); > > Also, on many platforms > > struct Bigger > { > bool operator ( ) ( int i ) const { return i >= D || i <= - > D ; } > } ; > > will be faster still. Give it a try and you will see. By the way, > if you put the count_if into a loop you will get better sampling > and higher signal to noise. For example: > > int count = 0 ; > for ( int k = 0 ; k < 1024 ; ++k ) { > count += count_if(diff + 1, diff + N, Bigger()) ; > } > > KHD Hi Keith You are right. Of course we take advantage of inline expansion of operator(). In this case we have %25 performance improvement. For non-inline defintion (I mean defining function outside structure), the results are identical. The interesting question is: In case #1, when I declare bigger() with inline specifier, there is no performance improvement. Why? Regards, -- Saeed Amrollahi -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
|
Next
|
Last
Pages: 1 2 3 Prev: Review of a safer memory management approach for C++? Next: Static variable in template function |