From: Rob Gaddi on 7 Apr 2010 12:23

On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
>>
>> Paul Carpenter wrote:
>
> [snip]
> Every time you use array[ index ] the compiler creates an arithmetic
> pointer manipulation to calculate the actual pointer to the desired
> location.
>

Is that really true? It seems like the sort of thing that -O3 ought to
be able to take care of.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order
From: Vladimir Vassilevsky on 7 Apr 2010 13:05

Rob Gaddi wrote:
> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>
>> In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>>
>>> [ Adding comp.arch.embedded back to the mix, since most of the
>>> replies come from that group ]
>>>
>>> Paul Carpenter wrote:
>>
>> [snip]
>> Every time you use array[ index ] the compiler creates an arithmetic
>> pointer manipulation to calculate the actual pointer to the desired
>> location.
>>
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

It depends. Compilers can optimize simple index arithmetic into pointer
manipulation. However, they tend to goof as loop nesting gets deeper and
index expressions get more complicated. For that reason, hand-written
pointer arithmetic is usually faster.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
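
As a rough illustration of the transformation under discussion (not code from the thread; the function names `invert_indexed` and `invert_pointer` are made up for this example), here is the same row-major traversal written once with explicit index arithmetic and once with hand-stepped pointers. Whether a given compiler reduces the first form to the second on its own depends on the compiler and the optimization level.

```c
#include <stddef.h>

/* Hypothetical example: write the bitwise-inverted contents of a
 * rows-by-cols byte matrix (row-major, flat array) into dst. */

/* Indexed form: as written, each access needs a multiply and an add. */
void invert_indexed(unsigned char *dst, const unsigned char *src,
                    size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[i * cols + j] = (unsigned char)(255 - src[i * cols + j]);
}

/* Hand strength-reduced form: the index computation is replaced by
 * two pointers that each advance one element per iteration. */
void invert_pointer(unsigned char *dst, const unsigned char *src,
                    size_t rows, size_t cols)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            *d++ = (unsigned char)(255 - *s++);
}
```

Both functions produce the same output; comparing the generated assembly at -O0 and -O2 is one way to see what a particular compiler does with the indexed version.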
From: Paul Carpenter on 7 Apr 2010 13:10

In article <xN6dnWMvdr0ELCHWnZ2dnUVZ_jadnZ2d(a)lmi.net>, rgaddi(a)technologyhighland.com says...
> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> > In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
> >> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
> >>
> >> Paul Carpenter wrote:
> >
> > [snip]
> > Every time you use array[ index ] the compiler creates an arithmetic
> > pointer manipulation to calculate the actual pointer to the desired
> > location.
> >
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

Look back at the OP's code: the pointers are incremented between
iterations of the loop and then used as arrays. How is the compiler
going to optimise a moving target, other than by making a copy and
adding sizeof bytes to the temporary pointer?

-- 
Paul Carpenter | paul(a)pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
From: D Yuniskis on 7 Apr 2010 17:14

Noob wrote:
> I need to rotate a picture clockwise 90 degrees.

Are you sure you will always want to rotate by some multiple
of 90 deg? Are you sure you will *always* want to rotate CW
by exactly 90 deg?

> Conceptually, my picture is represented by a matrix
> (W columns, H rows) of pixels, with each pixel
> represented by 3 octets.
>
> pixel original_picture[W][H];
> pixel rotated_picture[H][W];
>
> At the implementation level, I'm dealing with 1D integer arrays.
>
> int src[W*H*3];
> int dest[W*H*3];

Your original data definition was cleaner.

> Conceptually, the rotation operation comes down to
>
> rotated_picture[j][H-1-i] = original_picture[i][j]
>
> My actual code is
>
> for (i = 0; i < H; ++i)
>   for (j = 0; j < W; ++j)
>     memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);

Why are you taking the memcpy function invocation hit here?
You're just moving "3 bytes (octets)".

Create two pointers:
    pixel *input, *output;

Initialize the input pointer to "some corner" (top left is convenient).
Initialize the output pointer to the corner that is 90 degrees CW
from this point.

Move the input pointer across a row (or down a column -- depending
on which corner you chose as a starting point) and move the output
pointer down a column (or across a row, etc.). Then, just copy
from *input to *output.

Of course, using a three byte type makes this cumbersome.
I assume you don't want to pack those three bytes into
a long int (time/space efficiency concerns). So, you'll
have to break it down to a set of three "byte operations".

> Consider as an example,
> W = 1644
> H = 1164
> size = 5.74 MB
>
> On my target platform, the rotation takes 1.65 seconds.
>
> I'm trying to get this down under half a second.
>
> I'm using the GNU tool chain. For some weird reason, we
> compile everything -O0. The first thing I'll try is crank
> gcc's optimization level.
>
> I'm hoping gcc can perform some strength reduction, as the
> index calculation seems to be taking a non-negligible fraction
> of the total run-time.
>
> Changing the loop to
>
> for (i = 0; i < H; ++i)
>   for (j = 0; j < W; ++j)
>   {
>     memcpy(dest+(H*j+H-i-1)*3, src, 3);
>     src += 3;
>   }
>
> brings a 5% improvement.
>
> I thought changing the memcpy call to 3 explicit assignments
> might help, but it actually slowed things down.

It might help if you told us which processor you are targeting.

> Perhaps I need to perform some tiling magic... I don't think
> gcc performs automatic tiling?
>
> Comments and insight would be greatly appreciated.
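
The two-pointer walk described in this post could be sketched roughly as follows. This is only one reading of the suggestion, not code from the thread: the `pixel` struct layout and the name `rotate_cw` are assumptions, and the source is taken to be H rows of W pixels, which matches the flat index `W*i+j` in the quoted code. The input pointer runs across each source row while the output pointer runs down the matching destination column.

```c
#include <stddef.h>

/* Hypothetical 3-octet pixel type; the thread only says "3 octets".
 * sizeof(pixel) is 3 on typical targets, though C does not promise it. */
typedef struct { unsigned char c[3]; } pixel;

/* Rotate an image 90 degrees clockwise.
 * src holds h rows of w pixels; dst must hold w rows of h pixels. */
void rotate_cw(pixel *dst, const pixel *src, size_t w, size_t h)
{
    const pixel *in = src;                  /* top-left of the source */

    for (size_t i = 0; i < h; ++i) {
        /* Source row i becomes destination column h-1-i. */
        pixel *out = dst + (h - 1 - i);

        for (size_t j = 0; j < w; ++j) {
            *out = *in++;                   /* struct copy, no memcpy call */
            if (j + 1 < w)
                out += h;                   /* down one destination row */
        }
    }
}
```

The `if (j + 1 < w)` guard is there only so that `out` is never advanced beyond the end of `dst`; the struct assignment stands in for the three separate byte operations mentioned above, and a compiler is free to implement it either way.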
From: George Neuner on 7 Apr 2010 19:35

On Wed, 07 Apr 2010 12:05:03 -0500, Vladimir Vassilevsky
<nospam(a)nowhere.com> wrote:

>Rob Gaddi wrote:
>
>> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>>
>>> [snip]
>>> Every time you use array[ index ] the compiler creates an arithmetic
>>> pointer manipulation to calculate the actual pointer to the desired
>>> location.
>>
>> Is that really true? It seems like the sort of thing that -O3 ought to
>> be able to take care of.
>
>It depends. Compilers can optimize simple index arithmetics into
>manipulation with pointers. However they are goofing with increased
>level of loop nesting and more complicated index expressions. For that
>matter, pointer arithmetics is usually faster.

It depends on the compiler. Most C90 and later compilers do a pretty
good job optimizing array indexing and are not so easily confused by
high dimensionality or sub-array blocking. In many cases you will
actually confuse them and reduce performance if you try to do your own
pointer arithmetic.

That said, old versions of GCC (certainly anything prior to 3.x) were
notoriously bad at optimizing even 3-D array indexing. If your chip
requires using one of these old compilers, then you certainly should do
your own pointer arithmetic.

George
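
For the "leave the indexing to the compiler" approach, a minimal sketch might look like the following. It assumes a C99 compiler that accepts variable-length array parameters; the `pixel` struct and the name `rotate_cw_indexed` are hypothetical, and the source is declared as H rows of W pixels so that it agrees with the flat index `W*i+j` used in the original question. No claim is made about how fast this is on any particular compiler; that is exactly what would need measuring.

```c
#include <stddef.h>

/* Hypothetical 3-octet pixel type; the thread only says "3 octets". */
typedef struct { unsigned char c[3]; } pixel;

/* Rotate 90 degrees clockwise using plain 2-D indexing.
 * src is h rows by w columns; dst is w rows by h columns.
 * src is not const-qualified here to keep calls with plain 2-D
 * arrays warning-free on older compilers. */
void rotate_cw_indexed(size_t w, size_t h,
                       pixel dst[w][h], pixel src[h][w])
{
    for (size_t i = 0; i < h; ++i)
        for (size_t j = 0; j < w; ++j)
            dst[j][h - 1 - i] = src[i][j];   /* index math left to the compiler */
}
```

The loop body is a direct transcription of the conceptual formula `rotated_picture[j][H-1-i] = original_picture[i][j]` from the question, which makes it a convenient baseline to time against any hand-optimized variant.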