Faster image rotation [Embedded]

From: Rob Gaddi on 7 Apr 2010 12:23

On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
>>
>> Paul Carpenter wrote:
>
> [snip]
> Every time you use array[ index ] the compiler creates an arithmetic
> pointer manipulation to calculate the actual pointer to the desired
> location.
>

Is that really true? It seems like the sort of thing that -O3 ought to
be able to take care of.

--
Rob Gaddi, Highland Technology
Email address is currently out of order

From: Vladimir Vassilevsky on 7 Apr 2010 13:05

Rob Gaddi wrote:

> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>
>> In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>>
>>> [ Adding comp.arch.embedded back to the mix, since most of the
>>> replies come from that group ]
>>>
>>> Paul Carpenter wrote:
>>
>>
>> [snip]
>> Every time you use array[ index ] the compiler creates an arithmetic
>> pointer manipulation to calculate the actual pointer to the desired
>> location.
>>
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

It depends. Compilers can optimize simple index arithmetic into
pointer manipulation. However, they tend to stumble as loop nesting
gets deeper and index expressions get more complicated. For that
reason, explicit pointer arithmetic is usually faster.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

From: Paul Carpenter on 7 Apr 2010 13:10

In article <xN6dnWMvdr0ELCHWnZ2dnUVZ_jadnZ2d(a)lmi.net>,
rgaddi(a)technologyhighland.com says...
> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> > In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
> >> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
> >>
> >> Paul Carpenter wrote:
> >
> > [snip]
> > Every time you use array[ index ] the compiler creates an arithmetic
> > pointer manipulation to calculate the actual pointer to the desired
> > location.
> >
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

Look back at the OP's code: the pointers are incremented between
iterations of the loop and then used as arrays. How is the compiler
going to optimise a moving target, other than by making a copy and
adding sizeof bytes to the temporary pointer?

--
Paul Carpenter | paul(a)pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate

From: D Yuniskis on 7 Apr 2010 17:14

Noob wrote:
> I need to rotate a picture clockwise 90 degrees.

Are you sure you will always want to rotate by some
multiple of 90 deg?

Are you sure you will *always* want to rotate CW by
exactly 90 deg?

> Conceptually, my picture is represented by a matrix
> (W columns, H rows) of pixels, with each pixel
> represented by 3 octets.
>
> pixel original_picture[W][H];
> pixel rotated_picture[H][W];
>
> At the implementation level, I'm dealing with 1D integer arrays.
>
> int src[W*H*3];
> int dest[W*H*3];

Your original data definition was cleaner.

> Conceptually, the rotation operation comes down to
>
> rotated_picture[j][H-1-i] = original_picture[i][j]
>
> My actual code is
>
> for (i = 0; i < H; ++i)
>   for (j = 0; j < W; ++j)
>     memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);

Why are you taking the memcpy function invocation hit here?
You're just moving "3 bytes (octets)".

Create two pointers:

pixel *input, *output;

Initialize the input pointer to "some corner" (top left is
convenient). Initialize the output pointer to the corner
that is 90 degrees CW from this point.

Move the input pointer across a row (or down a column -- depending
on which corner you chose as a starting point) and move the output
pointer down a column (or across a row, etc.).

Then, just copy from *input to *output.

Of course, using a three byte type makes this cumbersome.
I assume you don't want to pack those three bytes into a
long int (time/space efficiency concerns). So, you'll have
to break it down to a set of three "byte operations".
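
Something along these lines is what I have in mind. Treat it as a
rough sketch, not tested on your target: the `pixel` struct and the
function name `rotate_cw90` are made up for illustration, and I am
assuming row-major storage with the same index mapping as your memcpy
version (dest index H*j+H-1-i, src index W*i+j).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3-octet pixel; an array of uint8_t normally packs to 3 bytes. */
typedef struct { uint8_t c[3]; } pixel;

/*
 * Rotate a row-major image of h rows by w columns 90 degrees clockwise.
 * dest receives w rows by h columns. Source row i lands in destination
 * column h-1-i, so input walks across a row while output walks down a column.
 */
void rotate_cw90(const pixel *src, pixel *dest, size_t w, size_t h)
{
    const pixel *input = src;

    if (w == 0 || h == 0)
        return;

    for (size_t i = 0; i < h; ++i) {
        pixel *output = dest + (h - 1 - i);   /* top of destination column */
        size_t remaining = w;

        for (;;) {
            *output = *input++;               /* copies the three octets */
            if (--remaining == 0)
                break;                        /* stop before stepping past dest */
            output += h;                      /* one row down in dest */
        }
    }
}
```

The inner loop has no multiplications left in it, only a pointer bump
and an add of a loop-invariant stride. Whether that beats what the
compiler produces from the indexed version at -O2 is something you
would have to measure on your processor.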

> Consider as an example,
> W = 1644
> H = 1164
> size = 5.74 MB
>
> On my target platform, the rotation takes 1.65 seconds.
>
> I'm trying to get this down under half a second.
>
> I'm using the GNU tool chain. For some weird reason, we
> compile everything -O0. The first thing I'll try is crank
> gcc's optimization level.
>
> I'm hoping gcc can perform some strength reduction, as the
> index calculation seems to be taking a non-negligible fraction
> of the total run-time.
>
> Changing the loop to
>
> for (i = 0; i < H; ++i)
>   for (j = 0; j < W; ++j)
>   {
>     memcpy(dest+(H*j+H-i-1)*3, src, 3);
>     src += 3;
>   }
>
> brings a 5% improvement.
>
> I thought changing the memcpy call to 3 explicit assignments
> might help, but it actually slowed things down.

It might help if you told us which processor you are targeting.

> Perhaps I need to perform some tiling magic... I don't think
> gcc performs automatic tiling?
>
> Comments and insight would be greatly appreciated.

From: George Neuner on 7 Apr 2010 19:35

On Wed, 07 Apr 2010 12:05:03 -0500, Vladimir Vassilevsky
<nospam(a)nowhere.com> wrote:

>Rob Gaddi wrote:
>
>> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>>
>>> [snip]
>>> Every time you use array[ index ] the compiler creates an arithmetic
>>> pointer manipulation to calculate the actual pointer to the desired
>>> location.
>>
>> Is that really true? It seems like the sort of thing that -O3 ought to
>> be able to take care of.
>
>It depends. Compilers can optimize simple index arithmetic into
>pointer manipulation. However, they tend to stumble as loop nesting
>gets deeper and index expressions get more complicated. For that
>reason, explicit pointer arithmetic is usually faster.

It depends on the compiler. Most C90 and later compilers do a pretty
good job optimizing array indexing and are not so easily confused by
high dimensionality or sub-array blocking. In many cases you will
actually confuse them and reduce performance if you try to do your own
pointer arithmetic.

That said, old versions of GCC (certainly anything prior to 3.x) were
notoriously bad at optimizing even 3-D array indexing. If your chip
requires using one of these old compilers, then you certainly should
do your own pointer arithmetic.
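
If you want to check what your own compiler does, a pair like the
following is enough to compare in the generated assembly. This is only
an illustrative sketch: the function names `sum_indexed` and
`sum_pointer` are invented here, and the flat 1-D layout mirrors the
way the OP stores his image.

```c
#include <stddef.h>

/* Indexed form: the offset i*cols + j is recomputed in the source text. */
long sum_indexed(const int *a, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            total += a[i * cols + j];
    return total;
}

/* Pointer form: the strength reduction is done by hand. */
long sum_pointer(const int *a, size_t rows, size_t cols)
{
    long total = 0;
    const int *p = a;
    const int *end = a + rows * cols;   /* one past the last element */
    while (p != end)
        total += *p++;
    return total;
}
```

Both functions return the same result. Compile them at the
optimization level you actually ship with (for example with gcc -S)
and see whether the inner loops differ before deciding to rewrite
anything by hand.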

George
