From: Joseph Power on
On Wed, 07 Apr 2010 12:17:12 +0200 in comp.arch.embedded, Noob <root(a)127.0.0.1>
wrote:

>My actual code is
>
> for (i = 0; i < H; ++i)
> for (j = 0; j < W; ++j)
> memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);
>

I suspect some compilers may do this 'under the hood' (it is a standard loop-invariant
hoist), but you might want to consider adding a temporary variable to cut down on
repeated multiplies:

for (i = 0; i < H; ++i)
{
    W_times_i = W * i;    /* computed once per row, not once per pixel */
    for (j = 0; j < W; ++j)
    {
        memcpy(dest+(H*j+H-1-i)*3, src+(W_times_i+j)*3, 3);
    }
}

This should eliminate W - 1 multiplies per row (H * (W - 1) in all).

Given that there is a certain amount of overhead involved in calling memcpy
(especially for such a small number of bytes), unless your compiler inlines the
call, you might want to simply replace it with three explicit assignments:

for (i = 0; i < H; ++i)
{
    W_times_i = W * i;
    for (j = 0; j < W; ++j)
    {
        dest_indx = (H*j+H-1-i)*3;            /* byte offset of rotated pixel */
        src_indx = (W_times_i+j)*3;           /* byte offset of source pixel */
        dest[dest_indx++] = src[src_indx++];  /* byte 0 */
        dest[dest_indx++] = src[src_indx++];  /* byte 1 */
        dest[dest_indx] = src[src_indx];      /* byte 2 */
    }
}
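A self-contained way to check that the unrolled loop matches the original is to wrap both in functions and compare their output. This is a sketch only: the function names rotate_cw_memcpy and rotate_cw_assign, the unsigned char buffers and the size_t dimensions are assumptions, not part of the OP's code; the index formulas are the ones quoted above.

```c
#include <stddef.h>
#include <string.h>

/* The OP's loop (hypothetical wrapper): one memcpy per 3-byte pixel.
   src holds H rows of W pixels; dest receives W rows of H pixels. */
void rotate_cw_memcpy(unsigned char *dest, const unsigned char *src,
                      size_t W, size_t H)
{
    for (size_t i = 0; i < H; ++i)
        for (size_t j = 0; j < W; ++j)
            memcpy(dest + (H * j + H - 1 - i) * 3, src + (W * i + j) * 3, 3);
}

/* Joe's version (hypothetical wrapper): W * i hoisted out of the inner
   loop, memcpy replaced by three explicit byte assignments. */
void rotate_cw_assign(unsigned char *dest, const unsigned char *src,
                      size_t W, size_t H)
{
    for (size_t i = 0; i < H; ++i) {
        size_t W_times_i = W * i;
        for (size_t j = 0; j < W; ++j) {
            size_t dest_indx = (H * j + H - 1 - i) * 3;
            size_t src_indx = (W_times_i + j) * 3;
            dest[dest_indx++] = src[src_indx++];
            dest[dest_indx++] = src[src_indx++];
            dest[dest_indx] = src[src_indx];
        }
    }
}
```

Running both on the same small image and comparing the buffers with memcmp is a quick sanity check before timing them on the target.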


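You might also try i++ and j++ in your for loops; if your processor's hardware
supports post-increment better than pre-increment, your compiler may be aware of
that fact. (With a plain integer counter whose value is discarded, though, i++
and ++i mean exactly the same thing, so an optimizing compiler will normally
generate identical code for both.)

Hope that helps.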

Joe Power
From: Rob Gaddi on
On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
>>
>> Paul Carpenter wrote:
>
> [snip]
> Every time you use array[ index ] the compiler creates an arithmetic
> pointer manipulation to calculate the actual pointer to the desired
> location.
>

Is that really true? It seems like the sort of thing that -O3 ought to
be able to take care of.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
From: Vladimir Vassilevsky on


Rob Gaddi wrote:

> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>
>> In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>>
>>> [ Adding comp.arch.embedded back to the mix, since most of the
>>> replies come from that group ]
>>>
>>> Paul Carpenter wrote:
>>
>>
>> [snip]
>> Every time you use array[ index ] the compiler creates an arithmetic
>> pointer manipulation to calculate the actual pointer to the desired
>> location.
>>
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

It depends. Compilers can turn simple index arithmetic into pointer
manipulation. However, they tend to struggle as the loop nesting gets deeper
and the index expressions get more complicated. For that reason, explicit
pointer arithmetic is usually faster.
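As an illustration of doing that strength reduction by hand for the OP's loop, here is a sketch (the name rotate_cw_walk and the buffer types are assumptions). The source is read through an incrementing pointer; the destination is tracked as a running byte offset stepped by 3*H rather than as a pointer, because advancing a pointer more than one past the end of the array on the last iteration would be undefined behaviour in C.

```c
#include <stddef.h>

/* Hypothetical strength-reduced variant: no multiplies in the inner loop.
   src holds H rows of W 3-byte pixels; dest receives W rows of H pixels. */
void rotate_cw_walk(unsigned char *dest, const unsigned char *src,
                    size_t W, size_t H)
{
    const size_t dest_row_bytes = 3 * H;   /* one row of the rotated image */
    for (size_t i = 0; i < H; ++i) {
        /* source row i becomes destination column H-1-i */
        size_t d = (H - 1 - i) * 3;
        for (size_t j = 0; j < W; ++j) {
            dest[d]     = src[0];
            dest[d + 1] = src[1];
            dest[d + 2] = src[2];
            src += 3;              /* next source pixel: sequential read */
            d += dest_row_bytes;   /* same column, next destination row */
        }
    }
}
```

Whether this beats what the compiler generates from the indexed form is exactly the "it depends" above, so it is worth measuring on the actual target.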


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
From: nmm1 on
In article <romu87-uk61.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Noob wrote:
>>
>> I imagine the following problem has been efficiently solved
>> over a million times in the past.
>>
>> I need to rotate a picture clockwise 90 degrees.
>>
>> Conceptually, my picture is represented by a matrix
>> (W columns, H rows) of pixels, with each pixel
>> represented by 3 octets.
>>
>> pixel original_picture[W][H];
>> pixel rotated_picture[H][W];
>
>This is indeed a well-known problem; the key to making it fast is to
>realize that it is very similar to a transpose operation!
>
>I.e. first copy everything to the target array (in a single block move),
>then transpose it, finally you reverse each of the rows.
>
>The difference between clock and counter-clock-wise rotation is in doing
>the row reverse either before or after the transpose. Try both since I
>don't remember which is which!
>
>The key here is that the transpose operation is quite cache-friendly,
>much better than your current naive code.
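For a row-major image, transposing and then reversing each row gives the clockwise rotation (reversing first and then transposing gives counter-clockwise), which can be checked against the OP's index formula. The sketch below assumes 3-byte pixels and uses hypothetical names; it substitutes an out-of-place transpose for "copy, then transpose in place", since in-place transposition of a non-square matrix is considerably more involved. Its plain transpose loop has the same strided writes as the OP's loop, so it only demonstrates the order of the steps, not the speed-up.

```c
#include <stddef.h>
#include <string.h>

/* Step 1: out-of-place transpose. src has H rows of W pixels;
   dest gets W rows of H pixels, with dest(c, r) = src(r, c). */
static void transpose3(unsigned char *dest, const unsigned char *src,
                       size_t W, size_t H)
{
    for (size_t r = 0; r < H; ++r)
        for (size_t c = 0; c < W; ++c)
            memcpy(dest + (c * H + r) * 3, src + (r * W + c) * 3, 3);
}

/* Step 2: reverse the pixel order within each of `rows` rows,
   each `width` pixels long. */
static void reverse_rows3(unsigned char *img, size_t width, size_t rows)
{
    if (width < 2)
        return;                    /* nothing to swap; also avoids width - 1 underflow */
    for (size_t r = 0; r < rows; ++r) {
        unsigned char *row = img + r * width * 3;
        for (size_t a = 0, b = width - 1; a < b; ++a, --b) {
            unsigned char tmp[3];
            memcpy(tmp, row + a * 3, 3);
            memcpy(row + a * 3, row + b * 3, 3);
            memcpy(row + b * 3, tmp, 3);
        }
    }
}

/* Transpose, then reverse each row: clockwise rotation. */
void rotate_cw_transpose(unsigned char *dest, const unsigned char *src,
                         size_t W, size_t H)
{
    transpose3(dest, src, W, H);
    reverse_rows3(dest, H, W);     /* result has W rows of H pixels */
}
```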

For extra marks, produce a much more cache-friendly version, using
blocks rather than rows ....

You can even do that in place, and make it efficient. But I suspect
that you (Terje) know all of this!
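A minimal out-of-place sketch of the blocked idea (not the in-place version mentioned above): the image is processed in square tiles, and each tile is rotated with the OP's index formula. TILE and rotate_cw_blocked are hypothetical; the best tile size depends on the cache and would need to be tuned on the target.

```c
#include <stddef.h>
#include <string.h>

#define TILE 16  /* hypothetical tile edge in pixels; tune for the cache */

/* src holds H rows of W 3-byte pixels; dest receives W rows of H pixels. */
void rotate_cw_blocked(unsigned char *dest, const unsigned char *src,
                       size_t W, size_t H)
{
    for (size_t i0 = 0; i0 < H; i0 += TILE) {
        size_t i1 = (i0 + TILE < H) ? i0 + TILE : H;
        for (size_t j0 = 0; j0 < W; j0 += TILE) {
            size_t j1 = (j0 + TILE < W) ? j0 + TILE : W;
            /* One tile touches at most TILE short segments of source rows
               and TILE short segments of destination rows, so with a
               suitable TILE those cache lines can stay resident. */
            for (size_t i = i0; i < i1; ++i)
                for (size_t j = j0; j < j1; ++j)
                    memcpy(dest + (H * j + H - 1 - i) * 3,
                           src + (W * i + j) * 3, 3);
        }
    }
}
```

The edge tiles are clipped with i1 and j1, so W and H need not be multiples of TILE.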


Regards,
Nick Maclaren.
From: Paul Carpenter on
In article <xN6dnWMvdr0ELCHWnZ2dnUVZ_jadnZ2d(a)lmi.net>,
rgaddi(a)technologyhighland.com says...
> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> > In article<hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
> >> [ Adding comp.arch.embedded back to the mix, since most of the replies come from that group ]
> >>
> >> Paul Carpenter wrote:
> >
> > [snip]
> > Every time you use array[ index ] the compiler creates an arithmetic
> > pointer manipulation to calculate the actual pointer to the desired
> > location.
> >
>
> Is that really true? It seems like the sort of thing that -O3 ought to
> be able to take care of.

Look back at the OP's code: the pointers are incremented between iterations
of the loop and then used as arrays. How is the compiler going to optimise a
moving target, other than by making a copy and adding sizeof bytes to the
temporary pointer?

--
Paul Carpenter | paul(a)pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate