From: Joseph Power on 7 Apr 2010 12:07

On Wed, 07 Apr 2010 12:17:12 +0200 in comp.arch.embedded, Noob
<root(a)127.0.0.1> wrote:

>My actual code is
>
>  for (i = 0; i < H; ++i)
>    for (j = 0; j < W; ++j)
>      memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);
>

I suspect some compilers may do this 'under the hood', but you might
want to consider adding a temporary variable to cut down on repeated
multiplies:

  for (i = 0; i < H; ++i)
  {
    W_times_i = W * i;
    for (j = 0; j < W; ++j)
    {
      memcpy(dest+(H*j+H-1-i)*3, src+(W_times_i+j)*3, 3);
    }
  }

This should eliminate W - 1 multiplies on each pass of the outer loop.

Given that there is a certain amount of overhead involved in calling
memcpy (especially for such a small number of bytes), you might want
to simply replace the call with three explicit assignments:

  for (i = 0; i < H; ++i)
  {
    W_times_i = W * i;
    for (j = 0; j < W; ++j)
    {
      dest_indx = (H*j+H-1-i)*3;
      src_indx = (W_times_i+j)*3;
      dest[dest_indx++] = src[src_indx++];
      dest[dest_indx++] = src[src_indx++];
      dest[dest_indx] = src[src_indx];
    }
  }

You might also try i++ and j++ in your for loops. If your processor's
hardware supports post-increment better than pre-increment, your
compiler may be aware of that fact.

Hope that helps,

Joe Power
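For anyone wanting to try the two variants from this post side by side, here is a minimal compilable sketch. The function names, the `size_t` index type, and the `assert` guard against in-place use are illustrative assumptions and do not come from the thread; the loop bodies follow the posted code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference: the original loop, one memcpy per 3-byte pixel.
   src is H rows of W pixels; dest is W rows of H pixels. */
void rotate_memcpy(unsigned char *dest, const unsigned char *src,
                   size_t W, size_t H)
{
    assert(dest != src);  /* assumption: out-of-place rotation only */
    for (size_t i = 0; i < H; ++i)
        for (size_t j = 0; j < W; ++j)
            memcpy(dest + (H*j + H-1-i)*3, src + (W*i + j)*3, 3);
}

/* Variant: W*i hoisted out of the inner loop, and memcpy replaced
   by three explicit byte assignments. */
void rotate_assign(unsigned char *dest, const unsigned char *src,
                   size_t W, size_t H)
{
    assert(dest != src);  /* assumption: out-of-place rotation only */
    for (size_t i = 0; i < H; ++i) {
        size_t W_times_i = W * i;
        for (size_t j = 0; j < W; ++j) {
            size_t dest_indx = (H*j + H-1-i)*3;
            size_t src_indx  = (W_times_i + j)*3;
            dest[dest_indx++] = src[src_indx++];
            dest[dest_indx++] = src[src_indx++];
            dest[dest_indx]   = src[src_indx];
        }
    }
}
```

Both functions should fill `dest` identically for the same input. Whether the second is actually faster depends on the compiler and target, so it is worth timing both rather than assuming.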
From: Rob Gaddi on 7 Apr 2010 12:23

On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>> [ Adding comp.arch.embedded back to the mix, since most of the
>> replies come from that group ]
>>
>> Paul Carpenter wrote:
>
> [snip]
> Every time you use array[ index ] the compiler creates an arithmetic
> pointer manipulation to calculate the actual pointer to the desired
> location.
>

Is that really true?  It seems like the sort of thing that -O3 ought to
be able to take care of.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
From: Vladimir Vassilevsky on 7 Apr 2010 13:05

Rob Gaddi wrote:

> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
>
>> In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
>>
>>> [ Adding comp.arch.embedded back to the mix, since most of the
>>> replies come from that group ]
>>>
>>> Paul Carpenter wrote:
>>
>> [snip]
>> Every time you use array[ index ] the compiler creates an arithmetic
>> pointer manipulation to calculate the actual pointer to the desired
>> location.
>>
>
> Is that really true?  It seems like the sort of thing that -O3 ought to
> be able to take care of.

It depends. Compilers can optimize simple index arithmetic into pointer
manipulation. However, they tend to stumble as loop nesting gets deeper
and index expressions get more complicated. In those cases, explicit
pointer arithmetic is usually faster.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
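As a rough illustration of the hand-written pointer stepping discussed above, the rotation loop can be rewritten so that the inner loop contains only additions. This is a sketch under assumptions: the name `rotate_stepped`, the `size_t` types, and the `assert` guard are hypothetical, and the destination is tracked as a running byte offset rather than a raw pointer so that the final step past the end of the buffer never forms an out-of-range pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate an H-row by W-column image of 3-byte pixels clockwise.
   src is read sequentially; dest is written with a fixed stride. */
void rotate_stepped(unsigned char *dest, const unsigned char *src,
                    size_t W, size_t H)
{
    assert(dest != src);  /* assumption: out-of-place rotation only */

    const size_t dest_stride = 3 * H;  /* bytes between dest pixels as j advances */
    const unsigned char *s = src;      /* walks the source front to back */

    for (size_t i = 0; i < H; ++i) {
        /* Source row i lands in dest column H-1-i, starting at dest row 0. */
        size_t d_off = 3 * (H - 1 - i);
        for (size_t j = 0; j < W; ++j) {
            unsigned char *d = dest + d_off;
            d[0] = s[0];
            d[1] = s[1];
            d[2] = s[2];
            s += 3;                /* next source pixel */
            d_off += dest_stride;  /* same column, next dest row */
        }
    }
}
```

An optimizing compiler may well produce similar code from the indexed version, so comparing the generated assembly or timing both is the only reliable way to tell whether this helps on a given target.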
From: nmm1 on 7 Apr 2010 13:10

In article <romu87-uk61.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Noob wrote:
>>
>> I imagine the following problem has been efficiently solved
>> over a million times in the past.
>>
>> I need to rotate a picture clockwise 90 degrees.
>>
>> Conceptually, my picture is represented by a matrix
>> (W columns, H rows) of pixels, with each pixel
>> represented by 3 octets.
>>
>> pixel original_picture[W][H];
>> pixel rotated_picture[H][W];
>
>This is indeed a well-known problem; the key to making it fast is to
>realize that it is very similar to a transpose operation!
>
>I.e. first copy everything to the target array (in a single block move),
>then transpose it, and finally reverse each of the rows.
>
>The difference between clockwise and counter-clockwise rotation is in
>doing the row reverse either before or after the transpose. Try both,
>since I don't remember which is which!
>
>The key here is that the transpose operation is quite cache-friendly,
>much better than your current naive code.

For extra marks, produce a much more cache-friendly version, using
blocks rather than rows ....  You can even do that in place, and
make it efficient.  But I suspect that you (Terje) know all of this!

Regards,
Nick Maclaren.
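The "blocks rather than rows" idea can be sketched as a tiled loop: the image is walked in small square tiles so that the source rows and destination rows touched together stay cache-resident. This is an out-of-place sketch only, not the in-place version alluded to above. The name `rotate_tiled`, the `TILE` constant and its value of 16, and the `assert` guard are all assumptions for illustration; a suitable tile size depends on the cache of the target.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 16 };  /* hypothetical tile edge in pixels; tune per target */

/* Rotate an H-row by W-column image of 3-byte pixels clockwise,
   processing TILE x TILE blocks of source pixels at a time. */
void rotate_tiled(unsigned char *dest, const unsigned char *src,
                  size_t W, size_t H)
{
    assert(dest != src);  /* assumption: out-of-place rotation only */

    for (size_t i0 = 0; i0 < H; i0 += TILE) {
        /* Clamp the tile's last row to the image height. */
        size_t i1 = (H - i0 < TILE) ? H : i0 + TILE;
        for (size_t j0 = 0; j0 < W; j0 += TILE) {
            /* Clamp the tile's last column to the image width. */
            size_t j1 = (W - j0 < TILE) ? W : j0 + TILE;
            for (size_t i = i0; i < i1; ++i) {
                for (size_t j = j0; j < j1; ++j) {
                    const unsigned char *s = src + (W*i + j)*3;
                    unsigned char *d = dest + (H*j + H-1-i)*3;
                    d[0] = s[0];
                    d[1] = s[1];
                    d[2] = s[2];
                }
            }
        }
    }
}
```

The index mapping is the same as in the original poster's loop; only the traversal order changes, so the output should match the untiled version for any W and H, including sizes that are not multiples of `TILE`.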
From: Paul Carpenter on 7 Apr 2010 13:10
In article <xN6dnWMvdr0ELCHWnZ2dnUVZ_jadnZ2d(a)lmi.net>,
rgaddi(a)technologyhighland.com says...
> On 4/7/2010 8:35 AM, Paul Carpenter wrote:
> > In article <hphv27$bnh$1(a)speranza.aioe.org>, root(a)127.0.0.1 says...
> >> [ Adding comp.arch.embedded back to the mix, since most of the
> >> replies come from that group ]
> >>
> >> Paul Carpenter wrote:
> >
> > [snip]
> > Every time you use array[ index ] the compiler creates an arithmetic
> > pointer manipulation to calculate the actual pointer to the desired
> > location.
> >
>
> Is that really true?  It seems like the sort of thing that -O3 ought to
> be able to take care of.

Look back at the OP's code: the pointers are incremented between
iterations of the loop and then used as arrays. How is it going to
optimise a moving target, other than by making a copy and adding
sizeof bytes to the temporary pointer?

--
Paul Carpenter          | paul(a)pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/>    PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/>  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate