From: Noob on
Peter Dickerson wrote:

> Noob wrote:
>
>> Ulf Samuelsson wrote:
>>
>>> Noob wrote:
>>>
>>>> I imagine the following problem has been efficiently solved
>>>> over a million times in the past.
>>>>
>>>> I need to rotate a picture clockwise 90 degrees.
>>>
>>> Modern Hardware like the [snip] will do this in a H/W accelerator.
>>
>> I was specifically asking how to do it in software.
>>
>> I find this spamvertisement very inappropriate.
>>
>> Does your employer condone this activity?
>
> Well, the fact that some devices feel the need to do this in hardware is
> relevant. It means that either the processor doesn't have the grunt to do the
> job well itself or that there is significant performance to be had from
> methods that are not easily possible in software (e.g. dedicated parallel
> hardware). If the second then it hints that some cleverness in software will
> be needed to do well.

As a matter of fact, I was told that the blitter on this platform (STx7109)
*does* support image rotation in hardware, but that the capability had not
been exposed by the API. Strange, wouldn't you say?

> Anyway, there is nothing about USENET that requires replies to your questions
> to be for you only.

I do understand that.

> If the reply is related, and the poster can reasonably expect
> there to be readers for which the reply is relevant, then I don't have a big
> problem. Some of this is about motivation. If the motivation of the response
> was to shoehorn their product or services into the discussion rather than to
> be genuinely helpful then that I surely don't like. Now, Ulf has a history
> in this group and I feel that history has been strongly positive though
> somewhat focused on Atmel products (since that's what he knows best).

I agree that it might be appropriate to mention that some platforms provide
hardware support for some operations. What I found rather spammy was the explicit
mention of one of his employer's platforms.

I will concede that I may have over-reacted ;-)

Regards.
From: Meindert Sprang on
"Noob" <root(a)127.0.0.1> wrote in message
news:hpk8f1$4n1$1(a)speranza.aioe.org...
> > Then, just copy from *input to *output.
>
> I've done this, and it is still too slow for my taste :-(
>
> It brings the run-time from 1.6 seconds down to 1.1 seconds
> (for my 1644 x 1164 example).

Seriously, consider doing this in assembler. I have done some fax image
processing in the past and the difference between a HLL implementation and
an assembler implementation of the run length decoding of a PCX image was
astonishing.

> > Of course, using a three byte type makes this cumbersome.
> > I assume you don't want to pack those three bytes into a
> > long int (time/space efficiency concerns). So, you'll have
> > to break it down to a set of three "byte operations".
>
> I didn't see how to ask libjpeg to output 32-bit pixel values
> instead of 24-bit RGB values.

Converting it from 24 bit to 32 bit first, might even give you a performance
gain, especially since your image dimensions seem to be a multiple of 4.
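
A minimal sketch of that conversion step, assuming the decoder hands back
packed R,G,B bytes and that a 0x00RRGGBB word layout is acceptable. The
function name rgb24_to_xrgb32 is hypothetical, and the byte order the
display hardware actually wants may differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand packed 24-bit RGB (3 bytes per pixel) into one 32-bit word per
 * pixel, laid out as 0x00RRGGBB. The layout is an assumption; adjust the
 * shifts for whatever the target framebuffer expects. */
void rgb24_to_xrgb32(const unsigned char *src, uint32_t *dst, size_t npixels)
{
    size_t i;
    for (i = 0; i < npixels; ++i) {
        uint32_t r = src[0];
        uint32_t g = src[1];
        uint32_t b = src[2];
        dst[i] = (r << 16) | (g << 8) | b;
        src += 3;
    }
}
```

Once every pixel is a single aligned word, the rotation loop can move one
word per pixel instead of three separate bytes, at the cost of a third more
memory for the bitmap.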

Meindert


From: Boudewijn Dijkstra on
On Thu, 08 Apr 2010 11:43:02 +0200, Noob <root(a)127.0.0.1> wrote:
> D Yuniskis wrote:
>> Noob wrote:
>>
>>> I need to rotate a picture clockwise 90 degrees.
>> [...]
> I'm displaying JPEG photographs. Some users want to rotate the pictures
> (landscape to portrait). AFAICS, all I need is 90° either way.
> [...]
>> Of course, using a three byte type makes this cumbersome.
>> I assume you don't want to pack those three bytes into a
>> long int (time/space efficiency concerns). So, you'll have
>> to break it down to a set of three "byte operations".
>
> I didn't see how to ask libjpeg to output 32-bit pixel values
> instead of 24-bit RGB values.

Now that you've mentioned libjpeg: it supports lossless transformations,
including 90° rotations. See transupp.h.


From: Noob on
Boudewijn Dijkstra wrote:
> On Thu, 08 Apr 2010 11:43:02 +0200, Noob <root(a)127.0.0.1> wrote:
>> D Yuniskis wrote:
>>> Noob wrote:
>>>
>>>> I need to rotate a picture clockwise 90 degrees.
>>> [...]
>> I'm displaying JPEG photographs. Some users want to rotate the pictures
>> (landscape to portrait). AFAICS, all I need is 90° either way.
>> [...]
>>> Of course, using a three byte type makes this cumbersome.
>>> I assume you don't want to pack those three bytes into a
>>> long int (time/space efficiency concerns). So, you'll have
>>> to break it down to a set of three "byte operations".
>>
>> I didn't see how to ask libjpeg to output 32-bit pixel values
>> instead of 24-bit RGB values.
>
> Now that you've mentioned libjpeg: it supports lossless transformations,
> including 90° rotations. See transupp.h.

Pictures are not unconditionally rotated. They are rotated when
the user asks for it, which is AFTER the picture has been
decoded, which might take 2-3 seconds.

It would take longer to rotate the JPEG representation and decode the
new picture than to rotate the bitmap of the decoded picture.
From: Brett Davis on
>> I need to rotate a picture clockwise 90 degrees.
>
> The data sheet states
>
> SH-4 32-bit super-scalar RISC CPU
> o 266 MHz, 2-way set associative 16-Kbyte ICache, 32-Kbyte DCache, MMU
> o 5-stage pipeline, delayed branch support
> o floating point unit, matrix operation support
> o debug port, interrupt controller

The most important number is cache line size, which you missed.
If your image is 1,024 lines tall, that will completely thrash
the cache, resulting in 3 bytes copied per cache line load/spill.

If you copy 16x16 tiles you can get a 10x speedup.

CopyTile16(source, dest, x, y, width, height)...

You can also try 8x8 and 4x4. Smaller loops can be faster due
to all the args fitting in registers, and the loop getting unrolled,
but the difference will be ~25%, which is hardly worth the time to code.
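
A rough sketch of the tiling idea, assuming the same layout as the quoted
loop (W x H source, 3 bytes per pixel, destination H pixels wide). The name
rotate90cw_tiled and the TILE constant are made up for illustration; this is
not the CopyTile16 routine mentioned above, and the real speedup depends on
the cache line size and image dimensions:

```c
#include <stddef.h>

#define TILE 16  /* tile edge in pixels; an assumption, tune for the target */

/* Rotate a W x H image of packed 3-byte pixels 90 degrees clockwise.
 * src holds H rows of W pixels; dst holds W rows of H pixels.
 * Source pixel (row i, col j) lands at destination (row j, col H-1-i),
 * the same mapping as the quoted loop. Walking the image in TILE x TILE
 * blocks keeps the destination lines touched by one block resident in
 * the data cache while the block is being copied. */
void rotate90cw_tiled(const unsigned char *src, unsigned char *dst,
                      size_t W, size_t H)
{
    size_t ti, tj, i, j;
    for (ti = 0; ti < H; ti += TILE) {
        size_t i_end = (ti + TILE < H) ? ti + TILE : H;  /* clip last tile */
        for (tj = 0; tj < W; tj += TILE) {
            size_t j_end = (tj + TILE < W) ? tj + TILE : W;
            for (i = ti; i < i_end; ++i) {
                const unsigned char *a = src + (i * W + tj) * 3;
                for (j = tj; j < j_end; ++j) {
                    unsigned char *c = dst + (H * j + (H - 1 - i)) * 3;
                    /* load all three bytes before storing any */
                    unsigned char t0 = a[0], t1 = a[1], t2 = a[2];
                    c[0] = t0;
                    c[1] = t1;
                    c[2] = t2;
                    a += 3;
                }
            }
        }
    }
}
```

For the 1644 x 1164 example the call would be
rotate90cw_tiled(A, B, 1644, 1164), with B at least as large as A. Edge
tiles are clipped, so the dimensions need not be multiples of TILE.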

> for (i = 0; i < H; ++i)
>   for (j = 0; j < W; ++j)
>   {
>     unsigned char *C = B+(H*j+H-i-1)*3;
>     C[0] = A[0];
>     C[1] = A[1];
>     C[2] = A[2];
>     A += 3;
>   }

If you do the following you can get a 2x speedup. It looks like
more code, but it will generate less, and the results will be
pipelined correctly.

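{
    unsigned char *C = B+(H*j+H-i-1)*3;
    /* Do all three loads before any store: C might alias A, so the
       compiler cannot hoist the loads above the stores by itself. */
    unsigned char temp0 = A[0];
    unsigned char temp1 = A[1];
    unsigned char temp2 = A[2];
    C[0] = temp0;
    C[1] = temp1;
    C[2] = temp2;
    A += 3;
}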

Do not use *C++ = *A++;

There is no need to flame the guy selling hardware, even though this
is a bandwidth-limited problem that will not be helped by more hardware.

Brett