From: Andy Glew "newsgroup at on 6 Aug 2010 08:56

On 8/5/2010 7:20 AM, Skybuck Flying wrote:
> "
> Actually you have to process only
>
> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
>
> THAT's a piece of cake for modern systems.
> "
>
> I don't agree with that last sentence.
>
> Suppose RLE compression is used, tried it myself.
>
> That means a branch for each color.

No it doesn't.

Just think about it. On comp.arch back a few years ago, this would have been a newbie question.

Think branchless code. Heck, I could code it up branchlessly in C.

Not sure if branchless is a win over a machine with branch prediction, where correctly predicted branches are free. But if you say you have branch mispredictions, try the branchless version.
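For illustration, here is one way the per-color branch could be avoided on the compression side. This is a minimal sketch in C, not code from the thread: the function name `rle_encode_branchless`, the one-byte-per-channel layout, and the separate `colors`/`counts` output arrays are all assumptions. The comparison result (0 or 1) is used as arithmetic instead of as a branch condition; whether a particular compiler emits branch-free machine code for it is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: run-length encodes n bytes of one color channel.
 * colors[] and counts[] must each have room for n entries (worst case:
 * every byte differs from its neighbour).
 * Returns the number of runs written.
 * The only branches are the empty-input check and the loop condition;
 * there is no per-color "if" inside the loop body. */
size_t rle_encode_branchless(const uint8_t *src, size_t n,
                             uint8_t *colors, uint32_t *counts)
{
    if (n == 0) {
        return 0;
    }

    size_t run = 0;      /* index of the run currently being extended */
    uint32_t len = 1;    /* length of that run so far */

    colors[0] = src[0];
    counts[0] = 1;

    for (size_t i = 1; i < n; i++) {
        /* 1 if this byte starts a new run, 0 if it continues the old one */
        uint32_t differs = (uint32_t)(src[i] != src[i - 1]);

        run += differs;                    /* advance only on a change    */
        len = len * (1u - differs) + 1u;   /* reset to 1 or increment     */

        /* Unconditional stores: rewriting the current run is harmless. */
        colors[run] = src[i];
        counts[run] = len;
    }
    return run + 1;
}
```

For the input bytes 5, 5, 5, 7, 7, 9 this produces three runs: (5, 3), (7, 2), (9, 1). The price of dropping the branch is two stores per input byte, so, as noted above, it is only likely to help when the branchy version actually suffers mispredictions.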
From: Skybuck Flying on 7 Aug 2010 00:17

"Andy Glew" <"newsgroup at comp-arch.net"> wrote in message news:UpednTydFo8am8HRnZ2dnUVZ_q2dnZ2d(a)giganews.com...
> On 8/5/2010 7:20 AM, Skybuck Flying wrote:
>> "
>> Actually you have to process only
>>
>> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
>>
>> THAT's a piece of cake for modern systems.
>> "
>>
>> I don't agree with that last sentence.
>>
>> Suppose RLE compression is used, tried it myself.
>>
>> That means a branch for each color.
>
> No it doesn't.

Each compressed color has a color value and a count value.

The count value has to be decremented.

When it reaches zero, a new compressed color value and count value have to be read.

I can imagine all colors to be in a color array and all counts in a count array.

Therefore the countdown could be done on a countdown pointer.

Therefore the copying of the color could be done from a color pointer.

Both pointers would have to be incremented when the count reaches zero.

I can imagine something like:

  ColorPointer = ColorPointer + ColorIncrementation;
  CountPointer = CountPointer + CountIncrementation;

A slight problem is that the color pointer has to be advanced by, let's say, 1 byte, assuming the RGBs have been split up into red, green and blue arrays, which is an additional requirement and could slow things down further, but ok... I'll cut you some slack.

And the count incrementation is different: it needs to be 4 bytes maximum...

However, the counts need to be compressed as well; using 4 bytes for each count would be overkill.

For now I am willing to ignore the fact that the counts have to be compressed as well... let's focus on RLE for now ;) :)

ColorIncrementation could be calculated as follows:

  ColorIncrementation := 1 * ?;
  CountIncrementation := 4 * ?;

Now the question is: what does the question mark become?

The question mark has to be 1 if the count is zero.

Therefore a branchless piece of code is necessary to determine if the count is zero.

This could indeed be done with shr's and or's.

Lastly... the zero count has to be inverted to a one.

If any of the bits in the count was 1, it would be inverted to a zero.

If none of the bits in the count was 1, it would be inverted to a one.

Which would trigger the multiply.

So indeed a branchless version of RLE is possible, but at what cost is the question ?! ;)

This is a first version... perhaps it can be further enhanced to use fewer instructions.

> Just think about it.

I just did, see above ;)

> On comp.arch back a few years ago, this would have been a newbie
> question.

Newbies ? lol.

> Think branchless code. Heck, I could code it up branchlessly in C.

I bet you can, but the question is:

Would it be faster than the branch version ? ;)

> Not sure if branchless is a win over a machine with branch prediction,

Aha, so you are not sure ! ;) :)

> where correctly predicted branches are free. But if you say you have
> branch mispredictions, try the branchless version.

Maybe I will sometime ;)

But I have a better idea... let the gpu do it ! ;) :)

Bye,
  Skybuck =D
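The decoding scheme described in this post can be sketched roughly as follows. This is a hedged illustration in C, not code from the thread: the names `is_zero` and `rle_decode_branchless` are hypothetical, array indices stand in for the color and count pointers (the index scaling does the "1 * ?" and "4 * ?" multiplies), counts are left as uncompressed 32-bit values, and the input is assumed to be well formed (every count is at least 1 and the counts sum to `n_out`).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if x == 0, otherwise 0, using only shifts, ORs, an AND and
 * an XOR. The shifts fold every bit of x down into bit 0; the final XOR
 * inverts that bit, so "no bit set" becomes 1. */
static uint32_t is_zero(uint32_t x)
{
    x |= x >> 16;
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;
    return (x & 1u) ^ 1u;
}

/* Hypothetical helper: expands runs (colors[k], counts[k]) into n_out
 * bytes of one color channel.
 * Assumes every count is >= 1 and the counts sum to n_out.
 * The only branches are the empty-output check and the loop condition. */
void rle_decode_branchless(const uint8_t *colors, const uint32_t *counts,
                           uint8_t *dst, size_t n_out)
{
    if (n_out == 0) {
        return;
    }

    size_t run = 0;                  /* plays the role of both pointers */
    uint32_t remaining = counts[0];  /* the countdown for the current run */

    /* All bytes except the last: the last one is written after the loop
     * so that counts[] is never read past its final entry. */
    for (size_t i = 0; i + 1 < n_out; i++) {
        dst[i] = colors[run];
        remaining -= 1u;

        uint32_t advance = is_zero(remaining);  /* the "?" in the post   */
        run += advance;                         /* 0 or 1 entries ahead  */
        remaining += advance * counts[run];     /* reload only when zero */
    }
    dst[n_out - 1] = colors[run];
}
```

Decoding colors 5, 7, 9 with counts 3, 2, 1 and `n_out` equal to 6 yields 5, 5, 5, 7, 7, 9. Each output byte costs roughly a dozen ALU operations plus a load of `counts[run]`, which is the "at what cost" question raised above; a plain comparison (`remaining == 0`) would usually compile to a single flag-setting instruction and might be a cheaper way to obtain the same 0 or 1.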
From: Skybuck Flying on 8 Aug 2010 12:09

I think a good solution could be to replace VCL's reliance on GDI with OpenGL or maybe even DirectX.

Except for one little problem:

OpenGL has issues with switching between rendering contexts...

These issues might be overcome with newer extensions like framebuffers... and maybe even OpenGL 4.1 multiple viewports and what not...

Bye,
  Skybuck.
From: Skybuck Flying on 10 Aug 2010 02:33

That website works really badly; furthermore, I don't believe in the cpu doing GUIs... that's what a graphics card is for ! ;) :)

Offloading work to the gpu is much better ?! ;) :)

Bye,
  Skybuck =D