From: Skybuck Flying on 4 Aug 2010 19:31

I am thinking of 1920x1200 at 24-bit color and 60 frames per second. There is simply no way any CPU can achieve that with plain x86... maybe with SSE etc., but I wouldn't hold my breath! ;)

Even if it's just a single loop... add a few branches inside the loop and it's game over, 100% for sure ;)

Bye,
Skybuck.
From: Wolfgang Draxinger on 5 Aug 2010 04:59

On Thu, 5 Aug 2010 01:31:39 +0200 "Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote:

> I am thinking 1920x1200x24 bit colors at 60 frames per second.

The following calculation is extremely pessimistic; in the real world things look much, much better :)

1920 * 1200 * 8 bytes * 60/s = 1.10592 GByte/s

Let's assume you need 10 instructions per byte of output, and that you're using a modern CPU clocked at about 2.5 GHz. Thanks to out-of-order execution (OOE) and pipelining, about 4 instructions retire per clock cycle. So within one second you get

2.5 * 10^9 * 4 / 10 = 10^9 results.

Now let's say you're clever and operate on 32-bit registers, so you're actually producing 4 * 10^9 bytes/s = 4 GByte/s, which is more than 3 times the data rate needed to fill that screen at 60 FPS. In a pessimistic view.

You may also see it from this perspective: on contemporary CPUs you can watch FullHD video at 30 Hz with no problem whatsoever - which requires decompressing the stream and sending it to the graphics card; color-space conversion usually happens there, in the video overlay - it's just that your CPU is then completely clogged up with that task. That's why modern GPUs help with decompression, but it's not absolutely required for watching FullHD video on a PC.

> There is simply no way that any cpu can achieve that with simply
> x86... very maybe sse etc... but I wouldn't hold my breath ! ;)

2D compositing is so simple that contemporary CPUs can do it with ease even without the fancier SIMD instructions, though of course those really speed things up. The problem only arises if the CPU has to do something else at the same time. But frankly: the very first SIMD instructions (MMX) have been around for more than 12 years, today every x86 CPU supports at least SSE2, plus AMD has its own 3DNow! extensions. Today you can be pretty sure some kind of SIMD is available.

> Even if it's just a single loop...

It doesn't matter how deeply you nest the loops, as long as you stay within the working set. In some cases (and on current CPUs most cases!) it's even better to keep the loops and not unroll them. Today's branch predictors are very, very efficient at recognizing loops, and unlike switch/if statements the code path within a loop can be predetermined from the very beginning (that's why loop unrolling works in the first place). Of course, branching within a loop causes problems if the branch selects between complex code paths. But simple things like

if (a < WHATEVER) a = b*c; else a = b/c;

don't even appear to modern CPUs as two different code paths. Heck, a whole architecture emerged from that observation and made it the core feature of its instruction set (ARM, with its predicated instructions). In fact, loop unrolling can even make your program slower. One of the x264 developers has a nice blog entry on the topic: http://x264dev.multimedia.cx/?p=201

> Add a few branches in the loop and it's game over 100% for sure ;)

Branches are a problem only if each of the code paths performs a lot of operations on very different parts of memory - those are real cache killers. If your branches are short, rejoin quickly and stay within the working set, you won't even notice they're there, performance-wise.
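As a concrete illustration of the fill-rate argument above, here is a minimal C sketch that times a naive per-pixel fill of a 1920x1200 32-bit framebuffer. It is not from the thread; the frame count (FRAMES = 600), the per-pixel arithmetic, and the use of clock() for timing are my own choices for illustration, and the numbers will vary by machine.

/* Minimal sketch: time a naive per-pixel fill of a 1920x1200 32-bit
 * framebuffer - the kind of plain x86 loop the estimate above is about. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define W 1920
#define H 1200
#define FRAMES 600   /* assumed iteration count, just to get a stable timing */

int main(void)
{
    uint32_t *fb = malloc((size_t)W * H * sizeof *fb);
    if (!fb) return 1;

    clock_t t0 = clock();
    for (int f = 0; f < FRAMES; f++) {
        uint32_t c = 0xFF000000u | (uint32_t)f;   /* some varying color */
        for (size_t i = 0; i < (size_t)W * H; i++)
            fb[i] = c + (uint32_t)i;              /* one store plus trivial ALU work per pixel */
    }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double bytes = (double)W * H * 4 * FRAMES;
    printf("filled %d frames in %.2f s -> %.2f GByte/s "
           "(need ~0.41 GByte/s for 24-bit color at 60 FPS), checksum %u\n",
           FRAMES, secs, bytes / secs / 1e9, (unsigned)fb[(size_t)W * H - 1]);

    free(fb);
    return 0;
}

Even compiled without SIMD-friendly options, a loop like this typically sustains several GByte/s on a 2010-era CPU, which is the point being made.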
From: Wolfgang Draxinger on 5 Aug 2010 05:13

On Thu, 5 Aug 2010 10:59:12 +0200 Wolfgang Draxinger <wdraxinger(a)darkstargames.de> wrote:

> On Thu, 5 Aug 2010 01:31:39 +0200
> "Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote:
>
> > I am thinking 1920x1200x24 bit colors at 60 frames per second.
>
> The following calculation is extremely pessimistic; in the real world
> things look much, much better :)

Sorry, I forgot a factor of 3 here, but still...

> 1920 * 1200 * 8 bytes * 60/s = 1.10592 GByte/s

*3 -> 3.32 GByte/s

> (...) which is more than 3 times the data rate needed to fill
> that screen at 60 FPS. In a pessimistic view.

Yet 3.32 < 4, so the whole thing still holds.

Wolfgang
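To make the corrected arithmetic explicit, here is a tiny C check (my own illustration, not from the thread) comparing the deliberately pessimistic 8-bytes-per-pixel figure, with the forgotten factor of 3 applied, against the realistic 3-bytes-per-pixel requirement for 24-bit color at 60 FPS.

/* Quick arithmetic check: pessimistic vs. realistic data rate. */
#include <stdio.h>

int main(void)
{
    double pess = 1920.0 * 1200 * 8 * 3 * 60;   /* 8 bytes/pixel, times the forgotten factor of 3 */
    double real = 1920.0 * 1200 * 3 * 60;       /* 24-bit color, 3 bytes/pixel */
    printf("pessimistic: %.2f GByte/s\n", pess / 1e9);   /* ~3.32 */
    printf("realistic:   %.2f GByte/s\n", real / 1e9);   /* ~0.41 */
    return 0;
}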
From: Skybuck on 5 Aug 2010 07:08

My hard disk, which is pretty speedy, can read at about 180 MByte/sec.

The screen data that needs to be decompressed per second is:

1920 x 1200 x 60 x 3 = 414,720,000 bytes

The usual compression ratio is 200, which means:

414,720,000 / 200 = 2,073,600 compressed bytes remain per second.

This roughly means:

2,073,600 x 200 = 414,720,000 instructions to decompress - but a decoder that needs only one instruction per output byte probably doesn't exist, so it's pretty safe to multiply this by 2, 3 or maybe even 10. Let's take 10:

414,720,000 x 10 = 4,147,200,000 instructions per second.

Some instructions might require 2 to 15 cycles... let's say 3 or so:

12,441,600,000 cycles/s = a 12 GHz processor needed to run smoothly with simple instructions.

Computers are near 2.0 GHz per core, maybe 4.0 GHz at best... but nowhere near 12 GHz. Tricks like SSE might speed things up here and there... but probably/definitely not enough to run smoothly at 60 FPS.

So the conclusion is: the CPU can't handle it, and even if it could, no CPU processing power would be left to do anything else. You seem to understand that yourself as well ;)

If you doubt this is true, try writing a simple video codec yourself and you will quickly find out that your codec is limited not only by the CPU but also by the memory system, which cannot sustain random access at such high rates. The CPU starts waiting on memory whenever random access is done. This is where GPUs are better: they can do something else while waiting on memory.

Bye,
Skybuck.
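The back-of-the-envelope estimate above can be written out so the assumptions are explicit. This is only a sketch of that arithmetic, not a measurement; the compression ratio, instructions per output byte, and cycles per instruction are the guesses from the post, and the result assumes a single-issue CPU (a superscalar core retiring several instructions per cycle lowers the required clock accordingly).

/* Sketch of the instruction-budget estimate above, using the post's guesses. */
#include <stdio.h>

int main(void)
{
    double output_bytes_per_sec = 1920.0 * 1200 * 3 * 60;  /* 414,720,000 */
    double ratio            = 200.0;  /* assumed compression ratio */
    double instr_per_byte   = 10.0;   /* assumed instructions per decompressed byte */
    double cycles_per_instr = 3.0;    /* assumed average cycles per instruction */

    double compressed   = output_bytes_per_sec / ratio;            /* ~2.07 MByte/s */
    double instructions = output_bytes_per_sec * instr_per_byte;   /* ~4.15e9 */
    double cycles       = instructions * cycles_per_instr;         /* ~1.24e10 */

    printf("compressed input : %.2f MByte/s\n", compressed / 1e6);
    printf("instructions/sec : %.2f G\n", instructions / 1e9);
    printf("clock needed     : %.1f GHz (single issue)\n", cycles / 1e9);
    return 0;
}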
From: Skybuck on 5 Aug 2010 07:11

On Aug 5, 11:13 am, Wolfgang Draxinger <wdraxin...(a)darkstargames.de> wrote:

> On Thu, 5 Aug 2010 10:59:12 +0200
> Wolfgang Draxinger <wdraxin...(a)darkstargames.de> wrote:
> > On Thu, 5 Aug 2010 01:31:39 +0200
> > "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote:
> >
> > > I am thinking 1920x1200x24 bit colors at 60 frames per second.
> >
> > The following calculation is extremely pessimistic; in the real world
> > things look much, much better :)
>
> Sorry, I forgot a factor of 3 here, but still...
>
> > 1920 * 1200 * 8 bytes * 60/s = 1.10592 GByte/s
>
> *3 -> 3.32 GByte/s
>
> > (...) which is more than 3 times the data rate needed to fill
> > that screen at 60 FPS. In a pessimistic view.
>
> Yet 3.32 < 4, so the whole thing still holds.

This assumes a single copy... if anywhere in the system additional copies have to be made, this of course breaks apart. And it's highly likely that somewhere in a driver a copy is being made... perhaps because of a memory swap between application space and kernel space.

Bye,
Skybuck.
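The cost of such an extra copy is easy to put a number on. Below is a rough C sketch (my own illustration, not from the thread) that times memcpy of one frame-sized buffer: a 1920x1200x32-bit frame is about 9.2 MByte, so each additional in-memory copy adds roughly 0.55 GByte/s of traffic at 60 FPS, on top of the data rate discussed above. The copy count of 600 is just an arbitrary choice to get a stable timing.

/* Rough sketch: measure memcpy bandwidth for a frame-sized buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

#define FRAME_BYTES ((size_t)1920 * 1200 * 4)

int main(void)
{
    uint8_t *src = malloc(FRAME_BYTES), *dst = malloc(FRAME_BYTES);
    if (!src || !dst) return 1;
    memset(src, 0x42, FRAME_BYTES);

    int copies = 600;                      /* e.g. 10 seconds at 60 FPS with one extra copy per frame */
    clock_t t0 = clock();
    for (int i = 0; i < copies; i++)
        memcpy(dst, src, FRAME_BYTES);     /* the "extra copy" in a driver or API layer */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double gbytes = copies * (double)FRAME_BYTES / 1e9;
    printf("copied %.1f GByte in %.2f s -> %.2f GByte/s memcpy bandwidth, checksum %u\n",
           gbytes, secs, gbytes / secs, (unsigned)dst[0]);

    free(src);
    free(dst);
    return 0;
}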