Prev: Larrabee delayed: anyone know what's happening?
Next: Xeon 3460 SMT power consumption and performance
From: "Andy "Krazy" Glew" on 11 Dec 2009 10:24

Terje Mathisen wrote:
> It seems to be a bandwidth problem much more than a flops, i.e. using
> the texture units effectively was the key to the big wins.

I am puzzled, torn, wondering about the texture units.

There's texture memory, which is just a slightly funky form of cache/memory, with funky locality and possible compression. But then there's also the texture computation capability, which is just a funky form of 2D or 3D interpolation.

Most people seem to be getting the benefit from texture memory. But when people use the texture interpolation compute capabilities, there's another kicker.

Back in the 1990s on P6, when I was trying to make the case for CPUs to own the graphics market, and not surrender it to the only-just-nascent GPUs, the texture units were the oinker: they are just so damned necessary to graphics, and they are just so damned idiosyncratic. I do not know of any good way to do texturing in software that doesn't lose performance, or of any good way to decompose texturing into simpler instruction set primitives that could reasonably be added to an instruction set. E.g. I don't know of any good way to express texture operations in terms of 2, or 3, or even 4, register inputs.

Let's try again: how about an interpolate instruction that takes 3 vector registers, and performs interpolation between tuples in the X direction along the length of the register, and in the Y direction between corresponding elements in different registers? But do you want 2, or 3, or 4, or ... arguments to interpolate along? And what about Z interpolation? Let alone compression? And skewed sampling? And ...

Textures just seem to be this big mass of stuff, all of which has to be done in order to be credible.

Although I usually try to decompose complex things into simpler operations, sometimes it is necessary to go the other way. Maybe we can make the texture units more general. Make them into generally useful function interpolation units. Add that capability to general purpose CPUs.

How much of the benefit is texture computation vs texture memory? Can we separate these two things?

Texture computation is interpolation. (Which, of course, often translates to memory savings, because it changes the amount of memory you need for lookup tables - higher order interpolation, or multiscale interpolation => less memory traffic.) It looks like this can be made general purpose. But how many people need it?

Texture memory is ... a funky sort of cache, with compression. Caches we can make generically useful. Compression - for read-only data structures, sure. But how can we write INTO the "compressed texture cache memory" in such a way that we don't blow out the compression when it gets kicked out of the cache? Or, can we safely create a hardware data structure that is mainly useful for caching read-only, heavily preprocessed data?

It seems to me that most of the GPGPU codes are not using the compute or compression aspects of texture units. Indeed, CUDA doesn't really give access to that. So it is probably just the extra memory ports and cache behavior.

--

Terje, you're the master of lookup tables. Can you see a way to make texture units generally useful?
From: Andrew Reilly on 11 Dec 2009 17:18

On Fri, 11 Dec 2009 07:24:36 -0800, Andy "Krazy" Glew wrote:
> Back in the 1990s on P6, when I was trying to make the case for CPUs to
> own the graphics market, and not surrender it to the only-just-nascent
> GPUs, the texture units were the oinker: they are just so damned
> necessary to graphics, and they are just so damned idiosyncratic. I do
> not know of any good way to do texturing in software that doesn't lose
> performance, or of any good way to decompose texturing into simpler
> instruction set primitives that could reasonably be added to an
> instruction set. E.g. I don't know of any good way to express texture
> operations in terms of 2, or 3, or even 4, register inputs.

Isn't that a fairly damning argument against Larrabee, as a general-purpose graphics part? Or did Larrabee have equivalent texture units bolted on to the side of their Atom-ish cores?

Cheers,
--
Andrew
From: "Andy "Krazy" Glew" on 12 Dec 2009 01:02

Andrew Reilly wrote:
> On Fri, 11 Dec 2009 07:24:36 -0800, Andy "Krazy" Glew wrote:
>
>> Back in the 1990s on P6, when I was trying to make the case for CPUs to
>> own the graphics market, and not surrender it to the only-just-nascent
>> GPUs, the texture units were the oinker: they are just so damned
>> necessary to graphics, and they are just so damned idiosyncratic.
>
> Isn't that a fairly damning argument against Larrabee, as a general-
> purpose graphics part? Or did Larrabee have equivalent texture units
> bolted on to the side of their Atom-ish cores?

Where did you get your information about Larrabee?

Wikipedia (http://en.wikipedia.org/wiki/Larrabee_%28GPU%29) says (as of the time I am posting this):

    Larrabee's x86 cores will be based on the much simpler Pentium P54C design ...

    Larrabee includes one major fixed-function graphics hardware feature: texture sampling units. These perform trilinear and anisotropic filtering and texture decompression.

The following seems to be the standard reference for Larrabee: http://software.intel.com/file/2824/

    Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., Hanrahan, P. 2008. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Trans. Graph. 27, 3, Article 18 (August 2008), 15 pages. DOI = 10.1145/1360612.1360617 http://doi.acm.org/10.1145/1360612.1360617.

I like their quote on texture units:

    Larrabee includes texture filter logic because this operation cannot be efficiently performed in software on the cores.
    Our analysis shows that software texture filtering on our cores would take 12x to 40x longer than our fixed function logic, depending on whether decompression is required. There are four basic reasons:

    • Texture filtering still most commonly uses 8-bit color components, which can be filtered more efficiently in dedicated logic than in the 32-bit wide VPU lanes.
    • Efficiently selecting unaligned 2x2 quads to filter requires a specialized kind of pipelined gather logic.
    • Loading texture data into the VPU for filtering requires an impractical amount of register file bandwidth.
    • On-the-fly texture decompression is dramatically more efficient in dedicated hardware than in CPU code.

    The Larrabee texture filter logic is internally quite similar to typical GPU texture logic. It provides 32KB of texture cache per core and supports all the usual operations, such as DirectX 10 compressed texture formats, mipmapping, anisotropic filtering, etc. Cores pass commands to the texture units through the L2 cache and receive results the same way. The texture units perform virtual to physical page translation and report any page misses to the core, which retries the texture filter command after the page is in memory. Larrabee can also perform texture operations directly on the cores when the performance is fast enough in software.
From: Andrew Reilly on 12 Dec 2009 02:28

On Fri, 11 Dec 2009 22:02:51 -0800, Andy "Krazy" Glew wrote:
> Where did you get your information about Larrabee?

Only here. I don't recall it coming up before. I'm not all that interested in specialized graphics pipelines. Thanks for the great quote!

Cheers,
--
Andrew
From: "Andy "Krazy" Glew" on 12 Dec 2009 11:26

Andrew Reilly wrote:
> On Fri, 11 Dec 2009 22:02:51 -0800, Andy "Krazy" Glew wrote:
>
>> Where did you get your information about Larrabee?
>
> Only here. I don't recall it coming up before. I'm not all that
> interested in specialized graphics pipelines. Thanks for the great quote!

I guess that part of the reason for this conversation is... although I *am* interested in specialized graphics functions, I am much more interested in operations that are of general use.

If you think of texture units as a generalized interpolation unit and a cache with compression, then we can think of areas of more general use.