From: Stephen Fuld on 28 Mar 2010 21:56

On the Research channel, which I receive through Dish Network, they show some of the computer science colloquiums at the University of Washington. I recently watched a lecture by Professor Pat Hanrahan of Stanford, titled "Why are Graphics Systems so Fast?". It ties together some of the topics that have come up in different recent threads in this group, including the highly parallel SIMT stuff and the need for appropriate domain-specific languages to get the most out of the environment.

You can watch the presentation at

http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345

There were several things that I thought were interesting and perhaps even promising.

First is that the Folding(a)Home client has been rewritten to use a graphics card, with a great speedup. What I thought was significant about this is that protein folding is a more traditional HPC application than the more graphics-oriented things, like Photoshop effects, that seem to be dominating the GPGPU scene.

This also leaves open the possibility of a lot of architecture work in developing these highly parallel systems in a way that they are effective for graphics (so that they have substantial volumes) but are better optimized for more traditional HPC applications.

In a discussion about the language issue, he mentions that this is really the subject of a different presentation. So I looked at his web site and found the following presentation:

http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf

This talks about a new system they are working on that supports various levels of heterogeneous parallelism in a domain-specific way, targeting what seem to me to be straight-up supercomputer applications such as turbulence modeling.

I am far from an expert in this area, but it appears that people are working hard on exactly what the people here have been talking about.

Comments welcome.
-- - Stephen Fuld (e-mail address disguised to prevent spam)
From: nmm1 on 29 Mar 2010 05:26

In article <hop1d4$ffo$1(a)news.eternal-september.org>, Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
> In a discussion about the language issue, he mentions that this is
> really the subject of a different presentation. So I looked at his web
> site and found the following presentation
>
> http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>
> This talks about a new system that they are working on that
> supports various levels of heterogeneous parallelism in a domain
> specific way, targeting what seem to me to be straight-up supercomputer
> applications such as turbulence modeling.
>
> I am far from an expert in this area, but it appears that people are
> working hard on exactly what the people here have been talking about.
>
> Comments welcome.

Deja moo.

That's a little unfair, but only a little. One of the major language revolutions of the 1960s was the move away from platform-specific languages to application-domain-specific languages, generic across architectures. Since then, there have been repeated attempts to go back to the 1950s (i.e. platform-domain-specific languages); most have sunk without trace, and none have lasted very long. To a great extent, the ONLY platform-domain-specific languages that have succeeded are those for vector systems (R.I.P.), message-passing systems, and (to some extent) OpenMP.

When this situation arises, I always ask the following questions:

1) Exactly what has changed since the previous times?
2) Exactly why did the previous systems succeed or fail?
3) Does (1) mean that (2) no longer holds?

Given that the causes of failure in the past have NOT typically been the mismatch of a language to the platform, but the mismatch between the user's requirements and abilities and the language, a different approach is needed. Yes, some of the things proposed in that talk work, but they are already being done and need no language changes.

Regards,
Nick Maclaren.
From: Terje Mathisen on 29 Mar 2010 06:24

Stephen Fuld wrote:
> On the Research channel, which I receive through Dish Network, they show
> some of the computer science colloquiums at the University of
> Washington. I recently watched a lecture by Professor Pat Hanrahan of
> Stanford, titled "Why are Graphics Systems so Fast?". It ties together
> some of the topics that have come up in different recent threads in
> this group, including the highly parallel SIMT stuff and the need for
> appropriate domain-specific languages to get the most out of the
> environment.
>
> You can watch the presentation at
>
> http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345
>
> There were several things that I thought were interesting and perhaps
> even promising.
>
> First is that the Folding(a)Home client has been rewritten to use a
> graphics card, with a great speedup. What I thought was significant
> about this is that protein folding is a more traditional HPC
> application than the more graphics-oriented things, like Photoshop
> effects, that seem to be dominating the GPGPU scene.

A couple of months ago I posted a link to a paper by some seismic processing people; they had ported their application to NVidia and gotten _very_ significant speedups.

> This also leaves open the possibility of a lot of architecture work in
> developing these highly parallel systems in a way that they are
> effective for graphics (so that they have substantial volumes) but are
> better optimized for more traditional HPC applications.

That sounds exactly like what Intel have stated about the reason for developing Larrabee, except they've given up on the first-generation graphics product while continuing with the HPC target.

> In a discussion about the language issue, he mentions that this is
> really the subject of a different presentation. So I looked at his web
> site and found the following presentation:
>
> http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>
> This talks about a new system that they are working on that
> supports various levels of heterogeneous parallelism in a domain
> specific way, targeting what seem to me to be straight-up supercomputer
> applications such as turbulence modeling.
>
> I am far from an expert in this area, but it appears that people are
> working hard on exactly what the people here have been talking about.

The seismic paper shows how they started with a straightforward port and got pretty much no speedup at all; then they went on to do more and more platform-specific optimizations, ending up with something that was 40X (afair) faster, but of course totally non-portable.

I.e. the only real key to the speedups was to grok the mapping of the problem onto the available hardware.

Terje
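[Editor's aside: the flavour of those platform-specific rewrites can be illustrated even without a GPU. The following minimal Python sketch is not from the seismic paper; it just shows how the same computation can be expressed with unit-stride or large-stride memory access, which is the kind of distinction (coalesced vs. scattered access) that such optimization work targets.]

```python
# Illustration (not the seismic code): the same reduction over a 2D
# grid, traversed in two orders. The results are identical; only the
# memory-access pattern differs. On cached/SIMD/GPU hardware the
# unit-stride traversal is the fast one, and platform-specific
# rewrites are largely about reaching such access patterns.
rows, cols = 4, 6
grid = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]

def sum_row_major(g):
    # Unit stride: consecutive elements of each row are adjacent in
    # memory (contiguous/coalesced access on real hardware).
    return sum(x for row in g for x in row)

def sum_col_major(g):
    # Large stride: jumps a whole row between consecutive accesses,
    # the pattern a naive port often produces.
    return sum(g[r][c] for c in range(len(g[0])) for r in range(len(g)))

assert sum_row_major(grid) == sum_col_major(grid)  # same answer either way
```

In pure Python the two run at much the same speed, but on hardware where memory bandwidth and coalescing dominate, the gap between the two access patterns can be an order of magnitude or more.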
From: Stephen Fuld on 29 Mar 2010 11:51

On 3/29/2010 2:26 AM, nmm1(a)cam.ac.uk wrote:
> In article <hop1d4$ffo$1(a)news.eternal-september.org>,
> Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>>
>> In a discussion about the language issue, he mentions that this is
>> really the subject of a different presentation. So I looked at his web
>> site and found the following presentation
>>
>> http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>>
>> This talks about a new system that they are working on that
>> supports various levels of heterogeneous parallelism in a domain
>> specific way, targeting what seem to me to be straight-up supercomputer
>> applications such as turbulence modeling.
>>
>> I am far from an expert in this area, but it appears that people are
>> working hard on exactly what the people here have been talking about.
>>
>> Comments welcome.
>
> Deja moo.
>
> That's a little unfair, but only a little. One of the major language
> revolutions of the 1960s was the move away from platform-specific
> languages to application-domain-specific languages, generic across
> architectures. Since then, there have been repeated attempts to go
> back to the 1950s (i.e. platform-domain-specific languages); most
> have sunk without trace, and none have lasted very long. To a great
> extent, the ONLY platform-domain-specific languages that have
> succeeded are those for vector systems (R.I.P.), message-passing
> systems, and (to some extent) OpenMP.

Perhaps I am misunderstanding something, but if you look at page 3, they seem to be targeting a whole range of different platform types: clusters, multi-core chips, and GPU-type things, as well as combinations of them. One of the things I liked about their work is that it seems not to be platform-specific.

> When this situation arises, I always ask the following questions:
> 1) Exactly what has changed since the previous times?

The ready, low-cost availability of very highly parallel, high-speed, but limited-in-various-arcane-ways chips, i.e. the GPGPU movement.

> 2) Exactly why did the previous systems succeed or fail?

As you said, vector systems succeeded well for their time. In a sense, the GPGPU is sort of like an FPS co-processor, and to the extent that the instructions to use it are integrated into the CPU, sort of like a vector machine.

> 3) Does (1) mean that (2) no longer holds?

Well, of course, that is TBD. :-)

> Given that the causes of failure in the past have NOT typically been
> the mismatch of a language to the platform, but the mismatch between
> the user's requirements and abilities and the language, a different
> approach is needed.

Again, that is one thing I thought seemed good about Liszt. It seems to have primitives that match what many HPC programs need, e.g. meshes, vectors, etc., and some automatic tools to select good methods.

-- - Stephen Fuld (e-mail address disguised to prevent spam)
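[Editor's aside: the slides don't show enough of Liszt's real syntax to reproduce it here, but the flavour of mesh-level primitives (iterate over cells, read neighbouring values, leave the schedule to the compiler) can be sketched. The following is hypothetical Python, NOT Liszt code:]

```python
# Hypothetical sketch of mesh-level primitives of the kind a mesh DSL
# exposes; this is NOT Liszt's real syntax, just an illustration of
# programming against "cells" and "neighbours" rather than raw arrays.

class Mesh:
    """A tiny 1D mesh: cells 0..n-1, each with left/right neighbours."""
    def __init__(self, n):
        self.n = n
    def cells(self):
        return range(self.n)
    def neighbours(self, c):
        return [d for d in (c - 1, c + 1) if 0 <= d < self.n]

def smooth(mesh, field):
    """One Jacobi-style smoothing sweep in mesh vocabulary. Because
    the access pattern (cell -> neighbours) is visible to a DSL
    compiler, it is free to map this loop onto SIMD lanes, GPU
    threads, or cluster nodes."""
    return [
        (field[c] + sum(field[d] for d in mesh.neighbours(c)))
        / (1 + len(mesh.neighbours(c)))
        for c in mesh.cells()
    ]

mesh = Mesh(5)
field = [0.0, 0.0, 4.0, 0.0, 0.0]
print(smooth(mesh, field))  # the spike at cell 2 spreads to its neighbours
```

The point of writing it this way is that the program states *what* is computed over the mesh, not *where* each cell lives, which is what lets one source target clusters, multi-cores, and GPUs.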
From: nmm1 on 30 Mar 2010 03:38
In article <hoqia6$h3b$1(a)news.eternal-september.org>, Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
> Perhaps I am misunderstanding something, but if you look at page 3, they
> seem to be targeting a whole range of different platform types:
> clusters, multi-core chips, and GPU-type things, as well as
> combinations of them. One of the things I liked about their work is
> that it seems not to be platform-specific.

Perhaps I was being unfair. However, I looked at their examples more than their blurb, and my conclusions weren't based on that.

>> 1) Exactly what has changed since the previous times?
>
> The ready, low-cost availability of very highly parallel, high-speed,
> but limited-in-various-arcane-ways chips, i.e. the GPGPU movement.

Yes. But they have been available within the cost of a researcher's discretionary budget before, and a large number of academic staff and students failed to get far with them.

>> 2) Exactly why did the previous systems succeed or fail?
>
> As you said, vector systems succeeded well for their time. In a sense,
> the GPGPU is sort of like an FPS co-processor, and to the extent that
> the instructions to use it are integrated into the CPU, sort of like a
> vector machine.

Yes. I use the FPS analogy as well. That flew, for a bit, until Intel got their act together on floating-point.

> Again, that is one thing I thought seemed good about Liszt. It
> seems to have primitives that match what many HPC programs need,
> e.g. meshes, vectors, etc., and some automatic tools to select good
> methods.

I will try to take another look, but I was singularly unimpressed by page 12. The point is that we KNOW those problems are intractable, and the best researchers in the world have failed to make any headway over the past 40 years! You need to embed the architectural assumptions into the program design for such a compiler to have a hope in hell - yes, it can optimise for variations on a theme, but no more than that.

Indeed, the very concept of owning and ghost cells is architecture-specific!

Regards,
Nick Maclaren.
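[Editor's aside: for readers who haven't met the terminology, in a distributed mesh code each rank owns a block of cells and keeps read-only "ghost" copies of the neighbouring ranks' boundary cells, refreshed by a halo exchange before each stencil sweep. A minimal single-process sketch of the idea, with ranks simulated by plain Python dicts instead of MPI processes (names illustrative):]

```python
# Minimal illustration of owning vs. ghost cells in a 1D domain
# decomposition. Two "ranks" each own half of a 6-cell field and keep
# one ghost cell mirroring the neighbour's boundary cell. A real code
# would use MPI ranks and messages; here dicts stand in.

def split(field):
    half = len(field) // 2
    return [
        {"owned": field[:half], "left": None, "right": None},
        {"owned": field[half:], "left": None, "right": None},
    ]

def halo_exchange(ranks):
    """Refresh ghost cells from the neighbour's owned boundary cells.
    This step is exactly what is architecture-specific: on a cluster
    it is an MPI message, on a GPU a device-to-device copy."""
    ranks[0]["right"] = ranks[1]["owned"][0]
    ranks[1]["left"] = ranks[0]["owned"][-1]

def sweep(rank):
    """3-point average over owned cells, using ghosts at the seams."""
    left = [rank["left"]] if rank["left"] is not None else []
    right = [rank["right"]] if rank["right"] is not None else []
    ext = left + rank["owned"] + right
    off = len(left)
    out = []
    for i in range(len(rank["owned"])):
        j = i + off
        window = ext[max(j - 1, 0):j + 2]
        out.append(sum(window) / len(window))
    rank["owned"] = out

ranks = split([0.0, 0.0, 6.0, 0.0, 0.0, 0.0])
halo_exchange(ranks)
for r in ranks:
    sweep(r)
# Cell 3 (the first cell rank 1 owns) sees the spike at cell 2 through
# its left ghost, so the split sweep matches a single-node sweep.
```

Nick's point stands out clearly here: the very split into owned and ghost data, and the exchange step between sweeps, exist only because of the distributed-memory architecture; a shared-memory or vector machine would express the same sweep with neither.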