From: Robert Myers on 30 Mar 2010 16:07

On Mar 29, 6:24 am, Terje Mathisen <terje.mathi...(a)tmsw.no> wrote:
>
> The seismic paper shows how they started with a straight-forward port,
> and got pretty much no speedup at all, then they went on to do more and
> more platform-specific optimizations, ending up with something which was
> 40X (afair) faster, but of course totally non-portable.
>
> I.e. the only real key to the speedups was to grok the mapping of the
> problem onto the available hardware.

And you can't do that if you leave optimization in the hands of the
applications programmer, who throws away information that could be used
for optimization and cuts off paths to optimization at lower levels
because of over-specificity.

That's the real key to programming in abstract, domain-specific
languages: they maximally preserve abstract,
non-implementation-dependent information, avoid premature optimization,
and don't close off possible optimization paths at lower levels.

Robert.
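[A minimal sketch in Python of the idea being argued here. The Stencil
class and apply_blocked backend are invented names, and cache blocking
stands in for whatever mapping a real system might choose; the point is
only that the declarative description leaves traversal order to the
backend instead of baking it into the application code.]

    # Toy sketch: a declarative, domain-specific description of a stencil
    # specifies *what* is computed, not *how*. A backend is then free to
    # block for cache, vectorize, or retarget to a GPU, because the
    # programmer has not foreclosed those options with a fixed loop nest.
    import numpy as np

    class Stencil:
        """Declarative 1-D stencil: new[i] = w[0]*old[i-1] + w[1]*old[i] + w[2]*old[i+1]."""
        def __init__(self, weights):
            self.weights = weights  # no traversal order is specified here

    def apply_blocked(stencil, grid, block=4096):
        """One possible backend: blocked traversal chosen by the system,
        not by the application programmer."""
        out = np.empty_like(grid)
        w = stencil.weights
        n = len(grid)
        for start in range(1, n - 1, block):      # cache blocking
            stop = min(start + block, n - 1)
            out[start:stop] = (w[0] * grid[start - 1:stop - 1] +
                               w[1] * grid[start:stop] +
                               w[2] * grid[start + 1:stop + 1])
        out[0], out[-1] = grid[0], grid[-1]       # boundaries pass through
        return out

    heat = Stencil([0.25, 0.5, 0.25])
    field = np.random.rand(1_000_000)
    field = apply_blocked(heat, field)  # the backend, not the app, chose the mapping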
From: Robert Myers on 30 Mar 2010 16:21

On Mar 28, 9:56 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
> On the Research channel, which I receive through Dish Network, they show
> some of the computer science colloquiums at the University of
> Washington. I recently watched a lecture by professor Pat Hanrahan of
> Stanford. The lecture is titled "Why are Graphics Systems so Fast?". It
> ties together some of the topics that have occurred in different recent
> threads in this group, including the highly parallel SIMT stuff and the
> need for appropriate domain specific languages to get the most out of
> the environment.
>
> You can watch the presentation at
>
> http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345
>
> There were several things that I thought were interesting and perhaps
> even promising.
>
> First is that the Folding(a)Home client has been rewritten to use a
> graphics card with a great speedup. The thing I thought that was
> significant about this is that protein folding is a more traditional HPC
> application than the more graphics oriented things like Photoshop
> effects that seem to be dominating the GPGPU scene.

Del asked me rhetorically in another thread how you could reach a
petaflop without delivering semi-trailer loads of money to national
laboratories or to one of my alma maters (UIUC)--and from those
stopovers, mostly to IBM.

Answer: @home networks are generally good at the same problems that
Blue Gene and its many cousins do well.

Folding(a)home is already doing it on PS3's alone, and on exactly the
same problem that Blue Gene was ostensibly designed for.

Downsides: no photo to put into the alumni bulletin, no huge salary
payout for a management structure mostly doing nothing more exotic than
is done at, say, an insurance company.

Conclusion: If your goal is new computational frontiers, you won't be
building or buying machines for jobs that could be done just as well on
a distributed platform. If you're a bureaucrat (or a senator), there is
no such thing as an installation that is too big, too grand, or too
photogenic, so can the PS3's.

Robert.
From: Del Cecchi` on 30 Mar 2010 21:21

Robert Myers wrote:
[snip]
> Conclusion: If your goal is new computational frontiers, you won't be
> building or buying machines for jobs that could be done just as well on
> a distributed platform. If you're a bureaucrat (or a senator), there is
> no such thing as an installation that is too big, too grand, or too
> photogenic, so can the PS3's.

Uh, aren't you the one decrying the lack of bandwidth in the network
interconnecting the nodes in the expensive supercomputers? And now you
are praising a configuration which has essentially zero bandwidth
connecting the nodes?

Which is it? Embarrassingly parallel or heavy duty bandwidth?
From: Robert Myers on 30 Mar 2010 21:48

On Mar 30, 9:21 pm, Del Cecchi` <delcec...(a)gmail.com> wrote:
[snip]
> Uh, aren't you the one decrying the lack of bandwidth in the network
> interconnecting the nodes in the expensive supercomputers? And now you
> are praising a configuration which has essentially zero bandwidth
> connecting the nodes?
>
> Which is it? Embarrassingly parallel or heavy duty bandwidth?

I must not have made my point clearly enough.

My point is that the current batch of "supercomputers" can perform
efficiently only on those jobs that can be made nearly embarrassingly
parallel, so that they require little global bandwidth. I'm suspicious
of whatever is being done to localize the protein-folding problem, but
the code is out there and the burden is on me to look at it. In any
case, you manifestly don't need a Blue Gene to do petaflop calculations
for the protein-folding problem.

If these are the kinds of computers we're going to restrict ourselves
to building, then, I ask, why build them at all? Maybe we should just
more aggressively marshal the computational resources we already have
and don't have to pay another penny for.

The problems that *can't* be done on a distributed platform, for the
most part, also can't be done efficiently on our current crop of
"supercomputers."

Now, if you're LLNL, you'll argue that it's all Q-clearance stuff and
has to be behind the barbed wire. So be it, but that's a consideration
that has nothing to do with science. So far as science is concerned,
they might just as well be spending the money on fuel-air munitions or
perhaps better-armored land vehicles. The implication that they are
advancing *anything* beyond their own institutional needs is pure
fraud.

If you don't have a classification excuse, then, if you're going to
put money into centralized, big computers, those computers should have
heavy-duty global bandwidth, so that they can do the problems that
*can't* be done on a distributed platform.

If I still haven't made myself clear, I'll be glad to try again:
localized, nearly embarrassingly parallel problems on distributed
platforms; big, centralized computers for those problems that need
good global communication.

Robert.
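[To put rough numbers on the two regimes being contrasted here, a
back-of-envelope sketch. The node count, per-node memory, and work-unit
size below are all assumptions picked for illustration, not measurements
of Blue Gene, Folding(a)home, or any other real system.]

    # Contrast: embarrassingly parallel work units vs. a global all-to-all.
    # All figures are illustrative assumptions.
    nodes = 4096
    mem_per_node = 2 * 2**30        # assume 2 GiB of problem state per node

    # Folding(a)home-style: each work unit runs independently; a node only
    # ships a small result summary home when it finishes.
    per_unit_result = 1 * 2**20     # assume ~1 MiB result per work unit
    embarrassing_traffic = nodes * per_unit_result

    # FFT-style: each transpose step is an all-to-all in which essentially
    # the whole dataset crosses the bisection of the machine.
    fft_traffic_per_step = nodes * mem_per_node

    print(f"embarrassingly parallel: {embarrassing_traffic / 2**30:.1f} GiB, total")
    print(f"global FFT transpose:    {fft_traffic_per_step / 2**40:.1f} TiB, per step")

On these assumed numbers the totals come out around 4 GiB versus 8 TiB
per step: three to four orders of magnitude, which is the gap between a
network of string and heavy-duty global bandwidth.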
From: Del Cecchi on 31 Mar 2010 13:04
Robert Myers wrote:
[snip]
> If I still haven't made myself clear, I'll be glad to try again:
> localized, nearly embarrassingly parallel problems on distributed
> platforms; big, centralized computers for those problems that need
> good global communication.
>
> Robert.

There certainly seems to be a spectrum of applications that can use a
spectrum of bandwidth, and it would be unnecessary to spend a lot of
money on interconnect and packaging if the problem can be solved with
PCs connected together with string. Whether all protein folding and
genomics software falls into this category is beyond me, but some seems
to.

There are other problems which require both lots of flops and lots of
bandwidth. Flops are cheap. Bandwidth, especially bandwidth in a
network with many nodes, is expensive. The various generations of Blue
Gene seem to have increased bandwidth, although I'm not sure whether it
has kept up with the flops. More bandwidth can be had, especially if
the nodes are quite fat, like Blue Waters. BTW, thanks for the pointer
to that; it was interesting.

I tend to think the technical folks are doing the best they can, whilst
you apparently think there is some kind of dominant hidden agenda, in
the form of sort of a farm program for supercomputers where the
government spends a lot of money on useless boxes good only for
publicity and IBM's bottom line.

Custom hardware is pretty expensive, so it seems that most supers are
built out of piece parts or technologies that were developed for
something else. For example, I think the PowerPC cores in BG/L were
405's left over from the embedded business. The interconnect physical
layer was also taken from some other project, but I forget which.
Likewise, the packaging, power, and cooling were derivatives of earlier
stuff with some new ideas.
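[One way to make the fat-node point concrete: in a nearest-neighbor 3-D
domain decomposition, off-node traffic scales with the surface of each
node's subdomain while flops scale with its volume, so fatter nodes
spend proportionally less on the network. A small sketch, assuming
cubical subdomains; the subdomain sizes are arbitrary illustrations.]

    # Surface-to-volume ratio of a cubical per-node subdomain: a proxy for
    # off-node communication per unit of local computation.
    def surface_to_volume(cells_per_node):
        side = round(cells_per_node ** (1 / 3))
        volume = side ** 3
        surface = 6 * side ** 2       # six faces exchanged with neighbors
        return surface / volume       # equals 6/side for a cube

    for cells in (32**3, 128**3, 512**3):   # thin node -> fat node
        print(f"{cells:>12} cells/node: surface/volume = {surface_to_volume(cells):.4f}")

A node 16x fatter on a side needs roughly 1/16th the network bandwidth
per cell per step, which is why fat nodes like Blue Waters' can get by
with a relatively cheaper interconnect on nearest-neighbor codes. Note
this argument does not help the all-to-all (FFT-like) problems discussed
earlier, where the traffic is global rather than surface-bound.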