From: Brett Davis on 25 Jul 2010 00:03

In article <En82o.12$EJ7.2(a)hurricane>,
 Robert Myers <rbmyersusa(a)gmail.com> wrote:
> On the Effects of Memory Latency and Bandwidth on Supercomputer
> Application Performance. Richard Murphy. Sandia National Laboratories.
> www.sandia.gov/~rcmurph/doc/latency.pdf

This paper has pretty graphs, but like most academic papers it has a
fatal flaw: they clearly did not scale the fetch ahead (prefetch
distance) to match the bandwidth, so not a single benchmark improved
with greater bandwidth. Newer PowerPCs are available with greater
bandwidth, and the performance of those chips clearly would not match
these graphs.

I know everyone wants reduced latency, but technology favors increasing
latency to get more bandwidth. At least until RRAM gets embedded on the
CPU die; then you will get reduced latency and even more bandwidth,
whether you want it or not. ;)

Of course, if your problem does not fit in the X gigabytes that are on
die, then you will have to wait for the 4k page to be streamed in. Can
you say really bad latency? Yes, I know you can. ;)

Brett
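To make concrete what "scaling the fetch ahead" means: to keep a stream
saturated, the prefetch distance has to cover the bandwidth-delay
product, so a simulation that doubles bandwidth without also doubling
the distance leaves the loop latency-bound and shows no gain. A minimal
sketch (mine, not the paper's, with made-up latency and bandwidth
numbers, using the GCC/Clang prefetch builtin):

  /* Illustration only: prefetch distance derived from the assumed
   * memory latency and the per-element service time implied by the
   * sustained bandwidth. */
  #include <stddef.h>

  #define MEM_LATENCY_NS    100.0   /* assumed memory latency */
  #define NS_PER_ELEMENT      2.0   /* set by sustained bandwidth */
  #define PREFETCH_DISTANCE ((size_t)(MEM_LATENCY_NS / NS_PER_ELEMENT))

  double sum_stream(const double *a, size_t n)
  {
      double sum = 0.0;
      for (size_t i = 0; i < n; i++) {
          /* Issue the fetch far enough ahead that the data arrives
           * by the time the loop reaches it. */
          if (i + PREFETCH_DISTANCE < n)
              __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
          sum += a[i];
      }
      return sum;
  }

If NS_PER_ELEMENT halves (bandwidth doubles) while PREFETCH_DISTANCE
stays fixed, the prefetches land too late and the extra bandwidth is
wasted, which is the effect being described above.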
From: Robert Myers on 25 Jul 2010 00:31

On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> In article <En82o.12$EJ7.2(a)hurricane>,
>  Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > Application Performance. Richard Murphy. Sandia National Laboratories.
> > www.sandia.gov/~rcmurph/doc/latency.pdf
>
> This paper has pretty graphs, but like most academic papers it has a
> fatal flaw. They clearly did not scale the fetch ahead to match the
> bandwidth, not a single benchmark was improved with greater bandwidth.
> Newer PowerPCs are available with greater bandwidth, and the
> performance of those chips clearly would not match these graphs.
>
> I know everyone wants reduced latency, but technology favors
> increasing latency to get more bandwidth. At least until RRAM gets
> embedded on the CPU die, then you will get reduced latency and
> even more bandwidth, whether you want it or not. ;)
>
> Of course if your problem does not fit in X gigabytes that are on die,
> then you will have to wait for the 4k page to be streamed in.
> Can you say really bad latency, yes I know you can. ;)

I was asked if I had ever looked at studies of the effects of
bandwidth by others, and the answer is that I have. Since I don't get
to sit in on the deliberations of the national labs as to how they
will turn money into carbon emissions, I consider worthwhile any kind
of evidence as to their thinking, no matter how superficial.

I agree. The plots are splendid. They always are.

Robert.
From: Brett Davis on 25 Jul 2010 01:27

In article
<57004bc1-201f-4cd1-8584-e59c01eb0e0e(a)e5g2000yqn.googlegroups.com>,
 Robert Myers <rbmyersusa(a)gmail.com> wrote:
> On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> > In article <En82o.12$EJ7.2(a)hurricane>,
> >  Robert Myers <rbmyers...(a)gmail.com> wrote:
> >
> > > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > > Application Performance. Richard Murphy. Sandia National Laboratories.
> > > www.sandia.gov/~rcmurph/doc/latency.pdf
> >
> > This paper has pretty graphs, but like most academic papers it has a
> > fatal flaw. They clearly did not scale the fetch ahead to match the
> > bandwidth, not a single benchmark was improved with greater bandwidth.
> > Newer PowerPCs are available with greater bandwidth, and the
> > performance of those chips clearly would not match these graphs.
>
> I was asked if I had ever looked at studies of the effects of
> bandwidth by others, and the answer is that I have.
>
> I agree. The plots are splendid. They always are.

Stanford online has a truly awesome related iTunes talk by Bill Dally:
http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.edu.3692287061.03692287064.3946505769?i=2123730674
From:
http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.edu.3692287061.03692287064
iTunes store search term "The future of throughput".
Track 13 for the course "Programming Massively Parallel Processors with
CUDA (CS193G)".

A related PDF:
http://cva.stanford.edu/people/dally/ISSCC2005.pdf

Brett
From: Robert Myers on 25 Jul 2010 23:12

On Jul 25, 1:27 am, Brett Davis <gg...(a)yahoo.com> wrote:
> In article
> <57004bc1-201f-4cd1-8584-e59c01eb0...(a)e5g2000yqn.googlegroups.com>,
>  Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> > On Jul 25, 12:03 am, Brett Davis <gg...(a)yahoo.com> wrote:
> > > In article <En82o.12$EJ7.2(a)hurricane>,
> > >  Robert Myers <rbmyers...(a)gmail.com> wrote:
> > >
> > > > On the Effects of Memory Latency and Bandwidth on Supercomputer
> > > > Application Performance. Richard Murphy. Sandia National Laboratories.
> > > > www.sandia.gov/~rcmurph/doc/latency.pdf
> > >
> > > This paper has pretty graphs, but like most academic papers it has a
> > > fatal flaw. They clearly did not scale the fetch ahead to match the
> > > bandwidth, not a single benchmark was improved with greater bandwidth.
> > > Newer PowerPCs are available with greater bandwidth, and the
> > > performance of those chips clearly would not match these graphs.
> >
> > I was asked if I had ever looked at studies of the effects of
> > bandwidth by others, and the answer is that I have.
> >
> > I agree. The plots are splendid. They always are.
>
> Stanford online has a truly awesome related iTunes talk by Bill Dally:
> http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.e...
> From: http://deimos3.apple.com/WebObjects/Core.woa/Browse/itunes.stanford.e...
> iTunes store search term "The future of throughput".
> Track 13 for the course "Programming Massively Parallel Processors with CUDA (CS193G)".

I got stopped on Bill Dally's first slide, where he says
"efficiency=locality."

He claims that his slide tells you everything you need to know about
the future of computing, AND THAT BELIEF AND PROSELYTIZING FOR THAT
BELIEF IS WHY I HAVE USED UP SO MUCH SPACE HERE.

THE MOST INTERESTING PROBLEMS ARE NOT LOCAL, BECAUSE THE MOST
INTERESTING PROBLEMS ARE NOT LINEAR.

A problem is embarrassingly parallel IFF it is local IFF it is linear.
If a problem is linear, then there is a representation in which it is
both local and embarrassingly parallel. If a problem is not linear,
then there is, in general, no such representation. Does one need a
PhD from Cal Tech to understand this?

I suppose that it is inevitable that EEs should believe that
interesting things can happen in a linear universe, but, from a
mathematical point of view, linear systems are as interesting as the
aging of paintings hanging in the Louvre.

Robert.

> A related PDF:
> http://cva.stanford.edu/people/dally/ISSCC2005.pdf
>
> Brett
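The "linear implies a local, embarrassingly parallel representation"
claim is essentially the standard diagonalization argument. As a sketch
(the usual textbook reasoning for the constant-coefficient case, not
taken from the paper or the talk), in LaTeX notation:

  % Linear, constant-coefficient evolution: \partial_t u = L u.
  % Expanding in the eigenbasis (Fourier modes for translation-invariant L),
  %   u(x,t) = \sum_k \hat{u}_k(t) e^{ikx},
  % the equation decouples into independent scalar ODEs:
  \[
    \frac{d\hat{u}_k}{dt} = \lambda_k \hat{u}_k
    \quad\Longrightarrow\quad
    \hat{u}_k(t) = e^{\lambda_k t}\,\hat{u}_k(0),
  \]
  % so each mode evolves with no reference to any other mode: in that
  % representation the problem is both local and embarrassingly parallel.
  % A nonlinear term such as u\,\partial_x u becomes a convolution,
  \[
    \widehat{(u\,u_x)}_k = \sum_{p+q=k} \hat{u}_p\,(i q\,\hat{u}_q),
  \]
  % coupling every mode to every other, so no fixed change of basis
  % makes the problem local.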
From: Andy Glew on 26 Jul 2010 01:33
On 7/25/2010 8:12 PM, Robert Myers wrote:
> I got stopped on Bill Dally's first slide, where he says
> "efficiency=locality."
>
> He claims that his slide tells you everything you need to know about
> the future of computing, AND THAT BELIEF AND PROSELYTIZING FOR THAT
> BELIEF IS WHY I HAVE USED UP SO MUCH SPACE HERE.
>
> THE MOST INTERESTING PROBLEMS ARE NOT LOCAL, BECAUSE THE MOST
> INTERESTING PROBLEMS ARE NOT LINEAR.
>
> A problem is embarrassingly parallel IFF it is local IFF it is linear.
>
> If a problem is linear, then there is a representation in which it is
> both local and embarrassingly parallel. If a problem is not linear,
> then there is, in general, no such representation. Does one need a
> PhD from Cal Tech to understand this?

(1) Is there a proof for this? Not merely that there are non-linear
systems that are not embarrassingly parallel, but that no interesting
non-linear system is amenable to a parallel solution. E.g., what about
the N-body problem, gravitational dynamics?

(2) If what you and Dally say is true, Robert, then you may be tilting
at windmills: there may be no computationally efficient way of doing
what you want. I don't believe this, because I do not equate
"computationally efficient" with "embarrassingly parallel".

Also: even though locality really matters, we have nevertheless used it
as an excuse to pessimize non-local activities. At the very least we
can influence the constant factor by removing those pessimizations, as
I have attempted to do with my proposal for a scatter/gather based
memory subsystem.
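For readers who haven't seen the idea: a scatter/gather memory
subsystem lets software hand the memory system a whole vector of
arbitrary addresses and get the elements back as one dense bundle, so a
non-unit-stride access pattern can have all of its line fetches in
flight at once instead of paying full latency per element. A minimal
sketch of the programming model (my own illustration with a made-up
gather_f64() primitive, not Glew's actual proposal):

  #include <stddef.h>

  /* Stand-in for a hardware gather: vals[i] = base[idx[i]].  A real
   * scatter/gather subsystem would issue these as one bundled,
   * pipelined request; here the semantics are spelled out serially. */
  static void gather_f64(double *vals, const double *base,
                         const size_t *idx, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          vals[i] = base[idx[i]];
  }

  /* Sparse dot product sum_i a[col[i]] * v[i]: the kind of non-local
   * access pattern such a subsystem targets. */
  double sparse_dot(const double *a, const size_t *col,
                    const double *v, size_t n)
  {
      double buf[256];
      double sum = 0.0;
      for (size_t i = 0; i < n; i += 256) {
          size_t m = (n - i < 256) ? (n - i) : 256;
          gather_f64(buf, a, &col[i], m);   /* one bundled request */
          for (size_t j = 0; j < m; j++)
              sum += buf[j] * v[i + j];
      }
      return sum;
  }

The point is not the arithmetic but the interface: the index vector is
presented up front, so the memory system sees the whole non-local
pattern at once rather than discovering it one cache miss at a time.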