Prev: Online Exams for Certification, Free Practice Exams, Study Material, Dumps
Next: Motherboard unusuable because of 1 millimeter of missing plastic ?!?!?!?!
From: Quadibloc on 9 Nov 2009 14:30 On Nov 9, 12:24 pm, n...(a)cam.ac.uk wrote: > In particular, when you are dealing with programs limited by memory > latency, as so many area, there has been very little increase over > the years. 15%, if one is feeling generous, less if not. What with the huge on-chip caches on today's chips, I suppose it depends on what one means by "limited by memory latency". To actually be limited by the latency of external DRAM, a program would have to try rather harder these days, so that has to count as some sort of an improvement as well. John Savard
From: "Andy "Krazy" Glew" on 9 Nov 2009 21:57 Terje Mathisen wrote: > I'm somewhat in love with the idea of a multi-level table: > > A large shared but simple (2-bit counter or similar) augmented with a > small exception table per core which only stores branch info for > branches that have missed in the large table. > > Is this even feasible? :-) Yes. But when I submitted papers on such designs, 1996-2000, they got rejected. Daniel Jimenez eventually broke through, with the second level table being a great big neural net predictor.
From: Daniel A. Jimenez on 9 Nov 2009 23:00 In article <4AF8D67F.6060604(a)patten-glew.net>, Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote: >Terje Mathisen wrote: >> I'm somewhat in love with the idea of a multi-level table: >> >> A large shared but simple (2-bit counter or similar) augmented with a >> small exception table per core which only stores branch info for >> branches that have missed in the large table. >> >> Is this even feasible? :-) > >Yes. > >But when I submitted papers on such designs, 1996-2000, they got rejected. > >Daniel Jimenez eventually broke through, with the second level table >being a great big neural net predictor. I'd love to take the credit for a multi-level branch predictor, but I think my contribution was to write up the multi-level idea that had been around for a while and use it for crazy neural network schemes. (My first , idea was to use caching for branch predictor tables. Turns out, nope, that's a terrible idea, there's no locality in those tables of counters.) EV6 had what I think of as a multi-level branch predictor: a first level cache line predictor that's quick and dumb, and a second-level hybrid predictor that's slow and smart, and can override the line predictor. I'm sure people are working at doing this for sharing prediction resources between cores or thread contexts. Dean Tullsen had an interesting paper in last year's ASPLOS about doing more accurate branch prediction for short threads, and the TRIPS folks have been working on a hierarchical predictor for their TFlex microarchitecture. -- Daniel Jimenez djimenez(a)cs.utexas.edu "I've so much music in my head" -- Maurice Ravel, shortly before his death. " " -- John Cage
From: "Andy "Krazy" Glew" on 11 Nov 2009 00:08 Daniel A. Jimenez wrote: > In article <4AF8D67F.6060604(a)patten-glew.net>, > Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote: >> Terje Mathisen wrote: >>> I'm somewhat in love with the idea of a multi-level table: >>> >>> A large shared but simple (2-bit counter or similar) augmented with a >>> small exception table per core which only stores branch info for >>> branches that have missed in the large table. >>> >>> Is this even feasible? :-) >> Yes. >> >> But when I submitted papers on such designs, 1996-2000, they got rejected. >> >> Daniel Jimenez eventually broke through, with the second level table >> being a great big neural net predictor. > > I'd love to take the credit for a multi-level branch predictor, but I > think my contribution was to write up the multi-level idea that had been > around for a while and use it for crazy neural network schemes. (My first , > idea was to use caching for branch predictor tables. Turns out, nope, > that's a terrible idea, there's no locality in those tables of counters.) 1) You can make the BTB into a multilevel table. There *is* locality there. 2) You can make the pattern tables multilevel. But I found that it was not efficient to make the L1 into a cache - because cache implies cache tags. My version just had the L1 being a small fast dumb predictor, and the L2 being a big, slow, smarter and more complex predictor. I simulated all sorts of combinations, such as an L1 gshare with an L2 a great big McFarling hybrid, etc., - and overall, I found that it gave the effective prediction accuracy of the larger predictor, with the predictor latency (on predicted taken branches) of the small predictor. I also tried a version where the L1 was a plain old predictor, and the L2 was a "cache", containing tagged patterns that had produced mispredictions in the past, with history lengths longer than the L1 uses. It worked, but the area efficiency was questionable. Tags are expensive, and untagged tables of two bit counters are very cheap. -- I should also mention that Adam Butts did similar work, after me and before Daniel.
From: Paul A. Clayton on 14 Nov 2009 16:03
On Nov 11, 12:08 am, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> wrote: [snip] > 2) You can make the pattern tables multilevel. But I found that it was > not efficient to make the L1 into a cache - because cache implies cache > tags. My version just had the L1 being a small fast dumb predictor, and > the L2 being a big, slow, smarter and more complex predictor. I > simulated all sorts of combinations, such as an L1 gshare with an L2 a > great big McFarling hybrid, etc., - and overall, I found that it gave > the effective prediction accuracy of the larger predictor, with the > predictor latency (on predicted taken branches) of the small predictor. > > I also tried a version where the L1 was a plain old predictor, and the > L2 was a "cache", containing tagged patterns that had produced > mispredictions in the past, with history lengths longer than the L1 > uses. It worked, but the area efficiency was questionable. Tags are > expensive, and untagged tables of two bit counters are very cheap. Would it be useful to have a one-bit quasi-tag for the confidence bit? A match means a high-confidence possible hit OR a low-confidence miss; a mis-match means a low-confidence possible hit OR a high-confidence miss. Paul A. Clayton just a technophile |