From: William Ahern on 22 Mar 2010 09:55

Peter Olcott <NoSpam(a)ocr4screen.com> wrote:
> "Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in
> message news:ho5tof$lon$1(a)news.eternal-september.org...
<snip>
> > But if there's another CPU/core/strand/pipeline, it's possible that one
> > processor's stall time could be put to productive use by another if
> > there were multiple execution threads.
<snip>
> is there any possible way that this app is not memory bound that you can
> provide a specific concrete example of?

Your question was answered.

You're hung up on your numbers and preconceived ideas. Your application
could be BOTH memory bound AND able to benefit from multiple CPUs. But it's
nearly impossible to guess without knowing at least the algorithm; more
specifically, the code.
From: William Ahern on 22 Mar 2010 09:58

Peter Olcott <NoSpam(a)ocr4screen.com> wrote:
<snip>
> I don't want to spent hundreds of hours making a complex
> process thread-safe just to prove what I knew all along.

You don't really know anything unless you've proved it. But there is such a
thing as rational ignorance. If the benefit of knowing is worth less than
the cost of figuring it out, move on. If you're merely trying to satisfy
your curiosity, you seem to have hit a brick wall, because there's no easy
answer.
From: Peter Olcott on 22 Mar 2010 13:10

"William Ahern" <william(a)wilbur.25thandClement.com> wrote in message
news:pe9k77-gjk.ln1(a)wilbur.25thandClement.com...
> Peter Olcott <NoSpam(a)ocr4screen.com> wrote:
>> "Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in
>> message news:ho5tof$lon$1(a)news.eternal-september.org...
> <snip>
>> > But if there's another CPU/core/strand/pipeline, it's
>> > possible that one processor's stall time could be put to
>> > productive use by another if there were multiple
>> > execution threads.
> <snip>
>> is there any possible way that this app is not memory bound
>> that you can provide a specific concrete example of?
>
> Your question was answered.
>
> You're hung up on your numbers and preconceived ideas. Your application
> could be BOTH memory bound AND able to benefit from multiple CPUs. But
> it's nearly impossible to guess without knowing at least the algorithm;
> more specifically, the code.

The algorithm is essentially a huge deterministic finite automaton where
the memory required is much larger than the largest cache, and the memory
access pattern is essentially unpredictable to any cache algorithm. The
essential core processing of this DFA is to look up in memory the next
location to look up in memory; it does very little else.
From: Ersek, Laszlo on 22 Mar 2010 13:34

In article
<89bdf509-0afa-48c3-a107-67cdaaa27eee(a)t9g2000prh.googlegroups.com>,
David Schwartz <davids(a)webmaster.com> writes:

> But also, it may be memory bandwidth bound because it's single-
> threaded. Assume, for example, the memory access pattern looks like
> this:
>
> X, Y, X+1, Y+1, X+2, Y+2, X+3, Y+3
>
> Imagine the prefetcher cannot see the request for 'X+1' until after it
> processes 'Y'. This can be a worst case scenario, as the memory
> controller keeps opening and closing pages and is unable to balance
> the channels.
>
> Now imagine you turn it into two threads, one doing this:
> X, X+1, X+2, X+3
> and one doing this:
> Y, Y+1, Y+2, Y+3
>
> Now, the prefetcher (still seeing only one read ahead) will see the
> read for X+1 when it processes the read for X. The net result will be
> that the two threads will run about twice as fast with the same memory
> hardware, even though they are purely memory limited.

mind = blown. Thank you, I'm learning a lot from you.

lacos
From: Scott Lurndal on 22 Mar 2010 14:58

"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>I don't want to spent hundreds of hours making a complex
>process thread-safe just to prove what I knew all along.
>
>(1) Machine A performs process B in X minutes.
>(2) Machine C performs process B in X/8 Minutes (800% faster)
>(3) The only difference between machine A and machine C
>    is that machine C has much faster access to RAM (by whatever means).

This is _highly_ speculative. There can be many other reasons that
machine C is faster; clearly the larger L3 footprint will have an
effect, but the processor internals could also have reduced the CPI
(cycles per instruction) or widened the ALU for more superscalar
operation, even if the gross clock speed doesn't appear to differ
between A & C.

The current crop of Core i7/Athlon64/Opteron processors all have memory
controllers on-chip and have eliminated the FSB, which also improves
memory throughput significantly on multisocket configurations. Upcoming
Intel processors will have multiple DRAM channels per memory controller,
providing even more bandwidth.

scott