From: Peter Olcott on 21 Mar 2010 14:02

I have an application that uses enormous amounts of RAM in a
very memory-bandwidth-intensive way. I recently upgraded my
hardware to a machine with 600% faster RAM and 32-fold more
L3 cache. This L3 cache is also twice as fast as the prior
machine's cache. When I benchmarked my application across the
two machines, I gained an 800% improvement in wall clock
time. The new machine's CPU is only 11% faster than the prior
machine's. Both processes were tested on a single CPU.

I am thinking that all of the above would tend to show that
my process is very memory-bandwidth intensive, and thus
could not benefit from multiple threads on the same machine
because the bottleneck is memory bandwidth rather than CPU
cycles. Is this analysis correct?
From: Eric Sosman on 21 Mar 2010 15:58

On 3/21/2010 2:02 PM, Peter Olcott wrote:
> I have an application that uses enormous amounts of RAM in a
> very memory-bandwidth-intensive way. I recently upgraded my
> hardware to a machine with 600% faster RAM and 32-fold more
> L3 cache. This L3 cache is also twice as fast as the prior
> machine's cache. When I benchmarked my application across the
> two machines, I gained an 800% improvement in wall clock
> time. The new machine's CPU is only 11% faster than the prior
> machine's. Both processes were tested on a single CPU.
>
> I am thinking that all of the above would tend to show that
> my process is very memory-bandwidth intensive, and thus
> could not benefit from multiple threads on the same machine
> because the bottleneck is memory bandwidth rather than CPU
> cycles. Is this analysis correct?

Insufficient information, I think. The performance of the
memory subsystem is certainly important to your application's
elapsed time[*], and it seems likely that the CPU stalls a lot
waiting for memory to deliver or absorb data. But if there's
another CPU/core/strand/pipeline, it's possible that one
processor's stall time could be put to productive use by another
if there were multiple execution threads. It's also possible
that multiple threads could interfere, overload things, and clog
the memory bus ...

[*] Rant: I really, really hate "800% improvement" and
similar phrases. If Machine A takes ten seconds and Machine B
shows an "800% improvement," can we conclude that Machine B
finishes the job in minus seventy seconds? Have you considered
a career in politics? ;-)

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
From: Peter Olcott on 21 Mar 2010 21:55

"Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in message
news:ho5tof$lon$1(a)news.eternal-september.org...
> On 3/21/2010 2:02 PM, Peter Olcott wrote:
>> I have an application that uses enormous amounts of RAM in a
>> very memory-bandwidth-intensive way. I recently upgraded my
>> hardware to a machine with 600% faster RAM and 32-fold more
>> L3 cache. This L3 cache is also twice as fast as the prior
>> machine's cache. When I benchmarked my application across the
>> two machines, I gained an 800% improvement in wall clock
>> time. The new machine's CPU is only 11% faster than the prior
>> machine's. Both processes were tested on a single CPU.
>>
>> I am thinking that all of the above would tend to show that
>> my process is very memory-bandwidth intensive, and thus
>> could not benefit from multiple threads on the same machine
>> because the bottleneck is memory bandwidth rather than CPU
>> cycles. Is this analysis correct?
>
> Insufficient information, I think. The performance of the
> memory subsystem is certainly important to your application's
> elapsed time[*], and it seems likely that the CPU stalls a lot
> waiting for memory to deliver or absorb data. But if there's
> another CPU/core/strand/pipeline, it's possible that one
> processor's stall time could be put to productive use by another
> if there were multiple execution threads. It's also possible
> that multiple threads could interfere, overload things, and clog
> the memory bus ...
>
> [*] Rant: I really, really hate "800% improvement" and
> similar phrases. If Machine A takes ten seconds and Machine B
> shows an "800% improvement," can we conclude that Machine B
> finishes the job in minus seventy seconds? Have you considered
> a career in politics? ;-)
>
> --
> Eric Sosman
> esosman(a)ieee-dot-org.invalid

The numbers are sufficiently accurate.
Given the premise (taken as given, as in geometry, and thus immutable)
that the only difference between the two machines is much faster access
to RAM, and that the faster machine runs this app 7.98-fold faster than
the slower one, is there any possible way that this app is not memory
bound? Can you provide a specific, concrete example of one?
From: Ian Collins on 21 Mar 2010 22:50

On 03/22/10 07:02 AM, Peter Olcott wrote:

[please don't multi-post, cross-post if you must.]

> I have an application that uses enormous amounts of RAM in a
> very memory-bandwidth-intensive way. I recently upgraded my
> hardware to a machine with 600% faster RAM and 32-fold more
> L3 cache. This L3 cache is also twice as fast as the prior
> machine's cache. When I benchmarked my application across the
> two machines, I gained an 800% improvement in wall clock
> time. The new machine's CPU is only 11% faster than the prior
> machine's. Both processes were tested on a single CPU.
>
> I am thinking that all of the above would tend to show that
> my process is very memory-bandwidth intensive, and thus
> could not benefit from multiple threads on the same machine
> because the bottleneck is memory bandwidth rather than CPU
> cycles. Is this analysis correct?

Maybe; the only way to know for sure is to measure.

--
Ian Collins
From: Peter Olcott on 21 Mar 2010 23:24
"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:80o47uFitsU1(a)mid.individual.net...
> On 03/22/10 07:02 AM, Peter Olcott wrote:
>
> [please don't multi-post, cross-post if you must.]
>
>> I have an application that uses enormous amounts of RAM in a
>> very memory-bandwidth-intensive way. I recently upgraded my
>> hardware to a machine with 600% faster RAM and 32-fold more
>> L3 cache. This L3 cache is also twice as fast as the prior
>> machine's cache. When I benchmarked my application across the
>> two machines, I gained an 800% improvement in wall clock
>> time. The new machine's CPU is only 11% faster than the prior
>> machine's. Both processes were tested on a single CPU.
>>
>> I am thinking that all of the above would tend to show that
>> my process is very memory-bandwidth intensive, and thus
>> could not benefit from multiple threads on the same machine
>> because the bottleneck is memory bandwidth rather than CPU
>> cycles. Is this analysis correct?
>
> Maybe; the only way to know for sure is to measure.
>
> --
> Ian Collins

I did measure; that is the whole point. An app is 7.98-fold
faster on one machine than another, and the only difference is
that the faster machine has much faster access to RAM. Can
anyone provide any possible scenario in which the app is not
memory-bandwidth bound?