From: Joseph M. Newcomer on 23 Jan 2010 00:41

It would not be the first time Intel delivered a chip that had lower performance for equivalent clock speed. Also, depending on the application, your application may not be a good match for the particular caching strategies that are used. Caches are not just caches.

Note that the Pentium III was notorious for having inferior performance to the Pentium II or Pentium Pro in nearly every way.

Factors for performance include:
    Number of ALUs
    Cache size
    Cache replacement algorithm
    TLB size
    TLB replacement algorithm
    Prefetch pipe depth
    Write pipe depth
    Microinstruction pipe depth
    Front Side Bus speed
    Memory architecture
    Memory width
    Working set size
    Paging policies
    Available memory for programs

And those are just the items I can think of off the top of my head. Such observations as you make are dismaying, to say the least, and seriously disappointing, but you have essentially assumed that both machines are identical in most ways. And in the single most critical parameter, total physical memory, the slower machine has half the memory of the faster machine. Sounds like paging to me.

Sometimes, you can have a program whose access patterns work well with one caching strategy and work absolutely against another caching strategy. Things like the "stride" of accesses become factors. A program that hits the caches "wrong" relative to its replacement algorithm can give an order of magnitude degradation. A factor of 2 is well within this variance.

Note that "faster access to RAM" is only one of the parameters in the above list. A faster FSB doesn't necessarily translate to faster memory access for a particular algorithm. That's because "memory access" time based on raw memory speed is NOT the operative parameter for algorithm performance; in fact, it is one of the least important parameters. Cache hit ratio is critical.

Cache behavior can reduce your performance by an order of magnitude if you hit the wrong patterns.

Back in the days when real people (not teams of thousands) designed caches, a friend of mine was designing the cache for a high-performance personal workstation. "I'm trying to increase the cache hit ratio from 97% to 98%," he told me. When I asked what only a 1% improvement would buy, he said, "A 1% improvement in cache hits means a 30% improvement in program execution." I think this was in about 1979, when caches were still a pretty new, cool idea (memory had finally gotten cheap enough that we could actually consider building multilevel memory hierarchies).

If you have bad paging behavior, you should consider yourself fortunate that you have ONLY a factor of 2 degradation. Paging can reduce your performance by ORDERS (not just ORDER) of magnitude.

Note that you have not indicated whether you are using the same OS. You have not reported the number of page faults your program took during the measurement interval (this is trivially available from Task Manager). You have not measured the actual executable code performance of key subroutines, or indicated performance figures (available from kernel APIs) around those key algorithms.

Note that the same OS could have different policies on the two machines. Working set configuration for the account would have a profound impact on overall performance.

There are so many variables involved here that a superficial measurement of front-to-back execution with no instrumentation effort is meaningless. You have to make some effort to come up with some of the "why" yourself.
You have presented essentially zero useful information that someone trying to answer this question could use to say anything meaningful. All we know is that one machine is a Core i5 and one is a Celeron, some raw memory bus timing information (largely irrelevant), and disk performance (relevant only if the disk is involved in the problem); we know nothing about the operating system running or the dozens of performance tuning parameters that exist in the user policies. You observe that your program is memory-intensive, and the slower machine has half the memory of the faster machine, which almost immediately screams "paging". If they do not have the same size memory, comparisons are not going to be particularly meaningful.

I'd look at paging performance first, cache organization second. Those are probably the two most useful parameters to study at this point. If paging performance differs, you need to think about either a memory upgrade or looking into paging tuning parameters.
                joe

On Fri, 22 Jan 2010 21:29:55 -0600, "Peter Olcott" <NoSpam(a)SeeScreen.com> wrote:

>It is very memory intensive, thus much faster memory and much
>larger cache should make it faster and not slower. It is
>also single threaded and no floating point is used at all. I
>bought the Core i5 specifically because it has faster access
>to RAM.
>
>"Alexander Grigoriev" <alegr(a)earthlink.net> wrote in message
>news:%23BWgrp9mKHA.1548(a)TK2MSFTNGP02.phx.gbl...
>> Is the application multithreaded? Is it floating-point
>> intensive, memory-intensive, and what else?
>>
>> "Peter Olcott" <NoSpam(a)SeeScreen.com> wrote in message
>> news:6PydnYAu9qOHgcfWnZ2dnUVZ_v2dnZ2d(a)giganews.com...
>>>I recently upgraded my computer hardware from a 2.4 GHz
>>>Celeron to a 2.66 GHz Core i5, and an MFC application that
>>>I developed runs only half as fast on the faster machine.
>>>What could be causing this?
>>>
>>> Both machines have identical SATA hard drives, and the
>>> fast machine has much faster RAM: 1333 DDR3 and 4.0 GB.
>>> The slower machine has 2.0 GB of 333 DDR RAM. Why is the
>>> slower machine twice as fast on the same executable?

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
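As a concrete version of the page-fault measurement described above, here is a minimal sketch that samples the process page-fault counter (the same number Task Manager displays) around a suspect section of code. RunWorkload() is a hypothetical stand-in for the application's hot path, not anything from this thread:

#include <windows.h>
#include <psapi.h>
#include <cstdio>

#pragma comment(lib, "psapi.lib")

void RunWorkload();  // hypothetical: the code under suspicion

static DWORD PageFaults()
{
    // PageFaultCount is cumulative for the process, soft and hard
    // faults combined; only hard faults actually touch the disk.
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    return pmc.PageFaultCount;
}

void MeasureFaults()
{
    DWORD before = PageFaults();
    RunWorkload();
    DWORD after = PageFaults();
    // A fault count that tracks the slowdown on the slower machine
    // is a strong hint that paging is the problem.
    printf("Page faults during workload: %lu\n", after - before);
}

Running the same instrumented build on both machines and comparing the deltas is far more informative than comparing total run times.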
From: Tom Serface on 23 Jan 2010 02:43

I'd be suspicious that something you're doing is causing it to swap memory to disk. You could use Task Manager to check what's happening with physical memory. I'm not sure why that would be the case, since you have more memory on the new machine, but maybe something else is running that's not on the other machine that could be affecting memory usage or disk speed.

Also, is the program accessing the network at all? Maybe there is a problem with your network setup.

Tom

"Peter Olcott" <NoSpam(a)SeeScreen.com> wrote in message news:6PydnYAu9qOHgcfWnZ2dnUVZ_v2dnZ2d(a)giganews.com...
> I recently upgraded my computer hardware from a 2.4 GHz Celeron to a 2.66
> GHz Core i5, and an MFC application that I developed runs only half as
> fast on the faster machine. What could be causing this?
>
> Both machines have identical SATA hard drives, and the fast machine has
> much faster RAM: 1333 DDR3 and 4.0 GB. The slower machine has 2.0 GB of
> 333 DDR RAM. Why is the slower machine twice as fast on the same
> executable?
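In the same spirit as the Task Manager check suggested here, memory pressure can also be sampled from inside the program. A small sketch using GlobalMemoryStatusEx; the function name is illustrative:

#include <windows.h>
#include <cstdio>

void ReportMemoryLoad()
{
    MEMORYSTATUSEX ms = { sizeof(ms) };  // dwLength must be filled in first
    if (GlobalMemoryStatusEx(&ms))
    {
        // dwMemoryLoad is system-wide physical memory in use, 0..100
        printf("Memory load: %lu%%, available physical: %llu MB\n",
               ms.dwMemoryLoad, ms.ullAvailPhys / (1024ULL * 1024ULL));
    }
}

Logged before and after the slow operation on each machine, this shows whether physical memory is exhausted while the program runs.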
From: Woody on 23 Jan 2010 04:17

I would look at two aspects of your app:

1) Use SysInternals' VMMap to show memory usage.
2) Use AMD's CodeAnalyst to profile the execution.

By comparing results on the two systems, you may be able to see where the difference is.

If you are doing disk writes, you should be sure both systems are set the same in regard to delayed writes. This could cause a drastic difference in performance, even with identical hardware.

You can also use Task Manager to check whether the app is doing disk activity, such as loading and unloading DLLs, differently on the two systems. While you're in TM, look at the task's priority.
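A full profiler such as CodeAnalyst gives the most detail, but even a crude wall-clock timing of one suspect routine, run on both machines, narrows the search quickly. A minimal QueryPerformanceCounter sketch; ProcessImage() is a hypothetical stand-in for whatever routine is being compared:

#include <windows.h>
#include <cstdio>

void ProcessImage();  // hypothetical: the routine under test

void TimeIt()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);  // counts per second, fixed at boot
    QueryPerformanceCounter(&t0);
    ProcessImage();
    QueryPerformanceCounter(&t1);
    double ms = (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart;
    printf("ProcessImage took %.3f ms\n", ms);
}

If one routine shows the 2x difference and the rest do not, the problem is localized; if everything is uniformly slower, something global (paging, a driver, priority) is more likely.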
From: Peter Olcott on 23 Jan 2010 10:10

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:5s0ll5hlimpptv63bkegcfbig3au6gpc6g(a)4ax.com...
> It would not be the first time Intel delivered a chip that
> had lower performance for equivalent clock speed.
> [snip]
> And those are just the items I can think of off the top of
> my head. Such observations as you make are dismaying, to
> say the least, and seriously disappointing, but you have
> essentially assumed that both machines are identical in
> most ways. And in the single most critical parameter,
> total physical memory, the slower machine has half the
> memory of the faster machine. Sounds like paging to me.

The slower machine has twice the memory; the slower machine has 1333 memory, and the faster machine has 333 memory. The slower machine is a Core i5 quad core 2.66, and the faster machine is a Celeron 2.4. The process uses a single thread.

> [snip]
> Note that you have not indicated whether you are using the
> same OS. You have not reported the number of page faults
> your program took during the measurement interval (this is
> trivially available from Task Manager).
> [snip]
>
> There are so many variables involved here that a
> superficial measurement of front-to-back execution with no
> instrumentation effort is meaningless. You have to make
> some effort to come up with some of the "why" yourself.
>
> I'd look at paging performance first, cache organization
> second. Those are probably the two most useful parameters
> to study at this point. If paging performance differs, you
> need to think about either a memory upgrade or looking
> into paging tuning parameters.
> joe

I found out last night that the difference is related to video card settings. I was able to make the faster machine much faster than the slower machine by setting the NVIDIA 9800 GTX to maximize performance over quality. This setting has now stopped working.

> [remainder of quoted thread snipped]
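A finding like this, where the slowdown comes from a video driver setting rather than the CPU, is exactly what comparing wall-clock time against consumed CPU time would have exposed: a process waiting on a driver (or on paging) burns elapsed time without burning CPU. A hedged sketch; RunWorkload() is again a hypothetical stand-in:

#include <windows.h>
#include <cstdio>

void RunWorkload();  // hypothetical: the slow operation

static double FileTimeToMs(const FILETIME& ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000.0;  // FILETIME ticks are 100 ns
}

void CompareCpuVsWall()
{
    FILETIME created, exited, kern0, user0, kern1, user1;
    DWORD wall0 = GetTickCount();
    GetProcessTimes(GetCurrentProcess(), &created, &exited, &kern0, &user0);
    RunWorkload();
    GetProcessTimes(GetCurrentProcess(), &created, &exited, &kern1, &user1);
    DWORD wall1 = GetTickCount();

    double cpuMs = (FileTimeToMs(kern1) + FileTimeToMs(user1))
                 - (FileTimeToMs(kern0) + FileTimeToMs(user0));
    // Wall time far exceeding CPU time means the process was blocked,
    // not computing.
    printf("Wall: %lu ms, CPU: %.0f ms\n", wall1 - wall0, cpuMs);
}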
From: Alexander Grigoriev on 23 Jan 2010 12:08
"Peter Olcott" <NoSpam(a)SeeScreen.com> wrote in message news:4OKdnQ_BNJzFjMbWnZ2dnUVZ_tWdnZ2d(a)giganews.com... > > > I found out last night that the difference is related to video card > settings. I was able to make the faster machine much faster than the > slower machine by setting the NVIDIA 9800 GTX to maximize performance over > quality. This setting has now stopped working. > Boot in VGA-only mode (hit F8) and see if the performance get better. Maybe it's the video adapter+driver that is causing excessive interrupts or unnecessary bus accesses. Does your program use video card for high volume operations? |