From: Ulf Samuelsson on 25 Sep 2009 18:17 David Brown wrote: > Ulf Samuelsson wrote: >> The GNU toolchain can be OK, and it can be horrible. >> If you look at ST's home page you will find some discussion >> about performance of GCC-4.2.1 on the STM32. >> > > Could you provide a link to this? I could not see any such discussion. > > I note that gcc-4.2.1 was the CodeSourcery release two years ago, when > Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question > was not from CodeSourcery but based on the official FSF tree, then I > don't think it had Thumb-2 at all. It is very important with gcc to be > precise about the source and versions - particularly so since > CodeSourcery (who maintain the ARM ports amongst others) have > target-specific features long before they become part of the official > FSF tree. > >> The rumoured 90 MIPS becomes: >> >> wait for it... >> >> 32 MIPS... >> >> With a Keil compiler you can reach about 60-65 MIPS at least with >> a 72 MHz Cortex-M3. >> >> Anyone seen improvement in later gcc versions? >> > > I would be very surprised to see any major ARM compiler generating code > at twice the speed of another major ARM compiler, whether we are talking > gcc or commercial compilers. To me, this indicates either something odd > about the benchmark code, something wrong in the use of the tools (such > as compiler flags or libraries), or something wrong in the setup of the > device in question (maybe failing to set clock speeds or wait states > correctly). > > If there was consistently such a big difference, I would not expect > gcc-based development tools to feature so prominently on websites such > as ST's or TI (Luminary Micros) - a compiler as bad as you suggest here > would put the devices themselves in a very bad light. > > I haven't used the STM32 devices, but I am considering TI's Cortex-M3 for > a project, so I am interested in the state of development tools for the > same core. > >> ... >> On the AVR I noted things like pushing ALL registers >> when entering an interrupt. > > avr-gcc does /not/ push all registers when entering an interrupt. It > does little for the credibility of your other points when you make such > wildly inaccurate claims. In the case I investigated for a customer (which was more than one year ago) the interrupt routines took a lot longer to execute, and this caused a lot of grievance. > > avr-gcc always pushes three registers in interrupts - SREG, and its > "zero" register and "tmp" register because some code sequences generated > by avr-gcc make assumptions about being able to use these registers. > Theoretically, these could be omitted in some cases, but it turns out to > be difficult to do in avr-gcc, and the advantages are small (for > non-trivial interrupt functions). No one claims that avr-gcc is > perfect, merely that it is very good. > > Beyond that, avr-gcc pushes registers if they are needed - pretty much > like any other compiler I have used. If your interrupt function calls > an external function, and you are not using whole-program optimisation, > then this means pushing all ABI "volatile" registers - an additional 12 > registers. Again, this is the same as for any other compiler I have > seen. And as with any other compiler, you avoid the overhead by keeping > your interrupt functions small and avoiding external function calls, or > by using whole-program optimisations. > >> The IAR is simply - better - . >> > > I'll not argue with you about IAR producing somewhat smaller or faster > code than avr-gcc.
I have only very limited experience with IAR, so I > can't judge properly. But then, you apparently have very little > experience with avr-gcc - I don't disagree with that. I have both, but I quickly scurry back to the IAR compiler if I need to show off the AVR. > few people have really studied and compared > both compilers in a fair and objective test. There is certainly room > for improvement in avr-gcc - there are people working on it, and it gets > better over time. > > But to say "IAR is simply better" is too sweeping a statement to be > taken seriously, since "better" means so many different things to > different people. OK, let me rephrase: It generally outputs smaller and faster code. > >> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >> > > To go back to your original statement, "The GNU toolchain can be OK, and > it can be horrible", I agree in general - although I'd rate the range a > bit higher (from "very good" down to "pretty bad", perhaps). There have > been gcc ports in the past that could rate as "horrible", but I don't > think that applies to any modern gcc port in serious active use. > >> BR Ulf Samuelsson
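To make the register-saving point above concrete, here is a minimal avr-gcc sketch of the two cases being argued about. The vector names and peripheral registers are illustrative assumptions, and exactly which working registers get saved depends on the compiler version and options used.

#include <stdint.h>
#include <avr/io.h>
#include <avr/interrupt.h>

extern void process_sample(uint8_t value);   /* lives in another module */

volatile uint8_t last_count;

/* Small, self-contained handler: avr-gcc saves SREG plus its "zero" and
   "tmp" registers, and then only the few working registers it uses. */
ISR(TIMER0_OVF_vect)          /* vector name chosen only for illustration */
{
    last_count = TCNT0;
}

/* Handler calling an external function: without whole-program
   optimisation the compiler must assume every ABI call-clobbered
   ("volatile") register may be changed, so it saves and restores
   them all - the overhead discussed above. */
ISR(ADC_vect)                 /* vector name chosen only for illustration */
{
    process_sample(ADCH);
}

Keeping handlers in the first form is how the overhead is normally avoided.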
From: David Brown on 26 Sep 2009 08:03 Niklas Holsti wrote: > FreeRTOS info wrote: >> >> "ChrisQ" <meru(a)devnull.com> wrote in message >> news:sK4vm.199649$AC5.36013(a)newsfe06.ams2... >>> FreeRTOS info wrote: >>> >>>> >>>> GCC and IAR compilers do very different things on the AVR - the >>>> biggest difference being that IAR use two stacks whereas GCC uses >>>> one. This makes IAR more difficult to set up and tune, and GCC >>>> slower and clunkier because it has to disable interrupts for a few >>>> instructions on every function call. Normally this is not a problem, >>>> but it is not as elegant as the two stack solution for sure. GCC is >>>> very popular on the AVR though, and is good enough for most >>>> applications, especially used in combination with the other free AVR >>>> tools such as AVRStudio. >>>> >>> >>> Can you elaborate a bit as to why 2 stacks are used with IAR? >>> Haven't used avr, so have no real experience. The AVR32 has shadow >>> register sets, including stacks for each processor and exception >>> mode. Thus, separate initialisation on startup, but so do Renesas >>> 80C87 and some arm machines. How does gcc work for arm, for example? >> >> >> I have not gone back to check, but from memory (might not be >> completely accurate) the AVR uses two 8 bit registers to implement a >> 16 bit stack pointer. When entering/exiting a function the stack >> pointer has to potentially be updated as two separate operations, and >> you don't want the update to be split by an interrupt occurring halfway > through. > > Adding a bit to Richard's reply: The AVR call and return instructions > update the 16-bit "hardware" stack pointer (to push and pop the return > address) but they do so atomically, so they don't need interrupt > disabling. But gcc uses the "hardware" stack also for data, and must > then update the stack pointer as two 8-bit parts, which needs interrupt > disabling as Richard describes above. > > The IAR compiler uses the AVR Y register (a pair of 8-bit registers > making up a 16-bit number) as the stack pointer for the second, > compiler-defined "software" stack. IAR still uses the hardware stack for > return addresses, so it still uses the normal call and return > instructions (usually), but it puts all stack-allocated data on the > software stack accessed via the Y register. The AVR provides > instructions that can increment or decrement the Y register atomically, > as a 16-bit entity, and the IAR compiler's function prologues/epilogues > often use these instructions. However, sometimes the IAR compiler > generates code that adds or subtracts a larger number (> 1) to/from Y, > and then it must use two 8-bit operations, and must disable interrupts > just as gcc does. > > Conclusion: the frequency of interrupt disabling is probably less in > IAR-generated code than in gcc-generated code, but the impact in terms > of an increased worst-case interrupt response latency is the same. > One point to remember here is that this only applies to functions that need to allocate a stack frame for data on the stack. The AVR has a fair number of registers, so that a great many functions do not require data to be allocated on the stack, and thus don't need such a stack frame. I had a quick "grep" through a medium-sized project (20K code) for which I happened to have listing files - there were only two functions in the entire project that had a stack frame. For the great majority of the time, it is sufficient to save and restore registers using push and pop.
For AVR compilers that use a separate data stack (I am familiar with ImageCraft rather than IAR, but the technique is the same), saving and restoring on the data stack via Y++/Y-- is the same size and speed. Also note that you only need to disable interrupts if you are changing both the high and the low bytes of the stack pointer. If you know your stack will never be more than 256 bytes (which is very often the case), you can use the "-mtiny-stack" flag to tell avr-gcc that the SP_H register is unchanged by any stack frame allocation, and thus interrupts are not disabled. There are two advantages of using Y as a data stack pointer rather than using the hardware stack. One is that it is possible to use common routines to handle register save and restores rather than a sequence of push/pops in each function, which saves a bit of code space (at the cost of a little run-time). Secondly, you don't have to set up a frame pointer to access the data, as Y is already available (the AVR can access data at [Y+index], but not [SP+index]). However, this is a minor benefit - any function that needs a frame will be large enough that the few extra instructions needed are a small cost in time and space. Interrupts do need to be disabled (unless you use -mtiny-stack), but it is only for a couple of clock cycles. But there are two disadvantages of using Y as a data stack pointer, rather than using a single stack. One is that you have to think about where your two stacks are situated in memory, and how big they must be - it is hard to be safe without wasting data space (especially if you also use a heap). The other is that if your code uses more than one pointer at a time, the compiler must generate code to save and restore Y (maybe also disabling interrupts in the process), or miss out on using it. The AVR has only two good pointers - Y and Z, and a limited third pointer X. Code that uses pointers to structs will see particular benefits of having Y available for general use. All in all, you cannot make clear decisions as to which method is the "best". <http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_spman>
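As a rough illustration of the stack-frame distinction above, here is a hedged C sketch; the function names are invented, and whether avr-gcc really allocates a frame depends on the version and optimisation settings used.

#include <stdint.h>

extern void read_block(uint8_t *dst, uint8_t len);   /* in another module */

/* The locals here fit in registers, so avr-gcc normally needs no stack
   frame at all - just push/pop of the call-saved registers it uses.
   This is the common case described above. */
uint16_t scale(uint16_t x)
{
    return (uint16_t)(x * 3u + 7u);
}

/* The buffer's address escapes, so it must really live on the stack and
   the prologue has to move SP down to make room.  Because SP is split
   across two 8-bit I/O registers, that adjustment sits in a short
   interrupts-disabled window - unless the -mtiny-stack option mentioned
   above is used, in which case only the low byte is touched. */
uint8_t fill_and_sum(void)
{
    uint8_t buf[16];
    uint8_t sum = 0;

    read_block(buf, sizeof buf);
    for (uint8_t i = 0; i < sizeof buf; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}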
From: David Brown on 26 Sep 2009 08:30 Ulf Samuelsson wrote: > David Brown wrote: >> Ulf Samuelsson wrote: >>> The GNU toolchain can be OK, and it can be horrible. >>> If you look at ST's home page you will find some discussion >>> about performance of GCC-4.2.1 on the STM32. >>> >> >> Could you provide a link to this? I could not see any such discussion. >> >> I note that gcc-4.2.1 was the CodeSourcery release two years ago, when >> Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question >> was not from CodeSourcery but based on the official FSF tree, then I >> don't think it had Thumb-2 at all. It is very important with gcc to >> be precise about the source and versions - particularly so since >> CodeSourcery (who maintain the ARM ports amongst others) have >> target-specific features long before they become part of the official >> FSF tree. >> >>> The rumoured 90 MIPS becomes: >>> >>> wait for it... >>> >>> 32 MIPS... >>> >>> With a Keil compiler you can reach about 60-65 MIPS at least with >>> a 72 MHz Cortex-M3. >>> >>> Anyone seen improvement in later gcc versions? >>> >> >> I would be very surprised to see any major ARM compiler generating >> code at twice the speed of another major ARM compiler, whether we are >> talking gcc or commercial compilers. To me, this indicates either >> something odd about the benchmark code, something wrong in the use of >> the tools (such as compiler flags or libraries), or something wrong in >> the setup of the device in question (maybe failing to set clock speeds >> or wait states correctly). >> >> If there was consistently such a big difference, I would not expect >> gcc-based development tools to feature so prominently on websites such >> as ST's or TI (Luminary Micros) - a compiler as bad as you suggest >> here would put the devices themselves in a very bad light. >> >> I haven't used the STM32 devices, but I am considering TI's Cortex-M3 >> for a project, so I am interested in the state of development tools for >> the same core. >> >>> ... >>> On the AVR I noted things like pushing ALL registers >>> when entering an interrupt. >> >> avr-gcc does /not/ push all registers when entering an interrupt. It >> does little for the credibility of your other points when you make >> such wildly inaccurate claims. > > In the case I investigated for a customer > (which was more than one year ago) > the interrupt routines took a lot longer to execute, > and this caused a lot of grievance. > I don't remember if avr-gcc ever pushed all registers when entering an interrupt, but if so it was much more than a year ago (I have used it for over 6 years). I have no problem believing that an interrupt routine took significantly longer to execute with avr-gcc than with IAR - my issue is only with your reasoning, particularly since you emphasised that "ALL registers" were pushed. Without knowing anything about the customer, the code, the compiler versions, or compiler switches used, I would hazard a guess that the interrupt function called an external function in another module (or perhaps in a library). My guess is that IAR did full-program optimisation, and inlined the called code into the interrupt handler and thus avoided saving all the ABI volatile registers since it knew exactly what the called code would need. Full-program optimisation (using --combine and -fwhole-program flags) is relatively new to avr-gcc, and not yet well known - it is very unlikely that it was used in your comparison.
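A hedged sketch of the situation being guessed at here - all the names are invented, and the exact register savings depend on compiler version and options:

#include <stdint.h>
#include <avr/io.h>
#include <avr/interrupt.h>

extern void log_event(uint8_t code);      /* in another module: opaque */

static volatile uint16_t event_count;

static inline void count_event(void)      /* visible here: can be inlined */
{
    event_count++;
}

/* The compiler cannot see inside log_event(), so before the call it must
   save every ABI call-clobbered register - the "all registers pushed"
   effect described above. */
ISR(INT0_vect)               /* vector name chosen only for illustration */
{
    log_event(1);
}

/* When the called code is visible it can be inlined, and only the
   registers the handler really uses are saved.  Whole-program
   optimisation (--combine -fwhole-program) gives the same visibility
   across separate source files. */
ISR(INT1_vect)               /* vector name chosen only for illustration */
{
    count_event();
}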
Of course, developers who understand how their tools work and how their target processor works would normally avoid making an external function call from an interrupt routine in the first place. It is fair to say that the ability to choose compiler options like full-program optimisation through simple dialog boxes is an advantage of IAR over avr-gcc - getting the absolute best out of avr-gcc requires more thought, research and experimenting than it does with a tool like IAR. > >> >> avr-gcc always pushes three registers in interrupts - SREG, and its >> "zero" register and "tmp" register because some code sequences >> generated by avr-gcc make assumptions about being able to use these >> registers. Theoretically, these could be omitted in some cases, but it >> turns out to be a difficult to do in avr-gcc, and the advantages are >> small (for non-trivial interrupt functions). No one claims that >> avr-gcc is perfect, merely that it is very good. > > > >> >> Beyond that, avr-gcc pushes registers if they are needed - pretty much >> like any other compiler I have used. If your interrupt function calls >> an external function, and you are not using whole-program >> optimisation, then this means pushing all ABI "volatile" registers - >> an additional 12 registers. Again, this is the same as for any other >> compiler I have seen. And as with any other compiler, you avoid the >> overhead by keeping your interrupt functions small and avoiding >> external function calls, or by using whole-program optimisations. >> >>> The IAR is simply - better - . >>> >> >> I'll not argue with you about IAR producing somewhat smaller or faster >> code than avr-gcc. I have only very limited experience with IAR, so I >> can't judge properly. But then, you apparently have very little >> experience with avr-gcc - > > I don't disagree with that. > I have both, but I quickly scurry back to the IAR compiler > if I need to show off the AVR. > You have colleagues at Atmel who put a great deal of time and effort into avr-gcc. You might want to talk to them about how to get the best out of avr-gcc - that way you can offer your customers a wider choice. Different tools are better for different users and different projects - your aim is that customers have the best tools for their use, and know how to get the best from those tools, so that they will get the best out of your devices. On the other hand, I fully understand that no one has the time to learn about all the tools available, and you have to concentrate on particular choices. It's fair enough to tell people how wonderful IAR and the AVR go together - but it is not fair enough to tell people that avr-gcc is a poor choice without better technical justification. > > > few people have really studied and compared >> both compilers in a fair and objective test. There is certainly room >> for improvement in avr-gcc - there are people working on it, and it >> gets better over time. >> >> But to say "IAR is simply better" is too sweeping a statement to be >> taken seriously, since "better" means so many different things to >> different people. > > OK, let me rephrase: It generally outputs smaller and faster code. > That is much better - although some day I'd like to hear numbers based on real code examples, generated by someone familiar with both tools. I guess some day I'll need to test out IAR's compiler for myself. But this is certainly an opinion I've heard enough to make it believable. If you have any links that can actually show numbers, I'd appreciate looking at them. 
The only independent comparison I have found is from the www.freertos.org page, and that's badly out of date (the avr-gcc is from 2003, I don't know about IAR). There is no size comparison, but avr-gcc beats IAR on most of the speed tests... >> >>> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >>> >> >> To go back to your original statement, "The GNU toolchain can be OK, >> and it can be horrible", I agree in general - although I'd rate the >> range a bit higher (from "very good" down to "pretty bad", perhaps). >> There have been gcc ports in the past that could rate as "horrible", >> but I don't think that applies to any modern gcc port in serious >> active use. >> >>> > BR > Ulf Samuelsson >
From: Rocky on 26 Sep 2009 09:32 On Sep 26, 2:30 pm, David Brown <da...(a)westcontrol.removethisbit.com> wrote: >Snip interesting stuff< > I don't remember if avr-gcc ever pushed all registers when entering an > interrupt, but if so it was much more than a year ago (I have used it > for over 6 years). Must be a lot of code in that interrupt!
From: ChrisQ on 28 Sep 2009 05:36
Niklas Holsti wrote: > A small addition to my own posting, sorry for omitting it initially: > > Niklas Holsti wrote: > (I elide most of the context): > >> However, sometimes the IAR compiler generates code that adds or >> subtracts a larger number (> 1) to/from Y, and then it must use two >> 8-bit operations, and must disable interrupts just as gcc does. > > Some AVR models do provide instructions (ADIW, SBIW) that can atomically > add/subtract an immediate number (0..63) to/from the 16-bit Y register. > I assume, but haven't checked, that IAR uses these instructions when > possible, rather than two 8-bit operations in an interrupt-disabled region. > A very good explanation and thanks. It's the intricacies of architecture that are sometimes hard to get a big picture of when choosing a processor for a project. I've never used AVR for any project and info like this would tend to keep me in the 8051 world for small logic replacement tasks, no matter how constrained it is. AVR32 looks much better though. In summary then, it looks like the 8-bit AVRs need special compiler support to get the best results, which I wouldn't necessarily expect gcc to provide. I'm quite happy to accept that IAR would produce better code, in much the same way as Keil is arguably the best solution for 8051. Both are 8-bit legacy architectures, designed before the days of general HLL development. I think if I were trying to find a low-end micro now, MSP430 would be the first port of call, as it is a much more compiler-friendly 16-bit architecture. Stuff like this does matter as it can have a significant impact on software development timescales and quality... Regards, Chris
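Finally, a rough sketch of the ADIW/SBIW point in Niklas's addition above, expressed as avr-gcc inline assembly and an avr-libc atomic block rather than actual IAR compiler output; the offsets, the function names and the use of an ordinary variable instead of a dedicated stack register are purely illustrative.

#include <stdint.h>
#include <util/atomic.h>

/* Small adjustment: SBIW changes a 16-bit register pair in a single
   instruction, so no interrupt locking is needed.  The "w" constraint
   limits the operand to the pairs ADIW/SBIW accept (r25:r24, X, Y, Z). */
static inline uint16_t reserve_small(uint16_t y)
{
    __asm__ __volatile__("sbiw %0, 16" : "+w"(y));
    return y;
}

/* Larger adjustment: 200 does not fit SBIW's 0..63 immediate, so the
   subtraction becomes two 8-bit operations (subi/sbci).  An interrupt
   arriving between them would see a half-updated pointer, hence the
   short interrupts-disabled window a compiler prologue would add. */
static inline uint16_t reserve_large(uint16_t y)
{
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        y = (uint16_t)(y - 200);
    }
    return y;
}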