From: Jon Kirwan on 18 Jan 2010 14:51

On Mon, 18 Jan 2010 09:28:20 +0100, David Brown
<david(a)westcontrol.removethisbit.com> wrote:

> Jon Kirwan wrote:
>> On Sun, 17 Jan 2010 17:26:12 -0500, Walter Banks
>> <walter(a)bytecraft.com> wrote:
>>
>>> -jg wrote:
>>>
>>>> Not sure how you'd 'compiler automate' this?
>>>> Perhaps insert a start tag, and a series of stop tags, all in the
>>>> source, and create/maintain/calibrate a whole series of
>>>> cycle-tables, for the cores your compiler supports. There are over
>>>> a dozen timing choices on 80C51's alone now.
>>>> (NOT going to be easy for the compiler to correctly add
>>>> value-dependent multiple branches, so a pencil is _still_ needed.)
>>>
>>> We have one advantage in our compilers for this because we normally
>>> compile directly to machine code. For processors with deterministic
>>> timing, constant timing is possible for the limited set of problems
>>> whose timing is deterministic.
>>
>> I'd imagine that by deferring some of the work involved into the
>> link process, much can also be done here. I think I read recently
>> here that GNU GCC, v4.5, starts to do more of the significant
>> optimizations in the link phase. But I might have misunderstood
>> what I read.
>
> gcc 4.5 has merged the experimental LTO (link-time optimisation)
> branch of gcc into the mainline. Such optimisations are not about
> getting exact, predictable or consistent timing - they are about
> getting the fastest and/or smallest code. As such, using LTO would
> probably make it harder to get deterministic timing.

I knew that much, or assumed it. I wouldn't expect that, at all. My
point was merely about pointing out an alternative to having the
compiler itself "compile directly to machine code," as Walter
mentioned.

Separate compilation limits the viewpoint of a compiler. The linker
has a broader view, since it must combine the results of compilation
units. Some optimization is appropriate for the compiler, some for the
linker phase. I think it's best to retain the general pattern of
compilation and link phases, and place appropriate optimizations where
they are addressed better. So I applaud the general idea of LTO.

> The basic idea of LTO is that when the compiler compiles a C (or C++,
> Ada, whatever) file, it saves a partly digested internal tree to the
> object file as well as the generated object code. When you later
> link a set of object files (or libraries) that have this LTO code,
> the linker passes the LTO code back to the compiler again for final
> code generation. The compiler can then apply cross-module
> optimisations (such as inlining, constant propagation, code merging,
> etc.) across these separately partially-compiled modules.

That's how I took what I'd read earlier, perhaps from you.

> In other words, it is a very flexible form of whole program
> optimisation, since it works with libraries, separately compiled
> modules (no need to have the whole source code on hand), different
> languages, and it can work step-wise for very large programs as well
> as for small programs.
>
> Another feature of gcc 4.5 that is more directly relevant here is
> that you can now specify optimisation options for particular
> functions directly in the source code. Thus you can have your
> timing-critical bit-bang function compiled with little or no
> optimisation to be sure you get the same target code each time,
> while the rest of the module can be highly optimised as the compiler
> sees fit.

I wonder what Walter is doing to compete with this approach.
In time, it _is_ the right approach to build upon. The overall idea
seems to me to have been the right one three decades ago, let alone
now. That it has only seen fits and starts over those years has long
bothered me. I know that C++ itself has driven _some_ improvements in
the skill of linkers, but not nearly as much as I might have hoped.

Jon
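[As background to the LTO mechanism discussed above, a minimal sketch
of how link-time optimisation is typically enabled with gcc 4.5; the
file names and functions are made up for illustration:]

    /* scale.c */
    int scale(int x)
    {
        return 2 * x;
    }

    /* main.c -- calls a helper defined in another translation unit */
    extern int scale(int x);

    int main(void)
    {
        /* With -flto the call below can be inlined and folded across
         * file boundaries at link time, which a per-file compile
         * cannot do on its own. Built with, for example:
         *   gcc -O2 -flto -c scale.c main.c
         *   gcc -O2 -flto scale.o main.o -o prog
         */
        return scale(21);
    }

Each object file then carries the compiler's intermediate
representation alongside ordinary code, and the link step hands that
representation back to the compiler for final code generation.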
From: David Brown on 18 Jan 2010 15:12

Vladimir Vassilevsky wrote:
>
> Just a couple of things that would be good to have:
>
> 1. A tool which combines all of the C/C++ source code into one
> temporary file prior to compilation, resolving name conflicts
> automatically. So the compiler could optimize through the whole
> project.
>

With gcc, you can do exactly this with the "-combine -fwhole-program"
flags. The "-combine" flag tells the compiler to compile all the
specified C files at the same time, and take advantage of this
knowledge for cross-module optimisations. The "-fwhole-program" flag
tells gcc that there are no other modules to consider, so it can
optimise global functions and data in much the same way as file static
data. Unfortunately (for C++ fans) the "-combine" flag only works with
C, not C++.

Of course, the up-and-coming version 4.5 will reduce the need for
"-combine", as you can get the same effect with LTO.

It would be nice to have a general, compiler-independent way to
generate such combined files.

> 2. A function attribute with the meaning opposite to "inline". So the
> function with this attribute will never be inlined by the compiler.
> Why: automatic inlining is great, however different functions may
> need to be placed in different memory sections. If the compiler
> inlines a function automatically, then the actual code could go into
> the wrong section.
>

Again, gcc has an answer - the "noinline" function attribute. I expect
many compilers have something similar, with a function attribute or
pragma having that effect. For other compilers, you can use compiler
limitations to force the effect (for example, perhaps the compiler
will not inline a function containing an assembly statement).
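[For illustration, a small sketch of the gcc attributes mentioned
above; the function, section name and port register address are
hypothetical:]

    /* Keep this routine out of line and pin it into a named section
     * so the linker script can place it explicitly (for example in
     * RAM, or away from code that is aggressively optimised). */
    #define PORTB (*(volatile unsigned char *)0x38)  /* made-up address */

    __attribute__((noinline, section(".timing_code")))
    void bitbang_byte(unsigned char b)
    {
        unsigned char i;
        for (i = 0; i < 8; i++) {
            if (b & (1u << i))
                PORTB |= 0x01;      /* drive the data pin high */
            else
                PORTB &= 0xFEu;     /* or low, one bit per pass */
        }
    }

Because the function is never inlined, its code stays in the
".timing_code" section regardless of how callers are optimised.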
From: David Brown on 18 Jan 2010 15:27

Jon Kirwan wrote:
> On Mon, 18 Jan 2010 09:28:20 +0100, David Brown
> <david(a)westcontrol.removethisbit.com> wrote:
>
>> Jon Kirwan wrote:
>>> On Sun, 17 Jan 2010 17:26:12 -0500, Walter Banks
>>> <walter(a)bytecraft.com> wrote:
>>>
>>>> -jg wrote:
>>>>
>>>>> Not sure how you'd 'compiler automate' this?
>>>>> Perhaps insert a start tag, and a series of stop tags, all in the
>>>>> source, and create/maintain/calibrate a whole series of
>>>>> cycle-tables, for the cores your compiler supports. There are
>>>>> over a dozen timing choices on 80C51's alone now.
>>>>> (NOT going to be easy for the compiler to correctly add
>>>>> value-dependent multiple branches, so a pencil is _still_
>>>>> needed.)
>>>>
>>>> We have one advantage in our compilers for this because we
>>>> normally compile directly to machine code. For processors with
>>>> deterministic timing, constant timing is possible for the limited
>>>> set of problems whose timing is deterministic.
>>>
>>> I'd imagine that by deferring some of the work involved into the
>>> link process, much can also be done here. I think I read recently
>>> here that GNU GCC, v4.5, starts to do more of the significant
>>> optimizations in the link phase. But I might have misunderstood
>>> what I read.
>>
>> gcc 4.5 has merged the experimental LTO (link-time optimisation)
>> branch of gcc into the mainline. Such optimisations are not about
>> getting exact, predictable or consistent timing - they are about
>> getting the fastest and/or smallest code. As such, using LTO would
>> probably make it harder to get deterministic timing.
>
> I knew that much, or assumed it. I wouldn't expect that, at all. My
> point was merely about pointing out an alternative to having the
> compiler itself "compile directly to machine code," as Walter
> mentioned.
>
> Separate compilation limits the viewpoint of a compiler. The linker
> has a broader view, since it must combine the results of compilation
> units. Some optimization is appropriate for the compiler, some for
> the linker phase. I think it's best to retain the general pattern of
> compilation and link phases, and place appropriate optimizations
> where they are addressed better. So I applaud the general idea of
> LTO.
>

Although in a sense the LTO is part of the linker process, it is
handled by the compiler and linker together. The linker collects
together the LTO objects (from *.o files, libraries, etc.) and any
required non-LTO objects from traditional object files and libraries.
It handles some of the combining process, symbol resolution, and
finding missing symbols in libraries. The collected LTO code is then
passed back to the compiler to generate pure target object code. The
linker takes this result, and completes the process by linking it to
any non-LTO sections it needs.

>> The basic idea of LTO is that when the compiler compiles a C (or
>> C++, Ada, whatever) file, it saves a partly digested internal tree
>> to the object file as well as the generated object code. When you
>> later link a set of object files (or libraries) that have this LTO
>> code, the linker passes the LTO code back to the compiler again for
>> final code generation. The compiler can then apply cross-module
>> optimisations (such as inlining, constant propagation, code merging,
>> etc.) across these separately partially-compiled modules.
>
> That's how I took what I'd read earlier, perhaps from you.
>

Could be - I've talked about it before.
You can read some more at
<http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fwhole_002dprogram-795>

>> In other words, it is a very flexible form of whole program
>> optimisation, since it works with libraries, separately compiled
>> modules (no need to have the whole source code on hand), different
>> languages, and it can work step-wise for very large programs as
>> well as for small programs.
>>
>> Another feature of gcc 4.5 that is more directly relevant here is
>> that you can now specify optimisation options for particular
>> functions directly in the source code. Thus you can have your
>> timing-critical bit-bang function compiled with little or no
>> optimisation to be sure you get the same target code each time,
>> while the rest of the module can be highly optimised as the
>> compiler sees fit.
>

On further reading, I note that function-specific optimisation
attributes are already in gcc 4.4.

> I wonder what Walter is doing to compete with this approach.
> In time, it _is_ the right approach to build upon.
>

One point that must be remembered about gcc is that it is targeted at
a wide range of processors, multiple languages, and a huge range of
software. Walter's compilers are much more dedicated, and are mostly
for smaller systems. It is perfectly reasonable for Walter's compilers
to be based on requiring all the source code to be available at the
point of compilation, and on all the code being compiled anew each
time - the compiler then has the best possible view of the problem.
Who cares if the compiler takes 10 seconds to run instead of 5
seconds? But with gcc, you want to compile systems that take /hours/
to compile on top-grade systems, making full use of multiple
processors or even multiple machines, and with source code scattered
over thousands of directories (as well as being able to compile code
for an 8K AVR). That is why the LTO approach is the right approach for
gcc, but need not be the best (and certainly not the only) approach
for dedicated embedded compilers.

If you are curious about LTO for big programs, look up the "-fwhopr"
flag in the link above - this is work done by Google aimed at doing
efficient LTO on huge programs.

> The overall idea seems to me to have been the right one three decades
> ago, let alone now. That it has only seen fits and starts over those
> years has long bothered me. I know that C++ itself has driven _some_
> improvements in the skill of linkers, but not nearly as much as I
> might have hoped.
>
> Jon
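[A short sketch of the per-function optimisation attribute described
above (gcc 4.4 and later); the routine and the port access are made up
for illustration:]

    /* The rest of this file can be built at -O2 or higher; this one
     * function is forced down to -O0 so its generated instruction
     * sequence stays the same from build to build. */
    __attribute__((optimize("O0")))
    void bitbang_pulse(volatile unsigned char *port)
    {
        *port |= 0x01;     /* raise the data pin  */
        *port &= 0xFEu;    /* and lower it again  */
    }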
From: Walter Banks on 18 Jan 2010 15:49

Jon Kirwan wrote:
>> Another feature of gcc 4.5 that is more directly relevant here is
>> that you can now specify optimisation options for particular
>> functions directly in the source code. Thus you can have your
>> timing-critical bit-bang function compiled with little or no
>> optimisation to be sure you get the same target code each time,
>> while the rest of the module can be highly optimised as the
>> compiler sees fit.
>
> I wonder what Walter is doing to compete with this approach.
> In time, it _is_ the right approach to build upon.

Why? Not because I don't understand the comment or am being
argumentative. Why is user-controlled optimization at the function or
line level the right approach for a language translator?

>> you can have your timing-critical bit-bang function compiled with
>> little or no optimisation to be sure you get the same target code
>> each time.

Most of the more recent processors have instruction sets that make
exact execution timing in code difficult in some cases, even simple
ones. This is why we moved away from this approach to precomputing
output values and syncing with a timer or free-running clock. Many
communication protocols, especially at the application level,
interleave clock and data to make them more reliable in the presence
of jitter (I2C and SPI, for example). PWMs can be made jitter-resistant
by comparing a desired value to a random number and outputting the
logical "less than" as often as possible.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
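[A rough sketch of the random-comparison PWM technique mentioned
above; the LFSR and the pin-write callback are made up for
illustration. Each tick the desired duty value is compared against a
fresh pseudo-random number, so over many ticks the pin is high roughly
duty/256 of the time, and the dithering keeps that average insensitive
to jitter in exactly when each tick lands:]

    static unsigned lfsr = 0xACE1u;       /* any non-zero seed works */

    static unsigned char prand8(void)     /* 8-bit pseudo-random value */
    {
        /* one step of a 16-bit Galois LFSR */
        lfsr = (lfsr >> 1) ^ (-(lfsr & 1u) & 0xB400u);
        return (unsigned char)lfsr;
    }

    /* Call once per PWM tick; duty is 0..255. */
    void pwm_tick(unsigned char duty, void (*write_pin)(int level))
    {
        write_pin(prand8() < duty);       /* the "less than" output */
    }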
From: Walter Banks on 18 Jan 2010 16:12
David Brown wrote:
> One point that must be remembered about gcc is that it is targeted at
> a wide range of processors, multiple languages, and a huge range of
> software. Walter's compilers are much more dedicated, and are mostly
> for smaller systems. It is perfectly reasonable for Walter's
> compilers to be based on requiring all the source code to be
> available at the point of compilation, and on all the code being
> compiled anew each time - the compiler then has the best possible
> view of the problem.

To clear up a misconception: we can compile with all source code
available, but it doesn't need to be. We can also compile each module
to an object file and then link. In all cases the code generation has
a full application view.

We have written 28 compilers, some for small systems and some for
large systems. The processors we have targeted have varied widely,
with natural data sizes of 8, 9, 12, 13, 16, 24 and 32 bits. We have
targeted both heterogeneous and homogeneous multiprocessor execution
platforms. Our real skill is compiling for processors with unusual
architectures or processors with limited resources.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com