From: Jon Kirwan on
On Mon, 18 Jan 2010 09:28:20 +0100, David Brown
<david(a)westcontrol.removethisbit.com> wrote:

>Jon Kirwan wrote:
>> On Sun, 17 Jan 2010 17:26:12 -0500, Walter Banks
>> <walter(a)bytecraft.com> wrote:
>>
>>> -jg wrote:
>>>
>>>> Not sure how you'd 'compiler automate' this ?
>>>> perhaps insert a start tag, and a series of stop tags,
>>>> all in the source, and create/maintain/calibrate a whole series of
>>>> cycle-tables, for the cores your compiler supports. There are over a
>>>> dozen timing choices on 80C51's alone now.
>>>> (NOT going to be easy for the compiler to correctly add value-
>>>> dependant multiple branches, so a pencil is _still_ needed)
>>> We have one advantage in our compilers for this because we
>>> normally compile directly to machine code. For processors with
>>> deterministic timing constant timing is possible for the limited
>>> set of problems whose timing is deterministic.
>>
>> I'd imagine that by deferring some of the work involved into
>> the link process, much can also be done here. I think I read
>> recently here that GNU GCC, v4.5, starts to do more of the
>> significant optimizations in the link phase. But I might
>> have misunderstood what I read.
>
>gcc 4.5 has merged the experimental LTO (link-time optimisation) branch
>of gcc into the mainline. Such optimisations are not about getting
>exact, predictable or consistent timing - they're about getting the fastest
>and/or smallest code. As such, using LTO would probably make it harder
>to get deterministic timing.

I knew that much, or assumed it. I wouldn't expect that, at
all. My point was merely about pointing up an alternative to
having the compiler itself "compile directly to machine
code," as Walter mentioned.

Separate compilation limits the viewpoint of a compiler. The
linker has a broader view, since it must combine the results of
the compilation units. Some optimization is appropriate for the
compiler, some for the link phase. I think it's best to
retain the general pattern of compilation and link phases,
and place each optimization in the phase that handles it
better. So I applaud the general idea of LTO.

>The basic idea of LTO is that when the compiler compiles a C (or CPP,
>Ada, whatever) file, it saves a partly digested internal tree to the
>object file as well as the generated object code. When you later link a
>set of object files (or libraries) that have this LTO code, the linker
>passes the LTO code back to the compiler again for final code
>generation. The compiler can then apply cross-module optimisations
>(such as inlining, constant propagation, code merging, etc.) across
>these separately partially-compiled modules.

That's how I took what I'd read earlier, perhaps from you.

>In other words, it is a very flexible form of whole program
>optimisation, since it works with libraries, separately compiled modules
>(no need to have the whole source code on hand), different languages,
>and it can work step-wise for very large programs as well as for small
>programs.
>
>Another feature of gcc 4.5 that is more directly relevant here is that
>you can now specify optimisation options for particular functions
>directly in the source code. Thus you can have your timing-critical
>bit-bang function compiled with little or no optimisation to be sure you
>get the same target code each time, while the rest of the module can be
>highly optimised as the compiler sees fit.

I wonder what Walter is doing to compete with this approach.
In time, it _is_ the right approach to build upon.

The overall idea seems to me to have been the right one three
decades ago, let alone now. That it has seen only fits and
starts over those years has long bothered me. I know that
C++ itself has driven _some_ improvements in the skill of
linkers, but not nearly as much as I might have hoped.

Jon
From: David Brown on
Vladimir Vassilevsky wrote:
>
> Just a couple of things that would be good to have:
>
> 1. A tool which combines all of the C/C++ source code into one temporary
> file prior to compilation, resolving name conflicts automatically. So
> the compiler could optimize across the whole project.
>

With gcc, you can do exactly this with the "-combine -fwhole-program"
flags. The "-combine" flag tells the compiler to compile all the
specified C files at the same time, and take advantage of this knowledge
for cross-module optimisations. The "-fwhole-program" flag tells gcc
that there are no other modules to consider, so it can optimise global
functions and data in much the same way as file static data.
Unfortunately (for C++ fans) the -combine flag only works with C, not
C++. Of course, the up-and-coming version 4.5 will reduce the need for
"-combine", as you can get the same effect with LTO.

It would be nice to have a general, compiler-independent way to generate
such combined files.

> 2. A function attribute with the meaning opposite to "inline", so the
> function with this attribute will never be inlined by the compiler. Why:
> automatic inlining is great, however different functions may need to be
> placed in different memory sections. If the compiler inlines a function
> automatically, then the actual code could go into the wrong section.
>

Again, gcc has an answer - the "noinline" function attribute. I expect
many compilers have something similar, with a function attribute or
pragma having that effect. For other compilers, you can use compiler
limitations to force the effect (for example, perhaps it will not inline
a function containing an assembly statement).
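
For example, in gcc syntax (the function name, the port address, and
the ".ramfunc" section name below are all made up - use whatever your
linker script actually defines):

#include <stdint.h>

/* noinline keeps this a real, out-of-line function; the section
 * attribute then controls where that function's code is placed
 * (".ramfunc" is just an example name for a linker-script section). */
__attribute__((noinline, section(".ramfunc")))
void bitbang_bit(uint8_t level)
{
    volatile uint8_t *port = (volatile uint8_t *)0x1000;  /* hypothetical output port */
    *port = level & 1;
}
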
From: David Brown on
Jon Kirwan wrote:
> On Mon, 18 Jan 2010 09:28:20 +0100, David Brown
> <david(a)westcontrol.removethisbit.com> wrote:
>
>> Jon Kirwan wrote:
>>> On Sun, 17 Jan 2010 17:26:12 -0500, Walter Banks
>>> <walter(a)bytecraft.com> wrote:
>>>
>>>> -jg wrote:
>>>>
>>>>> Not sure how you'd 'compiler automate' this ?
>>>>> perhaps insert a start tag, and a series of stop tags,
>>>>> all in the source, and create/maintain/calibrate a whole series of
>>>>> cycle-tables, for the cores your compiler supports. There are over a
>>>>> dozen timing choices on 80C51's alone now.
>>>>> (NOT going to be easy for the compiler to correctly add value-
>>>>> dependant multiple branches, so a pencil is _still_ needed)
>>>> We have one advantage in our compilers for this because we
>>>> normally compile directly to machine code. For processors with
>>>> deterministic timing constant timing is possible for the limited
>>>> set of problems whose timing is deterministic.
>>> I'd imagine that by deferring some of the work involved into
>>> the link process, much can also be done here. I think I read
>>> recently here that GNU GCC, v4.5, starts to do more of the
>>> significant optimizations in the link phase. But I might
>>> have misunderstood what I read.
>> gcc 4.5 has merged the experimental LTO (link-time optimisation) branch
>> of gcc into the mainline. Such optimisations are not about getting
>> exact, predictable or consistent timing - they're about getting the fastest
>> and/or smallest code. As such, using LTO would probably make it harder
>> to get deterministic timing.
>
> I knew that much, or assumed it. I wouldn't expect that, at
> all. My point was merely about pointing up an alternative to
> having the compiler itself "compile directly to machine
> code," as Walter mentioned.
>
> Separate compilation limits the viewpoint of a compiler. The
> linker has a broader view, since it must combine the results of
> the compilation units. Some optimization is appropriate for the
> compiler, some for the link phase. I think it's best to
> retain the general pattern of compilation and link phases,
> and place each optimization in the phase that handles it
> better. So I applaud the general idea of LTO.
>

Although in a sense the LTO is part of the linker process, it is handled
by the compiler and linker together. The linker collects together the
LTO objects (from *.o files, libraries, etc.) and any required non-LTO
objects from traditional object files and libraries. It handles some of
the combining process, symbol resolution, and finding missing symbols in
libraries. The collected LTO code is then passed back to the compiler
to generate pure target object code. The linker takes this result, and
completes the process by linking it to any non-LTO sections it needs.
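
From the user's side it is still just compile-then-link, with -flto
added at each step. Roughly (file names invented):

   gcc -O2 -flto -c uart.c
   gcc -O2 -flto -c main.c
   gcc -O2 -flto uart.o main.o -o prog

The first two steps write the intermediate representation into the .o
files alongside the normal code; the last step is where the
cross-module optimisation and the final code generation actually
happen.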

>> The basic idea of LTO is that when the compiler compiles a C (or CPP,
>> Ada, whatever) file, it saves a partly digested internal tree to the
>> object file as well as the generated object code. When you later link a
>> set of object files (or libraries) that have this LTO code, the linker
>> passes the LTO code back to the compiler again for final code
>> generation. The compiler can then apply cross-module optimisations
>> (such as inlining, constant propagation, code merging, etc.) across
>> these separately partially-compiled modules.
>
> That's how I took what I'd read earlier, perhaps from you.
>

Could be - I've talked about it before. You can read some more at
<http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fwhole_002dprogram-795>

>> In other words, it is a very flexible form of whole program
>> optimisation, since it works with libraries, separately compiled modules
>> (no need to have the whole source code on hand), different languages,
>> and it can work step-wise for very large programs as well as for small
>> programs.
>>
>> Another feature of gcc 4.5 that is more directly relevant here is that
>> you can now specify optimisation options for particular functions
>> directly in the source code. Thus you can have your timing-critical
>> bit-bang function compiled with little or no optimisation to be sure you
>> get the same target code each time, while the rest of the module can be
>> highly optimised as the compiler sees fit.
>

On further reading, I note that function-specific optimisation
attributes are in gcc 4.4.
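
As a sketch of the syntax (the function and its delay loop are
invented, but the attribute itself is gcc's):

/* Keep just this function at -O0 so its generated instruction
 * sequence stays the same from build to build; the rest of the
 * file is still compiled at whatever -O level is given. */
__attribute__((optimize("O0")))
void short_delay(void)
{
    for (volatile int i = 0; i < 100; i++)
        ;                     /* crude software delay; the count is arbitrary */
}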

> I wonder what Walter is doing to compete with this approach.
> In time, it _is_ the right approach to build upon.
>

One point that must be remembered about gcc is that it is targeted at a
wide range of processors, multiple languages, and a huge range of
software. Walter's compilers are much more dedicated, and are mostly
for smaller systems. It is perfectly reasonable for Walter's compilers
to be based on requiring that all the source code be available at the
point of compilation, and that all the code be compiled anew each time -
the compiler then has the best possible view of the problem. Who cares
if the compiler takes 10 seconds to run instead of 5 seconds?

But with gcc, you want to compile systems that take /hours/ to compile
on top-grade systems, making full use of multiple processors or even
multiple machines, and with source code scattered over thousands of
directories (as well as being able to compile code for an 8K AVR). That
is why the LTO approach is the right approach for gcc, but need not be
the best (and certainly not the only) approach for dedicated embedded
compilers.

If you are curious about LTO for big programs, look up the "-fwhopr"
flag in the link above - this is work done by Google aimed at doing
efficient LTO on huge programs.

> The overall idea seems to me to have been the right one three
> decades ago, let alone now. That it has seen only fits and
> starts over those years has long bothered me. I know that
> C++ itself has driven _some_ improvements in the skill of
> linkers, but not nearly as much as I might have hoped.
>
> Jon
From: Walter Banks on


Jon Kirwan wrote:

> >Another feature of gcc 4.5 that is more directly relevant here is that
> >you can now specify optimisation options for particular functions
> >directly in the source code. Thus you can have your timing-critical
> >bit-bang function compiled with little or no optimisation to be sure you
> >get the same target code each time, while the rest of the module can be
> >highly optimised as the compiler sees fit.
>
> I wonder what Walter is doing to compete with this approach.
> In time, it _is_ the right approach to build upon.

Why? Not because I don't understand the comment, and not to be
argumentative. Why is user-controlled optimization at the function
or line level the right approach for a language translator?

> >you can have your timing-critical bit-bang function compiled
> >with little or no optimisation to be sure you get the same target
> >code each time.

Most of the more recent processors have instruction sets that make
exact execution timing difficult, in some cases even for simple code.

This is why we moved away from that approach, toward precomputing
output values and syncing with a timer or free-running clock.
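
As a rough sketch of what I mean (the register addresses and the
framing are invented; a real part would use its own timer and port
registers):

#include <stdint.h>

/* Hypothetical memory-mapped registers, for illustration only. */
#define TIMER_FLAG  (*(volatile uint8_t *)0x2000)   /* set by hardware each bit time */
#define OUT_PORT    (*(volatile uint8_t *)0x2001)   /* one output bit */

#define NBITS 10

void send_byte(uint8_t b)
{
    uint8_t bits[NBITS];
    int i;

    /* Precompute the whole output sequence up front:
       start bit, 8 data bits (LSB first), stop bit. */
    bits[0] = 0;
    for (i = 0; i < 8; i++)
        bits[i + 1] = (uint8_t)((b >> i) & 1);
    bits[9] = 1;

    /* Output one precomputed value per timer tick, so the timer -
       not the instruction-by-instruction timing - sets the bit rate. */
    for (i = 0; i < NBITS; i++) {
        while (!TIMER_FLAG)
            ;                   /* wait for the next tick */
        TIMER_FLAG = 0;         /* clear the flag (hypothetical behaviour) */
        OUT_PORT = bits[i];
    }
}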

Many communication protocols, especially at the application level,
interleave clock and data to make them more reliable in the presence
of jitter (I2C and SPI, for example).

PWMs can be made jitter-resistant by comparing a desired value to a
random number and outputting the result of the less-than comparison
as often as possible.
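
Roughly like this, as a sketch (the output register and the little
LFSR standing in for a random source are placeholders):

#include <stdint.h>

#define PWM_OUT  (*(volatile uint8_t *)0x2002)   /* hypothetical 1-bit output */

/* 16-bit Galois LFSR as a stand-in pseudo-random source. */
static uint16_t lfsr = 0xACE1u;

static uint16_t prng(void)
{
    lfsr = (uint16_t)((lfsr >> 1) ^ (-(lfsr & 1u) & 0xB400u));
    return lfsr;
}

/* Call as often as possible: the output is 1 whenever the random
   number falls below the desired level, so the long-run average duty
   cycle is roughly level/65536 regardless of exactly when each call
   happens to run. */
void pwm_step(uint16_t level)
{
    PWM_OUT = (prng() < level) ? 1u : 0u;
}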



Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com

From: Walter Banks on


David Brown wrote:

> One point that must be remembered about gcc is that it is targeted at a
> wide range of processors, multiple languages, and a huge range of
> software. Walter's compilers are much more dedicated, and are mostly
> for smaller systems. It is perfectly reasonable for Walter's compilers
> to be based on requiring that all the source code be available at the
> point of compilation, and that all the code be compiled anew each time -
> the compiler then has the best possible view of the problem.

To clear up a misconception: we can compile with all the source code
available, but it doesn't need to be. We can also compile each module
to an object file and then link. In all cases the code generation has
a full application view.

We have written 28 compilers, some for small systems and some for
large systems. The processors we have targeted have varied widely,
with natural data sizes of 8, 9, 12, 13, 16, 24, and 32 bits. We have
targeted both heterogeneous and homogeneous multiprocessor execution
platforms.

Our real skill is compiling for processors with unusual architectures
or with limited resources.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com