From: Walter Banks on


Hans-Bernhard Bröker wrote:

> David Brown wrote:
>
> > Nothing stops the compiler from doing this sort of thing in /theory/.
> > But /practice/ is a different matter.
>
> The same thing applies to the original question. If a compiler's
> full-blown optimizer, given practically infinite time to ponder the
> problem, can't get that analysis job done, then no other tool can, and
> certainly not while just looking over the programmers' shoulders as they
> type.

The compiler optimizer's primary goal is to map an application
onto a target processor. Most good optimizers have a lot of
information about the application code and the resources required to
implement a specific instance. A possible implementation might
involve trying alternative approaches and comparing them using
compiler metrics. This could be automated.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com

From: Jon Kirwan on
On Fri, 15 Jan 2010 12:19:11 -0800 (PST), -jg wrote:

>On Jan 16, 8:28 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
>>
>> Walter, they don't even do _that_ task well enough.
>>
>> Since this topic appears to suggest the idea of having a
>> compiler do something I consider almost crazy-minded at this
>> point in time.....
>
> Yes, the best reply to the original overall question, is a Pencil !!.

Hehe.  I can conceive (and accept) the idea that once good
_human_ imagination has been applied, these concepts can be
more rigorously explored (along the lines found in Knuth's
early-1970s three-volume set) for their boundary conditions
and application space, then placed into a "dictionary of moves"
(as in some chess programs) that a compiler can use, together
with heuristics to adduce which of the many options to choose
as a "fit."  Yeah, I can imagine the idea.  And perhaps some
of the low-hanging fruit there can be picked, too.  Don't
know.

But cripes!  Some time ago, I engaged in a very interesting
process exploring various compilers' results when compiling a
very simple (maybe 10-12 lines of code?) and well-known
algorithm for calculating the GCD of two integers.  The
creative juices of several people were applied, yielding more
than one excellent topological reformulation that, had a
compiler observed it, would have resulted in the automatic
generation of object code very close to the best a human can
do on their own.  (Once the slight reorganization was found,
merely using traditional optimizations that have been
implemented, not just known about, for two decades and more.)

I have no problem discussing viable ideas about moving well
forward. But a journey of a thousand miles starts with a
single step. I'd just like to see some serious discussion
about some of these earlier steps and some actual
implementation work exhibited.

I'm not speaking from afar, from some distanced or uninvolved
position.  I'm willing to put in my own time here, to put my
money where my mouth is.  I'm NOT a practiced compiler
writer, having only a few experiences of my own (a BASIC
interpreter, with a library, and a toy C compiler without any
of the library system).  But I'd be willing, under the
guidance of those more recently and broadly experienced, to
cooperate and supply time working on some ideas if it would
help here.  I'm seriously interested in meaningful compiler
advances.

There is so much that is yet to do that is right in front of
us, though. I'd like to see success there before wasting so
much breath on what appears right now to be pretty far out
and even then relatively ad hoc besides.

> However, there are some details, aside from the
>'find a new algorithm'(?!) request that tools could be
>expected to offer.
>
> Taking this:
>[" It will be even more helpful if that tool
>also provides the cycle counts, cache
>usage, cache misses and lines of code
>also. "]

Something else I'd like to have in a language above the level
of assembly, since we are chatting, is the ability to specify
that the two code paths on either side of a branch must
execute in the exact same cycle count.  In other words, that
regardless of the condition, the entire block of code,
including the condition test, must have one and only one
execution time.  I have an application for this.

But we may be simply discussing all this by starting from a
bottom up approach, listing various "leaf ideas" we'd like to
see -- hoping perhaps that, by induction, we may find an
enclosing principle from that. I'd like to see a more
theoretical approach taken, where specific approaches are
_deduced_ readily by applying circumstances to theory.

So here and regarding your suggestion about cache usage and
cache misses, I'd refer to this:

http://mitpress.mit.edu/catalog/item/default.asp?tid=5098&ttype=2

A 1985 dissertation. Although it was more focused on DRAM
bank organization, among other things, I suspect that the
theoretical ideas it presents could also be used to deduce
some thoughts addressing a variety of cache optimizations,
taking those specifics into account.

> Lines of code should already be there, in a listing output, and also
>in a debugger.

Yeah. It's been done. It just isn't done enough for
embedded work, I suppose.

> Likewise, on a chip/ICE with HW breakpoints and sys timers, you could
>expect a good system to be able to give time-of-flight on running
>code, if you really need that.
> The OP seems to expect all this in an EDITOR, which is misdirected,
>but it is possible in a full tool chain.

I suppose so.

I was addressing myself to the idea of swapping in a quick
sort for a bubble sort and similar thrusts, though. On that
point, there are some earlier steps to take first.

Jon
From: David Brown on
Andy wrote:
> On Jan 14, 9:57 am, David Brown <da...(a)westcontrol.removethisbit.com>
> wrote:
>> I don't agree here (perhaps as a compiler writer you are thinking of
>> "implementation" in terms of generated target code - then I'd agree).
>> Kids use Logo to learn about programming concepts, and how to get the
>> computer to do what you want it to do. They learn to write things like:
>>
>> TO SQUARE :size
>> REPEAT 4 [ FD :size RT 90 ]
>> END
>>
>> This is entirely about writing an imperative implementation of how you
>> want the system to draw a square.
>>
>> Compare this to a sample program in a real functional programming
>> language, Haskell:
>>
>> factorial 0 = 1
>> factorial n = n * factorial(n - 1)
>>
>> Here you tell the system what you want - you give a mathematical
>> description of the results. You don't care how it gets them - maybe it
>> uses recursion, maybe it uses iteration, maybe it builds up a cache of
>> calculated values as it goes along.
>>
>
> The LOGO interpreter/compiler is just as free to implement alternative
> solutions to drawing a square as the Haskell compiler is of altering
> the described recursive implementation of a factorial. Whether the
> compiler is smart enough to do so has nothing to do with the language
> being "procedural" or "functional".
>

It is certainly true that the compiler is free to implement alternative
solutions in any language. However, the style of language has a great
deal to say in how hard a task this is for the compiler writer. In
particular, with a language that does not have states or allow side
effects in functions, it is much easier for the compiler to have a
complete view of what the function does - and thus it has a better
chance of generating alternative implementations.

You can think of compiling (or interpreting - it's the same thing in
theory) as taking a high level source code down to a low level
implementation in a series of steps. At each step down, the compiler
can pick one of many implementations at the lower level. It can also
re-arrange the code at the same level. Moving up a level from an
implementation is a much harder task, and will rarely lead to a better
result in the end - too much of the "overview" information is already
lost, and there are too many implementation details that cannot be
distinguished from requirements. Thus an ideal compiler has the best
chance of generating the best code if the source is at a higher level,
and if it is written in a language that gives a clearer and more
mathematically precise specification.

Of course, practical reality does not match directly with the theory.
In particular, vastly more man-hours have been spent to write better
compilers for C than for Haskell - the result being that you can compile
C code to tighter target code than you can with Haskell code.
From: Jon Kirwan on
On Sat, 16 Jan 2010 13:01:24 -0800 (PST), -jg
<jim.granville(a)gmail.com> wrote:

>On Jan 16, 11:18 pm, Walter Banks <wal...(a)bytecraft.com> wrote:
>
>> To illustrate this point. Your exact timing example is not a
>> C compiler limitation but a language limitation. How do
>> you describe exact timing all paths in a function in an
>> unambiguous way in C? Exact timing is an application
>> objective.
>
> Something like exact timing would be via local directives, and is
>going to be very locally dependent.

Oh, yes.  On some machines (at one extreme, the modern x86
with branch prediction logic, read-around-write data flows,
reservation stations, all manner of out-of-order execution,
and multi-level caching), it would be a nightmare to achieve.
On others, quite achievable in practice.  The "requirement"
would instead be a "goal" for the compiler language --
something like what "register" is for a C variable, a
suggestion, but perhaps with a little higher priority than
'register' carries today.  #pragma might be appropriate to
bracket the code where all edges should perform in similar
times, though I'm sure there will be problems to work out
there, too.

> It almost comes into "better reporting"

Not just better reporting for the case I'm talking about. If
you can't provide something that actually impacts the code
generation but have very good reporting on what you get out
of it, you will find yourself fighting the darned thing at
every turn of the code trying to work out equal times. It's
far better to provide the clues and hints a compiler can then
use in generating code.

> I remember we added a simple code-size report, for each code block,
>so pgmrs could quickly compare different choices. Trivial to do, and
>useful.
>
> We did think about reporting time, but that quickly degenerates as
>you do not know what path combinations will actually occur, and it can
>vary widely across
>same core:different vendor.
> So our solution here was always to verify on a scope,
>and patch as needed. That covers all complexity levels.
>
>
> On this 'exact timing' topic, I note the new Cortex M0 cores (NXP,
>Nuvoton?) claim a fixed interrupt-response time, so they remove the
>'current opcode' jitter. This was also discussed some years ago as a
>feature option for a faster 8051 that never got to silicon.
> It's a good thing to see devices that are moving 'less deterministic'
>offer a way to be 'more deterministic' :)
>[- and it side-steps all that SW work ]

The ADSP-21xx provides rock-solid timing.  If the only
interrupt source is the timer, since that is in sync with
the clock and doesn't need to be asynchronously latched and
tested, then even the interrupt latency is _exactly_ known.
Very sweet.  I used that to advantage.

Jon

>
>-jg

From: Jon Kirwan on
On Sat, 16 Jan 2010 14:43:11 -0800 (PST), -jg
<jim.granville(a)gmail.com> wrote:

>On Jan 17, 11:01 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
>> On Sat, 16 Jan 2010 13:01:24 -0800 (PST), -jg
>>
>> <jim.granvi...(a)gmail.com> wrote:
>>
>> > It almost comes into "better reporting"
>>
>> Not just better reporting for the case I'm talking about.  If
>> you can't provide something that actually impacts the code
>> generation but have very good reporting on what you get out
>> of it, you will find yourself fighting the darned thing at
>> every turn of the code trying to work out equal times.  It's
>> far better to provide the clues and hints a compiler can then
>> use in generating code.
>
> In the simplest cases, the difference between manually adding NOPs
>and checking, or trawling the manual, setting a directive, and then
>also checking the tool really DID do what you hoped, is negligible.
>
> In the more complex cases, the tools are almost certain to drop the
>ball, so you'll also need to manually check.
>
> So that's why I favour 'better reporting' over 'granular smarts'
>anytime, as the reporting is ALWAYS useful, and the smarts are only
>case-by-case useful and with risky failure modes.
>[aka The illusion of intelligence is dangerous]
>
> The rare times I've needed to do this, it was no sweat to just use a
>pencil, and then do an operational check.

Well, what can I say?  I did pretty much exactly this in
the case where I needed it -- wrote it in assembly and
counted instructions.  However, I acknowledge that a compiler
would do a more reliable job of counting cycles.  This is one
of those good examples where it's better to let the compiler
worry about the bookkeeping issues.

Jon