From: -jg on 16 Jan 2010 16:01

On Jan 16, 11:18 pm, Walter Banks <wal...(a)bytecraft.com> wrote:
> To illustrate this point. Your exact timing example is not a
> C compiler limitation but a language limitation. How do
> you describe exact timing all paths in a function in an
> unambiguous way in C? Exact timing is an application
> objective.

Something like exact timing would be via local directives, and is going to be very locally dependent. It almost comes under "better reporting".

I remember we added a simple code-size report for each code block, so programmers could quickly compare different choices. Trivial to do, and useful. We did think about reporting time, but that quickly degenerates, as you do not know what path combinations will actually occur, and it can vary widely across the same core from different vendors. So our solution here was always to verify on a scope, and patch as needed. That covers all complexity levels.

On this 'exact timing' topic, I note the new Cortex M0 cores (NXP, Nuvoton?) claim a fixed interrupt-response time, so they remove the 'current opcode' jitter. This was also discussed some years ago as a feature option for a faster 8051 that never got to silicon. It's a good thing to see devices that are moving 'less deterministic' offer a way to be 'more deterministic' :)
[- and it side-steps all that SW work ]

-jg
From: -jg on 16 Jan 2010 17:43

On Jan 17, 11:01 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
> On Sat, 16 Jan 2010 13:01:24 -0800 (PST), -jg
> <jim.granvi...(a)gmail.com> wrote:
> > It almost comes into "better reporting"
>
> Not just better reporting for the case I'm talking about. If
> you can't provide something that actually impacts the code
> generation but have very good reporting on what you get out
> of it, you will find yourself fighting the darned thing at
> every turn of the code trying to work out equal times. It's
> far better to provide the clues and hints a compiler can then
> use in generating code.

In the simplest cases, the difference between manually adding NOPs and checking, versus trawling the manual, setting a directive, and then also checking that the tool really DID do what you hoped, is negligible. In the more complex cases, the tools are almost certain to drop the ball, so you'll also need to check manually anyway.

So that's why I favour 'better reporting' over 'granular smarts' any time: the reporting is ALWAYS useful, while the smarts are only case-by-case useful, and with risky failure modes. [aka: the illusion of intelligence is dangerous]

The rare times I've needed to do this, it was no sweat to just use a pencil, and then do an operational check.

-jg
From: -jg on 17 Jan 2010 15:15

On Jan 18, 7:52 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
> On Sun, 17 Jan 2010 07:11:53 -0500, Walter Banks
> <wal...(a)bytecraft.com> wrote:
> >3) Pre computing the next result and posting the result when
> > needed. (This is routinely the approach in automotive
> > controllers)

Pre-computing can also mean taking a multi-branch-derived answer, and applying it earliest in an interrupt. So each interrupt samples/decides/calculates/stores, but before it starts the SW branches, it pops out the answer from the last interrupt. So you trade off latency for less jitter. We do exactly this in a rate-multiplier DAC library. This is also very revision safe. [with the M0, this would yield 0 jitter (on one interrupt); in the 8051, if you are in idle, there is also zero jitter]

Another jitter-accumulation-avoidance trick, usable on a small uC with no auto-reload on that timer, is to do the next value calc/reload only on the high byte. That places caveats on your crystal, and a ceiling on INT response times, but it does side-step any SW-added creepage to total times.

'Timer-snap', that Walter mentioned, does not need to wholly consume a timer, just have it running. It is useful when you have too many branches to control... You read the lower-byte cycle value as a starting value, run all your variant branches, and then pad the fastest ones with a timer-derived pause. Timer granularity is usually less of an issue than SW granularity. Getting single-cycle increments in SW usually means multiple paths... and your fix-it SW can consume more than the do-it SW ;)

Not sure how you'd 'compiler automate' this? Perhaps insert a start tag, and a series of stop tags, all in the source, and create/maintain/calibrate a whole series of cycle tables for the cores your compiler supports. There are over a dozen timing choices on 80C51's alone now.
(NOT going to be easy for the compiler to correctly add value-dependent multiple branches, so a pencil is _still_ needed.)

That's a lot of work, for a rarely used feature, that is sure to never quite exactly match what the customer wants to do anyway! ;)

So I still don't think you can replace that pencil, but good reports can make the task easier.

-jg
From: Walter Banks on 17 Jan 2010 17:26

-jg wrote:
> Not sure how you'd 'compiler automate' this?
> Perhaps insert a start tag, and a series of stop tags, all in the
> source, and create/maintain/calibrate a whole series of cycle tables
> for the cores your compiler supports. There are over a dozen timing
> choices on 80C51's alone now.
> (NOT going to be easy for the compiler to correctly add
> value-dependent multiple branches, so a pencil is _still_ needed.)

We have one advantage in our compilers for this, because we normally compile directly to machine code. For processors with deterministic timing, constant timing is possible for the limited set of problems whose timing is deterministic.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com

--- news://freenews.netfront.net/ - complaints: news(a)netfront.net ---
From: Walter Banks on 17 Jan 2010 17:33
-jg wrote:
> > On Sun, 17 Jan 2010 07:11:53 -0500, Walter Banks
> > <wal...(a)bytecraft.com> wrote:
> > >3) Pre computing the next result and posting the result when
> > > needed. (This is routinely the approach in automotive
> > > controllers)
>
> Pre computing can also mean taking a multi-branch-derived answer, and
> applying it earliest in an interrupt.
> So each interrupt sample/decision/calculates/stores, but before it
> starts the SW branches, it pops out the answer from the last
> interrupt.
> So you trade off latency, for less jitter.

In one of the processors, each I/O pin had an associated interrupt, and one of the modes either latched an input bit or transferred a latched bit to an output when the interrupt fired. The interrupt was also synced to a specific clock edge. I have used the same technique of pre-computing bits on processors with no special hardware.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com