interrupting for overflow and loop termination [Computer Architecture]

Prev: Intel x86 memory model question
Next: C++: 64 bit performance vs. 32 bit

From: Terje Mathisen on 10 Sep 2005 06:22

andrewspencers(a)yahoo.com wrote:
> My argument is essentially this:
> Interrupt-triggered events are better than polling-triggered events,
> assuming that the cost of setting up the interrupt is less than the
> total cost of all the polling repetitions.

The cost of setting up _and_ the cost of actually taking it!

> Compare-and-branch inside a loop is a poll of the loop-exiting
> condition.
> Thus for a high-repetition loop, an interrupt-triggered exit is better
> than a compare-and-branch exit.

It is _not_ better:

a) If the loop count is high, the loop branch will always be correctly
predicted, except for the loop exit, right?

b) If the loop count is stored in a fixed register, like ECX for an x86
LOOP instruction, or otherwise designated, then the number of branch
misses should be zero instead of one.

c) The loop count register is often used inside the loop anyway, so you
must use alu resources to update it.

d) How do you propose to generate loops unless you have some kind of (in
your case unconditional) branch at the bottom?

e) Modern cpus are very seldom integer alu resource constrained, so even
when you have no other need for the loop count register, the actual
update is effectively free.

OTOH, ints _can_ be a win in other situations, specifically when they
are used to guard against/handle really exceptional situations, where
the extra overhead of a taken interrupts can be amortized over many runs
when nothing happens.

Noting my argument (e), the overhead of inserting INTO opcodes at
relevant sites in the code can be close to zero, since the cpu can treat
it like a "strongly predicted to fall through" branch, i.e. it doesn't
even need to take up any branch predictor buffer space.

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

From: andrewspencers on 10 Sep 2005 22:10

Terje Mathisen wrote:
> a) If the loop count is high, the loop branch will always be correctly
> predicted, except for the loop exit, right?
Right.

> c) The loop count register is often used inside the loop anyway, so you
> must use alu resources to update it.
Yes, but I wasn't arguing against using a loop count register or
against updating it inside the loop; I was arguing against including
instructions inside the loop which conditionally branch based on the
value in the register. I wanted the processor to watch the register and
trigger an interrupt when it reached some target value, the same way
that the processor watches e.g. the program counter and triggers an
interrupt when it matches the value in a breakpoint register, with zero
impact on program execution speed until the interrupt is triggered.

> d) How do you propose to generate loops unless you have some kind of (in
> your case unconditional) branch at the bottom?
Yes, I proposed using an unconditional branch at the bottom.

> OTOH, ints _can_ be a win in other situations, specifically when they
> are used to guard against/handle really exceptional situations, where
> the extra overhead of a taken interrupts can be amortized over many runs
> when nothing happens.
For example, the really exceptional situation of exiting a
high-repetition loop, where the extra overhead of the interrupt can be
amortized over the many repetitions when the interrupt isn't triggered?
I.e. isn't what you wrote here actually an argument in favor of my
proposal?
But actually now I agree that your argument is correct that my proposal
is useless for normal loops, but can be useful for other situations.
But I think you mischaracterized those other situations; ints aren't
useful just because they're handling really exceptional situations, but
because they're handling _a large amount_ of exceptional situations
inside the loop. Specifically, exiting a high-repeition loop is an
exceptional situation, but the cache, branch prediction, and
speculative execution features combine to make a single test-branch (or
possibly even several of them) inside the loop effectively free, so my
interrupt proposal wouldn't gain anything. But if there are enough
test-branches inside the loop (especially if there are a bunch of inner
loops inside the loop), then the speculative execution circuitry and
possibly even the cache can get overloaded, such that test-branch
instructions are no longer free. In _this_ case, my interrupt proposal
would be useful.
And one such case is a loop (possibly a big loop with a lot of inner
loops) with a large number of integer operations each of which must
test for and handle overflows, which is generally the case with bignum
(infinite precision integer) arithmetic, which is the particular case
which prompted me to start this thread in the first place.
Another case would be a linear search for which the target string is
generally very long and there are a lot of partial matches within the
search space.
Another case, I think (but I'm too tired right now to think clearly
enough to be sure), would be a relational join algorithm.

Is my analysis accurate?

> Noting my argument (e), the overhead of inserting INTO opcodes at
> relevant sites in the code can be close to zero, since the cpu can treat
> it like a "strongly predicted to fall through" branch, i.e. it doesn't
> even need to take up any branch predictor buffer space.
But I'm not proposing using INTO (a soft interrupt); I'm proposing
using a hard interrupt, which works the same way that breakpointing
works.

From: David Hopwood on 11 Sep 2005 12:39

andrewspencers(a)yahoo.com wrote:
> But actually now I agree that your argument is correct that my proposal
> is useless for normal loops, but can be useful for other situations.
> But I think you mischaracterized those other situations; ints aren't
> useful just because they're handling really exceptional situations, but
> because they're handling _a large amount_ of exceptional situations
> inside the loop. Specifically, exiting a high-repetition loop is an
> exceptional situation, but the cache, branch prediction, and
> speculative execution features combine to make a single test-branch (or
> possibly even several of them) inside the loop effectively free, so my
> interrupt proposal wouldn't gain anything. But if there are enough
> test-branches inside the loop (especially if there are a bunch of inner
> loops inside the loop), then the speculative execution circuitry and
> possibly even the cache can get overloaded, such that test-branch
> instructions are no longer free. In _this_ case, my interrupt proposal
> would be useful.

If the speculative execution circuitry is "overloaded", then the same is
likely to be true of the interrupt checking under the same circumstances.
Checking for interrupts (which then need to be delivered precisely) is
not free; in terms of complexity it is similar to speculative execution.
That is, you strongly speculate that the interrupt is not taken on each
instruction, and have to recover if it is. The difference is that you need
dedicated hardware resources to check the interrupt conditions on *every*
instruction, rather than using shared resources to do it only when the
condition is relevant.

(Also, although this would be a fairly minor issue: interrupts used for
this purpose add to the state that has to be saved and restored on context
switches.)

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>

From: andrewspencers on 11 Sep 2005 15:41

David Hopwood wrote:
> If the speculative execution circuitry is "overloaded", then the same is
> likely to be true of the interrupt checking under the same circumstances.
> Checking for interrupts (which then need to be delivered precisely) is
> not free; in terms of complexity it is similar to speculative execution.
> That is, you strongly speculate that the interrupt is not taken on each
> instruction, and have to recover if it is. The difference is that you need
> dedicated hardware resources to check the interrupt conditions on *every*
> instruction, rather than using shared resources to do it only when the
> condition is relevant.
Yes, that makes sense. But then in that case, when you wrote in your
prior message:

> OTOH, ints _can_ be a win in other situations, specifically when they
> are used to guard against/handle really exceptional situations, where
> the extra overhead of a taken interrupts can be amortized over many runs
> when nothing happens.
Which such situations did you have in mind?

From: David Hopwood on 11 Sep 2005 19:14

andrewspencers(a)yahoo.com wrote:
> David Hopwood wrote:
>
>>If the speculative execution circuitry is "overloaded", then the same is
>>likely to be true of the interrupt checking under the same circumstances.
>>Checking for interrupts (which then need to be delivered precisely) is
>>not free; in terms of complexity it is similar to speculative execution.
>>That is, you strongly speculate that the interrupt is not taken on each
>>instruction, and have to recover if it is. The difference is that you need
>>dedicated hardware resources to check the interrupt conditions on *every*
>>instruction, rather than using shared resources to do it only when the
>>condition is relevant.
>
> Yes, that makes sense. But then in that case, when you wrote in your
> prior message:
>
>>OTOH, ints _can_ be a win in other situations, specifically when they
>>are used to guard against/handle really exceptional situations, where
>>the extra overhead of a taken interrupts can be amortized over many runs
>>when nothing happens.
>
> Which such situations did you have in mind?

I didn't write that; Terje Mathisen did.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Prev: Intel x86 memory model question
Next: C++: 64 bit performance vs. 32 bit