From: Terje Mathisen on 10 Sep 2005 06:22 andrewspencers(a)yahoo.com wrote: > My argument is essentially this: > Interrupt-triggered events are better than polling-triggered events, > assuming that the cost of setting up the interrupt is less than the > total cost of all the polling repetitions. The cost of setting up _and_ the cost of actually taking it! > Compare-and-branch inside a loop is a poll of the loop-exiting > condition. > Thus for a high-repetition loop, an interrupt-triggered exit is better > than a compare-and-branch exit. It is _not_ better: a) If the loop count is high, the loop branch will always be correctly predicted, except for the loop exit, right? b) If the loop count is stored in a fixed register, like ECX for an x86 LOOP instruction, or otherwise designated, then the number of branch misses should be zero instead of one. c) The loop count register is often used inside the loop anyway, so you must use alu resources to update it. d) How do you propose to generate loops unless you have some kind of (in your case unconditional) branch at the bottom? e) Modern cpus are very seldom integer alu resource constrained, so even when you have no other need for the loop count register, the actual update is effectively free. OTOH, ints _can_ be a win in other situations, specifically when they are used to guard against/handle really exceptional situations, where the extra overhead of a taken interrupts can be amortized over many runs when nothing happens. Noting my argument (e), the overhead of inserting INTO opcodes at relevant sites in the code can be close to zero, since the cpu can treat it like a "strongly predicted to fall through" branch, i.e. it doesn't even need to take up any branch predictor buffer space. Terje -- - <Terje.Mathisen(a)hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
From: andrewspencers on 10 Sep 2005 22:10 Terje Mathisen wrote: > a) If the loop count is high, the loop branch will always be correctly > predicted, except for the loop exit, right? Right. > c) The loop count register is often used inside the loop anyway, so you > must use alu resources to update it. Yes, but I wasn't arguing against using a loop count register or against updating it inside the loop; I was arguing against including instructions inside the loop which conditionally branch based on the value in the register. I wanted the processor to watch the register and trigger an interrupt when it reached some target value, the same way that the processor watches e.g. the program counter and triggers an interrupt when it matches the value in a breakpoint register, with zero impact on program execution speed until the interrupt is triggered. > d) How do you propose to generate loops unless you have some kind of (in > your case unconditional) branch at the bottom? Yes, I proposed using an unconditional branch at the bottom. > OTOH, ints _can_ be a win in other situations, specifically when they > are used to guard against/handle really exceptional situations, where > the extra overhead of a taken interrupts can be amortized over many runs > when nothing happens. For example, the really exceptional situation of exiting a high-repetition loop, where the extra overhead of the interrupt can be amortized over the many repetitions when the interrupt isn't triggered? I.e. isn't what you wrote here actually an argument in favor of my proposal? But actually now I agree that your argument is correct that my proposal is useless for normal loops, but can be useful for other situations. But I think you mischaracterized those other situations; ints aren't useful just because they're handling really exceptional situations, but because they're handling _a large amount_ of exceptional situations inside the loop. Specifically, exiting a high-repeition loop is an exceptional situation, but the cache, branch prediction, and speculative execution features combine to make a single test-branch (or possibly even several of them) inside the loop effectively free, so my interrupt proposal wouldn't gain anything. But if there are enough test-branches inside the loop (especially if there are a bunch of inner loops inside the loop), then the speculative execution circuitry and possibly even the cache can get overloaded, such that test-branch instructions are no longer free. In _this_ case, my interrupt proposal would be useful. And one such case is a loop (possibly a big loop with a lot of inner loops) with a large number of integer operations each of which must test for and handle overflows, which is generally the case with bignum (infinite precision integer) arithmetic, which is the particular case which prompted me to start this thread in the first place. Another case would be a linear search for which the target string is generally very long and there are a lot of partial matches within the search space. Another case, I think (but I'm too tired right now to think clearly enough to be sure), would be a relational join algorithm. Is my analysis accurate? > Noting my argument (e), the overhead of inserting INTO opcodes at > relevant sites in the code can be close to zero, since the cpu can treat > it like a "strongly predicted to fall through" branch, i.e. it doesn't > even need to take up any branch predictor buffer space. But I'm not proposing using INTO (a soft interrupt); I'm proposing using a hard interrupt, which works the same way that breakpointing works.
From: David Hopwood on 11 Sep 2005 12:39 andrewspencers(a)yahoo.com wrote: > But actually now I agree that your argument is correct that my proposal > is useless for normal loops, but can be useful for other situations. > But I think you mischaracterized those other situations; ints aren't > useful just because they're handling really exceptional situations, but > because they're handling _a large amount_ of exceptional situations > inside the loop. Specifically, exiting a high-repetition loop is an > exceptional situation, but the cache, branch prediction, and > speculative execution features combine to make a single test-branch (or > possibly even several of them) inside the loop effectively free, so my > interrupt proposal wouldn't gain anything. But if there are enough > test-branches inside the loop (especially if there are a bunch of inner > loops inside the loop), then the speculative execution circuitry and > possibly even the cache can get overloaded, such that test-branch > instructions are no longer free. In _this_ case, my interrupt proposal > would be useful. If the speculative execution circuitry is "overloaded", then the same is likely to be true of the interrupt checking under the same circumstances. Checking for interrupts (which then need to be delivered precisely) is not free; in terms of complexity it is similar to speculative execution. That is, you strongly speculate that the interrupt is not taken on each instruction, and have to recover if it is. The difference is that you need dedicated hardware resources to check the interrupt conditions on *every* instruction, rather than using shared resources to do it only when the condition is relevant. (Also, although this would be a fairly minor issue: interrupts used for this purpose add to the state that has to be saved and restored on context switches.) -- David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>
From: andrewspencers on 11 Sep 2005 15:41 David Hopwood wrote: > If the speculative execution circuitry is "overloaded", then the same is > likely to be true of the interrupt checking under the same circumstances. > Checking for interrupts (which then need to be delivered precisely) is > not free; in terms of complexity it is similar to speculative execution. > That is, you strongly speculate that the interrupt is not taken on each > instruction, and have to recover if it is. The difference is that you need > dedicated hardware resources to check the interrupt conditions on *every* > instruction, rather than using shared resources to do it only when the > condition is relevant. Yes, that makes sense. But then in that case, when you wrote in your prior message: > OTOH, ints _can_ be a win in other situations, specifically when they > are used to guard against/handle really exceptional situations, where > the extra overhead of a taken interrupts can be amortized over many runs > when nothing happens. Which such situations did you have in mind?
From: David Hopwood on 11 Sep 2005 19:14
andrewspencers(a)yahoo.com wrote: > David Hopwood wrote: > >>If the speculative execution circuitry is "overloaded", then the same is >>likely to be true of the interrupt checking under the same circumstances. >>Checking for interrupts (which then need to be delivered precisely) is >>not free; in terms of complexity it is similar to speculative execution. >>That is, you strongly speculate that the interrupt is not taken on each >>instruction, and have to recover if it is. The difference is that you need >>dedicated hardware resources to check the interrupt conditions on *every* >>instruction, rather than using shared resources to do it only when the >>condition is relevant. > > Yes, that makes sense. But then in that case, when you wrote in your > prior message: > >>OTOH, ints _can_ be a win in other situations, specifically when they >>are used to guard against/handle really exceptional situations, where >>the extra overhead of a taken interrupts can be amortized over many runs >>when nothing happens. > > Which such situations did you have in mind? I didn't write that; Terje Mathisen did. -- David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk> |