From: Rob Warnock on 4 Jun 2010 00:31

Rick Jones <rick.jones2(a)hp.com> wrote:
+---------------
| Noob <root(a)127.0.0.1> wrote:
| > There are situations where polling makes sense.
| > Consider a high-throughput NIC constantly interrupting the CPU.
| > Keywords: interrupt mitigation, interrupt coalescing
|
| I would not equate interrupt coalescing with polling, merely with
| being "smarter" about when to generate an interrupt. Alas, some NICs
| and their drivers don't do coalescing in what I would consider a
| "smart" way and so affect unloaded latency at the same time.
+---------------

Hah! Having worked on a design[1] in *1971* [and *many* similar designs
since] which IMHO did interrupt coalescing & holdoff "the right way", I
have found it fascinating in the decades since (and sometimes extremely
frustrating) that hardware designers tend to be (for the most part) *very*
reluctant to do it that way. They always seem to want to put any holdoffs
*before* the interrupts, which seriously hurts latency!

IMHO, "the right way" is this:

0. The hardware (for a device or class of devices) shall contain a
   pre-settable countdown timer that, when running, masks off (a.k.a.
   "holds off") interrupts being produced from an "attention" state
   in the device(s) -- *WITHOUT* preventing the underlying attention
   state from being read!! That is, the interrupt holdoff shall *not*
   prevent software from seeing that the device (still) wants attention.[2]

1. The default state of the system is that the interrupt holdoff
   countdown timer is expired, and device interrupts are enabled.
   Thus, the CPU will be interrupted immediately upon any attention
   condition in the device.

2. [Assumption:] The system will internally disable recursive interrupts
   from a device until the device driver specifically re-enables them
   and/or performs some explicit interrupt "dismiss" operation. If this
   is not the case [as with some CPUs/systems] then a small amount of
   additional hardware/software needs to be wrapped around the device.
   Take it as read that this can be a bit tricky, but is usually cheap.

3. "Coalescing": The device driver interrupt service routine shall
   *continue* to poll for attention status and continue to service the
   device until the attention status goes false. [In fact, in some systems
   it's a good idea if it polls the *first* time into the ISR as well, to
   deal with potential spurious interrupts. But that's another story...]

4. "Dally": For some combinations of device/CPU/system/software/application,
   it's a good idea for the device driver to *continue* polling for
   an additional "dally" time *after* the attention status goes false,
   just in case it comes true again "quickly" (for a system/app-dependent
   definition of "quickly"), and if so, go back to step #3. [The dally
   counter can trivially be merged into the coalescing poll loop, so
   that #3 & #4 are one piece of code. Nevertheless, the dally time
   should be a separate tunable, preferably dynamically.]

5. When the dally time has expired, the driver should write a (tunable)
   holdoff time into the device's countdown timer. [If further interrupts
   had to be explicitly disabled before entering the coalescing loop,
   it will be convenient if this PIO Write *also* re-enables device
   interrupts. Note, however, that no actual CPU interrupt will occur
   until the countdown timer expires.]

6. The device driver now dismisses the interrupt.

7. "Holdoff": The countdown timer ticks away, suppressing any possible
   new device interrupts, until it expires. If a device attention state
   had occurred while the holdoff timer was running, then a new interrupt
   is generated immediately upon timer expiration, and we transition to
   state #2 above. Otherwise, the timer expiration is a silent, unobserved
   event, with no effect other than putting us back in state #1 above (idle).
   [And thus any subsequent device attention state generates a new interrupt
   immediately.]
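[As a concrete illustration, here is a minimal sketch of how steps #3
through #7 might look inside a driver's ISR. The register offsets, the
mmio accessors, and the service_one_event() helper are all hypothetical
stand-ins for whatever a real device and kernel provide:]

    /* Hypothetical sketch of the coalesce/dally/holdoff ISR pattern
     * (steps #3-#7 above). Register offsets, the mmio accessors, and
     * service_one_event() are invented for illustration only. */
    #include <stdint.h>

    #define REG_ATTN_STATUS   0x00  /* nonzero: device wants attention */
    #define REG_HOLDOFF_TIMER 0x04  /* write: load countdown, re-enable irq */

    extern uint32_t mmio_read32(uintptr_t reg);
    extern void     mmio_write32(uintptr_t reg, uint32_t val);
    extern void     service_one_event(void);   /* drain one unit of work */
    extern uint64_t cycles_now(void);

    static uint64_t dally_cycles;    /* tunable: keep small */
    static uint32_t holdoff_ticks;   /* tunable: latency vs. efficiency */

    void device_isr(void)
    {
        /* Step #3 ("coalescing"): poll attention status on entry --
         * which also catches spurious interrupts -- and keep servicing
         * until it goes false. */
        uint64_t deadline = cycles_now() + dally_cycles;

        for (;;) {
            while (mmio_read32(REG_ATTN_STATUS) != 0) {
                service_one_event();
                /* attention went true again: restart the dally window */
                deadline = cycles_now() + dally_cycles;
            }
            /* Step #4 ("dally"): keep polling a little past the last
             * event in case more work arrives "quickly". */
            if (cycles_now() >= deadline)
                break;
        }

        /* Step #5: arm the holdoff countdown. [Per the assumption above,
         * this write also re-enables device interrupts; no CPU interrupt
         * fires until the countdown expires.] */
        mmio_write32(REG_HOLDOFF_TIMER, holdoff_ticks);

        /* Step #6: dismiss the interrupt (the EOI to the interrupt
         * controller happens in the surrounding interrupt glue). */
    }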
The latency on an idle system is as short as possible, yet under heavy
load the efficiency is as high as possible [and latency is dominated by
queueing effects, since the CPU will be saturated]. The "holdoff"
parameter can be tuned for a smooth tradeoff between latency &
efficiency across the load range, and the "dally" parameter can be
tuned to improve both latency and efficiency in the presence of known
bursty traffic patterns[3].

Thus endeth the reading of the lesson. ;-}

-Rob

[1] The DCA "SmartMux" family of terminal networking frontends & nodes.

[2] Yes, one 3rd-party device (which shall not be named) really did
    that: made the attention status invisible if the holdoff was
    running!! *ARGGGH!*

[3] If you know that there is a certain minimal device response time
    to driver actions -- or a certain minimal response on the far end
    of an interconnect you're talking through (hint: MPI) -- then it
    can be helpful to make the "dally" time just larger than this
    minimal response time, to avoid unnecessary "holdoff" of responses
    in a rapid flurry of exchanges. E.g., in one case a certain common
    control command from the driver would, after a short delay, result
    in a flurry of status change events from the device. Adding a
    dally time just longer than that short delay lowered the CPU
    utilization by more than *half*! In most typical applications, the
    optimal dally time will be a small fraction of the holdoff time.
    And in any case, the peaks for "good" values of both parameters
    are rather broad.

-----
Rob Warnock                     <rpw3(a)rpw3.org>
627 26th Avenue                 <URL:http://rpw3.org/>
San Mateo, CA 94403             (650)572-2607
From: Terje Mathisen <"terje.mathisen at tmsw.no"> on 4 Jun 2010 04:33

Rob Warnock wrote:
> Hah! Having worked on a design[1] in *1971* [and *many* similar designs
> since] which IMHO did interrupt coalescing & holdoff "the right way", I
> have found it fascinating in the decades since (and sometimes extremely
> frustrating) that hardware designers tend to be (for the most part) *very*
> reluctant to do it that way. They always seem to want to put any holdoffs
> *before* the interrupts, which seriously hurts latency!
>
> IMHO, "the right way" is this:

What's fun is that I had to independently rediscover most of these in
order to get an early PC to handle fast serial comms:

> 0. The hardware (for a device or class of devices) shall contain a
>    pre-settable countdown timer that, when running, masks off (a.k.a.
>    "holds off") interrupts being produced from an "attention" state
>    in the device(s) -- *WITHOUT* preventing the underlying attention
>    state from being read!! That is, the interrupt holdoff shall *not*
>    prevent software from seeing that the device (still) wants attention.[2]

The 16450 rs232 chip could be programmed to delay interrupts until a
given percentage of the 16-entry FIFO buffer had been consumed, but a
receive irq handler could still see that the buffer was non-empty by
polling the status.

> 1. The default state of the system is that the interrupt holdoff
>    countdown timer is expired, and device interrupts are enabled.
>    Thus, the CPU will be interrupted immediately upon any attention
>    condition in the device.

Yes, that was the default.

> 2. [Assumption:] The system will internally disable recursive interrupts
>    from a device until the device driver specifically re-enables them
>    and/or performs some explicit interrupt "dismiss" operation. If this
>    is not the case [as with some CPUs/systems] then a small amount of
>    additional hardware/software needs to be wrapped around the device.
>    Take it as read that this can be a bit tricky, but is usually cheap.

The IRQ driver could selectively re-enable all other interrupt sources
as soon as possible, then the final IRET instruction would allow all
sources back in. This made recursive IRQs impossible while still
allowing the minimum possible latency for other devices.

> 3. "Coalescing": The device driver interrupt service routine shall
>    *continue* to poll for attention status and continue to service the
>    device until the attention status goes false. [In fact, in some systems
>    it's a good idea if it polls the *first* time into the ISR as well, to
>    deal with potential spurious interrupts. But that's another story...]

This was needed in case the previous polling loop had emptied out a
receive buffer which had been momentarily empty and then re-filled,
thereby causing another interrupt to be queued up.

> 4. "Dally": For some combinations of device/CPU/system/software/application,
>    it's a good idea for the device driver to *continue* polling for
>    an additional "dally" time *after* the attention status goes false,
>    just in case it comes true again "quickly" (for a system/app-dependent
>    definition of "quickly"), and if so, go back to step #3. [The dally
>    counter can trivially be merged into the coalescing poll loop, so
>    that #3 & #4 are one piece of code. Nevertheless, the dally time
>    should be a separate tunable, preferably dynamically.]

For the very fastest speeds (i.e. running 115 kbit/s) this was also
required.
> 5. When the dally time has expired, the driver should write a (tunable)
>    holdoff time into the device's countdown timer. [If further interrupts
>    had to be explicitly disabled before entering the coalescing loop,
>    it will be convenient if this PIO Write *also* re-enables device
>    interrupts. Note, however, that no actual CPU interrupt will occur
>    until the countdown timer expires.]

This I couldn't do.

> 6. The device driver now dismisses the interrupt.

Right.

> 7. "Holdoff": The countdown timer ticks away, suppressing any possible
>    new device interrupts, until it expires. If a device attention state
>    had occurred while the holdoff timer was running, then a new interrupt
>    is generated immediately upon timer expiration, and we transition to
>    state #2 above. Otherwise, the timer expiration is a silent, unobserved
>    event, with no effect other than putting us back in state #1 above (idle).
>    [And thus any subsequent device attention state generates a new interrupt
>    immediately.]

This had to be emulated by the delayed interrupt facility, allowing up
to N bytes to be received back-to-back, but generating an interrupt as
soon as the idle gap after a byte passed some fraction of the minimum
byte time.

I did get my file transfer/sync program to run consistently at full
(115k) speed this way.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
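[Terje's serial-port version of the coalescing loop is easy to picture.
The sketch below assumes the standard 16550-family register layout (the
receive buffer at the base port, the line status register at base+5,
with LSR bit 0 meaning "data ready"); the inb() and enqueue_rx_byte()
helpers and the spin-count dally are hypothetical:]

    /* Sketch of a 16550-style receive ISR: drain the FIFO while data
     * is ready, then dally briefly in case another byte is about to
     * arrive. Port base and register offsets follow the 16550
     * convention; the helpers are assumptions. */
    #include <stdint.h>

    #define UART_BASE 0x3F8           /* COM1 on a classic PC */
    #define UART_RBR  (UART_BASE + 0) /* receive buffer register */
    #define UART_LSR  (UART_BASE + 5) /* line status register */
    #define LSR_DATA_READY 0x01       /* bit 0: a byte is waiting */

    extern uint8_t inb(uint16_t port);
    extern void    enqueue_rx_byte(uint8_t b);

    #define DALLY_SPINS 64            /* tunable, app-dependent */

    void uart_rx_isr(void)
    {
        int dally = DALLY_SPINS;

        while (dally-- > 0) {
            /* Coalesce: drain every byte currently in the FIFO. */
            while (inb(UART_LSR) & LSR_DATA_READY) {
                enqueue_rx_byte(inb(UART_RBR));
                dally = DALLY_SPINS;  /* got data: restart dally window */
            }
        }
        /* The EOI to the 8259 PIC would follow here in the real
         * handler, before the final IRET. */
    }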
From: Rick Jones on 4 Jun 2010 14:09

Rob Warnock <rpw3(a)rpw3.org> wrote:
> 3. "Coalescing": The device driver interrupt service routine shall
>    *continue* to poll for attention status and continue to service
>    the device until the attention status goes false. [In fact, in
>    some systems it's a good idea if it polls the *first* time into
>    the ISR as well, to deal with potential spurious interrupts. But
>    that's another story...]

I trust that is something other than a PIO Read?

> Thus endeth the reading of the lesson. ;-}

Thank you sensei :)

rick jones

> [3] If you know that there is a certain minimal device response time
>     to driver actions -- or a certain minimal response on the far end
>     of an interconnect you're talking through (hint: MPI) -- then it
>     can be helpful to make the "dally" time just larger than this
>     minimal response time, to avoid unnecessary "holdoff" of responses
>     in a rapid flurry of exchanges. E.g., in one case a certain common
>     control command from the driver would, after a short delay, result
>     in a flurry of status change events from the device. Adding a
>     dally time just longer than that short delay lowered the CPU
>     utilization by more than *half*! In most typical applications,
>     the optimal dally time will be a small fraction of the holdoff
>     time. And in any case, the peaks for "good" values of both
>     parameters are rather broad.

Sounds a little like deciding how long to spin on a mutex before going
into the "other guy has it" path :)

rick jones
--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
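[Rick's mutex analogy maps onto the same tuning problem. A minimal
C11 sketch of the spin-then-block pattern, where SPIN_LIMIT plays the
role of the dally tunable and the blocking fallback is abstracted
behind hypothetical futex-style wait/wake helpers:]

    /* Spin "a little" on the assumption the holder releases soon,
     * then fall back to sleeping -- the "other guy has it" path.
     * os_wait()/os_wake() are assumed wrappers (e.g. futex on Linux):
     * os_wait sleeps only if *addr still equals expected. */
    #include <stdatomic.h>

    #define SPIN_LIMIT 200  /* tunable: how long to spin before blocking */

    extern void os_wait(atomic_int *addr, int expected);
    extern void os_wake(atomic_int *addr);

    void lock(atomic_int *m)   /* 0 = free, 1 = held */
    {
        for (;;) {
            for (int i = 0; i < SPIN_LIMIT; i++) {
                int expected = 0;
                if (atomic_compare_exchange_weak(m, &expected, 1))
                    return;            /* got it while spinning */
            }
            os_wait(m, 1);             /* block until a wake, then retry */
        }
    }

    void unlock(atomic_int *m)
    {
        atomic_store(m, 0);
        os_wake(m);
    }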
From: robertwessel2 on 4 Jun 2010 16:46

On Jun 4, 3:33 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> The 16450 rs232 chip could be programmed to delay interrupts until a
> given percentage of the 16-entry FIFO buffer had been consumed, but a
> receive irq handler could still see that the buffer was non-empty by
> polling the status.

To be pedantic, that was the 16550A. The 16450 was basically an 8250
clone, with official support for higher speeds. Of course programming
the 16550 and using the buffer was complicated by a number of bugs,
not least its propensity for the write FIFO to get stuck if you put a
single byte into it at just the wrong time.
From: FredK on 4 Jun 2010 17:48
"Rick Jones" <rick.jones2(a)hp.com> wrote in message news:hubfhf$o27$4(a)usenet01.boi.hp.com... > Rob Warnock <rpw3(a)rpw3.org> wrote: >> 3. "Coalescing": The device driver interrupt service routine shall >> *continue* to poll for attention status and continue to service >> the device until the attention status goes false. [In fact, in >> some systems it's a good idea if it polls the *first* time into >> the ISR as well, to deal with potential spurious interrupts. But >> that's another story...] > > I trust that is something other than a PIO Read? > On a typical PCI device this is a PIO register read. On some devices, the ISR read itself may ack the interrupt, but more often it's a write to the ISR to reset it. Worse, is that it is often followed by another read which both forces the write to be completed before the read (stalling the bus and CPU) before you see the ISR to see if there is another interrupt that can be serviced. The attempt to fix this is changing from pin based interrupts to message based interrupts (MSI) on PCIe. It allows for the possibility of a variety of interrupts from a device that don't (necessarily) need to be acked simply to discover what the interrupt itself was - which causes the bus to come to a screeching halt. Which isn't good if you have multiple slots on a single bus. But while all this is nice, none of this has anything to do with *eliminating* the interrupt mechanism or interrupts. Just ways to mimimize the number and cost of IO interrupts. None of which is particularly new. I wouldn't even call some of the things being described here as "polling". It isn't unsual for a driver concerned about latency to sometimes spin "a little" when it gets an input interrupt on the assumption that things come in bursts. But unless you do it as a hard spin - it's useless. And a hard spin wastes CPU time (of course unless you have CPUs to burn). |