From: Weng Tianxiang on 11 Feb 2010 15:05 Hi, I finally understand why a flip-flop can be replaced by a latch. Here is an excerpt from the paper "Atom Processor Core Made FPGA Synthesizable": "Optimized for a frequency range from 800MHz to 1.86GHz, the original Atom design makes extensive use of latches to support time borrowing along the critical timing paths. With level-sensitive latches, a signal may have a delay larger than the clock period and may flush through the latches without causing incorrect data propagation, whereas the delay of a signal in designs with edge-triggered flip-flops must be smaller than the clock period to ensure the correctness of data propagation across flip-flop stages [3]. It is well known that the static timing analysis of latch-based pipeline designs with level-sensitive latches is challenging due to two salient characteristics of time borrowing [2, 3, 14]: (1) a delay in one pipeline stage depends on the delays in the previous pipeline stage. (2) in a pipeline design, not only do the longest and shortest delays from a primary input to a primary output need to be propagated through the pipeline stages, but also the critical probabilities that the delays on latches violate setup-time and hold-time constraints. Such high dependency across the pipeline stages makes it very difficult to gauge the impact of correlations among delay random variables, especially the correlations resulting from reconvergent fanouts. Due to this innate difficulty, synthesis tools like DC-FPGA simply do not support latch analysis and synthesis correctly." In short, a pipeline with several FFs can be replaced with a pipeline that has FFs at the two ends and ordinary latches inserted between them to steal timing slack:

FF1 ---> FF2 ---> FF3 ---> FF4
FF1 ------->l2 --------> l3--> FF4

I had seen such circuits before but had not realized the basic reason for them. Thanks to the above paper, I now know that the technique is not new; it originated in the 1980s. Weng
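The time-borrowing idea in the excerpt can be illustrated with a toy max-delay model of the FF1 -> l2 -> l3 -> FF4 pipeline. This is a sketch under illustrative assumptions, not the paper's method: ideal clocks, a launch flop at t = 0, each intermediate latch transparent from its nominal edge k*T until k*T + window, a final flop sampling at n*T, and no setup, hold, or clock-to-Q delays. The function names are hypothetical. It shows a stage that is too slow for a flop pipeline still passing when later stages have slack.

```python
def flop_pipeline_ok(delays, period):
    """Edge-triggered pipeline: every stage's logic delay must fit in one period."""
    return all(d <= period for d in delays)


def latch_pipeline_arrivals(delays, period, window):
    """Propagate worst-case arrival times through FF -> latch ... latch -> FF.

    Hypothetical model: the launch flop fires at t = 0; the element after
    stage k nominally samples at k * period; each intermediate latch is
    transparent from k * period to k * period + window; the last element
    is a flop sampling at len(delays) * period. Setup, hold and
    clock-to-Q/D-to-Q delays are ignored.

    Returns (ok, arrivals), where arrivals[k-1] is the arrival time at the
    element after stage k.
    """
    depart = 0.0
    arrivals = []
    for k, d in enumerate(delays, start=1):
        arrive = depart + d
        arrivals.append(arrive)
        if k == len(delays):
            # Final capture is a flop: no time may be borrowed past its edge.
            return arrive <= k * period, arrivals
        if arrive > k * period + window:
            # Data arrived after the latch closed: borrowing failed.
            return False, arrivals
        # Early data waits for the latch to open; late data flows straight
        # through, borrowing from the next stage's budget.
        depart = max(arrive, k * period)
    return True, arrivals  # empty pipeline


# Stage 1 is 3 units too slow for a 10-unit period, but stages 2 and 3
# have slack, so the latch version still meets timing.
print(flop_pipeline_ok([13, 6, 8], period=10))                  # False
print(latch_pipeline_arrivals([13, 6, 8], period=10, window=5))  # (True, [13.0, 19.0, 28.0])
```

The model checks only max delay. The min-delay (hold, race-through) side that makes real latch timing analysis hard is deliberately left out.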
From: Patrick Maupin on 11 Feb 2010 20:33 Yes, latch-based design is much older than flop-based design, for the simple reason that it can be cheaper. Think about it -- every flop is really two latches! (At least for static designs that can be clocked down to DC...) Where I work (at a chip company), we're still occasionally converting latch-based designs into flop-based ones. But (and this is a big but) FPGAs themselves (not just the design tools) are designed for flop-based design, so if you use latch-based designs with FPGAs you are not only stressing the timing tools, you are also avoiding the nice, packaged, back-to-back dedicated latches they give you called flops. Pat
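The point that every flop is really two latches can be shown with a small behavioral model: a master latch transparent while the clock is low feeds a slave latch transparent while it is high, so the output only changes on the rising edge. The class names and the sample-by-sample simulation are illustrative assumptions; no gate delays are modeled.

```python
class DLatch:
    """Level-sensitive D latch: follows D while enabled, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlop:
    """Rising-edge D flop built from two latches on opposite clock levels."""

    def __init__(self):
        self.master = DLatch()  # transparent while clk is low
        self.slave = DLatch()   # transparent while clk is high

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)
        return self.slave.update(m, enable=bool(clk))


def run(element, samples):
    """Feed (d, clk) samples to an element; collect its output after each one."""
    return [element.update(d, clk) for d, clk in samples]


samples = [(1, 0), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (1, 0)]
# The flop takes the value D had just before each rising edge; the bare
# latch follows D for as long as clk is high (e.g. the change in sample 2).
print(run(MasterSlaveFlop(), samples))  # [0, 1, 1, 1, 1, 0, 0]
print(run(DLatch(), samples))           # [0, 0, 1, 1, 1, 1, 1]
```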
From: glen herrmannsfeldt on 11 Feb 2010 21:33 In comp.arch.fpga Patrick Maupin <pmaupin(a)gmail.com> wrote: > Yes, latch-based design is much older than flop-based design, for the > simple reason that it can be cheaper. Think about it -- every flop is > really two latches! (At least for static designs that can be clocked > down to DC...) Where I work (at a chip company), we're still > occasionally converting latch-based designs into flop-based ones. Often using a two (or more) phase clock. Some latches work on one phase, some on the other. With appropriately non-overlapping clocks, one avoids race conditions, and the timing isn't so hard to get right. > But (and this is a big but) FPGAs themselves (not just the design > tools) are designed for flop-based design, so if you use latch-based > designs with FPGAs you are not only stressing the timing tools, you > are also avoiding the nice, packaged, back-to-back dedicated latches > they give you called flops. Well, you could use a sequence of FF's, clocking on different clock edges, or the same edge of two clocks. That allows for some of the advantages. If there were enough demand, I suppose FPGA companies would build devices based on transparent latches. (Who remembers the 7475?) In pipelined processors of years past, the Earle latch combined one level of logic with the latch logic, reducing the latch delay. -- glen
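The two-phase scheme can be sketched as a zero-delay simulation: latches alternate between phase 1 and phase 2, so no two adjacent latches are ever open at once and data advances in an orderly way. Clocking every latch on the same phase lets data race straight through the chain. The waveform shape and function names are hypothetical, and because the model has no delays, the dead time between phases (which in real circuits absorbs skew and delay spread) is present but not exercised.

```python
def two_phase_clocks(period, high):
    """One period of a non-overlapping two-phase clock, in discrete steps.

    Hypothetical waveform: phi1 is high for steps [0, high) and phi2 for
    [period // 2, period // 2 + high). With high < period // 2 there is a
    dead time where both phases are low.
    """
    half = period // 2
    phi1 = [1 if t < high else 0 for t in range(period)]
    phi2 = [1 if half <= t < half + high else 0 for t in range(period)]
    return phi1, phi2


def non_overlapping(phi1, phi2):
    return not any(a and b for a, b in zip(phi1, phi2))


def run_latch_chain(inputs, enables):
    """Zero-delay simulation of a chain of transparent latches.

    enables[i] is the per-step waveform (one clock period, repeated) that
    opens latch i. Within a step, latches are evaluated from input to
    output, so if adjacent latches are open in the same step, data races
    straight through both. inputs[c] is applied during clock cycle c; the
    chain output is sampled at the end of each cycle.
    """
    period = len(enables[0])
    q = [0] * len(enables)
    outputs = []
    for value in inputs:
        for t in range(period):
            d = value
            for i, enable in enumerate(enables):
                if enable[t]:
                    q[i] = d
                d = q[i]
        outputs.append(q[-1])
    return outputs


phi1, phi2 = two_phase_clocks(period=8, high=3)
data = [1, 0, 1, 1, 0, 0]
print(non_overlapping(phi1, phi2))                      # True
# Alternating phases: each phi1/phi2 pair behaves like one flop.
print(run_latch_chain(data, [phi1, phi2, phi1, phi2]))  # [0, 1, 0, 1, 1, 0]
# Same phase on every latch: data races through the whole chain at once.
print(run_latch_chain(data, [phi1] * 4))                # [1, 0, 1, 1, 0, 0]
```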
From: rickman on 12 Feb 2010 11:32 On Feb 11, 3:05 pm, Weng Tianxiang <wtx...(a)gmail.com> wrote: > In short, a pipeline with several FFs can be replaced with a pipeline > with two FFs in the ends and normal latches inserted between them to > steal time slack. > > FF1 ---> FF2 ---> FF3 ---> FF4 > FF1 ------->l2 --------> l3--> FF4.
I'm a little unclear on how this works. Is it just a matter of the latch outputs settling earlier when the logic path is faster, so that the next stage effectively gets more setup time? This requires a minimum delay on every path, so that the correct data is latched in the current clock cycle while the result for the next clock cycle is still propagating through the logic. I can see where this might be helpful, but it would be a nightmare to analyze for timing, mainly because of the wide range of delays across process, voltage and temperature (PVT). I have been told you need to allow a 2:1 range when considering all three. I think similar issues are involved in async design (more accurately termed self-timed). In that design method, delay variations affect both the data path and the clock path, so they largely cancel out and the min delays do not need to cover the full 2:1 range relative to the max. Some slack must be allowed so the clock arrives after the data, but otherwise all the speed of the logic is used at all times. Self-timed logic is also supposed to give lower-noise designs, because there is no chip-wide clock giving rise to simultaneous switching noise. It does not really give significant increases in processing speed, because although the max speed can be higher, an application can never rely on that higher speed being available. But for applications with optional processing that can be done in the leftover clock cycles (a poor term in this case, but you know what I mean), it can be useful. In the case of using latches in place of registers, the speed gains are always usable. But can't the same sort of gains be made by register leveling?
If you have logic that is slower than a clock cycle followed by logic that is faster than a clock cycle, why not just move some of the slow logic across the register to the faster logic section? Rick
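Moving logic across a register, as asked here, is usually called retiming. For a simple linear chain it amounts to choosing where the internal registers sit so that the slowest stage is as fast as possible. A minimal dynamic-programming sketch, assuming a plain chain of gates with fixed delays and no register overhead; the function name and the delay numbers are hypothetical.

```python
from functools import lru_cache
from itertools import accumulate


def best_register_placement(gate_delays, stages):
    """Place stages-1 registers in a linear chain of gates to minimize the
    slowest stage (the minimum clock period, ignoring register overhead).

    Returns (period, cuts): cuts[i] is the index of the first gate after
    the i-th internal register. Each stage must hold at least one gate.
    """
    n = len(gate_delays)
    if not 1 <= stages <= n:
        raise ValueError("stages must be between 1 and len(gate_delays)")
    prefix = [0, *accumulate(gate_delays)]

    @lru_cache(maxsize=None)
    def solve(start, remaining):
        # Best split of gates[start:] into `remaining` stages.
        if remaining == 1:
            return prefix[n] - prefix[start], ()
        best = None
        # Leave at least one gate for each of the remaining - 1 later stages.
        for end in range(start + 1, n - remaining + 2):
            head = prefix[end] - prefix[start]
            tail_period, tail_cuts = solve(end, remaining - 1)
            candidate = (max(head, tail_period), (end, *tail_cuts))
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    return solve(0, stages)


# Hypothetical case: slow logic (4+4+4 = 12) followed by fast logic (3+3 = 6).
# With the register after the third gate the period is 12; moving one slow
# gate across the register balances the stages.
print(best_register_placement([4, 4, 4, 3, 3], stages=2))  # (10, (2,))
# Retiming can only move whole gates, so balance is limited by granularity.
print(best_register_placement([7, 7, 2], stages=2))        # (9, (1,))
```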
From: Patrick Maupin on 12 Feb 2010 22:26 On Feb 11, 8:33 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote: > In comp.arch.fpga Patrick Maupin <pmau...(a)gmail.com> wrote: > > > But (and this is a big but) FPGAs themselves (not just the design > > tools) are designed for flop-based design, so if you use latch-based > > designs with FPGAs you are not only stressing the timing tools, you > > are also avoiding the nice, packaged, back-to-back dedicated latches > > they give you called flops. > > Well, you could use a sequence of FF's, clocking on different clock > edges, or the same edge of two clocks. > I actually did this in Xilinx FPGAs back in 1999. The specific problem I was solving was an insufficient number of global clocks (lots of interconnects with source-based clocking). Xilinx has solutions for this now (regional clocks), but didn't back then. So I used regular interconnect for clocking, which had very high skew, so you couldn't guarantee that the same edge was, in fact, the same edge for all the flops on that clock. The solution was to do as you said -- the inputs to every flop came from flops clocked on the opposite edge. That, plus reducing the amount of logic in that clock domain and crossing to a "real" clock domain as soon as possible.
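Why opposite-edge clocking tolerates high skew can be seen from the usual setup/hold slack equations for a single flop-to-flop path. A sketch, assuming a 50% duty cycle, skew bounded by less than half a period, and made-up delay values; the function names are hypothetical. Capturing on the opposite edge adds half a period of hold margin at the cost of half a period of setup budget.

```python
def margins(period, skew, tcq, dmin, dmax, tsu, thold, opposite_edge):
    """Setup/hold slack for one flop-to-flop path; negative means a violation.

    skew = (clock arrival at capturing flop) - (clock arrival at launching
    flop). Assumes a 50% duty cycle and |skew| < period / 2, so the
    launching edge still pairs with the intended capturing edge.
    """
    # Same edge: capture one full period after launch.
    # Opposite edge: capture half a period after launch.
    capture = period / 2 if opposite_edge else period
    setup = capture + skew - (tcq + dmax + tsu)
    # Hold is checked against the capturing edge one period before
    # `capture`: the launch edge itself for same-edge, half a period
    # before it for opposite-edge.
    hold = (period - capture) + tcq + dmin - thold - skew
    return setup, hold


def worst_case(period, max_skew, opposite_edge, **delays):
    """Worst setup and hold slack when skew can be anywhere in +/-max_skew."""
    results = [margins(period, s, opposite_edge=opposite_edge, **delays)
               for s in (-max_skew, max_skew)]
    return min(r[0] for r in results), min(r[1] for r in results)


delays = dict(tcq=1, dmin=1, dmax=3, tsu=1, thold=1)
print(worst_case(20, 5, opposite_edge=False, **delays))  # (10, -4): hold fails
print(worst_case(20, 5, opposite_edge=True, **delays))   # (0.0, 6.0): both met
```

Checking only the two skew extremes is enough here because setup slack grows with skew and hold slack shrinks with it.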