What is the basis for replacing flip-flops with latches [FPGA]

From: Weng Tianxiang on 11 Feb 2010 15:05

Hi,
I finally understand why a flip-flop can be replaced by a
latch.

Here is an excerpt from the paper "Atom Processor Core Made FPGA
Synthesizable":
"Optimized for a frequency range from 800 MHz to 1.86 GHz,
the original Atom design makes extensive use of latches
to support time borrowing along the critical timing paths.
With level-sensitive latches, a signal may have a delay larger
than the clock period and may flush through the latches
without causing incorrect data propagation, whereas the delay
of a signal in designs with edge-triggered flip-flops must
be smaller than the clock period to ensure the correctness of
data propagation across flip-flop stages [3]. It is well known
that the static timing analysis of latch-based pipeline designs
with level-sensitive latches is challenging due to two
salient characteristics of time borrowing [2, 3, 14]: (1) a
delay in one pipeline stage depends on the delays in the previous
pipeline stage. (2) in a pipeline design, not only do
the longest and shortest delays from a primary input to a
primary output need to be propagated through the pipeline
stages, but also the critical probabilities that the delays on
latches violate setup-time and hold-time constraints. Such
high dependency across the pipeline stages makes it very
difficult to gauge the impact of correlations among delay
random variables, especially the correlations resulting from
reconvergent fanouts. Due to this innate difficulty, synthesis
tools like DC-FPGA simply do not support latch analysis
and synthesis correctly."

In short, a pipeline of several FFs can be replaced by a pipeline with
FFs only at the two ends and ordinary latches inserted between them to
borrow time from neighboring stages.

FF1 ---> FF2 ---> FF3 ---> FF4
FF1 -------> L2 --------> L3 --> FF4
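
A rough way to see the time borrowing described above is a toy timing check. This is a minimal sketch under stated assumptions, not the analysis used in the paper: delays are integers in picoseconds; setup time, hold time and clock-to-Q are ignored; intermediate latch k opens at k*period and stays transparent for `window`; min-delay (race-through) checks are left out entirely. The function names are made up for illustration.

```python
def ff_pipeline_ok(stage_delays, period):
    """All-flip-flop pipeline: every stage must fit in one clock period."""
    return all(d <= period for d in stage_delays)


def latch_pipeline_ok(stage_delays, period, window):
    """FF at each end, transparent latches in between (simplified model).

    Latch k opens at k*period and closes at k*period + window.
    Data that arrives early waits for the latch to open; data that
    arrives while the latch is transparent flows straight through,
    borrowing time from the next stage.
    """
    depart = 0                      # launch edge of the first flip-flop
    n = len(stage_delays)
    for k, delay in enumerate(stage_delays, start=1):
        arrive = depart + delay
        edge = k * period
        if k == n:                  # final element is an edge-triggered FF
            return arrive <= edge
        if arrive > edge + window:  # missed the transparency window
            return False
        depart = max(arrive, edge)  # early data waits, late data flushes
    return True


delays_ps = [1300, 800, 800]        # first stage is longer than the clock
ff_pipeline_ok(delays_ps, 1000)             # False
latch_pipeline_ok(delays_ps, 1000, 500)     # True
latch_pipeline_ok([1300, 1300, 300], 1000, 500)  # False: borrowed too much
```

With a 1000 ps clock, the 1300/800/800 ps stages fail the all-FF check but pass the latch check, because the first stage borrows 300 ps that the later, shorter stages give back before the final flip-flop.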

I had seen such circuits before, but had not realized what the
underlying reason was. From the above paper, I now know that the
technique is not new; it originated in the 1980s.

Weng

From: Patrick Maupin on 11 Feb 2010 20:33

Yes, latch-based design is much older than flop-based design, for the
simple reason that it can be cheaper. Think about it -- every flop is
really two latches! (At least for static designs that can be clocked
down to DC...) Where I work (at a chip company), we're still
occasionally converting latch-based designs into flop-based ones.
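
The "every flop is really two latches" remark can be illustrated with a small behavioral model. This is only a sketch of one common arrangement (a master-slave, positive-edge-triggered flop); the class names `Latch` and `MasterSlaveFlop` are hypothetical, and the model has no delays at all.

```python
class Latch:
    """Level-sensitive latch: output follows input while enabled."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d          # transparent
        return self.q           # otherwise hold the stored value


class MasterSlaveFlop:
    """Two back-to-back latches on opposite clock levels."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, d, clk):
        # Master is transparent while clk is low, slave while clk is high,
        # so the pair samples d at the rising edge of clk.
        m = self.master.update(d, enable=(clk == 0))
        return self.slave.update(m, enable=(clk == 1))


flop = MasterSlaveFlop()
# (clk, d) pairs: d changes while clk is high and must be ignored.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
trace = [flop.update(d, clk) for clk, d in stimulus]
# trace == [0, 1, 1, 1, 0]
```

The output changes only on steps where clk goes from 0 to 1, which is the edge-triggered behavior a latch-based design gives up in exchange for transparency.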

But (and this is a big but) FPGAs themselves (not just the design
tools) are designed for flop-based design, so if you use latch-based
designs with FPGAs you are not only stressing the timing tools, you
are also avoiding the nice, packaged, back-to-back dedicated latches
they give you called flops.

Pat

From: glen herrmannsfeldt on 11 Feb 2010 21:33

In comp.arch.fpga Patrick Maupin <pmaupin(a)gmail.com> wrote:

> Yes, latch-based design is much older than flop-based design, for the
> simple reason that it can be cheaper. Think about it -- every flop is
> really two latches! (At least for static designs that can be clocked
> down to DC...) Where I work (at a chip company), we're still
> occasionally converting latch-based designs into flop-based ones.

Often using a two (or more) phase clock. Some latches work on
one phase, some on the other. With appropriately non-overlapping
phases, one avoids race conditions and the timing isn't so hard to get
right.
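
A zero-delay toy simulation shows why the phases must not overlap. This is a sketch under assumptions of my own: time is divided into discrete steps, latches at even positions in the chain are enabled by phi1 and the others by phi2, and a transparent latch passes its input instantly. The function name `simulate_latch_chain` is hypothetical.

```python
def simulate_latch_chain(phi1, phi2, data_in, n_latches=4):
    """Return the chain output at each time step.

    phi1, phi2 and data_in are equal-length lists of 0/1 values.
    Latches 0, 2, ... are transparent when phi1 is 1,
    latches 1, 3, ... are transparent when phi2 is 1.
    """
    state = [0] * n_latches
    outputs = []
    for p1, p2, d in zip(phi1, phi2, data_in):
        prev = d
        for i in range(n_latches):
            enable = p1 if i % 2 == 0 else p2
            if enable:
                state[i] = prev     # transparent: follow the input
            prev = state[i]         # feeds the next latch in the chain
        outputs.append(state[-1])
    return outputs


ones = [1] * 8
# Non-overlapping: phi1 pulse, gap, phi2 pulse, gap, repeated twice.
phi1 = [1, 0, 0, 0, 1, 0, 0, 0]
phi2 = [0, 0, 1, 0, 0, 0, 1, 0]
good = simulate_latch_chain(phi1, phi2, ones)   # [0, 0, 0, 0, 0, 0, 1, 1]

# Fully overlapping phases: every latch is open at once.
bad = simulate_latch_chain(ones, ones, ones)    # [1, 1, 1, 1, 1, 1, 1, 1]
```

With non-overlapping phases the new value advances one latch per phase and needs two full cycles to cross four latches. With overlapping phases it races through the whole chain in the first step.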

> But (and this is a big but) FPGAs themselves (not just the design
> tools) are designed for flop-based design, so if you use latch-based
> designs with FPGAs you are not only stressing the timing tools, you
> are also avoiding the nice, packaged, back-to-back dedicated latches
> they give you called flops.

Well, you could use a sequence of FF's, clocking on different clock
edges, or the same edge of two clocks.

That allows for some of the advantages. If there was enough demand,
I suppose FPGA companies would build transparent latch based devices.
(Who remembers the 7475?)

In pipelined processors of years past the Earle latch combined one
level of logic with the latch logic, reducing the latch delay.

-- glen

From: rickman on 12 Feb 2010 11:32

I'm a little unclear on how this works. Is this just a matter of the
outputs of the latches settling earlier if the logic path is faster so
that the next stage actually has more setup time? This requires that
there be a minimum delay in any given path so that the correct data is
latched on the current clock cycle while the result for the next clock
cycle is still propagating through the logic. I can see where this
might be helpful, but it would be a nightmare to analyze in timing,
mainly because of the wide range of delays with process, voltage and
temperature (PVT). I have been told you need to allow a 2:1 range when
considering all three.
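
The squeeze between the minimum-delay requirement and the 2:1 PVT spread can be put into numbers. The following is a simplified sketch, assuming a single-phase design in which every latch is transparent for `window` after each clock edge, the fast-corner delay of a path is its slow-corner delay divided by `pvt_ratio`, and setup, hold and clock-to-Q times are zero. The function name is hypothetical.

```python
def slow_corner_delay_range(period, window, pvt_ratio=2):
    """Allowed slow-corner delay (min, max) for one stage, or None.

    Min-delay rule: even at the fast corner the path must be longer than
    the transparency window, or new data races through the next latch.
    Max-delay rule: a stage launched at its clock edge must arrive before
    the next latch closes, i.e. within period + window.
    """
    lowest = pvt_ratio * window     # fast corner = slow / ratio >= window
    highest = period + window
    if lowest > highest:
        return None                 # no delay satisfies both rules
    return (lowest, highest)


slow_corner_delay_range(1000, 500)    # (1000, 1500)
slow_corner_delay_range(1000, 250)    # (500, 1250)
slow_corner_delay_range(1000, 1200)   # None
```

Under these assumptions a 1000 ps clock with a 500 ps transparency window requires every path to take at least 1000 ps at the slow corner, which gives a feel for why short paths make such designs hard to close.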

I think similar issues arise in async (or, more accurately,
self-timed) design. In that design method the variations in delay
affect both the data path and the clock path, so they largely cancel
out and the min delays do not need to allow for the full 2:1 range
relative to the max. Some amount
of slack time must be given so the clock arrives after the data, but
otherwise all the speed of the logic is utilized at all times. This
also is supposed to provide for lower noise designs because there is
no chip wide clock giving rise to simultaneous switching noise. Self-
timed logic does not really result in significant increases in
processing speed because although the max speed can be faster, an
application can never rely on that faster speed being available. But
for applications where there is optional processing that can be done
using the leftover clock cycles (poor term in this case, but you know
what I mean) it can be useful.

In the case of using latches in place of registers, the speed gains
are always usable. But can't the same sort of gains be made by
register leveling? If you have logic that is slower than a clock
cycle followed by logic that is faster than a clock cycle, why not
just move some of the slow logic across the register to the faster
logic section?
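
Register leveling (retiming) as described here can be sketched as choosing where to cut a chain of gates into stages. This is a brute-force illustration only, assuming a purely linear chain of gates with fixed integer delays in picoseconds; the function name `best_register_placement` is made up, and real retiming tools work on general netlists.

```python
from itertools import combinations


def best_register_placement(gate_delays, n_stages):
    """Split a gate chain into n_stages contiguous stages.

    Tries every register placement and returns the per-stage delays of
    the placement with the smallest worst-case stage delay, or None if
    there are fewer gates than stages.
    """
    n = len(gate_delays)
    best = None
    for cuts in combinations(range(1, n), n_stages - 1):
        bounds = (0, *cuts, n)
        stages = [sum(gate_delays[a:b]) for a, b in zip(bounds, bounds[1:])]
        if best is None or max(stages) < max(best):
            best = stages
    return best


gates_ps = [400, 500, 400, 300, 300]
# Register after the third gate: stages of 1300 and 600 ps.
# Moving one gate across the register balances the stages:
best_register_placement(gates_ps, 2)    # [900, 1000]
```

One difference from latch-based borrowing is granularity: a register can only move in whole-gate steps, so a chain such as 700 + 700 + 600 ps cannot be split into two stages of 1000 ps or less, whereas a transparent latch lets a stage overrun by an arbitrary amount within its window.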

Rick

From: Patrick Maupin on 12 Feb 2010 22:26

On Feb 11, 8:33 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> In comp.arch.fpga Patrick Maupin <pmau...(a)gmail.com> wrote:
>
> > But (and this is a big but) FPGAs themselves (not just the design
> > tools) are designed for flop-based design, so if you use latch-based
> > designs with FPGAs you are not only stressing the timing tools, you
> > are also avoiding the nice, packaged, back-to-back dedicated latches
> > they give you called flops.
>
> Well, you could use a sequence of FF's, clocking on different clock
> edges, or the same edge of two clocks.
>

I actually did this in Xilinx FPGAs back in 1999. The specific
problem I was solving was an insufficient number of global clocks (a
lot of interconnects with source-based clocking). Xilinx has
solutions for this now (regional clocks), but not back then. So I
used regular interconnect for clocking, and that was very high skew,
so that you couldn't guarantee that the same edge was, in fact, the
same edge for all the flops on the clock.

The solution was to do as you said -- the inputs to every flop were
from flops clocked on the opposite edge. That, and reducing the
amount of logic in that clock domain and clock-crossing to a "real"
clock domain as soon as possible.
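
The trade-off in this opposite-edge scheme can be written as two inequalities. The sketch below is my own simplification, assuming a 50% duty cycle, zero setup, hold and clock-to-Q times, and `skew` defined as the largest clock arrival difference between any two flops; the function names are hypothetical.

```python
def same_edge_ok(min_delay, max_delay, skew, period):
    """Source and destination flops clocked on the same edge."""
    hold_ok = min_delay > skew              # new data must not beat the late clock
    setup_ok = max_delay + skew <= period   # data must arrive before the next edge
    return hold_ok and setup_ok


def opposite_edge_ok(min_delay, max_delay, skew, period):
    """Destination flop clocked on the opposite edge from its source."""
    half = period / 2
    hold_ok = min_delay + half > skew       # half a period of extra hold margin
    setup_ok = max_delay + skew <= half     # but only half a period for logic
    return hold_ok and setup_ok


# 10 ns clock, 2 ns of skew on general-purpose routing (values in ps).
same_edge_ok(500, 2500, 2000, 10000)        # False: hold fails
opposite_edge_ok(500, 2500, 2000, 10000)    # True
opposite_edge_ok(500, 4000, 2000, 10000)    # False: too much logic
```

The last call shows why the amount of logic in the skewed domain has to be kept small: the hold problem goes away, but each path gets only half a clock period minus the skew.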
