From: Patrick Maupin on 12 Feb 2010 22:35

On Feb 12, 10:32 am, rickman <gnu...(a)gmail.com> wrote:
> In the case of using latches in place of registers, the speed gains
> are always usable. But can't the same sort of gains be made by
> register leveling? If you have logic that is slower than a clock
> cycle followed by logic that is faster than a clock cycle, why not
> just move some of the slow logic across the register to the faster
> logic section?

That's a similar technique, to be sure, for speed-gains. But as I wrote in an earlier post, I think the primary motivation for latch-based design was originally cost. For example, since each flop is really two latches, if you are going to have logic which ANDs together the output of two flops, you could replace that with ANDing the output of two latches, and outputting that result through another latch, for a net savings of 75% of the latches.

From: Weng Tianxiang on 13 Feb 2010 02:01

On Feb 12, 7:35 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
> On Feb 12, 10:32 am, rickman <gnu...(a)gmail.com> wrote:
>
> > In the case of using latches in place of registers, the speed gains
> > are always usable. But can't the same sort of gains be made by
> > register leveling? If you have logic that is slower than a clock
> > cycle followed by logic that is faster than a clock cycle, why not
> > just move some of the slow logic across the register to the faster
> > logic section?
>
> That's a similar technique, to be sure, for speed-gains. But as I
> wrote in an earlier post, I think the primary motivation for latch-
> based design was originally cost. For example, since each flop is
> really two latches, if you are going to have logic which ANDs together
> the output of two flops, you could replace that with ANDing the output
> of two latches, and outputting that result through another latch, for
> a net savings of 75% of the latches.

Your method's target and the target of CPU designers who insert latches in a pipeline are totally different. They use latches because a combinational signal's delay is too long to fit within one clock cycle and too short to fill two clock cycles in a pipeline, not in any place you may want to.

Weng

From: John_H on 13 Feb 2010 09:00

On Feb 12, 11:32 am, rickman <gnu...(a)gmail.com> wrote:
<snip>
>
> In the case of using latches in place of registers, the speed gains
> are always usable. But can't the same sort of gains be made by
> register leveling? If you have logic that is slower than a clock
> cycle followed by logic that is faster than a clock cycle, why not
> just move some of the slow logic across the register to the faster
> logic section?
>
> Rick

I argued with my coworker for a few days about the benefit of latches versus registers before I finally realized the advantage of latch based designs. Not only is granularity less of a problem (e.g., only able to fit 2 logic delays in a level rather than the maximum 2.8 available, losing nearly 30%) but synchronous delays are different. Rather than accounting for Tco+Tsu for every register in a chain of a few clock cycles where register leveling is helpful, only the Tito transparent latch delay (minus the Tilo LUT delay) needs to be added for each latch in the chain [using Xilinx timing nomenclature].

I agree that the register based FPGAs are probably designed (and tested) to minimize Tsu and Tco without strong consideration for Tito and that the timing analysis is NOT set up to do a good job with "latch leveled" timing analysis.

When I do use latches (when transferring data between rising/falling time domains for a fast clock, for instance) I have to specify false values around the latch for synchronous analysis rather than the precise values through the latch because the analysis wants to see registers at each stage even with the proper analysis flag turned on.

If the analyzer would recognize a chain of rise/fall/rise/fall controlled latches and automatically increase the timing constraint by a half period for each stage, we'd potentially have a powerful tool at our disposal. But they don't so we don't. At least not in FPGAs.

- John_H

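To put rough numbers on the granularity and per-stage overhead argument in the post above, here is a minimal Python sketch. The 2.8 figure comes from the post; the function names and all the nanosecond values for Tco, Tsu, Tito and Tilo are made-up placeholders for illustration, not data-sheet numbers for any device.

```python
import math


def register_utilization(period_in_logic_delays):
    """Fraction of a clock period that is usable when only whole
    logic delays fit between two registers."""
    whole_delays = math.floor(period_in_logic_delays)
    return whole_delays / period_in_logic_delays


def register_chain_overhead(stages, t_co, t_su):
    """Synchronous overhead of a register chain: each stage pays
    clock-to-out plus setup."""
    return stages * (t_co + t_su)


def latch_chain_overhead(stages, t_ito, t_ilo):
    """Overhead of a transparent-latch chain as described in the post:
    each stage pays the latch delay minus the LUT delay."""
    return stages * (t_ito - t_ilo)


# Granularity example from the post: room for 2.8 logic delays, only 2 fit.
loss = 1.0 - register_utilization(2.8)
print(f"granularity loss: {loss:.1%}")  # 28.6%, i.e. "nearly 30%"

# Hypothetical timing numbers in ns, chosen only to show the comparison.
T_CO, T_SU, T_ITO, T_ILO = 0.5, 0.4, 0.7, 0.5
reg_overhead = register_chain_overhead(4, T_CO, T_SU)
latch_overhead = latch_chain_overhead(4, T_ITO, T_ILO)
print(f"4-stage register overhead: {reg_overhead:.2f} ns")
print(f"4-stage latch overhead:    {latch_overhead:.2f} ns")
```

How large the real difference is depends entirely on the actual Tco, Tsu, Tito and Tilo of the target part, which this sketch does not attempt to model.
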
From: glen herrmannsfeldt on 13 Feb 2010 15:09

In comp.arch.fpga John_H <newsgroup(a)johnhandwork.com> wrote:
(snip)
> I argued with my coworker for a few days about the benefit of latches
> versus registers before I finally realized the advantage of latch
> based designs. Not only is granularity less of a problem (e.g., only
> able to fit 2 logic delays in a level rather than the maximum 2.8
> available, losing nearly 30%) but synchronous delays are different.
> Rather than accounting for Tco+Tsu for every register in a chain of a
> few clock cycles where register leveling is helpful, only the Tito
> transparent latch delay (minus the Tilo LUT delay) needs to be added
> for each latch in the chain [using Xilinx timing nomenclature].

I would have thought that they were fast enough now for that not to matter so much. My thought would be that clock skew, even with the fancy clock distribution system, would be the important factor.

If the granularity is the problem then you might try clocking some on rising and some on falling edge (if available) or having two clocks with known phase difference. That would be especially true if the DLL's could generate the appropriate clocks.

> I agree that the register based FPGAs are probably designed (and
> tested) to minimize Tsu and Tco without strong consideration for Tito
> and that the timing analysis is NOT set up to do a good job with
> "latch leveled" timing analysis.
> When I do use latches (when transferring data between rising/falling
> time domains for a fast clock, for instance) I have to specify false
> values around the latch for synchronous analysis rather than the
> precise values through the latch because the analysis wants to see
> registers at each stage even with the proper analysis flag turned on.

> If the analyzer would recognize a chain of rise/fall/rise/fall
> controlled latches and automatically increase the timing constraint by
> a half period for each stage, we'd potentially have a powerful tool at
> our disposal. But they don't so we don't. At least not in FPGAs.

That sounds useful. If it gets popular enough, maybe they will add it.

-- glen

From: John_H on 13 Feb 2010 19:21

On Feb 13, 3:09 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
<snip>
> > Rather than accounting for Tco+Tsu for every register in a chain of a
> > few clock cycles where register leveling is helpful, only the Tito
> > transparent latch delay (minus the Tilo LUT delay) needs to be added
> > for each latch in the chain [using Xilinx timing nomenclature].
>
> I would have thought that they were fast enough now for that
> not to matter so much. My thought would be that clock skew,
> even with the fancy clock distribution system, would be the important
> factor.

Clock skew becomes entirely unimportant in the latch scheme as I know it unless CLK and CLK180 are used instead of normal and inverted versions of the same clock. The latches are explicitly alternated posedge/negedge/posedge/negedge, effectively decomposing a conceptual register into its two latches and balancing the logic between them. For clock skew to be an issue, two consecutive latches would have to be transparent long enough for the logic path plus delays to sneak through; that won't happen when using the normal and invert of the *same* clock net unless things are very, very wrong in the latch design.

> If the granularity is the problem then you might try clocking
> some on rising and some on falling edge (if available) or having
> two clocks with known phase difference. That would be especially
> true if the DLL's could generate the appropriate clocks.

Some... registers? Using the posedge and negedge in a registered arrangement would simply exacerbate the granularity problem, able to fit fewer whole delays into the same clock period by dividing the logic into two phases. The latches allow longer delays to move the valid data further toward the end of the transparent window and shorter delays to move it back, always with the safeguard that data for the next (half) cycle isn't allowed to be valid any sooner than the front edge of the transparent window.

The description comes out a little muddy which is why it took me a few days to buy in to the whole concept. It's sweet! It just takes some timing diagrams and head scratching. And it's certainly not set up for proper analysis especially in the Xilinx tools where I experimented with the phase domain changes.

- John_H
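
Since the description is admittedly muddy, a small model may help in place of the timing diagrams. The Python sketch below is an idealized illustration of the alternating-latch scheme described above, under assumptions that are not in the post: latch k is transparent from k half-periods to k+1 half-periods, latch propagation and setup times are ignored, and the stage delays and period are invented numbers. The function names are hypothetical.

```python
def latch_chain_departures(stage_delays, period):
    """Walk data through a chain of alternating-phase transparent latches.

    Data launches at time 0. Stage k's logic feeds latch k, which is
    transparent from k*half to (k+1)*half. Returns the time at which
    data leaves each latch, or None if data misses a transparent window.
    """
    half = period / 2.0
    depart = 0.0
    departures = []
    for k, delay in enumerate(stage_delays, start=1):
        arrive = depart + delay
        opens, closes = k * half, (k + 1) * half
        if arrive > closes:
            return None  # data arrived after the latch closed
        # Early data waits for the front edge of the window;
        # late data passes straight through (time borrowing).
        depart = max(arrive, opens)
        departures.append(depart)
    return departures


def half_cycle_registers_ok(stage_delays, period):
    """Posedge/negedge registers: every stage must fit in half a period."""
    return all(delay <= period / 2.0 for delay in stage_delays)


delays = [7, 3, 6, 4]   # unbalanced stages, total 20
period = 10             # so each half-period is 5

print(latch_chain_departures(delays, period))   # [7.0, 10.0, 16.0, 20.0]
print(half_cycle_registers_ok(delays, period))  # False
```

In this toy case the 7 and 6 unit stages borrow time from the shorter stages that follow them, so the latch chain works, while the same logic split across posedge and negedge registers does not fit. A short stage such as `[2, 2]` shows the safeguard: the data waits until the window opens at 5 rather than racing ahead.
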