From: Brian Drummond on 25 May 2010 20:36

On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi <rgaddi(a)technologyhighland.com> wrote:

>I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>code block that I would expect to take up about 200 LUTs is taking 800
>instead. 600 LUTs wouldn't be the end of the world, except I'm planning
>to replicate this block 32 times, which puts me well over the top.
>
>So the question becomes where are all of the LUTs going?
>
>Then I tried looking
>through the technology schematic instead. The viewer took forever to
>open the schematic, and when I finally got it open it took better than a
>minute any time I wanted to refresh the screen. Needless to say, this
>got me nowhere.

Rather than use the technology viewer, I've had better luck reading the
post-synthesis netlist in a text editor!

I'm not necessarily recommending that approach, but it has its uses. You
could quickly search for the first few instances of "ram_k_hi", then
every instance of "ram_k_hi<whatever>(63)" to see if e.g. the LUT RAMs
have been duplicated to give you enough ports.

But my recommendation would be divide and conquer on that block; it's
not large. For example, comment or "generate" out the coefficient
readback module and see how the size changes. Or "generate" out the
whole lot then re-introduce it a block at a time, comparing the synth
result with your expectations.

Have you allowed for the size of the coefficient rams - 3x64-bit as far
as I can tell from the posted code? Or how are the 4 ports of the quad
port RAM organised? With more than 1 write port, that can get complex
and inefficient...

- Brian
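[Editor's sketch] The text-search idea above can also be scripted rather than done by hand in an editor. The following is a minimal sketch, assuming only that the netlist is readable as plain text; it does not parse any particular netlist format. The function name `count_names` and the sample text are made up for illustration, and only the prefix "ram_k_hi" comes from the thread.

```python
import re
from collections import Counter


def count_names(netlist_text: str, prefix: str) -> Counter:
    """Count occurrences of each identifier that starts with `prefix`.

    This is a plain text match on word characters, not a parse of any
    real netlist format (assumption: names are made of [A-Za-z0-9_]).
    """
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*")
    return Counter(pattern.findall(netlist_text))


# Made-up sample text standing in for the contents of a netlist file.
sample = """
inst ram_k_hi1 : SOME_PRIMITIVE
inst ram_k_hi2 : SOME_PRIMITIVE
inst ram_k_hi1_dup : SOME_PRIMITIVE
net ram_k_hi1
"""

counts = count_names(sample, "ram_k_hi")
for name, n in sorted(counts.items()):
    print(name, n)
```

More distinct names than expected for one RAM would be a hint, not proof, that the memory has been duplicated; the netlist itself still has to be read to confirm it.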
From: Rob Gaddi on 25 May 2010 20:45

On 5/25/2010 5:36 PM, Brian Drummond wrote:
> On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi
> <rgaddi(a)technologyhighland.com> wrote:
>
>> I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>> code block that I would expect to take up about 200 LUTs is taking 800
>> instead. 600 LUTs wouldn't be the end of the world, except I'm planning
>> to replicate this block 32 times, which puts me well over the top.
>>
>> So the question becomes where are all of the LUTs going?
>
>> Then I tried looking
>> through the technology schematic instead. The viewer took forever to
>> open the schematic, and when I finally got it open it took better than a
>> minute any time I wanted to refresh the screen. Needless to say, this
>> got me nowhere.
>
> Rather than use the technology viewer, I've had better luck reading the
> post-synthesis netlist in a text editor!
>
> I'm not necessarily recommending that approach, but it has its uses. You
> could quickly search for the first few instances of "ram_k_hi", then
> every instance of "ram_k_hi<whatever>(63)" to see if e.g. the LUT RAMs
> have been duplicated to give you enough ports.
>
> But my recommendation would be divide and conquer on that block; it's
> not large. For example, comment or "generate" out the coefficient
> readback module and see how the size changes. Or "generate" out the
> whole lot then re-introduce it a block at a time, comparing the synth
> result with your expectations.
>
> Have you allowed for the size of the coefficient rams - 3x64-bit as far
> as I can tell from the posted code? Or how are the 4 ports of the quad
> port RAM organised? With more than 1 write port, that can get complex
> and inefficient...
>
> - Brian

The quad port only became a quad port because XST decided to implement
the reset logic on its own dedicated write port rather than just have
one write port and feed it from an AND gate.

It turns out that, if I just comment out the reset logic, the
utilization drops to 236 LUTs. It must have been implementing something
truly awful to try to get that extra write port in. Why it thought it
needed it in the first place I'll never know, but at least I'm back on
track now.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
From: Nial Stewart on 26 May 2010 04:54

> It turns out that, if I just comment out the reset logic, the utilization drops to 236 LUTs. It
> must have been implementing something truly awful to try to get that extra write port in. Why it
> thought it needed it in the first place I'll never know, but at least I'm back on track now.

Rob, some(/most) templates for inferring RAMs don't work if you have a
reset defined.

Nial.
From: Brian Drummond on 26 May 2010 06:59

On Tue, 25 May 2010 17:45:33 -0700, Rob Gaddi <rgaddi(a)technologyhighland.com> wrote:

>On 5/25/2010 5:36 PM, Brian Drummond wrote:
>> On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi
>> <rgaddi(a)technologyhighland.com> wrote:
>>
>>> I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>>> code block that I would expect to take up about 200 LUTs is taking 800
>>> instead.

>> Or how are the 4 ports of the quad
>> port RAM organised? With more than 1 write port, that can get complex
>> and inefficient...

>The quad port only became a quad port because XST decided to implement
>the reset logic on it's own dedicated write port rather than just have
>one write port and feed it from an AND gate.
>
>It turns out that, if I just comment out the reset logic, the
>utilization drops to 236 LUTs.

Glad you found it.

Implementing the reset externally, as you described, is the sort of
trick that is occasionally necessary to get round XST limitations. The
alternative is to eliminate the reset and write all those zeroes across
the wishbone bus.

If you think that XST can be usefully improved in this area, submit a
testcase to Webcase.

- Brian
From: Rob Gaddi on 26 May 2010 12:05

On 5/26/2010 1:54 AM, Nial Stewart wrote:
>> It turns out that, if I just comment out the reset logic, the utilization drops to 236 LUTs. It
>> must have been implementing something truly awful to try to get that extra write port in. Why it
>> thought it needed it in the first place I'll never know, but at least I'm back on track now.
>
>
> Rob, some(/most) templates for inferring RAMs don't work if you have a
> reset defined.
>
>
> Nial.
>

The reset logic was sequential, i.e. reset address 0, then reset address
1, one per clock until the entire thing was done. The intention being
that the entire thing would take place on the normal write port of the
RAM, which wasn't being used while it was in the reset state.

Apparently it didn't work out that way.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
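[Editor's sketch] The intended behaviour described in this post, a sequential reset that borrows the RAM's single write port, can be modelled at cycle level as follows. This is a behavioural illustration in Python, not the posted HDL, and it says nothing about what XST will actually infer. The class name `ResetSharedRam`, its arguments, and the nonzero initial fill are all assumptions made for the example.

```python
class ResetSharedRam:
    """Cycle-level model: one write port shared by a reset sweep and normal writes."""

    def __init__(self, depth: int, fill: int = 0xFF) -> None:
        self.mem = [fill] * depth  # nonzero fill so the clearing sweep is visible
        self.reset_addr = 0        # next address the sweep will clear
        self.in_reset = True       # the model starts in the reset state

    def clock(self, we: bool = False, addr: int = 0, data: int = 0) -> None:
        """Advance one clock; at most one location is written per call."""
        if self.in_reset:
            # Reset owns the write port: clear one address per clock and
            # ignore any normal write request presented this cycle.
            w_en, w_addr, w_data = True, self.reset_addr, 0
            self.reset_addr += 1
            if self.reset_addr == len(self.mem):
                self.in_reset = False
        else:
            # Outside reset the same port carries the normal write.
            w_en, w_addr, w_data = we, addr, data
        if w_en:
            self.mem[w_addr] = w_data


ram = ResetSharedRam(depth=4)
ram.clock(we=True, addr=2, data=7)  # ignored: this cycle clears address 0
for _ in range(3):
    ram.clock()                     # clears addresses 1, 2, 3
after_reset = list(ram.mem)
ram.clock(we=True, addr=2, data=7)  # normal write now reaches the port
print(after_reset, ram.mem)
```

The point of the model is that only one write happens per clock in either state, which is why a single write port with a multiplexer in front of it is, in principle, enough.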