From: John_H on 1 Jan 2010 18:20

On Dec 29 2009, 4:37 pm, Rob Doyle <radioe...(a)gmail.com> wrote:
>
> I *need* one write port and three read ports - so I'm OK just
> duplicating the RAM.
>
> I could save a clock cycle in the ALU if I could do two writes
> and three reads. If I have to stall the pipeline to implement
> this, I've gained nothing.
>
> The timing won't permit 2 Register clock cycles per ALU clock cycle
> to double-up the register accesses.
>
> The multi-port "flag" memory is the trick I was looking for. The ALU
> has 1024 registers so I can envision some tall data selectors,
> multiplexers, and accompanying levels of logic to implement the
> address decoders.
>
> I think I'm going to stay with simple for now and put this in my
> back pocket as "Plan B".
>
> I greatly appreciate the help.
>
> Rob Doyle

If you wanted fewer registers (1024, really?) there's a nice technique that can use LUT RAMs to provide (combinatorially) the read values you want with two write ports. Two writes to the same address would result in an undefined value, but avoiding that condition results in seamless operation. Two write ports with three reads would use 8 dual-port LUT RAM arrays - (write_ports x (read_ports+1)).

The reason LUT RAMs are needed is that the operation is a read-modify-write. One could get around the read-modify-write need by delaying the write one cycle, but then a method of selecting between the read value and the delayed write value is still needed.

I can provide more detail if needed. Multiple write ports start to eat resources but they're doable if the performance gain is worth the resource loss.
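The array count quoted in the post can be checked with a couple of lines of arithmetic. This is only a sketch in Python: the function name is made up for illustration, the formula is taken as stated in the post, and it is exercised here only for the two-write case the post discusses.

```python
def lut_ram_arrays(write_ports: int, read_ports: int) -> int:
    """Dual-port LUT RAM arrays per the formula quoted in the post.

    Each write port owns one bank, and each bank is replicated once per
    read port plus one more time (the "+1" in the quoted formula).
    The result counts arrays, not individual LUTs.
    """
    return write_ports * (read_ports + 1)


# Two write ports, three read ports, as in the post.
print(lut_ram_arrays(2, 3))  # 8
```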
From: Peter Alfke on 1 Jan 2010 19:54

From the bowels of my computer I resurrected a file written more than 3 years ago:

Using Virtex-5 CLB as Multi-Port Memory

The four M-LUTs in a half-CLB can be combined to form a quad-port RAM, ideally suited for register-file applications.
The four LUTs, called A, B, C, and D, are configured in such a way that the write address applied to D is automatically also multiplexed onto the write addressing of LUTs A, B, and C.
Writing into D thus also writes into the same location in A, B, and C, but these three LUTs have their address inputs still available as read addresses. (In this application, LUT D is never read.)
The structure functions as a quad-port RAM with one write port (address applied to D) and common data written into LUTs A, B, and C.
There are three independent read ports (addresses applied to LUTs A, B, and C). Writing is synchronous, reading is combinatorial.
Each LUT can either be a 64 x 1, or a 32 x 2 RAM.

A similar structure, using common read addresses and individual Data inputs, acts as simple dual-port memory, either 3 bits wide and 64 deep, or 6 bits wide and 32 deep.

In the Virtex-5 MicroBlaze application, the 32 x 32 register file with one write port and two read ports, using 384 LUTs in Virtex-4, is reduced to 44 LUTs, a saving of over 88%.

Peter Alfke, 3-21-06
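The quad-port arrangement described in the post can be sketched as a small behavioral model. This is Python rather than HDL, the class and method names are invented for illustration, and it models only the visible behavior (one synchronous write port, three combinatorial read ports), not the CLB wiring. LUT D is left out because the post says it is never read.

```python
class QuadPortLutRam:
    """Behavioral model: one write port, three independent read ports."""

    def __init__(self, depth: int = 64):
        # LUTs A, B and C always hold identical contents.
        self.copies = [[0] * depth for _ in range(3)]

    def clock(self, we: bool, waddr: int, wdata: int) -> None:
        """Rising clock edge: the shared write address hits every copy."""
        if we:
            for copy in self.copies:
                copy[waddr] = wdata

    def read(self, addr_a: int, addr_b: int, addr_c: int) -> tuple:
        """Combinatorial read: each copy is addressed independently."""
        addrs = (addr_a, addr_b, addr_c)
        return tuple(copy[a] for copy, a in zip(self.copies, addrs))


ram = QuadPortLutRam()
ram.clock(True, 5, 1)    # write 1 to address 5
ram.clock(True, 9, 1)    # write 1 to address 9
print(ram.read(5, 9, 0))  # (1, 1, 0)

# LUT saving quoted in the post: 384 LUTs reduced to 44.
saving = 1 - 44 / 384
print(f"{saving:.1%}")    # 88.5%
```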
From: whygee on 2 Jan 2010 04:06

wow, a great new year's present :-)))

Peter Alfke wrote:
> From the bowels of my computer I resurrected a file written more than
> 3 years ago:
>
> Using Virtex-5 CLB as Multi-Port Memory
<snip>
> In the Virtex-5 MicroBlaze application, the 32 x 32 register file with
> one write port and two read ports, using 384 LUTs in Virtex-4, is
> reduced to 44 LUTs, a saving of over 88%.
>
> Peter Alfke, 3-21-06

Any more information, diagram, schematics, source code, appnote, or whatever, would be really appreciated :-)

thanks and greetings,
yg
--
http://ygdes.com / http://yasep.org
From: John_H on 2 Jan 2010 11:49

On Jan 1, 7:54 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
> From the bowels of my computer I resurrected a file written more than
> 3 years ago:
>
> Using Virtex-5 CLB as Multi-Port Memory
>
> The four M-LUTs in a half-CLB can be combined to form a quad-port RAM,
> ideally suited for register-file applications.
> The four LUTs, called A, B, C, and D are configured in such a way that
> the write address applied to D is automatically also multiplexed onto
> the write addressing of LUTs A, B, and C.
> Writing into D thus also writes into the same location in A, B, and C,
> but these three LUTs have their address inputs still available as read
> addresses. (In this application, LUT D is never read.)
> The structure functions as a quad-port RAM with one write port
> (address applied to D) and common data written into LUTs A, B, and C.
> There are three independent read ports (addresses applied to LUTs A,
> B, and C). Writing is synchronous, reading is combinatorial.
> Each LUT can either be a 64 x 1, or a 32 x 2 RAM.
>
> A similar structure, using common read addresses and individual Data
> inputs, acts as simple dual-port memory, either 3 bits wide and 64
> deep, or 6 bits wide and 32 deep.
>
> In the Virtex-5 MicroBlaze application, the 32 x 32 register file with
> one write port and two read ports, using 384 LUTs in Virtex-4, is
> reduced to 44 LUTs, a saving of over 88%.
>
> Peter Alfke, 3-21-06

Greetings Peter, always a pleasure.

You describe a physical implementation which folds beautifully into the Xilinx fabric, showing how few routing resources are needed to implement the bits of the multi-port (read) memories. But isn't this precisely what one gets when inferring a single-port write, multi-port read memory through HDL? For that, I wouldn't think example code would be needed since the inference can have the same physical implementation you describe.

Without explicit placement constraints, both inferred and instantiated methods are left to the Place & Route to fold everything for each bit into single CLBs, aren't they? It's certainly easier to apply those constraints if the designer defines the names for each instance in the first place.

The bigger challenge raised in this thread is the multi-port with two write ports, which can be performed in CLBs very nicely but with a little overhead. If the reader has no interest in multi-port writes with CLB memories, you can ignore the rest of the message.

    reg  [n:0] m1 [m:0], m2 [m:0];
    wire [n:0] rd1, rd2, rd3;

    // Each write port owns one memory and stores its data XORed with
    // the other memory's current contents at the same address.
    always @(posedge clk)
        if( we1 ) m1[wa1] <= wdata1 ^ m2[wa1];
    always @(posedge clk)
        if( we2 ) m2[wa2] <= wdata2 ^ m1[wa2];

    // Each read XORs both memories to recover the last value written.
    assign rd1 = m1[ra1] ^ m2[ra1];
    assign rd2 = m1[ra2] ^ m2[ra2];
    assign rd3 = m1[ra3] ^ m2[ra3];

Since m1 is read at 4 unique addresses (ra1, ra2, ra3, and wa2), the inferred memory will be replicated for 4 total copies. Since 2 memories are needed for 2 writes, there are 2 sets of these 4 memory copies.

Since writing a value to m1 doesn't affect m2, reading that address later results in

    rdx == m1[was_wa1] ^ m2[was_wa1]
        == (was_wdata1 ^ m2[was_wa1]) ^ m2[was_wa1]
        == was_wdata1

The XOR on the input and the output resurrects the original data written to that port independent of which read memory accesses it.

The one caveat: two writes to the same address on the same clock leave old_data ^ wdata1 ^ wdata2 at that address, so the existing data is unchanged only if both ports write the same value and is corrupted otherwise. Priority could be assigned to one write port by disabling the write on the other port when a conflict is detected instead.

This is where CLB SelectRAM design gets interesting and fun!
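The XOR scheme in the Verilog above can be exercised with a small behavioral model. This is Python, not synthesizable code: the class name, method names, and the port1_priority flag are invented for illustration, and the flag is just one possible reading of the priority idea mentioned in the post. Both writes sample the old memory contents before either updates, which mirrors the nonblocking assignments.

```python
class XorTwoWriteRam:
    """Behavioral model of the two-write XOR memory (m1, m2)."""

    def __init__(self, depth: int = 16):
        self.m1 = [0] * depth
        self.m2 = [0] * depth

    def clock(self, we1, wa1, wd1, we2, wa2, wd2, port1_priority=False):
        """One rising clock edge with both write ports presented."""
        # Hypothetical conflict handling: port 1 wins, port 2 is disabled.
        if port1_priority and we1 and we2 and wa1 == wa2:
            we2 = False
        # Sample old contents first, then commit (nonblocking semantics).
        new1 = wd1 ^ self.m2[wa1] if we1 else None
        new2 = wd2 ^ self.m1[wa2] if we2 else None
        if new1 is not None:
            self.m1[wa1] = new1
        if new2 is not None:
            self.m2[wa2] = new2

    def read(self, ra: int) -> int:
        """Combinatorial read: XOR of both memories at the address."""
        return self.m1[ra] ^ self.m2[ra]


ram = XorTwoWriteRam()
ram.clock(True, 3, 0xA, True, 7, 0x5)      # two writes, different addresses
print(ram.read(3), ram.read(7))            # 10 5
ram.clock(False, 0, 0, True, 3, 0xC)       # port 2 overwrites address 3
print(ram.read(3))                         # 12
ram.clock(True, 3, 0x1, True, 3, 0x2)      # conflict: old ^ wd1 ^ wd2
print(ram.read(3))                         # 15
ram.clock(True, 3, 0x1, True, 3, 0x2, port1_priority=True)
print(ram.read(3))                         # 1
```

In this model a same-address double write leaves old ^ wdata1 ^ wdata2, so the stored value survives only when both ports carry identical data; with the priority flag set, the port 1 value is what reads back.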