From: Eric on 20 Apr 2010 13:00

Hi All,

I've recently published a paper exploring how to implement memories with multiple read and write ports on existing FPGAs. I figured it might be of interest to some.

Summary, paper, slides, and example code are here:
http://www.eecg.utoronto.ca/~laforest/multiport/index.html

There are no patents or other additional IP encumbrances on the code.

If you have any comments or other feedback, I'd like to hear it.

Eric LaForest
PhD student, ECE Dept.
University of Toronto
http://www.eecg.utoronto.ca/~laforest/

From: Eric on 22 Apr 2010 10:52

On Apr 21, 6:26 am, John_H <newsgr...(a)johnhandwork.com> wrote:
> Could you mention here or on your page what you mean by
> "multipumping?" If you mean time multiplexed access, I can see why
> multipumping is bad. [The "pure logic" approach also isn't obvious.]

Yes, multipumping is time-multiplexing. It's not entirely bad, as there may be a speed margin leftover that you can trade for area using multipumping. Also, it is useful if you have few ports or low speed requirements.

Pure logic refers simply to using only the reconfigurable fabric of the FPGA to implement the memory. It's not a very scalable solution. :)

> Do you update the LVT in the same way I might update the RAM value in
> a many-write BlockRAM?

No. We've had several independent mentions of using XOR, but we hadn't heard of it at the time. We'll be looking at it in the future. The LVT is implemented in pure logic and has multiple read and write ports which can all work simultaneously. It remains practical because it is very narrow (log2(# of write ports) instead of full data word width).

> Aside from wide data, however, I don't see (without going into the
> attachments on that page) how updating the LVT is any different than
> updating the memory in the first place.

The LVT manages a bunch of Block RAMs with only one write and one read port, making them all behave as a single multiported memory. The LVT simply keeps track of which port last wrote to each address. Since the actual data is stored in Block RAMs, the end result is faster and more area efficient than other approaches.

Please let me know if you have more questions.

Eric

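
The width argument in the post above can be put into numbers. What follows is only a rough sketch under stated assumptions: the function names and the example sizes (256 words, 32 bits, 2 or 4 write ports) are hypothetical and do not come from the paper. It compares the fabric storage needed for a table holding one write-port index per address with the storage needed to hold the full data word per address in fabric.

```python
def lvt_bits(depth: int, write_ports: int) -> int:
    """Fabric bits for a table storing, per address, the index of the
    write port that wrote there last (hypothetical helper)."""
    # Bits needed to encode a port index in [0, write_ports):
    # this is ceil(log2(write_ports)) computed with integers only.
    entry_width = (write_ports - 1).bit_length()
    return depth * entry_width


def full_width_bits(depth: int, data_width: int) -> int:
    """Fabric bits if every data word itself were stored in fabric."""
    return depth * data_width


# Hypothetical sizes, chosen only for illustration.
depth, data_width = 256, 32
for write_ports in (2, 4):
    print(write_ports, lvt_bits(depth, write_ports),
          full_width_bits(depth, data_width))
```

With these assumed sizes the table entry is 1 or 2 bits wide against a 32-bit data word, which is the sense in which the table is "very narrow". The sketch counts storage bits only; it says nothing about the multiplexing and port logic around the table.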
From: rickman on 22 Apr 2010 12:36

I guess I don't understand what you are accomplishing with this. Block rams in FPGAs are almost always multiported. Maybe not N way ported, but you assume they are single ported when they are dual ported.

Can you give a general overview of what you are doing without using jargon? I took a look and didn't get it at first glance.

Rick

From: Eric on 22 Apr 2010 13:55

On Apr 22, 12:36 pm, rickman <gnu...(a)gmail.com> wrote:
> I guess I don't understand what you are accomplishing with this.
> Block rams in FPGAs are almost always multiported. Maybe not N way
> ported, but you assume they are single ported when they are dual
> ported.

But what if you want more ports, say 2-write/4-read, without wait states?

I assume them to be "simply dual-ported", which means one write port and one read port, both operating concurrently. It is also possible to run them in "true dual port" mode, where each port can either read or write in a cycle. Some of the designs in the paper do that.

> Can you give a general overview of what you are doing without using
> jargon? I took a look and didn't get it at first glance.

OK. Let me try:

Assume a big, apparently multiported memory of some given capacity and number of ports. Inside it, I use a small multiported memory implemented using only the fabric of an FPGA, which stores only the number of the write port which wrote last to a given address. Thus this small memory is of the same depth as the whole memory, but much narrower, hence it scales better.

When you read at a given address from the big memory, internally you use that address to look up which write port wrote there last, and use that information to steer the read to the correct internal memory bank which will hold the data you want. These banks are built-up of multiple Block RAMs so as to have one write port each, and as many read ports as the big memory appears to have.

The net result is a memory which appears to have multiple read and write ports which can all work simultaneously, but which leaves the bulk of the storage to Block RAMs instead of the FPGA fabric, which makes for better speed and smaller area.

Does that help?

Eric

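
The description above can be mirrored in a small behavioral model. This is a software sketch only, not the example code from the paper's page: the class name `LVTMemory`, the `cycle` method, the read-before-write ordering within a cycle, and the rule for two ports writing the same address in one cycle are all assumptions made for illustration.

```python
class LVTMemory:
    """Behavioral sketch of a multiported memory steered by a table
    that records which write port last wrote each address."""

    def __init__(self, depth: int, write_ports: int, read_ports: int):
        # One bank per write port. Each bank holds one replica per read
        # port, modelling memories with one write and one read port each
        # (write_ports * read_ports such memories in this model).
        self.banks = [[[0] * depth for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # Narrow table: per address, the index of the last writing port.
        self.lvt = [0] * depth

    def cycle(self, writes: dict, reads: dict) -> dict:
        """writes: {write_port: (addr, data)}, reads: {read_port: addr}.
        Assumption: reads return the state before this cycle's writes."""
        results = {}
        for rp, addr in reads.items():
            bank = self.lvt[addr]          # which port wrote here last
            results[rp] = self.banks[bank][rp][addr]
        # Assumption: if two ports write the same address in one cycle,
        # the higher-numbered port wins. Real designs may leave this
        # case undefined.
        for wp in sorted(writes):
            addr, data = writes[wp]
            for replica in self.banks[wp]:  # keep all replicas in sync
                replica[addr] = data
            self.lvt[addr] = wp
        return results


mem = LVTMemory(depth=16, write_ports=2, read_ports=4)
mem.cycle(writes={0: (3, 0xAA), 1: (5, 0xBB)}, reads={})
first = mem.cycle(writes={1: (3, 0xCC)}, reads={0: 3, 1: 5, 2: 3, 3: 0})
second = mem.cycle(writes={}, reads={0: 3})
print(first, second)
```

In the usage lines, two ports write different addresses in the same cycle, then four reads proceed at once while port 1 overwrites address 3. The last read is steered to bank 1 because the table now names port 1 as the most recent writer of that address.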
From: John_H on 22 Apr 2010 18:45

I appreciate the elaboration here in the newsgroup.

The "true dual port" nature of the BlockRAMs allows one independent address on each of the two ports with a separate write enable for each port. The behavior of the BlockRAM can be modified to provide read data based on the new write data, old data, or no change in the read data value from last cycle (particularly helpful for multi-pumping).

For an M write, N read memory, your approach appears to need M x (N+1) memories since you can have M writes all happening at the same time N accesses are made to the same "most recently written" memory. Please correct me if I'm wrong. This is the same number of memories required with the XOR approach but without the LVT overhead. The time delay in reading the LVT and multiplexing the memories feels like it would be cumbersome. While this might not add "wait states" it appears the system would not be able to run terribly quickly. XORs are pretty quick.

There are always more ways to approach a problem than any one group can come up with. Kudos on your effort to bring a better approach to a tough system level issue for difficult designs.
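
The thread names the XOR approach without spelling it out. One common reading of the idea, offered here as an assumption rather than as what either poster had in mind, is: keep one bank per write port, have a write store the new value XORed with the other banks' current contents at that address, and have a read XOR all banks together. The class `XORMemory` and its methods are hypothetical, and the replication of banks needed to give every port its own physical read port is deliberately not modelled.

```python
from functools import reduce
from operator import xor


class XORMemory:
    """Behavioral sketch of an XOR-encoded multi-write memory
    (an assumed scheme, not taken from the thread or the paper)."""

    def __init__(self, depth: int, write_ports: int):
        # One bank per write port; bank replication is not modelled.
        self.banks = [[0] * depth for _ in range(write_ports)]

    def read(self, addr: int) -> int:
        # The stored value is the XOR of every bank at this address.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def cycle(self, writes: dict, reads: dict) -> dict:
        """writes: {write_port: (addr, data)}, reads: {read_port: addr}.
        Assumptions: reads see the state before this cycle's writes, and
        no two ports write the same address in the same cycle."""
        results = {rp: self.read(addr) for rp, addr in reads.items()}
        pending = []
        for wp, (addr, data) in writes.items():
            others = reduce(
                xor,
                (b[addr] for k, b in enumerate(self.banks) if k != wp),
                0,
            )
            # Encode so that XOR over all banks yields `data`.
            pending.append((wp, addr, data ^ others))
        for wp, addr, encoded in pending:
            self.banks[wp][addr] = encoded
        return results


mem = XORMemory(depth=16, write_ports=2)
mem.cycle(writes={0: (3, 0xAA), 1: (5, 0xBB)}, reads={})
mem.cycle(writes={1: (3, 0xCC)}, reads={})
print(hex(mem.read(3)), hex(mem.read(5)))
```

In this model a read needs no table lookup and no steering multiplexer, only an XOR across banks, which is presumably the sense of "XORs are pretty quick". The cost shows up on the write side, where each write must first read the other banks at the target address.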