From: Martin Schoeberl on 27 Aug 2006 13:33

>> BTW: As I'm also academic I should/have to publish papers. SimpCon
>> is on my list for months to be published - and now it seems to be
>> the right time. I will write a draft of the paper in the next few
>> days. If you are interested I'll post a link to it in this thread
>> and your comments are very welcome.
>>
> OK.
>

I've uploaded the first draft of the paper:
http://www.jopdesign.com/doc/simpcon_date.pdf

It's still very similar to the SimpCon definition
at opencores.org. Comments are very welcome.

To KJ and Tommy: I've added you both in the
Acknowledgments. I hope this is ok with you ;-)

Martin
From: Tommy Thorn on 27 Aug 2006 19:47

Martin Schoeberl wrote:
> I've uploaded the first draft of the paper:
> http://www.jopdesign.com/doc/simpcon_date.pdf
>
> It's still very similar to the SimpCon definition
> at opencores.org. Comments are very welcome.
>
> To KJ and Tommy: I've added you both in the
> Acknowledgments. I hope this is ok with you ;-)

I fail to see what's "controversial" about our discussion. Maybe you
meant "heated".

As a reviewer (I used to be academic) I'd be very concerned that you
fail to disclose that SimpCon is single-master only, and thus not a
full alternative to the others listed.

Tommy
From: Martin Schoeberl on 28 Aug 2006 03:50

> Martin Schoeberl wrote:
>> I've uploaded the first draft of the paper:
>> http://www.jopdesign.com/doc/simpcon_date.pdf
>>
>> It's still very similar to the SimpCon definition
>> at opencores.org. Comments are very welcome.
>>
>> To KJ and Tommy: I've added you both in the
>> Acknowledgments. I hope this is ok with you ;-)
>
> I fail to see what's "controversial" about our discussion. Maybe you
> meant "heated".

To my mind the discussion with KJ is a little bit 'controversial', but
I mean that in a positive sense. However, I can remove this ;-)

> As a reviewer (I used to be academic) I'd be very concerned that you
> fail to disclose that SimpCon is single-master only, and thus not a
> full alternative to the others listed.

And that's still our little controversy. I agree that SimpCon is not
ideal for several outstanding requests from a single master. However,
I don't see an issue for a multi-master system with SimpCon. It's
possible with Wishbone and SimpCon is 'more expressive' than Wishbone,
right?

Martin
From: KJ on 29 Aug 2006 07:23

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44f1d74f$0$12642$3b214f66(a)tunews.univie.ac.at...
>>> BTW: As I'm also academic I should/have to publish papers. SimpCon
>>> is on my list for months to be published - and now it seems to be
>>> the right time. I will write a draft of the paper in the next few
>>> days. If you are interested I'll post a link to it in this thread
>>> and your comments are very welcome.
>>>
>> OK.
>>
> I've uploaded the first draft of the paper:
> http://www.jopdesign.com/doc/simpcon_date.pdf
>
> It's still very similar to the SimpCon definition
> at opencores.org. Comments are very welcome.

Mostly minor comments.

Section 1.1.1 Avalon

After the question "How is the great complexity handled?" you answered
with "The switch fabric is in charge to connect those different devices
and perform the necessary conversion". This is generally not quite the
case. It is up to the Avalon slave design to perform the necessary
conversions (I believe this is referring to whether this is a 'simple'
connection or a pipeline with variable latency), with the Avalon fabric
responsible for the connections between the devices (as you said). I
think the SOPC Builder tool and the various components that Altera
bundles in with SOPC Builder are giving the impression that it is the
fabric, but my interpretation is that SOPC Builder is allowing you to
easily create slave components that can be used to interface to
external parts without even writing a single line of VHDL as you at
first did.

That leads to the last question of "Who provides this switch fabric? Is
it proprietary Altera design, or are there open source implementations
available?" The switch fabric itself can easily be written; it is on
the order of six lines of code per interface for a point to point
connection. There is nothing really magic in what Altera spits out of
SOPC Builder based on the Avalon bus definition.
I don't think it would be difficult to create an open source version of
this connection logic, but whether simple use of the Avalon bus without
also targeting an Altera device (even if no Altera software is
involved) is violating anything is an open legal question, as you've
pointed out (I'm guessing that it might, but I'm not really sure).

Section 2

In the paragraph starting "The third issue is..." you ask the question
"Why not force the slaves to hold the data on the output registers as
long as there is no new transaction?" A couple of follow-up questions
to that, though, would be:
- What is a SimpCon slave to do if there IS a new transaction before
  the old one has been acknowledged?
- Does the SimpCon fabric prevent this from happening? (I think it
  does, but not exactly sure)

Other comments:

Personally, I'm still a bit confused about just exactly which pipeline
on the master side is 'released' when rdy_cnt hits the magic number and
how that differs from releasing the Avalon master address/command
pipeline via waitrequest and releasing the Avalon master readdata
pipeline via readdatavalid. With the JOP example we've been bandying
about I accept that it's not totally the master address/command that is
being released but 'something' on the JOP master for kick starting the
data processing pipeline based on the read data actually becoming
available (and getting 'early' notice of this). Maybe my last post
regarding the use of the Avalon waitrequest signal to generate the
SimpCon early rdy_cnt signal came too late to make it to the presses,
but I think that addresses the perceived Avalon issue.
Although, now that I think I understand how rdy_cnt works and where it
really comes from, my Avalon master side change might again seem to be
cheating, since it involved having the master 'know' the latency of the
'slave' (this wasn't required by the SimpCon implementation as I had
thought). But I guess I'll fall back to what Tommy mentioned earlier:
since JOP was optimized for SimpCon in the first place, an
Avalon/SimpCon bridge must be built, and such bridges can tend to be
either less than optimal in performance (as your performance numbers
indicate) or involve a bit of cheating to get maximum performance (as
my change would be). Had JOP been optimized for Avalon to begin with,
would the numbers be any better without any cheating? That's sort of
the open question and I'm not necessarily expecting an answer.

My main interest in the thread was in understanding what sort of
bottlenecks might be lurking in Avalon that I'm not aware of. A couple
areas of 'possible' weakness for Avalon that I am aware of are:
- It can hang (there is no requirement for a max time for a cycle).
  Something to be aware of, but generally not an issue since whoever is
  designing the slave components had better address this and not allow
  for a hang to occur.
- No notion of a 'retry'. Again, given the environment of being on a
  chip, the slave design shouldn't be allowed to say 'try again later
  please' so I don't think this should be an actual design issue, just
  something to be aware of.
- Can't have pending reads from multiple slaves. I suppose this could
  be important to some, it hasn't been for me.

Another possible area of weakness 'might' be whatever it is that is
hindering performance of JOP on Avalon as compared to JOP on SimpCon.
If fundamentally it's just the SimpCon/Avalon bridge, and a native JOP
on Avalon would not require any cheating and would meet performance,
then there is no other issue.
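To make the rdy_cnt 'early notice' discussed above a bit more concrete,
here is a minimal Python sketch. It is not taken from the SimpCon
sources or from the paper. It assumes one reading of the 2-bit signal:
0 means the read data is valid, 1 and 2 mean one or two cycles to go,
and 3 means three or more. The names SimpConLikeSlave and trace_read,
the fixed latency, and the 'early' threshold are all hypothetical and
only there for illustration.

```python
from dataclasses import dataclass

# Assumption: a 2-bit rdy_cnt saturates at 3, read as "three or more cycles left".
RDY_CNT_MAX = 3


@dataclass
class SimpConLikeSlave:
    """Hypothetical slave with a fixed read latency (illustration only)."""
    latency: int        # cycles from the request until data is valid
    remaining: int = 0  # cycles left for the pending read

    def request(self) -> None:
        # A new read starts the countdown from the full latency.
        self.remaining = self.latency

    def tick(self) -> None:
        # One clock cycle passes.
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def rdy_cnt(self) -> int:
        # The slave can only report what fits into two bits.
        return min(self.remaining, RDY_CNT_MAX)


def trace_read(latency: int, early: int):
    """Simulate one read.

    Returns (rdy_cnt seen in each cycle,
             first cycle with rdy_cnt <= early,
             cycle in which rdy_cnt == 0, i.e. data valid).
    """
    slave = SimpConLikeSlave(latency)
    slave.request()
    trace = []
    notified_at = None
    cycle = 0
    while True:
        cnt = slave.rdy_cnt
        trace.append(cnt)
        if notified_at is None and cnt <= early:
            notified_at = cycle  # master may start its own pipeline here
        if cnt == 0:
            return trace, notified_at, cycle
        slave.tick()
        cycle += 1


if __name__ == "__main__":
    print(trace_read(latency=5, early=1))
```

In this model a master that reacts at rdy_cnt <= 1 gets its notice one
cycle before the data is valid, and a master that reacts at rdy_cnt <= 2
gets it two cycles before. That is the 'kick start' of the data
processing pipeline mentioned above, expressed with a single signal.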
Just curious, Figure 7 shows pipelined reads from a static RAM where a new address is presented on every other clock cycle which matches the performance of the actual SRAM. If the device had been an external DRAM/DDR or such, you can clock out a new read command on every clock cycle even before you get the first data item back. After the data from that first read does eventually come back, the data from the subsequent reads will also come back on consecutive clock cycles. Does SimpCon support that sort of device? Or would it have to take multiple clock cycles per read? At first glance since the slave has only the one 'rdy_cnt' it would appear that it would not whereas the mythical "native Avalon interface JOP" would be able to hit the DRAM at a much higher rate. But maybe it could by varying the
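
The read-rate question above is mostly arithmetic, so here is a small
Python sketch of it. It assumes a fixed latency to the first data word,
in-order return, and one data word per issue slot. The function
total_cycles and the numbers used are hypothetical, not measurements
from JOP, SimpCon or Avalon.

```python
def total_cycles(n_reads: int, latency: int, issue_interval: int) -> int:
    """Cycles from the first read command until the last data word returns.

    latency:        cycles from a command to its data (assumed constant)
    issue_interval: cycles between consecutive read commands
                    (1 = a new command every clock, 2 = every other clock,
                     latency = no overlap at all)
    """
    if n_reads <= 0:
        return 0
    # The first word arrives after `latency` cycles. Every further word
    # trails the previous one by one issue interval.
    return latency + (n_reads - 1) * issue_interval


if __name__ == "__main__":
    n, lat = 8, 6  # hypothetical burst length and latency
    print("new command every clock:      ", total_cycles(n, lat, 1))
    print("new command every other clock:", total_cycles(n, lat, 2))
    print("one read at a time:           ", total_cycles(n, lat, lat))
```

With these made-up numbers the fully overlapped case needs 13 cycles,
the every-other-clock case 20, and the non-overlapped case 48, which is
why the number of outstanding reads a bus allows matters for DRAM-like
devices.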
From: Martin Schoeberl on 29 Aug 2006 20:50

> After the question "How is the great complexity handled?" You answered
> with "The switch fabric is in charge to connect those different devices
> and perform the necessary conversion" This is generally not quite the
> case. It is up to the Avalon slave design to perform the necessary
> conversions (I

Mmh, as far as I understand it you can provide any master or slave
implementation that follows the rules of Avalon: from a very simple
asynchronous device to a pipelining device. The generated switch fabric
will provide the adaptation.

> proprietary Altera design, or are there open source implementations
> available?" The switch fabric itself can easily be written, it is on
> the order of six lines of code per interface for a point to point
> connection, there is nothing really magic in what Altera spits out of
> SOPC Builder based

I think that this is actually the power of SOPC Builder. It will do all
your glue logic stuff for the interconnect (address decoding, byte
order managing, byte enable on write, ...). And that's a lot more than
6 lines of code ;-)

> an open source version of this connection logic, but whether simple
> use of the Avalon bus without also targeting an Altera device (even if
> no Altera software is involved) is violating anything is an open legal
> question as you've pointed out (I'm guessing that it might but not
> really sure).

AFAIK the bus definition is kind of open-source - free. However, I'm
sure you're not allowed to use the SOPC Builder generated VHDL code on
a Xilinx device ;-)

BTW: I asked Altera Austria about a related topic: Is it allowed to
'use' the DRAM controller in an open-source environment (meaning, can I
upload the VHDL code to a web server)? However, they had no real answer
to this. They said that the SDRAM controller is part of NIOS and only
works with NIOS. Therefore, one has to buy a NIOS license. But it works
quite well with JOP too ;-)

> Section 2
> In the paragraph starting "The third issue is..."
> you ask the question "Why not force the slaves to hold the data on the
> output registers as long as there is no new transaction?" A couple
> follow up questions to that though would be
> - What is a SimpCon slave to do if there IS a new transaction before
> the old one has been acknowledged?

Good question ;-) It depends on the pipeline level. It can accept it.
But this is not directly related to the request to 'just' keep the data
valid until newer data is available (and was requested).

> - Does the SimpCon fabric prevent this from happening? (I think it
> does, but not exactly sure)

It is the master who decides when to issue a new request and when to
leave the slave with the last data.

> Personally, I'm still a bit confused about just exactly which pipeline
> on the master side is 'released' when rdy_cnt hits the magic number
> and how that differs from releasing the Avalon master address/command
> pipeline via waitrequest and releasing the Avalon master readdata
> pipeline via

From the pure pipelining point of view there is not so much difference
between Avalon and SimpCon. Both can do pipelining when the slave
supports it. The pipeline level in Avalon is not restricted (and not so
obvious to see). In SimpCon with the 2 bit rdy_cnt the pipeline level
is restricted.

> it's not totally the master address/command that is being released but
> 'something' on the JOP master for kick starting the data processing
> pipeline based on the read data actually becoming available (and
> getting 'early' notice of this).

The early notice is the little thing I like about my design: it helps a
pipelined master waiting on data *and* performs the flow control for
the pipelining. Perhaps too much information for a single 2 bit signal.
However, I like fewer signals. See the OPB spec.
for an extremely different way to do it: they have way too many signals
defined - I don't like to read so many signal definitions ;-)

> having the master 'know' the latency about the 'slave' would seem to
> be cheating (since this wasn't required by the SimpCon implementation
> as I had thought) but I guess I'll fall back to what Tommy mentioned
> earlier that since JOP was optimized for SimpCon in the first place it
> implies that an Avalon/SimpCon bridge must be built and such bridges
> can tend to be either

About cheating and bridges: I'm already cheating on the Avalon
interface (as mentioned in an earlier post) to generate the
address/control/data holding in the master - I switch between the
original single cycle register in the first cycle and a hold register
in the following cycles - that's not allowed in the original spec.
Bridges are a difficult topic: in the general case you add latency
cycles.

> be). Had JOP been optimized for Avalon to begin with would the numbers
> be any better without any cheating? That's sort of the open question
> and I'm not necessarily expecting an answer.

Mmh, hard to say. I implemented the memory interface without any SoC
interconnect in mind - just tried to get the best performance on SRAMs.
The bus thing came very late in the design. So it looks like I'm now
defining a bus that 'fits' the way the original memory interface of JOP
was. However, perhaps that's not that bad, as a new idea for a
different bus comes out of it ;-)

> My main interest in the thread was in understanding what sort of
> bottlenecks might be lurking in Avalon that I'm not aware of. A couple
> areas of 'possible' weakness for Avalon that I am aware of are:
> - It can hang (there is no requirement for a max time for a cycle).
> Something to be aware of, but generally not an issue since whoever is
> designing the slave components had better address this and not allow
> for a hang to occur.
> - No notion of a 'retry'.
> Again, given the environment of being on a chip, the slave design
> shouldn't be allowed to say 'try again later please' so I don't think
> this should be an actual design issue, just something to be aware of.

Implementing those two features makes your bus (and interfacing) way
more complicated. AFAIK OPB does it, but you end up with so many
signals...

> - Can't have pending reads from multiple slaves. I suppose this could
> be important to some, it hasn't for me.

That's more an issue of the interconnect logic and not the bus
definition.

>
> Just curious, Figure 7 shows pipelined reads from a static RAM where a
> new address is presented on every ot