From: KJ on 24 Aug 2006 06:36

"Tommy Thorn" <tommy.thorn(a)gmail.com> wrote in message news:1156359725.967272.219130(a)74g2000cwt.googlegroups.com...
> A quick answer for this one:
>
>> rdy_cnt=1 sounds like it is allowing JOP on SimpCon to start up the
>> next transaction (read/write or twiddle thumbs) one clock cycle before
>> the read data is actually available. But how is that different than
>> the Avalon slave dropping wait request one clock cycle before the data
>> is available and then asserting read data valid once the data actually
>> is available?
>
> The signal waitrequest has nothing to do with the output, but is a
> property of the input. What you're suggesting is an "abuse" of Avalon
> and would only work for slaves that support only one outstanding
> transfer with a latency of exactly one. Clearly incompatible with
> existing Avalon components.

Not at all an abuse of Avalon. In fact it is the way waitrequest is intended to be used. I'm not quite sure what you're referring to by input and output when you say "nothing to do with the output, but is a property of the input", but what waitrequest is all about is signalling the end of the 'address phase' of the transaction, where the 'address phase' is the clock cycle(s) in which read and/or write are asserted along with address and writedata (if write is asserted).

Waitrequest is an output from a slave component that, when asserted, signals the Avalon fabric that the address and command inputs (and writedata if performing a write) need to be held for another clock cycle. Once the slave component no longer needs the address and command inputs it can drop waitrequest, even if it has not actually completed the transaction.
The Avalon fabric 'almost' passes the waitrequest signal right back to the master device, the only change being that the Avalon logic basically gates the slave's waitrequest output with the slave's chipselect input (which the Avalon fabric creates) to form the master's waitrequest input (assuming a simple single master/slave connection for simplicity here).

Per Avalon, when an Avalon master sees its waitrequest input asserted it simply must not change the state of the address, read, write or writedata outputs on that particular clock cycle. When the Avalon master is performing a read or write and sees waitrequest not asserted, it is free to start up another transaction on the next clock cycle. In particular, if the first transaction was a read, this means that the 'next' transaction can be started even though the data has not yet been returned from the first read.

For a slave device that has a readdatavalid output signal, Avalon does not define any min/max time for when readdatavalid must come back, just that for each read that has been accepted by the slave (i.e. one with read asserted, waitrequest not asserted) there must be exactly one cycle with readdatavalid asserted, flagging the readdata output as having valid data.

During a read, Avalon allows the delay between the clock cycle with "read and not(waitrequest)" and the eventual clock cycle with "readdatavalid" to be either fixed or variable. If fixed, then SOPC Builder allows the fixed latency number to be entered into the class.ptf file for the slave and no readdatavalid output from the slave is required. All that does, though, is cause SOPC Builder to synthesize the logic itself to generate readdatavalid as if it came from the slave code itself. If the readdatavalid output IS part of the component then SOPC Builder allows the latency delay to be variable; whether it actually is or not is up to the slave's VHDL/Verilog design code.
Bottom line is that Avalon does have a mechanism built right into the basic specification that allows a master device to start up another read or write cycle one clock cycle prior to readdata actually having been provided.

Given the description that Martin posted of how his SimpCon interface logic works, it 'appears' that he believes this ability to start up another cycle prior to completion (meaning the data from the read has actually been returned) is what is giving SimpCon the edge over Avalon. At least that's how it appears to me, which is why I asked him to walk me through the transaction to find where I'm missing something. My basic confusion is not understanding just exactly where in the read transaction SimpCon 'pulls ahead' of Avalon and gives 'JOP on SimpCon' the performance edge over 'JOP on Avalon'.

Anyway, hopefully that explains why it's not abusing Avalon in any way.

KJ
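The handshake KJ describes can be illustrated with a small cycle-based model. This is a hypothetical sketch in Python, not Altera's fabric logic: a slave that ends the address phase immediately (waitrequest low) and returns readdata a fixed number of cycles later via readdatavalid, so the master can issue a second read before the first read's data arrives.

```python
# Toy cycle-based model of an Avalon-style latent read (illustrative only;
# signal names follow the spec, the slave behaviour is invented for the example).

class LatentSlave:
    """Accepts a read every cycle (waitrequest stays low) and returns
    data `latency` cycles later, flagged by readdatavalid."""
    def __init__(self, latency):
        self.latency = latency
        self.pending = []  # list of (cycles_remaining, data)

    def cycle(self, read, address):
        # Age the in-flight reads; one matures when its counter hits zero.
        self.pending = [(c - 1, d) for c, d in self.pending]
        done = [d for c, d in self.pending if c == 0]
        self.pending = [(c, d) for c, d in self.pending if c > 0]
        waitrequest = False                  # address phase ends this cycle
        if read and not waitrequest:
            self.pending.append((self.latency, address * 10))  # fake readdata
        readdatavalid = bool(done)
        readdata = done[0] if done else None
        return waitrequest, readdatavalid, readdata

slave = LatentSlave(latency=2)
log = []
for t in range(6):
    read = t in (0, 1)   # master issues two back-to-back reads at t=0 and t=1
    wr, valid, data = slave.cycle(read, address=t)
    log.append((t, valid, data))
# The second read is accepted at t=1, before the first read's data
# arrives at t=2 -- the overlap KJ is pointing at.
```

Note the second address phase completes while the first data phase is still outstanding, which is exactly the "start another cycle before readdata is provided" behaviour the post attributes to waitrequest plus readdatavalid.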
From: Tommy Thorn on 24 Aug 2006 13:03

KJ wrote:
..... a (AFAICT) correct description of Avalon.

> During a read, Avalon allows the delay between the clock cycle with "read
> and not(waitrequest)" and the eventual clock cycle with "readdatavalid" to
> be either fixed or variable. [...] Bottom line is that Avalon does have a
> mechanism built right into the basic specification that allows a master
> device to start up another read or write cycle one clock cycle prior to
> readdata actually having been provided.

Ah, we only differ in perspective. Yes, Avalon _allows_ you to write slaves like that, and if your fabric consists only of such slaves, then yes, they are the same. But variable latency does _not_ work like that, thus you can't make such an assumption in general if you wish the fabric to be able to accommodate arbitrary Avalon slaves.

> Given the description that Martin posted on how his SimpCon interface logic
> works it 'appears' that he believes that this ability to start up another
> cycle prior to completion (meaning the data from the read has actually been
> returned) is what is giving SimpCon the edge over Avalon. [...] My basic
> confusion is not understanding just exactly where in the read transaction
> does SimpCon 'pull ahead' of Avalon and give 'JOP on SimpCon' the
> performance edge over 'JOP on Avalon'.
That was not my understanding. SimpCon allows Martin to get an "early warning" that a transaction is about to complete. As I mentioned, this is not an uncommon idea and it works great for point-to-point interfaces. My claim is that it doesn't scale if you wish to use SimpCon as a general-purpose fabric like Avalon. Being able to "start up another cycle prior to completion" is what I mean by multiple outstanding requests (known as "posted reads" in PCI lingo). It is definitely a feature of Avalon.

> Anyway, hopefully that explains why it's not abusing Avalon in any way.

My wording was poor. Another way to say it is "to use Avalon in a constrained way". Used this way you cannot hook up slaves with variable latency, so it's not really Avalon, it's a subset of Avalon.

Cheers,
Tommy
From: KJ on 24 Aug 2006 15:32

Tommy Thorn wrote:
> KJ wrote:
> .... a (AFAICT) correct description of Avalon.
>
> Ah, we only differ in perspective. Yes, Avalon _allows_ you to write
> slaves like that

Umm, yeah, it's defined up front in the spec and not off in some corner like Wishbone's tag method either.

> and if your fabric consists only of such slaves, then
> yes, they are the same.

What is the same as what? Also, there is no restriction about having latency-aware masters and slaves.

> But variable latency does _not_ work like that,

How do you think it works? I've been using the term 'variable latency' as it is used by Avalon, which is that there can be an arbitrary delay between the end of the address phase (i.e. when waitrequest is not asserted to the master) and the end of the data phase (i.e. when readdatavalid is asserted to the master).

> thus you can't make such an assumption in general if you wish the fabric
> to be able to accommodate arbitrary Avalon slaves.

What assumption do you think I'm making? The Avalon fabric can connect any mix of Avalon slaves, whether they are fixed latency, variable latency or no latency (i.e. controlled by waitrequest). Furthermore it can be connected to an Avalon master that is 'latency aware' (i.e. has a 'readdatavalid' input) or one that is not (i.e. does not have 'readdatavalid' as an input, so cycles are controlled only by 'waitrequest'). You get different performance based on which method is used, but that is a design choice on the master and slave side, not something that Avalon is doing anything to help or hinder.

> That was not my understanding. SimpCon allows Martin to get an "early
> warning" that a transaction is about to complete.

And what happens as a result of this 'early warning'? I *thought* it allowed the JOP Avalon master to start up another transaction of some sort. If so, then that can be accomplished with waitrequest and readdatavalid.
But maybe it's something on the data path side that gets the jump that I'm just not seeing yet.

>> Anyway, hopefully that explains why it's not abusing Avalon in any way.
>
> My wording was poor. Another way to say it is "to use Avalon in a
> constrained way".

I'm not clear on what constraint you're seeing in the usage.

> Used this way you cannot hook up slaves with variable
> latency, so it's not really Avalon, it's a subset of Avalon.

If anything, choosing not to use the readdatavalid signal in the master or slave design to allow completion of the address phase prior to the data phase is the subset, not the other way around.

KJ
From: Martin Schoeberl on 24 Aug 2006 16:44

Hi Kevin,

now I know more from your name than KJ ;-)

>> My pipeline approach is just this little funny busy counter
>> instead of a single ack, and a slave has to declare its
>> pipeline level (0 to 3). Level 1 is almost always possible.
>> It's more or less for free in a slave. Level 1 means that
>> the master can issue the next read/write command in the same
>> cycle when the data is available (rdy_cnt=0). Level 2 means
>> issue the next command one cycle earlier (rdy_cnt=1). Still
>> not a big issue for a slave (especially for a memory slave
>> where you need a little state machine anyway).
>
> I'm assuming that the master side address and command signals enter the
> 'Simpcon' bus and the 'Avalon' bus on the same clock cycle. Maybe this
> assumption is where my hang up is and maybe JOP on Simpcon is getting a
> 'head start' over JOP on Avalon.

This assumption is true. Address and command (+ write data) are issued in the same cycle - no magic there. In SimpCon this is a single-cycle thing and there is no ack or busy signal involved in this first cycle. That means no combinatorial generation of ack or busy, and no combinatorial reaction of the master in the first cycle. What I lose with SimpCon is a single-cycle latency access. However, I think this is not too much to give up for easier pipelining of the arbitration/data-in MUX.

> Given that assumption though, it's not clear to me why the address and
> command could not be designed to also end up at the actual memory
> device on the same clock cycle. Again, maybe this is where my hang up
> is.

The register that holds the address is probably an ALU result register (or in my case the top-of-stack). That one is usually buried deep in the design. Additionally, you have to generate your slave selection (chip select) from that address. This ends up with some logic and long routing paths to the pins. In a practical example with the Cyclone, 6-7 ns is not uncommon.
Almost one cycle at 100 MHz. Furthermore, this delay is not easy to control in your design - add another slave and the output delay changes. To avoid this unpredictability one will add a register at the IO pad for address and rd/wr/cs. If we agree on this additional register at the slave/memory interface, we can drop the requirement on the master to hold the address and control longer than one cycle. Furthermore, as we have this minimum one-cycle latency from master command till address/rd/wr/data on the pins, we do not need an ack/busy indication during this command cycle. We just say to the master: in the cycle that follows your command you will get the information about ready or wait.

> Given that address and command end up at the memory device on the same
> clock cycle whether SimpCon or Avalon, the resulting read data would
> then be valid and returned to the SimpCon/Avalon memory interface logic
> on the same clock cycle. Pretty sure this is correct since this is
> just saying that the external memory performance is the same, which it
> should be since it does not depend on SimpCon or Avalon.

In SimpCon it will definitely arrive one cycle later. With Avalon (and the generated memory interface) I 'assume' that there is also one cycle latency - I read this from the tco values of the output pins in the Quartus timing analyzer report. For the SRAM interface I did in VHDL I explicitly added registers at the address/rd/wr/data outputs. I don't know if the switch fabric adds another cycle. Probably not, if you do not check the pipelined checkbox in SOPC Builder.

> Given all of that, it's not clear to me why the actual returned data
> would show up on the SimpCon bus ahead of Avalon or how it would be any
> slower getting back to the SimpCon or Avalon master. Again, this might
> be where my hangup is, but if my assumptions have been correct up to
> this paragraph then I think the real issue is not here but in the next
> paragraph.

Completely agree.
The read data should arrive in the same cycle from Avalon or SimpCon to the master. Now that's the point where this rdy_cnt comes into play. In my master (JOP) I can take advantage of the early knowledge of when data will arrive. I can restart my waiting pipeline earlier with this information. This is probably the main performance difference.

Going through my VHDL code for the Avalon interface I found one more issue with the JOP/Avalon interface: in JOP I issue read/write commands and continue to execute microcode if possible. Only when the result is needed does the main pipeline wait for the slave result. However, the slave can deliver the result earlier than needed. In that case the slave has to hold the data for JOP. The Avalon specification guarantees the read data valid only for a single cycle. So I added a register to hold the data and got one cycle latency:

* one register at the input pins for the read data
* one register at the JOP/Avalon interface to hold the data longer than one cycle

As I see it, this can be enhanced in the same way I did the little Avalon specification violation on the master side: use a MUX to deliver the data from the input register in the first cycle and switch to the 'hold' register for the other cycles. I should change the interface for a fairer comparison. Thanks for pointing me to this :-)

> If I got through this far then it comes down to.... You say "Level 1
> means that the master can issue the next read/write command in the same
> cycle when the data is available (rdy_cnt=0). Level 2 means issue the
> next command one cycle earlier (rdy_cnt=1)." and presumably the
> 'rdy_cnt=1' is the reason for the better SimpCon numbers. Where I'm
> pretty sure I'm hung up then is why can't the Avalon slave drop the
> wait request output on the clock cycle that corresponds to rdy_cnt=1
> (i.e. one before data is available at the master)?

Because rdy_cnt has a different meaning than waitrequest. It is more like an early datavalid.
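The MUX enhancement described above (deliver readdata straight from the input register in the cycle it arrives, then replay it from the hold register) can be sketched as a tiny Python model. This is an illustrative guess at the behaviour, not JOP's actual VHDL; the names are made up for the example.

```python
# Hypothetical model of the input-register / hold-register MUX:
# in the cycle readdatavalid is high the data bypasses straight through
# (no extra latency), and is simultaneously captured so it can be
# replayed in later cycles until the master consumes it.

def make_interface():
    state = {"hold": None}
    def cycle(readdatavalid, readdata):
        if readdatavalid:
            state["hold"] = readdata   # capture for later cycles
            return readdata            # bypass path: saves one cycle
        return state["hold"]           # replay held data afterwards
    return cycle

iface = make_interface()
first = iface(readdatavalid=True, readdata=42)   # forwarded directly
later = iface(readdatavalid=False, readdata=0)   # replayed from hold register
```

The point of the bypass branch is exactly the one-cycle saving Martin mentions: without it, every read pays for the hold register even when the pipeline is ready to consume the data immediately.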
Dropping waitrequest does not help with my pipeline restart thing.

> rdy_cnt=1 sounds like it is allowing JOP on SimpCon to start up the
> next transaction (read/write or twiddle thumbs) one clock cycle before
> the read data is actually available. But how is that different than

As above: the main thing is to get the master pipeline started early to use the read data. Perhaps this is a special design feature of JOP and not usable in a di
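The rdy_cnt behaviour discussed in this post can be sketched as a toy countdown model. This is an assumption-laden illustration pieced together from the thread (rdy_cnt counts down toward 0, data is valid at rdy_cnt=0, and a pipeline-level-2 master may restart at rdy_cnt=1); the class and names are invented, not SimpCon's actual definition.

```python
# Toy model of a SimpCon-style slave's rdy_cnt (illustrative only).

class SimpConSlave:
    def __init__(self, read_latency=3):
        self.read_latency = read_latency
        self.rdy_cnt = 0

    def command(self):
        # Single-cycle address/command phase: no ack/busy involved here,
        # the countdown simply starts in the following cycle.
        self.rdy_cnt = self.read_latency

    def cycle(self):
        if self.rdy_cnt > 0:
            self.rdy_cnt -= 1
        return min(self.rdy_cnt, 3)   # rdy_cnt is reported saturated at 3

slave = SimpConSlave(read_latency=3)
slave.command()
trace = [slave.cycle() for _ in range(4)]
# A level-2 master restarts its pipeline when it sees rdy_cnt=1,
# one cycle before the data is valid at rdy_cnt=0.
restart_cycle = trace.index(1)
```

Note how the master gets a multi-cycle "early warning" (2, then 1) rather than a single ack, which is what lets JOP refill its pipeline before the data actually lands.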
From: Martin Schoeberl on 24 Aug 2006 17:13
> the input" but what waitrequest is all about is to signal the end of the
> 'address phase' of the transaction where 'address phase' are the clock
> cycle(s) where read and/or write are asserted along with address and
> writedata (if write is asserted).

If we could agree on slaves that don't need address/write data/commands for more than one cycle, we could completely eliminate the waitrequest ;-) Let's say the address/command phase is per definition one cycle. That definition frees the master to do whatever it wants in the next cycle. For another request to the same slave it has to watch the rdy_cnt in SimpCon. However, you can design a switch fabric with SimpCon where it is legal to issue a command to a different slave in the next cycle without paying attention to the first slave. You can just ignore the first slave's output until you want to use it.

> The Avalon fabric 'almost' passes the waitrequest signal right back to the
> master device, the only change being that the Avalon logic basically gates
> the slave's waitrequest output with the slave's chipselect input (which the
> Avalon fabric creates) to form the master's waitrequest input (assuming a
> simple single master/slave connection for simplicity here). Per Avalon,

I'm repeating myself ;-) That's the point I don't like in Avalon, Wishbone, OPB, ...: you have a combinatorial path from address register - decoding - slave decision - master decision (to hold address/command or not). With a few slaves this will not be an issue. With more slaves or a more complicated interconnect (multiple masters) this can be your critical path.

BTW: AMBA APB is an exception: it also delivers the ready decision (PREADY) in the following cycle. But AMBA APB still forces the master to hold address/command till PREADY. AMBA AHB is a little different: there is still an address and data phase, but they can overlap. On a wait request the address and data have to be held by the master (although in the basic transfer this is not necessary).
A little bit confusing...

> Given the description that Martin posted on how his SimpCon interface logic
> works it 'appears' that he believes that this ability to start up another
> cycle prior to completion (meaning the data from the read has actually been
> returned) is what is giving SimpCon the edge over Avalon. At least that's

No, that's not the difference. I agree that for fully pipelined transactions (e.g. a cache line read) both busses should give you absolutely the same performance.

> how it appears to me, which is why I asked him to walk me through the
> transaction to find where I'm missing something. My basic confusion is not
> understanding just exactly where in the read transaction does SimpCon 'pull
> ahead' of Avalon and give 'JOP on SimpCon' the performance edge over 'JOP on
> Avalon'.

As described in the other posting:

a.) the early pipeline restart
b.) the additional cycle in the Avalon interface for the register that holds the data for the master (should be enhanced)

Martin