From: KJ on 12 Aug 2006 14:04

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message news:44ddb2d4$0$8024$3b214f66(a)tunews.univie.ac.at...
> The Avalon bus is very flexible. Therefore, writing a slave or
> master (SOPC component) is not that hard. The magic is in the Avalon
> switch fabric generated by the builder. However, an example would
> have helped (Altera listening?). I didn't find anything on Altera's
> website or with Google. Now a very simple slave can be found at [1].
>

As you get into making your own components you'll find a lack of documentation about important things that go into the .PTF file. Altera used to have a document on their website that was invaluable called the "PTF File Reference Manual" (or something like that). They've chosen to pull that out so your only source for crucial information now is your FAE (maybe) or someone who happens to have that file available. I've complained to Altera to no avail that they need to put that document back and maintain it or at least make it available upon request to component developers. Maybe others also complaining will help as well (hint).

> One thing to take care: When you (like me) like to avoid VHDL files
> in the Quartus directory you can easily end up with three copies of
> your design files. Can get confusing which one to edit. When you
> edit your VHDL file in the component directory (the source for the
> SOPC builder) don't forget to rebuild your system. The build process
> copies it to your Quartus project directory.
>

Damn annoying too of the tool to do those copies like it does. You have to be very careful about which file you edit as being the 'source' or it will get overwritten because it really isn't.

> The master is also easy: just address, read and write data,
> read/write and you have to react to waitrequest. See as example the
> SimpCon/Avalon bridge at [2]. The Avalon interconnect fabric handles
> all bus multiplexing, bus resizing, and control signal translation.
>

If you're going for a very high speed design and you have multiple masters accessing a slave (i.e. multiple CPUs, or DMA controllers accessing memory) the performance degrades rather quickly using SOPC Builder to perform the arbitration. You don't necessarily need a large number of masters either; 4-5 killed it for me and necessitated a redesign to work around how Avalon handled things.

> Another point is, in my opinion, the wrong role who has to hold data
> for more than one cycle. This is true for several busses (e.g. also
> Wishbone). For these busses the master has to hold address and write
> data till the slave is ready. This is a result from the backplane
> bus thinking. In an SoC the slave can easily register those signals
> when needed longer and the master can continue.

What you're describing is not an Avalon issue or a result of 'backplane bus thinking', and is not a limitation of Avalon. If it exists in your design then it's a limitation of the slave component design. The slave generates the wait request output, which tells the master that it needs to hold the address and data because the slave essentially doesn't have any space left to hold them itself. If the slave component design has provisions to register and hold the address and data then it can do this and leave the wait request output not asserted and the cycle completes.

If you think about it, this would simply be a one deep fifo for holding the address/data/command. If you generalize a bit more you would see that the fifo wouldn't need to be restricted to being only one deep and could be any depth. So as the master device performs reads and writes, these commands would be written into the fifo without asserting wait request. But also remember that any fifo can fill up, at which point the slave must assert wait request because it has no more room to store anything, which means that the master device has to hold on to it for a bit.
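
As a rough illustration of the one deep fifo idea described above, here is a minimal VHDL sketch of a slave front end that registers a write command so the master does not have to hold it. The entity name cmd_buffer, the cmd_* back-end ports and the generics are hypothetical names chosen for this sketch; only writes are handled, and it is not taken from any Altera or SimpCon source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-entry command buffer in front of a slow slave back end.
entity cmd_buffer is
  generic (
    ADDR_WIDTH : positive := 8;   -- assumed widths, adjust as needed
    DATA_WIDTH : positive := 32
  );
  port (
    clk            : in  std_logic;
    reset          : in  std_logic;
    -- Avalon-style slave side (write path only in this sketch)
    av_address     : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    av_writedata   : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    av_write       : in  std_logic;
    av_waitrequest : out std_logic;
    -- Back-end side (illustrative names)
    cmd_valid      : out std_logic;  -- a buffered command is waiting
    cmd_address    : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    cmd_writedata  : out std_logic_vector(DATA_WIDTH-1 downto 0);
    cmd_done       : in  std_logic   -- back end has consumed the command
  );
end entity cmd_buffer;

architecture rtl of cmd_buffer is
  signal full : std_logic;  -- '1' while the single entry is occupied
begin
  -- The master only has to hold its outputs while the buffer is occupied.
  av_waitrequest <= full;
  cmd_valid      <= full;

  process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        full <= '0';
      elsif full = '1' then
        -- Entry occupied: free it once the back end is finished.
        if cmd_done = '1' then
          full <= '0';
        end if;
      elsif av_write = '1' then
        -- Entry free: capture the command; the cycle completes
        -- without a wait state from the master's point of view.
        cmd_address   <= av_address;
        cmd_writedata <= av_writedata;
        full          <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

A deeper fifo would replace the single full flag with read/write pointers, with av_waitrequest driven by the fifo's full status in the same way.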
> On the other hand,
> as JOP continues to execute and it is not so clear when the result
> is read, the slave should hold the data when available. That is easy
> to implement, but Wishbone and Avalon specify just a single cycle
> data valid.
>

What you would need then is a signal generated by the master back to the slave to say that the master isn't ready to receive the data, which would then cause the slave to hold on to the read data. But if you think about it a bit more, the only reason that the slave is providing read data is because the master device requested it in the first place. If the master wasn't ready to receive data it should simply not assert the read command output.

By the way, Avalon has a leg up on Wishbone in regards to a cleaner logical approach to handling wait states and latency. Avalon treats the address cycle as a single phase controllable by the slave's wait request and separates that from the read data phase by allowing for latency with the 'readdatavalid' output. With Wishbone you can accomplish the same thing by extending the bus definition with 'tags', but since not all components are required to support 'tags', when you have a mismatch you're on your own for getting the interconnect right. With Avalon, they designed it right with a clear logical distinction between address and data phases so that any incompatibilities between master and slave can still be handled automatically by an automated tool (SOPC Builder).

KJ
From: KJ on 12 Aug 2006 14:05

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message news:44ddc530$0$11352$3b214f66(a)tunews.univie.ac.at...
> That's fine for me. When the connection magic happens and I don't
> have to care it's fine. OK, one exception: Perhaps I would like
> to know more details on the latency. The switch fabric is 'plain'
> VHDL or Verilog. However, generated code is very hard to read.
>

What? You don't have a display that can show 2000 columns on your screen as is nearly required to view the VHDL/Verilog that pops out of SOPC?

Actually the best place I've found to look at and understand the wait states and latency is simply the .PTF file since that's where all the information is. Although the .PTF file requires a little bit of a learning curve due to the lack of documentation on Altera's part, it's not that hard and once you get a feel for it, it is very easy to see if a slave device requires wait states (and if it does, is it a fixed number or controllable by the slave) and whether the slave device has any read latency (and if it does, is it a fixed number, or controllable by the slave, and how many reads can be pending at one time). Looking at the VHDL is much harder and is not truly the source code anyway; the 'source' really is the .PTF file since the VHDL gets generated from it.

>
>> the avalon master is really as simple as the slave.
>
> Almost, you have to hold address, data and read/write active
> as long as waitrequest is pending. I don't like this, see above.
>

The master side is a bit more complicated than the slave side. There is a very simple template though that one must almost always follow for the master. When you try to deviate from it you're likely to get burned (voice of experience, I've already had to fix other's code in this area).
The template is

    process(Clock)
    begin
        if rising_edge(Clock) then
            if (Reset = '1') then
                Read <= '0';
                Write <= '0';
            elsif (WaitRequest = '0') then
                -- Put your code here for whenever it is you want to read and/or write
                -- When writing you would also set WriteData here
                -- For example, if you're not ready to receive data whenever the slave says it is
                -- ready then you simply set Read <= '0' until you are ready.
            end if;
        end if;
    end process;

For sampling the data on a read it depends on whether the master is implementing the 'Readdatavalid' input (i.e. 'latency aware' in Avalon terminology) or not. If so, then you sample the data when readdatavalid is asserted; if not, then sample the data when both the read output is asserted and the wait request is not.

> In my case e.g. the address from JOP (= top of stack) is valid
> only for a single cycle. To avoid one more cycle latency I present
> in the first cycle the TOS and register it. For additional wait
> cycles a MUX switches from TOS to the address register. I know this is a
> slight violation of the Avalon specification.
> There can be some glitches on the MUX switch.

You might try looking at incorporating the above mentioned template and avoid the Avalon violation. What I've also found in debugging other's code that doesn't adhere to the above template is that there can be subtle errors that take just the right combination of events to occur in order to cause an actual system error of some sort (i.e. not just the Avalon generated assert in simulation). If you use the above template, you're guaranteed to be Avalon compliant and not have this issue.

In my opinion, the Avalon bus and the .PTF files to completely define component I/O interfaces is a huge improvement over Wishbone. Although others disagree and don't like .PTF, they don't offer any alternative definitions other than comments or documentation to defining all those interface things that one needs to know (i.e. wait states, latency, bus size, etc.).
Comments and documentation are nice, but they are not synthesizable whereas .PTF files are (i.e. SOPC Builder sucks them in and spits out VHDL/Verilog). .PTF may not be a standard anywhere outside of Altera, but then is there an open standard that defines a file format that can be used to accomplish what .PTF does? I haven't run across it, and if there is one, I wouldn't mind badgering the tool vendors to support it so that I'm not locked into a vendor specific implementation. Until then I can be much more productive using PTF than not.

> For synchronous on-chip
> peripherals this is absolutely no issue. However, these signals
> are also used for off-chip asynchronous peripherals (SRAM).
> However, I assume that these possible switching glitches are
> not really seen on the output pins (or at the SRAM input).

Again, if you use the template, you won't have the glitching even if the signals go off chip to a device.

KJ
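
For readers who want to see the master template from this post filled in, the following is a minimal sketch of one way it could look for a master that is not latency aware. The entity name simple_master and the req_*/rd_* user-side ports are assumptions made up for this sketch; they come from neither the Avalon specification nor JOP. The sketch assumes the user side holds its request stable until req_taken is '1'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical Avalon master built around the "only change outputs
-- when waitrequest = '0'" template.
entity simple_master is
  generic (
    ADDR_WIDTH : positive := 16;  -- assumed widths
    DATA_WIDTH : positive := 32
  );
  port (
    clk            : in  std_logic;
    reset          : in  std_logic;
    -- User side (illustrative): hold req_* until req_taken = '1'
    req_rd         : in  std_logic;
    req_wr         : in  std_logic;
    req_addr       : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    req_wdata      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    req_taken      : out std_logic;
    rd_data        : out std_logic_vector(DATA_WIDTH-1 downto 0);
    rd_valid       : out std_logic;
    -- Avalon master side
    av_address     : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    av_writedata   : out std_logic_vector(DATA_WIDTH-1 downto 0);
    av_read        : out std_logic;
    av_write       : out std_logic;
    av_waitrequest : in  std_logic;
    av_readdata    : in  std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity simple_master;

architecture rtl of simple_master is
  signal read_i  : std_logic;
  signal write_i : std_logic;
begin
  av_read  <= read_i;
  av_write <= write_i;

  -- A request presented in this cycle is registered at the next clock edge.
  req_taken <= (not av_waitrequest) and (not reset);

  -- Template: no Avalon output can change while waitrequest is '1'.
  process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        read_i  <= '0';
        write_i <= '0';
      elsif av_waitrequest = '0' then
        read_i       <= req_rd;
        write_i      <= req_wr and not req_rd;  -- read wins if both are set
        av_address   <= req_addr;
        av_writedata <= req_wdata;
      end if;
    end if;
  end process;

  -- Not latency aware: sample when read is asserted and waitrequest is not.
  process(clk)
  begin
    if rising_edge(clk) then
      rd_valid <= '0';
      if reset = '0' and read_i = '1' and av_waitrequest = '0' then
        rd_data  <= av_readdata;
        rd_valid <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the outputs are registered, this costs one cycle between the user request and the bus transfer, which is the latency trade-off debated in this thread. Keeping req_rd high after req_taken issues back to back reads.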
From: Martin Schoeberl on 12 Aug 2006 14:47

Hi KJ,

> get a feel for it, it is very easy to see if a slave device requires wait states (and if it does, is it a fixed number or
> controllable by the slave) and whether the slave device has any read latency (and if it does, it is a

Yes, but e.g. for an SRAM interface there are some timings in ns. And it's not that clear how this translates to wait states.

> The template is
>
> process(Clock)
> begin
>     if rising_edge(Clock) then
>         if (Reset = '1') then
>             Read <= '0';
>             Write <= '0';
>         elsif (WaitRequest = '0') then
>             -- Put your code here for whenever it is you want to read and/or write
>             -- When writing you would also set WriteData here
>             -- For example, if you're not ready to receive data whenever the slave says it is
>             -- ready then you simply set Read <= '0' until you are ready.
>         end if;
>     end if;
> end process;
>

I disagree on this template ;-) Perhaps I'm wrong (as an Avalon newbie), but: Why is all your active code in waitrequest='0'? From the Avalon specification: you have to bring out address, read, write and writedata to start the transaction, independent of waitrequest. waitrequest=0 just ends your transaction.

From the specification (p 47, 49) it is allowed to start a read or write transaction independent of the status of waitrequest. Did you run into troubles with this?

OK, after a second thought on your code it looks like you're starting your actions at the last cycle of the former transaction. Mmh, kind of strange thinking.
What about this version (sc_* signals are my internal master signals)? This case statement is the next state logic and is combinatorial:

    case state is

        when idl =>
            if sc_rd='1' then
                if av_waitrequest='0' then
                    next_state <= rd;
                else
                    next_state <= rdw;
                end if;
            elsif sc_wr='1' then
                if av_waitrequest='0' then
                    next_state <= wr;
                else
                    next_state <= wrw;
                end if;
            end if;

        when rdw =>
            if av_waitrequest='0' then
                next_state <= rd;
            end if;

        when rd =>
            next_state <= idl;
            -- here I could add the code from the idl
            -- state for back to back read and writes
        ....

sc_rd and sc_wr directly start setting read and write. However, again I have to register them for keeping them set for wait states (sc_rd and sc_wr are only valid for one cycle). When there is a waitrequest, I'm just waiting.

Read data is registered in the state register process:

    elsif rising_edge(clk) then

        state <= next_state;
        reg_rd <= '0';
        ....
        case next_state is

            when idl =>

            when rdw =>
                reg_rd <= '1';

            when rd =>
                reg_rd_data <= av_readdata;
        ....

That's my (violation) trick as an example on the Avalon read signal:

    av_read <= sc_rd or reg_rd;

>> In my case e.g. the address from JOP (= top of stack) is valid
>> only for a single cycle. To avoid one more cycle latency I present
>> in the first cycle the TOS and register it. For additional wait
>> cycles a MUX switches from TOS to the address register. I know this is a slight violation of the Avalon specification.
>> There can be some glitches on the MUX switch.
>
> You might try looking at incorporating the above mentioned template and avoid the Avalon violation. What I've also found in
> debugging other's code

Then I get an additional cycle latency. That's what I want to avoid.

> that doesn't adhere to the above template is that there can be subtle errors that take just the right combination of events to
> occur in order to cause an actual system error of some sort (i.e. not just the Avalon generated assert in simulation).
> If you use the above template, you're guaranteed to be Avalon compliant and not have this issue.

Good to hear the comments from one who struggled with Avalon. However, I'm still not so happy with the style in which the bus is specified. The first timing diagrams look more like an asynchronous SRAM timing specification with a clock drawn on top of it. And then it goes on with slaves with fixed wait states. Why? If you do not provide a waitrequest in a slave that needs wait states you can get into trouble when you specify it wrong at component generation. Or does the Avalon switch fabric, when registered, take this information into account for the waitrequest of the master? Could be for the SRAM component. I should look into the generated VHDL code (or in a simulation)...

> In my opinion, the Avalon bus and the .PTF files to completely define component I/O interfaces is a huge improvement over
> Wishbone. Although

Agree, that's nice.

>> For synchronous on-chip
>> peripherals this is absolutely no issue. However, these signals
>> are also used for off-chip asynchronous peripherals (SRAM).
>> However, I assume that these possible switching glitches are
>> not really seen on the output pins (or at the SRAM input).
>
> Again, if you use the template, you won't have the glitching even if the signals go off chip to a device.

Again, one more cycle latency ;-)

Martin
From: Martin Schoeberl on 12 Aug 2006 15:15

>> Another point is, in my opinion, the wrong role who has to hold data
>> for more than one cycle. This is true for several busses (e.g. also
>> Wishbone). For these busses the master has to hold address and write
>> data till the slave is ready. This is a result from the backplane
>> bus thinking. In an SoC the slave can easily register those signals
>> when needed longer and the master can continue.
>
> What you're describing is not an Avalon issue or a result of 'backplane
> bus thinking', and is not a limitation of Avalon. If it exists in your
> design then it's a limitation of the slave component design. The slave

OK, but what if I'm not writing the slave? At the moment I'm thinking about the master side.

> generates the wait request output which is used to tell the master that it
> needs to hold the address and data for it because it essentially doesn't
> have any space left to hold it itself. If the slave component design has
> provisions to register and hold the address and data then it can do this

You could force the slave designers to register the address and data if needed with a different specification, as SimpCon does ;-) Or you could allow non-registering slaves, but register it in the Avalon switch fabric for those slaves that do not register the address and data by themselves.

However, this is not only an issue with Avalon. It is the same with Wishbone, OPB, AMBA, and OCP. So, perhaps my idea is wrong ;-)

> leave the wait request output not asserted and the cycle completes. If you
> think about it, this would simply be a one deep fifo for holding the
> address/data/command. If you generalize a bit more you would see that the
> fifo wouldn't need to be restricted to being only one deep and could be any
> depth.
> So as the master device performs reads and writes these commands
> would be written into the fifo without asserting wait request but also
> remember that any fifo can fill up at which point the slave must assert wait
> request because it has no more room to store anything which means that the
> master device has to hold on to it for a bit.

That idea is incorporated in a similar way in the SimpCon spec. See: http://www.opencores.org/cvsweb.cgi/~checkout~/simpcon/doc/simpcon.pdf page 7, Figure 4. Perhaps it could be drawn a little bit clearer.

>
>> On the other hand,
>> as JOP continues to execute and it is not so clear when the result
>> is read, the slave should hold the data when available. That is easy
>> to implement, but Wishbone and Avalon specify just a single cycle
>> data valid.
>>
> What you would need then is a signal generated by the master back to the
> slave to say that the master isn't ready to receive the data and would then
> cause the slave to hold on to the read data. But if you think about it a
> bit more, the only reason that the slave is providing read data in the first
> place is because the master device requested it in the first place. If the
> master wasn't ready to receive data it should simply not assert the read
> signal command output.

Why not? What about issuing a read command and then just continuing with other instructions to hide the latency? Isn't this also the idea of prefetching in newer processors?

> By the way, Avalon has a leg up on Wishbone in regards to a cleaner logical
> approach to handling wait states and latency. Avalon treats the address

Agree, with Wishbone you cannot issue overlapping transactions.

Martin
From: KJ on 12 Aug 2006 15:44

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message news:44de2247$0$28520$3b214f66(a)tunews.univie.ac.at...
>
> Yes, but e.g. for an SRAM interface there are some timings in ns. And
> it's not that clear how this translates to wait states.

Since Avalon is not directly compatible with typical SRAMs, this implies that you need to have an Avalon compatible component that translates Avalon into the particular SRAM that you're interested in. In other words, you need an Avalon SRAM Controller component. Once you have this component, you would just plop it down in SOPC Builder just like you would a DDR Controller, a PCI interface, or any other SOPC component.

Assuming for the moment that you wanted to write the code for such a component, one would likely define the component to have the following:
- A set of Avalon bus signals
- SRAM signals that are defined as Avalon 'external' (i.e. they will get exported to the top level) so that they can be brought out of the FPGA.
- Generic parameters so that the actual design code does not need to hard code any of the specific SRAM timing requirements.

Given that, the VHDL code inside the SRAM controller would set its Avalon side wait request high as appropriate while it physically performs the read/write to the external SRAM. The number of wait states would be roughly equal to the SRAM cycle time divided by the Avalon clock cycle time.

Although maybe it sounds like a lot of work and you may think it results in some sort of 'inefficient bloat', it really isn't. Any synthesizer will quickly reduce the logic to what is needed based on the usage of the design. What you get in exchange is very portable and reusable components.
>
>> The template is
>>
>> process(Clock)
>> begin
>>     if rising_edge(Clock) then
>>         if (Reset = '1') then
>>             Read <= '0';
>>             Write <= '0';
>>         elsif (WaitRequest = '0') then
>>             -- Put your code here for whenever it is you want to read
>>             -- and/or write
>>             -- When writing you would also set WriteData here
>>             -- For example, if you're not ready to receive data whenever
>>             -- the slave says it is
>>             -- ready then you simply set Read <= '0' until you are ready.
>>         end if;
>>     end if;
>> end process;
>>
>
> I disagree on this template ;-) Perhaps I'm wrong (as an Avalon newbie),
> but: Why is all your active code in waitrequest='0'? From the
> Avalon specification: you have to bring out address, read, write and
> writedata to start the transaction, independent of waitrequest.
> waitrequest=0 just ends your transaction.

Not true. The Avalon bus specification requires the master to hold (i.e. not change) Address, WriteData, Read and Write if WaitRequest is '1'. Given that, the 'elsif' in the template ensures that the inner code only gets executed when WaitRequest = '0'.

>
> From the specification (p 47, 49) it is allowed to start a read or
> write transaction independent of the status of waitrequest. Did you
> run into troubles with this?
>

That's true, you can 'start' a read/write transaction independent of wait request; the thing is that you can't end it or allow any of the outputs to change if waitrequest is active.

> OK, after a second thought on your code it looks like you're starting
> your actions at the last cycle of the former transaction. Mmh, kind
> of strange thinking.

Not really, it is just simpler to say that I'm not going to go anywhere near code that can potentially change any of the outputs if wait request is active. As an example, take a look at your code below where you've had to sprinkle the 'if av_waitrequest = '0'' throughout the code to make sure you don't change states at the 'wrong' time (i.e. when av_waitrequest is active).
Where problems can come up is when you miss one of those 'if av_waitrequest = '0'' statements. Depending on just where exactly you missed putting it in, it can be a rather subtle problem to debug.

Now consider if you had simply put the 'if av_waitrequest = '0'' statement around your entire case statement (with it understood that outside that you would still have to have the obligatory 'if reset go to idle'). Now it is much easier to see that your entire state machine will not change states on you at the wrong time: less code, and more easily inspected for correctness. I've also seen it reduce the number of states required, which simplifies the code even more.

>
> What about this version (sc_* signals are my internal master signals)
>
> that case is the next state logic and combinatorial:
>
> case state is
>
>     when idl =>
>         if sc_rd='1' then
>             if av_waitrequest='0' then
>                 next_state <= rd;
>             else
>                 next_state <= rdw;
>             end if;
>         elsif sc_wr='1' then
>             if av_waitrequest='0' then
>                 next_state <= wr;
>             else
>                 next_state <= wrw;
>             end if;
>         end if;
>
>     when rdw =>
>         if av_waitrequest='0' then
>             next_state <= rd;
>         end if;
>
>     when rd =>
>         next_state <= idl;

--- Are you sure you always want to go to idl? This would probably cause an error if the avalon outputs were active in this state.

>
>         -- here I could add the code from the idl
>         -- state for back to back read and writes
>     ...
>
> sc_rd and sc_wr directly start setting read and write. However,
> again I have to register them for keeping them set for wait
> states (sc_rd and sc_wr are only valid for one cycle).
> When there is a waitrequest, I'm just waiting.
>
> Read data is registered in the state register process:
>
> elsif rising_edge(clk) then
>
>     state <= next_state;
>     reg_rd <= '0';
>     ...
>     case next_state is
>
>         when idl =>
>
>         when rdw =>
>             reg_rd <= '1';
>
>         when rd =>
>             reg_rd_data <= av_readdata;
>     ...
>
> That's my (violation) trick as an example on the Avalon read signal:
>
> av_read <= sc_rd or reg_rd;

Whether it works or not for you would take more analysis, I'll just say that every time I've run across code that wasn't working for 'some reason' and I managed to trace it back to a mishandled Avalon data transfer and the Avalon master code did not match m |