JOP as SOPC component [FPGA]

From: KJ on 12 Aug 2006 14:04

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44ddb2d4$0$8024$3b214f66(a)tunews.univie.ac.at...
> The Avalon bus is very flexible. Therefore, writing a slave or
> master (SOPC component) is not that hard. The magic is in the Avalon
> switch fabric generated by the builder. However, an example would
> have helped (Altera listening?). I didn't find anything on Altera's
> website or with Google. Now a very simple slave can be found at [1].
>
As you get into making your own components you'll find a lack of
documentation about important things that go into the .PTF file. Altera
used to have a document on their website that was invaluable called the "PTF
File Reference Manual" (or something like that). They've chosen to pull
that out so your only source for crucial information now is your FAE (maybe)
or someone who happens to have that file available. I've complained to
Altera to no avail that they need to put that document back and maintain it
or at least make it available upon request to component developers. Maybe
others also complaining will help as well (hint).

> One thing to take care: When you (like me) like to avoid VHDL files
> in the Quartus directory you can easily end up with three copies of
> your design files. Can get confusing which one to edit. When you
> edit your VHDL file in the component directory (the source for the
> SOPC builder) don't forget to rebuild your system. The build process
> copies it to your Quartus project directory.
>
Damn annoying too of the tool to do those copies like it does. You have to
be very careful about which file you edit as being the 'source' or it will
get overwritten because it really isn't.

> The master is also easy: just address, read and write data,
> read/write and you have to react to waitrequest. See as example the
> SimpCon/Avalon bridge at [2]. The Avalon interconnect fabric handles
> all bus multiplexing, bus resizing, and control signal translation.
>
If you're going for a very high speed design and you have multiple masters
accessing a slave (i.e. multiple CPUs, or DMA controllers accessing memory)
the performance degrades rather quickly using SOPC Builder to perform the
arbitration. You don't necessarily need a large number of masters either,
4-5 killed it for me and necessitated redesign to work around how Avalon
handled things.

> Another point is, in my opinion, the wrong role who has to hold data
> for more than one cycle. This is true for several busses (e.g. also
> Wishbone). For these busses the master has to hold address and write
> data till the slave is ready. This is a result from the backplane
> bus thinking. In an SoC the slave can easily register those signals
> when needed longer and the master can continue.

What you're describing is not an Avalon issue or a result of 'backplane
bus thinking', and is not a limitation of Avalon. If it exists in your
design then it's a limitation of the slave component design. The slave
generates the wait request output, which tells the master that it needs to
hold the address and data because the slave essentially doesn't have any
space left to hold them itself. If the slave component design has
provisions to register and hold the address and data then it can do this,
leave the wait request output not asserted, and the cycle completes. If you
think about it, this would simply be a one deep fifo for holding the
address/data/command. If you generalize a bit more you would see that the
fifo wouldn't need to be restricted to being only one deep and could be any
depth. So as the master device performs reads and writes these commands
would be written into the fifo without asserting wait request. But remember
that any fifo can fill up, at which point the slave must assert wait request
because it has no more room to store anything, which means that the master
device has to hold on to the command for a bit.
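
The one-deep holding register described above could look roughly like the
following sketch. It covers writes only, and the entity name, the be_* back
end handshake and the generics are all hypothetical names made up for
illustration; chip select and reads are left out. The point is only that
waitrequest stays low as long as the single slot can accept a command.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: a write-only Avalon slave front end with a one-deep
-- holding register. All be_* ports are a hypothetical back end interface.
entity write_buffer_slave is
  generic (
    ADDR_WIDTH : positive := 8;
    DATA_WIDTH : positive := 32
  );
  port (
    clk            : in  std_logic;
    reset          : in  std_logic;
    -- Avalon slave side
    av_address     : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    av_writedata   : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    av_write       : in  std_logic;
    av_waitrequest : out std_logic;
    -- Back end that consumes the stored command when be_ready = '1'
    be_ready       : in  std_logic;
    be_valid       : out std_logic;
    be_address     : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    be_data        : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity write_buffer_slave;

architecture rtl of write_buffer_slave is
  signal full : std_logic;  -- '1' while the slot holds a command
begin
  -- Stall the master only when the slot is occupied and not draining.
  av_waitrequest <= full and not be_ready;
  be_valid       <= full;

  process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        full <= '0';
      elsif av_write = '1' and (full = '0' or be_ready = '1') then
        -- Slot is free (or frees up this cycle): capture the command.
        be_address <= av_address;
        be_data    <= av_writedata;
        full       <= '1';
      elsif be_ready = '1' then
        -- Back end took the stored command and nothing new arrived.
        full <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Replacing the single register with a deeper fifo would follow the same
pattern: waitrequest is asserted only when the fifo is full.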

> On the other hand,
> as JOP continues to execute and it is not so clear when the result
> is read, the slave should hold the data when available. That is easy
> to implement, but Wishbone and Avalon specify just a single cycle
> data valid.
>
What you would need then is a signal generated by the master back to the
slave to say that the master isn't ready to receive the data and would then
cause the slave to hold on to the read data. But if you think about it a
bit more, the only reason that the slave is providing read data at all is
because the master device requested it. If the master wasn't ready to
receive data it should simply not assert the read command output.

By the way, Avalon has a leg up on Wishbone in regards to a cleaner logical
approach to handling wait states and latency. Avalon treats the address
cycle as a single phase controllable by the slave's wait request and
separates that from the read data phase by allowing for latency with the
'readdatavalid' output. With Wishbone you can accomplish the same thing by
extending the bus definition with 'tags' but since not all components are
required to support 'tags' when you have a mismatch you're on your own for
getting the interconnect right. With Avalon, they designed it right with a
clear logical distinction between address and data phases so that any
incompatibilities between master and slave can still be handled automatically
by an automated tool (SOPC Builder).

KJ

From: KJ on 12 Aug 2006 14:05

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44ddc530$0$11352$3b214f66(a)tunews.univie.ac.at...
> That's fine for me. When the connection magic happens and I don't
> have to care it's fine. OK, one exception: Perhaps I would like
> to know more details on the latency. The switch fabric is 'plain'
> VHDL or Verilog. However, generated code is very hard to read.
>

What? You don't have a display that can show 2000 columns on your screen as
is nearly required to view the VHDL/Verilog that pops out of SOPC?

Actually the best place I've found to look at and understand the wait states
and latency is simply the .PTF file since that's where all the information
is. Although the .PTF file requires a little bit of a learning curve due to
the lack of documentation on Altera's part it's not that hard and once you
get a feel for it, it is very easy to see if a slave device requires wait
states (and if it does, is it a fixed number or controllable by the slave)
and whether the slave device has any read latency (and if it does, is it a
fixed number or controllable by the slave, and how many reads can be
pending at one time). Looking at the VHDL is much harder, and it is not truly
the source code anyway; the 'source' really is the .PTF file since the VHDL
gets generated from it.

>
>> the avalon master is really as simple as the slave.
>
> Almost, you have to hold address, data and read/write active
> as long as waitrequest is pending. I don't like this, see above.
>

The master side is a bit more complicated than the slave side.

There is a very simple template though that one must almost always follow
for the master. When you try to deviate from it you're likely to get burned
(voice of experience, I've already had to fix others' code in this area).
The template is

process(Clock)
begin
  if rising_edge(Clock) then
    if (Reset = '1') then
      Read  <= '0';
      Write <= '0';
    elsif (WaitRequest = '0') then
      -- Put your code here for whenever it is you want to read and/or write.
      -- When writing you would also set WriteData here.
      -- For example, if you're not ready to receive data whenever the
      -- slave says it is ready then you simply set Read <= '0' until you are.
    end if;
  end if;
end process;

For sampling the data on a read it depends on whether the master is
implementing the 'readdatavalid' input (i.e. 'latency aware' in Avalon
terminology) or not. If so, then you sample the data when readdatavalid is
asserted; if not, then sample the data when both the read output is asserted
and the wait request is not.
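
The two sampling rules above can be sketched as follows. The entity name,
the LATENCY_AWARE generic and the 'captured' output are hypothetical names
chosen for illustration; av_read is the master's own read output fed back
into this block.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: captures Avalon read data using one of the two rules.
entity read_capture is
  generic (
    LATENCY_AWARE : boolean  := false;  -- true if readdatavalid is used
    DATA_WIDTH    : positive := 32
  );
  port (
    clk              : in  std_logic;
    av_read          : in  std_logic;  -- the master's read output
    av_waitrequest   : in  std_logic;
    av_readdatavalid : in  std_logic;  -- ignored unless LATENCY_AWARE
    av_readdata      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    captured         : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity read_capture;

architecture rtl of read_capture is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if LATENCY_AWARE then
        -- Latency aware master: data is valid when the fabric says so.
        if av_readdatavalid = '1' then
          captured <= av_readdata;
        end if;
      elsif av_read = '1' and av_waitrequest = '0' then
        -- Otherwise: data is valid in the cycle the read is accepted.
        captured <= av_readdata;
      end if;
    end if;
  end process;
end architecture rtl;
```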

> In my case e.g. the address from JOP (= top of stack) is valid
> only for a single cycle. To avoid one more cycle latency I present
> in the first cycle the TOS and register it. For additional wait
> cycles a MUX switches from TOS to the address register. I know this is a
> slight violation of the Avalon specification.
> There can be some glitches on the MUX switch.

You might try looking at incorporating the above mentioned template and
avoid the Avalon violation. What I've also found in debugging others' code
that doesn't adhere to the above template is that there can be subtle errors
that take just the right combination of events to occur in order to cause an
actual system error of some sort (i.e. not just the Avalon generated assert
in simulation). If you use the above template, you're guaranteed to be
Avalon compliant and not have this issue.

In my opinion, using the Avalon bus and the .PTF files to completely define
component I/O interfaces is a huge improvement over Wishbone. Although
others disagree and don't like .PTF, they don't offer any alternative
other than comments or documentation for defining all those
interface things that one needs to know (i.e. wait states, latency, bus
size, etc.). Comments and documentation are nice, but they are not
synthesizable whereas .PTF files are (i.e. SOPC Builder sucks them in and
spits out VHDL/Verilog). PTF may not be a standard anywhere outside of
Altera, but then is there an open standard that defines a file format that
can be used to accomplish what .PTF does? I haven't run across it, and if
there is one, I wouldn't mind badgering the tool vendors to support it so
that I'm not locked into a vendor specific implementation. Until then I can
be much more productive using PTF than not.

> For synchronous on-chip
> peripherals this is absolutely no issue. However, these signals
> are also used for off-chip asynchronous peripherals (SRAM).
> However, I assume that these possible switching glitches are
> not really seen on the output pins (or at the SRAM input).

Again, if you use the template, you won't have the glitching even if the
signals go off chip to a device.

KJ

From: Martin Schoeberl on 12 Aug 2006 14:47

Hi KJ,

> get a feel for it, it is very easy to see if a slave device requires wait states (and if it does, is it a fixed number or
> controllable by the slave) and whether the slave device has any read latency (and if it does, it is a

Yes, but e.g. for an SRAM interface there are some timings in ns. And
it's not that clear how this translates to wait states.

> The template is
>
> process(Clock)
> begin
> if rising_edge(Clock) then
> if (Reset = '1') then
> Read <= '0';
> Write <= '0';
> elsif (WaitRequest = '0') then
> -- Put your code here for whenever it is you want to read and/or write
> -- When writing you would also set WriteData here
> -- For example, if you're not ready to receive data whenever the slave says it is
> -- ready than you simply set Read <= '0' until you are ready.
> end if;
> end if;
> end process;
>

I disagree on this template ;-) Perhaps I'm wrong (as an Avalon newbie),
but: why is all your active code in waitrequest='0'? From the
Avalon specification, you have to bring out address, read, write and
writedata to start the transaction, independent of waitrequest.
waitrequest=0 just ends your transaction.

From the specification (p 47, 49) it is allowed to start a read or
write transaction independent of the status of waitrequest. Did you
run into troubles with this?

Ok, after a second thought on your code it looks like you're starting
your actions at the last cycle of the former transaction. Mmh, kind
of strange thinking.

What about this version (sc_* signals are my internal master signals)

This case statement is the next-state logic and is combinatorial:

case state is

  when idl =>
    if sc_rd='1' then
      if av_waitrequest='0' then
        next_state <= rd;
      else
        next_state <= rdw;
      end if;
    elsif sc_wr='1' then
      if av_waitrequest='0' then
        next_state <= wr;
      else
        next_state <= wrw;
      end if;
    end if;

  when rdw =>
    if av_waitrequest='0' then
      next_state <= rd;
    end if;

  when rd =>
    next_state <= idl;

  -- here I could add the code from the idl
  -- state for back to back read and writes
  ....

sc_rd and sc_wr directly start setting read and write. However,
again I have to register them for keeping them set for wait
states (sc_rd and sc_wr are only valid for one cycle).
When there is a waitrequest, I'm just waiting.

Read data is registered in the state register process:

elsif rising_edge(clk) then

  state <= next_state;
  reg_rd <= '0';
  ....
  case next_state is

    when idl =>

    when rdw =>
      reg_rd <= '1';

    when rd =>
      reg_rd_data <= av_readdata;
  ....

That's my (violation) trick as an example on the Avalon read signal:

av_read <= sc_rd or reg_rd;

>> In my case e.g. the address from JOP (= top of stack) is valid
>> only for a single cycle. To avoid one more cycle latency I present
>> in the first cycle the TOS and register it. For additional wait
>> cycles a MUX switches from TOS to the address register. I know this is a slight violation of the Avalon specification.
>> There can be some glitches on the MUX switch.
>
> You might try looking at incorporating the above mentioned template and avoid the Avalon violation. What I've also found in
> debugging other's code

Then I get an additional cycle latency. That's what I want to avoid.

> that doesn't adhere to the above template is that there can be subtle errors that take just the right combination of events to
> occur in order to cause an actual system error of some sort (i.e. not just the Avalon generated assert in simulation). If you use
> the above template, you're guaranteed to be Avalon compliant and not have this issue.

Good to hear the comments from one who struggled with Avalon.
However, I'm still not so happy with the style the bus is
specified. The first timing diagrams look more like an asynch.
SRAM timing specification with a clock drawn on top of it.
And then it goes on with slaves with fixed wait states. Why?
If you do not provide a waitrequest in a slave that needs wait
states you can get into trouble when you specify it wrong
at component generation.

Or does the Avalon switch fabric, when registered, take this
information into account for the waitrequest of the master?
Could be for the SRAM component. Should look into the
generated VHDL code (or in a simulation)...

> In my opinion, the Avalon bus and the .PTF files to completely define component I/O interfaces is a huge improvement over
> Wishbone. Although

agree, that's nice.

>> For synchronous on-chip
>> peripherals this is absolutely no issue. However, these signals
>> are also used for off-chip asynchronous peripherals (SRAM).
>> However, I assume that these possible switching glitches are
>> not really seen on the output pins (or at the SRAM input).
>
> Again, if you use the template, you won't have the glitching even if the signals go off chip to a device.

Again, one more cycle latency ;-)

Martin

From: Martin Schoeberl on 12 Aug 2006 15:15

>> Another point is, in my opinion, the wrong role who has to hold data
>> for more than one cycle. This is true for several busses (e.g. also
>> Wishbone). For these busses the master has to hold address and write
>> data till the slave is ready. This is a result from the backplane
>> bus thinking. In an SoC the slave can easily register those signals
>> when needed longer and the master can continue.
>
> What you're describing is not an Avalon issue or a result of 'backplane
> bus thinking', and is not a limitation of Avalon. If it exists in your
> design then it's a limitation of the slave component design. The slave

OK, but what if I'm not writing the slave? At the moment I'm thinking
about the master side.

> generates the wait request output which is used to tell the master that it
> needs to hold the address and data for it because it essentially doesn't
> have any space left to hold it itself. If the slave component design has
> provisions to register and hold the address and data then it can do this

You could force the slave designers to register the address and data
if needed with a different specification - as SimpCon ;-)
Or you could allow non registering slaves, but register it
in the Avalon switch fabric for those slaves that do not
register the address and data by themselves.

However, this is not only an issue with Avalon. It is the
same with Wishbone, OPB, AMBA, and OCP. So, perhaps
my idea is wrong ;-)

> leave the wait request output not asserted and the cycle completes. If you
> think about it, this would simply be a one deep fifo for holding the
> address/data/command. If you generalize a bit more you would see that the
> fifo wouldn't need to be restricted to being only one deep and could be any
> depth. So as the master device performs reads and writes these commands
> would be written into the fifo without asserting wait request but also
> remember that any fifo can fill up at which point the slave must assert wait
> request because it has no more room to store anything which means that the
> master device has to hold on to it for a bit.

That idea is incorporated in a similar way in the SimpCon spec. See at:
http://www.opencores.org/cvsweb.cgi/~checkout~/simpcon/doc/simpcon.pdf

page 7, Figure 4. Perhaps it could be drawn a little bit
clearer.

>
>> On the other hand,
>> as JOP continues to execute and it is not so clear when the result
>> is read, the slave should hold the data when available. That is easy
>> to implement, but Wishbone and Avalon specify just a single cycle
>> data valid.
>>
> What you would need then is a signal generated by the master back to the
> slave to say that the master isn't ready to receive the data and would then
> cause the slave to hold on to the read data. But if you think about it a
> bit more, the only reason that the slave is providing read data in the first
> place is because the master device requested it in the first place. If the
> master wasn't ready to receive data it should simply not assert the read
> signal command output.

Why not? What about issuing a read command and then just continuing
with other instructions to hide the latency? Isn't this also the
idea of prefetching in newer processors?

> By the way, Avalon has a leg up on Wishbone in regards to a cleaner logical
> approach to handling wait states and latency. Avalon treats the address

Agree, with Wishbone you can not issue overlapping transactions.

Martin

From: KJ on 12 Aug 2006 15:44

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44de2247$0$28520$3b214f66(a)tunews.univie.ac.at...
>
> Yes, but e.g. for an SRAM interface there are some timings in ns. And
> it's not that clear how this translates to wait states.

Since Avalon is not directly compatible with typical SRAMs, this implies that
you need to have an Avalon compatible component that translates Avalon into
the particular SRAM that you're interested in. In other words, you need an
Avalon SRAM Controller component. Once you have this component, you would
just plop it down in SOPC Builder just like you would a DDR Controller, a
PCI interface, or any other SOPC component.

Assuming for the moment that you wanted to write the code for such a
component, one would likely define the component to have the following:
- A set of Avalon bus signals
- SRAM Signals that are defined as Avalon 'external' (i.e. they will get
exported to the top level) so that they can be brought out of the FPGA.
- Generic parameters so that the actual design code does not need to hard
code any of the specific SRAM timing requirements.

Given that, the VHDL code inside the SRAM controller would set its Avalon
side wait request high as appropriate while it physically performs the
read/write to the external SRAM. The number of wait states would be roughly
equal to the SRAM cycle time divided by the Avalon clock cycle time.
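
As a rough worked example of that ratio, here is a minimal sketch of a
helper that a controller could call with its generic parameters. The
package name, function name and parameter names are hypothetical, and
rounding up is an assumption made here so that the access time is never
cut short. Whether the wait state count is this number or one less depends
on how the controller counts the first cycle of the transfer.

```vhdl
-- Sketch only: hypothetical helper for deriving a cycle count from
-- timing generics instead of hard coding it in the controller.
package sram_timing_pkg is
  -- Clock cycles needed to cover one SRAM access: the ratio
  -- SRAM cycle time / clock period, rounded up.
  function cycles_per_access(sram_cycle_ns : positive;
                             clk_period_ns : positive) return positive;
end package sram_timing_pkg;

package body sram_timing_pkg is
  function cycles_per_access(sram_cycle_ns : positive;
                             clk_period_ns : positive) return positive is
  begin
    -- Integer ceiling division: (a + b - 1) / b
    return (sram_cycle_ns + clk_period_ns - 1) / clk_period_ns;
  end function cycles_per_access;
end package body sram_timing_pkg;
```

For instance, a 15 ns SRAM cycle with a 10 ns Avalon clock period gives
(15 + 10 - 1) / 10 = 2 clock cycles per access.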

Although maybe it sounds like a lot of work and you may think it results in
some sort of 'inefficient bloat' it really isn't. Any synthesizer will
quickly reduce the logic to what is needed based on the usage of the design.
What you get in exchange is very portable and reusable components.

>
>> The template is
>>
>> process(Clock)
>> begin
>> if rising_edge(Clock) then
>> if (Reset = '1') then
>> Read <= '0';
>> Write <= '0';
>> elsif (WaitRequest = '0') then
>> -- Put your code here for whenever it is you want to read and/or write
>> -- When writing you would also set WriteData here
>> -- For example, if you're not ready to receive data whenever the slave says it is
>> -- ready than you simply set Read <= '0' until you are ready.
>> end if;
>> end if;
>> end process;
>>
>
> I disagree on this template ;-) Perhaps I'm wrong (as an Avalon newbie),
> but: why is all your active code in waitrequest='0'? From the
> Avalon specification, you have to bring out address, read, write and
> writedata to start the transaction, independent of waitrequest.
> waitrequest=0 just ends your transaction.

Not true. The Avalon bus specification requires the master to hold (i.e. not
change) Address, WriteData, Read and Write if WaitRequest is '1'. Given
that, the 'elsif' in the template ensures that the inner code only gets
executed when WaitRequest = '0'.

>
> From the specification (p 47, 49) it is allowed to start a read or
> write transaction independent of the status of waitrequest. Did you
> run into troubles with this?
>
That's true, you can 'start' a read/write transaction independent of wait
request, the thing is that you can't end it or allow any of the outputs to
change if waitrequest is active.

> Ok, after a second thought on your code it looks like you're starting
> your actions at the last cycle of the former transaction. Mmh, kind
> of strange thinking.

Not really, it is just simpler to say that I'm not going to go anywhere near
code that can potentially change any of the outputs if wait request is
active. As an example, take a look at your code below where you've had to
sprinkle "if av_waitrequest = '0'" throughout the code to make sure you
don't change states at the 'wrong' time (i.e. when av_waitrequest is
active). Where problems can come up is when you miss one of those "if
av_waitrequest = '0'" statements. Depending on just where exactly you missed
putting it in, it can be a rather subtle problem to debug.

Now consider if you had simply put the "if av_waitrequest = '0'" statement
around your entire case statement (with it understood that outside of that
you would still have the obligatory 'if reset go to idle'). Now it
is much easier to see that your entire state machine will not change states
on you at the wrong time: less code, and code that is more easily inspected
for correctness. I've also seen it reduce the number of states required,
which simplifies the code even more.
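
A sketch of what that restructuring might look like is given here. It is
not the code from either poster: the entity name and the req_* and rd_data
ports are hypothetical, the request inputs are assumed to be held until
req_taken pulses, and the master is assumed not to be latency aware (no
readdatavalid). The whole case statement sits inside the single
waitrequest check, so no state or output can change during a wait state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: Avalon master with the case statement wrapped in one
-- waitrequest check. All req_* ports and rd_data are hypothetical.
entity av_master_sketch is
  generic (
    ADDR_WIDTH : positive := 16;
    DATA_WIDTH : positive := 32
  );
  port (
    clk            : in  std_logic;
    reset          : in  std_logic;
    -- Request side; req_* are assumed held until req_taken pulses
    req_rd         : in  std_logic;
    req_wr         : in  std_logic;
    req_addr       : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    req_wdata      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    req_taken      : out std_logic;
    rd_data        : out std_logic_vector(DATA_WIDTH-1 downto 0);
    -- Avalon master side
    av_address     : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    av_writedata   : out std_logic_vector(DATA_WIDTH-1 downto 0);
    av_read        : out std_logic;
    av_write       : out std_logic;
    av_readdata    : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    av_waitrequest : in  std_logic
  );
end entity av_master_sketch;

architecture rtl of av_master_sketch is
  type state_t is (idl, rd, wr);
  signal state : state_t;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      req_taken <= '0';  -- default; overridden when a request is accepted
      if reset = '1' then
        state    <= idl;
        av_read  <= '0';
        av_write <= '0';
      elsif av_waitrequest = '0' then
        -- Nothing below runs while waitrequest is asserted, so the
        -- Avalon outputs and the state are held during wait states.
        case state is
          when idl =>
            if req_rd = '1' then
              av_address <= req_addr;
              av_read    <= '1';
              req_taken  <= '1';
              state      <= rd;
            elsif req_wr = '1' then
              av_address   <= req_addr;
              av_writedata <= req_wdata;
              av_write     <= '1';
              req_taken    <= '1';
              state        <= wr;
            end if;
          when rd =>
            -- Read is asserted and accepted in this cycle: sample data.
            rd_data <= av_readdata;
            av_read <= '0';
            state   <= idl;
          when wr =>
            av_write <= '0';
            state    <= idl;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the Avalon outputs are registered here, a request shows up on the
bus one clock after it is made, which is the extra cycle of latency
objected to elsewhere in this thread.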
>
> What about this version (sc_* signals are my internal master signals)
>
> that case is the next state logic and combinatorial:
>
> case state is
>
> when idl =>
> if sc_rd='1' then
> if av_waitrequest='0' then
> next_state <= rd;
> else
> next_state <= rdw;
> end if;
> elsif sc_wr='1' then
> if av_waitrequest='0' then
> next_state <= wr;
> else
> next_state <= wrw;
> end if;
> end if;
>
> when rdw =>
> if av_waitrequest='0' then
> next_state <= rd;
> end if;
>

> when rd =>
> next_state <= idl;
--- Are you sure you always want to go to idl? This would probably cause an
error if the avalon outputs were active in this state.
>
> -- here I could add the code from the idl
> -- state for back to back read and writes
> ...
>
> sc_rd and sc_wr directly start setting read and write. However,
> again I have to register them for keeping them set for wait
> states (sc_rd and sc_wr are only valid for one cycle).
> When there is a waitrequest, I'm just waiting.
>
> Read data is registered in the state register process:
>
> elsif rising_edge(clk) then
>
> state <= next_state;
> reg_rd <= '0';
> ...
> case next_state is
>
> when idl =>
>
> when rdw =>
> reg_rd <= '1';
>
> when rd =>
> reg_rd_data <= av_readdata;
> ...
>
> That's my (violation) trick as an example on the Avalon read signal:
>
> av_read <= sc_rd or reg_rd;

Whether it works or not for you would take more analysis, I'll just say that
every time I've run across code that wasn't working for 'some reason' and I
managed to trace it back to a mishandled Avalon data transfer and the Avalon
master code did not match m
