From: Martin Schoeberl on 12 Aug 2006 16:30

That's almost like chatting - high speed newsgroup discussion ;-) To
keep up with your speed I have to split the answers according to the
sub-topics. Here about the Avalon SRAM interface.

>> Yes, but e.g. for an SRAM interface there are some timings in ns. And
>> it's not that clear how this translates to wait states.
>
> Since Avalon is not directly compatible with typical SRAMs, this
> implies that

Again disagree ;-) The Avalon specification also covers asynchronous
peripherals. That adds a little bit to the complexity of the
specification.

> Assuming for the moment that you wanted to write the code for such a
> component, one would likely define the component to have the following:
> - A set of Avalon bus signals
> - SRAM signals that are defined as Avalon 'external' (i.e. they will
>   get exported to the top level) so that they can be brought out of
>   the FPGA.
> - Generic parameters so that the actual design code does not need to
>   hard code any of the specific SRAM timing requirements.

Yes, that's the way it is described in the Quartus manual. I did my
SRAM interface in this way. Here is a part of the .ptf that describes
the timing of the external SRAM:

   SLAVE sram_tristate_slave
   {
      SYSTEM_BUILDER_INFO
      {
         ....
         Setup_Time = "0ns";
         Hold_Time = "2ns";
         Read_Wait_States = "18ns";
         Write_Wait_States = "10ns";
         Read_Latency = "0";
         ....

> Given that, the VHDL code inside the SRAM controller would set its
> Avalon side wait request high as appropriate while it physically
> performs the

There is no VHDL code associated with this SRAM. All is done by the
SOPC builder.

> read/write to the external SRAM. The number of wait states would be
> roughly equal to the SRAM cycle time divided by the Avalon clock cycle
> time.

The SOPC builder will translate the timing from ns to clock cycles for
me. However, this is a kind of iterative process, as the timing of the
component depends on the tco and tsu of the FPGA pins of the compiled
design. The input pin th can usually be ignored, as it is covered by
the minimum tco of the output pins. The same is true for the SRAM
write th.

> Although maybe it sounds like a lot of work and you may think it
> results in some sort of 'inefficient bloat' it really isn't. Any
> synthesizer will quickly reduce the logic to what is needed based on
> the usage of the design. What you get in exchange is very portable and
> reusable components.

No, it's really not much work. Just a few mouse clicks (no VHDL), and
the synthesized result is not big. The SRAM tristate bridge contains
just the address and control output registers. I assume the input
registers are somewhere buried in the arbitrator.

Martin
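For illustration, the ns figures above translate into wait states
roughly like this: with a 100 MHz Avalon clock (10 ns period, an
assumed value), Read_Wait_States = "18ns" corresponds to two wait
cycles and Write_Wait_States = "10ns" to one. Below is a minimal
hand-written sketch of that kind of waitrequest generation. It is not
the code the SOPC builder generates; the entity name, generics and
default values are made up, only the port names follow the usual Avalon
slave naming.

   -- Sketch only: stall the Avalon master for a generic number of
   -- wait states, as the generated arbitration logic effectively does.
   library ieee;
   use ieee.std_logic_1164.all;

   entity sram_waitstate is
      generic (
         rd_ws : integer := 2;  -- e.g. ceil(18 ns / 10 ns), >= 1 assumed
         wr_ws : integer := 1   -- e.g. ceil(10 ns / 10 ns)
      );
      port (
         clk, reset  : in  std_logic;
         chipselect  : in  std_logic;
         write       : in  std_logic;
         waitrequest : out std_logic
      );
   end sram_waitstate;

   architecture rtl of sram_waitstate is
      signal cnt  : integer range 0 to 31;
      signal busy : std_logic;
   begin
      process(clk, reset)
      begin
         if reset = '1' then
            cnt  <= 0;
            busy <= '0';
         elsif rising_edge(clk) then
            if busy = '0' then
               if chipselect = '1' then
                  -- first cycle of an access: remember how many more
                  -- cycles to stall
                  if write = '1' then
                     cnt <= wr_ws - 1;
                  else
                     cnt <= rd_ws - 1;
                  end if;
                  busy <= '1';
               end if;
            elsif cnt /= 0 then
               cnt <= cnt - 1;   -- still waiting for the external SRAM
            else
               busy <= '0';      -- the transfer completes in this cycle
            end if;
         end if;
      end process;

      -- hold the master off while the SRAM cycle is in progress
      waitrequest <= '1' when (chipselect = '1' and busy = '0')
                          or (busy = '1' and cnt /= 0)
                     else '0';
   end rtl;

In the generated system this counting presumably lives in the switch
fabric/arbitrator rather than in user VHDL, which is why the tristate
bridge itself stays so small.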
From: Martin Schoeberl on 12 Aug 2006 17:03

> Not really, it is just simpler to say that I'm not going to go anywhere
> near code that can potentially change any of the outputs if wait
> request is active. As an example, take a look at your code below where
> you've had to sprinkle the 'if av_waitrequest = '0'' throughout the
> code to make sure you don't change states at the 'wrong' time (i.e.
> when av_waitrequest is active). Where problems can come up is when you
> miss one of those 'if av_waitrequest = '0'' statements. Depending on
> just where exactly you missed putting it in, it can be a rather subtle
> problem to debug.

Agree on the safe side, but...

> Now consider if you had simply put the 'if av_waitrequest = '0''
> statement around your entire case statement (with it understood that
> outside that

I cannot do this. This case statement is combinational. It would
introduce a latch for next_state. The reason to split the state machine
into the combinational next-state logic and the clocked part is to
react 'one cycle earlier' with state machine output registers that
depend on next_state. You can code this also with a single case in a
clocked process. However, then you have to code your output registers
on the transitions (in the if part), which gets a little bit more
confusing.

>> What about this version (sc_* signals are my internal master signals)

That case is the next-state logic and combinational; the process
containing this case statement is:

   process(state, sc_rd, sc_wr, av_waitrequest)
   begin
      next_state <= state;

>>    case state is
>>
>>       when idl =>
>>          if sc_rd='1' then
>>             if av_waitrequest='0' then
>>                next_state <= rd;
>>             else
>>                next_state <= rdw;
>>             end if;
>>          elsif sc_wr='1' then
>>             if av_waitrequest='0' then
>>                next_state <= wr;
>>             else
>>                next_state <= wrw;
>>             end if;
>>          end if;
>>
>>       when rdw =>
>>          if av_waitrequest='0' then
>>             next_state <= rd;
>>          end if;
>>
>>       when rd =>
>>          next_state <= idl;
>
> --- Are you sure you always want to go to idl? This would probably
> cause an error if the avalon outputs were active in this state.

No problem, as next_state goes to rd only when av_waitrequest is '0'.
Perhaps 'rd' is a misleading state name. The input data is registered
when next_state is 'rd'. So state is 'rd' when the input data is
registered.

> Whether it works or not for you would take more analysis, I'll just
> say that

For a complete picture you can look at the whole thing at:
http://www.opencores.org/cvsweb.cgi/~checkout~/jop/vhdl/scio/sc2avalon.vhd

>>> You might try looking at incorporating the above mentioned template
>>> and avoid the Avalon violation. What I've also found in debugging
>>> other's code
>>
>> Then I get an additional cycle latency. That's what I want to avoid.
>
> Not on the Avalon bus, maybe for getting stuff into the template but
> even that is a handshake. I've even used Avalon within components to
> transfer

OK, then not at the Avalon bus directly, but, as you said, 'getting
stuff into the template'. That's the same for me (in my case). If my
master has an (internal) read request and I have to forward it to
Avalon in a clocked process (as you do with your template), I will lose
one cycle. OK, in the interface and not in the bus. Still a lost
cycle ;-)

> data between rather complicated processes just because it is a clean
> data transfer interface and still have no problem transferring data on
> every clock cycle when it is available. I'm not familiar enough with
> your code, but I suspect that it can be done in your case as well.
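Coming back to the two-process state machine above: below is a minimal
sketch of the clocked half, showing how registering on next_state gains
the cycle. This is not the actual sc2avalon.vhd code; the declarations
are omitted, and rd_data_reg (a std_logic_vector) and av_readdata are
assumed names.

   -- Sketch only: the companion clocked process of the next-state
   -- logic quoted above.
   process(clk, reset)
   begin
      if reset = '1' then
         state       <= idl;
         rd_data_reg <= (others => '0');
      elsif rising_edge(clk) then
         state <= next_state;
         -- Output register driven from next_state: the read data is
         -- captured in the cycle in which av_waitrequest is released,
         -- one cycle before 'state' itself shows rd.
         if next_state = rd then
            rd_data_reg <= av_readdata;
         end if;
      end if;
   end process;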
You can do it when your template 'controls' the master logic, but not
the other way round.

>> And then it goes on with slaves with fixed wait states. Why?
>> If you do not provide a waitrequest in a slave that needs wait
>> states you can get into trouble when you specify it wrong
>> at component generation.
>
> No, PTF files let you state that there are a fixed number of wait
> states and not have an explicit waitrequest on the slave.

I meant when you assume n wait states in your VHDL code, but made a
mistake in the PTF file and specified fewer wait states. This error
cannot happen when you generate the waitrequest within your VHDL code.

>> Or does the Avalon switch fabric, when registered, take this
>> information into account for the waitrequest of the master?
>
> It does.

That's a reason to go with fixed wait states! Or a bus specification
that counts down the number of wait states ;-)

BTW: Did you take a look at the SimpCon idea? Dreaming a little bit:
it would be cool to write an open-source system generator (like the
SOPC builder) for it, including your suggestion of an open and
documented specification file format.

>> Could be for the SRAM component. Should look into the
>> generated VHDL code (or in a simulation)...
>
> I'd suggest looking at the system.ptf file for your design.

It's still in ns, which makes sense.

Martin

>> Again, one more cycle latency ;-)
>
> Again, nope not if done correctly.

I think we finally agreed, did we?

Cheers,
Martin
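To illustrate the 'count down the number of wait states' idea: instead
of a single waitrequest or ack, the slave announces the remaining
latency, so a pipelined master can restart, for example, when the count
reaches 1 rather than waiting for 0. The fragment below is a sketch of
the concept only, not the SimpCon specification; rdy_cnt (an unsigned,
ieee.numeric_std assumed), rd and the 3-cycle latency are illustrative.

   -- Concept sketch only: 0 means the read data is available.
   process(clk, reset)
   begin
      if reset = '1' then
         rdy_cnt <= (others => '0');
      elsif rising_edge(clk) then
         if rd = '1' then
            -- a read is accepted in a single cycle; announce its latency
            rdy_cnt <= to_unsigned(3, rdy_cnt'length);
         elsif rdy_cnt /= 0 then
            -- count down; the master may restart when this reaches 1
            rdy_cnt <= rdy_cnt - 1;
         end if;
      end if;
   end process;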
From: Tommy Thorn on 13 Aug 2006 01:15

Wow, this spawned a long thread.

Martin Schoeberl wrote:
> What helps is to know in advance (one or two cycles) when the result
> will be available. That's the trick with the SimpCon interface.

That approach is common internally in real cores, but it adds a lot of
complication, while it's an open question how many Avalon applications
could benefit from it.

> There is not a single ack or waitrequest signal, but a counter that
> will say how many cycles it will take to provide the result. In this
> case I can restart the pipeline earlier.

AFAIR, Avalon _does_ support slaves with a fixed number of latency
cycles, but an SDRAM controller by nature won't be fixed cycles.

> Another point is, in my opinion, the wrong role of who has to hold
> data for more than one cycle. This is true for several busses (e.g.
> also Wishbone). For these busses the master has to hold address and
> write data till the slave is ready. This is a result of the backplane
> bus thinking. In an SoC the slave can easily register those signals
> when needed longer and the master can continue.

What happens then when you issue another request to a slave which
hasn't finished processing the first? Any queue will be finite and
eventually you'd have to deal with stalling anyway. An issue is that
there are generally many more slaves than masters, so it makes sense to
move the complication to the master.

....

> Wishbone and Avalon specify just a single cycle data valid.

Again, simplify the slave (and the interconnect) and burden the master.

Avalon is IMO the best balance between complexity, performance and
features of all the (few) interconnects I've seen yet (I haven't seen
SimpCon yet). In particular I found Wishbone severely lacking for my
needs. Avalon is proprietary though, so I rolled my own portable
implementation inspired by Avalon with just the features I needed:

- all reads are pipelined with variable latency (accept of a request is
  distinct from delivery of the data, thus inherently supporting
  multiple outstanding requests)
- multi-master support
- burst support (actually not implemented yet, but not that hard)

It's nearly as trivial as Wishbone, though it offers much higher
performance. Latency is entirely up to the slave, which can deliver
data as soon as the cycle after the request was posted. (Though
arriving at this simplicity took a few false starts.)

> Are there any other data available on that. I did not find many
> comments in this group on experiences with Cyclone I and II. Looks
> like the CII was more optimized for cost than speed. Yes, waiting
> for III ;-)

The only mention of Cyclone III I've seen outside this newsgroup was a
mention in passing on EETimes that suggested SIII and CIII were
expected this year. I just used Cyclone III as a generic term for
whatever the next Altera low-cost part is.

Regards,
Tommy
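Below is a minimal sketch of the kind of split read interface described
above, where accepting a request is separate from delivering the data.
The entity, signal names and widths are assumptions for illustration,
not Tommy's actual implementation.

   -- Sketch only: a request is accepted when req and accept are both
   -- high in the same cycle; the corresponding data arrives any number
   -- of cycles later together with rd_valid, so several requests can
   -- be outstanding at once (responses assumed in request order here).
   library ieee;
   use ieee.std_logic_1164.all;

   entity pipelined_read_slave is
      port (
         clk, reset : in  std_logic;
         -- request channel (master -> slave)
         req        : in  std_logic;
         addr       : in  std_logic_vector(23 downto 0);
         accept     : out std_logic;   -- slave may hold low to stall
         -- response channel (slave -> master)
         rd_data    : out std_logic_vector(31 downto 0);
         rd_valid   : out std_logic    -- one pulse per accepted request
      );
   end pipelined_read_slave;

Because the response channel is independent of the request channel, a
new request can be accepted on every cycle while earlier ones are still
in flight, and the slave is free to answer as early as the cycle after
the request was posted.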
From: Tommy Thorn on 13 Aug 2006 01:28

Antti Lukats wrote:
>>> as a very simple example of avalon master-slave type of peripherals
>>> there is one free avalon IP core for SD-card support. the core can
>>> be found at some russian forum and later it was also added to the
>>> user ip section of the microtronix forums.
>> Any link handy for this example?
>>
> http://forum.niosforum.com/forum/index.php?showtopic=4430

"Sorry, the link that brought you to this page seems to be out of date
or broken."

I can see other postings just fine, though. Another reference?

Tommy
From: Antti Lukats on 13 Aug 2006 02:11
"Tommy Thorn" <foobar(a)nowhere.void> schrieb im Newsbeitrag news:44DEB88F.50805(a)nowhere.void... > Antti Lukats wrote: >>>> as very simple example for avalon master-slave type of peripherals >>>> there >>>> is on free avalon IP core for SD-card support the core can be found >>>> at some russian forum and later it was also added to the user ip >>>> section of the microtronix forums. >>> Any link handy for this example? >>> >> http://forum.niosforum.com/forum/index.php?showtopic=4430 > > "Sorry, the link that brought you to this page seems to be out of date or > broken." > > I can see other postings just fine, though. Another reference? > > Tommy Tommy the link works, but you may have to register at the niosforum in any case the sd card ip is one of the lasting postings at "post your ip" section at niosforum i dont have an link ready where the download would be accessible without registration sure I can re-upload it somewhere:) antti |