JOP as SOPC component [FPGA]

From: Martin Schoeberl on 27 Aug 2006 13:33

>> BTW: As I'm also academic I should/have to publish papers. SimpCon
>> is on my list for months to be published - and now it seems to be
>> the right time. I will write a draft of the paper in the next few
>> days. If you are interested I'll post a link to it in this thread
>> and your comments are very welcome.
>>
> OK.
>
I've uploaded the first draft of the paper:
http://www.jopdesign.com/doc/simpcon_date.pdf

It's still very similar to the SimpCon definition
at opencores.org. Comments are very welcome.

To KJ and Tommy: I've added you both in the
Acknowledgments. I hope this is ok with you ;-)

Martin

From: Tommy Thorn on 27 Aug 2006 19:47

Martin Schoeberl wrote:
> I've uploaded the first draft of the paper:
> http://www.jopdesign.com/doc/simpcon_date.pdf
>
> It's still very similar to the SimpCon definition
> at opencores.org. Comments are very welcome.
>
> To KJ and Tommy: I've added you both in the
> Acknowledgments. I hope this is ok with you ;-)

I fail to see what's "controversial" about our discussion. Maybe you
meant "heated".

As a reviewer (I used to be academic) I'd be very concerned that you
fail to disclose that SimpCon is single-master only, and thus not a full
alternative to the others listed.

Tommy

From: Martin Schoeberl on 28 Aug 2006 03:50

> Martin Schoeberl wrote:
>> I've uploaded the first draft of the paper:
>> http://www.jopdesign.com/doc/simpcon_date.pdf
>>
>> It's still very similar to the SimpCon definition
>> at opencores.org. Comments are very welcome.
>>
>> To KJ and Tommy: I've added you both in the
>> Acknowledgments. I hope this is ok with you ;-)
>
> I fail to see what's "controversial" about our discussion. Maybe you meant "heated".

To me the discussion with KJ feels a little bit 'controversial',
but I mean this in a positive way. However, I can remove this ;-)

>
> As a reviewer (I used to be academic) I'd be very concerned that you fail to disclose that SimpCon is single-master only, and thus
> not a full alternative to the others listed.

And that's still our little controversy. I agree that SimpCon is
not ideal for several outstanding requests from a single master.
However, I don't see an issue for a multi-master system with
SimpCon. It's possible with Wishbone and SimpCon is 'more
expressive' than Wishbone, right?

Martin

From: KJ on 29 Aug 2006 07:23

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44f1d74f$0$12642$3b214f66(a)tunews.univie.ac.at...
>>> BTW: As I'm also academic I should/have to publish papers. SimpCon
>>> is on my list for months to be published - and now it seems to be
>>> the right time. I will write a draft of the paper in the next few
>>> days. If you are interested I'll post a link to it in this thread
>>> and your comments are very welcome.
>>>
>> OK.
>>
> I've uploaded the first draft of the paper:
> http://www.jopdesign.com/doc/simpcon_date.pdf
>
> It's still very similar to the SimpCon definition
> at opencores.org. Comments are very welcome.

Mostly minor comments.

Section 1.1.1 Avalon
After the question "How is the great complexity handled?" you answered with
"The switch fabric is in charge to connect those different devices and
perform the necessary conversion." This is generally not quite the case. It
is up to the Avalon slave design to perform the necessary conversions (I
believe this is referring to whether this is a 'simple' connection or a
pipeline with variable latency) with the Avalon fabric responsible for the
connections between the devices (as you said).

I think the SOPC Builder tool and the various components that Altera bundles
in with SOPC Builder are giving the impression that it is the fabric but my
interpretation is that SOPC Builder is allowing you to easily create slave
components that can be used to interface to external parts without even
writing a single line of VHDL as you at first did.

That leads to the last question of "Who provides this switch fabric? Is it
proprietary Altera design, or are there open source implementations
available?" The switch fabric itself can easily be written, it is on the
order of six lines of code per interface for a point to point connection,
there is nothing really magic in what Altera spits out of SOPC Builder based
on the Avalon bus definition. I don't think it would be difficult to create
an open source version of this connection logic, but whether simple use of
the Avalon bus without also targeting an Altera device (even if no Altera
software is involved) is violating anything is an open legal question as
you've pointed out (I'm guessing that it might but not really sure).

Section 2
In the paragraph starting "The third issue is..." you ask the question "Why
not force the slaves to hold the data on the output registers as long as
there is no new transaction?" A couple follow up questions to that though
would be
- What is a SimpCon slave to do if there IS a new transaction before the old
one has been acknowledged?
- Does the SimpCon fabric prevent this from happening? (I think it does,
but not exactly sure)

Other comments:

Personally, I'm still a bit confused about just exactly which pipeline on
the master side is 'released' when rdy_cnt hits the magic number and how
that differs from releasing the Avalon master address/command pipeline via
waitrequest and releasing the Avalon master readdata pipeline via
readdatavalid. With the JOP example we've been bandying about I accept that
it's not totally the master address/command that is being released but
'something' on the JOP master for kick starting the data processing pipeline
based on the read data actually becoming available (and getting 'early'
notice of this).

Maybe my last post regarding the use of the Avalon waitrequest signal to
generate the SimpCon early rdy_cnt signal came too late to make it to the
presses, but I think that addresses the perceived Avalon issue. Although
now that I think I understand how rdy_cnt works and where it really comes
from it might seem again that since my Avalon master side change involved
having the master 'know' the latency about the 'slave' would seem to be
cheating (since this wasn't required by the SimpCon implementation as I had
thought) but I guess I'll fall back to what Tommy mentioned earlier that
since JOP was optimized for SimpCon in the first place it implies that an
Avalon/SimpCon bridge must be built and such bridges can tend to be either
less than optimal in performance (as your performance numbers indicate) or
involve a bit of cheating to get maximum performance (as my change would
be). Had JOP been optimized for Avalon to begin with would the numbers be
any better without any cheating? That's sort of the open question and I'm
not necessarily expecting an answer.

My main interest in the thread was in understanding what sort of bottlenecks
might be lurking in Avalon that I'm not aware of. A couple areas of
'possible' weakness for Avalon that I am aware of are:
- It can hang (there is no requirement for a max time for a cycle).
Something to be aware of, but generally not an issue since whoever is
designing the slave components had better address this and not allow for a
hang to occur.
- No notion of a 'retry'. Again, given the environment of being on a chip,
the slave design shouldn't be allowed to say 'try again later please' so I
don't think this should be an actual design issue, just something to be
aware of.
- Can't have pending reads from multiple slaves. I suppose this could be
important to some, it hasn't been for me.

Another possible area of weakness 'might' be whatever it is that is
hindering performance of JOP on Avalon as compared to JOP on SimpCon. If
fundamentally it's just the SimpCon/Avalon bridge and that a native JOP on
Avalon would not require any cheating and would meet performance then there
is no other issue.

Just curious, Figure 7 shows pipelined reads from a static RAM where a new
address is presented on every other clock cycle which matches the
performance of the actual SRAM. If the device had been an external DRAM/DDR
or such, you can clock out a new read command on every clock cycle even
before you get the first data item back. After the data from that first
read does eventually come back, the data from the subsequent reads will also
come back on consecutive clock cycles. Does SimpCon support that sort of
device? Or would it have to take multiple clock cycles per read? At first
glance since the slave has only the one 'rdy_cnt' it would appear that it
would not whereas the mythical "native Avalon interface JOP" would be able
to hit the DRAM at a much higher rate. But maybe it could by varying the

From: Martin Schoeberl on 29 Aug 2006 20:50

> After the question "How is the great complexity handled?" You answered with "The switch fabric is in charge to connect those
> different devices and perform the necessary conversion" This is generally not quite the case. It is up to the Avalon slave
> design to perform the necessary conversions (I

mmh, as far as I understand it you can provide any master or
slave implementation that follows the rule of Avalon: From
a very simple asynchronous device to a pipelining device.
The generated switch fabric will provide the adaptation.

> proprietary Altera design, or are there open source implementations available?" The switch fabric itself can easily be written,
> it is on the order of six lines of code per interface for a point to point connection, there is nothing really magic in what
> Altera spits out of SOPC Builder based

I think that this is actually the power of SOPC builder. It will
do all your glue logic stuff for the interconnect (address
decoding, byte order managing, byte enable on write,...).

And that's a lot more than 6 lines of code ;-)
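
To give a feel for the kind of glue logic meant here, below is a minimal sketch in Python (not VHDL, and not what SOPC Builder actually generates). The address map, the names decode and byte_enables, the 32-bit bus width and the little-endian byte-lane numbering are all assumptions made for illustration only.

```python
# Hypothetical address map: (base address, size in bytes, slave name).
ADDRESS_MAP = [
    (0x0000_0000, 0x0010_0000, "sram"),
    (0x0010_0000, 0x0000_0100, "uart"),
]


def decode(address):
    """Select the slave that owns `address`; return (name, offset in slave)."""
    for base, size, name in ADDRESS_MAP:
        if base <= address < base + size:
            return name, address - base
    return None, None  # no slave mapped at this address


def byte_enables(address, nbytes, bus_bytes=4):
    """Byte-enable mask for an aligned write (bit i enables byte lane i).

    Assumes little-endian lane numbering on a bus that is bus_bytes wide.
    """
    lane = address % bus_bytes
    if nbytes not in (1, 2, 4) or lane % nbytes != 0 or lane + nbytes > bus_bytes:
        raise ValueError("unsupported or unaligned access")
    return ((1 << nbytes) - 1) << lane


if __name__ == "__main__":
    print(decode(0x0010_0004))
    print(bin(byte_enables(0x1002, 2)))
```

Even this toy version needs a decoder and a mask generator per access; a generated fabric would repeat something like it for every master/slave pair.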

> an open source version of this connection logic, but whether simple use of the Avalon bus without also targeting an Altera device
> (even if no Altera software is involved) is violating anything is an open legal question as you've pointed out (I'm guessing that
> it might but not really sure).

AFAIK the bus definition is kind of open-source - free. However,
I'm sure you're not allowed to use the SOPC builder generated
VHDL code on a Xilinx device ;-)
BTW: I asked Altera Austria about a related topic: Is it allowed
to 'use' the DRAM controller in an open-source environment (meaning,
can I upload the VHDL code to a web server)? However, they had
no real answer to this. They said that the SDRAM controller
is part of NIOS and only works with NIOS. Therefore,
one has to buy a NIOS license. But it works quite well with JOP
too ;-)

> Section 2
> In the paragraph starting "The third issue is..." you ask the question "Why not force the slaves to hold the data on the output
> registers as long as there is no new transaction?" A couple follow up questions to that though would be
> - What is a SimpCon slave to do if there IS a new transaction before the old one has been acknowledged?

Good question ;-) It depends on the pipeline level. It can accept
it. But this is not directly related to the request to 'just'
keep the data valid until a newer one is available (and was
requested).

> - Does the SimpCon fabric prevent this from happening? (I think it does, but not exactly sure)

It is the master who decides when to issue a new request
and when to leave the slave with the last data.

> Personally, I'm still a bit confused about just exactly which pipeline on the master side is 'released' when rdy_cnt hits the
> magic number and how that differs from releasing the Avalon master address/command pipeline via waitrequest and releasing the
> Avalon master readdata pipeline via

From the pure pipelining point of view there is not so much
difference between Avalon and SimpCon. Both can do pipelining
when the slave supports it. The pipeline level in Avalon is
not restricted (and not so obvious to see). In SimpCon with
the 2 bit rdy_cnt the pipeline level is restricted.
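
The restriction can be illustrated with a small Python model. This is only a sketch under assumptions: that rdy_cnt counts the remaining cycles until read data is valid, that the 2-bit value saturates at 3 (meaning "3 or more"), and that a slave with pipeline level n accepts a new request once rdy_cnt <= n. The function names and the latency parameter are hypothetical.

```python
RDY_CNT_MAX = 3  # 2-bit signal; 3 is assumed to mean "3 or more cycles"


def rdy_cnt_trace(latency, cycles):
    """rdy_cnt driven by the slave on each cycle after a read request.

    latency: cycles until the read data is valid (hypothetical parameter).
    """
    trace = []
    remaining = latency
    for _ in range(cycles):
        trace.append(min(remaining, RDY_CNT_MAX))  # saturate at 3
        if remaining > 0:
            remaining -= 1
    return trace


def earliest_next_request(trace, pipeline_level):
    """First cycle at which rdy_cnt <= pipeline_level, or None if never."""
    for cycle, cnt in enumerate(trace):
        if cnt <= pipeline_level:
            return cycle
    return None


if __name__ == "__main__":
    trace = rdy_cnt_trace(latency=5, cycles=8)
    print(trace)
    print(earliest_next_request(trace, pipeline_level=2))
```

Because the value 3 carries no exact count in this model, the master can only overlap a new request with the last two cycles of the previous one, which is the sense in which the pipeline level is bounded.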

> it's not totally the master address/command that is being released but 'something' on the JOP master for kick starting the data
> processing pipeline based on the read data actually becoming available (and getting 'early' notice of this).

The early notice is the little thing I like about my design: It helps
a pipelined master waiting on data *and* performs the flow control
for the pipelining. Perhaps too much information for a single
2 bit signal. However, I like fewer signals. See the OPB spec.
for the extremely different way to do it: They have way too many
signals defined - I don't like to read so many signal definitions ;-)

> having the master 'know' the latency about the 'slave' would seem to be cheating (since this wasn't required by the SimpCon
> implementation as I had thought) but I guess I'll fall back to what Tommy mentioned earlier that since JOP was optimized for
> SimpCon in the first place it implies that an Avalon/SimpCon bridge must be built and such bridges can tend to be either

About cheating and bridges: I'm already cheating on the Avalon
interface (as mentioned in an earlier post) to generate the
address/control/data holding in the master - I switch between
the original single cycle register at the first cycle
and a hold register on the following cycles - that's not
allowed in the original spec.
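
A rough Python model of that register switch is given below. It is a sketch, not the actual JOP VHDL: the class name HoldMux, the start flag, and the assumption that the hold register keeps driving the bus for as long as waitrequest stays asserted are all made up for illustration.

```python
class HoldMux:
    """Drive the bus from the single-cycle register first, then from a hold register."""

    def __init__(self):
        self.hold = None    # hold register, captured in the first cycle
        self.busy = False   # True while the transfer is still being stalled

    def clock(self, single_cycle_value, start, waitrequest):
        """Return the value driven on the bus in this cycle (None when idle)."""
        if start:
            # First cycle: pass the original register straight through
            # and capture it for the following cycles.
            out = single_cycle_value
            self.hold = single_cycle_value
            self.busy = waitrequest
        elif self.busy:
            # Following cycles: the original register may already have
            # changed, so drive the captured copy instead.
            out = self.hold
            self.busy = waitrequest
        else:
            out = None
        return out


if __name__ == "__main__":
    mux = HoldMux()
    stimulus = [(0x10, True, True), (None, False, True),
                (None, False, False), (None, False, False)]
    print([mux.clock(*s) for s in stimulus])
```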

Bridges are a difficult topic: In the general case you add
latency cycles.

> be). Had JOP been optimized for Avalon to begin with would the numbers be any better without any cheating? That's sort of the
> open question and I'm not necessarily expecting an answer.

Mmh, hard to say. I implemented the memory interface without
any SoC interconnect in mind - just tried to get the best
performance on SRAMs. The bus thing came very late in the
design. So it looks like I'm now defining a bus that 'fits'
to the way the original memory interface of JOP was.
However, perhaps that's not that bad, as a new idea for a
different bus comes up ;-)

> My main interest in the thread was in understanding what sort of bottlenecks might be lurking in Avalon that I'm not aware of. A
> couple areas of 'possible' weakness for Avalon that I am aware of are:
> - It can hang (there is no requirement for a max time for a cycle). Something to be aware of, but generally not an issue since
> whoever is designing the slave components had better address this and not allow for a hang to occur.
> - No notion of a 'retry'. Again, given the environment of being on a chip, the slave design shouldn't be allowed to say 'try again
> later please' so I don't think this should be an actual design issue, just something to be aware of.

Implementing those two features makes your bus (and interfacing)
way more complicated. AFAIK OPB does it, but you end up with
so many signals...

> - Can't have pending reads from multiple slaves. I suppose this could be important to some, it hasn't for me.

That's more an issue of the interconnect logic and not the
bus definition.

>
> Just curious, Figure 7 shows pipelined reads from a static RAM where a new address is presented on every ot
