From: jacko on 24 Aug 2006 18:23

Hi,

http://indi.joox.net now has the first compiled Quartus files for the 16-bit indi core: basically an ALU, control and registers, with a fast interrupt switch. There is an asynchronous busy output from the CPU, and a synchronous reset.

The bus interface is not complete yet, as I have to think about the expansion modules.

BSD licence. Not tested, as there is no test bench yet, but the logic looks OK.

cheers
jacko

p.s. I am thinking about separate read and write address buses, and some carry extension logic.
From: Martin Schoeberl on 25 Aug 2006 03:49

Seems that you posted in the wrong thread ;-)

> http://indi.joox.net now has the first compiled Quartus files for the
> 16-bit indi core.
>
> basically an ALU, control and registers, with a fast interrupt switch.
>
> asynchronous busy output from the CPU, and a synchronous reset.
>
> the bus interface is not complete yet, as I have to think about the
> expansion modules.

Now you have to decide about: Avalon, SimpCon, Wishbone, ... Perhaps you can independently compare Avalon and SimpCon with your CPU design :-) KJ can give you Avalon support, and I can give you SimpCon support. However, you will find a lot of information already in this thread.

Martin
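(For readers following along: the two slave-port styles being compared in the rest of this thread look roughly like the sketch below. This is an approximate reconstruction from memory, not text from the thread - the exact names and widths are defined by the Avalon and SimpCon specifications, and the avalon_/sc_ prefixes and the ADDR_WIDTH generic are placeholders.)

    -- Avalon-MM style slave port (approximate)
    avalon_address       : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    avalon_read          : in  std_logic;
    avalon_write         : in  std_logic;
    avalon_writedata     : in  std_logic_vector(31 downto 0);
    avalon_readdata      : out std_logic_vector(31 downto 0);
    avalon_waitrequest   : out std_logic;   -- stalls the master until the slave can respond
    avalon_readdatavalid : out std_logic;   -- flags pipelined read data as it returns

    -- SimpCon style slave port (approximate)
    sc_address : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    sc_wr_data : in  std_logic_vector(31 downto 0);
    sc_rd      : in  std_logic;
    sc_wr      : in  std_logic;
    sc_rd_data : out std_logic_vector(31 downto 0);
    sc_rdy_cnt : out unsigned(1 downto 0);  -- cycles until rd_data is valid (0 = valid now)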
From: KJ on 25 Aug 2006 07:16

Martin,

Thanks for the detailed response. OK, we're definitely in the home stretch on this one. To summarize...

>> I'm assuming that the master side address and command signals enter the
>> 'Simpcon' bus and the 'Avalon' bus on the same clock cycle.
>
> This assumption is true. Address and command (+write data) are
> issued in the same cycle - no magic there.

So Avalon and SimpCon are both leaving the starting blocks at the same time... no false starts from the starting gun.

>> Given that assumption though, it's not clear to me why the address and
>> command could not be designed to also end up at the actual memory
>> device on the same clock cycle.

I don't think your response here hit my point. I wasn't questioning on which cycle the address/command/write data actually got to the SRAM, just that I didn't see any reason why the Avalon or SimpCon version would arrive on different clock cycles. Given the later responses from you, though, I think that this is true... we'll get to that.

>> Given that address and command end up at the memory device on the same
>> clock cycle whether SimpCon or Avalon, the resulting read data would
>> then be valid and returned to the SimpCon/Avalon memory interface logic
>> on the same clock cycle.
>
> In SimpCon it will definitely arrive one cycle later. With Avalon
> (and the generated memory interface) I 'assume' that there is also
> one cycle latency - I read this from the tco values of the output
> pins in the Quartus timing analyzer report. For the SRAM interface I
> did in VHDL I explicitly added registers at the address/rd/wr/data
> output. I don't know if the switch fabric adds another cycle.
> Probably not, if you do not check the pipelined checkbox in the SOPC
> Builder.

Again, when I was saying 'the same clock cycle' I'm referring to clock cycle differences between Avalon and SimpCon. In other words, if the SimpCon/Avalon bus cycle started on clock cycle 0, then when we start talking about the data from the SRAM arriving back at the input to the FPGA, with both designs it happens on clock cycle 'N'. For the relative comparison between the two busses, I don't much care what 'N' is (although it appears to be either '1' or '2'), just that 'N' is the same for both designs. Again, I *think* you might be agreeing that this is true here, but coming up is a more definitive agreement.

By the way, no, Avalon does not add any clock cycle latency in the fabric. It is basically just a combinatorial logic router as it relates to moving data around.

>> Given all of that, it's not clear to me why the actual returned data
>> would show up on the SimpCon bus ahead of Avalon or how it would be any
>> slower getting back to the SimpCon or Avalon master. Again, this might
>> be where my hangup is but if my assumptions have been correct up to
>> this paragraph then I think the real issue is not here but in the next
>> paragraph.
>
> Completely agree. The read data should arrive in the same cycle from
> Avalon or SimpCon to the master.

And this is a key point. So regardless of the implementation (SimpCon or Avalon), the JOP master starts the command at the same time for both, and the actual data arrives back at the JOP master at the same time. So the race through the data path is identical... (whew!), now on to the differences.

> Now that's the point where this
> bsy_cnt comes into play. In my master (JOP) I can take advantage of
> the early knowledge when data will arrive. I can restart my waiting
> pipeline earlier with this information.
> This is probably the main
> performance difference.

To continue the race analogy... So in some sense, even though the race through the data path ends in a tie, the advantage you feel you have with SimpCon is that the JOP master is endowed with the knowledge of when that race is going to end, by virtue of this bsy_cnt signal, and with Avalon you think you don't have this a priori knowledge.

So to the specifics now... I'm (mis)interpreting this to mean that if 'somehow' Avalon could give JOP the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, then JOP on Avalon 'should' be able to match JOP on SimpCon in performance. Is that correct? (Again, this is a key point; if this assumption is not correct, the following paragraphs will be irrelevant.)

So under the assumption that the key problem to solve is to somehow give the Avalon JOP master the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, I put on my Avalon Mr. Wizard hat and say... well, gee, for an Avalon connection between a master and slave that are both latency aware (i.e. they implement 'readdatavalid'), the Avalon specification requires that the 'waitrequest' output be de-asserted at least one clock cycle prior to 'readdatavalid' being asserted. It can be more than one and it can vary (what Avalon calls 'variable latency'), but it does have to be at least one clock cycle. Since the Avalon slave design is under your design control, you could design it to act just this way, to assert 'readdatavalid' one clock cycle after dropping 'waitrequest'. So now I have my 'early readdatavalid' signal.

Now inside the JOP master, currently you have some sort of signal that I'll call 'start_the_pipeline', which is currently based on this busy_cnt hitting a particular count. 'start_the_pipeline' happens to fire one clock cycle prior to the data from the SRAM actually arriving back at JOP (from the previously stated and possibly incorrect assumption). My Avalon equivalent cheat to the sort of SimpCon cheating about having a priori knowledge about when the race completes is simply the following:

start_the_pipeline <= Jop_Master_Read and not(JOP_Master_Wait_Request);

To reiterate, this JOP master side equation is working under the assumption that the Avalon slave component that interfaces to the actual SRAM is designed to assert its readdatavalid output one clock cycle after dropping its waitrequest output. So in some sense now I've endowed the Avalon JOP with the same sort of a priori knowledge of when the data is available that the SimpCon implementation is getting.

And here is another point where I think we need to stop and flat out agree or not agree: my stated assumption that if Avalon was to 'somehow' provide JOP with that early knowledge of when 'readdatavalid' will be asserted, then JOP on Avalon should be able to match JOP on SimpCon in performance.
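(A minimal VHDL sketch of that master-side equation, with hypothetical signal names - not JOP's actual code - and assuming, as above, that the slave asserts readdatavalid exactly one clock cycle after dropping waitrequest:)

    -- hypothetical Avalon master-side signals
    signal jop_master_read        : std_logic;  -- master 'read' output
    signal jop_master_waitrequest : std_logic;  -- master 'waitrequest' input
    signal start_the_pipeline     : std_logic;  -- early "data arrives next cycle" strobe

    -- concurrent assignment in the architecture body: the read was accepted
    -- this cycle, so under the slave's assumed behaviour readdatavalid (and
    -- the SRAM data) will show up on the very next cycle
    start_the_pipeline <= jop_master_read and not jop_master_waitrequest;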
From: KJ on 25 Aug 2006 07:31

> Let's say the address/command phase is per definition one cycle.
>
> That definition frees the master to do whatever it wants in the next
> cycle. For another request to the same slave it has to watch for the
> rdy_cnt in SimpCon. However, you can design a switch fabric with
> SimpCon where it is legal to issue a command to a different slave in
> the next cycle without attention to the first slave. You can just
> ignore the first slave's output until you want to use it.

In Avalon this would happen as well. By your definition, the Avalon slave (if it needed more than one clock cycle to totally complete the operation) would have to store away the address and command. It would not assert waitrequest on the first access. If a subsequent access to that slave occurred while the first was still going on, it would then assert waitrequest, but accesses to other slaves would not be hindered. The Avalon approach does not put this sort of stuff in the switch fabric but inside the slave design itself. In fact, the slave could queue up as many commands as needed (i.e. not just one), but I don't get the impression that SimpCon would allow this because there is one rdy_cnt per slave (I'm guessing).

>> The Avalon fabric 'almost' passes the waitrequest signal right back to the
>> master device, the only change being that the Avalon logic basically gates
>> the slave's waitrequest output with the slave's chipselect input (which the
>> Avalon fabric creates) to form the master's waitrequest input (assuming a
>> simple single master/slave connection for simplicity here). Per Avalon,
>
> I'm repeating myself ;-) That's the point I don't like in Avalon,
> Wishbone, OPB, ...: You have a combinatorial path from address
> register - decoding - slave decision - master decision (to hold
> address/command or not). With a few slaves this will not be an
> issue. With more slaves or a more complicated interconnect (multiple
> master) this can be your critical path.

You're right; in fact it most likely will be the critical path. Does SimpCon support different delays from different slaves? If not, and 'everyone' is required to have the same number of wait states, then I can see where SimpCon would have a performance advantage in terms of final clock speed on the FPGA, the tradeoff being that... everyone MUST have the same number of wait states. Whether that is a good or bad tradeoff is a design decision specific to a particular design, so in that regard it's good to have both SimpCon (as I limitedly understand it) and Avalon. If SimpCon does allow different slaves to have different delays, then I don't see how SimpCon would be any better, since there would still need to be address decoding done to figure out what the rdy_cnt needs to count to and such. Whether that code lives in the master side logic or slave side logic is irrelevant to the synthesis engine.

>> how it appears to me, which is why I asked him to walk me through the
>
> As described in the other posting:

Yep, got that posting with the blow by blow description.

KJ
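(As an aside, a tiny VHDL sketch of the combinatorial path being described here, with made-up signal names and an invented address map: the decode, the selected slave's waitrequest, and the gating back to the master all have to resolve within a single clock period.)

    -- combinatorial address decode in the fabric (hypothetical address map)
    slave0_chipselect <= '1' when master_address(31 downto 16) = x"0000" else '0';
    slave1_chipselect <= '1' when master_address(31 downto 16) = x"0001" else '0';

    -- the master's waitrequest is a purely combinatorial function of the
    -- selected slave's waitrequest, so the whole address -> decode -> slave
    -- -> master chain is one long combinatorial path
    master_waitrequest <= (slave0_chipselect and slave0_waitrequest) or
                          (slave1_chipselect and slave1_waitrequest);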
From: KJ on 25 Aug 2006 09:04
Martin,

A bit of an amendment to my previous post starting...

KJ wrote:
> Martin,
>
> Thanks for the detailed response. OK, we're definitely in the home stretch
> on this one.

After pondering a bit more, I believe the Avalon slave component to the SRAM should NOT have a one clock cycle delay between waitrequest de-asserted and readdatavalid asserted, since that obviously would stall the Avalon master (JOP) needlessly. Instead the slave component should simply assert waitrequest when a request comes in while it is still busy processing an earlier one. Something along the lines of...

process(Clock)
begin
    if rising_edge(Clock) then
        if (Reset = '1') or (Count = MAX_COUNT) then
            Wait_Request <= '0';
        elsif (Chip_Select = '1') then
            Wait_Request <= '1';
        end if;
    end if;
end process;

where 'Count' and MAX_COUNT are used to count however many cycles it takes for the SRAM data to come back or be written. If the SRAM only needs one clock cycle, then the term "(Count = MAX_COUNT)" could be replaced with simply "(Wait_Request = '1')".

So now, back on the Avalon master side, I can still count on the Avalon waitrequest to precede readdatavalid, but now I've removed the guarantee that the slave will make the delay between the two exactly one clock cycle. To compensate, I would still key off when the JOP Avalon master read signal is asserted and waitrequest is not asserted. In other words, the basic logic of my 'start_the_pipeline' signal is OK, but depending on what the actual latency is for the design, maybe it needs to be delayed by a clock cycle or so. In any case, that signal will still provide an 'early' form of the Avalon readdatavalid signal, and I think all of my points in that previous post still apply. Hopefully you've read this post before you got too far into typing a reply to that one.

After yet more pondering on whether this is 'cheating' on the Avalon side or not, I think perhaps it's not. The 'questionable' logic is in the generation of the 'start_the_pipeline' signal that keys off of waitrequest and uses it to produce this 'early data valid' signal. But this logic is simply part of what I would consider to be a SimpCon-to-Avalon bridge. As such, that bridge is privy to whatever signals and a priori knowledge the SimpCon bus specification provides, as well as whatever signals and a priori knowledge the Avalon bus specification provides, and has the task of mating the two. If SimpCon needs an 'early data valid' signal as part of the interface, then it also needs to pony up whatever info the SimpCon master has in regards to being able to know ahead of time when that data will be valid - in other words, it would need to know the same thing that you used to generate your rdy_cnt or busy_cnt or whatever it was called.

So I've basically concluded that while it might appear on the surface to be a 'cheat' to use the Avalon signals as I have, you can only say that if you're looking strictly at Avalon alone. But since the function being implemented is a bridge between SimpCon and Avalon, use of SimpCon information to implement that function is fair game and not a 'cheat'.

KJ
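(To round out that fragment, a hedged sketch of how 'Count', MAX_COUNT and a readdatavalid output might be generated. The declarations, the fixed-latency assumption and the Read_Data_Valid name below are illustrative guesses, not code from the thread.)

    -- illustrative declarations, placed in the architecture declarative region
    constant MAX_COUNT : integer := 2;               -- assumed SRAM access latency in clocks
    signal   Count           : integer range 0 to MAX_COUNT := MAX_COUNT;
    signal   Read_Data_Valid : std_logic := '0';

    process(Clock)
    begin
        if rising_edge(Clock) then
            Read_Data_Valid <= '0';                  -- default: only pulsed for one cycle
            if Reset = '1' then
                Count <= MAX_COUNT;                  -- Count = MAX_COUNT means "idle"
            elsif Chip_Select = '1' and Wait_Request = '0' then
                Count <= 0;                          -- access accepted, start counting
            elsif Count < MAX_COUNT then
                Count <= Count + 1;
                if Count = MAX_COUNT - 1 then
                    Read_Data_Valid <= '1';          -- asserted in the cycle Count reaches MAX_COUNT
                end if;
            end if;
        end if;
    end process;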