From: jacko on 24 Aug 2006 18:23

Hi,

http://indi.joox.net now has the first compiled Quartus files for the 16-bit indi core: basically an ALU, control and registers, with a fast interrupt switch. There is an asynchronous busy output from the CPU, and a synchronous reset.

The bus interface is not complete yet, as I have to think about the expansion modules.

BSD licence. Not tested, as there is no test bench yet, but the logic looks OK.

cheers
jacko

p.s. I am thinking about separate read and write address buses, and some carry extension logic.
From: Martin Schoeberl on 25 Aug 2006 03:49

Seems that you posted in the wrong thread ;-)

> http://indi.joox.net now has the first compiled Quartus files for the
> 16-bit indi core.
>
> basically an ALU, control and registers, with a fast interrupt switch.
>
> asynchronous busy output from the CPU, and a synchronous reset.
>
> the bus interface is not complete yet, as I have to think about the
> expansion modules.

Now you have to decide about: Avalon, SimpCon, Wishbone, ... Perhaps you can independently compare Avalon and SimpCon with your CPU design :-) KJ can give you Avalon support, and I can give you SimpCon support. However, you will find a lot of information already in this thread.

Martin
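(For readers following along: the two slave-port styles being compared in the rest of this thread look roughly like the sketch below. This is an approximate reconstruction from memory, not text from the thread - the exact names and widths are defined by the Avalon and SimpCon specifications, and the avalon_/sc_ prefixes and the ADDR_WIDTH generic are placeholders.)

    -- Avalon-MM style slave port (approximate)
    avalon_address       : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    avalon_read          : in  std_logic;
    avalon_write         : in  std_logic;
    avalon_writedata     : in  std_logic_vector(31 downto 0);
    avalon_readdata      : out std_logic_vector(31 downto 0);
    avalon_waitrequest   : out std_logic;   -- stalls the master until the slave can respond
    avalon_readdatavalid : out std_logic;   -- flags pipelined read data as it returns

    -- SimpCon style slave port (approximate)
    sc_address : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    sc_wr_data : in  std_logic_vector(31 downto 0);
    sc_rd      : in  std_logic;
    sc_wr      : in  std_logic;
    sc_rd_data : out std_logic_vector(31 downto 0);
    sc_rdy_cnt : out unsigned(1 downto 0);  -- cycles until rd_data is valid (0 = valid now)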
From: KJ on 25 Aug 2006 07:16

Martin,

Thanks for the detailed response. OK, we're definitely in the home stretch on this one. To summarize...

>> I'm assuming that the master side address and command signals enter the
>> 'Simpcon' bus and the 'Avalon' bus on the same clock cycle.
>
> This assumption is true. Address and command (+write data) are
> issued in the same cycle - no magic there.

So Avalon and SimpCon are both leaving the starting blocks at the same time... no false starts from the starting gun.

>> Given that assumption though, it's not clear to me why the address and
>> command could not be designed to also end up at the actual memory
>> device on the same clock cycle.

I don't think your response here hit my point. I wasn't questioning on which cycle the address/command/write data actually got to the SRAM, just that I didn't see any reason why the Avalon or SimpCon version would arrive on different clock cycles. Given the later responses from you, though, I think that this is true... we'll get to that.

>> Given that address and command end up at the memory device on the same
>> clock cycle whether SimpCon or Avalon, the resulting read data would
>> then be valid and returned to the SimpCon/Avalon memory interface logic
>> on the same clock cycle.
>
> In SimpCon it will definitely arrive one cycle later. With Avalon
> (and the generated memory interface) I 'assume' that there is also
> one cycle latency - I read this from the tco values of the output
> pins in the Quartus timing analyzer report. For the SRAM interface I
> did in VHDL I explicitly added registers at the address/rd/wr/data
> output. I don't know if the switch fabric adds another cycle.
> Probably not, if you do not check the pipelined checkbox in the SOPC
> Builder.

Again, when I was saying 'the same clock cycle' I'm referring to clock cycle differences between Avalon and SimpCon. In other words, if the SimpCon/Avalon bus cycle started on clock cycle 0, then when we start talking about the data from the SRAM arriving back at the input to the FPGA, with both designs it happens on clock cycle 'N'. For the relative comparison between the two busses, I don't much care what 'N' is (although it appears to be either '1' or '2'), just that 'N' is the same for both designs. Again, I *think* you might be agreeing that this is true here, but coming up is a more definitive agreement.

By the way, no, Avalon does not add any clock cycle latency in the fabric. It is basically just a combinatorial logic router as it relates to moving data around.

>> Given all of that, it's not clear to me why the actual returned data
>> would show up on the SimpCon bus ahead of Avalon or how it would be any
>> slower getting back to the SimpCon or Avalon master. Again, this might
>> be where my hangup is but if my assumptions have been correct up to
>> this paragraph then I think the real issue is not here but in the next
>> paragraph.
>
> Completely agree. The read data should arrive in the same cycle from
> Avalon or SimpCon to the master.

And this is a key point. So regardless of the implementation (SimpCon or Avalon), the JOP master starts the command at the same time for both, and the actual data arrives back at the JOP master at the same time. So the race through the data path is identical... (whew!), now on to the differences.

> Now that's the point where this
> bsy_cnt comes into play. In my master (JOP) I can take advantage of
> the early knowledge when data will arrive. I can restart my waiting
> pipeline earlier with this information.
> This is probably the main
> performance difference.

To continue the race analogy... So in some sense, even though the race through the data path ends in a tie, the advantage you feel you have with SimpCon is that the JOP master is endowed with the knowledge of when that race is going to end, by virtue of this bsy_cnt signal, and with Avalon you think you don't have this a priori knowledge.

So to the specifics now... I'm (mis)interpreting this to mean that if 'somehow' Avalon could give JOP the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, then JOP on Avalon 'should' be able to match JOP on SimpCon in performance. Is that correct? (Again, this is a key point; if this assumption is not correct, the following paragraphs will be irrelevant.)

So under the assumption that the key problem to solve is to somehow give the Avalon JOP master the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, I put on my Avalon Mr. Wizard hat and say... well, gee, for an Avalon connection between a master and slave that are both latency aware (i.e. they implement 'readdatavalid'), the Avalon specification requires that the 'waitrequest' output be de-asserted at least one clock cycle prior to 'readdatavalid' being asserted. It can be more than one and it can vary (what Avalon calls 'variable latency'), but it does have to be at least one clock cycle. Since the Avalon slave design is under your design control, you could design it to act just this way, to assert 'readdatavalid' one clock cycle after dropping 'waitrequest'. So now I have my 'early readdatavalid' signal.

Now inside the JOP master, currently you have some sort of signal that I'll call 'start_the_pipeline', which is currently based on this busy_cnt hitting a particular count. 'start_the_pipeline' happens to fire one clock cycle prior to the data from the SRAM actually arriving back at JOP (from the previously stated and possibly incorrect assumption). My Avalon equivalent cheat to the sort of SimpCon cheating about having a priori knowledge about when the race completes is simply the following:

start_the_pipeline <= Jop_Master_Read and not(JOP_Master_Wait_Request);

To reiterate, this JOP master side equation is working under the assumption that the Avalon slave component that interfaces to the actual SRAM is designed to assert its readdatavalid output one clock cycle after dropping its waitrequest output. So in some sense now I've endowed the Avalon JOP with the same sort of a priori knowledge of when the data is available that the SimpCon implementation is getting.

And here is another point where I think we need to stop and flat out agree or not agree: my stated assumption that if Avalon was to 'somehow' provide JOP with that early knowledge of when 'readdatavalid' will be asserted, then JOP on Avalon should be able to match JOP on SimpCon in performance.
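(A minimal VHDL sketch of that master-side equation, with hypothetical signal names - not JOP's actual code - and assuming, as above, that the slave asserts readdatavalid exactly one clock cycle after dropping waitrequest:)

    -- hypothetical Avalon master-side signals
    signal jop_master_read        : std_logic;  -- master 'read' output
    signal jop_master_waitrequest : std_logic;  -- master 'waitrequest' input
    signal start_the_pipeline     : std_logic;  -- early "data arrives next cycle" strobe

    -- concurrent assignment in the architecture body: the read was accepted
    -- this cycle, so under the slave's assumed behaviour readdatavalid (and
    -- the SRAM data) will show up on the very next cycle
    start_the_pipeline <= jop_master_read and not jop_master_waitrequest;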
From: KJ on 25 Aug 2006 07:31

> Let's say the address/command phase is per definition one cycle.
>
> That definition frees the master to do whatever it wants in the next
> cycle. For another request to the same slave it has to watch for the
> rdy_cnt in SimpCon. However, you can design a switch fabric with
> SimpCon where it is legal to issue a command to a different slave in
> the next cycle without attention to the first slave. You can just
> ignore the first slave's output until you want to use it.

In Avalon this would happen as well. By your definition, the Avalon slave (if it needed more than one clock cycle to totally complete the operation) would have to store away the address and command. It would not assert waitrequest on the first access. If a subsequent access to that slave occurred while the first was still going on, it would then assert waitrequest, but accesses to other slaves would not be hindered. The Avalon approach does not put this sort of stuff in the switch fabric but inside the slave design itself. In fact, the slave could queue up as many commands as needed (i.e. not just one), but I don't get the impression that SimpCon would allow this because there is one rdy_cnt per slave (I'm guessing).

>> The Avalon fabric 'almost' passes the waitrequest signal right back to the
>> master device, the only change being that the Avalon logic basically gates
>> the slave's waitrequest output with the slave's chipselect input (which the
>> Avalon fabric creates) to form the master's waitrequest input (assuming a
>> simple single master/slave connection for simplicity here). Per Avalon,
>
> I'm repeating myself ;-) That's the point I don't like in Avalon,
> Wishbone, OPB, ...: You have a combinatorial path from address
> register - decoding - slave decision - master decision (to hold
> address/command or not). With a few slaves this will not be an
> issue. With more slaves or a more complicated interconnect (multiple
> master) this can be your critical path.

You're right; in fact it most likely will be the critical path. Does SimpCon support different delays from different slaves? If not, and 'everyone' is required to have the same number of wait states, then I can see where SimpCon would have a performance advantage in terms of final clock speed on the FPGA, the tradeoff being that... everyone MUST have the same number of wait states. Whether that is a good or bad tradeoff is a design decision specific to a particular design, so in that regard it's good to have both SimpCon (as I limitedly understand it) and Avalon. If SimpCon does allow different slaves to have different delays, then I don't see how SimpCon would be any better, since there would still need to be address decoding done to figure out what the rdy_cnt needs to count to and such. Whether that code lives in the master side logic or slave side logic is irrelevant to the synthesis engine.

>> how it appears to me, which is why I asked him to walk me through the
>
> As described in the other posting:

Yep, got that posting with the blow by blow description.

KJ
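(As an aside, a tiny VHDL sketch of the combinatorial path being described here, with made-up signal names and an invented address map: the decode, the selected slave's waitrequest, and the gating back to the master all have to resolve within a single clock period.)

    -- combinatorial address decode in the fabric (hypothetical address map)
    slave0_chipselect <= '1' when master_address(31 downto 16) = x"0000" else '0';
    slave1_chipselect <= '1' when master_address(31 downto 16) = x"0001" else '0';

    -- the master's waitrequest is a purely combinatorial function of the
    -- selected slave's waitrequest, so the whole address -> decode -> slave
    -- -> master chain is one long combinatorial path
    master_waitrequest <= (slave0_chipselect and slave0_waitrequest) or
                          (slave1_chipselect and slave1_waitrequest);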
From: KJ on 25 Aug 2006 09:04
Martin,

A bit of an amendment to my previous post starting...

KJ wrote:
> Martin,
>
> Thanks for the detailed response. OK, we're definitely in the home stretch
> on this one.

After pondering a bit more, I believe the Avalon slave component to the SRAM should NOT have a one clock cycle delay between waitrequest de-asserted and readdatavalid asserted, since that obviously would stall the Avalon master (JOP) needlessly. Instead the slave component should simply assert waitrequest when a request comes in while it is still busy processing an earlier one. Something along the lines of...

process(Clock)
begin
    if rising_edge(Clock) then
        if (Reset = '1') or (Count = MAX_COUNT) then
            Wait_Request <= '0';
        elsif (Chip_Select = '1') then
            Wait_Request <= '1';
        end if;
    end if;
end process;

where 'Count' and MAX_COUNT are used to count however many cycles it takes for the SRAM data to come back or be written. If the SRAM only needs one clock cycle, then the term "(Count = MAX_COUNT)" could be replaced with simply "(Wait_Request = '1')".

So now, back on the Avalon master side, I can still count on the Avalon waitrequest to precede readdatavalid, but now I've removed the guarantee that the slave will make the delay between the two exactly one clock cycle. To compensate, I would still key off when the JOP Avalon master read signal is asserted and waitrequest is not asserted. In other words, the basic logic of my 'start_the_pipeline' signal is OK, but depending on what the actual latency is for the design, maybe it needs to be delayed by a clock cycle or so. In any case, that signal will still provide an 'early' form of the Avalon readdatavalid signal, and I think all of my points in that previous post still apply. Hopefully you've read this post before you got too far into typing a reply to that one.

After yet more pondering on whether this is 'cheating' on the Avalon side or not, I think perhaps it's not. The 'questionable' logic is in the generation of the 'start_the_pipeline' signal that keys off of waitrequest and uses it to produce this 'early data valid' signal. But this logic is simply part of what I would consider to be a SimpCon-to-Avalon bridge. As such, that bridge is privy to whatever signals and a priori knowledge the SimpCon bus specification provides, as well as whatever signals and a priori knowledge the Avalon bus specification provides, and has the task of mating the two. If SimpCon needs an 'early data valid' signal as part of the interface, then it also needs to pony up whatever info the SimpCon master has in regards to being able to know ahead of time when that data will be valid - in other words, it would need to know the same thing that you used to generate your rdy_cnt or busy_cnt or whatever it was called.

So I've basically concluded that while it might appear on the surface to be a 'cheat' to use the Avalon signals as I have, you can only say that if you're looking strictly at Avalon alone. But since the function being implemented is a bridge between SimpCon and Avalon, use of SimpCon information to implement that function is fair game and not a 'cheat'.

KJ
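(To round out that fragment, a hedged sketch of how 'Count', MAX_COUNT and a readdatavalid output might be generated. The declarations, the fixed-latency assumption and the Read_Data_Valid name below are illustrative guesses, not code from the thread.)

    -- illustrative declarations, placed in the architecture declarative region
    constant MAX_COUNT : integer := 2;               -- assumed SRAM access latency in clocks
    signal   Count           : integer range 0 to MAX_COUNT := MAX_COUNT;
    signal   Read_Data_Valid : std_logic := '0';

    process(Clock)
    begin
        if rising_edge(Clock) then
            Read_Data_Valid <= '0';                  -- default: only pulsed for one cycle
            if Reset = '1' then
                Count <= MAX_COUNT;                  -- Count = MAX_COUNT means "idle"
            elsif Chip_Select = '1' and Wait_Request = '0' then
                Count <= 0;                          -- access accepted, start counting
            elsif Count < MAX_COUNT then
                Count <= Count + 1;
                if Count = MAX_COUNT - 1 then
                    Read_Data_Valid <= '1';          -- asserted in the cycle Count reaches MAX_COUNT
                end if;
            end if;
        end if;
    end process;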