From: Walter Banks on 23 Aug 2006 09:02

Jim Granville wrote:

> The tiniest CPUs do not need a stack, and interrupts do not need to be
> re-entrant, so a faster context switch is to re-map the Registers, Flags
> (and even PC?) onto a different area in BRAM.
> You can share this resource by having INTs re-map top-down and calls
> re-map bottom-up - with a hardware trap when they collide :)

Once you see clearly the relationship between features and cost, a lot
can be removed.

Interrupts can be removed at extremely low cost to applications. Neither
the Microchip PIC12 nor the Freescale RS08 has interrupts. In the RS08 C
compiler we developed some software IP that, where possible, goes into a
power-down mode and launches execution threads that are compiled to run
to completion.

The threads are typically short and, as a side effect, run to completion
makes local re-use easy.

C compilers implemented for small processors work well without either
a data stack or a subroutine return stack. Two of the processors we have
written compilers for in the last couple of years both used an
addressable return register. Flow control analysis in the compiler makes
nested subroutines user transparent.

The instruction set reduction in the RS08 from the S08 parent had a
4-6% impact on application performance.

Walter..
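
The stackless call scheme Walter mentions can be pictured with a small model. This is only a sketch under stated assumptions, not the analysis his compilers actually perform: the call graph, the function names and the helpers `call_depths` and `assign_return_slots` are all hypothetical. It assumes the program has no recursion, so every function that makes calls of its own can park the return register in one fixed RAM slot instead of pushing it on a stack.

```python
def call_depths(call_graph, root):
    """Return the longest call-chain depth of every function reachable from root.

    call_graph maps a function name to the list of functions it calls.
    Recursion is rejected, because it is the one case that needs a real stack.
    """
    depths = {}

    def visit(fn, depth, active):
        if fn in active:
            raise ValueError(f"recursion through {fn!r} needs a real stack")
        depths[fn] = max(depths.get(fn, 0), depth)
        for callee in call_graph.get(fn, ()):
            visit(callee, depth + 1, active | {fn})

    visit(root, 0, frozenset())
    return depths


def assign_return_slots(call_graph, root):
    """Give every non-leaf function a fixed slot for its saved return address.

    Leaf functions keep the return address in the return register itself,
    so they need no slot at all.
    """
    depths = call_depths(call_graph, root)
    slots = {}
    for fn in sorted(depths, key=lambda name: (depths[name], name)):
        if call_graph.get(fn):  # non-leaf: it will overwrite the return register
            slots[fn] = len(slots)
    return slots


if __name__ == "__main__":
    # Hypothetical program: main calls read_adc and update, update calls scale.
    graph = {
        "main": ["read_adc", "update"],
        "update": ["scale"],
        "read_adc": [],
        "scale": [],
    }
    print(assign_return_slots(graph, "main"))
```

The number of slots is known at compile time, which is why the programmer never sees the difference from an ordinary nested call.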
From: Martin Schoeberl on 23 Aug 2006 12:35

>> What do you mean by 'very close to the hardware'? I try to
>> avoid vendor specific library elements as much as possible and
>> stay with plain VHDL. If you mean that the VHDL coding style
>> is more hardware oriented, then I agree.
>
> Yes, this is what I meant, e.g. figures 5.6 to 5.9 of your thesis, where
> you describe the processor pipeline with gates and which is implemented
> like this in VHDL. But maybe this is the normal case and I'm just too new
> to VHDL to write and interconnect components in this way.
>
> http://www.jopdesign.com/thesis/thesis.pdf

nice that you read it ;-)

>> I started directly
>> in an FPGA implementation and did almost no simulation.
>
> Why not? When I was implementing my CRC32 check for my network core, I
> tested the algorithm with a VHDL testbench (Ethernet packet send and
> receive works at 10 Mbit and 100 Mbit on my Spartan 3E starter kit now).
> The turnaround times are faster with simulation and it is very easy to
> debug it, instead of debugging a synthesized core in hardware. The same
> was true for my DS2432 ROM ID reader, where I wrote the testbench first
> and then implemented the reader.
> http://www.frank-buss.de/vhdl/spartan3e.html

Ok, the main reason for not using simulation was just that I had no
ModelSim and the Quartus simulator was a pain (actually I started with
MaxPlus II). However, I wrote my own kind of debugging device using the
printer port on the PC. I clocked the design with the printer port and
read back the interesting signals with a small state machine. Kind of
crazy ;-)

Now, a lot has changed. E.g. ModelSim for Xilinx is free. So there is
now a testbench for JOP available that you can use with ModelSim XE.
For all FPGA specific parts (on-chip memories) I wrote plain VHDL
models. So you can now debug with ModelSim XE and compile for Altera....

And I agree, simulation can save you a lot of time (and sometimes waste
a lot of time - I still like to look at the code till I find the issue).

Martin
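
The testbench habit Frank describes for his CRC32 check boils down to comparing an implementation against a trusted reference. The sketch below shows that idea in software only; it is not Frank's VHDL, and the function name `crc32_bitwise` is made up for illustration. It assumes the usual Ethernet CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which is also what Python's `zlib.crc32` computes.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32, one shift per input bit, as a hardware core might do it."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Compare the bit-serial model against the library reference.
    for frame in (b"", b"123456789", bytes(range(64))):
        assert crc32_bitwise(frame) == zlib.crc32(frame)
    print("bit-serial model matches zlib.crc32")
```

A model like this can also generate expected values for a VHDL testbench, so the simulator only has to compare vectors.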
From: Jim Granville on 23 Aug 2006 16:02

Walter Banks wrote:

> Jim Granville wrote:
>
>> The tiniest CPUs do not need a stack, and interrupts do not need to be
>> re-entrant, so a faster context switch is to re-map the Registers, Flags
>> (and even PC?) onto a different area in BRAM.
>> You can share this resource by having INTs re-map top-down and calls
>> re-map bottom-up - with a hardware trap when they collide :)
>
> Once you see clearly the relationship between features and cost, a lot
> can be removed.
>
> Interrupts can be removed at extremely low cost to applications. Neither
> the Microchip PIC12 nor the Freescale RS08 has interrupts. In the RS08 C
> compiler we developed some software IP that, where possible, goes into a
> power-down mode and launches execution threads that are compiled to run
> to completion.
>
> The threads are typically short and, as a side effect, run to completion
> makes local re-use easy.
>
> C compilers implemented for small processors work well without either
> a data stack or a subroutine return stack. Two of the processors we have
> written compilers for in the last couple of years both used an
> addressable return register. Flow control analysis in the compiler makes
> nested subroutines user transparent.
>
> The instruction set reduction in the RS08 from the S08 parent had a
> 4-6% impact on application performance.
>
> Walter..

Hi Walter,
Have you ever thought about doing a Compiler+FPGA_CPU (+Sim+Debug?)
bundle?

-jg
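
The shared re-map area from the quoted idea (calls growing bottom-up, interrupts growing top-down, a trap when they meet) can be sketched as a behavioural model. Everything here is an assumption for illustration: the class name `WindowAllocator`, the `WindowCollision` exception standing in for the hardware trap, and the choice of one whole register window per call or interrupt.

```python
class WindowCollision(Exception):
    """Stands in for the hardware trap raised when the two ends meet."""


class WindowAllocator:
    """Hand out register windows from one block of BRAM, from both ends."""

    def __init__(self, num_windows):
        self.call_top = 0              # next free window for calls (grows up)
        self.int_bottom = num_windows  # lowest window held by an interrupt (grows down)

    def enter_call(self):
        if self.call_top >= self.int_bottom:
            raise WindowCollision("call windows ran into interrupt windows")
        window = self.call_top
        self.call_top += 1
        return window

    def leave_call(self):
        self.call_top -= 1

    def enter_interrupt(self):
        if self.int_bottom - 1 < self.call_top:
            raise WindowCollision("interrupt windows ran into call windows")
        self.int_bottom -= 1
        return self.int_bottom

    def leave_interrupt(self):
        self.int_bottom += 1


if __name__ == "__main__":
    alloc = WindowAllocator(4)
    print(alloc.enter_call(), alloc.enter_call())            # windows taken from the bottom
    print(alloc.enter_interrupt(), alloc.enter_interrupt())  # windows taken from the top
    try:
        alloc.enter_call()
    except WindowCollision as trap:
        print("trap:", trap)
```

Because a context switch only changes a window index, no registers are copied, which is where the speed of the scheme comes from.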
From: PeteS on 23 Aug 2006 17:50

Frank Buss wrote:
> PeteS wrote:
>
> > Do you want a processor you can simply instantiate, or are you willing
> > to tweak so you get the features you want? If so, you could take one of
> > the less ambitious cores and adjust the instruction set to optimise it
> > for your application.
>
> Adjusting the instruction set to the problem domain is a good idea. I'll
> try to write the functions, first, maybe using domain specific instructions
> (like a block copy command), and then I'll implement the core for it.
>
> --
> Frank Buss, fb(a)frank-buss.de
> http://www.frank-buss.de, http://www.it4-systems.de

I did exactly this in a previous job. PicoBlaze was nice, but there
were things it did not have, and conversely things I would never use.

So I did the code (pseudocode first) and then designed the device to do
the necessary functions at the microcode level. Because my problem
domain was very constrained, I needed only 16 instructions (I like it
when I get nice numbers like that as a solution) to do what I needed.

Then I wrote (well, I changed :) an assembler to program it.

Worked very well, and took about half the space of a PicoBlaze,
including a DMAC engine (excluding the memory interface which was there
anyway).

Cheers

PeteS
From: jacko on 23 Aug 2006 22:25

PeteS wrote:
> Frank Buss wrote:
> > PeteS wrote:
> >
> > > Do you want a processor you can simply instantiate, or are you willing
> > > to tweak so you get the features you want? If so, you could take one of
> > > the less ambitious cores and adjust the instruction set to optimise it
> > > for your application.
> >
> > Adjusting the instruction set to the problem domain is a good idea. I'll
> > try to write the functions, first, maybe using domain specific instructions
> > (like a block copy command), and then I'll implement the core for it.
> >
> > --
> > Frank Buss, fb(a)frank-buss.de
> > http://www.frank-buss.de, http://www.it4-systems.de
>
> I did exactly this in a previous job. Picoblaze was nice, but there
> were things it did not have, and conversely things I would never use.
>
> So I did the code (pseudocode first) and then designed the device to do
> the necessary functions at the microcode level. Because my problem
> domain was very constrained, I needed only 16 instructions (I like it
> when I get nice numbers like that as a solution) to do what I needed.
>
> Then I wrote (well, I changed :) an assembler to program it.
>
> Worked very well, and took about half the space of a picoblaze,
> including a DMAC engine (excluding the memory interface which was there
> anyway).
>
> Cheers
>
> PeteS

AHDL for a two-register NOP, INC, DEC, WRITE unit:
http://indi.joox.net has a link to the Quartus II files.
BIREGU.bdf is good for interruptible stack pointers.
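
For readers without Quartus II at hand, a two-register NOP/INC/DEC/WRITE unit could behave roughly like the model below. This is a guess at the behaviour, not a transcription of BIREGU.bdf: the class name `BiRegUnit`, the 8-bit width, the opcode numbering and the register-select input are all assumptions.

```python
from enum import Enum


class Op(Enum):
    NOP = 0
    INC = 1
    DEC = 2
    WRITE = 3


class BiRegUnit:
    """Two registers; each step applies one operation to the selected one."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.regs = [0, 0]

    def step(self, sel, op, data=0):
        value = self.regs[sel]
        if op is Op.INC:
            value = (value + 1) & self.mask
        elif op is Op.DEC:
            value = (value - 1) & self.mask
        elif op is Op.WRITE:
            value = data & self.mask
        # Op.NOP leaves the register unchanged.
        self.regs[sel] = value
        return value


if __name__ == "__main__":
    unit = BiRegUnit()
    unit.step(0, Op.WRITE, 0x80)  # main-program stack pointer
    unit.step(1, Op.WRITE, 0xF0)  # interrupt stack pointer
    unit.step(0, Op.DEC)          # push in the main program
    unit.step(1, Op.DEC)          # push in the interrupt, register 0 untouched
    print([hex(r) for r in unit.regs])
```

Keeping a second pointer for interrupts means the handler never has to save or restore the main program's stack pointer.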