From: JJ on 7 May 2006 00:12

Jeremy Ralph wrote:
> If one wanted to develop an FPGA-based hardware accelerator that could
> attach to the average PC to process data stored in PC memory, what
> options are available?
>
> Decision factors are:
> + ease of use (dev kit, user's guide, examples)
> + ability to move data with minimal load on the host PC
> + cost
> + scalability (i.e. ability to upsize RAM and FPGA gates)
> + ability to instantiate a 32-bit RISC (or equiv)
>
> Someone recommended the TI & Altera Cyclone II PCIe dev board, which is
> said to be available soon. Any other recommendations?
>
> Also, what is the best way to move data between PC memory and the FPGA? DMA?
> What transfer rates should one realistically expect?
>
> Thanks,
> Jeremy

FPGAs and standard CPUs are a bit like oil and water: they don't mix very well, one being very parallel and the other very sequential.

What exactly does your PC workload include?

Most PCs that are fast enough to run Windows and web software like Flash are idle perhaps 99% of the time, and even under normal use still idle 90% of the time, maybe 50% idle while playing DVDs.

Even if you have compute jobs like encoding video, that is now close enough to real time, or a couple of PCs can be tied together to get it done.

Even if FPGAs were infinitely fast and cheap, they still don't have a way to get at the data unless you bring it to them directly. In PC-accelerator form they are bandwidth starved compared to the cache and memory bandwidth the PC CPU has.

There have been several DIMM-based modules, one even funded by Xilinx VC a few years back; I suspect Xilinx probably scraped up the remains and any patents?

The PCI bus is way too slow to be of much use except for problems that do a lot of compute on relatively little data, but then you could use distributed computing instead. PCIe will be better, but then again you have to deal with new PCIe interfaces, or with a bridge chip if you are building one.

That leaves the potential of HyperTransport (HT) connections on multi-socket (940 and other) Opteron systems as a promising route: lots of bandwidth to the caches, and probably some patent walls already, but in reality very few users have multi-socket server boards.

It is best to limit the scope of use of FPGAs to what they are actually good at and therefore economical for. That means bringing the problem right to the pins: real-time continuous video, radar, imaging, audio, packet, or signal processing, whatever, with some logging to a PC.

If a processor can sit in the FPGA, then you can have much more throughput to it, since it is in the fabric, than if you go through an external skinny pipe to a relatively infinitely faster serial CPU. Further, if your application is parallel, you can possibly replicate blocks, each with a specialized processor, possibly with custom instructions or a coprocessor, until you run out of fabric or FPGAs. Eventually, though, input and output become limiting factors again: do you have acquisition of live signals, or results that need to be saved? It really all depends on what you are processing and the rate at which it can be managed.

John Jakson
transputer guy
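On the question of moving data between PC memory and the FPGA: before committing to a DMA engine, the cheapest way to get a feel for the problem is to memory-map the board's PCI BAR from user space and time plain programmed-I/O copies. The C sketch below is only illustrative; the /dev/fpga0 device node, the 1 MiB BAR size, and the driver exposing it are assumptions, and any serious design would use bus-master DMA instead.

    /* Rough sketch: time programmed-I/O writes to a memory-mapped FPGA BAR.
     * "/dev/fpga0" and BAR_SIZE are hypothetical; substitute whatever your
     * board's driver actually exposes.  This only measures the PIO floor,
     * not what a DMA engine could achieve. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define BAR_SIZE (1 << 20)      /* assume a 1 MiB BAR window */

    int main(void)
    {
        int fd = open("/dev/fpga0", O_RDWR);      /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        void *bar = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); return 1; }

        char *buf = malloc(BAR_SIZE);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 0xA5, BAR_SIZE);

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < 256; i++)             /* 256 MiB total */
            memcpy(bar, buf, BAR_SIZE);
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("PIO write rate: %.1f MB/s\n", 256.0 / secs);

        munmap(bar, BAR_SIZE);
        free(buf);
        close(fd);
        return 0;
    }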
From: Alif Wahid on 7 May 2006 01:10

Falk Brunner wrote:
> Jeremy Ralph schrieb:
>
>> If one wanted to develop an FPGA-based hardware accelerator that could
>> attach to the average PC to process data stored in PC memory, what
>> options are available?
>
> Nice idea, but to beat a nowadays CPU (Pentium 4, Athlon 64, etc.) and a
> nowadays GPU (Nvidia GeForce whatever-is-up-to-date, etc.) is hard to
> achieve even for the big guys in the business. (Yes, special tasks can
> be optimized to run faster on FPGA-based hardware, but speeding up
> "normal" PC tasks is difficult.)
>
>> Decision factors are:
>> + ease of use (dev kit, user's guide, examples)
>> + ability to move data with minimal load on the host PC
>> + cost
>> + scalability (i.e. ability to upsize RAM and FPGA gates)
>> + ability to instantiate a 32-bit RISC (or equiv)
>
>> Someone recommended the TI & Altera Cyclone II PCIe dev board, which is
>> said to be available soon. Any other recommendations?
>
>> Also, what is the best way to move data between PC memory and the FPGA? DMA?
>
> Sure.
>
>> What transfer rates should one realistically expect?
>
> PCI is 133 MByte/s max.

PCI-X at 64 bits and 133 MHz will give you around 1 GByte/s max in one direction. Most high-end server motherboards have PCI-X rather than PCI.

> AGP is 2 GByte/s max. (AFAIK)
> PCI-Express is n x 250 MByte/s (with n up to 16)

Currently the maximum theoretical speed of PCI-Express is 2.5 Gbit/s per lane per direction, as specified in the standard. That immediately drops to 2.0 Gbit/s per lane per direction due to 8b/10b encoding. Then, in practice, the smallish Transaction Layer Packet (TLP) sizes (i.e. the ratio of payload to header) cause a further reduction in useful data throughput. In reality you're looking at approximately 1.5 Gbit/s per lane per direction of real data throughput.

The big advantage of PCI-Express is the seamless scalability and the point-to-point serial protocol. So a 16-lane PCI-Express endpoint should give you about 24 Gbit/s of useful data throughput in each direction.

Regards,
Alif.
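To put those per-lane numbers into a quick back-of-the-envelope calculation, the short C snippet below walks the same chain: 2.5 Gbit/s raw, times 8/10 for the encoding, times an assumed 0.75 payload efficiency standing in for TLP header overhead (the real figure depends on the maximum payload size the chipset negotiates), then scaled by lane count.

    /* Back-of-the-envelope PCIe gen-1 throughput estimate, following the
     * numbers quoted above.  payload_eff is an assumed figure for TLP
     * header overhead, not something taken from the spec. */
    #include <stdio.h>

    int main(void)
    {
        double raw_per_lane    = 2.5;                       /* Gbit/s on the wire, per lane, per direction */
        double after_8b10b     = raw_per_lane * 8.0 / 10.0; /* 2.0 Gbit/s after 8b/10b encoding */
        double payload_eff     = 0.75;                      /* assumed payload/header ratio */
        double usable_per_lane = after_8b10b * payload_eff; /* ~1.5 Gbit/s usable */

        for (int lanes = 1; lanes <= 16; lanes *= 2)
            printf("x%-2d link: ~%.1f Gbit/s usable per direction (~%.0f MB/s)\n",
                   lanes,
                   usable_per_lane * lanes,
                   usable_per_lane * lanes * 1000.0 / 8.0);
        return 0;
    }

For a 16-lane link this reproduces the roughly 24 Gbit/s per direction quoted above.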
From: Alif Wahid on 7 May 2006 01:18

JJ wrote:
> Jeremy Ralph wrote:
>> If one wanted to develop an FPGA-based hardware accelerator that could
>> attach to the average PC to process data stored in PC memory, what
>> options are available?
> [snip]
> The PCI bus is way too slow to be of much use except for problems that
> do a lot of compute on relatively little data, but then you could use
> distributed computing instead. PCIe will be better, but then again you
> have to deal with new PCIe interfaces, or with a bridge chip if you are
> building one.

What about PCIe IP cores? That may be a better option than bridge chips, since it keeps everything FPGA-oriented. However, it does mean that the FPGA must have gigabit-speed serial transceivers built in, and that limits one's options a little.

Regards,
Alif
From: JJ on 7 May 2006 01:54

I always hated that the PCI cores were so heavily priced compared to the FPGA they might go into. The pricing seemed to reflect the value they once added to ASICs some 10 or 15 years ago, not the potential of really low-cost, low-volume applications. A $100 FPGA in a small-volume application doesn't support $20K IP for the few dollars' worth of fabric it uses. It might be a bargain compared to the cost of rolling your own, though, just as buying an FPGA is a real bargain compared to rolling my own FPGA/ASIC.

FPGAs need lots of I/O to be useful, so why not put the damn IP in hard-macro form and let everyone at it? And do the same again for PCIe (if that's possible). We see how much more useful FPGAs are with all the other stuff that's been added over the years (BlockRAMs, multipliers, clocks, etc.), but it's really the I/O where the sh1t hits the fan first.

I would say that if we were to see PCIe on chip, even on a higher-priced part, we would quickly see a lot more coprocessor board activity, even just plain-vanilla PC boards.

I wonder, if there were multiple built-in narrow PCIe links, whether they could be used to build node-to-node links, a la HT, for FPGA arrays? Not that I really know much about PCIe yet.

John Jakson
transputer guy
From: Falk Brunner on 7 May 2006 07:01
Jeremy Ralph schrieb:
> Thanks Falk, for the numbers. Any reason why AGP couldn't be used
> for non-graphics streams?

None that I know of.

Regards
Falk