From: JJ on 8 May 2006 14:25

Andreas Ehliar wrote:
> On 2006-05-07, JJ <johnjakson(a)gmail.com> wrote:
> > I would say that if we were to see PCIe on chip, even if on a higher $
> > part, we would quickly see a lot more copro board activity, even just
> > plain vanilla PC boards.
>
> You might be interested in knowing that Lattice is doing just that in
> some of their LatticeSC parts. On the other hand, you are somewhat
> limited in the kinds of application you are going to accelerate, since
> LatticeSC do not have embedded multipliers IIRC. (Lattice are
> targeting communication solutions such as line cards that rarely need
> high performance multiplication in LatticeSC.)
>
> /Andreas

Yeah, I have been following Lattice more closely recently. It will take me some time to evaluate their specs more fully; I may get more interested if they have a free-use tool chain I can redo my work with.

Does anyone have PCIe on chip, though?

John Jakson
transputer guy
From: JJ on 8 May 2006 15:21

Piotr Wyderski wrote:
> Andreas Ehliar wrote:
>
> > One interesting application for most of the people on this
> > newsgroup would be synthesis, place & route and HDL simulation.
> > My guess would be that these applications could be heavily
> > accelerated by FPGA:s.
>
> A car is not the best tool to make another car.
> It's not a bees & butterflies story. :-) Same with FPGAs.

Well, xyz auto workers do eat their own, usually subsidised by the employer.

I disagree. In a situation where FPGAs developed relatively slowly and P&R jobs took many hours, there would be a good opportunity to use FPGAs for just such a job. But then again, FPGAs and the software are evolving too fast, and P&R jobs in my case have gone from 8-30 hrs a few years ago to a few minutes today, so the incentive has gone.

If I were paying $250K like the ASIC guys do for this and that, a hardware coprocessor might look quite cheap, and the EDA software is much more independent of the foundries. DAC usually has a few hardware copro vendors, most of them based on FPGAs. At one time some of those were even done in full-custom silicon; that was really eating your own.

> > My second guess that it is far from trivial to actually do this :)
>
> And who actually would need that?

I would be rather amazed if in a few years my 8-core Ulteron x86 chip were still running EDA tools on 1 core.

> Best regards
> Piotr Wyderski

John Jakson
transputer guy
From: Phil Tomson on 8 May 2006 20:09

In article <e3mq62$k21$2(a)news.lysator.liu.se>, Andreas Ehliar <ehliar(a)lysator.liu.se> wrote:
> On 2006-05-06, Piotr Wyderski
> <wyderski(a)mothers.against.spam-ii.uni.wroc.pl> wrote:
> > What could it accelerate? Modern PCs are quite fast beasts...
> > If you couldn't speed things up by a factor of, say, 300%, your
> > device would be useless. Modest improvements by several tens
> > of percents can be neglected -- Moore's law constantly works
> > for you. FPGAs are good for special-purpose tasks, but there
> > are not many such tasks in the realm of PCs.
>
> One interesting application for most of the people on this
> newsgroup would be synthesis, place & route and HDL simulation.
> My guess would be that these applications could be heavily
> accelerated by FPGA:s. My second guess that it is far from trivial
> to actually do this :)

Certainly on the simulation side of things, various companies like Ikos (are they still around?) have been doing stuff like this for years. To some extent this is what ChipScope and Synplicity's Identify are doing, only using more of a logic-analyzer metaphor: breakpoints are set and triggered through JTAG.

As far as synthesis itself and P&R, I would think that these could be accelerated in a highly parallel architecture like an FPGA. There are lots of algorithms that could be sped up in an FPGA - someone earlier in the thread said that the set of algorithms that could benefit from the parallelism available in FPGAs was small, but I suspect it's actually quite large.

Phil
From: Phil Tomson on 8 May 2006 20:12

In article <e3nvv7$h30$1(a)atlantis.news.tpi.pl>, Piotr Wyderski <wyderskiREMOVE(a)ii.uni.wroc.pl> wrote:
> Andreas Ehliar wrote:
>
> > One interesting application for most of the people on this
> > newsgroup would be synthesis, place & route and HDL simulation.
> > My guess would be that these applications could be heavily
> > accelerated by FPGA:s.
>
> A car is not the best tool to make another car.
> It's not a bees & butterflies story. :-) Same with FPGAs.

Err... well, cars aren't exactly reprogrammable for many different purposes, though, are they?

> > My second guess that it is far from trivial to actually do this :)
>
> And who actually would need that?

Possibly you? What if we could decrease your wait for P&R from hours to minutes? I suspect you'd find that interesting, no?

Phil
From: Phil Tomson on 8 May 2006 20:37
In article <1146975146.177800.163180(a)g10g2000cwb.googlegroups.com>, JJ <johnjakson(a)gmail.com> wrote:
> Jeremy Ralph wrote:
> > If one wanted to develop an FPGA-based hardware accelerator that could
> > attach to the average PC to process data stored in PC memory, what
> > options are there available?
> >
> > Decision factors are:
> > + ease of use (dev kit, user's guide, examples)
> > + ability to move data with minimal load on the host PC
> > + cost
> > + scalability (i.e. ability to upsize RAM and FPGA gates)
> > + ability to instantiate a 32 bit RISC (or equiv)
> >
> > Someone recommended the TI & Altera Cyclone II PCIe dev board, which is
> > said to be available soon. Any other recommendations?
> >
> > Also, what is the best way to move data between PC mem and FPGA? DMA?
> > What transfer rates should one realistically expect?
> >
> > Thanks,
> > Jeremy
>
> FPGAs and standard cpus are a bit like oil & water, don't mix very well:
> very parallel or very sequential.

Actually, that's what could make it the perfect marriage: general-purpose CPUs for the things they're good at, like data IO and displaying information; FPGAs for applications where parallelism is key.

I think the big problem right now is conceptual: we've been living in a serial, von Neumann world for so long that we don't know how to make effective use of parallelism in writing code - we have a hard time picturing it. Read some software engineering blogs: with the advent of things like multi-core processors, the Cell, etc. (and most of them are blissfully unaware of the existence of FPGAs), they're starting to wonder how they are going to model their problems to take advantage of that kind of parallelism. They're looking for new abstractions (remember, software engineering [and even hardware engineering these days] is all about creating and managing abstractions). They're looking for and creating new languages (Erlang is often mentioned in these sorts of conversations).
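[As a concrete illustration of the process-and-message-passing style Erlang popularized - and that HDL always-blocks resemble - here is a toy Python sketch. All names and values are invented for the example; it is a sketch of the idea, not anyone's actual tool.]

```python
import queue
import threading

def producer(out_q):
    # Acts like an HDL block driving a signal: emits a stream of samples.
    for sample in range(5):
        out_q.put(sample)
    out_q.put(None)  # end-of-stream marker

def doubler(in_q, out_q):
    # Acts like a downstream block: it wakes only when data arrives
    # on its input, knowing nothing about who produced it.
    while True:
        sample = in_q.get()
        if sample is None:
            out_q.put(None)
            break
        out_q.put(sample * 2)

# Wire the two "blocks" together with queues, like signals between modules.
a_to_b = queue.Queue()
b_to_main = queue.Queue()
threading.Thread(target=producer, args=(a_to_b,)).start()
threading.Thread(target=doubler, args=(a_to_b, b_to_main)).start()

results = []
while True:
    item = b_to_main.get()
    if item is None:
        break
    results.append(item)

print(results)  # [0, 2, 4, 6, 8]
```

Each stage owns no shared state and communicates only through its ports, which is exactly the dataflow discipline an HDL imposes.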
Funny thing is that it's the hardware engineers who hold part of the key: HDLs are very good at modelling parallelism and dataflow. Of course HDLs as they are now would be pretty crappy for building software, but it's pretty easy to see that some of the ideas inherent in HDLs could be usefully borrowed by software engineers.

> What exactly does your PC workload include?
>
> Most PCs that are fast enough to run Windows and web software like
> Flash are idle what, 99% of the time, and even under normal use still
> idle 90% of the time, maybe 50% idle while playing DVDs.
>
> Even if you have compute jobs like encoding video, it is now close
> enough to real time, or a couple of PCs can be tied together to get it
> done.
>
> Even if FPGAs were infinitely fast and cheap, they still don't have a
> way to get to the data unless you bring it to them directly; in a PC
> accelerator form, they are bandwidth starved compared to the cache &
> memory bandwidth the PC cpu has.

Well, there's that DRC computing product that puts a big FPGA in one slot of a dual-Opteron motherboard, passing data between the Opteron and FPGA at very high speed via the HyperTransport bus. It seems like the perfect combination. The transfer speeds are high enough to enable lots of types of FPGA accelerator applications that wouldn't have been practical before.

> There have been several DIMM based modules, one even funded by Xilinx
> VC a few years back; I suspect Xilinx probably scraped up the remains
> and any patents?
>
> That PCI bus is way too slow to be of much use except for problems that
> do a lot of compute on relatively little data, but then you could use
> distributed computing instead. PCIe will be better, but then again you
> have to deal with new PCIe interfaces, or use a bridge chip if you are
> building one.

Certainly there are classes of problems which require very little data transfer between FPGA and CPU that could work acceptably even in a PCI environment.
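[A back-of-envelope way to see which problems survive a slow bus: compare the time spent moving data against the compute time the accelerator saves. The bus figures below are the standard theoretical peaks (32-bit/33 MHz PCI is about 133 MB/s; a first-generation PCIe lane is about 250 MB/s per direction); the workload numbers and the speedup factor are invented purely for illustration.]

```python
# Peak theoretical bandwidths, in bytes/second (textbook figures).
PCI_32_33 = 133e6      # 32-bit, 33 MHz PCI
PCIE_X1_GEN1 = 250e6   # one PCIe 1.x lane, per direction

def worthwhile(data_bytes, cpu_seconds, fpga_speedup, bus_bw):
    """Crude model: offloading pays off only if transfer time plus
    the (sped-up) FPGA compute time beats doing it all on the CPU."""
    transfer = data_bytes / bus_bw
    fpga_compute = cpu_seconds / fpga_speedup
    return transfer + fpga_compute < cpu_seconds

# Compute-heavy, data-light job (hypothetical): 10 MB in,
# 100 s of CPU work, assumed 50x FPGA speedup.
print(worthwhile(10e6, 100.0, 50.0, PCI_32_33))   # True: even PCI is fine

# Data-heavy, compute-light job: 1 GB in, only 2 s of CPU work.
print(worthwhile(1e9, 2.0, 50.0, PCI_32_33))      # False: bus-bound
```

The first case transfers for ~0.075 s against 100 s of work, so PCI hardly matters; the second spends ~7.5 s on the bus to save under 2 s of compute, which is exactly the "lots of compute on relatively little data" boundary described above.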
> And that leaves the potential of HT connections for multi-socket (940 &
> other) Opteron systems as a promising route: lots of bandwidth to the
> caches, probably some patent walls already, but in reality very few
> users have multi-socket server boards.
>
> It is best to limit the scope of use of FPGAs to what they are actually
> good at and therefore economical to use; that means bringing the
> problem right to the pins: real-time continuous video, radar, imaging,
> audio, packet, signal processing, whatever, with some logging to a PC.
>
> If a processor can be in the FPGA, then you can have much more
> throughput to it, since it is in the fabric, rather than going
> through an external skinny pipe to a relatively infinitely faster
> serial cpu. Further, if your application is parallel, then you can
> possibly replicate blocks, each with a specialized processor, possibly
> with custom instructions or a coprocessor, till you run out of fabric or
> FPGAs. Eventually though, input & output will become limiting factors
> again: do you have acquisition of live signals and/or results that need
> to be saved?

One wonders how different history might be now if, instead of the serial von Neumann architectures that are now ubiquitous, we had started out with, say, cellular-automata-like architectures. CAs are one computing architecture perfectly suited to the parallelism of FPGAs (there are others, like neural nets and their derivatives). Our thinking is limited by our 'legos', is it not? If all you know is a general-purpose serial CPU, then everything starts looking very serial. (If I recall correctly, before he died von Neumann himself was looking into things like CAs and NNs because he wanted more of a parallel architecture.)

There are classes of biologically inspired algorithms like GAs, ant colony optimization, particle swarm optimization, etc. which could greatly benefit from being mapped into FPGAs.

Phil
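[The CA point is easy to make concrete: in an elementary cellular automaton, each cell's next state depends only on its immediate neighbourhood, so every cell can update simultaneously - exactly the kind of job an array of FPGA LUTs does in a single clock. A minimal sketch, using Wolfram's Rule 110 on a ring of cells; Python runs the loop serially, but note the loop body has no cross-cell dependency.]

```python
RULE = 110  # Wolfram rule number: 8 bits encode all 8 neighbourhood outcomes

def step(cells):
    """One synchronous update of the whole row. Each cell reads only
    its left/right neighbours, so every iteration is independent --
    in hardware, all cells would update in parallel in one clock."""
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2)   # left neighbour
                  | (cells[i] << 1)           # self
                  | cells[(i + 1) % n])) & 1  # right neighbour
        for i in range(n)
    ]

row = [0, 0, 0, 1, 0, 0, 0]   # a single live cell
row = step(row)
print(row)  # [0, 0, 1, 1, 0, 0, 0]
```

An FPGA implementation would be one small lookup table per cell with nearest-neighbour wiring, which is why CAs (and similarly local algorithms) map onto the fabric so naturally.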