From: JJ on 7 May 2006 00:12

Jeremy Ralph wrote:
> If one wanted to develop an FPGA-based hardware accelerator that could
> attach to the average PC to process data stored in PC memory, what
> options are available?
>
> Decision factors are:
> + ease of use (dev kit, user's guide, examples)
> + ability to move data with minimal load on the host PC
> + cost
> + scalability (i.e. ability to upsize RAM and FPGA gates)
> + ability to instantiate a 32-bit RISC (or equiv)
>
> Someone recommended the TI & Altera Cyclone II PCIe dev board, which is
> said to be available soon. Any other recommendations?
>
> Also, what is the best way to move data between PC memory and the FPGA? DMA?
> What transfer rates should one realistically expect?
>
> Thanks,
> Jeremy

FPGAs and standard CPUs are a bit like oil and water: they don't mix very well, one being very parallel and the other very sequential.

What exactly does your PC workload include?

Most PCs that are fast enough to run Windows and web software like Flash are idle perhaps 99% of the time, and even under normal use still idle 90% of the time, maybe 50% idle while playing DVDs.

Even if you have compute jobs like encoding video, that is now close enough to real time, or a couple of PCs can be tied together to get it done.

Even if FPGAs were infinitely fast and cheap, they still don't have a way to get at the data unless you bring it to them directly. In PC-accelerator form they are bandwidth starved compared to the cache and memory bandwidth the PC CPU has.

There have been several DIMM-based modules, one even funded by Xilinx VC a few years back; I suspect Xilinx probably scraped up the remains and any patents?

The PCI bus is way too slow to be of much use except for problems that do a lot of compute on relatively little data, but then you could use distributed computing instead. PCIe will be better, but then again you have to deal with new PCIe interfaces, or with a bridge chip if you are building one.

That leaves the potential of HyperTransport (HT) connections on multi-socket (940 and other) Opteron systems as a promising route: lots of bandwidth to the caches, and probably some patent walls already, but in reality very few users have multi-socket server boards.

It is best to limit the scope of use of FPGAs to what they are actually good at and therefore economical for. That means bringing the problem right to the pins: real-time continuous video, radar, imaging, audio, packet, or signal processing, whatever, with some logging to a PC.

If a processor can sit in the FPGA, then you can have much more throughput to it, since it is in the fabric, than if you go through an external skinny pipe to a relatively infinitely faster serial CPU. Further, if your application is parallel, you can possibly replicate blocks, each with a specialized processor, possibly with custom instructions or a coprocessor, until you run out of fabric or FPGAs. Eventually, though, input and output become limiting factors again: do you have acquisition of live signals, or results that need to be saved? It really all depends on what you are processing and the rate at which it can be managed.

John Jakson
transputer guy
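On the question of moving data between PC memory and the FPGA: before committing to a DMA engine, the cheapest way to get a feel for the problem is to memory-map the board's PCI BAR from user space and time plain programmed-I/O copies. The C sketch below is only illustrative; the /dev/fpga0 device node, the 1 MiB BAR size, and the driver exposing it are assumptions, and any serious design would use bus-master DMA instead.

    /* Rough sketch: time programmed-I/O writes to a memory-mapped FPGA BAR.
     * "/dev/fpga0" and BAR_SIZE are hypothetical; substitute whatever your
     * board's driver actually exposes.  This only measures the PIO floor,
     * not what a DMA engine could achieve. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define BAR_SIZE (1 << 20)      /* assume a 1 MiB BAR window */

    int main(void)
    {
        int fd = open("/dev/fpga0", O_RDWR);      /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        void *bar = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); return 1; }

        char *buf = malloc(BAR_SIZE);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 0xA5, BAR_SIZE);

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < 256; i++)             /* 256 MiB total */
            memcpy(bar, buf, BAR_SIZE);
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("PIO write rate: %.1f MB/s\n", 256.0 / secs);

        munmap(bar, BAR_SIZE);
        free(buf);
        close(fd);
        return 0;
    }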
From: Alif Wahid on 7 May 2006 01:10

Falk Brunner wrote:
> Jeremy Ralph schrieb:
>
>> If one wanted to develop an FPGA-based hardware accelerator that could
>> attach to the average PC to process data stored in PC memory, what
>> options are available?
>
> Nice idea, but to beat a nowadays CPU (Pentium 4, Athlon 64, etc.) and a
> nowadays GPU (Nvidia GeForce whatever-is-up-to-date, etc.) is hard to
> achieve even for the big guys in the business. (Yes, special tasks can
> be optimized to run faster on FPGA-based hardware, but speeding up
> "normal" PC tasks is difficult.)
>
>> Decision factors are:
>> + ease of use (dev kit, user's guide, examples)
>> + ability to move data with minimal load on the host PC
>> + cost
>> + scalability (i.e. ability to upsize RAM and FPGA gates)
>> + ability to instantiate a 32-bit RISC (or equiv)
>
>> Someone recommended the TI & Altera Cyclone II PCIe dev board, which is
>> said to be available soon. Any other recommendations?
>
>> Also, what is the best way to move data between PC memory and the FPGA? DMA?
>
> Sure.
>
>> What transfer rates should one realistically expect?
>
> PCI is 133 MByte/s max.

PCI-X at 64 bits and 133 MHz will give you around 1 GByte/s max in one direction. Most high-end server motherboards have PCI-X rather than PCI.

> AGP is 2 GByte/s max. (AFAIK)
> PCI-Express is n x 250 MByte/s (with n up to 16)

Currently the maximum theoretical speed of PCI-Express is 2.5 Gbit/s per lane per direction, as specified in the standard. That immediately drops to 2.0 Gbit/s per lane per direction due to 8b/10b encoding. Then, in practice, the smallish Transaction Layer Packet (TLP) sizes (i.e. the ratio of payload to header) cause a further reduction in useful data throughput. In reality you're looking at approximately 1.5 Gbit/s per lane per direction of real data throughput.

The big advantage of PCI-Express is the seamless scalability and the point-to-point serial protocol. So a 16-lane PCI-Express endpoint should give you about 24 Gbit/s of useful data throughput in each direction.

Regards,
Alif.
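To put those per-lane numbers into a quick back-of-the-envelope calculation, the short C snippet below walks the same chain: 2.5 Gbit/s raw, times 8/10 for the encoding, times an assumed 0.75 payload efficiency standing in for TLP header overhead (the real figure depends on the maximum payload size the chipset negotiates), then scaled by lane count.

    /* Back-of-the-envelope PCIe gen-1 throughput estimate, following the
     * numbers quoted above.  payload_eff is an assumed figure for TLP
     * header overhead, not something taken from the spec. */
    #include <stdio.h>

    int main(void)
    {
        double raw_per_lane    = 2.5;                       /* Gbit/s on the wire, per lane, per direction */
        double after_8b10b     = raw_per_lane * 8.0 / 10.0; /* 2.0 Gbit/s after 8b/10b encoding */
        double payload_eff     = 0.75;                      /* assumed payload/header ratio */
        double usable_per_lane = after_8b10b * payload_eff; /* ~1.5 Gbit/s usable */

        for (int lanes = 1; lanes <= 16; lanes *= 2)
            printf("x%-2d link: ~%.1f Gbit/s usable per direction (~%.0f MB/s)\n",
                   lanes,
                   usable_per_lane * lanes,
                   usable_per_lane * lanes * 1000.0 / 8.0);
        return 0;
    }

For a 16-lane link this reproduces the roughly 24 Gbit/s per direction quoted above.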
From: Alif Wahid on 7 May 2006 01:18

JJ wrote:
> Jeremy Ralph wrote:
>> If one wanted to develop an FPGA-based hardware accelerator that could
>> attach to the average PC to process data stored in PC memory, what
>> options are available?
> [snip]
> The PCI bus is way too slow to be of much use except for problems that
> do a lot of compute on relatively little data, but then you could use
> distributed computing instead. PCIe will be better, but then again you
> have to deal with new PCIe interfaces, or with a bridge chip if you are
> building one.

What about PCIe IP cores? That may be a better option than bridge chips, since it keeps everything FPGA-oriented. However, it does mean that the FPGA must have gigabit-speed serial transceivers built in, and that limits one's options a little.

Regards,
Alif
From: JJ on 7 May 2006 01:54

I always hated that the PCI cores were so heavily priced compared to the FPGA they might go into. The pricing seemed to reflect the value they once added to ASICs some 10 or 15 years ago, not the potential of really low-cost, low-volume applications. A $100 FPGA in a small-volume application doesn't support $20K IP for the few dollars' worth of fabric it uses. It might be a bargain compared to the cost of rolling your own, though, just as buying an FPGA is a real bargain compared to rolling my own FPGA/ASIC.

FPGAs need lots of I/O to be useful, so why not put the damn IP in hard-macro form and let everyone at it? And do the same again for PCIe (if that's possible). We see how much more useful FPGAs are with all the other stuff that's been added over the years (BlockRAMs, multipliers, clocks, etc.), but it's really the I/O where the sh1t hits the fan first.

I would say that if we were to see PCIe on chip, even on a higher-priced part, we would quickly see a lot more coprocessor board activity, even just plain-vanilla PC boards.

I wonder, if there were multiple built-in narrow PCIe links, whether they could be used to build node-to-node links, a la HT, for FPGA arrays? Not that I really know much about PCIe yet.

John Jakson
transputer guy
From: Falk Brunner on 7 May 2006 07:01
Jeremy Ralph schrieb:
> Thanks Falk, for the numbers. Any reason why AGP couldn't be used
> for non-graphics streams?

None that I know of.

Regards
Falk