Inferring multipliers [FPGA]

From: dgreig on 27 Apr 2010 06:41

On Apr 26, 9:42 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:
> "dgreig" <dgr...(a)ieee.org> wrote in message
>
> news:8c1c9a60-fef0-4aa2-aa00-54761db33af0(a)s41g2000vba.googlegroups.com...
>
> > Perhaps suppose it is image sensor data and feature detection DSP.
> > Data is naturally unsigned and the other operand mostly signed. In
> > this case say 9*9 and 2D DSP. Losing a bit off the image data is
> > certainly undesirable and the alternative costly or limits function if
> > 18*18 multipliers have to be inferred as a kludge.
>
> One trick I've used is to convert the unsigned data to signed
> at the filter input, then back to unsigned on the output.
> Most of the video filters I built in the late 70s worked that way.
> Once you've wrapped your brain round mid-grey being 0, it's
> easy to deal with.

Unfortunately, unsigned to signed requires zero padding, and adding the
extra bit infers an 18*18 block rather than a 9*9. With 18-bit inputs,
the unsigned-to-signed conversion needs one more bit than the block
actually has.
Going the other way (Altera Cyclone 3 and Quartus) does not make use of
the DSP block I/O registers, but at least the multipliers are used.
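A quick bit count shows why the padding costs a whole bit (a sketch; n is the unsigned data width and m the signed width needed to hold it):

```latex
\begin{aligned}
&0 \le x \le 2^{n}-1, \qquad \text{an } m\text{-bit signed word holds } [-2^{m-1},\, 2^{m-1}-1] \\
&2^{m-1}-1 \ge 2^{n}-1 \iff m \ge n+1 \\
&n = 9 \Rightarrow m = 10 > 9, \qquad n = 18 \Rightarrow m = 19 > 18
\end{aligned}
```

So zero-padded 9-bit data no longer fits a 9*9 multiplier, and zero-padded 18-bit data no longer fits an 18*18 one.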

Example 1: this ends up as 60 extra logic cells plus a 9*9 multiplier;
not pretty, and certainly not desirable.
--============================================================================--
-- COPYRIGHT (c) 2010 DAVID GREIG. This source file is the property of David
-- Greig. This work must not be copied without permission from David Greig.
-- Any copy or derivative of this source file must include this copyright
-- statement.
--------------------------------------------------------------------------------
-- File        : SyUS_Mult.vhd
-- Author      : David Greig (email : dgreig(a)ieee.org)
-- Revision    :
-- Description : unsigned data x signed coefficient multiplier with
--               clock-enabled registers
--------------------------------------------------------------------------------
-- Notes       : 2 clock cycle delay
--             : arstn and clk2xen must meet the setup and hold requirements
--               of the registers
--============================================================================--
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
--============================================================================--
entity SyUS_Mult is
generic(
Gdwidth : natural;
Gcoeff_width : natural;
Gmult_pref : string
);
port(
arstn : in std_logic;
clk2x : in std_logic;
clk2xen : in std_logic;

da_i : in std_logic_vector(Gdwidth - 1 downto 0);
coeff_i : in std_logic_vector(Gcoeff_width - 1 downto 0);

q_o : out std_logic_vector(Gdwidth + Gcoeff_width - 1 downto 0)
);
end entity SyUS_Mult;
--============================================================================--

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
architecture rtl of SyUS_Mult is
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
attribute multstyle : string; -- Implementation style, "logic" "dsp"
------------------------------------------------------------------------------------------------------------------------
signal da_r : unsigned(Gdwidth - 1 downto 0);
signal coeff_r : unsigned(Gcoeff_width - 1 downto 0);
signal product_s : signed(Gdwidth + Gcoeff_width - 1 downto 0);
signal product_r : signed(Gdwidth + Gcoeff_width - 1 downto 0);
attribute multstyle of product_r : signal is Gmult_pref;
------------------------------------------------------------------------------------------------------------------------
begin
------------------------------------------------------------------------------------------------------------------------
prcs_SyUS_Mult : process(arstn, clk2xen, clk2x)
begin
if (arstn = '0') then
da_r <= (others => '0');
coeff_r <= (others => '0');

product_r <= (others => '0');
elsif (clk2xen = '0') then
null;
elsif rising_edge(clk2x) then
da_r <= unsigned(da_i);
coeff_r <= unsigned(coeff_i);

product_r <= product_s;
end if;
end process prcs_SyUS_Mult;
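----------------------------------------
-- Sign-magnitude: multiply the unsigned data by |coeff|, then negate the
-- product when the coefficient is negative.
product_s <= -(signed(da_r * unsigned(abs(signed(coeff_r)))))
               when (coeff_r(coeff_r'high) = '1') else
             signed(da_r * unsigned(abs(signed(coeff_r))));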
------------------------------------------------------------------------------------------------------------------------
q_o <= std_logic_vector(product_r);
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
end architecture rtl;
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
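
The product_s assignment in Example 1 is the sign-magnitude identity below (a sketch; D and C stand for Gdwidth and Gcoeff_width, x for da_r, and c for coeff_r read as signed):

```latex
\begin{aligned}
x\,c &= \begin{cases} -\bigl(x\,\lvert c\rvert\bigr), & c < 0 \\ x\,\lvert c\rvert, & c \ge 0 \end{cases} \\
0 \le \lvert c\rvert &\le 2^{C-1} \quad (\text{fits } C \text{ unsigned bits}) \\
0 \le x\,\lvert c\rvert &\le (2^{D}-1)\,2^{C-1} < 2^{D+C-1}
\end{aligned}
```

The last bound is why the negated magnitude still fits the D+C bit signed result. For the most negative coefficient, numeric_std abs leaves the bit pattern unchanged and unsigned() then reads it as 2^(C-1), which is the correct magnitude. The multiplier itself becomes unsigned by unsigned (D*C), but the C-bit absolute value in front of it and the (D+C)-bit negation after it are extra carry chains in logic cells, which is consistent with the overhead reported for this example.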

Example 2: this is not portable, but it does work for brand A.
--============================================================================--
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
--============================================================================--
entity SyUS_Mult is
generic(
Gdwidth : natural;
Gcoeff_width : natural;
Gmult_pref : string
);
port(
arstn : in std_logic;
clk2x : in std_logic;
clk2xen : in std_logic;

da_i : in std_logic_vector(Gdwidth - 1 downto 0);
coeff_i : in std_logic_vector(Gcoeff_width - 1 downto 0);

q_o : out std_logic_vector(Gdwidth + Gcoeff_width - 1 downto 0)
);
end entity SyUS_Mult;
--============================================================================--

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
architecture rtl of SyUS_Mult is
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
attribute multstyle : string; -- Implementation style, "logic" "dsp"
------------------------------------------------------------------------------------------------------------------------
signal aclr_s : std_logic;

component altmult_accum
generic (
accum_direction : string;
accum_sload_reg : string;
addnsub_aclr : string;
addnsub_pipeline_reg : string;
addnsub_reg : string;
dedicated_multiplier_circuitry : string;
input_aclr_a : string;
input_aclr_b : string;
input_reg_a : string;
input_reg_b : string;
input_source_a : string;
input_source_b : string;
intended_device_family : string;
lpm_type : string;
multiplier_reg : string;
output_aclr : string;
output_reg : string;
port_addnsub : string;
port_signa : string;
port_signb : string;
representation_a : string;
representation_b : string;
sign_aclr_a : string;
sign_aclr_b : string;
sign_pipeline_reg_a : string;
sign_pipeline_reg_b : string;
sign_reg_a : string;
sign_reg_b : string;
width_a : natural;
width_b : natural;
width_result : natural
);
port (
dataa : in std_logic_vector (width_a - 1 downto 0);
datab : in std_logic_vector (width_b - 1 downto 0);
accum_sload : in std_logic ;
aclr0 : in std_logic ;
clock0 : in std_logic ;
ena0 : in std_logic ;
result : out std_logic_vector (width_result - 1 downto 0)
);
end component;
------------------------------------------------------------------------------------------------------------------------
begin
------------------------------------------------------------------------------------------------------------------------
aclr_s <= not(arstn);

i_altmult_accum : altmult_accum
generic map (
accum_direction => "add",
accum_sload_reg => "unregistered",
addnsub_aclr => "aclr0",
addnsub_pipeline_reg => "unregistered",
addnsub_reg => "clock0",
dedicated_multiplier_circuitry => "AUTO",
input_aclr_a => "aclr0",
input_aclr_b => "aclr0",
input_reg_a => "clock0",
input_reg_b => "clock0",
input_source_a => "dataa",
input_source_b => "datab",
intended_device_family => "cyclone iii",
lpm_type => "altmult_accum",
multiplier_reg => "unregistered",
output_aclr => "aclr0",
output_reg => "clock0",
port_addnsub => "port_unused",
port_signa => "port_unused",
port_signb => "port_unused",
representation_a => "unsigned",
representation_b => "signed",
sign_aclr_a => "aclr0",
sign_aclr_b => "aclr0",
sign_pipeline_reg_a => "unregistered",
sign_pipeline_reg_b => "unregistered",
sign_reg_a => "clock0",
sign_reg_b => "clock0",
width_a => Gdwidth,
width_b => Gcoeff_width,
width_result => Gdwidth + Gcoeff_width
)
port map (
dataa => da_i,
datab => coeff_i,
accum_sload => '1',
aclr0 => aclr_s,
clock0 => clk2x,
ena0 => clk2xen,
result => q_o
);
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
end architecture rtl;
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--

From: Pete Fraser on 27 Apr 2010 08:03

"dgreig" <dgreig(a)ieee.org> wrote in message
news:0ea2c276-604b-4619-baed-9c28551d93a1(a)r18g2000yqd.googlegroups.com...

> Unfortunately unsigned to signed requires zero padding,

Perhaps for multiplier inference but, if you're building the
hardware by hand you don't need the extra bit.
You can just invert the msb, and consider the middle of
your range to be "000000000".
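
Why inverting the MSB needs no extra bit (a sketch; x is the n-bit unsigned sample and \tilde{x} is the same bits with the MSB inverted, read as two's complement):

```latex
\begin{aligned}
x \ge 2^{n-1}&: \quad \text{MSB } 1 \to 0,\ \ \tilde{x} = x - 2^{n-1} \\
x < 2^{n-1}&: \quad \text{MSB } 0 \to 1,\ \ \tilde{x} = (x + 2^{n-1}) - 2^{n} = x - 2^{n-1} \\
&\Rightarrow\ \tilde{x} = x - 2^{n-1} \in [-2^{n-1},\, 2^{n-1}-1]
\end{aligned}
```

For n = 9: "100000000" (256, mid-grey) becomes "000000000" = 0, "000000000" (0) becomes "100000000" = -256, and "111111111" (511) becomes "011111111" = 255. The word width stays at 9 bits.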

Pete

From: dgreig on 27 Apr 2010 11:21

On Apr 27, 1:03 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:
> "dgreig" <dgr...(a)ieee.org> wrote in message
>
> news:0ea2c276-604b-4619-baed-9c28551d93a1(a)r18g2000yqd.googlegroups.com...
>
> > Unfortunately unsigned to signed requires zero padding,
>
> Perhaps for multiplier inference but, if you're building the
> hardware by hand you don't need the extra bit.
> You can just invert the msb, and consider the middle of
> your range to be "000000000".
>
> Pete

Pete

Now I follow what you mean: offset binary. It could be done, but it would
make verification very painful.
With numeric_std the number system is two's complement rather than
one's complement, so inverting the MSB will not work.
I am doing a lot of different large matrix operations (a 21*21 matrix
multiplier is one of the datapath operators) and detecting features as
sets of multiple peaks along rows and columns. Coefficients are
calculated adaptively in both the FPGA and a DSP, so I have a distinct
preference to stay with two's complement. In total it is about 80 Gmults/sec
and about the same number of additions/subtractions, all on a Cyclone 3!
Asymmetric 2D wavelet image processing for feature detection and measurement.

From: Pete Fraser on 27 Apr 2010 13:00

"dgreig" <dgreig(a)ieee.org> wrote in message
news:07426f9a-724b-42df-b940-1feaa545bff0(a)j21g2000yqh.googlegroups.com...
On Apr 27, 1:03 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:

>> You can just invert the msb, and consider the middle of
>> your range to be "000000000".

> Now I follow what you mean - offset binary. Could be done, but would
> make verification very painful.

Perhaps, but not as painful as you think.

> With numeric_std the number system is two's complement rather than
> one's complement so inverting the MSB will not work.

You invert the msb, and regard it as a 2s complement number
(range -2^(n-1) to 2^(n-1)-1; only one zero).
This has the advantage of giving you symmetrical overload in
your ladder, for essentially unipolar signals.
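
A sketch of why the offset can simply be removed again at the output, assuming an FIR-style sum y with coefficients c_k (hypothetical notation; the unity-gain and zero-gain cases are illustrations, not from the thread), where \tilde{x}_k are the MSB-inverted samples, so x_k = \tilde{x}_k + 2^{n-1}:

```latex
\begin{aligned}
y = \sum_k c_k x_k &= \sum_k c_k \tilde{x}_k + 2^{n-1} \sum_k c_k = \tilde{y} + 2^{n-1} \sum_k c_k \\
\sum_k c_k = 1 &\ \Rightarrow\ y = \tilde{y} + 2^{n-1} \quad (\text{invert the output MSB}) \\
\sum_k c_k = 0 &\ \Rightarrow\ y = \tilde{y} \quad (\text{the offset cancels})
\end{aligned}
```

Here \tilde{y} = \sum_k c_k \tilde{x}_k is computed entirely in two's complement, so numeric_std signed arithmetic applies unchanged. Inverting the output MSB equals adding 2^(n-1) only while \tilde{y} stays inside the n-bit signed range, so rounding and overload still need their usual care. Zero-DC-gain filters, such as wavelet detail (high-pass) filters, need no output correction at all.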

I used to build large, time variant 1-D and 2-D filters for
processing RGB and Y. This was back when the MPY8HJ
first came out (1978?).

Pete

From: Jonathan Bromley on 27 Apr 2010 18:05

On Tue, 27 Apr 2010 03:41:01 -0700 (PDT), dgreig wrote:

>Unfortunately unsigned to signed requires zero padding, adding the
>extra bit infers an 18*18 block rather than 9*9. In the case of 18
>bit inputs the unsigned to signed requires one more bit than the block
>actually has.

What about Kolja Sulimma's suggestion of a conditional adder
after a 17x18 multiply? This is only a sketch, but shows
that it is quite neat both in VHDL code and in hardware:

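subtype S36 is signed(35 downto 0);

-- Unsigned 18 bit x signed 18 bit -> signed 36 bit.
-- signed(U) weights bit 17 as -2**17 instead of +2**17, so
-- U*S = signed(U)*S + 2**18 * U(17) * S: the fix-up adds S to the upper half.
function U18xS18 (
  U : unsigned(17 downto 0);
  S : signed(17 downto 0)
) return S36 is
  variable product : S36;
begin
  product := signed(U) * S;  -- one 18x18 signed multiply
  if U(17) = '1' then
    product(35 downto 18) := product(35 downto 18) + S;
  end if;
  return product;
end function U18xS18;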
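
The identity behind the conditional adder in this function (a sketch; U is the 18-bit unsigned operand, u_{17} its MSB, and S the 18-bit signed operand):

```latex
\begin{aligned}
\operatorname{signed}(U) &= U - 2^{18}\,u_{17} \\
U\,S &= \operatorname{signed}(U)\,S + 2^{18}\,u_{17}\,S \\
-2^{35} < -(2^{18}-1)\,2^{17} \le U\,S &\le (2^{18}-1)(2^{17}-1) < 2^{35}
\end{aligned}
```

Reading U as signed weights bit 17 as -2^17 instead of +2^17, so the fix-up that goes into product(35 downto 18) when U(17) = '1' is S itself. The range line shows that the exact product fits a 36-bit signed word, so the 18-bit wraparound of that upper-half add still yields the exact result.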

Disclaimer: I haven't tried synthesising this, and I suspect you
may need to play with the code some more to get the best
synthesis results.
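
Kolja Sulimma's 17x18 multiply plus conditional adder can be read the same way (a sketch of one interpretation; U_L is a hypothetical name for the low 17 bits of U):

```latex
\begin{aligned}
U &= U_L + 2^{17}\,u_{17}, \qquad U_L = U \bmod 2^{17} \in [0,\, 2^{17}-1] \\
U\,S &= U_L\,S + 2^{17}\,u_{17}\,S
\end{aligned}
```

U_L zero-extended to 18 bits is a non-negative signed value, so U_L S is an ordinary signed multiply with only 17 significant bits in one operand, and the conditional add of S lands at bit 17 rather than bit 18. Both forms compute the same 36-bit result.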
--
Jonathan Bromley
