From: dgreig on 27 Apr 2010 06:41
On Apr 26, 9:42 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:
> "dgreig" <dgr...(a)ieee.org> wrote in message
>
> news:8c1c9a60-fef0-4aa2-aa00-54761db33af0(a)s41g2000vba.googlegroups.com...
>
> > Perhaps suppose it is image sensor data and feature detection DSP.
> > Data is naturally unsigned and the other operand mostly signed. In
> > this case say 9*9 and 2D DSP. Losing a bit off the image data is
> > certainly undesirable and the alternative costly or limits function if
> > 18*18 multipliers have to be inferred as a kludge.
>
> One trick I've used is to convert the unsigned data to signed
> at the filter input, then back to unsigned on the output.
> Most of the video filters I built in the late 70s worked that way.
> Once you've wrapped your brain round mid-grey being 0, it's
> easy to deal with.

Unfortunately, unsigned to signed requires zero padding; adding the
extra bit infers an 18*18 block rather than 9*9. In the case of 18-bit
inputs, the unsigned to signed conversion requires one more bit than the
block actually has.

Going the other way (Altera Cyclone 3 & Quartus) does not make use of the
DSP block IO registers, but at least the multipliers are used.

Example 1 : This ends up as 60 extra logic cells + a 9*9 multiplier, not
pretty and certainly not desirable.

-- ==========================================================================--
-- COPYRIGHT (c) 2010 DAVID GREIG. This source file is the property of David Greig.
-- This work must not be copied without permission from David Greig.
-- Any copy or derivative of this source file must include this copyright statement.
-- --------------------------------------------------------------------------
-- File        : SyUS_Mult.vhd
-- Author      : David Greig (email : dgreig(a)ieee.org)
-- Revision    :
-- Description : signed input data multiplier with clken output reg
-- --------------------------------------------------------------------------
-- Notes       : 2 clock cycle delay
--             : arstn and clk2xen must meet required setup and hold
--             : requirements of the register
-- ==========================================================================--
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
-- ==========================================================================--
entity SyUS_Mult is
  generic(
    Gdwidth      : natural;
    Gcoeff_width : natural;
    Gmult_pref   : string
  );
  port(
    arstn   : in  std_logic;
    clk2x   : in  std_logic;
    clk2xen : in  std_logic;
    da_i    : in  std_logic_vector(Gdwidth - 1 downto 0);
    coeff_i : in  std_logic_vector(Gcoeff_width - 1 downto 0);
    q_o     : out std_logic_vector(Gdwidth + Gcoeff_width - 1 downto 0)
  );
end entity SyUS_Mult;
-- ==========================================================================--
architecture rtl of SyUS_Mult is
  attribute multstyle : string; -- Implementation style, "logic" "dsp"
  ---------------------------------------------------------------------------
  signal da_r      : unsigned(Gdwidth - 1 downto 0);
  signal coeff_r   : unsigned(Gcoeff_width - 1 downto 0);
  signal product_s : signed(Gdwidth + Gcoeff_width - 1 downto 0);
  signal product_r : signed(Gdwidth + Gcoeff_width - 1 downto 0);
  attribute multstyle of product_r : signal is Gmult_pref;
  ---------------------------------------------------------------------------
begin
  ---------------------------------------------------------------------------
  prcs_SyUS_Mult : process(arstn, clk2xen, clk2x)
  begin
    if (arstn = '0') then
      da_r      <= (others => '0');
      coeff_r   <= (others => '0');
      product_r <= (others => '0');
    elsif (clk2xen = '0') then
      null;
    elsif rising_edge(clk2x) then
      da_r      <= unsigned(da_i);
      coeff_r   <= unsigned(coeff_i);
      product_r <= product_s;
    end if;
  end process prcs_SyUS_Mult;
  ----------------------------------------
  -- Multiply the unsigned data by the magnitude of the coefficient, then
  -- negate the product when the coefficient's sign bit is set.
  product_s <= -(signed(da_r*unsigned(abs(signed(coeff_r)))))
                 when (coeff_r(coeff_r'high) = '1') else
               signed(da_r*unsigned(abs(signed(coeff_r))));
  ---------------------------------------------------------------------------
  q_o <= std_logic_vector(product_r);
end architecture rtl;
-- ==========================================================================--

Example 2 : This is not portable, but does work for brand A.

-- ==========================================================================--
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
-- ==========================================================================--
entity SyUS_Mult is
  generic(
    Gdwidth      : natural;
    Gcoeff_width : natural;
    Gmult_pref   : string
  );
  port(
    arstn   : in  std_logic;
    clk2x   : in  std_logic;
    clk2xen : in  std_logic;
    da_i    : in  std_logic_vector(Gdwidth - 1 downto 0);
    coeff_i : in  std_logic_vector(Gcoeff_width - 1 downto 0);
    q_o     : out std_logic_vector(Gdwidth + Gcoeff_width - 1 downto 0)
  );
end entity SyUS_Mult;
-- ==========================================================================--
architecture rtl of SyUS_Mult is
  attribute multstyle : string; -- Implementation style, "logic" "dsp"
  ---------------------------------------------------------------------------
  signal aclr_s : std_logic;

  component altmult_accum
    generic (
      accum_direction                : string;
      accum_sload_reg                : string;
      addnsub_aclr                   : string;
      addnsub_pipeline_reg           : string;
      addnsub_reg                    : string;
      dedicated_multiplier_circuitry : string;
      input_aclr_a                   : string;
      input_aclr_b                   : string;
      input_reg_a                    : string;
      input_reg_b                    : string;
      input_source_a                 : string;
      input_source_b                 : string;
      intended_device_family         : string;
      lpm_type                       : string;
      multiplier_reg                 : string;
      output_aclr                    : string;
      output_reg                     : string;
      port_addnsub                   : string;
      port_signa                     : string;
      port_signb                     : string;
      representation_a               : string;
      representation_b               : string;
      sign_aclr_a                    : string;
      sign_aclr_b                    : string;
      sign_pipeline_reg_a            : string;
      sign_pipeline_reg_b            : string;
      sign_reg_a                     : string;
      sign_reg_b                     : string;
      width_a                        : natural;
      width_b                        : natural;
      width_result                   : natural
    );
    port (
      dataa       : in  std_logic_vector (width_a - 1 downto 0);
      datab       : in  std_logic_vector (width_b - 1 downto 0);
      accum_sload : in  std_logic;
      aclr0       : in  std_logic;
      clock0      : in  std_logic;
      ena0        : in  std_logic;
      result      : out std_logic_vector (width_result - 1 downto 0)
    );
  end component;
  ---------------------------------------------------------------------------
begin
  ---------------------------------------------------------------------------
  aclr_s <= not(arstn);

  i_altmult_accum : altmult_accum
    generic map (
      accum_direction                => "add",
      accum_sload_reg                => "unregistered",
      addnsub_aclr                   => "aclr0",
      addnsub_pipeline_reg           => "unregistered",
      addnsub_reg                    => "clock0",
      dedicated_multiplier_circuitry => "AUTO",
      input_aclr_a                   => "aclr0",
      input_aclr_b                   => "aclr0",
      input_reg_a                    => "clock0",
      input_reg_b                    => "clock0",
      input_source_a                 => "dataa",
      input_source_b                 => "datab",
      intended_device_family         => "cyclone iii",
      lpm_type                       => "altmult_accum",
      multiplier_reg                 => "unregistered",
      output_aclr                    => "aclr0",
      output_reg                     => "clock0",
      port_addnsub                   => "port_unused",
      port_signa                     => "port_unused",
      port_signb                     => "port_unused",
      representation_a               => "unsigned",
      representation_b               => "signed",
      sign_aclr_a                    => "aclr0",
      sign_aclr_b                    => "aclr0",
      sign_pipeline_reg_a            => "unregistered",
      sign_pipeline_reg_b            => "unregistered",
      sign_reg_a                     => "clock0",
      sign_reg_b                     => "clock0",
      width_a                        => Gdwidth,
      width_b                        => Gcoeff_width,
      width_result                   => Gdwidth + Gcoeff_width
    )
    port map (
      dataa       => da_i,
      datab       => coeff_i,
      accum_sload => '1',
      aclr0       => aclr_s,
      clock0      => clk2x,
      ena0        => clk2xen,
      result      => q_o
    );
end architecture rtl;
-- ==========================================================================--
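
For comparison, the zero-padding approach that the post above says it wants to avoid can be sketched roughly as follows. This is only an illustration: the entity name us_mult_padded, the port clk and the generic defaults are made up for the example and do not come from the thread, and whether a given tool maps the resulting 10*9 signed multiply onto a 9*9 or an 18*18 block depends on the tool and device (the thread reports 18*18 for Cyclone 3 with Quartus).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, for illustration only.
entity us_mult_padded is
  generic (
    Gdwidth      : natural := 9;
    Gcoeff_width : natural := 9
  );
  port (
    clk     : in  std_logic;
    da_i    : in  std_logic_vector(Gdwidth - 1 downto 0);       -- unsigned data
    coeff_i : in  std_logic_vector(Gcoeff_width - 1 downto 0);  -- signed coefficient
    q_o     : out std_logic_vector(Gdwidth + Gcoeff_width - 1 downto 0)
  );
end entity us_mult_padded;

architecture rtl of us_mult_padded is
  -- One extra bit: the leading '0' makes the unsigned data a
  -- non-negative two's complement number.
  signal da_s      : signed(Gdwidth downto 0);
  signal product_s : signed(Gdwidth + Gcoeff_width downto 0);
begin
  da_s      <= signed('0' & unsigned(da_i));
  product_s <= da_s * signed(coeff_i);  -- (Gdwidth + 1) x Gcoeff_width multiply

  process (clk)
  begin
    if rising_edge(clk) then
      -- An unsigned x signed product always fits in Gdwidth + Gcoeff_width
      -- bits, so the top bit of product_s is a redundant sign copy and
      -- resize drops it without changing the value.
      q_o <= std_logic_vector(resize(product_s, q_o'length));
    end if;
  end process;
end architecture rtl;
```

The extra operand bit is what pushes the multiplier past the 9-bit (or 18-bit) block boundary, which is the cost the two examples above try to work around.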
From: Pete Fraser on 27 Apr 2010 08:03
"dgreig" <dgreig(a)ieee.org> wrote in message
news:0ea2c276-604b-4619-baed-9c28551d93a1(a)r18g2000yqd.googlegroups.com...

> Unfortunately unsigned to signed requires zero padding,

Perhaps for multiplier inference but, if you're building the
hardware by hand you don't need the extra bit.
You can just invert the msb, and consider the middle of
your range to be "000000000".

Pete
From: dgreig on 27 Apr 2010 11:21
On Apr 27, 1:03 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:
> "dgreig" <dgr...(a)ieee.org> wrote in message
>
> news:0ea2c276-604b-4619-baed-9c28551d93a1(a)r18g2000yqd.googlegroups.com...
>
> > Unfortunately unsigned to signed requires zero padding,
>
> Perhaps for multiplier inference but, if you're building the
> hardware by hand you don't need the extra bit.
> You can just invert the msb, and consider the middle of
> your range to be "000000000".
>
> Pete

Pete,

Now I follow what you mean - offset binary. Could be done, but would
make verification very painful.
With numeric_std the number system is two's complement rather than
one's complement so inverting the MSB will not work.

I am doing a lot of differing large matrix operations (a 21*21 matrix
multiplier as one of the datapath operators) and detecting features as
sets of multiple peaks along rows and columns. Coefficients are
calculated adaptively in both the FPGA and a DSP so I have a distinct
preference to stay with two's complement.

In total about 80 Gmults/sec and about the same number of
additions/subtractions, all on a Cyclone 3!
Asymmetric 2D wavelet image processing for feature detection and
measurement.
From: Pete Fraser on 27 Apr 2010 13:00
"dgreig" <dgreig(a)ieee.org> wrote in message
news:07426f9a-724b-42df-b940-1feaa545bff0(a)j21g2000yqh.googlegroups.com...
On Apr 27, 1:03 pm, "Pete Fraser" <pfra...(a)covad.net> wrote:

>> You can just invert the msb, and consider the middle of
>> your range to be "000000000".

> Now I follow what you mean - offset binary. Could be done, but would
> make verification very painful.

Perhaps, but not as painful as you think.

> With numeric_std the number system is two's complement rather than
> one's complement so inverting the MSB will not work.

You invert the msb, and regard it as a 2s complement number
(-2^(n-1) -> 2^(n-1) - 1; only one zero).
This has the advantage of allowing you symmetrical overload in
your ladder, for essentially unipolar signals.

I used to build large, time variant 1-D and 2-D filters
for processing RGB and Y. This was back when the
MPY8HJ first came out (1978?).

Pete
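
The msb-inversion trick described above can be checked with a small self-checking simulation. The sketch below is an illustration, not code from the thread: the names tb_offset_binary and to_offset_signed and the chosen coefficient values are hypothetical. It asserts that inverting the msb of an n-bit unsigned value u and reading the result as two's complement gives u - 2^(n-1), and that a product computed on the offset data differs from the true product u*c by exactly c * 2^(n-1), which is the term that has to be restored downstream.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_offset_binary is
end entity tb_offset_binary;

architecture sim of tb_offset_binary is
  constant N : natural := 9;  -- 9-bit data, as in the 9*9 case discussed

  -- Hypothetical helper: invert the msb and reinterpret as two's complement.
  function to_offset_signed (u : unsigned) return signed is
    variable v : unsigned(u'length - 1 downto 0) := u;
  begin
    v(v'high) := not v(v'high);
    return signed(v);
  end function;

  type int_vec is array (natural range <>) of integer;
  -- A few signed 9-bit coefficients, including both extremes.
  constant COEFFS : int_vec := (-256, -3, 0, 5, 255);
begin
  check : process
    variable u : unsigned(N - 1 downto 0);
    variable p : signed(2 * N - 1 downto 0);
  begin
    for i in 0 to 2**N - 1 loop
      u := to_unsigned(i, N);

      -- msb inversion subtracts the mid-scale value 2**(N-1)
      assert to_integer(to_offset_signed(u)) = i - 2**(N - 1)
        report "offset conversion mismatch" severity failure;

      for k in COEFFS'range loop
        -- N x N signed multiply on the offset data
        p := to_offset_signed(u) * to_signed(COEFFS(k), N);
        -- true product = offset product + coefficient * mid-scale
        assert i * COEFFS(k) = to_integer(p) + COEFFS(k) * 2**(N - 1)
          report "product correction mismatch" severity failure;
      end loop;
    end loop;
    report "offset-binary checks passed";
    wait;
  end process;
end architecture sim;
```

For a filter, the correction terms add up to the mid-scale value times the sum of the coefficients, so with adaptively computed coefficients the correction changes whenever the coefficients do.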
From: Jonathan Bromley on 27 Apr 2010 18:05
On Tue, 27 Apr 2010 03:41:01 -0700 (PDT), dgreig wrote:

>Unfortunately unsigned to signed requires zero padding, adding the
>extra bit infers a 18*18 block rather than 9*9. In the case of 18
>bit inputs the unsigned to signed requires one more bit than the block
>actually has.

What about Kolja Sulimma's suggestion of a conditional
adder after a 17x18 multiply? This is only a sketch,
but shows that it is quite neat both in VHDL code and
in hardware:

  subtype S36 is signed(35 downto 0);

  function U18xS18 (
    U : unsigned(17 downto 0);
    S : signed(17 downto 0)
  ) return S36 is
    variable product : S36;
  begin
    -- Reinterpreting U's bits as signed reads U as U - 2**18
    -- whenever U(17) = '1'.
    product := signed(U) * S;
    if (U(17) = '1') then
      -- Add S * 2**18 back to undo that offset.
      product(35 downto 18) := product(35 downto 18) + S;
    end if;
    return product;
  end;

Disclaimer: I haven't tried synthesising this, and I
suspect you may need to play with the code some more
to get the best synthesis results.
--
Jonathan Bromley
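
The "conditional adder after a 17x18 multiply" idea mentioned above can also be written so that the multiply only ever sees the low 17 bits of the unsigned operand. The following is a sketch under that reading of the suggestion, not Kolja Sulimma's or Jonathan Bromley's actual code; the testbench name tb_u18xs18 and the corner-case vectors are made up for the example, and like the original it has not been synthesised. It splits U as U(17) * 2^17 + U(16 downto 0), multiplies the low part by S, and adds S * 2^17 when U(17) is set. The reference value comes from a zero-padded 19x18 multiply.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_u18xs18 is
end entity tb_u18xs18;

architecture sim of tb_u18xs18 is
  subtype S36 is signed(35 downto 0);

  function U18xS18 (
    U : unsigned(17 downto 0);
    S : signed(17 downto 0)
  ) return S36 is
    variable product : S36;
  begin
    -- Low 17 bits of U, zero-extended to a non-negative 18-bit signed
    -- value, so an 18x18 signed multiplier can be used directly.
    product := signed('0' & U(16 downto 0)) * S;
    if U(17) = '1' then
      -- The msb of U carries weight 2**17: add S * 2**17.
      product := product + shift_left(resize(S, 36), 17);
    end if;
    return product;
  end function;

  type u_vec is array (natural range <>) of unsigned(17 downto 0);
  type s_vec is array (natural range <>) of signed(17 downto 0);

  -- Corner cases only; an exhaustive sweep would need 2**36 products.
  constant U_CASES : u_vec := (
    to_unsigned(0, 18), to_unsigned(1, 18), to_unsigned(131071, 18),
    to_unsigned(131072, 18), to_unsigned(262143, 18));
  constant S_CASES : s_vec := (
    to_signed(-131072, 18), to_signed(-1, 18), to_signed(0, 18),
    to_signed(1, 18), to_signed(131071, 18));
begin
  check : process
    variable expected : S36;
  begin
    for i in U_CASES'range loop
      for j in S_CASES'range loop
        -- Reference: zero-padded 19x18 signed multiply, truncated to 36
        -- bits (the true product always fits in 36 bits).
        expected := resize(signed('0' & U_CASES(i)) * S_CASES(j), 36);
        assert U18xS18(U_CASES(i), S_CASES(j)) = expected
          report "U18xS18 mismatch" severity failure;
      end loop;
    end loop;
    report "all corner cases match";
    wait;
  end process;
end architecture sim;
```

Whether a synthesis tool keeps the multiply in a single 18*18 block and builds the conditional add in logic cells is tool dependent, so the resource cost would need checking in the target flow.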