Prev: Performing incremental code coverage with modelsim
Next: Problems with VHDL lookup table in Quartus
From: daniel.larkin on 26 Jul 2010 15:49

Hi all,

In my Cyclone 4 based design I'm getting an embedded multiplier inferred, as expected, from the following VHDL:

    C <= A * B;

(where A and B are registered 12-bit values, and the output C is subsequently registered, with no other logic in the path)

However, I'm seeing a timing violation on this path. Looking at the timing reports, there is nearly a 2 ns delay between the output of the multiplier and the flop. Obviously I'd really like to pull in some of this 2 ns, which would sort out the negative slack problem.

I looked through the documentation for the embedded multipliers, and as expected there are input and output registers as part of the embedded multiplier block. But clearly, with that 2 ns delay, the output register isn't being used. So my question is: how do I write my code to infer the use of the output registers in the embedded multipliers? I tried a number of coding styles, including putting the multiplication operation directly inside a clocked process, and it had no impact on timing. But I definitely don't want to instantiate the embedded multiplier directly. Perhaps there are VHDL attributes that may help (anything other than MULTSTYLE DSP/LOGIC)?

Any suggestions or pointers to documents would be greatly appreciated!
From: firefox3107 on 26 Jul 2010 18:24

On Jul 26, 9:49 pm, "daniel.lar...(a)gmail.com" <daniel.lar...(a)gmail.com> wrote:
> So my question is: how do I write my code
> to infer the use of the output registers in the embedded multipliers?

I would try this:

    Mult: process (iClk, inResetAsync) is
    begin
      if inResetAsync = '0' then
        C <= (others => '0');
      elsif rising_edge(iClk) then  -- rising clock edge
        C <= A * B;
      end if;
    end process Mult;
From: daniel.larkin on 27 Jul 2010 04:39

I thought I'd already tried that, but it looks like I forgot to reset the output (i.e. C in this case), and the resulting implementation didn't use the output register. Problem solved now. Thanks!

> I would try this
>
> Mult: process (iClk, inResetAsync) is
> begin
>   if inResetAsync = '0' then
>     C <= (others => '0');
>   elsif rising_edge(iClk) then  -- rising clock edge
>     C <= A * B;
>   end if;
> end process Mult;
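For anyone wanting to try the fix in isolation, the suggested process can be wrapped in a small self-contained entity. This is only a sketch: the entity name `mult12_reg`, the port names `iA`, `iB` and `oC`, and the choice of `unsigned` operands are assumptions (the thread does not say whether A and B are signed), and whether the fitter actually packs C into the multiplier's output register may depend on the Quartus version and settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: entity and port names are illustrative only.
entity mult12_reg is
  port (
    iClk         : in  std_logic;
    inResetAsync : in  std_logic;                      -- active-low async reset
    iA           : in  std_logic_vector(11 downto 0);
    iB           : in  std_logic_vector(11 downto 0);
    oC           : out std_logic_vector(23 downto 0)
  );
end entity mult12_reg;

architecture rtl of mult12_reg is
  -- unsigned is an assumption; swap for signed if the operands are signed.
  signal A : unsigned(11 downto 0);
  signal B : unsigned(11 downto 0);
  signal C : unsigned(23 downto 0);
begin

  Mult : process (iClk, inResetAsync) is
  begin
    if inResetAsync = '0' then
      -- Resetting the product register C is the change reported above
      -- as making the difference.
      A <= (others => '0');
      B <= (others => '0');
      C <= (others => '0');
    elsif rising_edge(iClk) then
      A <= unsigned(iA);   -- input registers
      B <= unsigned(iB);
      C <= A * B;          -- registered 24-bit product
    end if;
  end process Mult;

  oC <= std_logic_vector(C);

end architecture rtl;
```

After compiling, the timing report for the path into C is the place to confirm whether the delay between multiplier and register has gone.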
From: Nial Stewart on 27 Jul 2010 05:00

> I thought I'd already tried that - but it looks like I forgot to reset
> the output (i.e. C in this case), which subsequently gave a result
> which didn't use the output register. Problem solved now - Thanks

That's odd, I'd have expected the output to have been registered whether it was asynchronously reset or not.

Is this a bug in the synthesis tool?

Nial.
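One way to investigate Nial's question would be to synthesize a variant with no reset at all and compare its timing report with the reset version. The sketch below shows what such a variant might look like. The entity name `mult12_noreset`, the port names and the `unsigned` operand type are assumptions, and it makes no claim about which register the tool will choose.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical comparison case: same datapath, no reset on any register.
entity mult12_noreset is
  port (
    iClk : in  std_logic;
    iA   : in  std_logic_vector(11 downto 0);
    iB   : in  std_logic_vector(11 downto 0);
    oC   : out std_logic_vector(23 downto 0)
  );
end entity mult12_noreset;

architecture rtl of mult12_noreset is
  -- unsigned is an assumption; the thread does not state signedness.
  signal A : unsigned(11 downto 0);
  signal B : unsigned(11 downto 0);
  signal C : unsigned(23 downto 0);
begin

  Mult : process (iClk) is
  begin
    if rising_edge(iClk) then
      A <= unsigned(iA);   -- input registers
      B <= unsigned(iB);
      C <= A * B;          -- registered product, no reset branch
    end if;
  end process Mult;

  oC <= std_logic_vector(C);

end architecture rtl;
```

If the two builds report different delays on the path into C, that would support the reset being the deciding factor in this particular tool version.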
From: dgreig on 28 Jul 2010 04:56

On Jul 26, 8:49 pm, "daniel.lar...(a)gmail.com" <daniel.lar...(a)gmail.com> wrote:
> So my question is: how do I write my code
> to infer the use of the output registers in the embedded multipliers?
> [...]
> Perhaps there are any VHDL attributes
> that may help (anything other than MULTSTYLE DSP/LOGIC)?

The following works; I have tens of thousands of instantiations in a similar number of FPGAs actually in the field.

The multstyle attribute may be what you need. Synthesis might not use a DSP block if there is no timing need and no power need.

    -- ==========================================================================--
    -- COPYRIGHT (c) 2010 DAVID GREIG. This source file is the property of
    -- David Greig. This work must not be copied without permission from
    -- David Greig.
    --
    -- Any copy or derivative of this source file must include this copyright
    -- statement.
    -- ----------------------------------------------------------------------------
    -- File        : SyS_Mult.vhd
    -- Author      : David Greig (email :
    -- Revision    :
    -- Description : signed input data multiplier with clken output reg
    -- ----------------------------------------------------------------------------
    -- Notes       : 2 clock cycle delay
    -- ==========================================================================--
    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.numeric_std.all;
    -- ==========================================================================--
    entity SyS_Mult is -- 2 clock cycle delay
        generic(
            Gdawidth   : natural;
            Gdbwidth   : natural;
            Gmult_pref : string
        );
        port(
            arstn : in  std_logic;
            clk   : in  std_logic;
            clken : in  std_logic;
            da_i  : in  std_logic_vector(Gdawidth - 1 downto 0);
            db_i  : in  std_logic_vector(Gdbwidth - 1 downto 0);
            q_o   : out std_logic_vector(Gdawidth + Gdbwidth - 1 downto 0)
        );
    end entity SyS_Mult;
    -- ==========================================================================--
    architecture rtl of SyS_Mult is
        attribute multstyle : string; -- Implementation style, "logic" "dsp"
        ------------------------------------------------------------------------
        signal da_r : signed(Gdawidth - 1 downto 0);
        signal db_r : signed(Gdbwidth - 1 downto 0);
        signal p_r  : signed(Gdawidth + Gdbwidth - 1 downto 0);
        attribute multstyle of p_r : signal is Gmult_pref;
        ------------------------------------------------------------------------
    begin
        ------------------------------------------------------------------------
        prcs_SyS_Mult : process(arstn, clken, clk)
        begin
            if (arstn = '0') then
                -- asynchronous active-low reset of all three registers
                da_r <= (others => '0');
                db_r <= (others => '0');
                p_r  <= (others => '0');
            elsif (clken = '0') then
                -- clock enable low: registers hold their values
                null;
            elsif rising_edge(clk) then
                da_r <= signed(da_i);
                db_r <= signed(db_i);
                p_r  <= (da_r * db_r);
            end if;
        end process prcs_SyS_Mult;
        ------------------------------------------------------------------------
        q_o <= std_logic_vector(p_r);
    end architecture rtl;
    -- ==========================================================================--
    -- component SyS_Mult is -- 2 clock cycle delay
    --     generic(
    --         Gdawidth   : natural;
    --         Gdbwidth   : natural;
    --         Gmult_pref : string
    --     );
    --     port(
    --         arstn : in  std_logic;
    --         clk   : in  std_logic;
    --         clken : in  std_logic;
    --         da_i  : in  std_logic_vector(Gdawidth - 1 downto 0);
    --         db_i  : in  std_logic_vector(Gdbwidth - 1 downto 0);
    --         q_o   : out std_logic_vector(Gdawidth + Gdbwidth - 1 downto 0)
    --     );
    -- end component SyS_Mult;

    -- i_ : SyS_Mult -- 2 clock cycle delay
    --     generic map(
    --         Gdawidth   => ,
    --         Gdbwidth   => ,
    --         Gmult_pref =>
    --     )
    --     port map(
    --         arstn => ,
    --         clk   => ,
    --         clken => ,
    --         da_i  => ,
    --         db_i  => ,
    --         q_o   =>
    --     );
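To see the "2 clock cycle delay" noted in the header, a small testbench along the following lines could be used. It is a sketch only: the testbench name `SyS_Mult_tb`, the 12-bit widths, the 10 ns clock period, the test values and the `"dsp"` preference string are all assumptions chosen for illustration, and it expects the `SyS_Mult` entity above to be compiled into library `work`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench for the SyS_Mult entity posted above.
entity SyS_Mult_tb is
end entity SyS_Mult_tb;

architecture sim of SyS_Mult_tb is
  constant Cwidth : natural := 12;  -- assumed operand width
  signal arstn : std_logic := '0';
  signal clk   : std_logic := '0';
  signal clken : std_logic := '1';
  signal da_i  : std_logic_vector(Cwidth - 1 downto 0) := (others => '0');
  signal db_i  : std_logic_vector(Cwidth - 1 downto 0) := (others => '0');
  signal q_o   : std_logic_vector(2 * Cwidth - 1 downto 0);
  signal done  : boolean := false;
begin

  -- 10 ns clock that stops toggling once the check has run
  clk <= not clk after 5 ns when not done else '0';

  dut : entity work.SyS_Mult
    generic map (
      Gdawidth   => Cwidth,
      Gdbwidth   => Cwidth,
      Gmult_pref => "dsp"
    )
    port map (
      arstn => arstn,
      clk   => clk,
      clken => clken,
      da_i  => da_i,
      db_i  => db_i,
      q_o   => q_o
    );

  stim : process
  begin
    wait for 12 ns;
    arstn <= '1';                    -- release the asynchronous reset
    wait until falling_edge(clk);
    da_i <= std_logic_vector(to_signed(-3, Cwidth));
    db_i <= std_logic_vector(to_signed(7, Cwidth));
    wait until rising_edge(clk);     -- edge 1: operands captured in da_r/db_r
    wait until rising_edge(clk);     -- edge 2: product captured in p_r
    wait until falling_edge(clk);    -- sample away from the active edge
    assert signed(q_o) = to_signed(-21, 2 * Cwidth)
      report "unexpected product after two clock edges"
      severity failure;
    done <= true;
    wait;
  end process stim;

end architecture sim;
```

The multstyle attribute has no effect in simulation, so this only checks the arithmetic and latency; DSP block usage still has to be confirmed in the fitter and timing reports.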