From: dgreig on 28 Apr 2010 03:42

On Apr 27, 11:05 pm, Jonathan Bromley <s...(a)oxfordbromley.plus.com> wrote:
> On Tue, 27 Apr 2010 03:41:01 -0700 (PDT), dgreig wrote:
> >Unfortunately unsigned to signed requires zero padding, adding the
> >extra bit infers an 18*18 block rather than 9*9. In the case of 18
> >bit inputs the unsigned to signed requires one more bit than the block
> >actually has.
>
> What about Kolja Sulimma's suggestion of a conditional adder
> after a 17x18 multiply? This is only a sketch, but shows
> that it is quite neat both in VHDL code and in hardware:
>
>   subtype S36 is signed(35 downto 0);
>
>   function U18xS18 (
>     U: unsigned(17 downto 0),
>     S: signed(17 downto 0)
>   ) return S36 is
>     variable product: S36;
>   begin
>     product := signed'(U) * S;
>     if (U(17) = '1') then
>       product(35 downto 18) :=
>         product(35 downto 18) + signed'(U);
>     end if;
>     return product;
>   end;
>
> Disclaimer: I haven't tried synthesising this, and I suspect you
> may need to play with the code some more to get the best
> synthesis results.
> --
> Jonathan Bromley

The problem is that it costs more logic, more routing and more clock-cycle latency. The dirty method (altmult_accum) at least makes best use of the resources, and its 2-cycle latency is key to system throughput.

The system uses 282.5 of the 288 18*18 multipliers, 91% of block RAM and ~70% of logic elements. It still achieves a 25% timing margin on a mid-speed device (slowest max is ~200 MHz), and I am reluctant to push the boat out.

The niggle is that, apart from the PLLs and DDR2 specifics, the code would otherwise be portable.
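For anyone wanting to sanity-check the arithmetic behind the conditional-adder idea before spending synthesis time on it, here is a minimal bit-level model in Python, not VHDL. It is only a sketch: the names `to_signed` and `u18_x_s18` are made up for illustration, and it says nothing about how a tool would map the logic. It rests on one observation: reinterpreting an 18-bit unsigned U as signed subtracts 2^18 from its value whenever U(17) is set, so the signed product comes out short by S * 2^18. In this model the correction added to the upper 18 bits is therefore S.

```python
# Bit-level model (an illustration, not synthesizable code) of an
# unsigned(18) x signed(18) multiply built from a signed x signed
# multiply followed by a conditional add on the upper half.
W = 18
LOW_MASK = (1 << W) - 1          # low 18 bits
FULL_MASK = (1 << (2 * W)) - 1   # all 36 bits


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def u18_x_s18(u, s):
    """Return u * s as a 36-bit signed value, for 18-bit unsigned u
    and 18-bit signed s, using only a signed 18x18 multiply."""
    assert 0 <= u <= LOW_MASK
    assert -(1 << (W - 1)) <= s < (1 << (W - 1))

    # Same bit pattern as u, but read as signed (what the multiplier sees).
    u_as_signed = to_signed(u, W)
    product = (u_as_signed * s) & FULL_MASK

    if u >> (W - 1):
        # Top bit of u was set, so the multiplier saw u - 2**18.
        # Add s * 2**18 back, i.e. add s to bits 35..18 only.
        upper = ((product >> W) + s) & LOW_MASK
        product = (upper << W) | (product & LOW_MASK)

    return to_signed(product, 2 * W)
```

Comparing `u18_x_s18(u, s)` against plain `u * s` over the corner cases (U at 0, 2^17 and 2^18 - 1; S at its most negative and most positive values) is a quick way to confirm the correction term before committing it to HDL.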