fooling the compiler [Design]


From: Tim Wescott on 25 Jun 2010 11:30

On 06/25/2010 07:00 AM, John Larkin wrote:
> On Fri, 25 Jun 2010 08:46:30 +0000 (UTC), Uwe Bonnes
> <bon(a)elektron.ikp.physik.tu-darmstadt.de> wrote:
>
>> In comp.arch.fpga John Larkin
>> <jjlarkin(a)highnotlandthistechnologypart.com> wrote:
>>
>>
>>> We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>> converters. The data we get back is different, but the clock and chip
>>> select timings are the same. To get the timing right, avoiding routing
>>> delays, we need our outgoing stuff to be reclocked by i/o cell
>>> flipflops.
>>
>>> So what happens is that we have one state machine running all 16 SPI
>>> interfaces. We tell the software that we want the adc chip select
>>> flops in i/o cells. The compiler decides that all are seeing the same
>>> input signal, so reduces them to one flipflop. Then it concludes that
>>> that flipflop can't be in an i/o block, and builds it that way. The
>>> resulting routing delays are deadly.
>>
>>> We couldn't find a way to force these 16 flops into IOBs. Really.
>>
>>> The fix is to fool the compiler into thinking the flipflop states are
>>> not the same. Turns out the synchronous clear inputs to the flops
>>> are unused in our design. My suggestion was to ground an input pin,
>>> run that into the serial input of a 16-bit shift register, and route
>>> the sr taps to the clears of the 16 output flops. The compiler can't
>>> know that these levels are in fact always low, so has to gen 16
>>> different flops. *Then* it allows the flops to be forced into IOBs.
>>
>>> Rob has a better idea, just make a 16-bit SR that generates a
>>> thermometer code on powerup, namely walk a 1 into it, and have the sr
>>> output bits un-clear the i/o flops sequentially. The compiler isn't
>>> smart enough to catch onto that, and we don't need to ground a pin.
>>
>>> It works.
>>
>>> Isn't that all perfectly stupid?
>>
>> Did you try to attach a
>> (* KEEP = "TRUE" *)
>> attribute to the registers in question?
>>
>> I had a similar problem with registers meant to get placed in an IOB
>> absorbed by the feeding BRAM.
>
> My Xilinx guy, a real pro at this sort of thing, tried everything,
> "keeps" and "forces" and such. He thought my suggestion was
> disgusting, which it certainly is, but it broke the logjam and let us
> get on with our lives.
>
> The real figure of merit of any fpga is a minimal value of
>
> K = A/R
>
> where A is the actual time you spend downloading, installing,
> patching, and fighting with the tools, and R is a reasonable
> design/sim/test time for the project. Xilinx's K value increases
> steadily over time, and is now roughly 4.

Funny, that's my figure of merit for an embedded processor, too!

It's much better these days than it was 20 years ago, although I admit
that I've managed to shove it way up recently by deciding to build my
own open-source tools. But that's for play, not for money, so it's
different.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
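
[Editor's sketch] The thermometer-code workaround quoted above (Rob's variant, with no grounded pin) might look roughly like the following in Verilog. This is only an illustration of the idea as described in the post, not the posters' actual code: the module, port, and signal names are hypothetical, the chip-select polarity is assumed, and it relies on the shift register powering up as all zeros.

```verilog
// Sketch only. Module, port, and signal names are hypothetical,
// and the polarity of the chip selects is an assumption.
module adc_cs_fanout (
    input  wire        clk,
    input  wire        cs_next,   // chip-select value from the shared SPI state machine
    (* IOB = "TRUE" *)            // request I/O-cell flops (XST-style attribute)
    output reg  [15:0] adc_cs     // one chip-select flop per ADC
);
    // 16-bit shift register that walks a 1 in after configuration:
    // 0000, 0001, 0003, 0007, ... FFFF (a thermometer code).
    // Assumes the register initializes to zero at power-up.
    reg [15:0] release_sr = 16'h0000;

    always @(posedge clk)
        release_sr <= {release_sr[14:0], 1'b1};

    // Each output flop gets its own synchronous clear, driven by a
    // different shift-register tap, so the 16 flops are not logically
    // identical during the first 16 clocks after configuration.
    genvar i;
    generate
        for (i = 0; i < 16; i = i + 1) begin : g_cs
            always @(posedge clk)
                if (!release_sr[i])
                    adc_cs[i] <= 1'b0;      // held cleared until tap i goes high
                else
                    adc_cs[i] <= cs_next;   // afterwards all 16 track the same value
        end
    endgenerate
endmodule
```

After 16 clocks every tap is high and all 16 outputs follow `cs_next` identically, which is the behavior the original design wanted. Whether a given tool version fails to see through this, as the post reports, is not something the sketch can guarantee.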

From: John Larkin on 25 Jun 2010 11:55

On Fri, 25 Jun 2010 08:30:04 -0700, Tim Wescott <tim(a)seemywebsite.now>
wrote:

>Funny, that's my figure of merit for an embedded processor, too!
>
>It's much better these days than it was 20 years ago, although I admit
>that I've managed to shove it way up recently by deciding to build my
>own open-source tools. But that's for play, not for money, so it's
>different.

I'm still programming embedded stuff in 68K assembly. Dyno mode. The
thing is, I finish a typical instrument's firmware in a week or two
and have zero problems with the assembly and debug tools. And rarely
find a bug in shipped products. I can archive the source *and all the
tools* on one floppy. A lot of people nowadays can't even install and
run tool chains that they used a few years ago.

Sometimes just grunting it out with simple tools is the best way to
get something done. A lot of fancy labor-saving, abstraction-for-reuse
stuff is actually game-playing and counter-productive.

John

From: Tim Wescott on 25 Jun 2010 13:03

On 06/25/2010 08:55 AM, John Larkin wrote:
> I'm still programming embedded stuff in 68K assembly. Dyno mode. The
> thing is, I finish a typical instrument's firmware in a week or two
> and have zero problems with the assembly and debug tools. And rarely
> find a bug in shipped products. I can archive the source *and all the
> tools* on one floppy. A lot of people nowadays can't even install and
> run tool chains that they used a few years ago.

Good for you.

> Sometimes just grunting it out with simple tools is the best way to
> get something done. A lot of fancy labor-saving, abstraction-for-reuse
> stuff is actually game-playing and counter-productive.

It can be. It works well when you have a large group and a product line
with lots of components that have both similarities and differences.

I worked on a product that had over a dozen CAN-enabled processors roped
together on a CAN bus. We had a _lot_ of common code that was reused
among all the processors. We also had a lot of code that was individual
to each processor.

Just handing out specifications for the CAN protocol to a dozen
developers and telling them "go" would have been a nightmare. Instead
we got the CAN stuff going with just two guys, and used it everywhere.
Ditto for a bunch of motor control stuff that was used everywhere, as
well as some generic ADC-reading infrastructure and other bits and pieces.

But I've seen code reuse turn into a disaster in the hands of someone
who's not as smart as they think they are.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

From: Nico Coesel on 25 Jun 2010 13:20

John Larkin <jjlarkin(a)highNOTlandTHIStechnologyPART.com> wrote:

>
>
>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>converters. The data we get back is different, but the clock and chip
>select timings are the same. To get the timing right, avoiding routing
>delays, we need our outgoing stuff to be reclocked by i/o cell
>flipflops.
>
>So what happens is that we have one state machine running all 16 SPI
>interfaces. We tell the software that we want the adc chip select
>flops in i/o cells. The compiler decides that all are seeing the same
>input signal, so reduces them to one flipflop. Then it concludes that
>that flipflop can't be in an i/o block, and builds it that way. The
>resulting routing delays are deadly.
>
>We couldn't find a way to force these 16 flops into IOBs. Really.

Constraints usually help. In that case it should duplicate logic (if
this option is on) to meet timing specifications.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico(a)nctdevpuntnl (punt=.)
--------------------------------------------------------------
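
[Editor's sketch] For readers who have not seen them, the attribute-based approach suggested by Uwe and Nico usually looks something like the following. This is a minimal sketch assuming XST-style Verilog attributes (`IOB` and `equivalent_register_removal`); the module and signal names are hypothetical. Note that, according to John's reply earlier in the thread, settings of this kind had already been tried on this design without success, so this shows the conventional first attempt rather than a confirmed fix.

```verilog
// Sketch only. Module, port, and signal names are hypothetical.
module adc_cs_regs (
    input  wire        clk,
    input  wire        cs_next,   // chip-select value from the shared SPI state machine
    // Ask synthesis to keep all 16 equivalent flops and to pack them
    // into I/O cells (XST-style attributes, assumed to be honored).
    (* IOB = "TRUE", equivalent_register_removal = "no" *)
    output reg  [15:0] adc_cs
);
    // All 16 flops see the same input, which is exactly what invites
    // the tool to merge them into one when the attributes are ignored.
    always @(posedge clk)
        adc_cs <= {16{cs_next}};
endmodule
```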

From: langwadt on 25 Jun 2010 13:27

On 25 Jun., 07:09, John Larkin
<jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
> We have a Spartan6/45 that's talking to 16 separate SPI A/D
> converters. The data we get back is different, but the clock and chip
> select timings are the same. To get the timing right, avoiding routing
> delays, we need our outgoing stuff to be reclocked by i/o cell
> flipflops.
>
> So what happens is that we have one state machine running all 16 SPI
> interfaces. We tell the software that we want the adc chip select
> flops in i/o cells. The compiler decides that all are seeing the same
> input signal, so reduces them to one flipflop. Then it concludes that
> that flipflop can't be in an i/o block, and builds it that way. The
> resulting routing delays are deadly.
>
> We couldn't find a way to force these 16 flops into IOBs. Really.
>

you don't happen to use the output of that flop somewhere in the
design?

you can't directly instantiate an output FF, but you can instantiate a
DDR output FF, OFDDRCPE, with C1 tied low it might work.

-Lasse
