Embedded clocks [FPGA]


From: Jim Granville on 12 Aug 2006 21:51

rickman wrote:

> Jim Granville wrote:
>
>>anything better than RC, has starting time issues, so usually runs
>>all the time, and that has power penalties.
>
>
> RC is not even an oscillator without other components so it really is
> not a solution. I can get an oscillator that runs on 1 mA of current,
> costs under $0.50 and has plenty of accuracy to do any of the above
> protocols. So async serial is ok. One in and one out.

RC osc would use the CPLD - not sure what 'other components' you mean.

If you are happy with 1mA and 50c, then that's fine.

I see in my notes Core ICC figures of ~20 uA @ 15 kHz for a CPLD RC osc,
at a cost of a few cents (and approx. 50 uA at 1 MHz).

>
>>>I don't see one wire as being any simpler than a UART. One wire is
>>>just bit async rather than byte async. You still need a timer to time
>>>the bits.
>>
>>build them both, and count the macrocells :)
>>
>>UARTs need (commonly) /16 resettable counter on RX, and a /16 non
>>resettable counter on TX, plus the byte buffers in both directions.
>>
>>So that's at least 8 macrocells running higher than the bit-rate,
>>plus appx 4 more to do the framing, vs 3-4 for PWM bus.
>
>
> A PWM bit level signal still has to do all the higher level stuff of
> counting the bits in a word etc. So if there is any savings, it would
> be very little.

Again, it depends on your yardstick. When you are working with 32
Macrocell CPLDs, as I do often, a saving of 8 macrocells can be
very important.
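As a rough illustration of the tally quoted above, here is a minimal Python sketch. It assumes one macrocell per flip-flop and uses the upper end of the "3-4" PWM bus estimate; the function names are made up for this example, and the byte buffers are left out, as they are in the "at least 8" figure.

```python
def counter_macrocells(divide_by: int) -> int:
    """Flip-flops needed for a binary divide-by-N counter.

    Assumption: one macrocell per flip-flop.
    """
    return (divide_by - 1).bit_length()


def uart_macrocells(oversample: int = 16, framing: int = 4) -> int:
    """RX /16 counter + TX /16 counter + framing logic (byte buffers excluded)."""
    rx = counter_macrocells(oversample)
    tx = counter_macrocells(oversample)
    return rx + tx + framing


PWM_BUS_MACROCELLS = 4  # upper end of the "3-4" estimate
CPLD_SIZE = 32          # macrocells in the small parts under discussion

uart = uart_macrocells()
saving = uart - PWM_BUS_MACROCELLS
print(f"UART: {uart} macrocells, PWM bus: {PWM_BUS_MACROCELLS}, "
      f"saving: {saving} ({100 * saving / CPLD_SIZE:.0f}% of a {CPLD_SIZE}-cell part)")
```

Under those assumptions the saving works out to a quarter of a 32 macrocell device.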

<snip>
>>64 Macrocells sounds plenty, could even manage this in 32 Macrocell parts.
>
>
> You did not account for the two SPI ports that are being multiplexed.
> Without more info on the protocol on the SPI ports, I can't count FFs.

I thought this was a multi-slave plus master problem - you seem to be
talking only about the master above - what are the slaves ?

> But each one will need a buffer since the link will have to run much
> faster than either of the two SPI ports. Also I don't even know if
> this will work since SPI is full duplex, IIRC. As you shift out data
> read data is coming back, right? Or is it still half duplex with the
> read data and write data never happening at the same time? I would not
> be able to buffer words and do full duplex. That sounds incompatible
> to me.

SPI works like the simplest 8 bit shift registers, so it is duplex
capable.

Most SPI memories work in half-duplex - they read the address info
while floating SerialOUT, and then ignore SerialIN while
driving serial out (if doing a read).
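A minimal Python sketch of the shift-register view, assuming 8-bit words shifted MSB first. The names `spi_exchange` and `memory_read` are hypothetical, and the read is simplified to a bare address byte followed by a data byte (real memories usually also take a command byte).

```python
def spi_exchange(master_byte: int, slave_byte: int) -> tuple:
    """Clock 8 bits through two cross-coupled 8-bit shift registers.

    Returns (master_register, slave_register) after the transfer.
    """
    m, s = master_byte & 0xFF, slave_byte & 0xFF
    for _ in range(8):
        mosi = (m >> 7) & 1           # master shifts out its MSB
        miso = (s >> 7) & 1           # slave shifts out its MSB
        m = ((m << 1) & 0xFF) | miso  # master shifts in the slave's bit
        s = ((s << 1) & 0xFF) | mosi  # slave shifts in the master's bit
    return m, s


def memory_read(memory: dict, address: int) -> int:
    """Half-duplex use of the same full-duplex link."""
    # Phase 1: master sends the address; whatever comes back is ignored
    # (a real memory would leave SerialOUT floating here).
    _, addr_seen = spi_exchange(address, 0x00)
    # Phase 2: master clocks a dummy byte; the memory ignores SerialIN
    # and drives the data out.
    data, _ = spi_exchange(0x00, memory[addr_seen])
    return data


print(spi_exchange(0xA5, 0x3C))          # the two bytes swap places
print(hex(memory_read({0x10: 0x42}, 0x10)))
```

Every clock moves one bit in each direction, so the link is always full duplex at the wire level; half duplex is just a convention about which side's bits are meaningful in each phase.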

If you have to slave to two separate SPI streams, that you have little
control over, that could get complex very quickly.

-jg

From: PeteS on 13 Aug 2006 07:53

rickman wrote:
> Jim Granville wrote:
> > anything better than RC, has starting time issues, so usually runs
> > all the time, and that has power penalties.
>
> RC is not even an oscillator without other components so it really is
> not a solution. I can get an oscillator that runs on 1 mA of current,
> costs under $0.50 and has plenty of accuracy to do any of the above
> protocols. So async serial is ok. One in and one out.
>
> > > I don't see one wire as being any simpler than a UART. One wire is
> > > just bit async rather than byte async. You still need a timer to time
> > > the bits.
> >
> > build them both, and count the macrocells :)
> >
> > UARTs need (commonly) /16 resettable counter on RX, and a /16 non
> > resettable counter on TX, plus the byte buffers in both directions.
> >
> > So that's at least 8 macrocells running higher than the bit-rate,
> > plus appx 4 more to do the framing, vs 3-4 for PWM bus.
>
> A PWM bit level signal still has to do all the higher level stuff of
> counting the bits in a word etc. So if there is any savings, it would
> be very little.
>
> > PWM Osc is gated-monostable type at 4x bit rate - so power is lower.
> > A 3 bit Gray counter handles RxSample, TxWindow, and Sync detect
> >
> > Simulating all this is not that easy, on today's tools, which are
> > designed for a master-clock approach.
> >
> >
> > CPLDs have no problems with speed, but the host speed may be a stumbling
> > block. Philips talked about 3.4MHz i2c, but nothing seems to have hit
> > the streets. I see they now have a FM+ spec, which is high drive i2c,
> > at 1MBd, also well within CPLD's reach.
>
> The host would be another CPLD. The "host" has to take in two SPI
> running near 100 kHz and four discrete signals. I have no info on how
> the SPI data is framed. I2C is done in bytes, but my understanding is
> that SPI has no defined protocol, it really is a non-standard standard.
> I will have to get more info on how the SPI busses are being used
> before I can decide if this will even work.
>
>
> > > SPI would work too, but would use all four pins leaving us no spares.
> >
> > SPI can work with 3 wires, if that helps.
>
> I could put an address on the SPI bus like I2C does it. I can't recall
> at the moment why I felt it would need a fourth pin. I think because
> of flagging which of the two SPI ports was running at that moment. But
> that can be encoded in the data stream so I guess it could leave a pin
> free.
>
>
> > > A UART interface could use two wires, one for transmit and one for receive.
> >
> > > The word
> > > size can be application specific with dedicated bits for discrete
> > > signals. Most importantly, I think it will be the smallest in a CPLD.
> >
> > How many IO's do you need, on how many addresses ?
>
> I don't understand. Do you mean the discrete signals?
>
> > Do they need dataDirection register control, and read-back, or
> > are simpler fixed OUT and IN acceptable ?
>
> No, just four outputs. They are triggers with timing information, but
> I don't know how precise they need to be.
>
> > 64 Macrocells sounds plenty, could even manage this in 32 Macrocell parts.
>
> You did not account for the two SPI ports that are being multiplexed.
> Without more info on the protocol on the SPI ports, I can't count FFs.
> But each one will need a buffer since the link will have to run much
> faster than either of the two SPI ports. Also I don't even know if
> this will work since SPI is full duplex, IIRC. As you shift out data
> read data is coming back, right? Or is it still half duplex with the
> read data and write data never happening at the same time? I would not
> be able to buffer words and do full duplex. That sounds incompatible
> to me.

SPI is a spec in search of itself.
There are some variants to it. For a decent overview of the different
types, you could look at the interface specs on this device I use
http://products.zarlink.com/product_profiles/ZL38001.htm

(Look at the microport section).

One of the issues I ran into on the original Motorola implementation
was that the last data bit (lsb) was only valid for half a bit time -
so I had to generate a quadrature clock. (That was in the early 90s).

As you are doing it for yourself, you have no such limitation. SPI can
run at up to 4Mb/s on commercial parts, and I see no reason a CPLD
could not handle that.

Cheers

PeteS

From: rickman on 13 Aug 2006 08:15

Jim Granville wrote:
> rickman wrote:
>
> > Jim Granville wrote:
> >
> >>anything better than RC, has starting time issues, so usually runs
> >>all the time, and that has power penalties.
> >
> >
> > > RC is not even an oscillator without other components so it really is
> > not a solution. I can get an oscillator that runs on 1 mA of current,
> > costs under $0.50 and has plenty of accuracy to do any of the above
> > protocols. So async serial is ok. One in and one out.
>
> RC osc would use the CPLD - not sure what 'other components' you mean.
>
> If you are happy with 1mA and 50c, then that's fine.
>
> I see in my notes, Core ICC figures of ~20uA @ 15KHz for a CPLD RC osc,
> at a cost of a few cents. ( and appx 50uA at 1MHz )

How do you get a CPLD to reliably oscillate with an RC?

> Again, it depends on your yardstick. When you are working with 32
> Macrocell CPLDs, as I do often, a saving of 8 macrocells can be
> very important.

I agree, but I am not sure there is any savings. Both methods have to
do the same work in framing bits and words so I don't see where the
savings would come in. That would require a more detailed analysis or
design. I suppose the one wire protocol might be able to have more
commonality between the rx and tx, but a half duplex uart could likely
do the same thing.

There are other considerations. If I can clearly show that a one-wire
approach would work and fit the part while a uart design would not,
that would be clear evidence. But I am currently working (the emphasis
on *currently*) in a very political environment where being right is no
guarantee of being "right". If a more "trusted" designer decided that
he was uncomfortable with the one-wire approach it would be gone
regardless of the facts.

> <snip>
> >>64 Macrocells sounds plenty, could even manage this in 32 Macrocell parts.
> >
> >
> > You did not account for the two SPI ports that are being multiplexed.
> > Without more info on the protocol on the SPI ports, I can't count FFs.
>
> I thought this was a multi-slave plus master problem - you seem to be
> talking only about the master above - what are the slaves ?

I think I explained what I was building, but it may be scattered over
several different posts. The "thing" I am trying to figure out is in
essence a multiplexer and demultiplexer of several signals. There are
two interfaces that need to be brought out through a cable. The
connector is virtually out of pins and we don't want to make it larger.
Each interface has an SPI port with a CE since it drives multiple
slaves running at lowish rates. The interface also has two discrete
signals which are mostly for timing of control. The SPI port is used
to send setup commands and the discrete signals say "do it now". We
have added four pins to the connector before remembering that the
interface for each was a total of 6 pins and that we actually needed
two ports. It is unlikely that we can add a total of 12 pins to the
connector. So we need a mux.

The SPI ports can not be multiplexed together in the simple way since
they have separate masters that are not synchronized. So to multiplex
this will require capturing the data from both ports and muxing it at a
higher speed along with the discrete signals. I am limited by not
knowing the format of the data on the SPI busses.

It just occurred to me that I really don't need to know the format of
the data if I just treat the signals as arbitrary data streams. I can
sample them at high enough rates that the timing information is not
significantly distorted and send them across in a very, very simple
scheme. I can sample each one in a round-robin manner at say 1 MHz for
a total clock rate of 10 MHz (10 signals in one direction, 2 in the
other). This interface would need a clock, datain, dataout and frame
sync. That will fill the four pins, but would also be flexible enough
that other discrete signals could be added with a small increase in
frequency.
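A minimal Python sketch of the round-robin scheme, modelling only the 10-signal direction. It assumes frame sync is asserted on slot 0 of each frame; `mux` and `demux` are names invented for this example, and real hardware would update each output as its slot arrives rather than collect whole frames.

```python
N_SIGNALS = 10                 # signals carried in this direction
SAMPLE_RATE_HZ = 1_000_000     # each signal sampled at 1 MHz
LINK_CLOCK_HZ = N_SIGNALS * SAMPLE_RATE_HZ  # 10 MHz link clock


def mux(frames):
    """Serialise frames of N_SIGNALS sampled bits.

    Yields one (frame_sync, data) pair per link clock; frame_sync is 1
    only on slot 0 (an assumption of this sketch).
    """
    for samples in frames:
        for slot, bit in enumerate(samples):
            yield (1 if slot == 0 else 0, bit)


def demux(link):
    """Rebuild the frames; the slot counter is reset by frame_sync."""
    frames, outputs, slot = [], [0] * N_SIGNALS, None
    for sync, bit in link:
        if sync:
            slot = 0
        if slot is None:       # not locked yet, wait for frame_sync
            continue
        outputs[slot] = bit
        slot += 1
        if slot == N_SIGNALS:
            frames.append(tuple(outputs))
            slot = None        # wait for the next frame_sync
    return frames


sent = [(1, 0, 1, 1, 0, 0, 1, 0, 1, 0), (0, 1, 0, 0, 1, 1, 0, 1, 0, 1)]
assert demux(mux(sent)) == sent
print(f"link clock: {LINK_CLOCK_HZ / 1e6:.0f} MHz")
```

Because the receiver only counts slots from frame sync, it needs no knowledge of the SPI framing, and adding another discrete signal just means one more slot per frame.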

One BIG advantage to doing it this way is that the latency is no more
than 2 clock cycles or 0.2 us. I am also very confident that it could
be done in a 32 cell CPLD. The slave would be clocked by the
interface, so it does not need a crystal or other timing device. The
master can be a bit more complex since it is inside the unit, but
likely a ~10 MHz clock can be provided. The only remaining issue is
finding a CPLD that is low power and can be made tolerant of the
external signals. I would hate to have to buffer all 16 signals in the
cable.

> SPI works like the simplest 8 bit shift registers, so it is duplex
> capable.
>
> Most SPI memories work in half-duplex - they read the address info
> while floating SerialOUT, and then ignore SerialIN while
> driving serial out (if doing a read).
>
> If you have to slave to two separate SPI streams, that you have little
> control over, that could get complex very quickly.

Yes, that is the part that has me concerned. But by explaining it to
you, I think I have figured out the best way to approach this. We'll
see if I can get the multiplexer to fit in terms of power and any other
constraints that pop up later.

Thanks for listening and offering your advice.

From: Brian Davis on 13 Aug 2006 08:34

rickman wrote:
>
> I had not given the question much thought when I posed it and I see now
> that all the "self clocking" schemes are framed in some rate and the
> clock is recovered given a reference.
>
If you can use one of the small CPLD/FPGA parts with a DLL/PLL
(e.g. MachXO) at the far end, you should be able to press the {D|P}LL
into service for clock recovery using a specially constructed waveform.

There was a thread last year about sending bidirectional data over a
SATA cable without needing a CDR at the far end where I suggested
a clock phase modulation scheme:

http://groups.google.com/group/comp.arch.fpga/msg/42d1840c981c64e6

- send a clock with fixed rising edges and +/-90 degree
phase modulation of the falling edges

- divide by two to strip the phase modulation -> clean ref. clk

- double then phase shift for a mid-period sample clock

If the {D|P}LL phase detector & ctl logic only uses leading edges,
and doesn't mind the wild duty cycle swings, you could skip the
divide-by-2 and double steps.

I haven't tried that out yet, but I don't see any fundamental problems
with it ( other than wasting BW vs. other modulation schemes. )
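A minimal Python sketch of the waveform, modelling each clock period as four quarter-period slots. The bit-to-phase mapping (falling edge at 270 degrees for a 1, at 90 degrees for a 0) is an assumption of this example, as are the function names; it is a behavioural model only, not a DLL/PLL design.

```python
SLOTS = 4  # quarter-period resolution: one slot = 90 degrees


def encode(bits):
    """Rising edge always at slot 0; falling edge at 90 or 270 degrees."""
    wave = []
    for b in bits:
        fall = 3 if b else 1   # assumed mapping: 1 -> 270 deg, 0 -> 90 deg
        wave.extend([1] * fall + [0] * (SLOTS - fall))
    return wave


def divide_by_two(wave):
    """Toggle on rising edges only, which strips the phase modulation."""
    ref, state, prev = [], 0, 0
    for level in wave:
        if prev == 0 and level == 1:
            state ^= 1
        ref.append(state)
        prev = level
    return ref


def decode(wave):
    """Sample the line half a period after each rising edge."""
    bits, prev = [], 0
    for i, level in enumerate(wave):
        if prev == 0 and level == 1 and i + SLOTS // 2 < len(wave):
            bits.append(wave[i + SLOTS // 2])
        prev = level
    return bits


data = [1, 0, 1, 1, 0, 0, 1]
assert decode(encode(data)) == data
# The divided clock does not depend on the data being sent.
assert divide_by_two(encode([0, 1, 1, 0])) == divide_by_two(encode([1, 0, 0, 1]))
```

The divided-by-two output has a clean 50% duty cycle at half the line rate regardless of the data, which is what makes it usable as a reference clock.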

Brian

From: rickman on 13 Aug 2006 09:15

Brian Davis wrote:
> rickman wrote:
> >
> > I had not given the question much thought when I posed it and I see now
> > that all the "self clocking" schemes are framed in some rate and the
> > clock is recovered given a reference.
> >
> If you can use one of the small CPLD/FPGA parts with a DLL/PLL
> (e.g. MachXO) at the far end, you should be able to press the {D|P}LL
> into service for clock recovery using a specially constructed waveform.
>
> There was a thread last year about sending bidirectional data over a
> SATA cable without needing a CDR at the far end where I suggested
> a clock phase modulation scheme:
>
> http://groups.google.com/group/comp.arch.fpga/msg/42d1840c981c64e6
>
> - send a clock with fixed rising edges and +/-90 degree
> phase modulation of the falling edges
>
> - divide by two to strip the phase modulation -> clean ref. clk
>
> - double then phase shift for a mid-period sample clock
>
> If the {D|P}LL phase detector & ctl logic only uses leading edges,
> and doesn't mind the wild duty cycle swings, you could skip the
> divide-by-2 and double steps.
>
> I haven't tried that out yet, but I don't see any fundamental problems
> with it ( other than wasting BW vs. other modulation schemes. )

Thanks for the ideas. I was not aware of the MachXO parts. They are
surprisingly dense and cheap. Web pricing at Lattice for the 256 LUT part
is around $5 in moderate quantities.

I assume I would need a 2x clock to generate the 90 degree skewing of
the trailing edge or even a 4x clock if I don't want to play tricks
with using opposite phases clocking FFs.

This could work and would only use two pins, one in each direction.
But the device itself would be pushing the boundary of what I would
like to use. The smallest part is 256 LUTs and the smallest package is
a 100 pin Chip Scale BGA at 8x8 mm. There is one universal truth in
FPGAs and CPLDs; if you have a lot of IO, you will have a lot of logic
and if you have a lot of logic you will have a lot of IO. They don't
put large devices in low pin count packages, ever!

The power consumption of the MACHXO is a bit high. I don't have an
exact number, but the standby current starts at 14 mA. But this is not
out of the ballpark and I will keep the idea in mind since it can use
half the pins of the other approach I have in mind.
