From: krw on
On Tue, 08 Sep 2009 15:53:43 +0200, Bill <a(a)a.a> wrote:

>Hi,
>
>I'm trying to pull out data from the ADS8320 (a 16-bit ADC by Texas
>Instruments. See bottom of page 10 in
>http://focus.ti.com/lit/ds/symlink/ads8320.pdf ) using the SPI in an
>AT91SAM7S256. The problem is that the ADC needs 6 extra clock cycles
>to sample the analog signal, before the 16 cycles that will output
>each of the conversion result bits. So, a complete ADC cycle involves
>a minimum of 22 clock cycles. Now, the SPI in the AT91SAM7 (and, as
>far as I've seen, in all other MCUs), cannot generate more than 16
>clock cycles within one CS activation.

Seems like a pretty poor implementation of the SPI master. I've
generally rolled my own, but at least the later TI DSPs allow SPI word
lengths of 1 to 32 bits.

>How am I supposed to do this, in an elegant way? Of course I could bit
>bang those lines, but I hate doing that, because it adds load to the
>CPU, and doesn't take advantage of the SPI and DMA.

Bit bang the sample clock then use SPI to transfer data? Use an FPGA
to do your I/O?

>The AT91SAM7S256 allows holding the CS low until a new device is
>addressed, so I could initiate two 11-bit readings in a row (in such a
>way that the ADC would think it is a single CS assertion with 22 clock
>cycles inside), and discard the bits with no information, but that's
>still ugly to me. It would use the SPI, but not the DMA, and the two
>readings would be different (the first one should hold CS low. The
>second one should leave it high), which is kind of non-homogeneous.
>
>Any more elegant ideas?

You have a sow's ear and you want a silk purse? ;-)

From: David Brown on
Bill wrote:
> On Tue, 08 Sep 2009 22:53:39 +0200, David Brown
> <david.brown(a)hesbynett.removethisbit.no> wrote:
>
>>> I'm very curious... what were the guys at TI (and at some other
>>> companies) thinking when they designed this ADC with SPI interface?
>>> Which SPI hardware were they thinking would be able to pull out data
>>> in one transaction? Does anyone know of an MCU with SPI hardware able
>>> to do this? Amazing.
>>>
>> Freescale microcontrollers with queued SPI peripherals have no problems
>> doing long SPI transfers. Each individual transfer is from 8 to 16
>> clocks, but you can specify that the chip select will not be deasserted
>> after the first transfer,
>
> The AT91SAM7S256 also allows that. The CSAAT (Chip Select Active After
> Transfer) that I keep mentioning in my other posts is exactly that.
> But that is of no use here. It does not allow me to use DMA. I still
> need manual intervention from the CPU to change that CSAAT bit so that
> it is 1 for the first transfer, and 0 for the second transfer (the CS
> cannot be always 0, because the ADC would never trigger a new
> conversion). The DMA won't do that for me. And won't bit bang CS for
> me, either.
>

I'm not familiar with the AT91SAM devices (I'm not very familiar with
ARMs in general - it's nearly twenty years since I used one), so this may
or may not give you some ideas...

First, is the CSAAT bit in a register adjacent to the SPI transfer
register(s) ? If so, it's possible that your DMA could be set up to
transfer a new value to that control register at the same time as
sending another 16-bit word to the SPI transmit register. (On the
Freescale devices, the equivalent to CSAAT is in a command byte that is
collected from the queue along with the data to send.)

Secondly, if you have several DMA channels and they are flexible enough,
it might be possible to configure a chain of DMA events. For example,
the SPI transfer complete could trigger a DMA transfer. Instead of
writing the next SPI transmit value, this could "manually" trigger two
new DMA channels. The first of these is set up to set the CSAAT bit,
and the second to write the next SPI transfer.

> In other words. Yes, I can specify that the chip select will not be
> deasserted after the first transfer. But how about after the second
> transfer? I need that the chip select WILL BE deasserted after the
> second transfer. And all that needs CPU intervention. The DMA cannot
> automatically reconfigure a peripheral between transfers.
>
>
>> so that you can easily get 22 clock transfers.
>> There are many ColdFire's (and older 683xx devices) with a queued SPI,
>> and the latest PPC-based MPC5xxx devices have even more powerful SPI
>> peripherals connected to flexible DMA.
>>
>> I don't know if that's what the guys at TI were thinking about, and I
>> doubt if it really helps you here, but there *are* microcontrollers
>> available that can handle all this in the background by DMA
>
> There are tens of "MPC5xxx" devices. Could you specify one?
>

There are indeed a fair number of them, and (AFAIK) they all have
powerful SPI peripherals and lots of flexible DMA channels. So without
anything else to go on (performance, flash, external bus, timers, etc.),
it would be a random guess as to which would be suitable for you (though
MPC55xx or MPC56xx narrows it down a little). I've used an MPC5554, if
that helps at all. If you are considering changing architecture, then
you should also look at the ColdFires (especially the v2 cores) - they
are easier to work with.

>> (/if/ you can understand the documentation...).
>
> Doesn't that sound a bit impolite? Of course I can. Why shouldn't I?
>

Sorry, I didn't mean it to sound impolite to /you/. It's just that the
documentation for these devices is complicated and often unclear. The
"SPI" peripherals on the MPC55xx support serial multiplexing of internal
timer channels as well as SPI (which supports multiple queues amongst
other features), and it's no small job figuring out what is really
needed to get the SPI running as desired. Similarly with DMA. There is
also a lack of appnotes for this sort of setup, so you have to be
prepared for a fair amount of work figuring things out. Once you
understand it all, however, you can do all sorts of wonderful things
with the MPC's peripherals.
From: Stef on
In comp.arch.embedded,
Bill <a(a)a.a> wrote:
> Hi,
>
> I'm trying to pull out data from the ADS8320 (a 16-bit ADC by Texas
> Instruments. See bottom of page 10 in
> http://focus.ti.com/lit/ds/symlink/ads8320.pdf ) using the SPI in an
> AT91SAM7S256. The problem is that the ADC needs 6 extra clock cycles
> to sample the analog signal, before the 16 cycles that will output
> each of the conversion result bits. So, a complete ADC cycle involves
> a minimum of 22 clock cycles. Now, the SPI in the AT91SAM7 (and, as
> far as I've seen, in all other MCUs), cannot generate more than 16
> clock cycles within one CS activation.
>
> How am I supposed to do this, in an elegant way? Of course I could bit
> bang those lines, but I hate doing that, because it adds load to the
> CPU, and doesn't take advantage of the SPI and DMA.
>
> The AT91SAM7S256 allows holding the CS low until a new device is
> addressed, so I could initiate two 11-bit readings in a row (in such a
> way that the ADC would think it is a single CS assertion with 22 clock
> cycles inside), and discard the bits with no information, but that's
> still ugly to me. It would use the SPI, but not the DMA, and the two
> readings would be different (the first one should hold CS low. The
> second one should leave it high), which is kind of non-homogeneous.

Of course you can do this with 2 11-bit transfers (or 3 8-bit as mentioned
by others) and still use the PDC (DMA). If you couldn't, how would you
for example read EEPROMs?

Just set CSAAT=0 "The Peripheral Chip Select Line rises as soon as the
last transfer is achieved", fill your TX buffer and point PDC to it.
Then write the PDC counter and the CS will go low before the first
transfer and only rise after the PDC has completed last transfer.
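
A minimal sketch of the sequence described above, written against a
hypothetical stand-in struct so it can be compiled on a PC. The field
names mirror the SPI's PDC registers (SPI_RPR, SPI_RCR, SPI_TPR, SPI_TCR,
SPI_PTCR), and the RXTEN/TXTEN bit positions are my assumption; check
both against the vendor header and the datasheet. It also assumes that
words longer than 8 bits are moved as 16-bit halfwords in fixed
peripheral select mode.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SPI's PDC registers. On real hardware
   these are the memory-mapped SPI_RPR, SPI_RCR, SPI_TPR, SPI_TCR and
   SPI_PTCR registers from the vendor header. */
typedef struct {
    volatile uint32_t rpr;   /* receive pointer   */
    volatile uint32_t rcr;   /* receive counter   */
    volatile uint32_t tpr;   /* transmit pointer  */
    volatile uint32_t tcr;   /* transmit counter  */
    volatile uint32_t ptcr;  /* transfer control  */
} spi_pdc_regs;

#define PDC_RXTEN (1u << 0)  /* assumed bit: enable receive channel  */
#define PDC_TXTEN (1u << 8)  /* assumed bit: enable transmit channel */

/* Queue one block of `words` SPI words. With CSAAT=0 the chip select is
   expected to fall before the first word and rise after the last one. */
static void spi_pdc_start_block(spi_pdc_regs *pdc,
                                const uint16_t *tx, uint16_t *rx,
                                uint32_t words)
{
    pdc->rpr  = (uint32_t)(uintptr_t)rx;   /* where received words go   */
    pdc->rcr  = words;
    pdc->tpr  = (uint32_t)(uintptr_t)tx;   /* dummy words to clock out  */
    pdc->tcr  = words;
    pdc->ptcr = PDC_RXTEN | PDC_TXTEN;     /* let the PDC run the block */
}
```

For the ADS8320 case, `words` would be 2 (two 11-bit words) and the
contents of `tx` do not matter, since the ADC has no data input.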

Of course, you have to make sure all other settings match your hardware,
check that every bit in every register is set as required, and understand
the function of each bit.
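
As an illustration of "every bit in every register", here is a sketch
that builds a chip select register value for 11-bit words with CSAAT=0.
The field positions (CPOL, NCPHA, CSAAT, BITS, SCBR) are written from
memory of the SAM7 SPI_CSR layout and are an assumption; verify them
against the datasheet. The helper name and its parameters are
hypothetical.

```c
#include <stdint.h>

/* Assumed SPI_CSR field layout; verify against the datasheet. */
#define CSR_CPOL       (1u << 0)                     /* clock polarity     */
#define CSR_NCPHA      (1u << 1)                     /* clock phase        */
#define CSR_CSAAT      (1u << 3)                     /* CS active after transfer */
#define CSR_BITS(n)    ((uint32_t)((n) - 8u) << 4)   /* n = 8..16 bits per word  */
#define CSR_SCBR(div)  ((uint32_t)(div) << 8)        /* SPCK = MCK / div   */

/* Build a CSR value for two-word reads of a 22-clock device.
   `mode` carries whatever CPOL/NCPHA bits the device needs. */
static uint32_t csr_for_11bit_words(uint32_t mck_hz, uint32_t spck_max_hz,
                                    uint32_t mode)
{
    /* Round the divider up so SPCK never exceeds the requested maximum. */
    uint32_t div = (mck_hz + spck_max_hz - 1u) / spck_max_hz;

    /* CSAAT is deliberately left at 0. */
    return CSR_BITS(11u) | CSR_SCBR(div) | (mode & (CSR_CPOL | CSR_NCPHA));
}
```

For example, with a 48 MHz master clock and an illustrative 2.4 MHz
serial clock limit, the divider comes out as 20.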

This is the way I always use SPI on the SAM7. I do use variable
peripheral select and no CS decoding, but I don't think these options
affect the CS deassertion behaviour.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

I'm glad I was not born before tea.
-- Sidney Smith (1771-1845)
From: Stef on
In comp.arch.embedded,
Bill <a(a)a.a> wrote:
> On Tue, 08 Sep 2009 13:06:58 -0400, Mel <mwilson(a)the-wire.com> wrote:
>
>>Bill wrote:
>>
>>> On Tue, 08 Sep 2009 11:23:38 -0400, Mel <mwilson(a)the-wire.com> wrote:
>>>
>>>
>>>>You have Peripheral DMA Controller, use it.
>>>
>>> I would gladly use it, but I don't think it is possible. DMA is
>>> able neither to automatically change CSAAT (Chip Select Active After
>>> Transfer) configuration between 8-bit transactions 2 and 3, nor to bit
>>> bang CS to my needs.
>>
>>From what I'm seeing here, when the PDC is in charge, the SPI doesn't work
>>in 8-bit transactions. It works in a single transaction involving as many
>>8-bit bytes as the PDC demands. I see /CS go low, and while it's low 24
>>clock pulses come out in a single burst, then /CS goes high and life goes
>>on. (This is with the code I posted. Thank the deity for digital storage
>>scopes.)
>
> No, no. It doesn't work that way. The PDC moves N units of
> information, each unit being whichever length (from 8 to 16 bits) the
> SPI has been configured to work with. You must be seeing a single
> transfer there (in your 24 clock cycles).
>
You missed the point: The 'N' units are in a single 'transaction', with
only a single activation of the CS. There may be some (programmable)
delays between the bytes within the transaction, but that should not
bother your peripheral.
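
To make the 24-clock case concrete, here is a rough sketch of pulling
the 16-bit result out of three received bytes. It assumes the layout
described at the top of the thread (6 leading clocks with no result
data, then 16 result bits, MSB first), which leaves 2 trailing bits to
discard. The function name is hypothetical, and the exact bit alignment
should be checked on a scope or against the ADC datasheet.

```c
#include <stdint.h>

/* Extract the 16-bit conversion result from 3 bytes (24 clocks):
   bits 23..18 = leading clocks with no data, bits 17..2 = result,
   bits 1..0 = trailing clocks after the result. */
static uint16_t ads8320_from_3_bytes(const uint8_t rx[3])
{
    uint32_t stream = ((uint32_t)rx[0] << 16) |
                      ((uint32_t)rx[1] << 8)  |
                       (uint32_t)rx[2];

    return (uint16_t)((stream >> 2) & 0xFFFFu);
}
```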

> Show me where, in the datasheet, the PDC forces the SPI to work in
> transfers of 8 bits, or in a multiple of 8 bits. I couldn't find it.

I think you misread "doesn't work in 8-bit transactions".

In the datasheet there is however a diagram that shows CS staying low
between 2 bytes. Dec'08 datasheet, page 271, figure 28-7, see signal
"Chip Select 2".


--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

People who push both buttons should get their wish.
From: Bill on
Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
CS would be deasserted (high) between consecutive "word transfers"
within one "block transfer", but it is not. It was clear to me from the
beginning (from diagrams and text) that there was a way to keep CS=0
between word transfers, but I thought that it required CSAAT=1, and that
is not true. CS is 0 between consecutive word transfers (of the same
block transfer) regardless of the value of CSAAT.

So, yes, I can leave CSAAT=0 permanently, there is no CPU intervention
needed (other than at the beginning and at the end of each block
transfer), and I can use DMA, with two 11-bit word transfers per block
transfer.
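
A small sketch of how the two 11-bit words could be joined back into one
sample, assuming 6 leading clocks without result data followed by 16
result bits, MSB first, and assuming each received word sits
right-aligned in its 16-bit buffer entry. The function name is
hypothetical.

```c
#include <stdint.h>

/* Combine two 11-bit SPI words (22 clocks) into the 16-bit result.
   w0 holds the 6 leading bits plus the top 5 result bits,
   w1 holds the low 11 result bits. */
static uint16_t ads8320_from_2_words(uint16_t w0, uint16_t w1)
{
    uint32_t stream = ((uint32_t)(w0 & 0x7FFu) << 11) |
                       (uint32_t)(w1 & 0x7FFu);

    /* The result is the last 16 of the 22 received bits. */
    return (uint16_t)(stream & 0xFFFFu);
}
```

The mask drops the leading 6 bits whatever their value, so it does not
matter what the ADC output carries during the sampling clocks.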


This is good, but I think that it could be better. Difficult to
explain, but I'll try:

Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 kSa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with for instance two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time. Only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered directly
by a timer overflow or underflow, an interrupt service routine
(triggered by a 10 us timer underflow) must be executed every so
often, so that the CPU can manually trigger the DMA block transfer and
collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a "significant" portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things. I see that hog as a killer, or at least as a pity.

If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
allowed triggering the next word transfer (inside a block transfer)
when a certain timer underflows, then the DMA blocks wouldn't need to
be so small. Each analog sample could travel in one single SPI word
transfer, and one DMA block could be planned to carry for instance
1000 word transfers. That would be one DMA block every 10 ms. The
buffer (FIFO) memory would be larger, but the CPU intervention needed
would be much lower. There would be the same number of useful cycles,
but far fewer wasted cycles. There would be no need for an
interrupt service routine executed every 10 us, which is a killer.
That would be a good SPI and a good DMA, in my opinion, and the extra
cost in silicon is negligible, compared to the added benefit. Why
don't most MCUs allow that? Even cheap MCUs could include that. An MCU
with the price of a SAM7 should include that, in my opinion.
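
The arithmetic behind those figures, as a short sketch. The helper names
are hypothetical, and any ISR cost you plug in is a guess until it is
measured on the real target.

```c
#include <stdint.h>

/* CPU cycles available between two samples. */
static uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_hz)
{
    return cpu_hz / sample_hz;
}

/* Time between CPU interventions, in microseconds, when one DMA block
   carries `samples_per_block` samples. */
static uint32_t block_period_us(uint32_t samples_per_block, uint32_t sample_hz)
{
    return (uint32_t)(((uint64_t)samples_per_block * 1000000u) / sample_hz);
}

/* Share of the per-sample budget eaten by the ISR, in percent. */
static uint32_t isr_load_percent(uint32_t isr_cycles, uint32_t budget_cycles)
{
    return (100u * isr_cycles) / budget_cycles;
}
```

With a 48 MHz CPU and 100 kSa/s this gives 480 cycles per sample. One
sample per block means an intervention every 10 us, and 1000 samples per
block means one every 10000 us (10 ms). A purely illustrative ISR cost
of 120 cycles would take 25% of the budget.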

Best,