From: Stef on
In comp.arch.embedded,
Bill <a(a)a.a> wrote:
> Hi,
>
> I'm trying to pull out data from the ADS8320 (a 16-bit ADC by Texas
> Instruments; see the bottom of page 10 in
> http://focus.ti.com/lit/ds/symlink/ads8320.pdf ) using the SPI in an
> AT91SAM7S256. The problem is that the ADC needs 6 extra clock cycles
> to sample the analog signal, before the 16 cycles that will output
> each of the conversion result bits. So, a complete ADC cycle involves
> a minimum of 22 clock cycles. Now, the SPI in the AT91SAM7 (and, as
> far as I've seen, in all other MCUs) cannot generate more than 16
> clock cycles within one CS activation.
>
> How am I supposed to do this, in an elegant way? Of course I could bit
> bang those lines, but I hate doing that, because it adds load to the
> CPU, and doesn't take advantage of the SPI and DMA.
>
> The AT91SAM7S256 allows holding the CS low until a new device is
> addressed, so I could initiate two 11-bit readings in a row (in such a
> way that the ADC would think it is a single CS assertion with 22 clock
> cycles inside), and discard the bits with no information, but that's
> still ugly to me. It would use the SPI, but not the DMA, and the two
> readings would be different (the first one should hold CS low; the
> second should leave it high), which is kind of non-homogeneous.

Of course you can do this with two 11-bit transfers (or three 8-bit
ones, as mentioned by others) and still use the PDC (DMA). If you
couldn't, how would you, for example, read EEPROMs?

Just set CSAAT=0 ("The Peripheral Chip Select Line rises as soon as the
last transfer is achieved"), fill your TX buffer and point the PDC to it.
Then write the PDC counter, and CS will go low before the first
transfer and only rise after the PDC has completed the last transfer.
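
For what it's worth, a minimal sketch of that setup in C, assuming the
SPI is already enabled in master mode with the pins assigned, and using
the register/bit names from Atmel's AT91SAM7S header (check the exact
macros, clock and phase settings against your own code):

    AT91PS_SPI spi = AT91C_BASE_SPI;
    static unsigned short tx[2];   /* dummy data, MOSI is a don't-care */
    static unsigned short rx[2];   /* the two 11-bit results land here */

    /* NPCS0: 11 bits per transfer, SCBR=48 -> 1 MHz SCK from 48 MHz MCK.
       CSAAT=0 (the default): NPCS0 falls before the first transfer,
       stays low between the two transfers, rises after the last one. */
    spi->SPI_CSR[0] = AT91C_SPI_BITS_11 | (48 << 8);

    spi->SPI_RPR = (unsigned int) rx;  spi->SPI_RCR = 2;
    spi->SPI_TPR = (unsigned int) tx;  spi->SPI_TCR = 2;
    spi->SPI_PTCR = AT91C_PDC_RXTEN | AT91C_PDC_TXTEN;  /* start the block */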

You of course have to make sure all the other settings match your
hardware: check that every bit in every register is set as required and
understand the function of each bit.

This is the way I always use SPI on the SAM7. I do use variable
peripheral select and no CS decoding, but I don't think these options
affect the CS deassertion behaviour.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

I'm glad I was not born before tea.
-- Sidney Smith (1771-1845)
From: Bill on
Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
CS would be deasserted (high) between consecutive "word transfers"
within one "block transfer", but it is not. It was clear to me from the
beginning (from diagrams and text) that there was a way to keep CS=0
between word transfers, but I thought that implied CSAAT=1, and that
is not true. CS is 0 between consecutive word transfers (of the same
block transfer) regardless of the value of CSAAT.

So, yes, I can leave CSAAT=0 permanently; there is no CPU intervention
needed (other than at the beginning and at the end of each block
transfer), and I can use DMA, with two 11-bit word transfers per block
transfer.


This is good, but I think that it could be better. Difficult to
explain, but I'll try:

Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with for instance two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time. Only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered
directly by a timer overflow or underflow, an interrupt service routine
(triggered by a 10 us timer underflow) must be executed every so
often, so that the CPU can manually trigger the DMA block transfer and
collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a "significant" portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things. I see that hog as a killer, or at least as a pity.
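
To make that cost concrete, here is roughly what the 10 us routine would
have to do (a hypothetical sketch: buffer_push() and the exact bit
layout of the ADS8320 frame are assumptions, check the datasheet timing
diagram):

    volatile unsigned short rx[2];   /* filled by the PDC each block */
    static unsigned short tx[2];     /* dummy data for MOSI */

    void adc_tick_isr(void)          /* entered every 10 us from a TC */
    {
        /* 22 clocks = 6 sampling clocks + 16 data bits, so word 0
           carries the 5 MSBs and word 1 the 11 LSBs (assumed layout) */
        unsigned short sample = ((rx[0] & 0x1F) << 11) | (rx[1] & 0x7FF);
        buffer_push(sample);         /* application FIFO, not shown */

        /* re-arm the PDC for the next 2 x 11-bit block transfer */
        AT91C_BASE_SPI->SPI_RPR = (unsigned int) rx;
        AT91C_BASE_SPI->SPI_RCR = 2;
        AT91C_BASE_SPI->SPI_TPR = (unsigned int) tx;
        AT91C_BASE_SPI->SPI_TCR = 2;
    }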

If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
allowed triggering the next word transfer (inside a block transfer)
when a certain timer underflows, then the DMA blocks wouldn't need to
be so small. Each analog sample could travel in one single SPI word
transfer, and one DMA block could be planned to carry for instance
1000 word transfers. That would be one DMA block every 10 ms. The
buffer (FIFO) memory would be larger, but the CPU intervention needed
would be much lower. There would be the same number of useful cycles,
but far fewer wasted cycles. There would be no need for an
interrupt service routine executed every 10 us, which is a killer.
That would be a good SPI and a good DMA, in my opinion, and the extra
cost in silicon would be negligible compared to the added benefit. Why
don't most MCUs allow that? Even cheap MCUs could include that. An MCU
with the price of a SAM7 should include that, in my opinion.

Best,
From: Chris Stratton on
On Sep 9, 4:17 pm, Bill <a...(a)a.a> wrote:
> This is good, but I think that it could be better. Difficult to
> explain, but I'll try:
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22-clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). Each
> DMA block transfer needs CPU intervention.

You might be interested to compare the Blackfin's SPORT DMA
capability... there the DMA can be programmed to transfer words
separated in time without CPU intervention. Obviously no
fixed-hardware solution (other than a built-in gate array ;-) is going
to have the flexibility for all needs, but this sounds like it might
be along the lines of what you are looking for... so at least some
silicon designers seem to be thinking along your lines.
From: Stef on
In comp.arch.embedded,
Bill <a(a)a.a> wrote:
>
> [...] So, I need CPU intervention
> every 10 us. That's a short time. Only 480 cycles of my 48 MHz SAM7.
> [...]
>
> If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
> allowed triggering the next word transfer (inside a block transfer)
> when a certain timer underflows, then the DMA blocks wouldn't need to
> be so small. Each analog sample could travel in one single SPI word
> transfer, and one DMA block could be planned to carry for instance
> 1000 word transfers. That would be one DMA block every 10 ms. The
> buffer (FIFO) memory would be larger, but the CPU intervention needed
> would be much lower. There would be the same number of useful cycles,
> but far fewer wasted cycles. There would be no need for an
> interrupt service routine executed every 10 us, which is a killer.
> That would be a good SPI and a good DMA, in my opinion, and the extra
> cost in silicon would be negligible compared to the added benefit. Why
> don't most MCUs allow that? Even cheap MCUs could include that. An MCU
> with the price of a SAM7 should include that, in my opinion.


Processing bigger blocks does save some interrupt overhead, but you
still need to handle all the data, so the advantage may not be that big.
And 480 cycles is still a decent amount to do some work. ;-)

But with a bit of creativity, you can get bigger blocks from your DMA.
If you use "variable peripheral select", you can create a buffer with
your ADC transfers and dummy transfers to another chip select in between
to get the required CS switching pattern. If you then set up the SPI
clock correctly, you can let the PDC perform the 100 kHz interval
transfers for up to 64k 'bytes' (including the dummies), as sketched
below.
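
A rough sketch of what such a buffer could look like. With PS=1 in
SPI_MR the PDC feeds 32-bit words to SPI_TDR, data in bits 0-15 and the
PCS code in bits 16-19; NPCS1 as the unused dummy chip select is my
assumption:

    #define PCS_ADC    (0xEu << 16)  /* PCS = 1110: NPCS0 (the ADC) low */
    #define PCS_DUMMY  (0xDu << 16)  /* PCS = 1101: NPCS1 (unused) low  */
    #define NSAMPLES   1000

    static unsigned int txbuf[3 * NSAMPLES];

    void build_txbuf(void)
    {
        for (unsigned int i = 0; i < NSAMPLES; i++) {
            txbuf[3*i + 0] = PCS_ADC;    /* first 11-bit chunk of a frame */
            txbuf[3*i + 1] = PCS_ADC;    /* same PCS, so CS stays low     */
            txbuf[3*i + 2] = PCS_DUMMY;  /* raises NPCS0; size this one   */
        }                                /* (bits + delays) to pad the    */
    }                                    /* frame out to the 10 us slot   */
    /* then point SPI_TPR/SPI_TCR at txbuf with a count of 3 * NSAMPLES */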

Another option is to use the SSC (as mentioned by others), which can
transfer up to 32 bits per word. Transfers can be started on an
external event on the RF pin. If you tie a TC output to the RF input,
you can use the TC waveform mode to initiate the transfers.
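
In rough terms (macro names from memory, verify them against the Atmel
header) the SSC side of that could look like:

    AT91PS_SSC ssc = AT91C_BASE_SSC;

    /* 22 bits per word, MSB first, the way the ADC shifts them out */
    ssc->SSC_RFMR = (22 - 1) | AT91C_SSC_MSBF;
    /* start each reception on an edge of RF, with RF wired to a TC
       waveform output toggling at the 100 kHz sample rate */
    ssc->SSC_RCMR = AT91C_SSC_CKS_DIV | AT91C_SSC_START_EDGE_RF;
    ssc->SSC_CR   = AT91C_SSC_RXEN;   /* enable the receiver */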

But I agree, being able to program some interval timer (maybe a TC) and
use it directly to initiate transfers to peripherals would be nice
to have. But as long as it's not there, see what is and try to get the
most out of that. And if you are not tied to the SAM7, check whether
there are other CPUs that have features that suit your wishes.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

A platitude is simply a truth repeated till people get tired of hearing it.
-- Stanley Baldwin
From: krw on
On Wed, 09 Sep 2009 22:17:51 +0200, Bill <a(a)a.a> wrote:

>[...]
>
>If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
>allowed triggering the next word transfer (inside a block transfer)
>when a certain timer underflows, then the DMA blocks wouldn't need to
>be so small. Each analog sample could travel in one single SPI word
>transfer, and one DMA block could be planned to carry for instance
>1000 word transfers. That would be one DMA block every 10 ms. The
>buffer (FIFO) memory would be larger, but the CPU intervention needed
>would be much lower. There would be the same number of useful cycles,
>but far fewer wasted cycles. There would be no need for an
>interrupt service routine executed every 10 us, which is a killer.
>That would be a good SPI and a good DMA, in my opinion, and the extra
>cost in silicon would be negligible compared to the added benefit. Why
>don't most MCUs allow that? Even cheap MCUs could include that. An MCU
>with the price of a SAM7 should include that, in my opinion.

It's not a matter of silicon area, but of which SPI devices they want
to cover. SPI is a thousand twisty little passages, all different.
How are they going to service them all? The bottom line is that they
put in enough to get the bullet point on the front page of the
datasheet. If you want custom I/O, do it in an FPGA.
