From: Stef on 9 Sep 2009 05:19

In comp.arch.embedded, Bill <a(a)a.a> wrote:
> Hi,
>
> I'm trying to pull out data from the ADS8320 (a 16-bit ADC by Texas
> Instruments; see bottom of page 10 in
> http://focus.ti.com/lit/ds/symlink/ads8320.pdf) using the SPI in an
> AT91SAM7S256. The problem is that the ADC needs 6 extra clock cycles
> to sample the analog signal, before the 16 cycles that will output
> each of the conversion result bits. So, a complete ADC cycle involves
> a minimum of 22 clock cycles. Now, the SPI in the AT91SAM7 (and, as
> far as I've seen, in all other MCUs) cannot generate more than 16
> clock cycles within one CS activation.
>
> How am I supposed to do this in an elegant way? Of course I could bit
> bang those lines, but I hate doing that, because it adds load to the
> CPU and doesn't take advantage of the SPI and DMA.
>
> The AT91SAM7S256 allows holding the CS low until a new device is
> addressed, so I could initiate two 11-bit readings in a row (in such
> a way that the ADC would think it is a single CS assertion with 22
> clock cycles inside), and discard the bits with no information, but
> that's still ugly to me. It would use the SPI, but not the DMA, and
> the two readings would be different (the first one should hold CS
> low, the second one should leave it high), which is kind of
> non-homogeneous.

Of course you can do this with two 11-bit transfers (or three 8-bit,
as mentioned by others) and still use the PDC (DMA). If you couldn't,
how would you, for example, read EEPROMs?

Just set CSAAT=0 ("The Peripheral Chip Select Line rises as soon as
the last transfer is achieved"), fill your TX buffer and point the PDC
to it. Then write the PDC counter, and CS will go low before the first
transfer and only rise after the PDC has completed the last transfer.

You of course have to make sure all other settings match your
hardware: check that every bit in every register is set as required,
and understand the function of each bit.
This is the way I always use SPI on the SAM7. I do use variable
peripheral select and no CS decoding, but I don't think these options
affect the CS deassertion behaviour.

--
Stef  (remove caps, dashes and .invalid from e-mail address to reply by mail)

I'm glad I was not born before tea.
		-- Sidney Smith (1771-1845)
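[Editor's note: Stef's recipe can be sketched in C. The CSR bit layout
below (BITS field at bits 7:4 holding word length minus 8, CSAAT at
bit 3, SCBR divider at bits 15:8) is my reading of the AT91SAM7S
datasheet, and the helper name is mine rather than Atmel's, so treat
this as an illustration, not drop-in driver code.]

```c
#include <stdint.h>

/* Assumed AT91SAM7S SPI_CSRx bit layout (verify against the datasheet):
 *   bit 0     CPOL
 *   bit 1     NCPHA
 *   bit 3     CSAAT  - keep CS asserted after the last transfer
 *   bits 7:4  BITS   - bits per transfer, encoded as (length - 8)
 *   bits 15:8 SCBR   - serial clock divider (SPCK = MCK / SCBR)
 */
static uint32_t spi_csr(unsigned bits_per_word, unsigned scbr, int csaat)
{
    uint32_t v = 0;
    v |= ((uint32_t)(bits_per_word - 8) & 0xFu) << 4; /* BITS field  */
    v |= ((uint32_t)scbr & 0xFFu) << 8;               /* clock rate  */
    if (csaat)
        v |= 1u << 3;                                 /* CSAAT       */
    return v;
}

/* Usage sketch (actual register writes omitted, they touch hardware):
 *   SPI_CSR0 = spi_csr(11, 24, 0);      // 11-bit words, CSAAT=0
 *   static uint16_t tx[2] = { 0, 0 };   // dummy data, just clocks
 *   static uint16_t rx[2];
 *   SPI_TPR = (uint32_t)tx; SPI_TCR = 2;  // PDC transmit pointer/count
 *   SPI_RPR = (uint32_t)rx; SPI_RCR = 2;  // PDC receive pointer/count
 *   SPI_PTCR = TXTEN | RXTEN;  // CS falls, 22 clocks run, CS rises
 */
```

With CSAAT=0, CS stays low across the two queued 11-bit words and only
rises once the PDC transmit counter reaches zero, which is exactly the
single 22-clock frame the ADS8320 expects.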
From: Bill on 9 Sep 2009 16:17

Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
CS would be deasserted (high) between consecutive "word transfers"
within one "block transfer", but it is not. It was clear to me from
the beginning (from diagrams and text) that there was a way to keep
CS=0 between word transfers, but I thought that implied CSAAT=1, and
that is not true. CS is 0 between consecutive word transfers (of the
same block transfer) regardless of the value of CSAAT.

So, yes, I can leave CSAAT=0 permanently, there is no CPU intervention
needed (other than at the beginning and at the end of each block
transfer), and I can use DMA, with two 11-bit word transfers per block
transfer.

This is good, but I think that it could be better. Difficult to
explain, but I'll try:

Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 kSa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with, for instance, two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time: only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered
directly by a timer overflow or underflow, an interrupt service
routine (triggered by a 10 us timer underflow) must be executed every
so often, so that the CPU can manually trigger the DMA block transfer
and collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a significant portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things. I see that hog as a killer, or at least as a pity.
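[Editor's note: once the PDC has delivered the two 11-bit words, the
16-bit result still has to be reassembled. Assuming, per the timing
Bill describes, that the first 6 clocks of the 22-clock frame carry no
data, the first word's low 5 bits are the conversion's MSBs and the
second word supplies the remaining 11 bits. The helper below is a
hypothetical sketch of that unpacking, with names of my own choosing.]

```c
#include <stdint.h>

/* Combine the two 11-bit SPI words of one 22-clock ADS8320 frame.
 *   w0: 6 don't-care bits (sampling/null) then result bits 15..11
 *   w1: result bits 10..0
 */
static uint16_t ads8320_sample(uint16_t w0, uint16_t w1)
{
    return (uint16_t)(((w0 & 0x1Fu) << 11) | (w1 & 0x7FFu));
}
```

Masking w0 with 0x1F discards whatever the ADC drove during the
sampling clocks, so garbage in those 6 positions cannot corrupt the
result.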
If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
allowed triggering the next word transfer (inside a block transfer)
when a certain timer underflows, then the DMA blocks wouldn't need to
be so small. Each analog sample could travel in one single SPI word
transfer, and one DMA block could be planned to carry, for instance,
1000 word transfers. That would be one DMA block every 10 ms. The
buffer (FIFO) memory would be larger, but the CPU intervention needed
would be much lower. There would be the same number of useful cycles,
but far fewer wasted cycles. There would be no need for an interrupt
service routine executed every 10 us, which is a killer.

That would be a good SPI and a good DMA, in my opinion, and the extra
cost in silicon is negligible compared to the added benefit. Why don't
most MCUs allow that? Even cheap MCUs could include that. An MCU at
the price of a SAM7 should include that, in my opinion.

Best,
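[Editor's note: Bill's cycle budget is plain arithmetic and can be
checked directly; no hardware assumptions beyond the figures he gives.]

```c
/* Budget for servicing the ADC from the thread's own numbers. */
enum {
    MCK_HZ         = 48000000,             /* SAM7 core clock        */
    SAMPLE_HZ      = 100000,               /* ADS8320 at 100 kSa/s   */
    CYCLES_PER_IRQ = MCK_HZ / SAMPLE_HZ    /* CPU cycles per sample  */
};

/* With one sample per DMA word transfer, a block of n samples needs
 * CPU attention only once per block: n * 10 us. */
static unsigned block_period_us(unsigned samples)
{
    return samples * (1000000u / SAMPLE_HZ);
}
```

So a per-sample interrupt leaves 480 cycles of headroom, while a
1000-sample block (2 KB of 16-bit results) stretches the service
interval to 10 ms, which is the improvement Bill is asking for.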
From: Chris Stratton on 9 Sep 2009 16:53

On Sep 9, 4:17 pm, Bill <a...(a)a.a> wrote:
> This is good, but I think that it could be better. Difficult to
> explain, but I'll try:
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22 clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). Each
> DMA block transfer needs CPU intervention.

You might be interested to compare the Blackfin's SPORT DMA
capability... there the DMA can be programmed to transfer words
separated in time without CPU intervention. Obviously no
fixed-hardware solution (other than a built-in gate array ;-) is going
to have the flexibility for all needs, but this sounds like it might
be along the lines of what you are looking for... so at least some
silicon designers seem to be thinking along your lines.
From: Stef on 9 Sep 2009 17:44

In comp.arch.embedded, Bill <a(a)a.a> wrote:
> This is good, but I think that it could be better. Difficult to
> explain, but I'll try:
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much.
[snip]
Processing bigger blocks does save some interrupt overhead, but you
still need to handle all the data, so the advantage may not be that
big. And 480 cycles is still a decent amount to do some work. ;-)

But with a bit of creativity, you can get bigger blocks from your DMA.
If you use "variable peripheral select", you can create a buffer with
your ADC transfers and dummy transfers to another chip select in
between, to get the required CS switching pattern. If you then set up
the SPI clock correctly, you can let the PDC perform the 100 kHz
interval transfers, up to 64k 'bytes' (including the dummies).

Another option is to use the SSC (as mentioned by others), which can
transfer up to 32 bits per transfer. Transfers can be started on an
external event on the RF pin. If you tie a TC output to the RF input,
you can use the TC waveform mode to initiate the transfers.

But I agree, being able to program some interval timer (maybe a TC)
and use that directly to initiate transfers to peripherals would be
nice to have. But as long as it's not there, see what is there and try
to get the most out of it. And if you are not tied to the SAM7, check
whether there are other CPUs with features that suit your wishes.

--
Stef  (remove caps, dashes and .invalid from e-mail address to reply by mail)

A platitude is simply a truth repeated till people get tired of hearing it.
		-- Stanley Baldwin
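[Editor's note: in variable peripheral select mode the SAM7's SPI_TDR
carries the chip select alongside each data word (data in bits 15:0,
PCS in bits 19:16 in one-cold encoding when CS decoding is off,
LASTXFER at bit 24, as I read the datasheet), and the PDC then moves
32-bit words. Stef's dummy-transfer trick amounts to building a buffer
like the sketch below; the helper names are mine and the layout should
be verified against the datasheet.]

```c
#include <stdint.h>

/* Build one 32-bit PDC word for variable peripheral select mode.
 * Assumed SPI_TDR layout: data[15:0], PCS[19:16], LASTXFER at bit 24. */
static uint32_t spi_vps_word(uint16_t data, unsigned npcs, int lastxfer)
{
    uint32_t pcs = (~(1u << npcs)) & 0xFu;   /* one-cold: NPCS0 -> 0b1110 */
    return (uint32_t)data | (pcs << 16) | ((uint32_t)(lastxfer != 0) << 24);
}

/* Fill a PDC transmit buffer: each ADC conversion is two 11-bit words
 * on NPCS0 (CS0 held low across them), followed by one dummy transfer
 * on NPCS1, which forces CS0 high between conversions. The SPI clock
 * and inter-transfer delays then set the 10 us sample spacing. */
static void fill_adc_block(uint32_t *buf, unsigned conversions)
{
    for (unsigned i = 0; i < conversions; i++) {
        buf[3 * i + 0] = spi_vps_word(0, 0, 0); /* first 11 clocks, CS0 low */
        buf[3 * i + 1] = spi_vps_word(0, 0, 0); /* last 11 clocks, CS0 low  */
        buf[3 * i + 2] = spi_vps_word(0, 1, 0); /* dummy on CS1; CS0 rises  */
    }
}
```

One PDC block can then carry hundreds of conversions (three 32-bit
words each) with no CPU intervention in between, at the cost of the
wasted dummy transfers Stef mentions.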
From: krw on 9 Sep 2009 20:07
On Wed, 09 Sep 2009 22:17:51 +0200, Bill <a(a)a.a> wrote:
> Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
> CS would be deasserted (high) between consecutive "word transfers"
> within one "block transfer", but it is not.
[snip]
> Why don't most MCUs allow that? Even cheap MCUs could include that.
> An MCU with the price of a SAM7 should include that, in my opinion.

It's not a matter of silicon area, but of which SPI devices they want
to cover. SPI is a thousand twisty little passages, all different. How
are they going to service them all? The bottom line is that they put
in enough to put the bullet on the front page of the datasheet. If you
want custom I/O, do it in an FPGA.