From: D Yuniskis on 5 Feb 2010 21:00

Hi Tim,

Tim Watts wrote:
> D Yuniskis <not.going.to.be(a)seen.com>
>> Tim Watts wrote:
>>> Can anyone recommend a good book that would cover things like:
>>>
>>> 1) Compare-nybble (ie is there a more clever more efficient way than
>>> AND-mask and compare-word?)
>> The list of micro-optimizations is limitless. :> Some idioms
>> are easily expressed in HLL's while others are best left
>> to ASM implementations (e.g., compare nybble's could exploit
>> a "swap nybble" instruction on some processors).
>>
>> Usually, these optimizations are only essential in ISR's
>> or in very tight loops where they can materially contribute
>> to overall performance. E.g., "find rightmost set bit".
>
> ISRs - yes, that's where I'm looking to optimise. I'd noticed the
> swap-nybble instruction. Couldn't see a use for it. Given it's there, it
> must be useful: therefore I'm ignorant ;-> Hence the desire to do some
> general reading.

If you find yourself doing "painful" things, ask yourself if there is a *reason* that you must do those things "that way". Most often, you can change how things work so that they better fit "what's easier".

My ISRs are like greased lightning. :> But, I design the hardware with the software in mind. And, push a lot of work out of the ISR onto higher layers where "time" isn't as important. E.g., I aim to reenable interrupts real soon *inside* (certain) ISRs and just burden the code with the task of determining if there was an "overrun" (rather than risk missing a masked ISR).

When you are writing the ISR, ask yourself, "Do I *have to* do this here?" Usually, you can preprocess data for an ISR (or post process it *from* an ISR) to make the ISR much leaner.

[timing services]

> Thanks for all that - It makes sense.

There are *lots* of ways to handle time in a processor. My point was not to be suckered into a "classic" approach with its attendant costs unless you *know* that solution is right for you.
Look at the sorts of times you need to measure (mostly delays and timeouts). If they are short, then why burden yourself with some heavy timer notion (32 bit timers where the top 20 bits are almost always "0")? Likewise, think about the frequency of your jiffy and see if you can decrease it (longer period). This cuts down on interrupt overhead (proportionately) as well as makes timers -- and any other time_t-ish things -- derived from it "narrower" (smaller).

>>> 4) Bomb proof "boot sector" and live firmware (flash) update tricks.
>> These all depend on the resources that you have available to
>> your application (primarily hardware). E.g., if you have enough
>> flash to store two copies of your application, then you burn
>> the new version in place of the "oldest" version using code
>> running in the *current* version. Once that burn has
>> completed successfully, you flip a pointer (toggle a bit)
>> that tells the application launcher to use that "new" copy
>> in place of the "current" copy.
>>
>> Few folks have the luxury of 2X the flash they need! :>
>
> I think I do :) AVRs (Megas anyway) are pretty flash heavy and RAM light.
> Unless one needs a lot of stored const data, then that can eat flash fast.

You might be able to move down to a smaller device. Or, pick up some extra integrated peripherals as that portion of the die that would otherwise be used for FLASH can now be used for something "more productive".

>> If you can find a copy of the (large) black applications handbook
>> that motogorilla issued for the 6800 (1980-ish?), you would probably
>> get a much better feel for how to think about microcontrollers.
>> In that era, you were dealing with a fraction of a (VAX) MIPS
>> so you truly saw the cost of each instruction you executed.
>>
>
> I'll look out for that :)

I *think* it is called "M6800 Microprocessor Applications Manual". It's an ~8.5x11" format, about 2 inches thick. Black cover with orange (red?) and white on it.
(Sorry, I can't recall what's white or orange but those are the colors that stick in my mind.) If pushed hard, I could go dig through boxes out in the store room but I would *really* hate doing that :>

I've been trying to scan most documents that I want to preserve just to cut down on decades of databooks and appnotes but it's really hard to bring myself to destroy (which is essentially what has to be done in order to effectively scan such a title) classic books like that until I absolutely must. :<

Too bad their original authors didn't opt to preserve the original "sources" from which the titles were created!

--don
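
To make the "narrow timer" point from this post concrete, here is a minimal sketch in C. It is not code from the thread: the names `jiffies`, `jiffy_tick`, `timeout8_t`, `timeout_start` and `timeout_expired` are hypothetical, and it assumes a periodic timer interrupt that bumps an 8-bit tick counter. The idea is only that a short delay or timeout can live in one byte if the elapsed time is computed with wrapping subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-bit jiffy counter. On real hardware a periodic timer
 * ISR would increment it, hence volatile. */
static volatile uint8_t jiffies;

/* Stand-in for the body of the (assumed) periodic timer interrupt. */
static void jiffy_tick(void)
{
    jiffies++;              /* wraps from 255 back to 0 */
}

/* A one-byte-wide timeout: start time plus length, both in jiffies. */
typedef struct {
    uint8_t start;
    uint8_t length;
} timeout8_t;

static void timeout_start(timeout8_t *t, uint8_t ticks)
{
    t->start  = jiffies;
    t->length = ticks;
}

/* True once at least `length` jiffies have passed since timeout_start().
 * The subtraction is done modulo 256, so a counter wrap between start
 * and now does not matter. Assumption: the caller polls before 256
 * jiffies have passed since the start, otherwise the elapsed count
 * itself wraps and the answer is no longer meaningful. */
static bool timeout_expired(const timeout8_t *t)
{
    return (uint8_t)(jiffies - t->start) >= t->length;
}
```

With a longer jiffy period the same byte covers a longer span, which is the trade described above: fewer interrupts and narrower timers, at the cost of coarser resolution.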
From: Jon Kirwan on 5 Feb 2010 21:02

On Sat, 06 Feb 2010 00:04:02 +0000, Tim Watts <tw(a)dionic.net> wrote:

>Jon Kirwan <jonk(a)infinitefactors.org>
> wibbled on Friday 05 February 2010 21:08
>
>> On Fri, 05 Feb 2010 13:31:30 +0000, Tim Watts <tw(a)dionic.net>
>> wrote:
>>
>> <snip>
>>>1) Compare-nybble (ie is there a more clever more efficient way than
>>>AND-mask and compare-word?)
>>
>> In terms of c, sounds like all you want are some idioms to
>> use. But why worry about this? Either you are doing this
>> for a LOT of short fields and need the extra boost in
>> performance, in which case it may be better to write a
>> routine in assembly, or else this is a one-time affair and
>> your earlier comment about not worrying about wasting cycles
>> seems to enter back in. So I am not sure why you need to
>> care that much about this. Some c compilers will do better
>> than others, here. Normally, you let the c compiler authors
>> worry about these details. They generally do it well enough
>> for most one-time type uses -- especially in the general
>> situation you earlier described. Which makes me think you
>> are trying to do this fast, perhaps to handle a serial stream
>> of data that is arriving at a fair clip?
>
>Bursty is probably nearer the mark. I would like to experiment with RF links
>- to this end I have a choice of Xbee (20 quid and does all the hard stuff)
>or (eg) CF2500 which does all of the Layer 2 stuff I want, but I would
>implement some additional layers on top. The point in question is to do with
>whether it is better to pack certain data (eg sub protocol type) into a few
>bits for bandwidth efficiency, or just use a full 8 bits wastefully because
>it's cheap to process.

When dealing with bursty data like that, the usual answer is to use buffers and separate the code into two parts -- the interrupt code that handles the hardware interface and stuffs (or retrieves from) a buffer and then a high level side that allows your regular code to call it to fetch (or put) data. A full duplex serial port driver, for example, would include two interrupt handling fragments and two high level fragments, with two buffers to mediate between them.

It isn't hard to write the high level side in c code, most of the time. (Since it doesn't have hard timing issues to cope with, except possibly with "volatile" flags or pointers that may also be modified in the interrupt code.) The interrupt code may very well be written in assembly. Or assembly with a c wrapper around it, I suppose.

>Good point about C. I use GCC for AVR work and I have made a point of
>looking at the assembler listings so I can see what it's doing at various
>optimisation levels.

This is _very_ good practice.

>> You didn't mention the shift operator or the % operator or
>> using bit fields and unions. Have you played with those,
>> yet?
>
>Good point.

Well, take a shot at them then. See what results.

>> Are you familiar enough with looking at the assembly
>> output of your c compiler to know which works well and which
>> does not? (Bit fields aren't always portable, but I assume
>> you are using a single c compiler and stuck on the AVRs, so
>> it may not be an issue for you.)
>
>Yes. I was tossing up whether to go for AVR or PIC24. In the end, the choice
>of opensource linux hosted compilers for PIC seems rather less ubiquitous
>than AVR and some of the nice devices don't seem to have any support. So
>I'll stick with AVRs - nice fairly regular little machines and GCC seems to
>work well with them. Don't have any problems popping some inline assembler
>in where needed.

Actually, I agree with you on this point. AVRs do seem to get more _readily_ available gcc support.
I believe that some of the Microchip c compilers are now based upon GNU's gcc tools, but that the libraries are Microchip IP. Or something like that. Which makes it a bit more complex to siphon over and cobble up a working system, as others may not work quite as hard to smooth over the path and document it for you as they might with an AVR.

>>>2) Bitstream input techniques - eg efficient sampling of either a self
>>>clocking serial stream or a timing critical one (eg 1-wire) where the
>>>timings are so small (10's of uS) that one cannot really afford to waste
>>>instructions on a 20MIPS device. Use of interrupts and timers...
>>
>> Almost by your own definition here, it's likely you won't be
>> using c for this for the "timings are so small" part of this
>> question, right? Assembly, yes?
>
>Most likely. There are some very useful 1-wire temperature sensors I've
>tried (using someone else's example code). Some of the timings are
>horrendously short (order of 15-60 uS IIRC). Though again, it's bursty - I
>generally don't want to hold a continuous conversation with one.

That can be handled with bit banging in an assembly routine you call. Or, if you can wrangle the timings with a timer you have available, you _could_ consider a state machine approach driven off of the timer interrupt and otherwise basically hidden from the rest of your code. Again, if you do the state machine approach, you will likely have an upper level routine to stuff/fetch pieces in/out of buffers and just let the state machine flip around its states as it goes. The interrupt driven state machine may be written in assembly if your timing requires it or in c if things are more flexible.

The state machine is actually very easy to design as you go. Just look at the timing diagram and write states up. It's not totally cut and dried, as you need to think in terms of organizing just a little bit.
But it is largely cookie cutter and I usually get them working first time out from a timing diagram, if it is well documented. The state machine will be somewhat harder to document inside the code, unless you take the time to provide an ASCII layout of the timing diagram to help explain it. Without the timing diagram, others may have trouble following whatever you write.

>> Regards self-clocking
>> streams, it is application specific and the best place to go
>> for the answer here will be manuals/whitepapers that talk
>> about the particular method in detail. If the stream is slow
>> enough for c, you just need to follow the description well in
>> writing your code. It's not about some special idiomatic c
>> expression that a general book could fairly discuss.
>>
>>>3) Cool ways to semi-virtualise a timer - eg ideally you'd like 10
>>>hardware timers but you have 2.
>>
>> I assume by this you'd like software routines to execute at
>> certain times you can set and that these times are slow
>> enough that virtualizing the timer works well for you in
>> terms of latency and variability in that latency.
>>
>> On this point, I could talk at length. However, I'll
>> recommend one very good, old book which has an excellent
>> chapter on the topic of delta queues which solve this problem
>> with very little code and very repeatable, reliable
>> performance. Douglas Comer's first book on XINU.
>
>Then I must read that. I'd been considering delta queues (but realising
>there may be other solutions). The general idea I had was to run the 16 bit
>timer at something like CLK/16; task would be inserted into the queue with a
>delta-tick until execution and if the task was at the head of the ordered
>queue, then that would set the timer's comparator to trip at the time
>required, firing an interrupt.
>The ISR and queue popping and task calling
>should be O(1) so that tasks being executed have reasonably constant latency
>prior to execution of the "useful" code, but task insertion (done at the end
>if the task needs to reschedule itself) need not be O(1) (accepting we will
>need to shuffle the queue or some other technique for possible mid queue
>insertion).

Insertion done at the end of some code that doesn't have a fixed timing to its execution does not result in repeatable timing. Insertion done as the first bit of code, especially if that code executes completely before the next timer event happens, does yield repeatable results. Pay the re-insertion cost between timer events, if possible. And since the firing of the code should be nearly synchronous with the last timer event, the first few lines are a good place to struggle for that.

>FreeRTOS looks great as a way of managing heavy tasks and coroutines, but
>tasks are expensive (lot of stack operations for the context switch) and the
>timing control is milliseconds not microseconds granularity - great for
>human interface stuff, less useful for frantic bitbanging.

If you don't need all that stuff, don't burden your project. It may leave you having to study stuff you don't care about merely because you have to be aware of it to turn it off or otherwise avoid some issue there. If all you need is a delta queue, they aren't hard to code.

>Something like 1-wire really just wants to schedule a task for executing in
>dT=few-tens of uS, but the task is trivial - check a port pin and push the
>result somewhere, then add a task for checking the pin again at a variable
>time in the future depending on protocol state. I think this would be
>analogous to a linux bottom-end driver? Interpreting the data collected
>would be more suited to a non ISR task and could happen non critically after
>many bytes had been stored (top end driver).

Something like all that makes sense, if I'm reading correctly.
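
For readers who have not met a delta queue, here is a minimal sketch in C of the idea discussed above. It is not the code from Comer's book nor the files offered in this thread; every name (`dq_insert`, `dq_tick`, `DQ_MAX`, `task_a`, `task_b`) is made up for illustration. Each entry stores only the ticks remaining *after* the entry ahead of it, so the timer tick touches just the head. The array is kept with the soonest entry last, which makes the pop O(1) while insertion is O(n), matching the trade-off described in the quoted text.

```c
#include <stddef.h>
#include <stdint.h>

#define DQ_MAX 8                    /* arbitrary capacity for the sketch */

typedef void (*dq_task_t)(void);

typedef struct {
    dq_task_t task;
    uint16_t  delta;                /* ticks after the entry ahead of us */
} dq_entry_t;

/* Soonest entry is stored LAST (index dq_len - 1). */
static dq_entry_t dq[DQ_MAX];
static size_t     dq_len;

/* Schedule `task` to run `ticks` ticks from now. O(n).
 * Returns 0 on success, -1 if the queue is full. */
static int dq_insert(dq_task_t task, uint16_t ticks)
{
    size_t pos = dq_len, j;

    if (dq_len == DQ_MAX)
        return -1;
    /* Walk from the head, using up the deltas of entries due no later
     * than the new one (ties keep first-come order). */
    while (pos > 0 && dq[pos - 1].delta <= ticks) {
        ticks -= dq[pos - 1].delta;
        pos--;
    }
    for (j = dq_len; j > pos; j--)  /* open a slot at index pos */
        dq[j] = dq[j - 1];
    dq[pos].task  = task;
    dq[pos].delta = ticks;
    if (pos > 0)                    /* entry behind us now counts from us */
        dq[pos - 1].delta -= ticks;
    dq_len++;
    return 0;
}

/* Call once per timer tick (e.g. from the timer ISR). Only the head is
 * decremented; every entry that reaches zero is popped and run. */
static void dq_tick(void)
{
    if (dq_len == 0)
        return;
    if (dq[dq_len - 1].delta > 0)
        dq[dq_len - 1].delta--;
    while (dq_len > 0 && dq[dq_len - 1].delta == 0) {
        dq_task_t task = dq[--dq_len].task;     /* O(1) pop */
        task();
    }
}

/* Example tasks, purely illustrative. */
static int fired_a, fired_b;

static void task_a(void)
{
    dq_insert(task_a, 3);   /* re-insert first, as advised above */
    fired_a++;
}

static void task_b(void)
{
    fired_b++;              /* one-shot */
}
```

This sketch counts every tick. The variant described in the quoted text would instead load the head's delta into the timer comparator so that an interrupt fires only when something is due. A task that re-inserts itself with a delay of 0 would spin inside `dq_tick()`, so a real implementation would want to guard against that.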

A question you need to answer is what is the longest timer interval you can live with, consistent with the resolution you also may need. Also, when implementing delta queues in the past and where I had a dedicated timer for them, I shut the timer off when the last task was removed and turned it back on when the first task was inserted. You don't want to burden the cpu with aimless timer events that serve no purpose, even if the test for "nothing to do" is quickly performed.

We talked about debouncing, but the timing is say 8ms per sample. I like rock-solid keyboard sampling, but since sampling edges may be usefully allowed to be a little sloppy without folks noticing, you could write it something like this. (Assume your basic timer interval is 20 microseconds, just to pick a number.)

    unsigned char debounced= 0;

    void keys( void ) {
        static unsigned char state= 0, curr= 0;
        auto unsigned char prev, tmp;

        insert( keys, 400 );        /* reschedule first: 400 * 20us = 8ms */
        prev= curr;
        curr= READPORT();           /* sample all switches on the port */
        tmp= (prev ^ curr) | state; /* bits that changed in the last two samples */
        debounced= (debounced & tmp) | (curr & ~tmp);
        state= prev ^ curr;
        return;
    }

Something like that. You could wrap up 'debounced' with some code to access it, instead of putting the variable itself in the program-global namespace.

>OK - I could use a timer directly, but there are other tasks I may need a
>timer for so I thought it an interesting exercise to see if I could come up
>with a general solution that was efficient enough to run perhaps half a
>dozen different tasks in ISR space using only one hardware timer.
>
>Having had a look at the FreeRTOS code, I could probably write a driver for
>that that used a "virtualised" timer rather than a real one. The two
>subsystems would complement each other quite nicely.

Whatever works.

>> And if you'd like, I can send a file or two that show one
>> style of implementation details to supplement what you might
>> find in that book.
>
>That would be very kind - if you have the time. My email addy above is
>valid.

I'll do that.
Give me a little time to isolate the pieces.

>> I think it is "cool."
>
>:)
>
>>>4) Bomb proof "boot sector" and live firmware (flash) update tricks.
>>
>> Normally, I think you need to provide some care in defining
>> what is meant by "bomb proof" and "firmware update." For
>> example, a firmware update might be to a specific driver
>> placed in a specific location. Making that bomb proof might
>> mean that it must be movable. Or not.
>>
>> I guess you are looking for a book that provides a variety of
>> techniques that might be used, with descriptions of how they
>> are implemented, things to watch out for in doing so, what
>> benefits and downsides they each may have, etc., so that you
>> can get a comprehensive view and make choices here and there,
>> from time to time. If that is the case, I don't know where
>> to go and would ask that if _you_ find such a book to let me
>> know about it, too! ;)
>
>Yes. AVRs have (at the higher end) loads of flash and quite a bit, but not
>lots of RAM. So probably the most obvious solution is an invariant boot
>block and 2 flash sections - that's certainly a technique I've seen used by
>Extreme for their network switches. But again, I wonder if I'm missing ideas
>due to a lack of general knowledge.

I don't have comprehensive knowledge in this area, mostly just some random ideas here and there. I'm probably missing ideas, too. Which is what makes having to do something like that fun.

>>>5) Efficient algorithms for certain maths operations, eg integer square
>>>root (as has been mentioned recently, if not here, on a USENET group I'm
>>>subscribed to).
>>
>> One of the best places to go for things like this are integer
>> DSPs. For example, Analog Devices printed an excellent book
>> on the general topic regarding their ADSP-21xx processors. I
>> think some of it (or all of it) may be available online, too.
>>
>> The pair of books I'm thinking of here are: "DIGITAL SIGNAL
>> PROCESSING IN VLSI" and "DIGITAL SIGNAL PROCESSING
>> APPLICATIONS: Using the ADSP-2100 Family." The latter one,
>> for example, covers a host of fixed point and floating point
>> arithmetic operations, function approximations, and so on. In
>> enough detail to implement on other processor families. I've
>> ported the ideas from time to time, so I know this is quite
>> true.
>>
>> Another excellent reference of another nature to have is
>> one of the incarnations of "Numerical Recipes." If you don't
>> have it get that, too.
>>
>> If you aren't fully sharpened on your own algebra skills, get
>> a good algebra book and work on that. If you don't have at
>> least a 1st year's understanding of calculus, focus on that,
>> as well, and get a good book on calculus. My own preference
>> would be that you go further and complete at least some of a
>> 1st/2nd year's diff-eq. If you have a community college
>> available, that would be an excellent resource to take
>> advantage of sooner than later.
>>
>> It goes a LONG way to be able to actually _read_ with
>> _understanding_ what you see so that you can modify/tailor it
>> to your specific needs. Otherwise, you are just blind.
>>
>> Just to provide an example that you bring up, the integer
>> square root question was answered about the way I might have
>> by Hans-Bernhard Bröker. When I first faced the need for one
>> of these on my own, I didn't go to a book at all. Instead, I
>> just remembered the hand-method I'd been trained to use in
>> high school (or was it slightly earlier?) and implemented it
>> with an algorithm. Worked perfectly well, once I nailed the
>> rounding issue correctly. (That was the only detail that
>> didn't get nailed down on the first try because I didn't
>> think about it before writing the first code sample.) So
>> training helps you when all else fails you.
>
>If only they did that in school now!
>I know what you mean though - back in
>1988, I did an exercise on a 68000 to produce a sine output on a DAC. The
>expected solution was a lookup table. I did it with Newton-Raphson in
>integer arithmetic and it worked really well :)

It's my belief that not enough is required of CS students in the math area. The EE and CE folks get perhaps just a little too much math for a CS student. But the CS departments I'm aware of don't require enough, in my opinion.

>>>6) Keypad debounce.
>>
>> You _must_ be able to find unending sources on this topic.
>> For example, most everyone points to Jack Ganssle's "A Guide
>> to Debouncing" which is often named "debouncing.pdf" as a
>> place to start on the topic. He provides a nice survey.
>>
>> Once you land on something you like to use, you will probably
>> stick with it for most uses because you'll understand it
>> well. And besides, I don't see people caring that much about
>> gaining a comprehensive view on it, anyway. Many find
>> "something that works" and stop dead in their tracks after
>> that. I think there is more to learn and it is fun to
>> continue that education. But most seem to disagree with me
>> and stop as soon as they find something that works for them
>> and they feel they understand.
>>
>> I use the following logic applied to an interrupt interval of
>> 8ms:
>>     IF current <> previous THEN state = 1
>>     ELSEIF state = 1 THEN state = 0
>>     ELSE debounced = current
>> In other words, the above logic executes once each 8ms.
>>
>> With a little thought to the above logic, you should be able
>> to see that the resulting state is always the value that
>> results from XORing the current and previous values. An XOR
>> operation is usually pretty efficient. Also, note that the
>> debounced value is changed only when the prior state is 0 and
>> the current and previous values are the same. This results
This results >> in the following sequential steps: >> >> 1: previous = current >> 2: current = READ PORT PIN(S) >> 3: temp = (previous XOR current) OR state >> 4: debounced = (debounced AND temp) OR (current AND NOT temp) >> 5: state = previous XOR current > >Nice. Well, it works and latches states pretty reliably. >> This logic then requires the current condition of a switch to >> remain at the same level for three observations before the >> debounced value is updated. >> >> It's not just a sugestion that this method can be applied to >> 8 switches at a time as easily as it is to one, when the >> switches are lined up in a single port byte. If you want >> this to be efficient and usable for more than one switch it >> helps to place them on a single port. >> >>>7) Serial comms (SPI, I2C, 1-wire, RS485, roll-yer-own-with-a-GPIO-pin) >> >> ??? I tend to go to the standards docs for some of these, or >> the datasheets for others. What are you really wanting here? >> A "complete skillset" stuffed into your brain? > >No, not really. It was more of an understanding of sampling and time methods >that were efficient and reliable should I need to implement those in >software. OK - there's usually always a UART, so 232/485 are solved. I2C >isn't so bad because there's an explicit clock that can be tied to an IO >interrupt. I guess the real problems are quite a small set of self clocking >or fully time based signals. Now on this score, I don't mind just sitting down and working through solutions as I find them. That will mean doing some searches to help ensure I'm not forgetting something important or to provide me with robust ideas. But this is an area where I don't focus on acquiring broad knowledge without a specific application in mind. Other areas I do push for developing my skill sets. In this area, I'd let the application push them. The one exception to that is open collector/drain shared lines where _I_ get to decide what I like doing. 
I wrote myself a paper a while back and use that when I need to rethink a new design. So working out a general approach in this area does help.

>>>8) Working with an OS, eg FreeRTOS.
>>
>> 3rd year CS classes get into general ideas and force you to
>> write test programs to analyze _some_ of them. I write my
>> own and don't use any commercial or free system others write
>> because my needs are specific enough to require my having a
>> general skill that is deduced to specific cases. I gather
>> you want to simply use something as a drop-in and use it
>> reasonably well. For that, I'd probably go to (and hope for)
>> the good documentation often provided by those writing (and
>> using) what you are using.
>>
>>>And lots more in the same vein...
>>>
>>>Although I'm most likely to use AVRs or maybe PIC24s, the book doesn't
>>>have to be that specific - just "how to hack around in 8 bits". Pretty
>>>much a "Knuth for uControllers".
>>
>> Oh. I see. If you find Knuth for Micros, let me know.
>> Actually, I was kind of thinking of recommending you read
>> Knuth until you said this. I guess you already have that
>> much and want to see such a tour de force done anew.
>>
>> If you think about what happened with Knuth, he took on a
>> HUGE project and stated that there would be (if I recall)
>> four more books. In fact, I think he titled them back at the
>> start and listed what he expected to produce. However, in
>> doing those first three, which took years, he just stopped.
>> It was that huge. Then he decided he needed to take on
>> typesetting so that he could get back to writing. And that
>> one he expected to see last only a few years. Instead, he
>> took a decade and wrote a nice 5-volume set on typesetting
>> and then went on to develop a toolset (TeX.) I know he is
>> working on some of what he'd intended 30 years ago, now, but
>> I've no idea if he will ever finish!!
>>
>> And you want someone to take that on, now??? Most sane
Most sane >> people would look at Knuth and say... "Uh, no. I have a >> life. I think." >> >> But yes, if you find the new Knuth please let me know!! > >:) > >Sounds like Russel and Whitehead when they wrote Principia Mathematica - >with the idea being to formalise all of maths and logic starting with proper >axioms and solid derivations. "1+1=2" featured somewhere in Volume 2! By the >end of Volume 3, they'd got into calculus. Unfortunately, (I believe the >story goes) they were so fried, they didn't get to realise their dream of >continuing the process and deriving as yet unknown mathematical methods. And I think Dedekind and Weierstrass's calculus basis is Rube Goldberg compared to the simplicity of accepting infinitesimals. Abraham Robinson's hyperreals has helped bring 'sense' back where it was sorely lacking, a century later, but not without some mental cost and incompleteness. >Thanks for all your extensive comments - I'll take up those leads as well as >the others put forward. No problem. And I'll get some code shoveled off to you. Jon
From: Ulf Samuelsson on 6 Feb 2010 18:29

Tim Watts wrote:
> Hi,
>
> I can hack my way around AVRs OK - but I realise my limitations... I
> normally program in perl on larger computers, so wasting cycles has never
> been something I've worried about(!). Nor do I normally do timing sensitive
> stuff where timing < 0.1s
>
> Can anyone recommend a good book that would cover things like:
>
> 1) Compare-nybble (ie is there a more clever more efficient way than
> AND-mask and compare-word?)
>
> 2) Bitstream input techniques - eg efficient sampling of either a self
> clocking serial stream or a timing critical one (eg 1-wire) where the
> timings are so small (10's of uS) that one cannot really afford to waste
> instructions on a 20MIPS device. Use of interrupts and timers...
>

You should maybe look into the Event System in the AVR XMEGA series. Toggling an I/O pin can cause an event. The event can cause a capture, which causes the DMA to write the capture value to SRAM.

> 3) Cool ways to semi-virtualise a timer - eg ideally you'd like 10 hardware
> timers but you have 2.

An AVR32 has some interesting capability here.

  Low cycle interrupt: 6-12 cycles.
  The interrupt routine would:
    read one prepared dword from SRAM = 1 cycle
    write to I/O port, forcing a pin to toggle if the corresponding bit is "1"
    prepare next dword = ?
    return from interrupt = a few cycles

Should give you at least us resolution on 32 outputs, with its capability to write to I/O once per CPU cycle.

>
> 4) Bomb proof "boot sector" and live firmware (flash) update tricks.
>
> 5) Efficient algorithms for certain maths operations, eg integer square root
> (as has been mentioned recently, if not here, on a USENET group I'm
> subscribed to).
>
> 6) Keypad debounce.
>
> 7) Serial comms (SPI, I2C, 1-wire, RS485, roll-yer-own-with-a-GPIO-pin)
>
> 8) Working with an OS, eg FreeRTOS.
>
> And lots more in the same vein...
>
> Although I'm most likely to use AVRs or maybe PIC24s, the book doesn't have
> to be that specific - just "how to hack around in 8 bits". Pretty much a
> "Knuth for uControllers".
>
> Many thanks in advance :)
>
> Cheers
>
> Tim
>

--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may
or may not be shared by my employer Atmel Nordic AB
From: Rich Webb on 7 Feb 2010 09:17

On Sun, 07 Feb 2010 11:04:12 +0000, Tim Watts <tw(a)dionic.net> wrote:

>The Xmega certainly looks interesting - just skimmed one of the datasheets.
>The DMA and the event system do look cute.
>
>I'd ignored those devices due to being SMT only but maybe I should bite the
>bullet, get a new iron and "upgrade" - most of the hand soldering howtos
>imply the coarser pitched leaded devices aren't too bad to do.

Even the finer pitched footprints, like a 0.5 mm TQFP, are quite achievable by hand, as long as they have legs. It does take some magnification to inspect for bridges, and some (lots of!) flux and wick to de-bridge.

The bastards like LGA or <shudder> BGA with the pads only underneath the package are the ones that get tricky.

--
Rich Webb     Norfolk, VA