From: Joe Chisolm on 7 Jun 2010 15:50

On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:

> On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
> <ms(a)NOJUNKcustomORSPAMware.nl> wrote:
>
>>Unbelievable.....
>>
>>I'm playing around with the Microchip C18 compiler after a
>>hair-splitting experience with CCS. Apparently the optimizer of C18 is
>>not that good. For instance: LATF = addr >> 16; where addr is an
>>uint32, is compiled into a loop where 4 registers really get shifted 16
>>times in a loop. Any decent compiler should recognise that a shift by
>>16, stored to an 8 bit port could easily be done by simply accessing the
>>3rd byte.... sheesh....
>>
>>Meindert
>
> You're asking a lot.
>
> I've been programming since 1977 and I have never seen any compiler turn
> a long word shift (and/or mask) into a corresponding short word or byte
> access. Every compiler I have ever worked with would perform the shift.
>
> That said, something is wrong if it takes 4 registers. I don't know the
> PIC18, but I never encountered any chip that required more than 2
> registers to shift a value. Many chips have only a 1-bit shifter and
> require a loop to do larger shifts - but many such chips microcode the
> shift loop so the programmer sees only a simple instruction. But,
> occasionally, you do run into oddballs that need large shifts spelled
> out.
>
> Most likely you're somehow reading the (dis)assembly incorrectly: 4
> temporaries that are really mapped into the same register. If the
> compiler (or chip) really does need 4 registers to do a shift, then it's
> a piece of sh*t.
>
> George

You have an 8 bit architecture shifting a 32 bit value, shifting out of
one byte and into the next, thus 4 temps. You have 1 bit shifts. I
suspect the compiler is generating a right shift into carry so the code
can tell if a 1 needs to be moved into the most significant bit of the
next byte.

--
Joe Chisolm
Marble Falls, Tx.
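
The carry-chained loop Joe suspects can be modelled in plain C. This is
only a sketch under stated assumptions, not C18's actual output: the
function name shift_right_bytewise is hypothetical, the 32 bit value is
assumed to be held as four bytes with b[0] least significant, and the
carry variable stands in for the CPU carry flag that an instruction such
as the PIC18's RRCF rotates through.

```c
#include <stdint.h>

/* Shift a 32-bit value, held as four bytes (b[0] = least significant,
 * an assumed layout), right by `count` bits using only 1-bit shifts.
 * `carry` plays the role of the hardware carry flag. */
void shift_right_bytewise(uint8_t b[4], unsigned count)
{
    while (count-- > 0) {
        uint8_t carry = 0;                  /* logical shift: 0 enters the top bit */
        for (int i = 3; i >= 0; i--) {      /* start at the most significant byte */
            uint8_t next_carry = b[i] & 1u; /* bit that falls out of this byte */
            b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
            carry = next_carry;             /* becomes bit 7 of the next lower byte */
        }
    }
}
```

Every pass touches all four bytes, so a shift by 16 costs 64 single-bit
byte operations, which is consistent with the "4 registers shifted 16
times" that Meindert reports.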
From: D Yuniskis on 7 Jun 2010 15:59

Hi Joe,

Joe Chisolm wrote:
> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>
>> You're asking a lot.
>>
>> I've been programming since 1977 and I have never seen any compiler turn
>> a long word shift (and/or mask) into a corresponding short word or byte
>> access. Every compiler I have ever worked with would perform the shift.
>>
>> That said, something is wrong if it takes 4 registers. I don't know the
>> PIC18, but I never encountered any chip that required more than 2
>> registers to shift a value. Many chips have only a 1-bit shifter and
>> require a loop to do larger shifts - but many such chips microcode the
>> shift loop so the programmer sees only a simple instruction. But,
>> occasionally, you do run into oddballs that need large shifts spelled
>> out.
>>
>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>> temporaries that are really mapped into the same register. If the
>> compiler (or chip) really does need 4 registers to do a shift, then it's
>> a piece of sh*t.

It would be informative to know what sort of "helper routines" the
compiler calls on. E.g., it might (inelegantly) treat this as "CALL
SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
canned representation of *any* "long int".

> You have an 8 bit architecture shifting a 32 bit value, shifting out of one
> byte and into the next, thus 4 temps. You have 1 bit shifts. I suspect
> the compiler is generating a right shift into carry so the code can
> tell if a 1 needs to be moved into the most significant bit of the next
> byte.

I think George is commenting that a *smart* compiler can realize that an
(e.g.) 8 bit shift is:

    foo[2] = foo[3]
    foo[1] = foo[2]
    foo[0] = foo[1]

(if you are casting to a narrower data type and can discard foo[3])

and a *9* bit shift is the same as the above with a *single* bit shift
introduced (i.e., you operate on a byte at a time instead of the entire
"long")

(recall, the shift amount is a constant available at compile time)
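
The byte-move idea above can be written out in C as a rough sketch. The
function names shift_right_8 and shift_right_9 are hypothetical, and
foo[0] is assumed to be the least significant byte. One detail matters
when the moves are done in place: they have to run from the low byte
upward, otherwise foo[3] is copied into every byte.

```c
#include <stdint.h>

/* Right shift by 8 as pure byte moves; foo[0] is assumed to be the
 * least significant byte. Moves run low-to-high so that no byte is
 * overwritten before it has been read. */
void shift_right_8(uint8_t foo[4])
{
    foo[0] = foo[1];
    foo[1] = foo[2];
    foo[2] = foo[3];
    foo[3] = 0;          /* unsigned shift: zeros enter from the top */
}

/* Right shift by 9: the byte moves above plus one 1-bit shift over the
 * three bytes that can still hold data. */
void shift_right_9(uint8_t foo[4])
{
    shift_right_8(foo);
    foo[0] = (uint8_t)((foo[0] >> 1) | (foo[1] << 7));
    foo[1] = (uint8_t)((foo[1] >> 1) | (foo[2] << 7));
    foo[2] = (uint8_t)(foo[2] >> 1);
}
```

Because the shift amount is a compile-time constant, a compiler could in
principle pick this decomposition itself: whole-byte moves for the
multiple-of-8 part and at most seven single-bit passes for the rest.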
From: Grant Edwards on 7 Jun 2010 16:18

On 2010-06-07, George Neuner <gneuner2(a)comcast.net> wrote:

> I've been programming since 1977 and I have never seen any compiler
> turn a long word shift (and/or mask) into a corresponding short word
> or byte access. Every compiler I have ever worked with would perform
> the shift.

Really? I've seen quite a few compilers do that. For example, gcc for
ARM does:

------------------------------testit.c------------------------------
unsigned long ul;

unsigned char foo(void)
{
  return ul>>8;
}

unsigned short bar(void)
{
  return ul>>16;
}
------------------------------testit.c------------------------------

$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c

------------------------------testit.s------------------------------
        .arch armv5te
[...]
        .file   "testit.c"
        .text
        .align  2
        .global foo
        .type   foo, %function
foo:
        ldr     r3, .L3
        ldrb    r0, [r3, #1]    @ zero_extendqisi2
        bx      lr
.L4:
        .align  2
.L3:
        .word   ul
        .size   foo, .-foo
        .align  2
        .global bar
        .type   bar, %function
bar:
        ldr     r3, .L7
        ldrh    r0, [r3, #2]
        bx      lr
.L8:
        .align  2
.L7:
        .word   ul
        .size   bar, .-bar
        .comm   ul,4,4
[...]
------------------------------testit.s------------------------------

--
Grant Edwards               grant.b.edwards        Yow! I'm young ... I'm
                                  at               HEALTHY ... I can HIKE
                              gmail.com            THRU CAPT GROGAN'S LUMBAR REGIONS!
From: David Brown on 7 Jun 2010 16:47

D Yuniskis wrote:
> Hi Meindert,
>
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good. For
>> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port could
>> easily be done by simply accessing the 3rd byte.... sheesh....
>
> Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
>

That's irrelevant (or should be!) - expressions are evaluated in their
own right, and /then/ cast to the type of the LHS. The compiler should,
as it does, initially treat it as a 32-bit shift, but it's a poor
compiler that can't optimise a 32-bit shift by 16 to something better
than this. Optimising it to a single byte transfer comes logically at a
later stage.

> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
>

I believe that uint32_t /must/ be an unsigned 32-bit integer. If the
compiler cannot work with such a type, then no such type should exist in
<stdint.h>. A standards-compliant compiler is not allowed to cheat in
that way. Of course, I don't know if Microchip's compiler claims to be
standards compliant...

> How about:
>
> uint8_t *pointer;
>
> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
>
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)
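
One caveat on the quoted byte-pointer workaround, shown here as a hedged
sketch: pointer[2] only equals addr >> 16 when the target stores the
least significant byte first. The function names below are hypothetical
and are written for a hosted compiler rather than C18; the shift form is
the one whose result does not depend on byte order.

```c
#include <stdint.h>

/* Third byte via the shift: the result is defined by the value of addr,
 * so it is the same on any byte order. */
uint8_t byte2_by_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}

/* Third byte via a byte pointer: reads whatever sits at offset 2 in
 * memory, which matches addr >> 16 only on a little-endian target. */
uint8_t byte2_by_pointer(uint32_t addr)
{
    const uint8_t *pointer = (const uint8_t *)&addr;
    return pointer[2];
}

/* Run-time probe for the byte order of the machine executing this code. */
int is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

On a little-endian machine both functions return the same byte; on a
big-endian one the pointer version returns bits 8..15 instead, so the
workaround trades portability for a hint the optimizer may not need.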
From: Joe Chisolm on 7 Jun 2010 17:36

On Mon, 07 Jun 2010 12:59:49 -0700, D Yuniskis wrote:

> Hi Joe,
>
> Joe Chisolm wrote:
>> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>>
>>> You're asking a lot.
>>>
>>> I've been programming since 1977 and I have never seen any compiler
>>> turn a long word shift (and/or mask) into a corresponding short word
>>> or byte access. Every compiler I have ever worked with would perform
>>> the shift.
>>>
>>> That said, something is wrong if it takes 4 registers. I don't know
>>> the PIC18, but I never encountered any chip that required more than 2
>>> registers to shift a value. Many chips have only a 1-bit shifter and
>>> require a loop to do larger shifts - but many such chips microcode the
>>> shift loop so the programmer sees only a simple instruction. But,
>>> occasionally, you do run into oddballs that need large shifts spelled
>>> out.
>>>
>>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>>> temporaries that are really mapped into the same register. If the
>>> compiler (or chip) really does need 4 registers to do a shift, then
>>> it's a piece of sh*t.
>
> It would be informative to know what sort of "helper routines" the
> compiler calls on. E.g., it might (inelegantly) treat this as "CALL
> SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
> canned representation of *any* "long int".
>

I agree with your statement. The C18 suite has some canned libraries
like 32 bit division and such. There are other helper routines for doing
delays and such.

>> You have an 8 bit architecture shifting a 32 bit value, shifting out of
>> one byte and into the next, thus 4 temps. You have 1 bit shifts. I
>> suspect the compiler is generating a right shift into carry so the code
>> can tell if a 1 needs to be moved into the most significant bit of the
>> next byte.
>
> I think George is commenting that a *smart* compiler can realize that an
> (e.g.) 8 bit shift is:
>
>     foo[2] = foo[3]
>     foo[1] = foo[2]
>     foo[0] = foo[1]
>
> (if you are casting to a narrower data type and can discard foo[3])
>
> and a *9* bit shift is the same as the above with a *single* bit shift
> introduced (i.e., you operate on a byte at a time instead of the entire
> "long")
>
> (recall, the shift amount is a constant available at compile time)

I just did a test using C18. I chose an 18F86J10 (for no particular
reason other than I remember it has a port F and thus a LATF).

For:

    static unsigned long addr;
    LATF = addr >> 16;

I get results similar to what you have above. The compiler "shifts" addr
into a 32 bit temp by doing two byte moves and two clear byte
instructions. It then does a 1 byte move into LATF from the temp. I'm
not sure what version the OP is using or what else might be going on
behind the scenes with addr. I agree a compiler should be smarter but
for the price (free) C18 is not bad for smaller projects.

BTW: I did a quick test with gcc 4.4.1 and it does a load, shift 16 and
a store byte.

--
Joe Chisolm
Marble Falls, Tx.
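
The sequence reported above (two byte moves, two clears, then one byte
move into LATF) can be pictured in C roughly as follows. This is an
illustration of the description only, not the code C18 emits: the
function name latf_from_addr and the temp array are hypothetical, and
addr[0] is assumed to be the least significant byte.

```c
#include <stdint.h>

/* Model of "addr >> 16" evaluated as a full 32-bit result in a temp,
 * then narrowed to 8 bits. addr[0] is assumed least significant. */
uint8_t latf_from_addr(const uint8_t addr[4])
{
    uint8_t temp[4];

    temp[0] = addr[2];   /* byte move */
    temp[1] = addr[3];   /* byte move */
    temp[2] = 0;         /* clear */
    temp[3] = 0;         /* clear */

    return temp[0];      /* the single byte that ends up in LATF */
}
```

Seen this way, three of the four temp writes are dead stores: only
temp[0] is ever read, which is the step a further optimization pass
would need to spot in order to reduce the whole statement to one byte
transfer.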