From: James Dow Allen on 28 May 2010 02:04 On May 28, 9:59 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote: > I was thinking, maybe a "bitcount mod 32" might help ... From the (useless?) factoids department: When Intel x86 shifts a 32-bit register with a shift-count coming from reg, all but the low-order 5 bits are ignored; i.e. shift count is mod 32. *However* the Motorola 68xxx uses 6 bits; i.e. shift count is mod 64. A large count will set the 32-bit reg to all 0's (or all 1's). I wrote a Huffman decoder once. I didn't want it to be just another FAST decoder; I wanted it to be the fastest decoder possible! Config tested shifts and took a short-cut when possible. I am *not* here to defend anal-retentive coding. ("Hello, my name is James and I indulge in silly pointless micro-optimizations.") But surely everyone will admit that micro-optimizations are a cheaper and safer *recreation* than many alternatives. :-) James
From: Terje Mathisen "terje.mathisen at on 28 May 2010 07:32 James Dow Allen wrote: > On May 28, 9:59 am, "Skybuck Flying"<IntoTheFut...(a)hotmail.com> > wrote: >> I was thinking, maybe a "bitcount mod 32" might help ... > > From the (useless?) factoids department: > > When Intel x86 shifts a 32-bit register with a shift-count > coming from reg, all but the low-order 5 bits are ignored; > i.e. shift count is mod 32. > > *However* the Motorola 68xxx uses 6 bits; i.e. shift > count is mod 64. A large count will set the 32-bit > reg to all 0's (or all 1's). > > I wrote a Huffman decoder once. I didn't want it to > be just another FAST decoder; I wanted it to be the > fastest decoder possible! Config tested shifts and > took a short-cut when possible. The fastest Huffman decoders don't use that many shifts, they average less than one shift_right per decoded token: My fastest code uses N-bit wide lookups that can return multiple tokens per iteration, along with a count of the total number of bits used and the number of tokens copied, i.e. how much the token_count has been incremented. It also returns a pointer to the next decoder to use, normally the same. In the case of zero return tokens, it instead chains to a secondary decoder. This means that as long as you can blindly decode a bunch of tokens, i.e. nothing like the "Context-Adaptive" part of h.264 CABAC decoding, you can do so in a totally branchless manner. BTW, you also backfill the bit buffer in a branchless manner, and do it for free by overlapping these operations with the core logic: // Always try to load the next 8 (or 16/32) bits bit_buffer |= *data << bits; bits += 8; oflow = bits >> 5; // Did we overflow the bit buffer? bits &= 31; data += 1-oflow; > > I am *not* here to defend anal-retentive coding. > ("Hello, my name is James and I indulge in silly > pointless micro-optimizations.") But surely everyone It is never pointless: I learn something every time. > will admit that micro-optimizations are a cheaper > and safer *recreation* than many alternatives. :-) Absolutely. I've been helping a guy who crossposted an optimization problem over a bunch of programming groups (I read it in comp.lang.asm.x86). His original program took 26 minutes (albeit a number of years ago), he has a third-party written version which runs in ~5 ms, but sometimes fails to work (i.e. doesn't find all solutions). My current code runs in 0.75 ms, and he's still complaining because I wrote it in C(++) instead of his preferred Pascal. :-) Terje -- - <Terje.Mathisen at tmsw.no> "almost all programming can be viewed as an exercise in caching"
From: Skybuck Flying on 28 May 2010 09:37 Ok, I no longer believe the lookup table is best... Correct formula's / calculations can probably solve the problem. As long as the algorithm exists on BitCount = 0. Good testing still has to be done but it seems ok. The time went down from 1.32 seconds or so to 1.12 seconds or so... So that's at least 15% more speed or so, that's nice. Bye, Skybuck.
From: Robert Myers on 28 May 2010 12:30 On May 28, 7:32 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: > James Dow Allen wrote: > > > I am *not* here to defend anal-retentive coding. > > ("Hello, my name is James and I indulge in silly > > pointless micro-optimizations.") But surely everyone > > It is never pointless: I learn something every time. > > > will admit that micro-optimizations are a cheaper > > and safer *recreation* than many alternatives. :-) > > Absolutely. > > I've been helping a guy who crossposted an optimization problem over a > bunch of programming groups (I read it in comp.lang.asm.x86). > > His original program took 26 minutes (albeit a number of years ago), he > has a third-party written version which runs in ~5 ms, but sometimes > fails to work (i.e. doesn't find all solutions). > > My current code runs in 0.75 ms, and he's still complaining because I > wrote it in C(++) instead of his preferred Pascal. :-) Just saw another orienteering competition in progress, Terje. Very safe, except possibly for a sprain or an abrasion from a slip and fall on the rocky terrain. I once managed to get my foot infected from something in my shoe, even though I wasn't orienteering. Most important, if someone is overly-clever or overreaching, the consequences for anyone else will be strictly limited. The national guard and police helicopters are unlikely to be engaged in even the most dire conceivable mishap. I'm not sure you can say the same for compulsively-optimizing hotshot programmers. Maybe you could get more of them interested in orienteering, or some similar mentally and physically challenging activity that will leave less time for obscure cleverness. As to learning something from almost anything that requires thought, who could disagree? ;-) Robert.
From: Branimir Maksimovic on 28 May 2010 13:19
On Fri, 28 May 2010 18:11:01 +0200 "Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote: > At least this brings the branches back to 1. > > Bye, > Skybuck. > > Skyfuck, don;t crosspost. Program in delphi if you want delphi. Program in asm if you want asm. You have free will? Or you are just Illusion? Don;t provoke people, not nice. Greets! -- http://maxa.homedns.org/ Sometimes online sometimes not Svima je "dozvoljeno" biti idiot i > mrak, ali samo neki to odaberu, |