Prev: PC = Personal Copier :)
Next: x86 instruction set usage-difference between windows 95 andwindows xp ?
From: Robert Myers on 27 May 2010 17:13 On May 27, 4:02 pm, Seebs <usenet-nos...(a)seebs.net> wrote: > On 2010-05-27, James Harris <james.harri...(a)googlemail.com> wrote: > > > I'm not going to try and defend him but having seen his posts for some > > time I don't think he's trolling. > > Could be. > > > His interests are or have been video > > processing. He puts a lot of effort into getting the best x86 > > instruction sequences from his Delphi compiler. The rest is mainly > > communication style. > > Could be, but it's a communications style which seems rude to me, and > the outrage at the idea that a processor might require you to give it > only semantically valid instructions strikes me as a bad sign. > Rudeness on Usenet? What *is* the world coming to? Main Entry: 1loll Pronunciation: \Ëläl\ Function: verb Etymology: Middle English Date: 14th century intransitive verb 1 : to hang loosely or laxly : droop 2 : to act or move in a lax, lazy, or indolent manner : lounge transitive verb : to let droop or dangle synonyms see idle â loll·er \Ëlä-lÉr\ noun In any case, computers know what programmers really mean. If not, they're just being difficult, like a recalcitrant child. A very stubborn and defiant recalcitrant child. Robert.
From: MitchAlsup on 27 May 2010 19:20 On May 27, 11:06 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote: > Would you rather write: > > // 1. > Z := X shl Y; > > // or > > // 2. > if Y < 32 then > begin > Z := X shl Y; > end else > begin > Z := X; > end; Why not: z = x << (y & 31); ?
From: James Dow Allen on 28 May 2010 02:04 On May 28, 9:59 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote: > I was thinking, maybe a "bitcount mod 32" might help ... From the (useless?) factoids department: When Intel x86 shifts a 32-bit register with a shift-count coming from reg, all but the low-order 5 bits are ignored; i.e. shift count is mod 32. *However* the Motorola 68xxx uses 6 bits; i.e. shift count is mod 64. A large count will set the 32-bit reg to all 0's (or all 1's). I wrote a Huffman decoder once. I didn't want it to be just another FAST decoder; I wanted it to be the fastest decoder possible! Config tested shifts and took a short-cut when possible. I am *not* here to defend anal-retentive coding. ("Hello, my name is James and I indulge in silly pointless micro-optimizations.") But surely everyone will admit that micro-optimizations are a cheaper and safer *recreation* than many alternatives. :-) James
From: Terje Mathisen "terje.mathisen at on 28 May 2010 07:32 James Dow Allen wrote: > On May 28, 9:59 am, "Skybuck Flying"<IntoTheFut...(a)hotmail.com> > wrote: >> I was thinking, maybe a "bitcount mod 32" might help ... > > From the (useless?) factoids department: > > When Intel x86 shifts a 32-bit register with a shift-count > coming from reg, all but the low-order 5 bits are ignored; > i.e. shift count is mod 32. > > *However* the Motorola 68xxx uses 6 bits; i.e. shift > count is mod 64. A large count will set the 32-bit > reg to all 0's (or all 1's). > > I wrote a Huffman decoder once. I didn't want it to > be just another FAST decoder; I wanted it to be the > fastest decoder possible! Config tested shifts and > took a short-cut when possible. The fastest Huffman decoders don't use that many shifts, they average less than one shift_right per decoded token: My fastest code uses N-bit wide lookups that can return multiple tokens per iteration, along with a count of the total number of bits used and the number of tokens copied, i.e. how much the token_count has been incremented. It also returns a pointer to the next decoder to use, normally the same. In the case of zero return tokens, it instead chains to a secondary decoder. This means that as long as you can blindly decode a bunch of tokens, i.e. nothing like the "Context-Adaptive" part of h.264 CABAC decoding, you can do so in a totally branchless manner. BTW, you also backfill the bit buffer in a branchless manner, and do it for free by overlapping these operations with the core logic: // Always try to load the next 8 (or 16/32) bits bit_buffer |= *data << bits; bits += 8; oflow = bits >> 5; // Did we overflow the bit buffer? bits &= 31; data += 1-oflow; > > I am *not* here to defend anal-retentive coding. > ("Hello, my name is James and I indulge in silly > pointless micro-optimizations.") But surely everyone It is never pointless: I learn something every time. > will admit that micro-optimizations are a cheaper > and safer *recreation* than many alternatives. :-) Absolutely. I've been helping a guy who crossposted an optimization problem over a bunch of programming groups (I read it in comp.lang.asm.x86). His original program took 26 minutes (albeit a number of years ago), he has a third-party written version which runs in ~5 ms, but sometimes fails to work (i.e. doesn't find all solutions). My current code runs in 0.75 ms, and he's still complaining because I wrote it in C(++) instead of his preferred Pascal. :-) Terje -- - <Terje.Mathisen at tmsw.no> "almost all programming can be viewed as an exercise in caching"
From: Skybuck Flying on 28 May 2010 09:37
Ok, I no longer believe the lookup table is best... Correct formula's / calculations can probably solve the problem. As long as the algorithm exists on BitCount = 0. Good testing still has to be done but it seems ok. The time went down from 1.32 seconds or so to 1.12 seconds or so... So that's at least 15% more speed or so, that's nice. Bye, Skybuck. |