Lolling at programmers, how many ways are there to create a bitmask ? ;) :) [ASM]

Prev: library .dll in windows
Next: overflow of stack for recursive functions

From: James Dow Allen on 28 May 2010 02:04

On May 28, 9:59 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com>
wrote:
> I was thinking, maybe a "bitcount mod 32" might help ...

From the (useless?) factoids department:

When Intel x86 shifts a 32-bit register with a shift-count
coming from reg, all but the low-order 5 bits are ignored;
i.e. shift count is mod 32.

*However* the Motorola 68xxx uses 6 bits; i.e. shift
count is mod 64. A large count will set the 32-bit
reg to all 0's (or all 1's).

I wrote a Huffman decoder once. I didn't want it to
be just another FAST decoder; I wanted it to be the
fastest decoder possible! Config tested shifts and
took a short-cut when possible.

I am *not* here to defend anal-retentive coding.
("Hello, my name is James and I indulge in silly
pointless micro-optimizations.") But surely everyone
will admit that micro-optimizations are a cheaper
and safer *recreation* than many alternatives. :-)

James

From: Terje Mathisen "terje.mathisen at on 28 May 2010 07:32

James Dow Allen wrote:
> On May 28, 9:59 am, "Skybuck Flying"<IntoTheFut...(a)hotmail.com>
> wrote:
>> I was thinking, maybe a "bitcount mod 32" might help ...
>
> From the (useless?) factoids department:
>
> When Intel x86 shifts a 32-bit register with a shift-count
> coming from reg, all but the low-order 5 bits are ignored;
> i.e. shift count is mod 32.
>
> *However* the Motorola 68xxx uses 6 bits; i.e. shift
> count is mod 64. A large count will set the 32-bit
> reg to all 0's (or all 1's).
>
> I wrote a Huffman decoder once. I didn't want it to
> be just another FAST decoder; I wanted it to be the
> fastest decoder possible! Config tested shifts and
> took a short-cut when possible.

The fastest Huffman decoders don't use that many shifts, they average
less than one shift_right per decoded token:

My fastest code uses N-bit wide lookups that can return multiple tokens
per iteration, along with a count of the total number of bits used and
the number of tokens copied, i.e. how much the token_count has been
incremented.

It also returns a pointer to the next decoder to use, normally the same.

In the case of zero return tokens, it instead chains to a secondary decoder.

This means that as long as you can blindly decode a bunch of tokens,
i.e. nothing like the "Context-Adaptive" part of h.264 CABAC decoding,
you can do so in a totally branchless manner.

BTW, you also backfill the bit buffer in a branchless manner, and do it
for free by overlapping these operations with the core logic:

// Always try to load the next 8 (or 16/32) bits
bit_buffer |= *data << bits;
bits += 8;

oflow = bits >> 5; // Did we overflow the bit buffer?
bits &= 31;
data += 1-oflow;

>
> I am *not* here to defend anal-retentive coding.
> ("Hello, my name is James and I indulge in silly
> pointless micro-optimizations.") But surely everyone

It is never pointless: I learn something every time.

> will admit that micro-optimizations are a cheaper
> and safer *recreation* than many alternatives. :-)

Absolutely.

I've been helping a guy who crossposted an optimization problem over a
bunch of programming groups (I read it in comp.lang.asm.x86).

His original program took 26 minutes (albeit a number of years ago), he
has a third-party written version which runs in ~5 ms, but sometimes
fails to work (i.e. doesn't find all solutions).

My current code runs in 0.75 ms, and he's still complaining because I
wrote it in C(++) instead of his preferred Pascal. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Skybuck Flying on 28 May 2010 09:37

Ok,

I no longer believe the lookup table is best...

Correct formula's / calculations can probably solve the problem.

As long as the algorithm exists on BitCount = 0.

Good testing still has to be done but it seems ok.

The time went down from 1.32 seconds or so to 1.12 seconds or so...

So that's at least 15% more speed or so, that's nice.

Bye,
Skybuck.

From: Robert Myers on 28 May 2010 12:30

On May 28, 7:32 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> James Dow Allen wrote:

>
> > I am *not* here to defend anal-retentive coding.
> > ("Hello, my name is James and I indulge in silly
> > pointless micro-optimizations.") But surely everyone
>
> It is never pointless: I learn something every time.
>
> > will admit that micro-optimizations are a cheaper
> > and safer *recreation* than many alternatives. :-)
>
> Absolutely.
>
> I've been helping a guy who crossposted an optimization problem over a
> bunch of programming groups (I read it in comp.lang.asm.x86).
>
> His original program took 26 minutes (albeit a number of years ago), he
> has a third-party written version which runs in ~5 ms, but sometimes
> fails to work (i.e. doesn't find all solutions).
>
> My current code runs in 0.75 ms, and he's still complaining because I
> wrote it in C(++) instead of his preferred Pascal. :-)

Just saw another orienteering competition in progress, Terje. Very
safe, except possibly for a sprain or an abrasion from a slip and fall
on the rocky terrain. I once managed to get my foot infected from
something in my shoe, even though I wasn't orienteering.

Most important, if someone is overly-clever or overreaching, the
consequences for anyone else will be strictly limited. The national
guard and police helicopters are unlikely to be engaged in even the
most dire conceivable mishap.

I'm not sure you can say the same for compulsively-optimizing hotshot
programmers. Maybe you could get more of them interested in
orienteering, or some similar mentally and physically challenging
activity that will leave less time for obscure cleverness.

As to learning something from almost anything that requires thought,
who could disagree? ;-)

Robert.

From: Branimir Maksimovic on 28 May 2010 13:19

On Fri, 28 May 2010 18:11:01 +0200
"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote:

> At least this brings the branches back to 1.
>
> Bye,
> Skybuck.
>
>

Skyfuck, don;t crosspost. Program in delphi if you want
delphi. Program in asm if you want asm. You have free will?
Or you are just Illusion?
Don;t provoke people, not nice.

Greets!

--
http://maxa.homedns.org/

Sometimes online sometimes not

Svima je "dozvoljeno" biti idiot i
> mrak, ali samo neki to odaberu,

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: library .dll in windows
Next: overflow of stack for recursive functions