Lolling at programmers, how many ways are there to create a bitmask ? ;) :) [Computer Architecture]

From: Skybuck Flying on 28 May 2010 12:11

"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
news:15d69$4bffe92b$54190f09$2267(a)cache3.tilbu1.nb.home.nl...
> "Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
> news:82ee6$4bffe8c5$54190f09$1670(a)cache3.tilbu1.nb.home.nl...
>>> However the problem of shifting "content" still needs to be looked at. I
>>> am not yet sure if it can be solved.
>>
>> There is now one remaining problem with the algorithm:
>>
>> mSomething := Content shr (BitCount - Shift);
>>
>> The BitCount can again range from 0 to 32.
>>
>> The Shift can range from 0 to 31.
>>
>> Thus BitCount - Shift can be 32 - 0 = 32
>>
>> So shr 32 is a problem.
>>
>> mSomething should become zero when shr 32 is done.
>>
>> On x86 the shift count is masked to 5 bits, so shr 32 acts like shr 0 and
>> leaves the content intact, which would be wrong.
>>
>> Any solutions ?
>>
>> For now I only see a branch as a solution.
>
> It's kinda tricky too... to try and get a branch working.
>
> Just checking if BitCount is 32 would not be enough....
>
> Since Shift might be 5, and then it still needs to shift, and so forth.
>
> So it actually needs two branches, probably something like:
>
> if (mBitCount = 32) and (mShift = 0) then
> begin
>   mSomething := 0;
> end;

A better way could be to store the result in a variable first:

vShiftRight := (BitCount - Shift);

if vShiftRight = 32 then
begin
  mSomething := 0;
end else
begin
  mSomething := Content shr vShiftRight;
end;

At least this brings the branches back to 1.
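
For comparison, here is a minimal sketch of the same guard in C, where uint32_t stands in for Delphi's LongWord and count stands for BitCount - Shift. The names shr_guarded and shr_wide are made up for illustration, and the sketch assumes count stays in the range 0 to 32. The second function is not from this thread: it is a common branch-free alternative that widens to 64 bits, so a shift by 32 becomes an ordinary shift.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: mirrors the vShiftRight test above.
   Assumes 0 <= count <= 32 (count stands for BitCount - Shift). */
static uint32_t shr_guarded(uint32_t content, unsigned count)
{
    assert(count <= 32);
    if (count == 32)
        return 0;            /* a full-width shift must clear everything */
    return content >> count; /* count is 0..31 here, always well defined */
}

/* Branch-free alternative (an assumption, not the poster's method):
   widen to 64 bits so that a shift by 32 is an ordinary shift,
   then truncate the result back to 32 bits. */
static uint32_t shr_wide(uint32_t content, unsigned count)
{
    assert(count <= 32);
    return (uint32_t)((uint64_t)content >> count);
}
```

Both functions return 0 for a count of 32 and the unchanged content for a count of 0. Whether the widened version is actually faster than the branch would have to be measured on the target CPU, especially in 32-bit code where a 64-bit shift costs extra instructions.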

Bye,
Skybuck.

From: Skybuck Flying on 28 May 2010 19:30

Anyway,

I have now completely eliminated the need for a mask, at least in the
write routine, but other routines will probably still require it.

The formula was:

Mask := not ( -2 shl (BitCount-1) )

As far as I can see it probably uses 3 instructions, which is still quite a
lot.

Since a subtraction must occur anyway, the following formula will
probably be faster, unless shifting by many bits takes longer. I would
not expect that on modern hardware, but the AMD optimization manual says
"note 3, clock count ???" What is that ?

Anyway here is a potentially faster formula which works from the opposite
side:

Mask := -1 shr (32-BitCount);

From the looks of it, just 2 instructions.

Yeah it's a little bit faster :)
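
The two formulas can be checked against each other with a small C sketch, again using uint32_t in place of LongWord. The function names mask_shl and mask_shr are hypothetical. The sketch assumes BitCount is in the range 1 to 32: with BitCount = 0 the first formula would shift by -1 and the second by 32, which is the same full-width shift problem as before.

```c
#include <assert.h>
#include <stdint.h>

/* Mask := not ( -2 shl (BitCount-1) ): subtract, shift left, invert.
   Assumes 1 <= bit_count <= 32, so the shift count is 0..31. */
static uint32_t mask_shl(unsigned bit_count)
{
    assert(bit_count >= 1 && bit_count <= 32);
    return (uint32_t)~(UINT32_C(0xFFFFFFFE) << (bit_count - 1));
}

/* Mask := -1 shr (32-BitCount): subtract, shift right.
   Assumes 1 <= bit_count <= 32, so the shift count is 31..0. */
static uint32_t mask_shr(unsigned bit_count)
{
    assert(bit_count >= 1 && bit_count <= 32);
    return (uint32_t)(UINT32_MAX >> (32 - bit_count));
}
```

Within that range both functions produce the same mask, for example 0x000000FF for a bit count of 8 and 0xFFFFFFFF for a bit count of 32.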

Bye,
Skybuck.
