Prev: Calculating longword pointer, which method is faster ?
Next: SMT exploiting 21264-like clustering?
From: Skybuck Flying on 28 May 2010 12:11

"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
news:15d69$4bffe92b$54190f09$2267(a)cache3.tilbu1.nb.home.nl...
> "Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
> news:82ee6$4bffe8c5$54190f09$1670(a)cache3.tilbu1.nb.home.nl...
>>> However the problem for shifting "content" still needs to be looked at. I
>>> am not yet sure if it can be solved.
>>
>> There is now one remaining problem with the algorithm:
>>
>> mSomething := Content shr (BitCount - Shift);
>>
>> The BitCount can again range from 0 to 32.
>>
>> The Shift can range from 0 to 31.
>>
>> Thus BitCount 32 - 0 is 32
>>
>> So shr 32 is a problem.
>>
>> mSomething should become zero when shr 32 is done.
>>
>> Shr 0 will leave the content intact which would be wrong.
>>
>> Any solutions ?
>>
>> For now I only see a branch as a solution.
>
> It's kinda tricky too... to try and get a branch working.
>
> Just checking if BitCount is 32 would not be enough....
>
> Since BitShift might be 5 and then it needs to shift and so forth.
>
> So it actually needs two branches probably something like:
>
> if (mBitCount=32) and (mShift = 0) then
> begin
>   mSomething := 0;
> end;

A better way could be to store the result in a variable first:

vShiftRight := (BitCount - Shift);

if vShiftRight = 32 then
begin
  mSomething := 0;
end else
begin
  mSomething := Content shr vShiftRight;
end;

At least this brings the branches back to 1.

Bye,
  Skybuck.
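
For anyone who wants to try the single-branch version, here is a minimal sketch that wraps it in a function and checks a few cases. The function name ShiftRightGuarded and the test values are made up for illustration and are not from the thread, and the sketch assumes BitCount >= Shift so the shift distance is never negative. The guard is needed because on x86 a 32-bit shift uses only the low 5 bits of the count, so shr 32 would behave like shr 0.

```pascal
program ShiftGuard;

{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical helper (name not from the thread).
// Shifts Content right by (BitCount - Shift) bits and returns 0 when that
// distance is exactly 32, which a plain 32-bit shr cannot express on x86.
// Assumption: 0 <= BitCount - Shift <= 32.
function ShiftRightGuarded(Content: LongWord; BitCount, Shift: Integer): LongWord;
var
  vShiftRight: Integer;
begin
  vShiftRight := BitCount - Shift;
  if vShiftRight = 32 then
    Result := 0                           // every bit is shifted out
  else
    Result := Content shr vShiftRight;    // distance 0..31 is safe
end;

begin
  // Distance 32: the result must be zero, not the untouched content.
  if ShiftRightGuarded($FFFFFFFF, 32, 0) <> 0 then Halt(1);
  // Distance 27: only the top 5 bits survive.
  if ShiftRightGuarded($FFFFFFFF, 32, 5) <> $1F then Halt(1);
  // Distance 0: the content is returned unchanged.
  if ShiftRightGuarded($12345678, 8, 8) <> $12345678 then Halt(1);
  WriteLn('all checks passed');
end.
```

The program exits with code 1 if any check fails, so it can be run as a quick sanity test after changing the shift logic.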
From: Skybuck Flying on 28 May 2010 19:30

Anyway, I have now completely eliminated the need for a mask... at least in the write routine... but other routines might/will probably still require it.

The formula was:

Mask := not ( -2 shl (BitCount-1) )

As far as I can see it probably uses 3 instructions, which is still quite a lot.

Since a subtraction must now occur anyway the following formula will probably be faster, unless shifting many bits takes longer... but I would not expect that on modern hardware, the AMD optimization manual says "note 3, clock count ???" What is that ?

Anyway here is a potentially faster formula which works from the opposite side:

Mask := -1 shr (32-BitCount);

From the looks of it, just 2 instructions.

Yeah it's a little bit faster :)

Bye,
  Skybuck.
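
The two mask formulas can be compared with a short program. This is a sketch under stated assumptions, not the code from the write routine: the function names MaskByShl and MaskByShr are invented here, and -2 and -1 are written as the unsigned constants $FFFFFFFE and $FFFFFFFF so the result does not depend on how a given compiler parses the unary minus or on the integer width it uses for the expression. Only BitCount 1..32 is checked, because BitCount = 0 would need a shift by -1 in the first formula and a shift by 32 in the second.

```pascal
program MaskFormulas;

{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical name. First formula: not ( -2 shl (BitCount-1) ),
// with -2 spelled as the 32-bit pattern $FFFFFFFE.
function MaskByShl(BitCount: Integer): LongWord;
begin
  Result := LongWord(not (LongWord($FFFFFFFE) shl (BitCount - 1)));
end;

// Hypothetical name. Second formula: -1 shr (32-BitCount),
// with -1 spelled as the 32-bit pattern $FFFFFFFF.
function MaskByShr(BitCount: Integer): LongWord;
begin
  Result := LongWord(LongWord($FFFFFFFF) shr (32 - BitCount));
end;

var
  vBitCount: Integer;
begin
  // Both formulas should yield a mask with the low BitCount bits set.
  for vBitCount := 1 to 32 do
    if MaskByShl(vBitCount) <> MaskByShr(vBitCount) then
    begin
      WriteLn('mismatch at BitCount = ', vBitCount);
      Halt(1);
    end;

  // Spot checks against known masks.
  if MaskByShr(1) <> $1 then Halt(1);
  if MaskByShr(8) <> $FF then Halt(1);
  if MaskByShr(32) <> $FFFFFFFF then Halt(1);

  WriteLn('both formulas agree for BitCount 1..32');
end.
```

This only shows that the formulas agree on their valid range. Whether the shr version is actually faster depends on the compiler output and the CPU, so it is worth timing on the target machine.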