Prev: Apple II Debugger
Next: TMA Assembler?
From: jukka@liimatta.org on 24 Aug 2006 04:29 Skybuck wrote: > Can you "unraffle" it into multiple lines (without all the () ) ? Let's look at one line, sure we can refactor it: x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); -> y = (x & 0xcccccccc) >> 2; z = (x & 0x33333333) << 2; x = y | z; But that just introduces two new variables and doesn't solve the primary problem of this fundamentally *iterative* algorithm: dependency.
From: Robert Redelmeier on 24 Aug 2006 09:31 In alt.lang.asm jukka(a)liimatta.org <jukka(a)liimatta.org> wrote in part: > y = (x & 0xcccccccc) >> 2; > z = (x & 0x33333333) << 2; > x = y | z; > > But that just introduces two new variables and doesn't > solve the primary problem of this fundamentally *iterative* > algorithm: dependency. Actually, programming in x86 asm or a decent compiler will introduce the two variables so the masks & shifts can proceed in parallel (using the hardware scheduler and perhaps the register renamer): mov ebx, eax ; copy x from eax to ebx and eax, 0CCCCCCCCh and ebx, 033333333h shr eax, 2 shl ebx, 2 or eax, ebx ; new x now in eax The 5 combining ORs do introduce dependancies, but it is far less than the 32 dependancies that SHR/SHL would introduce. You could try a byte-wise lookup table (bigger would likely not be in cache) but partial register operations (bytewise) can introduce stalls on some processors, so you'd best run shifts. I doubt you'll beat the 10-15 clocks the swapping algo takes (untested). Given TCP & IP datagrams are big endian, while x86 is little endian, the problem would appear to be important. Enough to add a much quicker hardware instruction. However, most uses of datagrams fields are as signatures not needing mathematical manipulation. So leaving them as unconverted backwards tokens is sufficient if speed is the over-riding concern. I really miss a fast endian conversion instruction for lexical sorts (for the diminishing cases where ASCII order suffices). Little endian hexdumps are easy to read when printed right-to-left. -- Robert
From: jukka on 24 Aug 2006 10:00 Robert Redelmeier wrote: > Actually, programming in x86 asm or a decent compiler will introduce > the two variables so the masks & shifts can proceed in parallel > (using the hardware scheduler and perhaps the register renamer): > > mov ebx, eax ; copy x from eax to ebx > and eax, 0CCCCCCCCh > and ebx, 033333333h > shr eax, 2 > shl ebx, 2 > or eax, ebx ; new x now in eax I didn't have that in mind, but rather, that writing out the subexpressions as two discrete expressions wouldn't change anything. Something like this. Here's one of the versions compiled... mov ecx, DWORD PTR _x$[esp-4] mov eax, ecx shr eax, 2 lea edx, DWORD PTR [ecx*4] xor eax, edx and eax, 858993459 ; 33333333H shl ecx, 2 xor eax, ecx And here is the other one... mov ecx, DWORD PTR _x$[esp-4] mov eax, ecx shr eax, 2 lea edx, DWORD PTR [ecx*4] xor eax, edx and eax, 858993459 ; 33333333H shl ecx, 2 xor eax, ecx At quick glance, the output seems *identical*, so only benefit from writing the subexpressions separately was to introduce two new variables. It depends on the reader if this is better thing or not. I prefer the construct as a single expression because for my tastes it communicates the intent better than the other alternative. Regarding the dependecy, I meant that the next line cannot be evaluated until the previous expression is assigned. I'm well aware that a compiler can blow out the scope and build operation tree and optimize it using dynamic programming, what not or whatever the compiler is programmed to do. Ofcourse, if we are interested in what kind of binary the compilers of today turn out with the function in question we can compile it, inspect the assembly output and draw our own conclusions. I checked the output with g++ 4.2 and visual c++ 2005 and the output was as predicted. (above snips are a good reference what to expect..)
From: Skybuck on 24 Aug 2006 14:10 > you guys are nuts, i've coding Assembler with various processors over the > years and all i had to worry about was byte order!, (big endian/little > endian). > the only time bit ordering had to be worked with is when you may > receive data from some sort of serial device or encryption code to get > it in the correct > order. this has nothing to do with the platform your on... > > i have copied over piles of source code that was originated on Big > Endian platforms over to little Endian that contain all kinds of bit > operations etc.. never had a problem and never had to change anything > other than byte order coding .. Ok I just can't resist to make a nasty joke, here it comes, brace yourself lol: "I am glad I never hired Jamie to code my airoplanes :P ;)" Bye, Skybuck.
From: Skybuck on 24 Aug 2006 14:18
jukka(a)liimatta.org schreef: > > Also the code does not seem to make much sense, at first glance anyway, > > and even after further inspection it's unclear what it does, how it > > works, or how it can be extended to more than 8 bits. > > I don't understand what is so unclear about how it works. > > > > Obfuscating is not allowed in my competitions, so this algorithm/code > > is disqualified ! ;) =D > > It is not obfuscated, the method name clearly states what the intent > is, if the name doesn't work out for you rename it to reverseBits() and > it should make more sense. That's why we NAME functions so that you can > determine what they are supposed to do. > > > > Thank you for participating, you are free to try again and this time in > > clearity, and then you can come again =D lol, have a nice day ! =D > > > What you mean? The code is quite clear, let's rehash: > > > > v = ((v >> 1) & 0x55555555) | ((v << 1) & 0xaaaaaaaa); > > > v = ((v >> 2) & 0x33333333) | ((v << 2) & 0xcccccccc); > > > v = ((v >> 4) & 0x0f0f0f0f) | ((v << 4) & 0xf0f0f0f0); > > > v = ((v >> 8) & 0x00ff00ff) | ((v << 8) & 0xff00ff00); > > > v = (v >> 16) | (v << 16); > > Look this from bottom-up, what does the LAST statement do? Do you always read code from bottom-up ? I-DON'T-THINK-SO ! > > v = (v >> 16) | (v << 16); > > It swaps the lowest and highest 16 bits. Then what does the statement > BEFORE that do? Yes, it swaps the 8 bit quantities within each 16 bit > quantity. And so on, until single bits are changing position inside 2 > bit groups. Ok, this explanation is clear. The code however is not. You actually had to read it bottom-up. Structured programming languages are read from top to bottom. Switching the lines would have made it a bit more understandable. But still. Each single line is still unclear. How can the variable 'v' be modified two times before assigning it to itself ? In theory this is impossible. There is only one variable V in the code. This can mean two things: 1. Temporarely variables are used. 2. The variable is modified in the first part of the line and then modified again in the second part of the line. Or when looking at the assembler a third possibility could exist: 3. The code is invalid and something else is generated to solve it. > In the end, well, the bits are reversed. It's so painstakingly obvious > when you read the code that I was taken completely by surprise that > such basic bit operations could be considered obfuscated! > > I'm baffled, I really am. G'day to you aswell sir. Lol, do you hang upside down from the ceiling or something lol ? Seriously. I shall now try to re-implement this idea to see if it really works and to see how crystal clear code could look like, and to see if it's still as fast ! (my first guess would be probably not ;)) Thanks for your clearification/post. Bye, Skybuck. |