Prev: Apple II Debugger
Next: TMA Assembler?
From: J French on 20 Aug 2006 06:45 On 20 Aug 2006 03:17:02 -0700, "Skybuck" <skybuck2000(a)hotmail.com> wrote: > >Skybuck schreef: > >> Woops, sorry, I did not know comp.lang.asm.x86 is a moderated >> newsgroup, as I don't like moderated newsgroups I post it again and >> exclude comp.lang.asm.x86: >> >> Reversing bit order in delphi ? >> >> Hello, >> >> I think intel uses most significant bit first ? >Nope, sorry, I was wrong. >The byte order and bit order of data in RAM is both Little Endian for >intel. (LSB) > >The byte order and bit order of CPU registers is both Little Endian for >intel. (LSB) > >The drawing below was correct, but the comments are wrong. > >When transmitting bits and bytes intels reads from right to left. > >So intel starts reading at the lowest bit order/number. > >So that means least significant data bit/byte first. > >> [intel system, most significant bit first] >Should be: >[intel system, lest significant bit (and byte) first] It depends on how you look at the bits in a byte - physically the bits go from right to left inside a byte - Intel does not really read bits - it grabs bytes Bytes within a number go from left to right A pure bit stream, as you read from a credit card, is not necessarily the same as a series of bits in a number of bytes I ran into that problem with bitmap graphics in the DOS days
From: f0dder on 20 Aug 2006 08:11 Skybuck wrote: > f0dder schreef: > >> A 256-byte LUT can easily fit in any CPU L1 cache. Typically a >> cacheline is 32 bytes long, so you'd maximally get (256/32) 8 cache >> misses before the entire LUT is cached. I'm pretty sure that this is >> hard to beat no matter which fancy instruction set you're going to >> use... > > Hello, > > Thank you for trying to explain it. > > I still do not quite understand it completely maybe you can clearify > something: > > Does the above theory only apply to the xlat instruction or any > instruction/memory access that uses the lookup table technique ? Any instruction that references memory will cause an entire cache line to be read (or written?). Iirc it's been like this since at least the 486. Things are a bit more complex than this, though, especially on recent architectures... check the intel and AMD docs, and Agner Fog's optimization manuals.
From: Skybuck on 20 Aug 2006 08:27 J French schreef: > On 20 Aug 2006 03:17:02 -0700, "Skybuck" <skybuck2000(a)hotmail.com> > wrote: > > > > >Skybuck schreef: > > > >> Woops, sorry, I did not know comp.lang.asm.x86 is a moderated > >> newsgroup, as I don't like moderated newsgroups I post it again and > >> exclude comp.lang.asm.x86: > >> > >> Reversing bit order in delphi ? > >> > >> Hello, > >> > >> I think intel uses most significant bit first ? > > >Nope, sorry, I was wrong. > > >The byte order and bit order of data in RAM is both Little Endian for > >intel. (LSB) > > > >The byte order and bit order of CPU registers is both Little Endian for > >intel. (LSB) > > > >The drawing below was correct, but the comments are wrong. > > > >When transmitting bits and bytes intels reads from right to left. > > > >So intel starts reading at the lowest bit order/number. > > > >So that means least significant data bit/byte first. > > > >> [intel system, most significant bit first] > > >Should be: > > >[intel system, lest significant bit (and byte) first] > > It depends on how you look at the bits in a byte > - physically the bits go from right to left inside a byte > - Intel does not really read bits - it grabs bytes > > Bytes within a number go from left to right Not on an intel system ;) On an intel system the bytes go from right to left. We humans however like to look at the big numbers and then at the little numbers most of the time. For example 1.324.234 | We look at the first digits ;) Intel stores this from right to left ends reading here <------ starts reading here The arror does not indicate where the reading starts. The arrow does indicate HOW intel reads. ;) Bye, Skybuck.
From: J French on 20 Aug 2006 09:47 On 20 Aug 2006 05:27:10 -0700, "Skybuck" <skybuck2000(a)hotmail.com> wrote: <snip> >> Bytes within a number go from left to right > >Not on an intel system ;) > >On an intel system the bytes go from right to left. > >We humans however like to look at the big numbers and then at the >little numbers most of the time. >For example 1.324.234 >We look at the first digits ;) >Intel stores this from right to left >ends reading here <------ starts reading here >The arror does not indicate where the reading starts. >The arrow does indicate HOW intel reads. Yes, there is a strange logic in what you say - the actual bit layout is different from the 'sort of' order of access But it is only numeric access I had problems with 2 colour bitmaps under DOS, and I am very wary of the distinction between physical and logical layout.
From: Skybuck on 20 Aug 2006 10:27
Day two of the competition. Alan provided new code, but it's simply the same, none the less it was included. A new contender has entered the competition, Jukka/Steve Baker/Skybuck joined effort. It's fast, it's correct, but it's obfuscated, so it's disqualified, none the less it's also included. A new contender has entered the competition, Skybuck's XLAT implementation =D ( with a little bit of help from Martin ;) :) Actually this was somebody elses idea but he didn't provide an implementation... to bad for him lol =D ) The performance chart on day two: 1st place, 107.180.332 calls, Herbert's implementation/idea wins (but disqualified for cheating =D) 2st place, 90.123.101 calls, (Jukka + Steve Baker + Skybuck) implementation. (but disqualified for obfuscation =D) 3st place, 84.320.368 calls, Skybuck's XLAT implementation. (but diqualified for cheating =D) 4st place, 81.954.163 calls, Skybuck's second and third implementation. (shared place) 5st place, 77.748.964 calls, Jukka and Steve (with a little help of skybuck to fix bugs ;)) 6st place, 38.171.975 calls, Frank Kotler implementation. 7st place, 29.151.010 calls, Skybuck's first implementation. 8st place, 21.531.280 calls, Alan Lloyd's basm (wrapped) implementation. (Had to use a wrapper to fix it) 9st place, 18.937.822 calls, Alan Lloyd's pure delphi implementation. (shared place) (Bummer dude, you end up last, good try though ;) ) Conclusion on day two: Herbert's Table LookUp + Skybuck's Reverse Bit Order third implementation wins for speed and clearity and purity (only delphi needed). Here is test program version 0.02 with some new implementations added: *** Begin of Part 1 of Code *** program TestReversingBitOrder; {$APPTYPE CONSOLE} { Reversing Bit Order Test program version 0.01 created on 19 august 2006 by Skybuck Flying. First I will create my own version, let's see how would I have done it... probably with some and's and shifts or so ;) Or maybe a bitset or some of that code ;) First I should build in some checking code to verify that the method is correct/sound ;) Writing this code was a lot of fun. Benchmark method is implemented Verification method is implemented. Different implementations are implemented. The fastest algorithm/idea is from Herbert. But he cheating lol. He did not supply a method/algorithm to generate the look up table, nor did he supply a method to calculate the reversed bit order value etc. The most amazing thing is that my third implementation is the fastest... well not that many contenders anyway. My second implementation is the runner up. Only Alan Lloyd and Frank Kotler dared to enter this speed competition, maybe we'll see more contenders ? :) Here are the result/output of the program: <snipped away, see new result/output and conclusion in version 0.02> } { version 0.02 created on 20 august 2006 by Skybuck Flying + New implementations are to be added ;) :) Intel memory and register layout is as follows: Memory layout: Bit 31...Bit 0 Byte 3...Byte 0 <-------------- Longword 10 /|\ | | | Longword 0 CPU Register layout: Bit 31...Bit 0 <-------------- The arrow indicates HOW intel reads. It starts reading at the right and proceeds to the left. It starts reading at the bottom and proceeds to the top. Intel reads from right to left, and down to top. So intel's world is up side down ;) However the most significant byte and bit is still on the left. Final conclusion: Intel's bit and byte order for both memory and registers is LSB. Same changes to the code have been made to reflect these facts. New conclusion and report: A new contender has entered the arena =D It's a combination of Jukka, Steve Bakker, and Skybuck ;) It's the fastest performing method to flip bits so far, not counting the memory access look up ;) which remains the fastest. However it appears to be obfuscated, so it's disqualified as well ;) The new results/output is: 0 00000000 00000000 0 1 00000001 10000000 128 2 00000010 01000000 64 3 00000011 11000000 192 4 00000100 00100000 32 5 00000101 10100000 160 6 00000110 01100000 96 7 00000111 11100000 224 8 00001000 00010000 16 9 00001001 10010000 144 10 00001010 01010000 80 11 00001011 11010000 208 12 00001100 00110000 48 13 00001101 10110000 176 14 00001110 01110000 112 15 00001111 11110000 240 16 00010000 00001000 8 17 00010001 10001000 136 18 00010010 01001000 72 19 00010011 11001000 200 20 00010100 00101000 40 21 00010101 10101000 168 22 00010110 01101000 104 23 00010111 11101000 232 24 00011000 00011000 24 25 00011001 10011000 152 26 00011010 01011000 88 27 00011011 11011000 216 28 00011100 00111000 56 29 00011101 10111000 184 30 00011110 01111000 120 31 00011111 11111000 248 32 00100000 00000100 4 33 00100001 10000100 132 34 00100010 01000100 68 35 00100011 11000100 196 36 00100100 00100100 36 37 00100101 10100100 164 38 00100110 01100100 100 39 00100111 11100100 228 40 00101000 00010100 20 41 00101001 10010100 148 42 00101010 01010100 84 43 00101011 11010100 212 44 00101100 00110100 52 45 00101101 10110100 180 46 00101110 01110100 116 47 00101111 11110100 244 48 00110000 00001100 12 49 00110001 10001100 140 50 00110010 01001100 76 51 00110011 11001100 204 52 00110100 00101100 44 53 00110101 10101100 172 54 00110110 01101100 108 55 00110111 11101100 236 56 00111000 00011100 28 57 00111001 10011100 156 58 00111010 01011100 92 59 00111011 11011100 220 60 00111100 00111100 60 61 00111101 10111100 188 62 00111110 01111100 124 63 00111111 11111100 252 64 01000000 00000010 2 65 01000001 10000010 130 66 01000010 01000010 66 67 01000011 11000010 194 68 01000100 00100010 34 69 01000101 10100010 162 70 01000110 01100010 98 71 01000111 11100010 226 72 01001000 00010010 18 73 01001001 10010010 146 74 01001010 01010010 82 75 01001011 11010010 210 76 01001100 00110010 50 77 01001101 10110010 178 78 01001110 01110010 114 79 01001111 11110010 242 80 01010000 00001010 10 81 01010001 10001010 138 82 01010010 01001010 74 83 01010011 11001010 202 84 01010100 00101010 42 85 01010101 10101010 170 86 01010110 01101010 106 87 01010111 11101010 234 88 01011000 00011010 26 89 01011001 10011010 154 90 01011010 |