From: Willow on 8 Jan 2010 14:16

Hi,

Sometimes when disassembling 32-bit x86 code I find this idiom:

    cmp   reg1,SomeLimit
    ja    DefaultLabel
    movzx reg2,byte [reg1+disp1]    ; What is this DOING ????
    jmp   dword near [reg2*4+disp2]

I designed my CRUDASM3 disassembler to recognize the idiom and deal with it, but why does it exist? What is the benefit of using a second-level table? I understand this idiom:

    cmp   reg1,SomeLimit
    ja    DefaultLabel
    jmp   dword near [reg1*4+disp1]

That makes sense to me; it's a straightforward C/C++ switch table, converted to assembly language. So why would a compiler want to use MOVZX before the indirect jmp?

I understand some older compilers would actually use REPNE SCASB to find a match; maybe the MOVZX idiom is just a modern counterpart to that. But I don't understand it: what's it doing and why is it needed?

The CRUDASM3 disassembler correctly recognizes switch/case tables that match the above idiom, with or without MOVZX. Most of the time. It's like a peephole optimizer: it keeps a log of the insns disassembled, and when the last one in the log is an indirect JMP, it goes through the log to identify the upper bound of the register, identify the default label, and deal with MOVZXs. Sometimes it gets confused. I made it correctly deal with JAE/JA, but if someone uses JB/JBE after a compare and the target does the indirect jump, it will get confused and won't recognize the switch/case table.

Does anyone out there have any experience making switch/case recognizers? What are the pitfalls to be wary of?

Thanks in advance! You can find my CRUDASM3 disassembler (complete with source code) here:
http://code.google.com/p/vm64dec/downloads/list

Willow
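
The log-walking approach described in the post above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `insn` record, the `opcode` enum, `switch_info`, and `recognize_switch` are hypothetical names invented for this sketch and are not taken from CRUDASM3's source. It assumes the compare, the conditional jump, the optional MOVZX, and the indirect JMP sit directly next to each other in the log, which real code need not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified decoded-instruction record (not CRUDASM3's). */
typedef enum {
    OP_OTHER, OP_CMP_IMM, OP_JA, OP_JAE, OP_MOVZX_BYTE, OP_JMP_TABLE
} opcode;

typedef struct {
    opcode   op;
    int      dst;    /* register compared (CMP) or written (MOVZX) */
    int      index;  /* index register of the memory operand (MOVZX, JMP) */
    uint32_t value;  /* immediate (CMP), displacement (MOVZX, JMP),
                        or branch target (JA/JAE) */
} insn;

typedef struct {
    bool     two_level;     /* true when a MOVZX byte table is present */
    uint32_t count;         /* number of selector values that reach the JMP */
    uint32_t default_label; /* target of the JA/JAE */
    uint32_t byte_table;    /* disp1; meaningful only when two_level */
    uint32_t jump_table;    /* displacement of the dword table */
} switch_info;

/* Walk backwards from an indirect JMP at log[n-1]; true on a match. */
static bool recognize_switch(const insn *log, size_t n, switch_info *out)
{
    if (n < 3 || log[n - 1].op != OP_JMP_TABLE)
        return false;

    size_t i = n - 1;
    int selector = log[i].index;
    out->jump_table = log[i].value;
    out->two_level = false;
    out->byte_table = 0;

    /* Optional second level: the JMP index comes from a byte table. */
    if (log[i - 1].op == OP_MOVZX_BYTE && log[i - 1].dst == selector) {
        if (n < 4)
            return false;           /* no room left for CMP + Jcc */
        --i;
        out->two_level = true;
        out->byte_table = log[i].value;
        selector = log[i].index;    /* the bound applies to this register */
    }

    const insn *jcc = &log[i - 1];
    const insn *cmp = &log[i - 2];
    if (cmp->op != OP_CMP_IMM || cmp->dst != selector)
        return false;

    if (jcc->op == OP_JA)
        out->count = cmp->value + 1;  /* selector <= limit falls through */
    else if (jcc->op == OP_JAE)
        out->count = cmp->value;      /* selector <  limit falls through */
    else
        return false;  /* JB/JBE-to-dispatch layouts are not handled here */

    out->default_label = jcc->value;
    return true;
}
```

Note the off-by-one difference between the two branches: with JA the limit itself is a valid selector, with JAE it is not. Handling the JB/JBE layout mentioned above would additionally require following the branch target to the block that contains the indirect jump, which this sketch does not attempt.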
From: H. Peter Anvin on 8 Jan 2010 14:24

On 01/08/2010 11:16 AM, Willow wrote:
> Hi,
> Sometimes when disassembling 32bit x86 code I find this idiom:
> cmp reg1,SomeLimit
> ja DefaultLabel
> movzx reg2,byte [reg1+disp1] ; What is this DOING ????
> jmp dword near [reg2*4+disp2]
>
> I designed my CRUDASM3 disassembler to recognize the idiom and deal
> with it but why does it exist?
> What is the benefit of using a second-level table?

It's trying to save space. The first table only uses one byte per entry, instead of four. If the first table covers a much larger number space than the second, this is a win in space, and quite possibly in time due to better cache locality.

-hpa
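
To make the space argument concrete, here is a minimal C sketch of the same two-level dispatch built by hand. Everything in it is an assumption made for illustration: the handlers, the `LIMIT` of 57, and the table contents are invented, and function pointers stand in for the jump targets. It is not the output of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handlers standing in for the case bodies of a switch. */
static int handle_default(void) { return -1; }
static int handle_digit(void)   { return 1; }
static int handle_space(void)   { return 2; }

typedef int (*handler_fn)(void);

enum { LIMIT = 57 };  /* highest selector covered by the tables ('9') */

/* Second-level table: one pointer per distinct target
   (the table read by "jmp dword near [reg2*4+disp2]"). */
static const handler_fn targets[] = {
    handle_default, handle_digit, handle_space
};

/* First-level table: one byte per selector value
   (the table read by "movzx reg2,byte [reg1+disp1]").
   Entries not listed are 0, which selects handle_default. */
static const uint8_t index_table[LIMIT + 1] = {
    [9] = 2, [10] = 2, [32] = 2,                     /* tab, newline, space */
    [48] = 1, [49] = 1, [50] = 1, [51] = 1, [52] = 1,
    [53] = 1, [54] = 1, [55] = 1, [56] = 1, [57] = 1 /* '0'..'9' */
};

static int dispatch(unsigned selector)
{
    if (selector > LIMIT)                  /* cmp reg1,SomeLimit / ja DefaultLabel */
        return handle_default();
    uint8_t slot = index_table[selector];  /* movzx reg2,byte [reg1+disp1] */
    assert(slot < sizeof targets / sizeof targets[0]);
    return targets[slot]();                /* jmp dword near [reg2*4+disp2] */
}

/* Table sizes in bytes for n selector values and k distinct targets. */
static size_t direct_table_bytes(size_t n, size_t ptr_size)
{
    return n * ptr_size;
}

static size_t two_level_bytes(size_t n, size_t k, size_t ptr_size)
{
    return n + k * ptr_size;
}
```

With these made-up numbers (58 selector values, 3 distinct targets, 4-byte pointers as in 32-bit code), a direct table would take 58 * 4 = 232 bytes, while the byte table plus the small pointer table takes 58 + 3 * 4 = 70 bytes. The saving grows as the selector range gets larger relative to the number of distinct targets.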
From: Rod Pemberton on 8 Jan 2010 15:21

"Willow" <wrschlanger(a)gmail.com> wrote in message
news:c3e87888-2280-489f-bfe5-9f3b30d5710b(a)u41g2000yqe.googlegroups.com...
> Sometimes when disassembling 32bit x86 code I find this idiom:
> cmp reg1,SomeLimit
> ja DefaultLabel
> movzx reg2,byte [reg1+disp1] ; What is this DOING ????
> jmp dword near [reg2*4+disp2]
>

Not the answers you're looking for:
a) converting one set of integers to another
b) zero extending reg2's byte to full size of the register

> I designed my CRUDASM3 disassembler to recognize the idiom and deal
> with it but why does it exist?
> What is the benefit of using a second-level table? I understand this
> idiom:
> cmp reg1,SomeLImit
> ja DefaultLabel
> jmp dword near [reg1*4+disp1]
> That makes sense to me, it's a straight-forward C/C++ switch table,
> converted to assembly language.
>

With GCC, both 3.x and 4.x, I have many pieces of code with switch() statements that will generate that, but usually only with optimization, such as -O2, enabled. I wasn't able to generate the first construct above.

> So why would a compiler want to use MOVZX before the jmp indirect?

There is some type of remapping or byte conversion involved. It may be compiler generated, e.g., an optimization, or user generated, e.g., a character array used in a switch().

With GCC 3.x, byte-based instructions are hard to generate. They usually involve a C type, such as "char", "unsigned char", or "unsigned short", which forces the compiler to clamp intermediate values to a byte or word. GCC tends to promote them to larger sizes as quickly as possible. OpenWatcom, on the other hand, will generate byte instructions and maintain variables as byte sized throughout the code. I didn't check OW's assembly code.

My primary guess would be that a "char" was used as an index to an array in a switch() statement. However, I tested a couple of those with GCC, with and without optimization, and didn't see the construct you asked about.

If you know what C code generated that and it was from a GCC compiler, could you post it? Thanks.

Rod Pemberton