From: Alexei A. Frounze on 18 Feb 2010 14:06

On Feb 18, 3:00 am, "wolfgang kern" <nowh...(a)never.at> wrote:
> peter asked:
>
> > Hi
>
> Hello,
>
> > I am adding a "page up" and "page down" button to the instruction
> > panel (http://peter-bochs.googlecode.com/files/screendump20100203.png)
> > For the page up button, I don't know how to calculate the address to
> > start to disassemble. For example, if I am disassembling 0x1000
> > address, how can I know what address I should disassemble after
> > pressing the "page up" button, it could be 0xff0, 0xff1, 0xff2.
> > Currently I use this method, but it has bug, around 50% will
> > disassemble the correct result: save the first 10 instructions into
> > an array, keep trying to disassemble the previous address (decrease
> > the address one by one to try), if those 10 instructions appears
> > again, that mean I have disassemble the correct address.
> > thanks
> > from Peter (cmk...(a)hotmail.com)
>
> I once tried this too and also used backwards byte stepping,
> but it will only be correct if it starts with the max. possible
> instruction-length (14/15 bytes depending on mode) and it heavy
> fails if code is mixed with data. It only remembered the last
> known start address of the first visible line for matching.
>
> So finally I just remember the address of the previous page start
> (even page size may vary with selected display layout) and use
> the cursor keys for moving back one byte at a time.
> This way allow to see otherwise hidden entry-points in addition.
>
> __
> wolfgang

However, if we make certain assumptions and arrangements, we can make
it work most of the time.

For example, if the disassembler knows that the memory content has come
from an executable file that has separate sections for code, constants
and variables/stack/heap and it can get all this info, then the code
section or maybe even individual subroutines from it may be
pre-disassembled (or just pre-parsed) from start to end and that would
give the disassembler knowledge of where every instruction in the code
section begins (for other sections you won't use this method).

Of course, this still won't work for programs with data mixed in the
code section and this won't help much with code that jumps into the
middle of its instructions, but for most typical applications this will
work just fine.

Alex
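
A minimal sketch of the pre-parsing idea above, in Python. Everything here is illustrative: `toy_insn_len` is a hypothetical length decoder for a made-up encoding (it is NOT x86) and stands in for whatever real length decoder the disassembler already has. `index_instruction_starts` and `page_up` are names invented for this example.

```python
from bisect import bisect_left

def toy_insn_len(code, off):
    """Hypothetical length decoder for a made-up ISA (NOT x86).

    The low two bits of the opcode give the length minus one;
    opcode 0xFF is treated as invalid. Returns None on failure.
    """
    op = code[off]
    if op == 0xFF:
        return None
    n = (op & 0x03) + 1
    return n if off + n <= len(code) else None

def index_instruction_starts(code, base, insn_len=toy_insn_len):
    """Walk the code section once, from its start, recording every
    instruction start address. Stops at the first undecodable byte."""
    starts = []
    off = 0
    while off < len(code):
        n = insn_len(code, off)
        if n is None:
            break
        starts.append(base + off)
        off += n
    return starts

def page_up(starts, top_addr, lines):
    """Return the address that should become the new top line when
    scrolling up by `lines` instructions from `top_addr`."""
    i = bisect_left(starts, top_addr)
    return starts[max(0, i - lines)]

# Example: a 12-byte "code section" loaded at 0x1000.
code = bytes([0x00, 0x01, 0xAA, 0x02, 0xBB, 0xCC,
              0x00, 0x03, 0x01, 0x02, 0x03, 0x00])
starts = index_instruction_starts(code, 0x1000)
new_top = page_up(starts, 0x1007, 2)
```

With the index built once per code section, "page up" becomes a table lookup rather than a guess. As noted above, this only holds when the section really contains nothing but code decoded from its true start.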
From: Jake Waskett on 18 Feb 2010 16:01

On Thu, 18 Feb 2010 11:06:19 -0800, Alexei A. Frounze wrote:
> However, if we make certain assumptions and arrangements, we can make it
> work most of the time. For example, if the disassembler knows that the
> memory content has come from an executable file that has separate
> sections for code, constants and variables/stack/heap and it can get all
> this info, then the code section or maybe even individual subroutines
> from it may be pre-disassembled (or just pre-parsed) from start to end
> and that would give the disassembler knowledge of where every
> instruction in the code section begins (for other sections you won't use
> this method). Of course, this still won't work for programs with data
> mixed in the code section and this won't help much with code that jumps
> into the middle of its instructions, but for most typical applications
> this will work just fine.
>
> Alex

I wonder whether a statistical approach might work...

Some instructions are much more likely to occur than others - for
example jumps (conditional and otherwise) are likely every few
instructions. So you could try a set of start addresses, compute some
measure of the likelihood of the sequence of disassembled instructions
occurring in normal code, and pick the most likely.

It wouldn't be foolproof, but it would probably be correct most of the
time.
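
One possible reading of the statistical idea, as a hedged Python sketch. The instruction set, the opcode table `TOY_OPS` and the frequency table `TOY_FREQ` are all invented for illustration (a real tool would need a real decoder and frequencies measured from real code), and mean log-likelihood per instruction is just one plausible choice of "measure".

```python
import math

# Hypothetical toy ISA (NOT x86): opcode -> (mnemonic, length in bytes).
TOY_OPS = {0x01: ("mov", 2), 0x02: ("add", 2), 0x03: ("jcc", 2),
           0x04: ("ret", 1), 0x05: ("hlt", 1)}

# Assumed relative frequencies in "normal" code; invented numbers.
TOY_FREQ = {"mov": 0.5, "add": 0.25, "jcc": 0.2, "ret": 0.04,
            "hlt": 0.005, "bad": 0.005}

def toy_decode(mem, off):
    """Decode one toy instruction; unknown opcodes become a 1-byte 'bad'."""
    return TOY_OPS.get(mem[off], ("bad", 1))

def score(mem, start, end):
    """Mean log-likelihood per instruction when decoding [start, end)."""
    total, count, off = 0.0, 0, start
    while off < end:
        name, n = toy_decode(mem, off)
        total += math.log(TOY_FREQ[name])
        count += 1
        off += n
    return total / count if count else float("-inf")

def most_likely_start(mem, candidates, end):
    """Pick the candidate start whose decoded sequence looks most typical."""
    return max(candidates, key=lambda s: score(mem, s, end))

# mov ?,5 / jcc 4 / mov ?,2 / ret -- operand bytes double as opcodes.
mem = bytes([0x01, 0x05, 0x03, 0x04, 0x01, 0x02, 0x04])
best = most_likely_start(mem, range(0, 4), 6)
```

In this toy input the misaligned starts decode operand bytes as rare instructions (`hlt`, `ret`), so they score lower than the true start at offset 0. Comparing candidates that decode to very different numbers of instructions is less clear-cut, which is one reason the result would not be foolproof.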
From: robertwessel2 on 18 Feb 2010 18:38

On Feb 17, 10:58 pm, peter <cmk...(a)gmail.com> wrote:
> Hi
> I am adding a "page up" and "page down" button to the instruction
> panel (http://peter-bochs.googlecode.com/files/screendump20100203.png)
>
> For the page up button, I don't know how to calculate the address to
> start to disassemble. For example, if I am disassembling 0x1000
> address, how can I know what address I should disassemble after
> pressing the "page up" button, it could be 0xff0, 0xff1, 0xff2.
>
> Currently I use this method, but it has bug, around 50% will
> disassemble the correct result: save the first 10 instructions into
> an array, keep trying to disassemble the previous address (decrease
> the address one by one to try), if those 10 instructions appears
> again, that mean I have disassemble the correct address.

To add to the other comments, while it's impossible in general, you can
do a pretty good job unless you run into data (or you have code jumping
into the middle of instructions), by backing up several dozen bytes (the
further you back up, the higher the odds of getting the alignment
correct - again, except in the case of data or jumps into the middle of
instructions) before your candidate location, and then disassembling
from there. Then try disassembling from there and the subsequent 14
bytes (since instructions cannot be more than 15 bytes long) up to where
you know you had a good instruction (presumably the top of the page
before the "PgUp"). Only those positions that align correctly with the
known instructions, and include only good instructions, are candidates.
You might still get more than one possible sequence, but the further
back you go, the less likely that will be (again subject to the above
limits). If you get no hits, you can try a shorter backwards step to
start your scanning.

You can also do this backwards, one instruction at a time. Try to
determine how many instructions (or bytes) back you can construct a
plausible sequence (basically constructing a tree of possible
instruction sequences), and then assume that the version of the last
instruction (that you're trying to back over) that leads to the longest
plausible sequence of preceding instructions is the most likely one.
IOW, at each position try positions (n-1)..(n-15) to see if it decodes
as a 1..15 byte instruction.

The going forward approach has the advantage of simplicity in terms of
data structures, but requires more care to deal with the inevitable
situations where you can't find a good sequence from a given starting
point (basically you have to start over and try a shorter backstep,
possibly binary searching to the longest one you can find). The going
backwards approach has the advantage of dealing well with the point
where you back up into non-instruction data, but requires a more
complex search.

You might apply some additional heuristics. For example, most x86-32
and -64 code does not contain data intermixed with code (although it's
certainly possible), and because of the way most OSs, linkers and
program loaders work, you can usually assume that code sections start on
page (4KB) boundaries. And there are certainly some more common
sequences that you might search for (a RET followed by enough NOPs to
get to a 16 byte boundary, or the sequence push ebp, mov ebp,esp, sub
esp,##, for example).

And the suggestion to give the user a chance to fiddle the alignment
manually is a good one.
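
The "going forward" approach described above might look roughly like this in Python. It is a sketch under loud assumptions: `toy_insn_len` is a hypothetical length decoder for a made-up encoding whose longest instruction is 4 bytes (so `MAX_LEN` is 4 here, where x86 would use 15), and the function names are invented for the example.

```python
MAX_LEN = 4  # longest instruction in the toy ISA; x86 would use 15

def toy_insn_len(mem, off):
    """Hypothetical length decoder for a made-up ISA (NOT x86).
    Low two bits of the opcode = length - 1; 0xFF is invalid."""
    op = mem[off]
    if op == 0xFF:
        return None
    n = (op & 0x03) + 1
    return n if off + n <= len(mem) else None

def decode_until(mem, start, known, insn_len=toy_insn_len):
    """Decode forward from `start`. Return the list of instruction
    starts if decoding hits only valid instructions and lands exactly
    on `known`; otherwise return None."""
    starts, off = [], start
    while off < known:
        n = insn_len(mem, off)
        if n is None:
            return None          # ran into an invalid instruction
        starts.append(off)
        off += n
    return starts if off == known else None  # overshoot = misaligned

def aligned_candidates(mem, known, backstep):
    """Back up `backstep` bytes from the known-good instruction, then
    try MAX_LEN consecutive start offsets and keep those that align."""
    first = max(0, known - backstep)
    out = []
    for start in range(first, min(first + MAX_LEN, known)):
        starts = decode_until(mem, start, known)
        if starts is not None:
            out.append(starts)
    return out

mem = bytes([0x00, 0x01, 0xFF, 0x02, 0xBB, 0xCC,
             0x00, 0x03, 0x01, 0x02, 0x03, 0x00])
candidates = aligned_candidates(mem, known=11, backstep=11)
```

Here three of the four start offsets survive, and all of them agree from offset 3 onward, which illustrates both points made above: more than one sequence can align, and the candidates tend to converge the further they run. If `aligned_candidates` returns an empty list, the caller would retry with a smaller `backstep`.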
From: James Harris on 18 Feb 2010 19:09

On 18 Feb, 06:43, "Alexei A. Frounze" <alexfrun...(a)gmail.com> wrote:
> On Feb 17, 8:58 pm, peter <cmk...(a)gmail.com> wrote:
>
> > Hi
> > I am adding a "page up" and "page down" button to the instruction
> > panel (http://peter-bochs.googlecode.com/files/screendump20100203.png)
>
> > For the page up button, I don't know how to calculate the address to
> > start to disassemble. For example, if I am disassembling 0x1000
> > address, how can I know what address I should disassemble after
> > pressing the "page up" button, it could be 0xff0, 0xff1, 0xff2.
>
> > Currently I use this method, but it has bug, around 50% will
> > disassemble the correct result: save the first 10 instructions into
> > an array, keep trying to disassemble the previous address (decrease
> > the address one by one to try), if those 10 instructions appears
> > again, that mean I have disassemble the correct address.

Given the information available ISTM important that the user be in no
doubt that it's the *user's* responsibility to align the disassembly.
Consider that the instruction on which processing is stopped may have
just been jumped to from somewhere else and be preceded by nothing
useful.

> You can't implement this correctly in general when your instructions
> have variable length and even may overlap with data. You may only try
> or let the user do this by allowing him to adjust the address on the
> first line by +/-1 in an easy manner.

Bumping the start byte by one is ideal if the disassembly is quick
enough. When the resulting instruction stream fails to perfectly meet
the known instruction some clear indication would be helpful. Of
course, any jump may really have been to the middle of an
instruction....

James