Two Click disassembly/reassembly [ASM]

From: randyhyde@earthlink.net on 26 Jan 2006 19:32

Charles A. Crayne wrote:
> On 26 Jan 2006 12:09:26 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :What happens when they didn't use a scaled index addressing mode?
> :Perhaps they've used induction across a loop and they automatically add
> :4 to EAX on each iteration of that loop.
>
> And just how is this straw man any more formidable than his predecessor,
> which I swept away with ease?

"Swept under the table" is a more appropriate metaphor.
So tell me, because *I* don't know the answer to this one, just exactly
*how* are you going to determine that the code is indexing into a table
of four-byte pointers? You can provide all the *simple* examples you
want, but in the end, any code that handles trivial examples is *not*
going to work on real-world code. Just looking at the scaled indexed
value is *not* going to tell you the information you need to know.

Suppose, for example, I have a table of record values and each record
is eight bytes long. Now suppose that I use an addressing mode like
this: table[eax*8+4]. Okay. Now how about a table with 32-byte
entries into which I must manually compute the index (because there is
no *32 scaled index mode)?

Now suppose that instead of a nice array of table entries, I have a
two-dimensional table of these objects. Or how about a linked list?
Or maybe parallel arrays? Or any other data structure I can dream up.
Do you honestly think you can come up with a generic algorithm that
will figure this stuff out and correctly map the offsets onto the new
architecture? I'll be the first to nominate you for the Turing Award
if you do this.
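
To make the record-table cases above concrete, here is a minimal Python sketch. The function names and the load address are hypothetical and do not come from any tool discussed in this thread. It computes x86-style effective addresses and shows that table[eax*8+4], a manually shifted index, and a pointer stepped by induction can all touch the same bytes, so the scale factor in the addressing mode does not by itself reveal the element size.

```python
def effective_address(base, index, scale=1, disp=0):
    """x86-style effective address: base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    return base + index * scale + disp


TABLE = 0x1000  # hypothetical load address of the table

# Field at offset 4 of 8-byte records, accessed as table[eax*8+4].
record_field = [effective_address(TABLE, i, 8, 4) for i in range(4)]


def entry_32(i, field_offset=0):
    """32-byte entries: there is no *32 mode, so the index is
    premultiplied by hand (the equivalent of shl eax, 5)."""
    return effective_address(TABLE, i << 5, 1, field_offset)


def walk(start, stride, count):
    """Induction variable: a pointer bumped by `stride` each iteration,
    with no scaled index anywhere in the instruction stream."""
    addresses = []
    addr = start
    for _ in range(count):
        addresses.append(addr)
        addr += stride
    return addresses


# The stepped pointer visits exactly the addresses of the scaled form.
stepped = walk(TABLE + 4, 8, 4)
```

Here `stepped` and `record_field` hold identical address lists even though only one of them ever uses a scale factor, which is the information a translator would have to recover some other way.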

> However, since you obviously didn't
> understand, let me go through it again, in more detail.

Oh, I understand exactly what you're trying to claim. It's just that
I've seen the research and I know the blind alley you're headed down.
Been there, done that. It was just a long time ago is all.

>
> The tool flags any statement which calls, or jumps to, an address which is
> not an unmodified label. The programmer can either add a new label to
> the x86 source and change the call/jump target address accordingly [as in
> your previous example], or change the target address calculation where
> it occurs in the loop [as in your example above] and flag the call/jmp
> statement as having been approved by the programmer.

Are you sure you can always flag any statement which calls, or
jumps....?
If you restrict the source language to the point where the automated
code can deal with it (without running into the undecidability
problem), then the resulting code will require careful review by a
human. And I stick by my assertion that translating the code from
scratch will probably be easier than modifying the produced code for
all but trivial applications.
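
As a rough illustration of the flagging pass being debated, the following Python sketch marks every call or jump whose operand is not a plain, known label. The source syntax it accepts (one statement per line, labels written as "name:" on their own line, ";" comments) and all names in it are assumptions made for the example; it is not the tool either poster has in mind.

```python
import re

# Hypothetical, simplified syntax: one statement per line, labels written
# as "name:" on their own line, comments introduced by ";".
LABEL_DEF = re.compile(r"^\s*([A-Za-z_.$@][\w.$@]*):")
BRANCH = re.compile(r"^\s*(call|j[a-z]{1,4})\s+(.+?)\s*(?:;.*)?$", re.IGNORECASE)


def flag_computed_branches(lines):
    """Return (line_number, operand) for each call/jump whose operand
    is not a plain, known label."""
    labels = {m.group(1) for m in map(LABEL_DEF.match, lines) if m}
    flagged = []
    for number, line in enumerate(lines, start=1):
        m = BRANCH.match(line)
        if m and m.group(2) not in labels:
            flagged.append((number, m.group(2)))
    return flagged


SAMPLE = [
    "start:",
    "    mov eax, 4",
    "    jmp done",               # plain label: not flagged
    "    call [table+eax*4]",     # table dispatch: flagged
    "    jmp eax ; computed target",
    "    push done",              # push/ret transfers control too,
    "    ret",                    # but this syntactic check never sees it
    "done:",
    "    ret",
]

flagged = flag_computed_branches(SAMPLE)
```

The check is purely syntactic: it catches the table dispatch and the register jump, while the push/ret pair in the sample passes unflagged, which is one small instance of the "can you always flag it?" objection.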

BTW, have you looked at the output of the 8080->8086 translator that
Intel wrote? Noting that the semantics of the 8080 instructions nearly
match the semantics of the 8080 subset on the 8086, this should have
been a fairly easy program to write that generated decent code. The
code was *terrible*. The saving grace is that the 8086 ran a *whole*
lot faster than the 8080, so you could get away with the terrible code.
And that was without *any* of the problems associated with porting to
a different architecture that we're now discussing.

>
> :Except for trivial demo
> :apps, it's less work to do the conversion by hand when it's all said
> :and done.
>
> It is difficult to believe that you typed this with a straight face, as it
> should be obvious to the casual observer that the larger the body of code
> to be converted, the more valuable the tool.

B.S. The larger the body of code, the more bugs wind up in the result
and the more difficult it is to eradicate those bugs. Do you really
want to review 250,000 lines of code with an average of 50 bugs (or
whatever) per 1,000 instructions? This is the type of code that you
throw away and start over on. And unless the code is written in a
trivial manner, this is the type of error rate you can expect from a
conversion program.

>
> :Well, you and Rene can feel free to spend all your free time working on
> :a tool.
>
> Unlike Betov, I do not have a large body of RosAsm source code which I wish
> to convert to another architecture, and therefore will not be working on
> that tool. However I do have a very similar situation, for which I am
> currently writing a conversion tool.

Well then, I strongly suggest that you do your research first rather
than just hacking out the first idea that comes along (as Rene
typically does).

>
> As you may know, some years ago, I wrote a text adventure game engine, for
> which my wife wrote a number of game scripts. More recently, I ported the
> engine from DOS to Linux, but in an attempt to reach users on other
> hardware platforms, I have decided to port the scripts to the Inform
> compiler.

Completely different animal. But I wish you luck.

>
> Most of the time I have spent on this project has been devoted to learning
> the Inform way of doing things, which I still have not completely
> mastered. However, the 16 hours, or so, which I have spent writing the
> tool, has already saved me at least 10 times that investment, as compared
> to hand conversion.

Again, a completely different animal. If inform or your scripting
language was anywhere near as semantically complex as 80x86 assembly
language, you'd have a point. But it's not. Ultimately, what Rene is
probably thinking about is simply converting Intel mnemonics to their
closest relative on the target processor. That simply won't work.

> -- Chuck

From: Frank Kotler on 26 Jan 2006 20:38

randyhyde(a)earthlink.net wrote:

....
[RosAsm disassembler]
> it's amazing that
> it broke so easily on a simple program like "99 bottles").

Is this the same version of "99 bottles" that breaks HLA on Linux?

Best,
Frank

From: Charles A. Crayne on 26 Jan 2006 21:34

On Fri, 27 Jan 2006 00:15:02 +0000 (UTC)
Alex McDonald <alex_mcd(a)btopenworld.com> wrote:

:someCodePtr dd $ ; "pointer to self" address
:someCode ... ; code to execute
:
: mov eax, someCodePtr ; fetch the code address
: add eax, # 4 ; point at someCode
: jmp eax ; and call it
:
:What's the problem with it?

It can be replaced by 'jmp someCodePtr+4'

-- Chuck

From: Charles A. Crayne on 26 Jan 2006 22:26

On 26 Jan 2006 16:32:51 -0800
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:

:Now suppose that instead of a nice array of table entries, I have a
:two-dimensional table of these objects. Or how about a linked list?
:Or maybe parallel arrays? Or any other data structure I can dream up.

Just for the record, how many instances of such structures do you expect
to find in Betov's source code?

:Do you honestly think you can come up with a generic algorithm that
:will figure this stuff out and correctly map the offsets onto the new
:architecture?

Probably not a generic algorithm, although even that might be possible,
but I could probably come up with some pretty good heuristics. However,
that is neither here nor there, as the intent of the proposed tool is
merely to speed up the dull, routine, time consuming transliterations,
thus allowing a human programmer the freedom to see through all of the
obfuscations which you are trying to introduce into Betov's code.

:I stick by my assertion that translating the code from
:scratch will probably be easier than modifying the produced code for
:all but trivial applications.

Do you consider Betov's code to consist of trivial applications?
[Hint: This is a trick question.]

:Do you really
:want to review 250,000 lines of code

I'd rather review 250,000 lines of code, than write them from scratch.

:with an average of 50 bugs (or whatever) per 1,000 instructions

Is this the error rate you typically see from your own product? Or is this
just part of the 79.25% of all statistics which are made up on the spot?

:If inform or your scripting
:language was anywhere near as semantically complex as 80x86 assembly
:language, you'd have a point. But it's not.

A bold statement from one who has seen only snippets of my scripting
language, and probably not much more of the Inform language. However, the
x86 issue is immaterial, since Betov, like yourself, already has that part
of the tool written.

-- Chuck

From: Dragontamer on 26 Jan 2006 23:45

Charles A. Crayne wrote:
> On 26 Jan 2006 16:32:51 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :Now suppose that instead of a nice array of table entries, I have a
> :two-dimensional table of these objects. Or how about a linked list?
> :Or maybe parallel arrays? Or any other data structure I can dream up.
>
> Just for the record, how many instances of such structures do you expect
> to find in Betov's source code?

Question:

Why hasn't big-endian vs little-endian been brought up yet?

Especially with the onslaught of networked programs today, and even
programming languages designed *for* the internet?

Even the *data* itself may have to be converted for the code to execute
correctly. I don't see it impossible for a C compiler to output pre-made
data so that it doesn't have to be run during runtime.

Ex: hton would be #if platform is little-endian, and then have macro
code to reverse the IP address or whatever into network byte order.

And then comes the problem of detecting whether the data needs
to be reversed or if it doesn't need to be reversed.
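
A minimal Python sketch of that detection problem, using the standard struct module; the sample address 192.168.0.1 and the helper names are chosen purely for illustration. It shows the same 32-bit value laid out in both byte orders, and that four raw bytes in a data section decode to different integers depending on which order the reader assumes.

```python
import struct


def to_network_order(value):
    """Pack a 32-bit unsigned value big-endian (network byte order)."""
    return struct.pack(">I", value)


def to_little_endian(value):
    """Pack a 32-bit unsigned value the way a little-endian CPU stores it."""
    return struct.pack("<I", value)


ip = 0xC0A80001  # 192.168.0.1 as a 32-bit integer (illustrative value)

network_bytes = to_network_order(ip)
native_bytes = to_little_endian(ip)

# Four bytes found in a data section carry no tag saying which they are:
# an ordinary little-endian integer, or a constant swapped at build time.
raw = native_bytes
as_little = struct.unpack("<I", raw)[0]
as_big = struct.unpack(">I", raw)[0]
```

Both readings of `raw` are valid 32-bit integers, so a translator looking only at the bytes has nothing to tell it whether to swap them for the target architecture.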

Other trivial examples:
Any self-modifying code, from encrypting/decrypting memory
for security reasons to compressed code would not translate
so easily from platform to platform.

And I know there are at least 3 or 4 "real world" programs on
my computer that do the above.

> :If inform or your scripting
> :language was anywhere near as semantically complex as 80x86 assembly
> :language, you'd have a point. But it's not.
>
> A bold statement from one who has seen only snippets of my scripting
> language, and probably not much more of the Inform language. However, the
> x86 issue is immaterial, since Betov, like yourself, already has that part
> of the tool written.

A nicely designed scripting language *would not be* as complex as 80x86.

The best scripting languages follow the principle "less is more", such
as Lua or Scheme or python.

I may have not seen this "inform" myself, but if the designers had
half-a-brain, they'd try to keep the language simple and efficient. They
would not design a language to be overly complex with 30 years of
extensions that boots up in a 16-bit mode that can address more than
1 meg of memory only if you toggle a switch in the keyboard port on
bootup, while trying to conform to both CISC and RISC philosophies.

To be fair, that was more against the architecture itself :) So for a
better example:

A good language designer would not force you to stop using Floating
point instructions when you want to add 4 words together at the same
time.

--Dragontamer
