Two Click disassembly/reassembly [ASM]

From: randyhyde@earthlink.net on 5 Feb 2006 13:04

Charles A. Crayne wrote:
> On 3 Feb 2006 18:56:39 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :Now let's consider *every* line of RosAsm code ever written. Is that a
> :sufficient amount?
>
> Insufficient data. However, the 250,000 lines that you mentioned
> previously would certainly qualify. What do you like for a productivity
> figure for such work? Even at 500 lines/day, the project would take 500
> days. Now, if a tool could translate 80% of those lines, the savings would
> be 400 days.

If it translated them *perfectly* and you didn't have to review *each
line*, that would be the savings. If the remaining 20% was all in one
spot, rather than interspersed throughout the code, you might get those
savings. If the correctness of that 80% didn't depend on the other 20%,
you might get those savings. Shall I go on? Surely, someone with the
experience you claim to have doing software conversions realizes the
fundamental flaw with your simplistic claim here?
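
To put rough numbers on that objection, here is a minimal sketch of the arithmetic. The 250,000 lines, 500 lines/day, and 80% figures come from the posts above. The `review_rate` parameter is purely hypothetical (nobody in this thread measured it); it only shows how quickly the claimed savings shrink once the translated lines have to be checked by hand.

```python
def days_saved(total_lines, port_rate, translated_fraction, review_rate=None):
    """Estimate the days saved by a translator that handles part of the code.

    port_rate   -- lines per day ported by hand (500 in the thread)
    review_rate -- lines per day of translator output a person can check;
                   None means the output is trusted without any review
                   (hypothetical parameter, not a figure from the thread)
    """
    translated_lines = total_lines * translated_fraction
    saved = translated_lines / port_rate
    if review_rate is not None:
        # Every translated line still costs review time.
        saved -= translated_lines / review_rate
    return saved

naive = days_saved(250_000, 500, 0.80)               # the 400-day claim
with_review = days_saved(250_000, 500, 0.80, 1_000)  # review at twice porting speed
break_even = days_saved(250_000, 500, 0.80, 500)     # review as slow as porting
```

Under these assumptions the savings halve when review runs at twice the porting speed, and vanish when reviewing a line takes as long as porting it.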

>
> :Download the demo and give it a shot.
>
> Impractical, in my case, since I don't do Windows, anymore. However, I did
> download and read the manual.

And all your experience over all these years didn't tell you not to buy
into the marketing hype?

>
> :No offense, but when the program breaks on something as simple as this,
> :I think it's fair to forgive me for not having a whole lot of
> :confidence in the operation of the rest of the program.
>
> You didn't think that 'jmp jmpTbl[eax*4-8]' was a simple thing, in your
> previous posts, when you were using a similar construction as the number
> one reason why such a tool is impractical.

Uh, you're confused. The construct I tested the code with had *nothing*
to do with the earlier example I gave as a problem. If you don't
understand the difference between the two examples, then as Rene might
say, you should learn a little more 80x86 assembly language
programming. The example I gave to PortASM86 is a *very* typical switch
statement implementation. Indeed, the PPC code it generates for that
jmp (as I would expect) is quite correct. It is the fact that it can't
handle the *labels* that is a bit of a concern here.

Just so you realize this, there is a *big* difference between:

jmp jmpTbl[eax*4-8]

and

mov eax, jmpTbl[eax*4]
sub eax, 8
jmp eax

If you don't understand the difference, you need to think about this
problem for a while. The former is fairly easy to convert (at least
across 32-bit architectures). The latter is extremely difficult to
convert.
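
A minimal sketch of the difference, modeled in Python rather than assembly. All addresses and label names here are made up for illustration. In the first form the -8 adjusts the offset into the table, so every jump target is still one of the table entries (a label). In the second form the -8 adjusts the code address itself, so the target is "8 bytes before a label", which only means something in terms of x86 instruction encodings.

```python
# Hypothetical addresses of three case labels in an x86 code segment.
LABELS = {"case2": 0x1000, "case3": 0x1020, "case4": 0x1048}
JMP_TBL_BASE = 0x2000  # hypothetical address of jmpTbl

# jmpTbl holds one 4-byte code address per case, in label order.
MEMORY = {JMP_TBL_BASE + 4 * i: addr for i, addr in enumerate(LABELS.values())}

def read_dword(address):
    """Fetch the 4-byte value stored at a table address."""
    return MEMORY[address]

def jmp_indexed(eax):
    """jmp jmpTbl[eax*4-8]: the -8 is part of the table offset."""
    return read_dword(JMP_TBL_BASE + eax * 4 - 8)

def jmp_computed(eax):
    """mov eax, jmpTbl[eax*4] / sub eax, 8 / jmp eax: the -8 moves the code address."""
    return read_dword(JMP_TBL_BASE + eax * 4) - 8

label_addresses = set(LABELS.values())
indexed_targets = [jmp_indexed(n) for n in (2, 3, 4)]    # case values start at 2
computed_targets = [jmp_computed(n) for n in (0, 1, 2)]  # table indexed from 0
```

Every entry of `indexed_targets` is a label address, so a translator can map it to the corresponding label in the new code. None of the `computed_targets` is a label, so there is nothing to map it to without knowing the byte layout of the original x86 instructions.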

> In addition, you should not
> leap to judgement before trying the jump table hint described in the
> manual. All they ask is that you tell the tool the beginning and end of
> the jump table.

In a typical Win32 assembly program (e.g., RosAsm) there are going to
be a *ton* of indirect jumps (e.g., each call to the Win32 API). Does
the phrase "cry wolf" have any meaning to you? The fact that there
will be a ton of false warnings is going to *tremendously* increase the
workload on that 80% of the code that was converted correctly.

And once again, I point out, if you've got to make modifications to the
original source code to make it translate correctly, you've got
problems. Such conversions are *bound* to introduce new defects. And,
again, you're left with the fact that you have to maintain *two* copies
of the code when the translation is complete, because you'll never be
able to retranslate the original code without repeating all the work.

>
> :IIRC, the documentation claims that the program handles the parity
> :flag.
>
> Nor have you shown any evidence that it doesn't, as the conditional jumps
> seem to have been generated correctly. It would be interesting to see if an
> instruction must set hint would clean up the 'unknown register' issue.

Please tell me what that code does, then? What is all that extra cruft
(which I *seriously* doubt is syntactically correct)? Even still, why
do I even have to look at this. Again, if this is part of the 80% of
the code that translated correctly, I still have to look at it. So the
savings you're claiming just don't follow.

>
> :I know
> :nothing about NCR machines, so I have no idea how well NCR instruction
> :semantics map to IBM 360 instructions.
>
> Very poorly. To begin with, where the basic addressability of the IBM 360
> was four 8-bit bytes within a 32-bit word, NCR used 12-bit slabs, each
> of which contained either two 6-bit characters, or three 4-bit binary
> coded decimal digits. It had no registers, and no floating point
> instructions. We had to write data conversion programs for each
> application system, and transfer the data via 7-track tapes, with no
> tape labels. The list goes on and on.

IOW, it was a lot of reengineering, not simply a "conversion". That's
quite a bit different from what Rene (or even MicroAPL) is proposing.
Different problems. There is *no question* that an x86 program can be
reengineered to the PowerPC (or some other processor). There is also no
question that you can write some tool (e.g., PortASM/86) to help
with the job. But it's not going to do 80% of the work for you.

> :BTW, I notice that MicroAPL has an assembly->C converter. I wonder how
> :well it works (not having downloaded anything to try it out).
>
> I, too, am very interested in knowing how good it is, as I have always
> considered such a tool to be quite difficult.

Actually, back when I looked at the problem (two years ago), the
conversion to C was a bit easier than the conversion to
PowerPC. But the result was still too big and slow to be of practical
use. Effectively, what you wound up doing was calling a bunch of C
functions that emulated each of the machine instructions. Even with
data flow analysis, the result was going to be impractical. The few
examples on MicroAPL's web site seem to suggest that miracles occur.
But, of course, the examples they give are the ones where the tool
shines and they don't bother posting the code where the problems arise.
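
To give a feel for why "one function per machine instruction" gets big and slow, here is a minimal sketch of what emulating a single 32-bit `add` involves once the flags have to be right. It is written in Python for brevity, not C, and the `Cpu` class and `emulate_add` name are hypothetical; this is not the output of any real converter. The auxiliary-carry flag is left out to keep it short.

```python
MASK32 = 0xFFFFFFFF

class Cpu:
    """Hypothetical emulated x86 state: two registers plus the main flags."""
    def __init__(self):
        self.reg = {"eax": 0, "ebx": 0}
        self.flags = {"CF": False, "ZF": False, "SF": False,
                      "OF": False, "PF": False}

def emulate_add(cpu, dst, src):
    """Emulate 'add dst, src' on 32-bit registers, including the flags."""
    a, b = cpu.reg[dst], cpu.reg[src]
    full = a + b
    result = full & MASK32
    cpu.reg[dst] = result
    cpu.flags["CF"] = full > MASK32                  # unsigned carry out
    cpu.flags["ZF"] = result == 0
    cpu.flags["SF"] = bool(result >> 31)             # sign bit of the result
    # Signed overflow: both operands differ in sign from the result.
    cpu.flags["OF"] = bool(((a ^ result) & (b ^ result)) >> 31)
    # Parity flag: set when the low byte has an even number of 1 bits.
    cpu.flags["PF"] = bin(result & 0xFF).count("1") % 2 == 0

cpu = Cpu()
cpu.reg["eax"], cpu.reg["ebx"] = 0x7FFFFFFF, 1
emulate_add(cpu, "eax", "ebx")   # one x86 instruction, many operations
```

Data flow analysis can drop the flag updates that nothing ever reads, but whatever survives is still far more work than the single instruction it replaces.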

> Of course, as you know,
> merely converting it to C does not, by itself, make it portable.

Certainly there are OS issues and other problems. But I will suggest
this: converting it to C is going to make it a *heck* of a lot more
portable than conversion to some other assembly language.

The problem with the conversion process, and why the conversions are so
ugly, is that the programming paradigm for C, PowerPC assembler, and
other languages is quite a bit different from x86 assembly. E.g., you
do *not* write PPC assembly the same way you write x86 assembly. Any
attempt to convert x86 assembly on a line by line basis to PPC assembly
(or any other language) is going to produce bloat like you wouldn't
believe. The few simple examples I posted in the last posting should
demonstrate this. To someone who knows PPC assembly language, it's
obvious that code is "not right." For example, you *don't* access
memory left and right when writing RISC code. Yet a straight-forward
conversion of x86 to PPC does exactly this. Even though there are lots
of registers available to avoid this problem. Ditto for C. You don't
have things like a stack in C, so you don't program in C the same way
you do in assembly language (which is one of the benefits of using
assembly in the first place). Any attempt to convert x86 assembly to C
(or other HLL) is going to run into this problem. So even though a
semantically faithful conversion (automatic) is theoretically possible,
the result is not practical.

And PortASM/86 doesn't come close to doing the conversion automatically
for you. And even *after* you spent the considerable effort converting
the result to the target processor, what you've got is a bunch of
highly inefficient code that will be difficult for a PPC programmer to
maintain.

Cheers,
Randy Hyde

From: Charles A. Crayne on 6 Feb 2006 00:54

On 5 Feb 2006 10:04:18 -0800
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:

:If it translated them *perfectly* and you didn't have to review *each
:line*, that would be the savings. If the remaining 20% was all in one
:spot, rather than interspersed throughout the code, you might get those
:savings. If the correctness of that 80% didn't depend on the other 20%,
:you might get those savings. Shall I go on?

One can easily imagine the die-hards of yore making the same objections to
the world's first compiler. "Why do you want us to add an additional
step to the development process? Why should we have to write our programs
in some weird, restrictive language, when we have to review *each line* of
the compiler output? Now we are going to have to maintain two
different source files. It would be faster to just throw away the
compiler, and write the program directly in assembler. Anyone who knows
assembly programming can see immediately that the compiler generated code
looks strange. . . ."

However, as history has shown, with all their warts, compilers are a fact
of life. Yes, they do have bugs. Yes, the compiled code is hard to follow.
Yes, the use of HLLs makes it easy to write bad code. And yes, a highly
skilled assembly programmer can write more efficient code. And yet, we do
not routinely review the output of a compiler; we do not maintain
separate HLL and assembly source files; it does not take
significantly longer to write and debug a program in an HLL, than in
assembler; and, for the most part, nobody cares about the relative
performance.

The fact of the matter is, that with the exception of a few of us
hobbyists, decisions about development languages and tools are made based
not upon technical considerations, but rather upon business
considerations such as delivery dates and return on investment.

-- Chuck

From: randyhyde@earthlink.net on 6 Feb 2006 14:53

Charles A. Crayne wrote:
> On 5 Feb 2006 10:04:18 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :If it translated them *perfectly* and you didn't have to review *each
> :line*, that would be the savings. If the remaining 20% was all in one
> :spot, rather than interspersed throughout the code, you might get those
> :savings. If the correctness of that 80% didn't depend on the other 20%,
> :you might get those savings. Shall I go on?
>
> One can easily imagine the die-hards of yore making the same objections to
> the world's first compiler. "Why do you want us to add an additional
> step to the development process? Why should we have to write our programs
> in some weird, restrictive language, when we have to review *each line* of
> the compiler output? Now we are going to have to maintain two
> different source files. It would be faster to just throw away the
> compiler, and write the program directly in assembler. Anyone who knows
> assembly programming can see immediately that the compiler generated code
> looks strange. . . ."

You're going off the deep end here, Chuck.
The complaints against the first compilers were of two varieties: (1)
Machines are too expensive to allow any inefficiencies to creep in, and
(2) Machines are too expensive to waste valuable time doing clerical
things like compilation (or even assembly, in earlier cases). The
issue you mention was *never* brought up, to my knowledge.

>
> However, as history has shown, with all their warts, compilers are a fact
> of life. Yes, they do have bugs. Yes, the compiled code is hard to follow.

People don't have to maintain compiler output. That's a *big*
difference. The output of a translator is going to have to be manually
maintained. Surely, someone with all the years of experience you
possess would understand this, right?

> Yes, the use of HLLs makes it easy to write bad code.

What does this have to do with maintaining the output of a translator
separately from the original x86 code?

> And yes, a highly
> skilled assembly programmer can write more efficient code.

What does this have to do with maintaining the output of a translator
separately from the original x86 code?

> And yet, we do
> not routinely review the output of a compiler;

But you *will* have to review the output of the translator. Because it
does not produce semantically correct code. And we're not talking about
simple "bugs" in the compiler here. We're talking about design
decisions on the part of the program's designer not to support certain
features.

> we do not maintain
> separate HLL and assembly source files;

But you will have to maintain separate x86 assembly and PPC (or
whatever) assembly files. That's the whole point here.

> it does not take
> significantly longer to write and debug a program in an HLL, than in
> assembler; and, for the most part, nobody cares about the relative
> performance.

If you create a program that is *guaranteed* to produce semantically
correct code from an x86 assembly language source, I can promise you
that you *will* care about the performance. It starts to look pretty
bad when you've got to carry around an emulator as part of the
translated code. And while a combination emulator/translated package
*may* outperform code that is strictly emulated, neither scheme is
going to be anywhere close to the performance of the original x86 code
(or a reasonable manual port to the PPC, or whatever). This is
*exactly* why I gave up on the project. No automatic translation is
possible that wouldn't wind up with something like an order of
magnitude loss of performance. *That* is a big deal and people *would*
care about it.

>
> The fact of the matter is, that with the exception of a few of us
> hobbyists, decisions about development languages and tools are made based
> not upon technical considerations, but rather upon business
> considerations such as delivery dates and return on investment.

And the business decision would probably be that it's faster and far
less expensive to port the assembly code to C and work in C
from that point forward. There is a reason assembly language is *far*
less popular than it used to be. Portability is one of those main
reasons. And attempting to waste time cleaning up code that was
converted from x86 assembly to some other assembly language is a waste
of time that could have been put to more effective use porting the code
to some portable HLL. Particularly when you consider the fact that the
performance of the result will be so low.

Keep in mind, CPU speeds are not increasing as they used to. We can no
longer count on the next generation's CPU speed to cover up an order of
magnitude performance drop because of sloppy coding (or translation).

Cheers,
Randy Hyde

From: Charles A. Crayne on 6 Feb 2006 22:15

On 6 Feb 2006 11:53:26 -0800
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:

:But you will have to maintain separate x86 assembly and PPC (or
:whatever) assembly files. That's the whole point here.

Since this is the whole point, let's see if we can get it settled, once and
for all.

Given a body of x86 assembly source to convert, either it is the intent to
continue to maintain that source, or else to freeze it. If the intent is
to freeze the x86 code, then -- tool or no tool -- there is only one
source to maintain.

If the intent is to continue to maintain the x86 source, and the tool is
NOT used, then one will have to maintain separate x86 and <whatever>
assembly files. However, if the tool IS used, then the worst case is that
one has to maintain two sources, but there is a possibility that only
the x86 source must be maintained, because the <whatever> source can be
regenerated every time the x86 source is updated.

So, it is clear that the use of the tool is NEVER worse (in this regard)
than not using the tool, and can sometimes be better.
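
The case analysis above can be written out as a small table. This is only a sketch of the argument as stated: the function and parameter names are made up, and it counts source trees only. It says nothing about how hard each tree is to maintain.

```python
from itertools import product

def sources_to_maintain(keep_x86, use_tool, regenerable):
    """Count the source trees that need ongoing hand maintenance.

    keep_x86    -- True if the x86 source stays under active maintenance
    use_tool    -- True if a translator produced the <whatever> source
    regenerable -- True if the translator output is usable as-is, so it can
                   simply be regenerated after each x86 change
    """
    if not keep_x86:
        return 1   # frozen x86: only the new source lives on
    if use_tool and regenerable:
        return 1   # only the x86 source is edited by hand
    return 2       # both sources are edited by hand

# Check every combination: using the tool never raises the count.
never_worse = all(
    sources_to_maintain(keep, True, regen)
    <= sources_to_maintain(keep, False, regen)
    for keep, regen in product([False, True], repeat=2)
)
```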

-- Chuck

From: randyhyde@earthlink.net on 8 Feb 2006 00:14

Charles A. Crayne wrote:
>
> If the intent is to continue to maintain the x86 source, and the tool is
> NOT used, then one will have to maintain separate x86 and <whatever>
> assembly files. However, if the tool IS used, then the worst case is that
> one has to maintain two sources, but there is a possibility that only
> the x86 source must be maintained, because the <whatever> source can be
> regenerated every time the x86 source is updated.
>
> So, it is clear that the use of the tool is NEVER worse (in this regard)
> than not using the tool, and can sometimes be better.

Maybe it's just me. But I would find *manually rewritten* code a *heck*
of a lot easier to maintain than the kind of stuff that PortASM/86 is
putting out. I suspect you don't know PPC assembly language, else you
would recognize that the code it is producing is *very bad* and not at
all written in the RISC/PPC paradigm. And therein lies our difference
of opinion, I suspect. If the code that PortASM/86 produced were
actually readable and followed standard PPC (or whatever) programming
style, I might agree with you for the *occasional* project that
requires translation. But, alas, the code is *far* worse than the stuff
I've seen *any* compiler produce. So even if your assembly code was
written in a "good" manner that made translation easy (and I'd suggest
that this is a stretch), and the conversion was almost 100% automatic,
you'd still have the problem of dealing with code that has expanded by
a factor of three or more. The problem with *every* PortASM/86 sequence
I've seen to date is that it does an instruction by instruction
conversion. This means that it winds up converting each x86 instruction
to a sequence of PPC instructions that attempt to do the same thing
(typically three or more instructions). In fact, a good PPC assembly
programmer will not do this. While a RISC assembly program *is* going
to be larger than a CISC assembly program, the difference isn't as
great as we're seeing in the PortASM/86 code.

Now the truth is, a semi-automatic conversion *could* be better than
what PortASM/86 is doing. If they did a decent data and control flow
analysis of the x86 program, rather than trying to simulate the x86
code (e.g., renaming registers RAX, RBX, etc.) and they kept data in
registers rather than going to memory every time the x86 references a
memory location, they'd generate *much* better code. But it should be
clear that the MicroAPL folks have put a *lot* of effort into this
product and they've only achieved as much as they have. Again, I would
argue, that it's *less work* to simply hand port the few apps you need
moved to a different CPU rather than go to all the hassle of writing a
*decent* translator and then hand massaging the output. Particularly
if you want to translate to more than one CPU.
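
As a rough illustration of the gap between the two approaches, here is a toy sketch in Python. The tuples are loosely PPC-flavored pseudo-ops invented for this example, not real PortASM/86 output, and the "analysis" assumes a straight-line sequence with no aliasing, no branches, and nothing else watching memory. Under those assumptions, keeping a variable in a register for the whole sequence removes most of the loads and stores.

```python
def naive_translate(x86_ops):
    """Expand each 'add [var], k' into load / add / store, one x86
    instruction at a time."""
    out = []
    for var, k in x86_ops:
        out += [("load", "r3", var), ("addi", "r3", k), ("store", "r3", var)]
    return out

def register_cached_translate(x86_ops):
    """Keep each variable in a register for the whole sequence: load it on
    first use, add in the register, store it back once at the end."""
    out, home = [], {}
    for var, k in x86_ops:
        if var not in home:
            home[var] = f"r{3 + len(home)}"   # next free hypothetical register
            out.append(("load", home[var], var))
        out.append(("addi", home[var], k))
    out += [("store", reg, var) for var, reg in home.items()]
    return out

# Three x86 instructions: add [count], 1 / add [count], 2 / add [count], 4
sequence = [("count", 1), ("count", 2), ("count", 4)]
naive = naive_translate(sequence)              # three pseudo-ops per instruction
cached = register_cached_translate(sequence)   # one load, three adds, one store
```

Here the instruction-by-instruction version emits nine pseudo-ops for three x86 instructions, while the register-cached version emits five. Proving those assumptions hold for real code is exactly the data and control flow analysis that makes a decent translator so much work.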

But feel free to use the MicroAPL code to port your adventure game to
the PPC (or some other processor). It would be interesting to see how
much work is *really* required to pull this off (I'm assuming you wrote
it with MASM, back in the DOS days). Nothing like an actual project to
pull it off. I'd say you could then run the result on a Macintosh, but
I'm not sure if the MicroAPL stuff emits code that is compatible with
MacOS' memory layout (certainly the examples I've seen are not, but
there may be an option to allow this).

In any case, if you can truthfully demonstrate that it would only take
you about 20% of the effort to do the conversion, then you will
convince me. And as (I assume) you've frozen the x86 code, we don't
have to worry about future maintenance, right?

Cheers,
Randy Hyde
