From: Betov on 26 Jan 2006 05:08

"Charles A. Crayne" <ccrayne(a)crayne.org> wrote in
news:20060125222022.7c8bcbe7(a)heimdall.crayne.org:

> Perhaps, or perhaps he is talking about converting x86 assembly
> language code to source code for other processors.

Yes.

> In either case, his approach is probably the one which requires the
> least human interaction to accomplish the above goal.

This is what I suppose.

> However, since
> the difficulty of the task depends upon the similarity, or lack
> thereof, between the source and target architectures, and since there
> has not yet been any agreement on what the target architecture might
> be, it is easy, albeit unproductive, to postulate theoretical
> difficulties which may not be a significant consideration in real
> world implementations.

... and that are not any problem with the RosAsm Encoder Architecture, for this example, as the References are fixed last.

> If, for example, the address size of the target architecture is not
> four bytes, then a jump table invocation such as, 'jmp
> [sometable+4*eax]' requires that both the code statement, and the
> elements of the table be altered.

Yes, of course. If the Addresses are not four Bytes, the port of a Table of Labels would fail. But, on the one hand, this is found far less often in Assembly Sources than in C / C++ Disassemblies... and, on the other hand, replacing a couple of "4"s by, say, a couple of "8"s (I think I have read somewhere that one of the "Alien Processors" works that way... or whatever) ... would be _way_ less work than writing the whole port by hand.

> Some of these special cases can be handled automatically by the tool,
> and others will have to be cleaned up by a human. However, I have yet
> to see any arguments which reasonably suggest that the proposed tool
> would not be a useful one.

Yes. At least that would be a great help in automatically porting everything that can be ported automatically, and, quite frankly, when we take a look at what is really found inside most executables, betting on a direct re-run does not seem completely stupid to me. :)

Betov.

< http://rosasm.org >
From: randyhyde@earthlink.net on 26 Jan 2006 15:09

Charles A. Crayne wrote:
> On 25 Jan 2006 14:37:39 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :However, Rene is talking about
> :converting x86 assembly language code to machine code for other
> :processors.
>
> Perhaps, or perhaps he is talking about converting x86 assembly language
> code to source code for other processors.

Same problem. It doesn't matter whether it's a source-to-source or a source-to-binary conversion. The problem is still *very* difficult. This was researched quite a bit in the 1980s when people like Intel wanted to migrate a lot of x86 code to other processors (e.g., the IA64). You can see the results of that research today. Forgive me for believing that if this were easy enough for someone like Rene to handle, the top minds in the industry would have solved the problem a decade ago.

> In either case, his approach is probably the one which requires the
> least human interaction to accomplish the above goal. However, since the
> difficulty of the task depends upon the similarity, or lack thereof,
> between the source and target architectures, and since there has not yet
> been any agreement on what the target architecture might be, it is easy,
> albeit unproductive, to postulate theoretical difficulties which may not be
> a significant consideration in real world implementations.

Well, he *has* mentioned ARM and PocketPC (which includes MIPS and SH8, among others). It doesn't matter, though. As the example I gave demonstrates, you're going to have problems if there are *any* differences in opcode size, among other things.

> :mov eax, someCodePtr
> :add eax, 4
> :jmp eax
>
> Leaving aside the fact that this pseudo-example is bad coding practice,

Perhaps. But if you don't think things like this through, you wind up doing things like spending two or three years writing a disassembler that breaks whenever you insert an innocuous NOP into the disassembled result.

Bottom line is that if you start coding before you *carefully* think the problem through, you wind up wasting a good part of your life writing code that will *never* work for anything other than carefully crafted demos. Rene has proven this with his disassembler, and he's about to make the same mistake with code conversion.

> and may never occur in the source which Betov proposes to migrate,

Yeah, yeah. He proposed writing a disassembler that converted library code to RosAsm, and punted on handling problematic issues. Of course, it broke when fed a few simple routines from the HLA standard library. No offense, but I'm not expecting anything more from his "code conversion" encoder.

> it
> does illustrate a more general issue, which needs to be considered.
> For obvious reasons, labels in the x86 source are highly unlikely to
> resolve to the same addresses as the corresponding labels do in the target
> source. Therefore, if one is going to even approximate line by line
> translation, ALL target addresses must be symbols, so that they can be
> resolved by the target assembler.
>
> If, for example, the address size of the target architecture is not four
> bytes, then a jump table invocation such as, 'jmp [sometable+4*eax]'
> requires that both the code statement, and the elements of the table be
> altered.

What happens when they didn't use a scaled index addressing mode? Perhaps they've used induction across a loop and they automatically add 4 to EAX on each iteration of that loop. How is the converter going to figure this out? (Hint: this is an undecidable problem; we're back to the halting problem again.)

> Some of these special cases can be handled automatically by the tool, and
> others will have to be cleaned up by a human.

Which makes such a tool almost worthless, considering the size of modern applications (even those written in assembly). Except for trivial demo apps, it's less work to do the conversion by hand when it's all said and done.

> However, I have yet to see
> any arguments which reasonably suggest that the proposed tool would not be
> a useful one.

Well, you and Rene can feel free to spend all your free time working on a tool. But given the usefulness of such a tool (plus the fact that folks like Intel and Motorola have sunk a *lot* of money into researching this problem), I have to agree with Alex when he says "if it was that easy, it would have been done already."

Think about it a moment. Do you think that Rene is the *first* person to come up with this idea? Heck, I was contemplating an HLA->C converter back in 2001, but ultimately gave up on the idea because the result would have been code that was *way* too slow.

To give you a bit of a clue, it's not that this type of conversion is impossible. It's just that the resulting code is *soooo* big and *soooo* slow that it's not practical. The solutions I've seen pull the JIT trick of keeping the original object code around and emulating whatever they couldn't compile properly. Things like labels are handled by lookup tables at run time (i.e., when you jump to an indirect address, you look up the address in a lookup table to get the target address on the new architecture). All this adds up to an incredibly slow result. Considering that the target CPUs Rene has mentioned are all *much* slower than a contemporary x86, this just isn't a practical thing to do.

There is no doubt that Rene can "macro-ize" x86 instructions on other architectures. But this just *won't* produce working software except for some trivial demo apps. However, if you think it can be done, feel free to join the RosAsm team and help him tilt at a few windmills.

Cheers,
Randy Hyde
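The run-time lookup mentioned in the post above (an indirect jump consults a table to find the target address on the new architecture) can be sketched roughly as follows. This is only an illustration in Python, not any real translator's design: `ADDRESS_MAP`, `resolve_indirect_jump` and `jump_via_table` are hypothetical names, and the addresses are made up.

```python
# Hypothetical map a translator might emit: original x86 label address ->
# address of the corresponding translated code on the target machine.
# All addresses here are invented for the example.
ADDRESS_MAP = {
    0x401000: 0x8000,
    0x401005: 0x8010,
    0x40100A: 0x801C,
}


def resolve_indirect_jump(x86_target, address_map=ADDRESS_MAP):
    """Return the translated address, or None if the target is unknown.

    None stands for "hand this jump to the emulator", i.e. the fallback
    path for anything the translator could not map ahead of time.
    """
    return address_map.get(x86_target)


def jump_via_table(table, index, address_map=ADDRESS_MAP):
    """Model 'jmp [table + 4*index]' when the table still holds x86 addresses."""
    return resolve_indirect_jump(table[index], address_map)
```

Every indirect jump pays for one lookup, and a target that lands between known labels (for example a label address plus a small constant) misses the map entirely, which is where the emulation fallback and the speed cost come in.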
From: randyhyde@earthlink.net on 26 Jan 2006 15:19

Betov wrote:
> "Charles A. Crayne" <ccrayne(a)crayne.org> wrote in
> news:20060125222022.7c8bcbe7(a)heimdall.crayne.org:
>
> > However, since
> > the difficulty of the task depends upon the similarity, or lack
> > thereof, between the source and target architectures, and since there
> > has not yet been any agreement on what the target architecture might
> > be, it is easy, albeit unproductive, to postulate theoretical
> > difficulties which may not be a significant consideration in real
> > world implementations.
>
> ... and that are not any problem, with the RosAsm Encoder
> Architecture, for this example, as the References are
> fixed last.

You really don't understand the scope of the problem, do you?

> > If, for example, the address size of the target architecture is not
> > four bytes, then a jump table invocation such as, 'jmp
> > [sometable+4*eax]' requires that both the code statement, and the
> > elements of the table be altered.
>
> Yes, of course. If the Addresses are not four Bytes,
> the port of a Table of Labels would fail. But, on
> one hand, this is found way less in Assembly Sources
> than in C / C++ Disassemblies...

Maybe the way *you* write code, stuff like this doesn't appear very often. I can assure you that *real* assembly language programmers use stuff like this all the time. And we're not just talking about jump tables here, but tables of *any* data. And have you even considered the fact that most processors don't allow access to unaligned memory locations? Or that many target processors don't support byte-addressable memory? And, as Alex has pointed out, have you considered the fact that most RISC processors don't have the same notion of "condition codes" as the x86?

> then, on the other
> hand, replacing a couple of "4"s, say, by a couple of
> "8"s (I think I have read somewhere there is one of
> the "Alien Processor" working that way... or whatever)
> ... would be _way_ less work than writing all of the
> port entirely by hand.

What happens when the "*4" component is computed by the program rather than being part of the addressing mode? How will your "encoder" figure this out?

> > Some of these special cases can be handled automatically by the tool,
> > and others will have to be cleaned up by a human. However, I have yet
> > to see any arguments which reasonably suggest that the proposed tool
> > would not be a useful one.
>
> Yes. At least that would be a great help at porting
> automatically everything that can be ported automatically,
> and, quite frankly, when we take a look at what is really
> found inside most executables, making a bet on a direct
> re-run, does not seem to me completely stupid.

Only because you've never studied enough Computer Science to realize the magnitude of the problem you're attempting. It's like your disassembler. You get some crazy idea that you know how to do something so much better than everyone who came before you (and you probably don't realize that this problem has been attempted *many* times in the past, by people *much* smarter than you), and you jump in without realizing the futility of what you're trying.

Oh well, waste lots of time on it. I'm sure you'll come up with yet another great demo like your automatic disassembler that works for some spoon-fed apps, but breaks on anything real-world-ish. Too bad your assembler users don't get to benefit from the real work you could have done on your *assembler* while you are wasting time writing yet another demo program.

Computer Science is a formal science for exactly this reason: so people can determine which things are impossible or impractical before they waste a good chunk of their time on the problem. What you are attempting to do is impractical. People have proven that already. But go ahead and waste your time on it. It is your time, after all.

Cheers,
Randy Hyde
From: Charles A. Crayne on 26 Jan 2006 18:58

On 26 Jan 2006 12:09:26 -0800
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:

:What happens when they didn't use a scaled index addressing mode?
:Perhaps they've used induction across a loop and they automatically add
:4 to EAX on each iteration of that loop.

And just how is this straw man any more formidable than his predecessor, which I swept away with ease? However, since you obviously didn't understand, let me go through it again, in more detail.

The tool flags any statement which calls, or jumps to, an address which is not an unmodified label. The programmer can either add a new label to the x86 source and change the call/jump target address accordingly [as in your previous example], or change the target address calculation where it occurs in the loop [as in your example above] and flag the call/jmp statement as having been approved by the programmer.

:Except for trivial demo
:apps, it's less work to do the conversion by hand when it's all said
:and done.

It is difficult to believe that you typed this with a straight face, as it should be obvious to the casual observer that the larger the body of code to be converted, the more valuable the tool.

:Well, you and Rene can feel free to spend all your free time working on
:a tool.

Unlike Betov, I do not have a large body of RosAsm source code which I wish to convert to another architecture, and therefore will not be working on that tool. However, I do have a very similar situation, for which I am currently writing a conversion tool.

As you may know, some years ago, I wrote a text adventure game engine, for which my wife wrote a number of game scripts. More recently, I ported the engine from DOS to Linux, but in an attempt to reach users on other hardware platforms, I have decided to port the scripts to the Inform compiler.

Most of the time I have spent on this project has been devoted to learning the Inform way of doing things, which I still have not completely mastered. However, the 16 hours or so which I have spent writing the tool have already saved me at least 10 times that investment, as compared to hand conversion.

-- Chuck
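The flagging rule described in the post above (report every call or jump whose target is not an unmodified label, unless the programmer has marked it as approved) might look something like the Python sketch below. It is an assumption-laden illustration, not the tool either poster had in mind: the function name, the `; approved` marker and the simple one-statement-per-line, `label:` syntax are all hypothetical.

```python
import re

# Assumed syntax: "name:" defines a label; one statement per line;
# ";" starts a comment. A branch written on the same line as a label
# definition is not examined by this simple sketch.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_.$@][\w.$@]*):")
BRANCH = re.compile(r"^\s*(jmp|call|j[a-z]{1,4})\s+(.+?)\s*(?:;.*)?$",
                    re.IGNORECASE)
APPROVED_MARK = "; approved"  # hypothetical marker added by the programmer


def flag_computed_branches(lines):
    """Return (line_number, statement) for branches needing human review."""
    labels = set()
    for line in lines:
        match = LABEL_DEF.match(line)
        if match:
            labels.add(match.group(1))

    flagged = []
    for number, line in enumerate(lines, start=1):
        if APPROVED_MARK in line.lower():
            continue  # the programmer has already signed off on this one
        match = BRANCH.match(line)
        if match and match.group(2) not in labels:
            flagged.append((number, line.strip()))
    return flagged


source = [
    "start:",
    "    mov eax, someCodePtr",
    "    add eax, 4",
    "    jmp eax",
    "    jmp [sometable+4*eax]",
    "    call start",
    "    jmp ebx ; approved",
]
```

On the sample `source`, the register jump and the jump-table jump are reported, while `call start` and the approved line pass through. Note that the sketch only finds the branch; locating the `add eax, 4` that feeds it is left to the human, which is exactly the division of labour the thread is arguing about.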
From: Alex McDonald on 26 Jan 2006 19:15
Charles A. Crayne wrote:
> > :mov eax, someCodePtr
> :add eax, 4
> :jmp eax
>
> Leaving aside the fact that this pseudo-example is bad coding practice,

The syntax may be slightly ambiguous, so permit me to clean it up:

    someCodePtr dd $          ; "pointer to self" address
    someCode    ...           ; code to execute

    mov eax, someCodePtr      ; fetch the code address
    add eax, # 4              ; point at someCode
    jmp eax                   ; and call it

What's the problem with it?

--
Regards
Alex McDonald
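One way to read the portability concern raised earlier in the thread against this snippet is that the constant 4 silently encodes the size of the pointer cell. The small Python model below is only a sketch of that reading, under the assumption that `someCode` starts immediately after the `someCodePtr` cell; the function names and the base address are made up.

```python
def layout(pointer_size, base=0x1000):
    """Addresses of someCodePtr and of someCode, which follows the cell."""
    some_code_ptr = base
    some_code = base + pointer_size
    return some_code_ptr, some_code


def computed_target(pointer_size, hard_coded_offset=4, base=0x1000):
    """Address reached by: mov eax, someCodePtr / add eax, 4 / jmp eax."""
    some_code_ptr, _ = layout(pointer_size, base)
    # The cell holds its own address ("pointer to self"), so the fetched
    # value equals the cell's address; the snippet then adds a literal 4.
    return some_code_ptr + hard_coded_offset


def lands_on_code(pointer_size):
    """True when the hard-coded +4 really reaches someCode."""
    _, some_code = layout(pointer_size)
    return computed_target(pointer_size) == some_code
```

With 4-byte addresses the jump reaches `someCode`; with 8-byte addresses the same `+4` lands in the middle of the pointer cell, so both the data definition and the constant would have to change together.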