From: randyhyde@earthlink.net on 27 Jan 2006 14:38

Charles A. Crayne wrote:
> On 26 Jan 2006 16:32:51 -0800
> "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
>
> :Now suppose that instead of a nice array of table entries, I have a
> :two-dimensional table of these objects. Or how about a linked list?
> :Or maybe parallel arrays? Or any other data structure I can dream up.
>
> Just for the record, how many instances of such structures do you expect
> to find in Betov's source code?

Is this tool *simply* for Betov's personal use? If so, it will be *far* more practical to take the time wasted on writing the tool and invest that time into hand translating all of his applications to the new CPU. Or better yet, spend 25% of the time learning C/C++ and another 25% of the time rewriting the apps in C/C++ in a truly portable manner.

And as for the structs appearing in Rene's code, RosAsm doesn't support structs, which is one more reason the conversion to a different CPU is going to be more difficult. When you've got statements like "mov eax, D$someFakeStruct+someOffset", it's a bit more work to divine the meaning of this statement and adjust the offsets of the structure fields so that they work properly on a CPU that doesn't support the same alignments as the x86.

>
> :Do you honestly think you can come up with a generic algorithm that
> :will figure this stuff out and correctly map the offsets onto the new
> :architecture?
>
> Probably not a generic algorithm, although even that might be possible,
> but I could probably come up with some pretty good heuristics.

The problem with "pretty good heuristics" is that we're discussing *one* issue here. A nasty one, but *one* issue. This isn't a "tip of the iceberg" issue, it's more like "a grain of sand on the beach" issue. That is, once you've come up with a heuristic to handle this one problem, you're faced with a *tremendous* number of other issues you've got to deal with.
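
To make the field-offset problem above concrete, here is a minimal sketch in C. It is not taken from RosAsm or from any proposed translator; the helper name `layout_offsets` and the alignment rule (each field aligned to the smaller of its size and a per-target maximum) are assumptions chosen for illustration only.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (align must be >= 1). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/*
 * Hypothetical helper: compute the offset of each field when every field
 * is aligned to min(field size, max_align). max_align == 1 models packed,
 * hand-laid-out data; max_align == 4 models a target that wants dwords on
 * 4-byte boundaries. Returns the padded total size of the structure.
 */
size_t layout_offsets(const size_t *sizes, size_t count,
                      size_t max_align, size_t *offsets)
{
    size_t offset = 0;
    size_t struct_align = 1;

    for (size_t i = 0; i < count; i++) {
        size_t field_align = sizes[i] < max_align ? sizes[i] : max_align;
        if (field_align == 0)
            field_align = 1;                /* guard against zero-sized fields */
        if (field_align > struct_align)
            struct_align = field_align;

        offset = align_up(offset, field_align);
        offsets[i] = offset;                /* where this field starts */
        offset += sizes[i];
    }
    return align_up(offset, struct_align);  /* tail padding */
}
```

With fields of 1, 4, 2 and 4 bytes, the packed layout puts them at offsets 0, 1, 5 and 7, while the 4-byte-aligned layout puts them at 0, 4, 8 and 12. A source line that hard-codes "+5" therefore has to become "+8", and the translator can only do that if it knows the constant is a field offset in the first place.
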
Even if you limit yourself to the simplistic code that Rene claims to write, this is *not* a trivial process.

> However,
> that is neither here nor there, as the intent of the proposed tool is
> merely to speed up the dull, routine, time consuming transliterations,
> thus allowing a human programmer the freedom to see through all of the
> obfuscations which you are trying to introduce into Betov's code.

You're making the assumption that Rene is writing a tool to translate *his* particular assembly code (which mainly consists of RosAsm, at this point) to other CPUs. I certainly didn't get that impression. Why waste an *incredible* amount of time writing a tool that translates the RosAsm system source code to some other CPU when you can manually convert that code in less time? No, the only thing that makes sense is to write a *generic* tool that *all* RosAsm users can use. A tool he can add to the RosAsm bullet list so he can brag about "two clicks CPU code conversion". And, sadly, a tool that will be just as broken as his disassembler.

The fact that Rene sticks to a simplified subset of the x86 instruction set that might be easier to translate to a different CPU doesn't mean that *all* RosAsm users do so. Look at Wannabee's code posted around here, for example. In many respects it is *far* more sophisticated than Rene's code (hey, at least he is *attempting* OOP in RosAsm, even if it's not quite there yet). I can promise you that any translator Rene might be capable of writing will break on OOP.

>
> :I stick by my assertion that translating the code from
> :scratch will probably be easier than modifying the produced code for
> :all but trivial applications.
>
> Do you consider Betov's code to consist of trivial applications?
> [Hint: This is a trick question.]

I do not particularly consider *Rene's* code to be the benchmark here. I'm thinking more in terms of the RosAsm user base.
Though RosAsm doesn't have a *lot* of different users, the user base *is* big enough that you cannot consider the conversion of Rene's code to be sufficient for a general RosAsm tool.

Let's face it. We could define a small subset of the x86 instruction set that is easy to translate, and impose all kinds of restrictions (like you must access data in a certain way and all memory references have to be through labels, and the like). Then it would be possible to translate this restricted x86 code to another processor. But by doing this, you've given up the whole reason for using assembly language in the first place -- the power of the native CPU's language. If you're going to place such restrictions on the user, then just use C and you'll get *much* better results.

>
> :Do you really
> :want to review 250,000 lines of code
>
> I'd rather review 250,000 lines of code, than write them from scratch.

And once you get into it, you'll decide that it's better to write the code from scratch. The code is *not* going to be pretty (just as bad as reading disassembled code). And the semantic problems are going to be subtle. As Alex pointed out, what do you do with ADC when there is no carry flag on the target processor? Sure, you can emit (a lot of) code to simulate the carry flag, but do you *really* want to read through all this code? And without an optimizer to clean up afterwards (which I can assure you that Rene is not capable of writing), you're going to have to *completely* emulate all the flags and other semantics of each instruction. E.g., what are you going to do when you encounter a JPO or JPE instruction? True, Rene might not use these instructions in *his* code, but if he distributes this tool as a general purpose tool, he has to allow for the fact that someone else might want to use these instructions.
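
As a rough illustration of what "emit code to simulate the carry flag" means, here is a sketch in C of the work that one ADC would expand into on a target with no flags register. The `flags_t` type and the function names are hypothetical, and only CF and PF are modelled; a faithful emulation would also have to maintain ZF, SF, OF and AF after every arithmetic instruction.

```c
#include <stdint.h>

/* Hypothetical model of the two x86 flags discussed above. */
typedef struct {
    unsigned carry;   /* CF: carry out of bit 31 (always 0 or 1) */
    unsigned parity;  /* PF: 1 if the low byte has an even number of 1 bits */
} flags_t;

/* x86 computes PF from the low 8 bits of the result only. */
static unsigned parity_even_low8(uint32_t value)
{
    uint32_t b = value & 0xFFu;
    b ^= b >> 4;      /* fold the byte down to a single bit */
    b ^= b >> 2;
    b ^= b >> 1;
    return (b & 1u) == 0;
}

/* Emulate a 32-bit ADC: dest + src + CF, updating CF and PF. */
uint32_t emulate_adc32(uint32_t dest, uint32_t src, flags_t *flags)
{
    uint64_t wide = (uint64_t)dest + src + flags->carry;
    uint32_t result = (uint32_t)wide;

    flags->carry = (unsigned)(wide >> 32);   /* carry out of bit 31 */
    flags->parity = parity_even_low8(result);
    return result;
}
```

A JPE would then test `flags.parity != 0` and a JPO would test `flags.parity == 0`. Without an optimizer to prove that a given flag is dead, this bookkeeping has to be emitted after every instruction that touches the flags, which is exactly the bulk the reviewer then has to read through.
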
Fortunately, Win32 is protected mode, so you don't have to worry about things like "IN" and "OUT" emulation, but how on earth are you going to translate something like "mov eax, dword ptr fs:[0]" to a new architecture (code that is going to appear in any application that supports structured exception handling in Windows)? Again, we're talking grains of sand here. The list goes on and on and on and on and ....

>
> :with an average of 50 bugs (or whatever) per 1,000 instructions
>
> Is this the error rate you typically see from your own product? Or is this
> just part of the 79.25% of all statistics which are made up on the spot?

Made up on the spot. But having written a data flow analyzer for the 6502 (a *much* simpler CPU than the x86, mind you), and having looked into translating HLA source code to PPC assembly or C a few years back, I am obviously a bit more aware of the problems than either you or Rene with respect to this conversion process. The reason you don't see such a tool for HLA today is because I determined that the result was impractical. To do it right would result in unacceptably bad performance, after a *ton* of work. To do a semi-automated tool as you suggest would result in too many semantic miscues, injecting bugs into the final result.

BTW, it is interesting to note that a semi-automated tool is about as useless as an automatic disassembler. A semi-automated tool forces you to maintain *two* versions of the software after a successful conversion (including the manual process). That's why HLLs are so popular -- properly written, you only need to maintain *one* source file, not "n" source files (one for each target CPU). If you can't do an automated conversion so you only have to maintain one source file, the tool won't find much use. This is why, for example, I briefly considered the HLA->C conversion after deciding that the HLA->PPC conversion just wouldn't work.
>
> :If inform or your scripting
> :language was anywhere near as semantically complex as 80x86 assembly
> :language, you'd have a point. But it's not.
>
> A bold statement from one who has seen only snippets of my scripting
> language, and probably not much more of the Inform language.

I know nothing of your language, but I have looked at Inform (back when I was working on the AGE project). But as someone who has *taught* compiler courses, and as the author of an assembler that does a source to source conversion as part of the assembly process, let's just say that I happen to know a *little* bit about this process.

Translating from one language to another is a practical thing to do if the target language can efficiently represent all the semantics of the source language. For example, the original C++ compiler emitted C code. And many VHLLs emit C (or some other HLL). Consider Flex and Bison, for example. These translations work because the target language is semantically capable of (efficiently) representing the machine abstraction of the source language. I suspect that this is true for your scripting language vs. Inform (Inform is very capable, IIRC). I could be wrong, but it's a good guess.

BTW, a semi-automated tool for your purposes is not a bad idea. After all, I seriously doubt that there is anywhere *near* the amount of work needed for the converter as would be needed for the alternate CPU translator. Further, I don't expect that you're continuing to develop or maintain your existing scripts and need the ability to maintain only one file (i.e., once the conversion is done, you can use Inform and stop using your tool). It's (probably) not like there is an existing user base of your tool that would insist on automated conversions for new products. You do the conversion once (even an 80% solution) and then move on to Inform. Big difference in the usage of these tools.
Of course, the other issue that makes Rene's proposed tool less than useful is the fact that porting the machine code is only *part* of the problem. There's also the problem of the OS interface. Of course, he is discussing only a WinCE port (as best I can tell) and WinCE is *similar* to Win32, but there is little chance that his code will port to WinCE. Just as there are semantic differences between x86 instructions and ARM (or whatever) instructions, there are semantic differences between the API functions in Win32 and WinCE. Some calls don't exist, some calls are new, and some calls behave differently. Yes, there is a subset of calls you can make to write portable code, but do you really think that Rene's existing code (or the existing RosAsm code base) has stuck to this subset?

Again, if you want code that runs portably on Win32 and on various WinCE platforms, the only reasonable solution is to get Visual Studio with the processor pack and the various SDKs and pay *careful* attention to the OS calls you make. Trying to make assembly portable just isn't going to fly. If it was, I'd be well into that project by now (I've had a lot of Macintosh users ask for a Mac version of HLA over the years; fortunately, given recent events, I'm glad I ignored those requests and quickly discovered that x86->PPC wasn't practical).

Cheers,
Randy Hyde
From: randyhyde@earthlink.net on 27 Jan 2006 14:44

Dragontamer wrote:
> Charles A. Crayne wrote:
> > On 26 Jan 2006 16:32:51 -0800
> > "randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote:
> >
> > :Now suppose that instead of a nice array of table entries, I have a
> > :two-dimensional table of these objects. Or how about a linked list?
> > :Or maybe parallel arrays? Or any other data structure I can dream up.
> >
> > Just for the record, how many instances of such structures do you expect
> > to find in Betov's source code?
>
> Question:
>
> Why hasn't big-endian vs little-endian been brought up yet?

As I've mentioned in other posts, the list of reasons why a conversion won't work is *nearly* endless. Each little problem you bring up is like a grain of sand on a beach of problems.

As for Big vs. Little Endian, this isn't an issue with the translator that Rene is hinting about. He's proposing a translator for WinCE and WinCE only runs on processors that are little endian. Of course, endian issues would be a bigger problem on other OSes (this was one of the main reasons I gave up on an x86->PPC translator for the Mac several years ago).

>
> Especially with the onslaught of networked programs today, and even
> programming languages designed *for* the internet?

Especially given the fact that optimizers for portable HLLs are pretty good today (not as good as an expert assembly programmer, but *much* better than the code you'll get out of Rene's proposed translator).

>
> Even the *data* itself may have to be converted for the code to execute
> correctly.

Of course. The current example is pointers to code. And given that RosAsm doesn't have structures and other data structure hints to aid in the conversion process, this makes the whole thing even more difficult.

>
> Other trivial examples:
> Any self-modifying code, from encrypting/decrypting memory
> for security reasons to compressed code would not translate
> so easily from platform to platform.

We're assuming that we're working with source code here, so this won't be a problem. Encryption and compression are generally applied to the binary code after compilation.

Cheers,
Randy Hyde
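
For readers wondering why endianness would bite on a big-endian target such as the PPC, here is a small sketch in C. The function names are hypothetical and nothing here comes from an actual translator; it only shows that the same four bytes in memory yield different dword values depending on the byte order the CPU assumes.

```c
#include <stdint.h>

/* Read four bytes as a little-endian dword, the way an x86 load sees them. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read the same four bytes as a big-endian dword. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the byte sequence 78 56 34 12, `load_le32` yields 0x12345678 and `load_be32` yields 0x78563412. Any source code that stores a dword and later picks it apart a byte at a time silently depends on the first behaviour, and a translator would have to find every such spot.
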
From: randyhyde@earthlink.net on 27 Jan 2006 14:51

Charles A. Crayne wrote:
> On Fri, 27 Jan 2006 00:15:02 +0000 (UTC)
> Alex McDonald <alex_mcd(a)btopenworld.com> wrote:
>
> :someCodePtr dd $ ; "pointer to self" address
> :someCode ... ; code to execute
> :
> : mov eax, someCodePtr ; fetch the code address
> : add eax, # 4 ; point at someCode
> : jmp eax ; and call it
> :
> :What's the problem with it?
>
> It can be replaced by 'jmp someCodePtr+4'
>
> -- Chuck

No, "jmp someCodePtr+4" would transfer control to the address held in the dword immediately following someCodePtr. The above code transfers control to the code address at the location specified by the *sum* of the dword at someCodePtr and four.

And while you might think that this is bad coding practice, or that Rene doesn't write code like this, it's quite easy to write a macro that takes advantage of this scheme to produce (maintainable) code that doesn't use a jump table (thus sparing you an extra memory access). If your "case sequences" are the same length, the trick above can be quite useful (generally, the offset will be greater than four, but the concept is the same).

This is the kind of code that really demonstrates why assembly language is so cool: the fact that you can do stuff like this (it falls under Rene's "strategy optimization" moniker -- we don't need no stinking jump table! So don't put it in the code).

Cheers,
Randy Hyde
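
The difference between the two jumps can be modelled in a few lines of C. This is only a sketch under the reading given in the reply above (a bare label in "mov" or "jmp" is a memory operand); the simulated memory image and the function names are hypothetical.

```c
#include <stdint.h>

/* Read a little-endian dword from a simulated memory image. */
static uint32_t read_le32(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Target of: mov eax, someCodePtr / add eax, 4 / jmp eax
 * (load the stored pointer, then add four to the value). */
uint32_t target_register_jump(const uint8_t *mem, uint32_t ptr_addr)
{
    return read_le32(mem, ptr_addr) + 4;
}

/* Target of: jmp someCodePtr+4
 * (indirect jump through the dword that follows the pointer). */
uint32_t target_memory_jump(const uint8_t *mem, uint32_t ptr_addr)
{
    return read_le32(mem, ptr_addr + 4);
}
```

If someCodePtr sits at address 0x10 and holds 0x10 (a pointer to itself), and the bytes at 0x14 are the code 90 90 90 C3, the first sequence jumps to 0x14, the start of someCode. The second treats those code bytes as an address and jumps to 0xC3909090, which is why the two forms are not interchangeable.
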
From: randyhyde@earthlink.net on 27 Jan 2006 15:00

Alex McDonald wrote:
>
> You really have lost me here. What on earth does this have to do with
> this part of the thread? What does MASM bashing, along with some
> remarks about my surname that I normally get from three year old
> children, have to do with the code posted?

Very simple. He realizes that he has lost the argument, so he's changing the subject to deflect attention away from his own mistakes. Nothing like throwing in a few insults to get you to move away from the fact that his proposal is a non-starter and that he should have researched this a little better beforehand.

Cheers,
Randy Hyde
From: Betov on 27 Jan 2006 15:48
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> wrote in news:1138392037.641543.43420(a)g14g2000cwa.googlegroups.com:

>
> Alex McDonald wrote:
>>
>> You really have lost me here. What on earth does this have to do with
>> this part of the thread? What does MASM bashing, along with some
>> remarks about my surname that I normally get from three year old
>> children, have to do with the code posted?
>
>
> Very simple. He realizes that he has lost the argument, so he's
> changing the subject to deflect attention away from his own mistakes.
> Nothing like throwing in a few insults to get you to move away from the
> fact that his proposal is a non-starter and that he should have
> researched this a little better before hand.

Ah!... We don't have a MASM victim here: That one would not even be able to understand, even when explained.

:)

Betov.

< http://rosasm.org >