From: cr88192 on 25 Mar 2010 11:18

well, this was a recent argument on comp.compilers, but I figured it may make some sense in a "freer" context.

basically, the question is whether or not a textual assembler is fast enough for use in a JIT context (I believe it is, and that one can benefit notably from using textual ASM here).

in my case it works ok, but then I realized that I push relatively low volumes of ASM through it, and I am left to wonder about the higher-volume cases.

so, some tests: basically, I have tried assembling a chunk of text over and over again in a loop and measuring how quickly the ASM gets pushed through. the test keeps track of the elapsed time, runs for about 10 seconds, and counts how many times the loop has run; from this it can figure out how quickly ASM is being processed.

the dynamic linker is currently disabled in these tests, as this part proves problematic to benchmark for technical reasons: endlessly re-linking the same code into the running image doesn't turn out well, and fairly quickly the thing will crash. (I would need to figure out a way to hack-disable part of the dynamic linker to use it in benchmarks.)

initially, I found that my assembler was not performing terribly well, and the profiler showed that most of the time was going into zeroing memory. I fixed this, partly by reducing the size of some buffers and partly by disabling the 'memset' calls in a few cases.

then I went on a search, micro-optimizing parts of the preprocessor, and also finding and fixing a few bugs (resulting from a few recent additions to the preprocessor functionality). at this point, it was pulling off around 1MB/s (so, 1MB of ASM per second).

I then noted that most of the time was going into my case-insensitive compare function, which is a bit slower than the case-sensitive compare function (strcmp). a little fiddling in the ASM parser reduced its weight and got the speed to about 1.5MB/s. even so, most of the time still goes to the case-insensitive compare, and also to the function that reads tokens.

I am left to wonder if this is "fast enough", and whether I should add options for a no-preprocessor + case-sensitive mode (opcodes/registers/... would necessarily be lower-case), .... but, really, I don't know how fast people feel is needed.

but, in my case, I will still use it, since the other major options (having codegens hand-craft raw machine code; having to create and use an API to emit opcodes; ....) don't really seem all that great either.

and, as well, I guess the volumes of ASM I assemble are low enough that it has not been much of an issue thus far (I tend not to endlessly re-assemble all of my libraries, as most loadable modules are in HLLs, and binary object caching tends to be used instead of endless recompilation...). for most fragmentary code, such as that resulting from eval or from special-purpose thunks, the total volume of ASM tends to remain fairly low (most fragments are periodic and not that large).

likely, if it did become that much of an issue, there would be bigger issues at play... or such...
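[As an aside: one common way to close most of the gap between a case-insensitive compare and plain strcmp is to fold case through a precomputed 256-entry table rather than calling tolower() per character. A minimal C sketch; the names ci_strcmp, fold, and fold_init are illustrative, not taken from the assembler discussed above:

/* table-driven case-insensitive compare: one lookup per byte
 * instead of the two tolower() calls a naive version would make */
static unsigned char fold[256];

/* build the case-folding table; call once at startup */
static void fold_init(void) {
    for (int i = 0; i < 256; i++)
        fold[i] = (i >= 'A' && i <= 'Z') ? (unsigned char)(i + 32)
                                         : (unsigned char)i;
}

static int ci_strcmp(const char *a, const char *b) {
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    while (fold[*ua] == fold[*ub]) {
        if (*ua == '\0')
            return 0;       /* both strings ended together: equal */
        ua++; ub++;
    }
    return fold[*ua] - fold[*ub];
}
]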
From: Marco van de Voort on 25 Mar 2010 12:05

On 2010-03-25, cr88192 <cr88192(a)hotmail.com> wrote:
> well, this was a recent argument on comp.compilers, but I figured it may
> make some sense in a "freer" context.
>
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context (I believe it is, and that one can
> benefit notably from using textual ASM here).

When we replaced the external assembler with an internal one, it was 40% faster on Linux/FreeBSD, and more than 100% faster on Windows (in overall build time). We attributed the difference to slower I/O and, mainly, to slower .exe startup/shutdown time.
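[The per-invocation process cost Marco points at is easy to ballpark. A rough POSIX sketch, timing repeated spawns of a trivial do-nothing executable; the ./noop name and the use of system() are illustrative, not Marco's methodology, and system() adds shell overhead on top of the exec itself:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    struct timespec t0, t1;
    const int n = 100;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        /* ./noop is a hypothetical do-nothing executable */
        if (system("./noop") != 0)
            return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f ms per spawn (shell + exec + exit)\n",
           secs * 1000.0 / n);
    return 0;
}
]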
From: Branimir Maksimovic on 25 Mar 2010 13:34

On Thu, 25 Mar 2010 08:18:15 -0700 "cr88192" <cr88192(a)hotmail.com> wrote:
> initially, I found that my assembler was not performing terribly well,
> and the profiler showed that most of the time was going into zeroing
> memory. I fixed this, partly by reducing the size of some buffers and
> partly by disabling the 'memset' calls in a few cases.

This is fasm's time compiling its own source, for all 4 platforms it supports, on my machine:

bmaxa(a)maxa:~/fasm/source$ time fasm DOS/fasm.asm | fasm Linux/fasm.asm | fasm libc/fasm.asm | fasm Win32/fasm.asm
flat assembler version 1.68 (16384 kilobytes memory)
4 passes, 83456 bytes.

real    0m0.060s
user    0m0.080s
sys     0m0.000s

bmaxa(a)maxa:~/fasm/source$ find . -name 'fasm*' -exec ls -l {} \;
-rw-r--r-- 1 bmaxa bmaxa  99982 2010-03-25 18:21 ./libc/fasm.o
-rw-rw-r-- 1 bmaxa bmaxa   4874 2009-07-06 15:44 ./libc/fasm.asm
-rw-r--r-- 1 bmaxa bmaxa  77635 2010-03-25 18:21 ./DOS/fasm.exe
-rw-rw-r-- 1 bmaxa bmaxa   5260 2009-07-06 15:44 ./DOS/fasm.asm
-rw-r--r-- 1 bmaxa bmaxa  83456 2010-03-25 18:21 ./Win32/fasm.exe
-rw-rw-r-- 1 bmaxa bmaxa   6160 2009-07-06 15:44 ./Win32/fasm.asm
-rwxr-xr-x 1 bmaxa bmaxa  75331 2010-03-25 18:21 ./Linux/fasm
-rw-rw-r-- 1 bmaxa bmaxa   4694 2009-07-06 15:44 ./Linux/fasm.asm

bmaxa(a)maxa:~/fasm/source$ find . -name '*.inc' -exec ls -l {} \;
-rw-rw-r-- 1 bmaxa bmaxa   5424 2009-07-06 15:44 ./libc/system.inc
-rw-rw-r-- 1 bmaxa bmaxa  50682 2009-07-06 15:44 ./expressi.inc
-rw-rw-r-- 1 bmaxa bmaxa 138351 2009-07-06 15:44 ./x86_64.inc
-rw-rw-r-- 1 bmaxa bmaxa   7779 2009-07-06 15:44 ./DOS/system.inc
-rw-rw-r-- 1 bmaxa bmaxa   1995 2009-07-06 15:44 ./DOS/sysdpmi.inc
-rw-rw-r-- 1 bmaxa bmaxa  10419 2009-07-06 15:44 ./DOS/modes.inc
-rw-rw-r-- 1 bmaxa bmaxa  24541 2009-07-06 15:44 ./parser.inc
-rw-rw-r-- 1 bmaxa bmaxa  37936 2009-07-06 15:44 ./assemble.inc
-rw-rw-r-- 1 bmaxa bmaxa   7916 2009-07-06 15:44 ./Win32/system.inc
-rw-rw-r-- 1 bmaxa bmaxa   6290 2009-07-06 15:44 ./Linux/system.inc
-rw-rw-r-- 1 bmaxa bmaxa  46363 2009-07-06 15:44 ./preproce.inc
-rw-rw-r-- 1 bmaxa bmaxa   3860 2009-07-06 15:44 ./errors.inc
-rw-rw-r-- 1 bmaxa bmaxa   1805 2009-07-06 15:44 ./version.inc
-rw-rw-r-- 1 bmaxa bmaxa  82747 2009-07-06 15:44 ./formats.inc
-rw-rw-r-- 1 bmaxa bmaxa   2404 2009-07-06 15:44 ./messages.inc
-rw-rw-r-- 1 bmaxa bmaxa  48970 2009-07-06 15:44 ./tables.inc
-rw-rw-r-- 1 bmaxa bmaxa   2267 2009-07-06 15:44 ./variable.inc

Greets!

--
http://maxa.homedns.org/
Sometimes online sometimes not
From: Robbert Haarman on 25 Mar 2010 14:24

Hi CR,

On Thu, Mar 25, 2010 at 08:18:15AM -0700, cr88192 wrote:
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context (I believe it is, and that one can
> benefit notably from using textual ASM here).

I would imagine that it depends on what you consider "fast enough".

> in my case it works ok, but then I realized that I push relatively low
> volumes of ASM through it, and I am left to wonder about the
> higher-volume cases.

Right. If you handle only low volumes, many solutions tend to be "fast enough".

In my experience, assemblers (that read assembly code and produce machine code) tend to be quite fast. It seems to me that many compilers spend more time processing the source language into (optimized) assembly than the assembler spends turning the resulting assembly code into machine code.

On the other hand, parsing text can be quite time-consuming. In programs I have profiled, it is not uncommon to find that they spend most of their time parsing their input. Although I haven't profiled any assemblers, I could easily imagine that parsing and recognizing opcodes takes up most of their time.

To answer all the questions here, it would probably be a good idea to first come up with a definition of "fast enough", and then, if you find your program isn't fast enough by this definition, to profile it to figure out where it is spending most of its time.

Another question is why you would be going through assembly code at all. What benefit does it provide, compared to, for example, generating machine code directly? Surely, if speed is a concern, you could benefit from cutting out the assembler altogether.

Kind regards,

Bob
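[Bob's "cutting out the assembler", in its most literal form, means the code generator writes opcode bytes straight into an executable buffer. A minimal POSIX sketch, hand-encoding x86-64 for "return 42"; this is purely illustrative, and a real JIT would typically write the page first and then mprotect() it executable, since hardened systems may reject writable+executable mappings:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 machine code for:  mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* allocate a page we can write to and execute from */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);

    /* casting data pointer to function pointer: non-standard C,
     * but the usual idiom in JIT code */
    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());   /* prints 42 */
    return 0;
}
]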
From: Maxim S. Shatskih on 25 Mar 2010 15:15
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context

What is the value of this? The values of JIT are:

a) platform-independent binaries; the platform dependency occurs only at load time and not at build time.

b) mandatory, really mandatory, enforcement of things like exception handling, attribute-based code access rights, and garbage collection, with no chance of escaping it by using malicious tools.

Both a) and b) are achievable only if some IL is used pre-JIT, not real assembly. IL is a) platform-independent and b) simply has no means of bypassing security or exception frames, or of making leakable memory allocations. Real ASM is neither.

You can, though, invent a textual IL and binarize it on load. But what is the value of this, compared to IL binarized at build time?

--
Maxim S. Shatskih
Windows DDK MVP
maxim(a)storagecraft.com
http://www.storagecraft.com