From: cr88192 on 25 Mar 2010 11:18

well, this was a recent argument on comp.compilers, but I figured it may make some sense in a "freer" context.

basically, the question is whether or not a textual assembler is fast enough for use in a JIT context (I believe it is, and that one can benefit notably from using textual ASM here).

in my case it works ok, but then I realized that I push relatively low volumes of ASM through it, and I am left to wonder about the higher-volume cases.

so, some tests: basically, I have tried assembling a chunk of text over and over again in a loop and measuring how quickly the ASM gets pushed through. the test keeps track of the elapsed time, runs for about 10 seconds, and counts how many times the loop has run; from this it can figure out how quickly ASM is being processed.

the dynamic linker is currently disabled in these tests, as this part proves problematic to benchmark for technical reasons: endlessly re-linking the same code into the running image doesn't turn out well, and fairly quickly the thing will crash. (I would need to figure out a way to hack-disable part of the dynamic linker to use it in benchmarks.)

initially, I found that my assembler was not performing terribly well, and the profiler showed that most of the time was going into zeroing memory. I fixed this, partly by reducing the size of some buffers and partly by disabling the 'memset' calls in a few cases.

then I went on a search, micro-optimizing parts of the preprocessor, and also finding and fixing a few bugs (resulting from a few recent additions to the preprocessor functionality). at this point, it was pulling off around 1MB/s (so, 1MB of ASM per second).

I then noted that most of the time was going into my case-insensitive compare function, which is a bit slower than the case-sensitive compare function (strcmp). a little fiddling in the ASM parser reduced its weight and got the speed to about 1.5MB/s. even so, most of the time still goes to the case-insensitive compare, and also to the function that reads tokens.

I am left to wonder if this is "fast enough", and whether I should add options for a no-preprocessor + case-sensitive mode (opcodes/registers/... would necessarily be lower-case), .... but, really, I don't know how fast people feel is needed.

but, in my case, I will still use it, since the other major options (having codegens hand-craft raw machine code; having to create and use an API to emit opcodes; ....) don't really seem all that great either.

and, as well, I guess the volumes of ASM I assemble are low enough that it has not been much of an issue thus far (I tend not to endlessly re-assemble all of my libraries, as most loadable modules are in HLLs, and binary object caching tends to be used instead of endless recompilation...). for most fragmentary code, such as that resulting from eval or from special-purpose thunks, the total volume of ASM tends to remain fairly low (most fragments are periodic and not that large).

likely, if it did become that much of an issue, there would be bigger issues at play... or such...
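[As an aside: one common way to close most of the gap between a case-insensitive compare and plain strcmp is to fold case through a precomputed 256-entry table rather than calling tolower() per character. A minimal C sketch; the names ci_strcmp, fold, and fold_init are illustrative, not taken from the assembler discussed above:

/* table-driven case-insensitive compare: one lookup per byte
 * instead of the two tolower() calls a naive version would make */
static unsigned char fold[256];

/* build the case-folding table; call once at startup */
static void fold_init(void) {
    for (int i = 0; i < 256; i++)
        fold[i] = (i >= 'A' && i <= 'Z') ? (unsigned char)(i + 32)
                                         : (unsigned char)i;
}

static int ci_strcmp(const char *a, const char *b) {
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    while (fold[*ua] == fold[*ub]) {
        if (*ua == '\0')
            return 0;       /* both strings ended together: equal */
        ua++; ub++;
    }
    return fold[*ua] - fold[*ub];
}
]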
From: Marco van de Voort on 25 Mar 2010 12:05

On 2010-03-25, cr88192 <cr88192(a)hotmail.com> wrote:
> well, this was a recent argument on comp.compilers, but I figured it may
> make some sense in a "freer" context.
>
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context (I believe it is, and that one can
> benefit notably from using textual ASM here).

When we replaced the external assembler with an internal one, it was 40% faster on Linux/FreeBSD, and more than 100% faster on Windows (in overall build time). We attributed the difference to slower I/O and, mainly, to slower .exe startup/shutdown time.
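[The per-invocation process cost Marco points at is easy to ballpark. A rough POSIX sketch, timing repeated spawns of a trivial do-nothing executable; the ./noop name and the use of system() are illustrative, not Marco's methodology, and system() adds shell overhead on top of the exec itself:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    struct timespec t0, t1;
    const int n = 100;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        /* ./noop is a hypothetical do-nothing executable */
        if (system("./noop") != 0)
            return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f ms per spawn (shell + exec + exit)\n",
           secs * 1000.0 / n);
    return 0;
}
]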
From: Branimir Maksimovic on 25 Mar 2010 13:34

On Thu, 25 Mar 2010 08:18:15 -0700 "cr88192" <cr88192(a)hotmail.com> wrote:
> initially, I found that my assembler was not performing terribly well,
> and the profiler showed that most of the time was going into zeroing
> memory. I fixed this, partly by reducing the size of some buffers and
> partly by disabling the 'memset' calls in a few cases.

This is fasm's time compiling its own source, for all 4 platforms it supports, on my machine:

bmaxa(a)maxa:~/fasm/source$ time fasm DOS/fasm.asm | fasm Linux/fasm.asm | fasm libc/fasm.asm | fasm Win32/fasm.asm
flat assembler version 1.68 (16384 kilobytes memory)
4 passes, 83456 bytes.

real    0m0.060s
user    0m0.080s
sys     0m0.000s

bmaxa(a)maxa:~/fasm/source$ find . -name 'fasm*' -exec ls -l {} \;
-rw-r--r-- 1 bmaxa bmaxa  99982 2010-03-25 18:21 ./libc/fasm.o
-rw-rw-r-- 1 bmaxa bmaxa   4874 2009-07-06 15:44 ./libc/fasm.asm
-rw-r--r-- 1 bmaxa bmaxa  77635 2010-03-25 18:21 ./DOS/fasm.exe
-rw-rw-r-- 1 bmaxa bmaxa   5260 2009-07-06 15:44 ./DOS/fasm.asm
-rw-r--r-- 1 bmaxa bmaxa  83456 2010-03-25 18:21 ./Win32/fasm.exe
-rw-rw-r-- 1 bmaxa bmaxa   6160 2009-07-06 15:44 ./Win32/fasm.asm
-rwxr-xr-x 1 bmaxa bmaxa  75331 2010-03-25 18:21 ./Linux/fasm
-rw-rw-r-- 1 bmaxa bmaxa   4694 2009-07-06 15:44 ./Linux/fasm.asm

bmaxa(a)maxa:~/fasm/source$ find . -name '*.inc' -exec ls -l {} \;
-rw-rw-r-- 1 bmaxa bmaxa   5424 2009-07-06 15:44 ./libc/system.inc
-rw-rw-r-- 1 bmaxa bmaxa  50682 2009-07-06 15:44 ./expressi.inc
-rw-rw-r-- 1 bmaxa bmaxa 138351 2009-07-06 15:44 ./x86_64.inc
-rw-rw-r-- 1 bmaxa bmaxa   7779 2009-07-06 15:44 ./DOS/system.inc
-rw-rw-r-- 1 bmaxa bmaxa   1995 2009-07-06 15:44 ./DOS/sysdpmi.inc
-rw-rw-r-- 1 bmaxa bmaxa  10419 2009-07-06 15:44 ./DOS/modes.inc
-rw-rw-r-- 1 bmaxa bmaxa  24541 2009-07-06 15:44 ./parser.inc
-rw-rw-r-- 1 bmaxa bmaxa  37936 2009-07-06 15:44 ./assemble.inc
-rw-rw-r-- 1 bmaxa bmaxa   7916 2009-07-06 15:44 ./Win32/system.inc
-rw-rw-r-- 1 bmaxa bmaxa   6290 2009-07-06 15:44 ./Linux/system.inc
-rw-rw-r-- 1 bmaxa bmaxa  46363 2009-07-06 15:44 ./preproce.inc
-rw-rw-r-- 1 bmaxa bmaxa   3860 2009-07-06 15:44 ./errors.inc
-rw-rw-r-- 1 bmaxa bmaxa   1805 2009-07-06 15:44 ./version.inc
-rw-rw-r-- 1 bmaxa bmaxa  82747 2009-07-06 15:44 ./formats.inc
-rw-rw-r-- 1 bmaxa bmaxa   2404 2009-07-06 15:44 ./messages.inc
-rw-rw-r-- 1 bmaxa bmaxa  48970 2009-07-06 15:44 ./tables.inc
-rw-rw-r-- 1 bmaxa bmaxa   2267 2009-07-06 15:44 ./variable.inc

Greets!

--
http://maxa.homedns.org/
Sometimes online sometimes not
From: Robbert Haarman on 25 Mar 2010 14:24

Hi CR,

On Thu, Mar 25, 2010 at 08:18:15AM -0700, cr88192 wrote:
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context (I believe it is, and that one can
> benefit notably from using textual ASM here).

I would imagine that it depends on what you consider "fast enough".

> in my case it works ok, but then I realized that I push relatively low
> volumes of ASM through it, and I am left to wonder about the
> higher-volume cases.

Right. If you handle only low volumes, many solutions tend to be "fast enough".

In my experience, assemblers (that read assembly code and produce machine code) tend to be quite fast. It seems to me that many compilers spend more time processing the source language into (optimized) assembly than the assembler spends turning the resulting assembly code into machine code.

On the other hand, parsing text can be quite time-consuming. In programs I have profiled, it is not uncommon to find that they spend most of their time parsing their input. Although I haven't profiled any assemblers, I could easily imagine that parsing and recognizing opcodes takes up most of their time.

To answer all the questions here, it would probably be a good idea to first come up with a definition of "fast enough", and then, if you find your program isn't fast enough by this definition, to profile it to figure out where it is spending most of its time.

Another question is why you would be going through assembly code at all. What benefit does it provide, compared to, for example, generating machine code directly? Surely, if speed is a concern, you could benefit from cutting out the assembler altogether.

Kind regards,

Bob
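[Bob's "cutting out the assembler", in its most literal form, means the code generator writes opcode bytes straight into an executable buffer. A minimal POSIX sketch, hand-encoding x86-64 for "return 42"; this is purely illustrative, and a real JIT would typically write the page first and then mprotect() it executable, since hardened systems may reject writable+executable mappings:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 machine code for:  mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* allocate a page we can write to and execute from */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);

    /* casting data pointer to function pointer: non-standard C,
     * but the usual idiom in JIT code */
    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());   /* prints 42 */
    return 0;
}
]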
From: Maxim S. Shatskih on 25 Mar 2010 15:15
> basically, the question is whether or not a textual assembler is fast
> enough for use in a JIT context

What is the value of this? The values of JIT are:

a) platform-independent binaries; the platform dependency occurs only at load time and not at build time.

b) mandatory, really mandatory, enforcement of things like exception handling, attribute-based code access rights, and garbage collection, with no chance of escaping it by using malicious tools.

Both a) and b) are achievable only if some IL is used pre-JIT, not real assembly. IL is a) platform-independent and b) simply has no means of bypassing security or exception frames, or of making leakable memory allocations. Real ASM is neither.

You can, though, invent a textual IL and binarize it on load. But what is the value of this, compared to IL binarized at build time?

--
Maxim S. Shatskih
Windows DDK MVP
maxim(a)storagecraft.com
http://www.storagecraft.com