assembler speed... [ASM]

From: octavio on 26 Mar 2010 08:18

I use something similar to a JIT in 'octaos', and it is fast enough even on
older computers. Instead of using an intermediate language or binary
format, it works directly with sources.
'octasm' can assemble about 1 million instructions on a 1.6 GHz Atom
CPU, but typical programs have just a few thousand instructions
because the operating system provides a library that does most of the
work.
I don't like to use MB or number of lines as a measure, since it depends
on the programming style, and comments, long names or empty lines don't
slow things down very much. Also, the multimedia data that many
applications include does not count, since it would take the same time to
load with an executable file. Well-written programs should never need
more than 1 million instructions.
Parsing case-insensitive sources should not be a big problem in your
assembler: just do a table lookup to obtain the uppercase char and the
token type.
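
A minimal C sketch of the table-lookup idea described above, not octasm's actual code. It assumes plain ASCII input; the names `tok_class`, `char_info`, `char_table`, `init_char_table` and `scan_ident` are hypothetical. One 256-entry table yields both the upper-case form and the token class of a byte, so the scanner never calls a case-insensitive string compare.

```c
#include <stddef.h>

/* Hypothetical token classes; the names are illustrative only. */
enum tok_class { TC_OTHER, TC_SPACE, TC_ALPHA, TC_DIGIT };

struct char_info {
    unsigned char upper;  /* upper-case form of the byte */
    unsigned char cls;    /* one of enum tok_class */
};

static struct char_info char_table[256];

/* Fill the table once at startup (ASCII only, an assumption). */
static void init_char_table(void)
{
    for (int c = 0; c < 256; c++) {
        char_table[c].upper = (unsigned char)c;
        char_table[c].cls = TC_OTHER;
        if (c >= 'a' && c <= 'z') {
            char_table[c].upper = (unsigned char)(c - 'a' + 'A');
            char_table[c].cls = TC_ALPHA;
        } else if ((c >= 'A' && c <= 'Z') || c == '_') {
            char_table[c].cls = TC_ALPHA;
        } else if (c >= '0' && c <= '9') {
            char_table[c].cls = TC_DIGIT;
        } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            char_table[c].cls = TC_SPACE;
        }
    }
}

/* Copy one identifier from src into out, upper-cased and NUL-terminated.
   Returns the number of characters consumed. Stops at the first byte that
   is neither TC_ALPHA nor TC_DIGIT, or when out is full. */
static size_t scan_ident(const char *src, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    while (n + 1 < out_size) {
        const struct char_info *ci = &char_table[(unsigned char)src[n]];
        if (ci->cls != TC_ALPHA && ci->cls != TC_DIGIT)
            break;
        out[n] = (char)ci->upper;  /* one lookup gives class and case fold */
        n++;
    }
    out[n] = '\0';
    return n;
}
```

After `init_char_table()`, scanning "mov eax" into a buffer stores "MOV" and reports three characters consumed, so the mnemonic can then be looked up with an ordinary case-sensitive compare or hash.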

From: BGB / cr88192 on 26 Mar 2010 13:21

"octavio" <octavio.vega.fernandez(a)gmail.com> wrote in message
news:5327089e-81c5-4a6e-8042-731347fda90b(a)d27g2000yqf.googlegroups.com...
> I use something similar to a JIT in 'octaos', and it is fast enough even on
> older computers. Instead of using an intermediate language or binary
> format, it works directly with sources.
> 'octasm' can assemble about 1 million instructions on a 1.6 GHz Atom
> CPU, but typical programs have just a few thousand instructions
> because the operating system provides a library that does most of the
> work.
> I don't like to use MB or number of lines as a measure, since it depends
> on the programming style, and comments, long names or empty lines don't
> slow things down very much. Also, the multimedia data that many
> applications include does not count, since it would take the same time to
> load with an executable file. Well-written programs should never need
> more than 1 million instructions.
> Parsing case-insensitive sources should not be a big problem in your
> assembler: just do a table lookup to obtain the uppercase char and the
> token type.
>

I was using MB/s for the ASM code mostly as it is easy to calculate.

the fragment I am testing is essentially comment-free, and has very few
empty lines (mostly, it is a glob of code from my main codegen implementing
a lot of basic operations for 128-bit integer values).

anyways, I currently have the thing assembling code at around 2.2 MB/s...

(currently this means re-assembling my blob of ASM code around 3000 times in
10s).

checking lines, 280 lines are currently in use, so, 840000 lines in 10s, or
84000 loc/s.
this means at present, ~11.9us per line.

maybe better if I can get the time per loc a little lower...

note that my blob does multiple opcodes per-line, since my assembler
supports this.
splitting out to a single opcode per line produces ~500 lines, meaning
1500000 lines in 10s, or 150000 lines per second, or ~ 6.7us per
opcode/line.
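
The arithmetic behind these figures can be checked with a small C sketch. The helper names `lines_per_second` and `micros_per_line` are made up for illustration; the inputs (280 or ~500 lines per run, about 3000 runs, 10 seconds) are the ones quoted in this post.

```c
/* Lines assembled per second, given the size of the test blob, the number
   of times it was re-assembled, and the elapsed wall-clock time. */
static double lines_per_second(double lines_per_run, double runs, double seconds)
{
    return lines_per_run * runs / seconds;
}

/* Average cost of one line in microseconds. */
static double micros_per_line(double lines_per_sec)
{
    return 1.0e6 / lines_per_sec;
}
```

With 280 lines per run this gives 84000 lines per second and roughly 11.9 us per line; with about 500 single-opcode lines it gives 150000 lines per second and roughly 6.7 us per opcode.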

syntax is still presently mostly case-insensitive (although it is
case-sensitive for a few things), but stricmp is no longer high on the
profiler list.

From: BGB / cr88192 on 26 Mar 2010 13:30

"Alexei A. Frounze" <alexfrunews(a)gmail.com> wrote in message
news:7bb8d1d3-5ea4-4804-aef8-9098d0572bf0(a)f14g2000pre.googlegroups.com...
On Mar 25, 1:19 pm, Robbert Haarman <comp.lang.m...(a)inglorion.net>
wrote:
....
> It's the difference between, for example:
>
> n += cg_x86_emit_reg32_imm8_instr(code + n,
> sizeof(code) - n,
> CG_X86_OP_OR,
> CG_X86_REG_EBX,
> 42);
>
> and
>
> (emit code '(or (reg ebx) (imm 42)))
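
For readers wondering what the C call above might do internally, here is a rough sketch. It is a guess, not Robbert's actual code: the function body, the parameter types, and the numeric values of `CG_X86_OP_OR` and `CG_X86_REG_EBX` are all assumptions. It relies only on the standard x86 encoding of the "group 1" ALU operations with a sign-extended 8-bit immediate (opcode 0x83, operation selected by the ModRM reg field).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: the operation constant is the /digit used by opcode 0x83,
   and the register constant is the hardware register number. */
enum { CG_X86_OP_OR = 1 };    /* 0x83 /1 is OR r/m32, imm8 */
enum { CG_X86_REG_EBX = 3 };  /* EBX is register number 3 */

/* Emit "<op> reg32, imm8" into out, which has room bytes available.
   Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t cg_x86_emit_reg32_imm8_instr(uint8_t *out, size_t room,
                                           int op, int reg, int8_t imm)
{
    if (room < 3)
        return 0;
    out[0] = 0x83;                               /* group 1, r/m32, imm8 */
    out[1] = (uint8_t)(0xC0 | (op << 3) | reg);  /* ModRM: mod=11, reg=op, rm=reg */
    out[2] = (uint8_t)imm;                       /* sign-extended by the CPU */
    return 3;
}
```

Called as in the quoted snippet with OR, EBX and 42, this sketch writes the three bytes 83 CB 2A, which is the encoding of `or ebx, 42`.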

<--
Umm... Looks Lispy! :)

For fun I've once implemented an x86 assembler (NASMish, but with much
less functionality) in Perl. It was pretty compact (~50KB of source
code). A C solution would've been much bigger. The perf relationship
would've been the opposite. Which is, nonetheless, to say, domain
specific or task oriented languages are a good thing.
-->

the main assembler machinery is about 100kB of C source (parser + opcode
generating logic).

20kB is used for the COFF writer, and 120kB for the opcode-tables
(mechanically-generated C).

the whole thing is a bit larger though if everything else were counted (the
linker, disassembler, a lot of special-purpose logic code, ...).

From: Rugxulo on 26 Mar 2010 13:54

Hi,

On Mar 25, 10:04 pm, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
>
> "Rod Pemberton" <do_not_h...(a)havenone.cmm> wrote in message
>
> > Is TCC when used as TCCBOOT fast enough in a JIT context? ! ? ! ...
>
> can't say, I have not used tcc.
> I hear it compiles fairly fast though.

It's one pass, built-in assembler and linker, and its optimizations
are less than GCC, so that's why. (Although, honestly, Fabrice Bellard
deserves most of the credit.)

Octasm is similarly fast because it's written in itself by a smart
programmer (hi !) and is very cautious about multiple passes. FASM's
author was very very glad to receive tips from Octavio concerning
this. I think he called it the "best suggestion ever" (and he's no
slouch either).

Sorry, can't find that link, but here's when Privalov started speeding
it up, circa 1.50 or such (maybe that'll give some good ideas):

http://board.flatassembler.net/topic.php?t=854

From: BGB / cr88192 on 26 Mar 2010 15:24

"Rugxulo" <rugxulo(a)gmail.com> wrote in message
news:5f74e08b-94ea-4648-9802-2eff601c900f(a)i25g2000yqm.googlegroups.com...
Hi,

On Mar 25, 10:04 pm, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
>
> "Rod Pemberton" <do_not_h...(a)havenone.cmm> wrote in message
>
> > Is TCC when used as TCCBOOT fast enough in a JIT context? ! ? ! ...
>
> can't say, I have not used tcc.
> I hear it compiles fairly fast though.

<--
It's one pass, built-in assembler and linker, and its optimizations
are less than GCC, so that's why. (Although, honestly, Fabrice Bellard
deserves most of the credit.)
-->

yeah.
forcing my assembler into single-pass mode effectively doubles its speed
(but disables automatic jump optimization).

so, it is currently 2.55 MB/s with multi-passes allowed, and 4.9 MB/s
single-pass.
(I have spent a lot of the morning fiddly micro-optimizing the damn
thing...).

this puts it at currently about 3us per opcode (323817 opcodes/sec).

so, I may add an optional "fast" mode which will, among other things:
disable multi-pass assembly (short jumps would need to be explicit);
disable the preprocessor;
....
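
To illustrate why a single-pass mode needs explicit short jumps, here is a minimal C sketch; it is not taken from the assembler discussed here. The function name `emit_jmp` and its parameters are hypothetical, and 32-bit code is assumed (EB rel8 for a short JMP, E9 rel32 for a near JMP). A backward target is already known, so the short form can be chosen automatically. A forward target is not known yet, so without a later pass the encoder has to reserve the 5-byte near form unless the programmer asks for the short one.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit an unconditional JMP located at offset here.
   target_known: nonzero if the label's offset is already defined.
   force_short:  nonzero if the source explicitly asked for a short jump.
   Unknown targets get a zero displacement to be patched later.
   Returns the bytes written (2 or 5), or 0 on error (no room, or an
   explicit short jump whose known target is out of rel8 range). */
static size_t emit_jmp(uint8_t *out, size_t room, long here, long target,
                       int target_known, int force_short)
{
    long rel8 = target - (here + 2);  /* relative to end of the 2-byte form */
    int fits8 = target_known && rel8 >= -128 && rel8 <= 127;

    if (force_short || fits8) {
        if (room < 2)
            return 0;
        if (target_known && !fits8)
            return 0;                 /* explicit short, but too far away */
        out[0] = 0xEB;
        out[1] = target_known ? (uint8_t)rel8 : 0;
        return 2;
    }

    if (room < 5)
        return 0;
    /* relative to the end of the 5-byte form, stored little-endian */
    uint32_t rel32 = target_known ? (uint32_t)(target - (here + 5)) : 0;
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel32 & 0xFF);
    out[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    out[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    out[4] = (uint8_t)((rel32 >> 24) & 0xFF);
    return 5;
}
```

A multi-pass assembler can instead revisit forward jumps once label offsets are known and shrink those that fit in rel8, which is the automatic jump optimization that single-pass mode gives up.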

<--
Octasm is similarly fast because it's written in itself by a smart
programmer (hi !) and is very cautious about multiple passes. FASM's
author was very very glad to receive tips from Octavio concerning
this. I think he called it the "best suggestion ever" (and he's no
slouch either).
-->

yep.

<--
Sorry, can't find that link, but here's when Privalov started speeding
it up, circa 1.50 or such (maybe that'll give some good ideas):

http://board.flatassembler.net/topic.php?t=854
-->

yes, ok.
