From: octavio on 26 Mar 2010 08:18
I use something similar to a JIT in 'octaos', and it is fast enough even on older computers. Instead of using an intermediate language or binary format, it works directly with sources.
'octasm' can assemble about 1 million instructions with an Atom 1.6 GHz CPU, but typical programs have just a few thousand instructions because the operating system provides a library that does most of the work.
I don't like to use MB or number of lines as a measure, since it depends on the programming style, and comments, long names or empty lines don't slow things down very much. Also, the multimedia data that many applications include does not count, since it would take the same time to load with an executable file. Well-written programs should never need more than 1 million instructions.
Parsing case-insensitive sources should not be a big problem in your assembler: just do a table lookup to obtain the uppercase char and the token type.
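The table lookup suggested above can be sketched roughly as follows. This is only an illustration, not octasm's actual code: the token classes, the `char_table` layout and the `scan_ident` helper are all hypothetical names, and the table covers ASCII only. One 256-entry table gives both the upper-cased byte and a token class, so the scanner needs neither `toupper` nor `stricmp` in its inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token classes; the thread does not say which ones octasm uses. */
enum tok_class { TK_OTHER = 0, TK_SPACE, TK_DIGIT, TK_IDENT };

struct char_info {
    unsigned char upper; /* upper-cased form of the byte */
    unsigned char cls;   /* one of enum tok_class */
};

static struct char_info char_table[256];

/* Fill the table once at startup (ASCII letters, digits, '_' and whitespace). */
void init_char_table(void)
{
    for (int c = 0; c < 256; c++) {
        char_table[c].upper = (unsigned char)c;
        char_table[c].cls = TK_OTHER;
    }
    for (int c = 'a'; c <= 'z'; c++) {
        char_table[c].upper = (unsigned char)(c - 'a' + 'A');
        char_table[c].cls = TK_IDENT;
    }
    for (int c = 'A'; c <= 'Z'; c++)
        char_table[c].cls = TK_IDENT;
    for (int c = '0'; c <= '9'; c++)
        char_table[c].cls = TK_DIGIT;
    char_table['_'].cls = TK_IDENT;
    char_table[' '].cls = TK_SPACE;
    char_table['\t'].cls = TK_SPACE;
    char_table['\n'].cls = TK_SPACE;
    char_table['\r'].cls = TK_SPACE;
}

/* Copy a run of identifier characters from src into out, upper-cased.
   Stops at the first other character or when out is full.
   Returns the number of characters copied (out is always NUL-terminated). */
size_t scan_ident(const char *src, char *out, size_t out_size)
{
    size_t n = 0;
    assert(src != NULL && out != NULL && out_size > 0);
    for (;;) {
        const struct char_info *ci = &char_table[(unsigned char)src[n]];
        if (ci->cls != TK_IDENT && ci->cls != TK_DIGIT)
            break;
        if (n + 1 >= out_size)
            break;
        out[n] = (char)ci->upper;
        n++;
    }
    out[n] = '\0';
    return n;
}
```

After `init_char_table()`, scanning a mnemonic yields an already upper-cased token, so later lookups in the opcode table can use a plain case-sensitive compare.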
From: BGB / cr88192 on 26 Mar 2010 13:21
"octavio" <octavio.vega.fernandez(a)gmail.com> wrote in message news:5327089e-81c5-4a6e-8042-731347fda90b(a)d27g2000yqf.googlegroups.com...
>I use something similar to JIT in 'octaos' and is fast enough even on
> older computers. Instead of using an intermediate language or binary
> format it works directly with sources.
> 'octasm' can assemble about 1 million instructions with an Atom
> 1.6 GHz CPU but typical programs just have a few thousand instructions
> because the operating system provides a library that does most of the
> work.
> I don't like to use MB or number of lines to measure since it depends
> on the programming style, and comments, long names or empty lines don't
> slow down very much. Also the multimedia data that many applications
> include does not count, since it would take the same time to load with
> an executable file. Well written programs should never need more than 1
> million instructions.
> Parsing case insensitive sources should not be a big problem in your
> assembler, just do a table lookup to obtain the uppercase char and the
> token type.
>
I was using MB/s for the ASM code mostly as it is easy to calculate. the fragment I am testing is essentially comment-free, and has very few empty lines (mostly, it is a glob of code from my main codegen implementing a lot of basic operations for 128-bit integer values).
anyways, I currently have the thing assembling code at around 2.2 MB/s... (currently this means re-assembling my blob of ASM code around 3000 times in 10s).
checking lines, 280 lines are currently in use, so, 840000 lines in 10s, or 84000 loc/s. this means at present, ~11.9us per line. maybe better if I can get the time per loc a little lower...
note that my blurb does multiple opcodes per line, since my assembler supports this. splitting out to a single opcode per line produces ~500 lines, meaning 1500000 lines in 10s, or 150000 lines per second, or ~6.7us per opcode/line.
syntax is still presently mostly case-insensitive (although, it is case-sensitive for a few things, but stricmp is no longer high on the profiler list).
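The throughput figures in the post above follow from simple arithmetic, which can be written down as a small helper. The function names here are made up for illustration; the inputs (280 or ~500 lines per run, about 3000 runs, 10 seconds) are the ones quoted in the post.

```c
#include <assert.h>

/* Lines assembled per second when a blob of `lines_per_run` lines
   is re-assembled `runs` times in `seconds` seconds. */
double lines_per_second(long lines_per_run, long runs, double seconds)
{
    assert(lines_per_run > 0 && runs > 0 && seconds > 0.0);
    return (double)lines_per_run * (double)runs / seconds;
}

/* Average time per line, in microseconds, for the same measurement. */
double micros_per_line(long lines_per_run, long runs, double seconds)
{
    return 1e6 / lines_per_second(lines_per_run, runs, seconds);
}
```

With 280 lines this gives 84000 lines/s and about 11.9 us per line; with the blob split to roughly 500 single-opcode lines it gives 150000 lines/s and about 6.7 us per line, matching the numbers in the post.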
From: BGB / cr88192 on 26 Mar 2010 13:30
"Alexei A. Frounze" <alexfrunews(a)gmail.com> wrote in message news:7bb8d1d3-5ea4-4804-aef8-9098d0572bf0(a)f14g2000pre.googlegroups.com...
On Mar 25, 1:19 pm, Robbert Haarman <comp.lang.m...(a)inglorion.net> wrote:
....
> It's the difference between, for example:
>
> n += cg_x86_emit_reg32_imm8_instr(code + n,
>                                   sizeof(code) - n,
>                                   CG_X86_OP_OR,
>                                   CG_X86_REG_EBX,
>                                   42);
>
> and
>
> (emit code '(or (reg ebx) (imm 42)))
<--
Umm... Looks Lispy! :)
For fun I've once implemented an x86 assembler (NASMish, but with much less functionality) in Perl. It was pretty compact (~50KB of source code). A C solution would've been much bigger. The perf relationship would've been the opposite.
Which is, nonetheless, to say, domain specific or task oriented languages are a good thing.
-->
the main assembler machinery is about 100kB of C source (parser + opcode generating logic).
20kB is used for the COFF writer, and 120kB for the opcode-tables (mechanically-generated C).
the whole thing is a bit larger though if everything else were counted (the linker, disassembler, a lot of special-purpose logic code, ...).
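For readers wondering what an emit call like the one quoted above has to do, here is a minimal sketch. It is not the quoted library's code: the function name `emit_reg32_imm8`, the constant names and the return convention (bytes written, or 0 if the buffer is too small) are assumptions modelled on the shape of the quoted call. The encoding itself is the standard x86 form `83 /digit ib`, where the ModRM reg field selects the operation and the 8-bit immediate is sign-extended to 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the values are the /digit opcode extensions
   of the x86 0x83 group and the standard 32-bit register numbers. */
enum { OP_ADD = 0, OP_OR = 1, OP_AND = 4, OP_SUB = 5, OP_XOR = 6, OP_CMP = 7 };
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Emit "<op> reg32, imm8" (opcode 0x83 /op ib, register-direct ModRM).
   Returns the number of bytes written, or 0 if fewer than 3 bytes are available. */
size_t emit_reg32_imm8(uint8_t *out, size_t avail, int op, int reg, int8_t imm)
{
    assert(out != NULL);
    assert(op >= 0 && op <= 7 && reg >= 0 && reg <= 7);
    if (avail < 3)
        return 0;
    out[0] = 0x83;                              /* group-1 opcode, imm8 form */
    out[1] = (uint8_t)(0xC0 | (op << 3) | reg); /* ModRM: mod=11, reg=op, rm=reg */
    out[2] = (uint8_t)imm;                      /* immediate, sign-extended by the CPU */
    return 3;
}
```

Called with `OP_OR`, `REG_EBX` and 42, this writes the three bytes `83 CB 2A`, which is `or ebx, 42` in 32-bit code.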
From: Rugxulo on 26 Mar 2010 13:54
Hi,
On Mar 25, 10:04 pm, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
>
> "Rod Pemberton" <do_not_h...(a)havenone.cmm> wrote in message
>
> > Is TCC when used as TCCBOOT fast enough in a JIT context? ! ? ! ...
>
> can't say, I have not used tcc.
> I hear it compiles fairly fast though.
It's one-pass, with a built-in assembler and linker, and it does fewer optimizations than GCC, so that's why. (Although, honestly, Fabrice Bellard deserves most of the credit.)
Octasm is similarly fast because it's written in itself by a smart programmer (hi !) and is very cautious about multiple passes. FASM's author was very very glad to receive tips from Octavio concerning this. I think he called it the "best suggestion ever" (and he's no slouch either).
Sorry, can't find that link, but here's when Privalov started speeding it up, circa 1.50 or such (maybe that'll give some good ideas):
http://board.flatassembler.net/topic.php?t=854
From: BGB / cr88192 on 26 Mar 2010 15:24
"Rugxulo" <rugxulo(a)gmail.com> wrote in message news:5f74e08b-94ea-4648-9802-2eff601c900f(a)i25g2000yqm.googlegroups.com...
Hi,
On Mar 25, 10:04 pm, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
>
> "Rod Pemberton" <do_not_h...(a)havenone.cmm> wrote in message
>
> > Is TCC when used as TCCBOOT fast enough in a JIT context? ! ? ! ...
>
> can't say, I have not used tcc.
> I hear it compiles fairly fast though.
<--
It's one pass, built-in assembler and linker, and its optimizations are less than GCC, so that's why. (Although, honestly, Fabrice Bellard deserves most of the credit.)
-->
yeah.
forcing my assembler into single-pass mode effectively doubles its speed (but disables automatic jump optimization).
so, it is currently 2.55 MB/s with multiple passes allowed, and 4.9 MB/s single-pass.
(I have spent a lot of the morning fiddly micro-optimizing the damn thing...).
this puts it at currently about 3us per opcode (323817 opcodes/sec).
so, I may add an optional "fast" mode which will, among other things:
disable multi-pass assembly (short jumps would need to be explicit);
disable the preprocessor;
....
<--
Octasm is similarly fast because it's written in itself by a smart programmer (hi !) and is very cautious about multiple passes. FASM's author was very very glad to receive tips from Octavio concerning this. I think he called it the "best suggestion ever" (and he's no slouch either).
-->
yep.
<--
Sorry, can't find that link, but here's when Privalov started speeding it up, circa 1.50 or such (maybe that'll give some good ideas):
http://board.flatassembler.net/topic.php?t=854
-->
yes, ok.