Writing a JIT compiler in assembly


I've written a virtual machine in C which has decent performance for a non-JIT VM, but I want to learn something new and improve performance. My current implementation simply uses a switch to translate from VM bytecode to instructions, which is compiled to a jump table. Like I said, the performance is decent for what it is, but I've hit a barrier that can only be overcome with a JIT compiler.

I asked a similar question about self-modifying code not long ago, but I came to realize that I wasn't asking the right question.

So my goal is to write a JIT compiler for this C virtual machine, and I want to do it in x86 assembly (I'm using NASM as my assembler). I'm not quite sure how to go about doing this. I'm comfortable with assembly, and I've looked over some self-modifying code examples, but I haven't figured out how to do code generation yet.

My main roadblock so far is copying instructions, with their arguments, into an executable piece of memory. I'm aware that I can label a line in NASM and copy the entire line from that address with static arguments, but that's not very dynamic and doesn't work for a JIT compiler. I need to be able to interpret an instruction from the bytecode and copy it to executable memory, then interpret the first argument and copy it to memory, then interpret the second argument and copy it to memory.

I've been told about several libraries that would make this task easier, such as GNU Lightning and LLVM. However, I'd like to write this by hand first, to understand how it works, before using external resources.

Are there any resources or examples this community could provide to help me get started on this task? A simple example showing two or three instructions like "add" and "mov" being used to generate executable code, with arguments, dynamically in memory, would do wonders.

I wouldn't recommend writing a JIT in assembly at all. There are good arguments for writing the most frequently executed bits of the interpreter in assembly. For an example of how this looks, see this comment by Mike Pall, the author of LuaJIT.

As for the JIT, there are many different levels of varying complexity:

  1. Compile a basic block (a sequence of non-branching instructions) by simply copying the interpreter's code. For example, the implementations of a few (register-based) bytecode instructions might look like this:

        ; ebp points to virtual register 0 on the stack
        instr_add:
            <decode instruction>
            mov eax, [ebp + ecx * 4]  ; load first operand from stack
            add eax, [ebp + edx * 4]  ; add second operand from stack
            mov [ebp + ebx * 4], eax  ; write back result
            <dispatch next instruction>
        instr_sub:
            ...                       ; similar

    So, given the instruction sequence ADD R3, R1, R2; SUB R3, R3, R4, a simple JIT could copy the relevant parts of the interpreter's implementation into a new machine code chunk:

        mov ecx, 1
        mov edx, 2
        mov ebx, 3
        mov eax, [ebp + ecx * 4]  ; load first operand from stack
        add eax, [ebp + edx * 4]  ; add second operand from stack
        mov [ebp + ebx * 4], eax  ; write back result
        mov ecx, 3
        mov edx, 4
        mov ebx, 3
        mov eax, [ebp + ecx * 4]  ; load first operand from stack
        sub eax, [ebp + edx * 4]  ; subtract second operand from stack
        mov [ebp + ebx * 4], eax  ; write back result

    This simply copies the relevant code, so we need to initialise the registers it uses accordingly. A better solution would be to translate this directly into machine instructions such as mov eax, [ebp + 4], but then you already have to encode the requested instructions manually.

    This technique removes the overhead of interpretation (decoding and dispatch), but otherwise does not improve efficiency much. If the code is executed only once or twice, it may not be worth translating it to machine code first (which requires flushing at least parts of the I-cache).

  2. While some JITs use the above technique instead of an interpreter, they then employ a more complicated optimisation mechanism for frequently executed code. This involves translating the executed bytecode into an intermediate representation (IR) on which additional optimisations are performed.

    Depending on the source language and the type of JIT, this can be very complex (which is why many JITs delegate this task to LLVM). A method-based JIT needs to deal with joining control-flow graphs, so it uses SSA form and runs various analyses on it (e.g., HotSpot).

    A tracing JIT (like LuaJIT 2) only compiles straight-line code, which makes many things easier to implement, but you have to be very careful about how you pick traces and how you link multiple traces together efficiently. Gal and Franz describe one method in this paper (PDF). For another method, see the LuaJIT source code. Both JITs are written in C (or perhaps C++).

