Skip to content

Dynamic Binary Translation

Xiangyan Sun edited this page Mar 14, 2015 · 6 revisions

Foreign LINUX comes with a fast dynamic binary translator. It is capable of translating Linux binaries on the fly to overcome difficulties noted in incompatibilities. Its code is located at "dbt/" in the source tree.

Basic translation loop

The pseudo code is:

for (;;) {
    if (!(translated_pc = find_translation(pc))
        translated_pc = translate_basic_block(pc);
    execute(translated_pc);
}

The translation process runs on units of basic blocks. A basic block is a immediate flow of instructions with a branch instruction (jmp, call, etc) at the tail, but not inside. A translated basic block can be directly executed on CPU. When the last branch instruction is executed, we then need to determine whether the branch target is already translated. If so, we can simply jump there. Otherwise, we have to translate the basic block first. A call to dispatch trampoline is added to every translated basic block. Therefore the outer for loop in the above pseudo code is not needed in real implementation.

The main basic block translation procedure is located in function dbt_translate().

The basic translation loop is pretty easy to implement. But without any optimizations it will run extremely slow. It is obvious the most time consuming part in the execution is when we are doing branching. Therefore we mainly focus on optimizations of branch instructions.

Branch instructions are categorized into two categories: direct branch and indirect branch. Indirect branch can be further divided as general indirect branches and return instructions.

Direct branch

If a branch target can be determined at translation time, it is a direct branch. For example, calling a fixed function is a direct branch. There is only one deterministic target for the branch, thus it will be pretty easy to optimize.

When we encounter a direct branch. We first check whether it is already implemented. If so, we can easily encode the translated branch target. Otherwise, we build a trampoline which looks like: (dbt_get_direct_trampoline)

push $patch_addr push $target_pc jmp dbt_find_direct_internal

where patch_addr is a pointer to the branch target immediate in the original direct branch instruction. dbt_find_direct_internal is an assembly function for saving and restoring program context, it will call dbt_find_next. When it is called, it will translate the missing basic block, then patch the translated pc to patch_addr. Therefore when the original branch instruction is encountered the second time, it will directly jump to the correct target with no overhead.

Indirect branch

Indirect branch instructions looks like call *eax, or jmp [edx+ecx*4]. Since the branch target cannot be determined at translation time, it must be dispatched at run time. Its effectiveness directly affects the performance of a DBT system. Currently the SIEVE approach is used to effectively handle indirect branches.

TBA

Return handling

TBA

Call return pair optimization

TBA

CPUID emulation

Since each instruction must be explicitly supported to work. We cannot pass the CPUID information of the host directly to the application. Consider the scenario when AVX is not supported in flinux but the CPU supports it. Directly using native CPUID will make the application think they can use AVX instruction which they can't. Therefore we have to intercept CPUID instruction and mask out features we don't support.

FS/GS handling

TBA

References

TBA