In-place binary/jit updating #1087

Open
lachlansneff opened this issue Sep 28, 2020 · 14 comments
Labels
A-jit Area: JIT compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. compile-time How fast is the code compiled

Comments

@lachlansneff

Loris Cro and Andrew Kelley recently published a blog post (https://kristoff.it/blog/zig-new-relationship-llvm/) about Zig's new self-hosted compiler. Part of the post describes how it patches the output binary in place, which avoids relinking after small code changes (possibly after larger ones too, I'm not sure). This seems extremely powerful. Linking is consistently a significant bottleneck when compiling Rust, and skipping it entirely would probably give large compile-time speedups.

I'm proposing that this project pursue this either in a similar way to Zig, by patching individual functions in the output binary, or by keeping the jit open and recompiling individual functions as the source changes.

@bjorn3
Member

bjorn3 commented Sep 28, 2020

Thanks for pointing me to that post! Binary patching is probably hard to implement. Doing this for the JIT mode of cg_clif is much more feasible. cranelift-simplejit currently doesn't support hot code swapping, but I already wanted to implement that for lazy compilation in JIT mode. Rustc doesn't have a DepNode that I can use to implement incremental compilation of single mono items, but that shouldn't be too hard to add. rust-lang/rust#76474 will make it possible to create a custom rustc driver that would listen for file changes and pass the necessary persistent state to the various runs of the codegen backend.

@bjorn3 bjorn3 added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Sep 28, 2020
@lachlansneff
Author

lachlansneff commented Sep 28, 2020

@bjorn3 That's fantastic, I'm glad to hear that the prerequisite components for this to work are mostly already on the roadmap.

As for the binary reloading, I don't think it'd actually be too bad, since Zig does this by including a function lookup table in the binary that can easily be modified to redirect functions to new versions. It would require separate logic for each output format (Mach-O, ELF, PE, Wasm, etc.) though.
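
The lookup-table idea described above can be sketched in a few lines of Rust. This is only an illustration of the indirection, not Zig's or cg_clif's implementation: `FunctionTable`, `register`, `patch`, `call` and the two `double_*` functions are hypothetical names, and a real binary would store raw code addresses in a table inside the output file rather than Rust `fn` pointers in a `Vec`.

```rust
// Hypothetical sketch: every call goes through a table slot, so replacing a
// function means overwriting one slot instead of relinking the program.
type Func = fn(i32) -> i32;

struct FunctionTable {
    slots: Vec<Func>,
}

impl FunctionTable {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Registers a function and returns the slot index callers should use.
    fn register(&mut self, f: Func) -> usize {
        self.slots.push(f);
        self.slots.len() - 1
    }

    /// Redirects an existing slot to a new version of the function.
    fn patch(&mut self, index: usize, f: Func) {
        self.slots[index] = f;
    }

    /// Indirect call: the current version is looked up on every call.
    fn call(&self, index: usize, arg: i32) -> i32 {
        (self.slots[index])(arg)
    }
}

// Two stand-in "versions" of the same function with different behavior.
fn double_v1(x: i32) -> i32 {
    x * 2
}

fn double_v2(x: i32) -> i32 {
    x * 2 + 1
}

/// Calls the slot before and after patching it.
fn demo_table() -> (i32, i32) {
    let mut table = FunctionTable::new();
    let id = table.register(double_v1);
    let before = table.call(id, 21);
    table.patch(id, double_v2);
    let after = table.call(id, 21);
    (before, after)
}
```

Calling `demo_table()` runs the same call site twice, and only the table entry changes in between. The cost of this design is one extra indirection per call.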

I am more immediately excited for a jit mode, since continuous in-memory compilation would be faster than touching disk anyhow.

@bjorn3 bjorn3 added A-jit Area: JIT compilation compile-time How fast is the code compiled labels Sep 30, 2020
@bjorn3
Member

bjorn3 commented Oct 11, 2020

I have been trying to revive an old branch for lazy compilation in JIT mode. Currently the following program sometimes panics (about one in five tries, I think).

#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]
#![allow(dead_code, non_camel_case_types)]

extern crate mini_core;

use mini_core::*;
use mini_core::libc::*;

unsafe extern "C" fn my_puts(s: *const i8) {
    puts(s);
}

#[lang = "termination"]
trait Termination {
    fn report(self) -> i32;
}

impl Termination for () {
    fn report(self) -> i32 {
        unsafe {
            0
        }
    }
}

#[lang = "start"]
fn start<T: Termination + 'static>(
    main: fn() -> T,
    argc: isize,
    argv: *const *const u8,
) -> isize {
    main().report();
    0
}

macro_rules! assert_eq {
    ($l:expr, $r: expr) => {
        if $l != $r {
            panic(stringify!($l != $r));
        }
    }
}

struct Unique<T: ?Sized> {
    pointer: *const T,
    _marker: PhantomData<T>,
}

impl<T: ?Sized, U: ?Sized> CoerceUnsized<Unique<U>> for Unique<T> where T: Unsize<U> {}

fn take_unique(_u: Unique<()>) {}

fn main() {
    take_unique(Unique {
        pointer: 0 as *const (),
        _marker: PhantomData,
    });

    extern {
        #[linkage = "extern_weak"]
        static ABC: *const u8;
    }

    {
        extern {
            #[linkage = "extern_weak"]
            static ABC: *const u8;
        }
    }

    unsafe { assert_eq!(ABC as usize, 0); }
}

@bjorn3
Member

bjorn3 commented Oct 12, 2020

Found the problem. I wasn't correctly saving definitions of weak linkage statics.

@bjorn3
Member

bjorn3 commented Oct 12, 2020

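Lazy JIT compilation is currently significantly slower than AOT compilation (by about 2.27x in the benchmark below, where the AOT run includes executing the binary). This is probably because, once a function has been jitted, existing references to it still go through the compilation shim.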

Benchmark #1: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu
  Time (mean ± σ):      1.476 s ±  0.059 s    [User: 1.412 s, System: 0.062 s]
  Range (min … max):    1.435 s …  1.632 s    10 runs
 
Benchmark #2: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu &&  ./target/out/std_example arg
  Time (mean ± σ):     650.3 ms ±  14.7 ms    [User: 553.7 ms, System: 96.4 ms]
  Range (min … max):   639.6 ms … 687.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu &&  ./target/out/std_example arg' ran
    2.27 ± 0.10 times faster than '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu'

This should be fixable using a GOT (global offset table), which would also help with other kinds of function replacement, like the in-place JIT updating proposed by this issue.
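
A minimal sketch of why a GOT removes the shim overhead, under stated assumptions: `LazyGot`, `compile`, `call` and `square` are hypothetical names, `compile` merely stands in for real code generation, and a real JIT would store raw code addresses and reach them through PLT stubs rather than Rust `fn` pointers. The point is only that the slot is overwritten on first use, so later calls skip the compilation path.

```rust
use std::cell::Cell;

type Func = fn(u64) -> u64;

// Stand-in for the machine code a JIT would produce.
fn square(x: u64) -> u64 {
    x * x
}

struct LazyGot {
    // One slot per function; `None` means "not compiled yet".
    slots: Vec<Cell<Option<Func>>>,
    // Counts how often the slow compilation path was taken.
    shim_hits: Cell<usize>,
}

impl LazyGot {
    fn with_slots(n: usize) -> Self {
        Self {
            slots: (0..n).map(|_| Cell::new(None)).collect(),
            shim_hits: Cell::new(0),
        }
    }

    /// Hypothetical compilation step: records the hit and returns the code.
    fn compile(&self, _slot: usize) -> Func {
        self.shim_hits.set(self.shim_hits.get() + 1);
        square
    }

    /// Calls through the slot, compiling and patching it on first use.
    fn call(&self, slot: usize, arg: u64) -> u64 {
        let f = match self.slots[slot].get() {
            Some(f) => f,
            None => {
                let f = self.compile(slot);
                // Patch the slot so later calls bypass the shim entirely.
                self.slots[slot].set(Some(f));
                f
            }
        };
        f(arg)
    }
}

/// Calls the same slot twice and reports how often the shim ran.
fn demo_lazy_got() -> (u64, u64, usize) {
    let got = LazyGot::with_slots(1);
    let first = got.call(0, 3);
    let second = got.call(0, 4);
    (first, second, got.shim_hits.get())
}
```

In `demo_lazy_got()` the shim runs once even though the function is called twice. The same slot-overwrite mechanism is what would let a newer version of a function be swapped in later.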

@bjorn3
Member

bjorn3 commented Oct 13, 2020

Implemented a GOT+PLT for SimpleJIT. It mostly works, but crashes on writes to mutable statics (I haven't tried reads).

#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]

extern crate mini_core;

#[lang = "start"]
fn start(main: fn(), argc: isize, argv: *const *const u8) -> isize {
    unsafe {
        NUM = 43;
    }
    0
}

static mut NUM: u8 = 6 * 7;

fn main() {}

(gdb) disassemble 0x7fffe807e040,+0x22
Dump of assembler code from 0x7fffe807e040 to 0x7fffe807e062:
   0x00007fffe807e040:  rex push %rbp
   0x00007fffe807e042:  mov    %rsp,%rbp
   0x00007fffe807e045:  mov    -0x202c(%rip),%rax        # 0x7fffe807c020
   0x00007fffe807e04c:  rex mov $0x2b,%ecx
   0x00007fffe807e052:  movzbl %cl,%ecx
=> 0x00007fffe807e056:  mov    %cl,(%rax)
   0x00007fffe807e059:  rex mov $0x0,%eax
   0x00007fffe807e05f:  rex pop %rbp
   0x00007fffe807e061:  retq   
End of assembler dump.
(gdb) info registers rax 
rax            0x6e696d243032752a  7956010218921555242
(gdb) p/x *(long*)0x7fffe807c020
$1 = 0x6e696d243032752a

@lachlansneff
Author

Looks like you're getting there!

@bjorn3
Member

bjorn3 commented Oct 14, 2020

I was accidentally reading the GOT entry in get_got_entry instead of returning the address of the GOT entry itself. mini_core_hello_world.rs now works with GOT+PLT.

Edit: std_example.rs also works. 🎉
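
The difference between the contents of a GOT entry and the address of that entry can be illustrated with a small sketch. This is not the actual `get_got_entry` code: `Got`, `slot_address`, `slot_contents_as_address` and `emulate_load` are hypothetical names, and the single load only emulates what a GOT-relative `mov` would do.

```rust
// Hypothetical sketch: each slot stores the address of a data object.
struct Got {
    slots: Vec<usize>,
}

impl Got {
    /// Intended behavior: the address of the slot itself, which is what a
    /// GOT-relative load in generated code should reference.
    fn slot_address(&self, index: usize) -> *const usize {
        &self.slots[index]
    }

    /// Buggy behavior: the slot's contents are handed out as if they were
    /// the slot's address.
    fn slot_contents_as_address(&self, index: usize) -> *const usize {
        self.slots[index] as *const usize
    }
}

/// Emulates a single load from the location the relocation points at.
///
/// Safety: `location` must point to a readable, aligned `usize`.
unsafe fn emulate_load(location: *const usize) -> usize {
    unsafe { location.read() }
}

/// Returns (good load yields the data address, bad load yields the data bytes).
fn demo_got_entry() -> (bool, bool) {
    let data: usize = 0xdead_beef;
    let data_address = &data as *const usize as usize;
    let got = Got { slots: vec![data_address] };

    // Correct: loading from the slot yields the address of `data`.
    let good = unsafe { emulate_load(got.slot_address(0)) };
    // Buggy: the load reads `data` itself and treats its bytes as a pointer.
    let bad = unsafe { emulate_load(got.slot_contents_as_address(0)) };

    (good == data_address, bad == 0xdead_beef)
}
```

In the buggy variant the value that ends up in the register is the bytes of the data object rather than a pointer to it. That would be consistent with the non-pointer value seen in `rax` in the gdb session above, although the sketch does not reproduce the actual crash.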

@bjorn3
Member

bjorn3 commented Apr 6, 2021

Discussion on the Bevy discord about how to handle changing types in case of hot code swapping: https://discord.com/channels/691052431525675048/692572690833473578/828930167648813086

Dylan-DPC-zz referenced this issue in Dylan-DPC-zz/rust Apr 19, 2021
…r=wesleywiser

Introduce CompileMonoItem DepNode

This is likely required for allowing efficient hot code swap support in cg_clif's JIT mode. My prototype currently requires re-compiling all functions, which is both slow and uses a lot of memory, as there is no support for freeing the memory used by replaced functions yet.

cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/1087
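
As a rough analogy for what per-item dependency tracking buys, the sketch below recompiles only the items whose fingerprint changed between runs. It does not use rustc's DepNode machinery: `CodegenCache`, `fingerprint` and `items_to_recompile` are hypothetical names, and hashing source text stands in for the much more precise tracking the compiler would do.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hashes an item's body; a stand-in for a real dependency fingerprint.
fn fingerprint(body: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    hasher.finish()
}

/// Remembers the fingerprint each item had when it was last compiled.
struct CodegenCache {
    fingerprints: HashMap<String, u64>,
}

impl CodegenCache {
    fn new() -> Self {
        Self { fingerprints: HashMap::new() }
    }

    /// Records the new fingerprints and returns the names of the items
    /// that are new or whose body changed since the previous call.
    fn items_to_recompile(&mut self, items: &[(&str, &str)]) -> Vec<String> {
        let mut changed = Vec::new();
        for &(name, body) in items {
            let fp = fingerprint(body);
            // `insert` returns the previous fingerprint, if there was one.
            if self.fingerprints.insert(name.to_string(), fp) != Some(fp) {
                changed.push(name.to_string());
            }
        }
        changed
    }
}

/// Runs two "compilations"; only `helper` changes in the second one.
fn demo_incremental() -> (Vec<String>, Vec<String>) {
    let mut cache = CodegenCache::new();
    let first = cache.items_to_recompile(&[
        ("main", "fn main() { helper() }"),
        ("helper", "fn helper() {}"),
    ]);
    let second = cache.items_to_recompile(&[
        ("main", "fn main() { helper() }"),
        ("helper", "fn helper() { println!(\"hi\") }"),
    ]);
    (first, second)
}
```

The first run reports both items and the second reports only `helper`, so only that function would need to be regenerated and swapped in.
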
@bjorn3
Member

bjorn3 commented Jul 19, 2021

For future reference: some more discussion at https://discord.com/channels/691052431525675048/730525730601041940/866626266173276170

@mav3ri3k

mav3ri3k commented Mar 1, 2024

Hi @bjorn3,
I would love to know the current state of in-place binary updating. From the issue I gather that some groundwork has been laid and that it works for basic examples.
JIT code generation for Rust would be great for debugging scenarios.

@bjorn3
Member

bjorn3 commented Mar 1, 2024

There is no support for in-place binary patching. cg_clif uses the default system linker, and in-place binary patching requires a linker with specific support for it, which the system linker doesn't have. As for jitting, there is support for a JIT mode (disabled in the rustup-distributed version of cg_clif), but it is generally slower than AOT compilation because it is entirely incompatible with incremental compilation. There is a branch for runtime patching in JIT mode, but I haven't touched it in years and it leaks a lot of memory on every update, quickly leading to a crash.

@adaszko

adaszko commented Mar 1, 2024

Is using Zig as a linker an option? They explicitly have binary patching style linking as a feature. There's a cargo plugin for Zig and some positive experience reports: https://users.rust-lang.org/t/costs-of-using-zig-linker/88525

Andrew Kelley had a live coding session on Vimeo where he hot-swapped parts of a program via the ptrace API on Linux, all done in milliseconds. Sadly, it seems the video has since been deleted and I can't find it anymore. There's an open GitHub issue about this functionality, so it's not finished.

It would be fantastic if Rust feedback cycles could be shortened thanks to Cranelift.

@bjorn3
Member

bjorn3 commented Mar 1, 2024

My understanding is that zig cc uses regular lld. It merely handles things relevant for cross-compiling, but is otherwise just a regular linker. As for hot code swapping in Zig, I can't find anything about it ever going beyond the prototype phase.
