Make it possible to use this as backend for the rust REPL #817

bjorn3 · 2019-11-29T18:04:17Z

The rust REPL is currently in development: rust-lang/compiler-team#213

Discussion with @alexreg about integration: https://zulip-archive.rust-lang.org/131828tcompiler/81358REPLCraneliftbackend.html

Depends on https://github.com/bytecodealliance/cranelift/issues/1260 to prevent a memory leak.

bjorn3 · 2019-11-30T10:56:23Z

Necessary changes

Implement as_debug intrinsic once it exists. (easy)
Make it possible to specify certain addresses to store the locals of the main closure in.
Prevent duplicate codegen across multiple compilation sessions in the same process (instance hash -> ptr map in static?)
Dealloc defintively unused functions after jit execution. Cast to fnptr and creation of vtables should inhibit optimization.
After changing a function, the old one should remain for exisiting fnptrs/vtables/threads.
Some form of crash recovery. Maybe fork()? (hard) It may also be possible to serialize variables using serde from the REPL side.

Did I miss something @alexreg?

alexreg · 2019-11-30T16:02:16Z

Yep, I think mainly right. A few further thoughts:

We decided we probably don't actually need a custom version of __rustc_codegen_backend that returns a struct, since the REPL itself will handle all allocation/deallocation of "user fn" locals (including for the return value I guess). But obviously we do need to pass in a local <-> memory address mapping to codegen_crate.
When you write "prevent duplicate codegen", this is for all functions an methods you mean, right? I think the best thing to do would be to have a DefPath <-> memory address map passed in alongside the other map for locals. Perhaps you meant a hash of both the header and body of the fn/method?
The problem I now see with deallocating "definitively unused" functions after JIT execution (except the cass you mention) is that if each time we compile we have, say, fn a and fn b that creates a fnptr to a, we will create new allocations for fn a each time, even if a does not change! Unless I'm mistaken?
Maybe instead of the above two bullet points, we leave everything up to cg-clif to compile the fn again and then compare it to the previously-compiled version of the fn (whose address it can cache for itself). It can then do a memcmp to see whether to deallocate the just-allocated version of the fn. Maybe not the most time-efficient, but certainly space-efficient and reliable. (And avoids the problem of hash collision, which is admittedly very unlikely but possible.)
Maybe we can get simplejit to use the in-memory incr. comp. cache that I implemented for rustc, and avoid all the above?
Ideally, we avoid forking (partly because of the cost of doing the equivalent on Windows). We can catch panics, of course, but things like segfaults (if we allow unsafe code) are problematic. Maybe we get the JIT to do bounds checking? Maybe we fork only if there's unsafe code? Or we just say "it's the user's fault" if they're doing anything unsafe? In any case it's probably worth investigating what a good C(++) REPL does regarding this problem.

bjorn3 · 2019-11-30T17:57:12Z

We decided we probably don't actually need a custom version of __rustc_codegen_backend that returns a struct, since the REPL itself will handle all allocation/deallocation of "user fn" locals (including for the return value I guess). But obviously we do need to pass in a local <-> memory address mapping to codegen_crate.

Being able to pass the local <-> address map was the reason to use a custom __rustc_codegen_backend.

When you write "prevent duplicate codegen", this is for all functions an methods you mean, right? I think the best thing to do would be to have a DefPath <-> memory address map passed in alongside the other map for locals. Perhaps you meant a hash of both the header and body of the fn/method?

Yes, I meant a hash of both the fnsig and mir of all functions, methods and statics (the Instance type is used by rustc to couple a DefId and Subst for those) I also just realized that the hash of an Instance doesn't contain the mir, so that will need to be hashed together.

The problem I now see with deallocating "definitively unused" functions after JIT execution (except the cass you mention) is that if each time we compile we have, say, fn a and fn b that creates a fnptr to a, we will create new allocations for fn a each time, even if a does not change! Unless I'm mistaken?

If an fnptr is created to a function, it will be assumed to be used. When the function is not changed, the old compilation will be reused. ("prevent duplicate codegen")

Maybe instead of the above two bullet points, we leave everything up to cg-clif to compile the fn again and then compare it to the previously-compiled version of the fn (whose address it can cache for itself). It can then do a memcmp to see whether to deallocate the just-allocated version of the fn. Maybe not the most time-efficient, but certainly space-efficient and reliable. (And avoids the problem of hash collision, which is admittedly very unlikely but possible.)

I meant the reusing of previous compilation to be handled by cg_clif. As for the hash collision, that would screw the incr cache up anyway, as the same hash used for the incr cache will be used.

Maybe we can get simplejit to use the in-memory incr. comp. cache that I implemented for rustc, and avoid all the above?

We have to handle the case of fn a casted to an fnptr, next time being unused and then being used again.

Ideally, we avoid forking (partly because of the cost of doing the equivalent on Windows)

Yeah

We can catch panics, of course, but things like segfaults (if we allow unsafe code) are problematic.

Panics can't be catched yet. And segfaults are indeed problematic.

Maybe we get the JIT to do bounds checking?

Yes, but that would only help if it can't call native methods. Emulating those is going to take a lot of time. (even miri supports only a fraction after several years)

Maybe we fork only if there's unsafe code?

Almost everything calls unsafe code. Simply println!() calls libc::write, which is unsafe. This means that all REPL provided code calls unsafe code.

Or we just say "it's the user's fault" if they're doing anything unsafe?

At leasr it would be nice to recover serializable values. And maybe add a :crash_recoverable command to mark certain inputs as prefering fork() crash recovery.

alexreg · 2019-12-01T00:16:23Z

Being able to pass the local <-> address map was the reason to use a custom __rustc_codegen_backend.

Yeah, but I just meant it doesn't need to return anything back to the REPL... I think?

Yes, I meant a hash of both the fnsig and mir of all functions, methods and statics (the Instance type is used by rustc to couple a DefId and Subst for those) I also just realized that the hash of an Instance doesn't contain the mir, so that will need to be hashed together.

Indeed, it doesn't. I wonder if we should be worried about hash collision though... if we should, I think my memcmp approach works, yes? At a cost of extra computation, but no extra memory (except ephemeral memory usage).

The problem I now see with deallocating "definitively unused" functions after JIT execution (except the cass you mention) is that if each time we compile we have, say, fn a and fn b that creates a fnptr to a, we will create new allocations for fn a each time, even if a does not change! Unless I'm mistaken?

If an fnptr is created to a function, it will be assumed to be used. When the function is not changed, the old compilation will be reused. ("prevent duplicate codegen")

Oh, I was being obtuse. So, the optimisation is basically "don't codegen the same fn/method & body twice", with the caveat that we can't do this optimisation when a fnptr is created to that fn/method.

I meant the reusing of previous compilation to be handled by cg_clif. As for the hash collision, that would screw the incr cache up anyway, as the same hash used for the incr cache will be used.

Fair enough (although I think future versions of incr. comp. may avoid the matter of hashing altogether, in fact). I see no reason not to use in-memory incr. comp. anyway, but that's orthogonal to these issues I suppose.

Panics can't be catched yet. And segfaults are indeed problematic.

Okay. I assume you'll work on the former at some point. With the latter, we definitely need to see how C/C++ REPLs do this.

Almost everything calls unsafe code. Simply println!() calls libc::write, which is unsafe. This means that all REPL provided code calls unsafe code.

Yeah... I was thinking we could just ignore that, but special-casing probably isn't wise.

At leasr it would be nice to recover serializable values. And maybe add a :crash_recoverable command to mark certain inputs as prefering fork() crash recovery.

Yes, not a bad idea. But again, let's see what existing C/C++ REPLs do.

bjorn3 · 2020-11-13T18:29:41Z

Unfortunately the REPL is very likely abandoned before the source was published.

alexreg · 2020-11-20T11:06:19Z

Sorry @bjorn3, I haven't been involved in Rust at all for about a year now. I've moved on to other languages and other pastures...

bjorn3 · 2020-11-20T11:47:46Z

No worries.

bjorn3 added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Nov 29, 2019

bjorn3 added the A-jit Area: JIT compilation label Mar 9, 2020

bjorn3 closed this as completed Nov 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it possible to use this as backend for the rust REPL #817

Make it possible to use this as backend for the rust REPL #817

bjorn3 commented Nov 29, 2019 •

edited

Loading

bjorn3 commented Nov 30, 2019

alexreg commented Nov 30, 2019

bjorn3 commented Nov 30, 2019

alexreg commented Dec 1, 2019

bjorn3 commented Nov 13, 2020

alexreg commented Nov 20, 2020

bjorn3 commented Nov 20, 2020

Make it possible to use this as backend for the rust REPL #817

Make it possible to use this as backend for the rust REPL #817

Comments

bjorn3 commented Nov 29, 2019 • edited Loading

bjorn3 commented Nov 30, 2019

Necessary changes

alexreg commented Nov 30, 2019

bjorn3 commented Nov 30, 2019

alexreg commented Dec 1, 2019

bjorn3 commented Nov 13, 2020

alexreg commented Nov 20, 2020

bjorn3 commented Nov 20, 2020

bjorn3 commented Nov 29, 2019 •

edited

Loading