Prevent unwinding past FFI boundaries #46833

Merged: 3 commits, Dec 24, 2017

Contributor

@diwic diwic commented Dec 19, 2017

Second attempt to write a patch to solve this.

r? @nikomatsakis

So, my biggest issue with this patch is the way the patch determines what functions should have an abort landing pad (in construct_fn). I would ideally have this code match src/librustc_trans/callee.rs::get_fn but couldn't find an id that returns true for is_foreign_item. Also tried tcx.has_attr("unwind") with no luck. FIXED

Other issues:

  • llvm.trap raises SIGILL on amd64. Ideally we would use panic-abort's way of aborting, which is nicer, but we don't want to depend on that library...

  • Mir inlining is a stub currently. FIXED (no-op)

Also, when reviewing, please take into account that I'm new to the code and only partially know what I'm doing... and that I've mostly made matches on TerminatorKind::Abort mirror either TerminatorKind::Resume or TerminatorKind::Unreachable, based on what looked best.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@bors
Contributor

bors commented Dec 19, 2017

☔ The latest upstream changes (presumably #45525) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -806,6 +806,7 @@ impl<'a, 'tcx> MutVisitor<'tcx> for Integrator<'a, 'tcx> {
*kind = TerminatorKind::Goto { target: tgt }
}
}
TerminatorKind::Abort => { unimplemented!("Not sure what to do here?!"); }
Contributor

This should be a no-op - there are no targets to update.

Contributor

aka the same as TerminatorKind::Unreachable

Contributor Author

Thanks! Will fix for next version.
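
For illustration, the no-op suggested here can be modeled with a toy terminator type. This is a minimal sketch, not rustc's code: `BasicBlock`, the reduced `TerminatorKind`, and `shift_targets` are hypothetical stand-ins, and the real enum has many more variants. It only shows that `Abort`, like `Unreachable`, has no successor block for an inliner to rewrite.

```rust
/// Toy stand-in for a MIR basic-block index (hypothetical, not rustc's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BasicBlock(pub usize);

/// Reduced model of MIR terminators; rustc's real enum has many more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TerminatorKind {
    Goto { target: BasicBlock },
    Abort,
    Unreachable,
}

/// Shift every block index the terminator mentions by `offset`, roughly what
/// an inliner has to do when it splices a callee's blocks into the caller.
pub fn shift_targets(kind: &mut TerminatorKind, offset: usize) {
    match kind {
        // `Goto` names a successor block, so its target must be updated.
        TerminatorKind::Goto { target } => target.0 += offset,
        // Neither variant carries a successor block: nothing to update.
        TerminatorKind::Abort | TerminatorKind::Unreachable => {}
    }
}
```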


// FIXME: Figure out why we can't use something like this instead:
// tcx.is_foreign_item(tcx.hir.local_def_id(fn_id));
// tcx.has_attr(tcx.hir.local_def_id(fn_id), "unwind");
Contributor

@arielb1 arielb1 Dec 19, 2017

What happens if you use has_attr? Why doesn't it work?

Contributor Author

AFAICT, tcx.has_attr(tcx.hir.local_def_id(fn_id), "unwind") returns false also for "__rust_start_panic" and "panicking::rust_begin_panic".

I don't know why, all these different contexts and ids are a bit bewildering to me. My guess is that I'm trying with the wrong ID or something, but then I don't know what ID would be the right one.

Contributor Author

Scratch this comment, I must have done something wrong. It does seem to work in the new - just pushed - version.

// Therefore generate an extra "Abort" landing pad.

// FIXME: Figure out why we can't use something like this instead:
// tcx.is_foreign_item(tcx.hir.local_def_id(fn_id));
Contributor

is_foreign_item checks for foreign items, aka

extern "C" {
    fn foreign_item(); // we don't generate MIR for this
}

Rather than extern fns, aka

extern "C" fn extern_fn() {
    // we *do* generate MIR for this
}

@arielb1
Contributor

arielb1 commented Dec 19, 2017

[00:04:12] tidy error: /checkout/src/librustc_mir/build/scope.rs:619: line longer than 100 chars

pub fn schedule_abort(&mut self) -> BasicBlock {
self.scopes[0].needs_cleanup = true;
let abortblk = self.cfg.start_new_cleanup_block();
self.cfg.terminate(abortblk, self.scopes[0].source_info(self.fn_span), TerminatorKind::Abort);
Contributor

Line longer than 100 chars

Contributor Author

Thanks, will fix for next version.

// tcx.is_foreign_item(tcx.hir.local_def_id(fn_id));
// tcx.has_attr(tcx.hir.local_def_id(fn_id), "unwind");

let is_foreign = match abi {
Contributor

Actually, I think that an abi check is exactly the right thing. The key point is that the "C" ABI (and other non-Rust ABIs) don't have a defined way to propagate Rust panics, right?

Contributor Author

I'm not so sure.

"__rust_start_panic" and "panicking::rust_begin_panic" are of C ABI and are able to panic, so it can't be that undefined...

Rather, the real danger is when we tell LLVM that a function is "nounwind" and then we end up panicking within - or through - it. That's the undefined behavior this patch is trying to resolve.
So, then I was trying to figure out when we actually mark a function as "nounwind", and it seems now I did not look closely enough. The algorithm seems to be:

  1. ABI check - so you're right, it should be an ABI check.
  2. Set as unwinding if there is an unwind attribute
  3. Set as unwinding if it isn't a foreign item

So maybe that's what I'm supposed to mimic, or possibly try to refactor somehow if we need it in two places?
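
One way to read the three steps above, as a self-contained sketch. The `Abi` enum, the `FnInfo` struct, and the `may_unwind` function are hypothetical names for illustration and are not taken from librustc_trans; the sketch assumes step 1 means that functions with the Rust ABI may always unwind.

```rust
/// Reduced, hypothetical ABI model; rustc knows many more ABIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Abi {
    Rust,
    C,
}

/// The three facts the steps above consult (hypothetical struct).
pub struct FnInfo {
    pub abi: Abi,
    pub has_unwind_attr: bool,
    pub is_foreign_item: bool,
}

/// Returns true if the function would NOT be marked `nounwind`
/// under this reading of the steps.
pub fn may_unwind(f: &FnInfo) -> bool {
    // Step 1: ABI check. Assumption: Rust-ABI functions may always unwind.
    if f.abi == Abi::Rust {
        return true;
    }
    // Step 2: an unwind attribute opts the function back into unwinding.
    if f.has_unwind_attr {
        return true;
    }
    // Step 3: anything that is not a foreign item is treated as unwinding.
    !f.is_foreign_item
}
```

Under this reading, an `extern "C" fn` defined in Rust counts as unwinding because of step 3, which is the check the rest of this thread proposes to drop.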

Contributor

It seems like it would be ideal to have the criteria extracted into a helper function, yes.

Contributor

@arielb1 arielb1 Dec 19, 2017

I'm not sure extracting the criteria directly would help:

  1. Unwinding across a lang boundary is "instant" LLVM UB as we emit the nounwind attribute. If we want to continue doing that, we can't also make it trap.
  2. Therefore, we want to catch unwinding before we reach the lang boundary. That means stopping unwinding from proceeding from Rust to C, because we don't control the C-to-Rust lang boundary.
  3. This means that we need to prevent unwinding on non-foreign items, which means we need to ignore the code for (3) from the previous list.

Disabling the check that allows non-foreign C ABI Rust functions to unwind would allow us to make these functions abort on unwind.
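
As a side note on point 2: outside the compiler, the same idea of stopping a panic before it leaves Rust can be done by hand with `std::panic::catch_unwind`. A minimal sketch, assuming a hypothetical `checked_div` helper and a made-up error-code convention (0 for success, -1 for a caught panic); it is not part of this patch.

```rust
use std::panic;

/// Hypothetical Rust logic that may panic.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// C-ABI entry point that never lets a panic escape.
/// Writes the quotient to `out` and returns 0 on success;
/// returns -1 and leaves `out` untouched if the Rust code panicked.
pub extern "C" fn ffi_checked_div(a: i32, b: i32, out: &mut i32) -> i32 {
    // `catch_unwind` turns an unwinding panic into an `Err` value,
    // so nothing unwinds past this frame into the foreign caller.
    match panic::catch_unwind(|| checked_div(a, b)) {
        Ok(v) => {
            *out = v;
            0
        }
        Err(_) => -1,
    }
}
```

This only helps when panics unwind; with `panic=abort` the process aborts before `catch_unwind` can return.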

Contributor

I must be confused about something. @arielb1 what do you mean by this:

Unwinding across a lang boundary is "instant" LLVM UB as we emit the nounwind attribute. If we want to continue doing that, we can't also make it trap.

Do you mean that the call is tagged with nounwind, or the function? I was assuming the latter.

Contributor

It doesn't matter whether the call or the function is tagged as nounwind. In both cases, unwinding is UB on the LLVM side and therefore can't be turned into an abort.

Contributor

I think we are talking past each other to a certain extent. Let me first define a set of functions I will call "border functions" -- i.e., functions implemented in Rust but which are invokable from C and hence have C ABI. For these functions, it is considered UB if they unwind (and hence these border functions may also be marked as "no-unwind"). In that case, when we generate the fn body, we can trap/abort if an unwind does occur. (This is, I believe, the same thing C++ does in such cases, though I may be mistaken.) This costs us nothing to the same extent that unwinding is "zero cost".

I guess you are saying that we should ignore the #[unwind] attribute for the purpose of this trap, and generate it anyway? This is (I guess) because C code may still call such a function? That sort of makes sense, though it does raise the question of the purpose of the #[unwind] attribute.

Contributor

Sure enough. So for that we'll have to remove the "Set as unwinding if it isn't a foreign item" check.

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 19, 2017
@nikomatsakis
Contributor

nikomatsakis commented Dec 20, 2017

OK, so @arielb1 and I were chatting on gitter, and we came to roughly this conclusion:

  • We can emit the trap on functions with non-Rust ABI.
    • As @arielb1 said above, the "foreign item" check is not relevant here.
  • We should permit such functions to have a #[unwind] attribute, which would suppress the trap.

Note though that this is a change in behavior -- albeit only quasi-defined behavior -- and it feels like it ought to go through the RFC process. Still, it'd be good to have a working implementation so that we can do a crater run and assess possible impact.

So it may not be that there is a "common helper" to extract, that's not entirely clear to me.
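
The rule in the bullets above could be sketched roughly like this. The `Abi` enum here is a reduced, hypothetical stand-in for rustc's ABI type, and the function takes a plain `bool` instead of looking up attributes through `tcx`; the name follows the rename suggested later in this review.

```rust
/// Reduced, hypothetical ABI model; rustc knows many more ABIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Abi {
    Rust,
    C,
    System,
}

/// Decide whether a function body should get an abort landing pad.
/// Note that the "foreign item" check plays no role here.
pub fn should_abort_on_panic(abi: Abi, has_unwind_attr: bool) -> bool {
    match abi {
        // The Rust ABI has a defined way to propagate panics.
        Abi::Rust => false,
        // Non-Rust ABIs trap unless #[unwind] suppresses it.
        Abi::C | Abi::System => !has_unwind_attr,
    }
}
```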

The Abort TerminatorKind will cause an llvm.trap function call to be
emitted.

Signed-off-by: David Henningsson <diwic@ubuntu.com>
Generate Abort instead of Resume terminators on nounwind ABIs.

rust-lang#18510

Signed-off-by: David Henningsson <diwic@ubuntu.com>
@diwic
Contributor Author

diwic commented Dec 21, 2017

We can emit the trap on functions with non-Rust ABI.
As @arielb1 said above, the "foreign item" check is not relevant here.
We should permit such functions to have a #[unwind] attribute, which would suppress the trap.

Ok, so I think we're mostly on the same page with respect to what needs to be done. I rebased it on top of master and skipped the "common helper" part.

Note though that this is a change in behavior -- albeit only quasi-defined behavior -- and it feels like it ought to go through the RFC process. Still, it'd be good to have a working implementation so that we can do a crater run and assess possible impact.

Hmm, so I was thinking "what could this possibly break" and came up with this contrived example:

extern "C" fn foo(called_from_rust: bool) { 
    if something_really_bad_happens() {
        if called_from_rust { panic!("Oh no"); }
        else { std::process::abort() }
    }
}

But even in this case, looking at the LLVM IR, we mark this function as nounwind today (not sure why: the "foreign item" check should be false, so unwind should have been added?). So even if this code seems to work in practice, it's UB in theory because we're unwinding from a nounwind function.

EDIT: So what I wanted to say - is this ever a change in behavior where the previous behavior was not UB?

@diwic
Contributor Author

diwic commented Dec 21, 2017

@kennytm Is there a way I can remove the "waiting on author" tag, now that I've responded and so it is no longer waiting for me (but for CI and a new review pass)?

@diwic
Contributor Author

diwic commented Dec 21, 2017

Btw: Not sure about the current status of #[unwind] - should it be an unsafe attribute or not? Should it be behind a feature gate?

Also, rust_begin_unwind was relabelled to rust_begin_panic a long time ago, still it shows up as rust_begin_unwind in backtraces. This is a bit confusing. I could change that in a separate PR.

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 21, 2017
@kennytm
Member

kennytm commented Dec 21, 2017

@diwic Retagged :)

If this PR is no longer a work-in-progress, please also remove the "WIP" from the title.

@arielb1
Contributor

arielb1 commented Dec 21, 2017

@diwic

Why should #[unwind] be an unsafe attribute? It makes extern "C" functions panic rather than abort, which could have unexpected effects if they are used for FFI, but FFI is unsafe anyway.

@@ -383,6 +405,11 @@ fn construct_fn<'a, 'gcx, 'tcx, A>(hir: Cx<'a, 'gcx, 'tcx>,
let source_info = builder.source_info(span);
let call_site_s = (call_site_scope, source_info);
unpack!(block = builder.in_scope(call_site_s, LintLevel::Inherited, block, |builder| {

Contributor

stray newline

@@ -353,6 +354,27 @@ macro_rules! unpack {
};
}

fn needs_abort_block<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>,
Contributor

Could you rename this to should_abort_on_panic instead?

@arielb1
Contributor

arielb1 commented Dec 21, 2017

r=me with nits addressed

@arielb1 arielb1 changed the title [WIP] prevent unwinding past FFI boundaries Prevent unwinding past FFI boundaries Dec 21, 2017
As suggested by arielb1.

Closes rust-lang#18510

Signed-off-by: David Henningsson <diwic@ubuntu.com>
@diwic
Contributor Author

diwic commented Dec 21, 2017

@arielb1

Why should #[unwind] be an unsafe attribute? It makes extern "C" functions panic rather than abort, which could have unexpected effects if they are used for FFI, but FFI is unsafe anyway.

Good point. FFI code can also call Rust functions with the wrong calling convention, so this is really no worse.

Nits addressed.

@arielb1
Contributor

arielb1 commented Dec 23, 2017

@bors r+

@bors
Contributor

bors commented Dec 23, 2017

📌 Commit 4910ed2 has been approved by arielb1

nikomatsakis added a commit to nikomatsakis/rust that referenced this pull request Feb 20, 2018
nikomatsakis added a commit to nikomatsakis/rust that referenced this pull request Feb 22, 2018
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this pull request Feb 22, 2018
bors added a commit that referenced this pull request Feb 26, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 28, 2018
This commit is targeted at addressing rust-lang#48251 by specifically fixing a case where
a longjmp over Rust frames on MSVC runs cleanups, accidentally running the
"abort the program" cleanup as well. Since rust-lang#46833, `extern` ABI functions in
Rust abort the process if Rust panics, and currently this is modeled as a
normal cleanup like all other destructors.

Unfortunately it turns out that `longjmp` on MSVC is implemented with SEH, the
same mechanism used to implement panics in Rust. This means that `longjmp` over
Rust frames will run Rust cleanups (even though we don't necessarily want it
to). Notably this means that if you `longjmp` over a Rust stack frame then that
probably means you'll abort the program because one of the cleanups will abort
the process.

After some discussion on IRC it turns out that `longjmp` doesn't run cleanups
for *caught* exceptions, it only runs cleanups for cleanup pads. Using this
information this commit tweaks the codegen for an `extern` function to use
a catch-all clause for exceptions instead of a cleanup block. This catch-all is
equivalent to the C++ code:

    try {
        foo();
    } catch (...) {
        bar();
    }

and in fact our codegen here is designed to match exactly what clang emits for
that C++ code!

With this tweak a longjmp over Rust code will no longer abort the process. A
longjmp will continue to "accidentally" run Rust cleanups (destructors) on MSVC.
Other non-MSVC platforms will not run destructors with a longjmp, so we'll
probably still recommend "don't have destructors on the stack", but in any case
this is a more surgical fix than rust-lang#48567 and should help us stick to standard
personality functions a bit longer.
Manishearth added a commit to Manishearth/rust that referenced this pull request Mar 1, 2018
rustc: Tweak funclet cleanups of ffi functions

adonis0302 added a commit to adonis0302/gnome-class that referenced this pull request Sep 23, 2023
Rust 1.24 made it so that a panic!() won't unwind across FFI
boundaries: we won't unwind if we panic inside a Rust function
declared extern "C".

rust-lang/rust#46833

However, gnome-class does not formally mandate a particular Rust
version; we've been running on the assumption that we are running on
nightly, or "recent enough".  For now, just suppress the deprecation
warnings from glib-rs, until we formalize our requirements for the
rustc version.
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.