Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

Closed
wants to merge 1 commit into from

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Jul 6, 2022

In #94158, we started emitting noundef, which means that functions returning uninitialized references emit IR with is both ret void and also noundef: https://godbolt.org/z/hbjsKKfc3
More generally, this change makes mem::uninitialized dangerous in a way that it wasn't before. This noundef change was shipped in the past 2 stable releases, 1.61.0 and 1.62.0.

This concerns me because in #66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB.

The pattern of creating uninit references in arrays shows up real code because the 0.11, 0.12, and 0.13 release series for hyper all use mem::uninitialized to construct arrays of httparse::Header, which contains a &str and &[u8]. There is no patch available within these release series, so a cargo update is not a viable way to escape the UB here. Also note that the 0.11 release series of hyper predates MaybeUninit, so any source-level patch for that will incur runtime overhead. Which would be unfortunate, but preferable to UB.

I discussed this with @thomcc on the community Discord, and we think that it would be prudent to revert this introduction of noundef until we have a mitigation in place for the UB that this may unleash. We haven't been able to cook up any examples of surprising optimizations due to this pattern, but the whole point of noundef is to enable optimizations, and we know that there is code which uses it in a way which is definitely instant UB and which we have declined to inform users of.

If possible, we would like to see freeze applied to the return value of mem::uninitialized as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit.

@rustbot labels add +T-compiler +I-compiler-nominated

…t do not permit raw init"

In rust-lang#94158, we started emitting `noundef`, which means that functions
returning uninitialized references emit IR with is both `ret void` and
also `noundef`: https://godbolt.org/z/hbjsKKfc3
More generally, this change makes `mem::uninitialized` dangerous in a
way that it wasn't before. This `noundef` change was shipped in the
past 2 stable releases, 1.61.0 and 1.62.0.

This concerns me because in rust-lang#66151 we have thus far decided against
issuing a panic for creating uninitialized references within arrays,
on account of the breakage that shows up in crater. If this pattern
is so widely-used that we're not willing to put in a runtime check
for it, then it doesn't seem prudent to invite LLVM to exploit this
UB.

The pattern of creating uninit references in arrays shows up real code
because the 0.11, 0.12, and 0.13 release series for `hyper` all use
`mem::uninitialized` to construct arrays of `httparse::Header`, which
contains a `&str` and `&[u8]`. There is no patch available within
these release series, so a `cargo update` is not a viable way to
escape the UB here. Also note that the 0.11 release series of `hyper`
predates `MaybeUninit`, so any source-level patch for that will incur
runtime overhead. Which would be unfortunate, but preferable to UB.

I discussed this with @thomcc on the community Discord, and we think
that it would be prudent to revert this introduction of `noundef` until
we have a mitigation in place for the UB that this may unleash. We
haven't been able to cook up any examples of surprising optimizations
due to this pattern, but the whole point of `noundef` is to enable
optimizations, and we know that there is code which uses it in a way
which is definitely instant UB and which we have declined to inform users
of.

If possible, we would like to see `freeze` applied to the return value
of `mem::uninitialized` as a mitigation for this problem. That may be
able to keep old code functioning without introducing a performance hit.

@rustbot labels add +T-compiler +I-compiler-nominated
@rustbot
Copy link
Collaborator

rustbot commented Jul 6, 2022

Error: Label add can only be set by Rust team members

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 6, 2022
@rust-highfive
Copy link
Collaborator

r? @compiler-errors

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 6, 2022
@compiler-errors compiler-errors added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jul 6, 2022
@erikdesjardins
Copy link
Contributor

erikdesjardins commented Jul 6, 2022

functions returning uninitialized references emit IR with is both ret void and also noundef: godbolt.org/z/hbjsKKfc3

define void @example_oof([2 x i8*]* noalias nocapture noundef readnone sret([2 x i8*]) dereferenceable(16) %0) {
  ret void
}

That example doesn't contain UB--noundef applies to the sret pointer itself (which is never undef), not the pointed-to return value. It's okay for the return value to be undef in that function.


This example does contain UB, because of the load (https://godbolt.org/z/P5cn4434T):

pub fn oof() -> &'static u8 {
    let refs: [&'static u8; 2] = unsafe { core::mem::uninitialized() };
    refs[0]
}
define noundef nonnull align 1 dereferenceable(1) i8* @oof() {
  ret i8* undef
}

In the optimized IR, it's UB for 2 reasons:

(you can play with this in Alive2, e.g. https://alive2.llvm.org/ce/z/rwmt_r -- if the function can be replaced with unreachable, it's unconditional UB)

The original load in the unoptimized IR looks like (https://godbolt.org/z/c3bnPfa97):

%2 = load i8*, i8** %1, align 8, !nonnull !2, !align !3, !noundef !2

This is UB for 1 reason:


So then the question is--does hyper load any undef references before initializing them?

The usual pattern of:

let foo = mem::uninitialized();
// initialize undef fields
foo.x = 1;
foo.y = 2;
// use foo
// ...

should not have any problematic loads, even if the uninitialized value is returned through multiple other functions first. (Provided it's returned indirectly, which is the case for everything > 1 pointer size, i.e. everything except [&Foo; 1] and equivalent types. Edit: actually it's fine either way, since we don't apply such attributes to aggregates, e.g. https://godbolt.org/z/b9vvvcdTT)

If there is a load, doing almost anything with such a loaded reference except storing it (i.e. passing, returning, taking a reference to inner fields, god forbid actually dereferencing, etc.), has been UB since Rust 1.0.


I agree that we should implement mem::uninitialized at runtime with freeze(poison/undef). That will prevent some of this UB at the source.

(Not all of it though--freeze(poison/undef) can be refined to a constant, and LLVM can choose a constant that maximizes UB, e.g. 0 for references, which are nonnull. I think mem::uninitialized basically cannot be salvaged, at least without an "angelic" implementation that always makes a valid value for the type. But that likely has other problems.)

@erikdesjardins
Copy link
Contributor

erikdesjardins commented Jul 6, 2022

In short:

We don't apply noundef and friends to aggregates* (this includes arrays, even single-element arrays, e.g. https://godbolt.org/z/h4hb8f4M6), so UB only comes into play when you load individual fields of an undef aggregate at a scalar type, e.g. when loading &'static u8 from an undef [&'static u8; 2].

This may be (likely is?) still a concern if libraries are loading such undef fields. But the PR description seems written with the idea that the array creation itself is now UB, which is not the case.

*except those which inherit the ABI from their single field, e.g. https://godbolt.org/z/7e3vs3Wx1, but mem::uninitialized already panics in those cases

@saethlin
Copy link
Member Author

saethlin commented Jul 6, 2022

It is good to hear this scenario is less dangerous than I feared.

does hyper load any undef references before initializing them?

No, but it creates references to them, which we apply dereferenceable to, which permits speculative loads. But it would be silly for there to ever be a load before they are initialized.

I think mem::uninitialized basically cannot be salvaged

To be clear, my concern is much more the inconsistency in the decision to not turn this pattern into a panic because of the breakage in crater, but to walk this pattern towards being dangerous UB. I do not mind being corrected that I'm just misunderstanding the situation.

@compiler-errors
Copy link
Member

r? @nikic

@nikic
Copy link
Contributor

nikic commented Jul 6, 2022

No, but it creates references to them, which we apply dereferenceable to, which permits speculative loads. But it would be silly for there to ever be a load before they are initialized.

The !noundef metadata is control flow predicated -- it will be dropped if loads are speculated.

If possible, we would like to see freeze applied to the return value of mem::uninitialized as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit.

FWIW in practice using freeze in this context is equivalent to using zero initialization. That would effectively be #87032. Haven't reread the discussion there, so not sure why it did not land. (Edit: And to be clear, I would be very much in favor of a change along those lines.)

Don't really have anything more to add here, I think @erikdesjardins already explained very well why the premise of this PR is to the most part incorrect.

@saethlin
Copy link
Member Author

saethlin commented Jul 6, 2022

#87032 (comment)

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

@nikic
Copy link
Contributor

nikic commented Jul 6, 2022

#87032 (comment)

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

I mean, this is possible in the sense that any change can perturb optimization behavior in unexpected ways. But we don't intentionally optimize undef values worse than zero values -- if such cases exist, we would classify that as a missed optimization opportunity and be happy to fix it (fix = exploit more UB in this context).

@erikdesjardins
Copy link
Contributor

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

To echo @nikic--this seems to be making an assumption about the way LLVM works that is not really accurate.

Optimizations are generally written as if undef is refined to whatever value allows the most optimization. So any UB that can be exploited for a zero value (or an all-ones value, or...) should be exploited for undef as well.

(As nikic says, it's always possible that an optimization can behave differently for undef and zero. But if that results in less optimization for undef, that's an optimization quality bug that ought to be fixed in LLVM.)

As a trivial example of this: division by zero is UB in LLVM IR. Both 1 / 0 and 1 / undef are optimized to poison: https://godbolt.org/z/K8ahad69v.

@RalfJung
Copy link
Member

I have certainly seen examples in the past where replacing mem::zeroed by mem::uninitialized made an undesired compilation result go away. So LLVM is not always great at that "make undef maximize UB" thing.

@erikdesjardins
Copy link
Contributor

If you have an example I'll fix it upstream ;)

I should've been clearer though. I want to put much more weight on "staying with undef is a bad solution" than "zero in particular is a better solution". (I personally like freeze(undef), but I'll defer to nikic if he thinks it'll end up being the same as zero.)

My point is that upstream philosophy says undef/poison should be optimized at least as much as any constant. If that's not the case, it's an accident, so we should not be depending on a lack of optimizations for undef.

In particular I want to push back against the following:

(which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way)

It is exploited in many ways! And more are coming with LLVM 15, with improvements to branch-on-undef UB, etc.

@RalfJung
Copy link
Member

If you have an example I'll fix it upstream ;)

I doubt it still reproduces, but #52898 says the very first example used to SIGILL with zeroed but "work fine" with uninitialized.

@nikic
Copy link
Contributor

nikic commented Jul 12, 2022

@RalfJung Thanks, this still reproduces on the IR level: https://llvm.godbolt.org/z/dK9rz73ad There's multiple ways in which this could be optimized here, but I believe the main culprit here is mem2reg: https://llvm.godbolt.org/z/eK9Mdvx3T It places a nonnull assumption for the zero load, but not the undef load. From a brief look at the code, addAssumeNonNull (https://github.com/llvm/llvm-project/blob/f44d28f840c0b0877b09d5547fd09e191bbdc90e/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp#L301) is not consistently called for replacements with undef values. Should be easy to fix.

@nikic
Copy link
Contributor

nikic commented Jul 12, 2022

After looking a bit closer, the generic mem2reg code was already handling this correctly, it's just the optimized single-block implementation that failed to preserve the nonnull assume. Fixed in llvm/llvm-project@3d475df. Thanks for sharing the example!

@5225225
Copy link
Contributor

5225225 commented Jul 12, 2022

Does LLVM say it's instant UB to do a noundef load that loads an undef value?

If so, can that be checked in memory sanitizer or similar? msan only seems to check branches on undef, but if llvm is going to use "this cannot be null, it's uninit, therefore it's UB" to optimize on, then that should abort under msan as well.

@RalfJung
Copy link
Member

@nikic thanks for the quick fix. :)

@RalfJung
Copy link
Member

This concerns me because in #66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB.

I would propose the following alternative to this PR: for the cases where mem::uninitialized does not panic, we make it fill memory by repeating the byte 0x01. That means at least all bool and nonnull assumptions for the type are satisfied, and of course noundef as well.
That also has the advantage that we make the old code (considered broken by today's standards, and often with a fix available in newer versions of the same crate, but not always semver-compatible) pay the cost, rather than new code that correctly uses MaybeUninit.

@nikic
Copy link
Contributor

nikic commented Jul 12, 2022

Does LLVM say it's instant UB to do a noundef load that loads an undef value?

If so, can that be checked in memory sanitizer or similar? msan only seems to check branches on undef, but if llvm is going to use "this cannot be null, it's uninit, therefore it's UB" to optimize on, then that should abort under msan as well.

MSan has an "eager checks" option which checks noundef arguments and return values. Rustc currently doesn't enable this option, but probably should? I believe it also makes msan faster in practice, but can't find the reference for that.

This option currently doesn't handle noundef loads, but could be extended in that direction.

@5225225
Copy link
Contributor

5225225 commented Jul 12, 2022

Rustc currently doesn't enable this option, but probably should?

Yeah, I think that'd be great. I did look online for if such a thing existed, but couldn't find anything. I'd be happy to try and help get that enabled by default in rustc (but more discussion of that should probably be in its own issue).

@nikic
Copy link
Contributor

nikic commented Jul 12, 2022

Rustc currently doesn't enable this option, but probably should?

Yeah, I think that'd be great. I did look online for if such a thing existed, but couldn't find anything. I'd be happy to try and help get that enabled by default in rustc (but more discussion of that should probably be in its own issue).

I opened #99179 to track.

@RalfJung
Copy link
Member

I would propose the following alternative to this PR: for the cases where mem::uninitialized does not panic, we make it fill memory by repeating the byte 0x01. That means at least all bool and nonnull assumptions for the type are satisfied, and of course noundef as well.

PR for that is up at #99182.

@pnkfelix
Copy link
Member

pnkfelix commented Jul 21, 2022

Discussed in T-compiler meeting.

I think the questions raised here should be resolved in the T-lang meeting, so I'm dropping the compiler nomination and replacing it with a T-lang nomination.

(Also, any discussion of the questions posed in this PR should include note of PR #99182 as one (partial, IIUC) mitigation of the problem described here.)

@rustbot label: +I-lang-nominated -I-compiler-nominated

@rustbot rustbot added I-lang-nominated Nominated for discussion during a lang team meeting. and removed I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Jul 21, 2022
@joshtriplett
Copy link
Member

joshtriplett commented Jul 26, 2022

We discussed this in today's @rust-lang/lang meeting.

We felt that the proposed mitigation in mem::uninitialized() (#99182) seems like a reasonable step to work around issues.

With that change made, we're likely fine with not doing this revert, assuming other issues don't arise.

However, one non-consensus to explicitly note: there was no consensus to treat this as precedent to reject potential future changes, such as potentially using freeze or otherwise mitigating the use of LLVM's undef. (Concretely, I had objections to treating this as precedent, but don't have a problem with the concrete proposal on the table here.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2022
mem::uninitialized: mitigate many incorrect uses of this function

Alternative to rust-lang#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.

This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.

This is somewhat similar to rust-lang#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.

`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc rust-lang#66151
Fixes rust-lang#87675
@RalfJung
Copy link
Member

RalfJung commented Aug 2, 2022

#99182 has landed, so this one should probably be closed?

@saethlin
Copy link
Member Author

saethlin commented Aug 2, 2022

👍 Sounds good to me

@saethlin saethlin closed this Aug 2, 2022
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
mem::uninitialized: mitigate many incorrect uses of this function

Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.

This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.

This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.

`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc rust-lang/rust#66151
Fixes rust-lang/rust#87675
@saethlin saethlin deleted the revert-noundef branch September 21, 2022 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-lang-nominated Nominated for discussion during a lang team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants