Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Miscompilation due to bug in "mutable noalias" logic #84958

Closed
briansmith opened this issue May 5, 2021 · 34 comments
Closed

Regression: Miscompilation due to bug in "mutable noalias" logic #84958

briansmith opened this issue May 5, 2021 · 34 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Milestone

Comments

@briansmith
Copy link
Contributor

The ring test suite started failing when I tried to upgrade to untrusted 0.8.0. The regression first shipped in nightly-2021-03-23:

We were further able to narrow down the regression by bisecting down to: 97663b6 "Auto merge of #82834 - nikic:mutable-noalias, r=nagisa Enable mutable noalias for LLVM >= 12."

The symptoms are 100% consistent with lifetimes being an input into the problem. The regression occurs in --release mode.

Code

I tried this code:

Cargo.toml:

[package]
name = "untrusted-user"
version = "0.1.0"
edition = "2018"

[dependencies]
untrusted = { git = "https://github.com/briansmith/untrusted", commit = "bb8d942ee2af7c37b64e6dbc06b2aaa78b3059aa" }

[profile.bench]
lto = true
codegen-units = 1

src/lib.rs:

#[inline(never)] // Required otherwise the test passes.
fn read_single_byte_value<'a>(input: &mut untrusted::Reader<'a>) -> untrusted::Input<'a> {
    input.read_bytes(1).unwrap()
}

#[test]
fn test_regression() {
    let input = &[0x00];
    let input = untrusted::Input::from(input);
    let _ = input.read_all((), |input| {
        let value = read_single_byte_value(input);

        let mut r2 = untrusted::Reader::new(value);
        r2.read_byte().unwrap();

        assert!(input.at_end());

        Ok(())
    });
}

I expected to see this happen: explanation

cargo +nightly test --release should pass.

Instead, this happened: explanation

cargo +nightly test --release fails with:

thread 'test_regression' panicked at 'assertion failed: input.at_end()', src\lib.rs:16:9

Version it worked on

Stable
Beta
nightly-2021-03-22

Version with regression

cargo +nightly-2021-03-23 version --verbose

cargo 1.52.0-nightly (90691f2bf 2021-03-16)
release: 1.52.0
commit-hash: 90691f2bfe9a50291a98983b1ed2feab51d5ca55
commit-date: 2021-03-16
rustc 1.53.0-nightly (5d04957a4 2021-03-22)
binary: rustc
commit-hash: 5d04957a4b4714f71d38326fc96a0b0ef6dc5800
commit-date: 2021-03-22
host: x86_64-pc-windows-msvc
release: 1.53.0-nightly
LLVM version: 12.0.0

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged

@briansmith briansmith added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels May 5, 2021
@rustbot rustbot added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-untriaged Untriaged performance or correctness regression. and removed regression-untriaged Untriaged performance or correctness regression. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels May 5, 2021
@jyn514
Copy link
Member

jyn514 commented May 5, 2021

Enable mutable noalias by default on LLVM 12, as previously known miscompiles have been resolved. Now it's time to find the next one ;)

@nikic: found it :P

@jyn514
Copy link
Member

jyn514 commented May 5, 2021

Marking this as a beta regression since 1.53 is about to be promoted to beta: #84909

@jyn514 jyn514 added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-untriaged Untriaged performance or correctness regression. labels May 5, 2021
@briansmith
Copy link
Contributor Author

cargo +nightly test --release fails, but cargo +nightly test (without --release) passes. Full investigation in briansmith/untrusted#57.

The code in untrusted 0.8.0 (which I yanked) is not substantially different from the untrusted 0.7.1 so I'm afraid that ring (and all ring-based crates) relying on untrusted 0.7.1 to avoid this will not be a viable solution.

I think there is sufficient evidence here to warrant disabling the broken optimization for the beta. I am happy to continue minimizing the test case too.

@jonas-schievink jonas-schievink added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label May 5, 2021
@jyn514
Copy link
Member

jyn514 commented May 5, 2021

Marking this as needs-MCVE since we'll need it to fix the upstream bug, but I agree this alone is probably enough to get it reverted.

@jyn514 jyn514 added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label May 5, 2021
@briansmith
Copy link
Contributor Author

Bad news regarding the MCVE: I just tried inlining all the untrusted code into the repro test case so I could start minimizing it, but that caused the test case to pass. So it looks like creating a MCVE will be more difficult than I expected.

@pietroalbini pietroalbini added this to the 1.53.0 milestone May 5, 2021
@jyn514 jyn514 added the A-LTO Area: Link-time optimization (LTO) label May 5, 2021
@briansmith
Copy link
Contributor Author

For context, the test case is reduced from ring's EcdsaKeyPair::from_pkcs8, so the security implications of this are pretty high.

@jyn514
Copy link
Member

jyn514 commented May 5, 2021

Assigning priority as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels May 5, 2021
@lqd
Copy link
Member

lqd commented May 5, 2021

Is this minimal enough for untrusted ?

  • src/main.rs:
use untrusted::{Reader, Slice};

#[inline(never)] // Required otherwise the test passes.
fn read_single_byte_value<'a>(input: &mut Reader<'a>) -> Slice<'a> {
    input.read_bytes(1)
}

fn main() {
    let bytes = &[0x00];
    let slice = Slice { bytes };
    slice.read_all(|input| {
        let value = read_single_byte_value(input);

        let mut r2 = Reader::new(value);
        r2.read_byte();

        assert!(input.at_end());
    });
}
  • src/lib.rs:
#![no_std]

#[derive(Clone, Copy)]
pub struct Slice<'a> {
    pub bytes: &'a [u8],
}
impl<'a> Slice<'a> {
    #[inline]
    pub fn subslice(&self, r: core::ops::Range<usize>) -> Self {
        let bytes = self.bytes.get(r).unwrap();
        Self { bytes }
    }
    pub fn into_value(self) -> Slice<'a> {
        self
    }

    pub fn read_all<F, R>(self, read: F) -> R
    where
        F: FnOnce(&mut Reader<'a>) -> R,
    {
        let mut input = Reader::new(self.into_value());
        read(&mut input)
    }
}
pub struct Reader<'a> {
    input: Slice<'a>,
    i: usize,
}
impl<'a> Reader<'a> {
    #[inline]
    pub fn new(input: Slice<'a>) -> Self {
        Self {
            input: input.into_value(),
            i: 0,
        }
    }
    #[inline]
    pub fn at_end(&self) -> bool {
        self.i == self.input.bytes.len()
    }
    #[inline]
    pub fn read_byte(&mut self) -> u8 {
        let b = self.input.bytes.get(self.i).unwrap();
        self.i += 1;
        *b
    }
    #[inline]
    pub fn read_bytes(&mut self, num_bytes: usize) -> Slice<'a> {
        let new_i = self.i.checked_add(num_bytes).unwrap();
        let bytes = self.input.subslice(self.i..new_i);
        self.i = new_i;
        bytes
    }
}

In the time I tried minimizing this, I wasn't able to easily trigger the issue without the multiple crates, LTO, and codegen-units=1.

Here are the commands to trigger the bug:

> rm -rf ./target ; mkdir target 
> rustc --crate-name untrusted --edition=2018 src/lib.rs --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C linker-plugin-lto -C codegen-units=1 --out-dir ./target 
> rustc --crate-name untrusted --edition=2018 src/main.rs --crate-type bin --emit=dep-info,link -C opt-level=3 -C lto -C codegen-units=1 --out-dir ./target --extern untrusted=./target/libuntrusted.rlib 
> ./target/untrusted

thread 'main' panicked at 'assertion failed: input.at_end()', src/main.rs:17:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

As mentioned, removing the LTO and codegen-units flags will stop the assert from triggering. Here's that command which doesn't trigger the bug for the same code, for easy copy-pasting:

rm -rf ./target ; mkdir target && rustc --crate-name untrusted --edition=2018 lib.rs --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 --out-dir ./target && rustc --crate-name untrusted --edition=2018 main.rs --crate-type bin --emit=dep-info,link -C opt-level=3 --out-dir ./target --extern untrusted=./target/libuntrusted.rlib && ./target/untrusted

@nagisa
Copy link
Member

nagisa commented May 5, 2021

It is fine if the crash only occurs with multiple crates. One further minimization that would help is to get cargo out of the way (by specifying the minimal rustc invocations that are necessary to reproduce the issue). Another minimization that could help is getting rid of the #[test] infrastructure somehow (replacing it with a regular fn main() if possible).

@lqd
Copy link
Member

lqd commented May 5, 2021

@nagisa updated to remove the test infra and cargo

@comex
Copy link
Contributor

comex commented May 5, 2021

Cannot reproduce when targeting macOS or Linux.

@briansmith
Copy link
Contributor Author

briansmith commented May 5, 2021

Cannot reproduce when targeting macOS or Linux.

Thanks for letting me know. I will go back to my original test case and build a less-minimized version that fails the test on Linux. I definitely did get it to fail on Linux before when the test case was much more complex.

Edit: It seems others (including me) is now able to reproduce it on Linux.

@Aaron1011
Copy link
Member

@comex: I was able to reproduce the original example on Linux (I haven't tested the minimization)

@soruh
Copy link
Contributor

soruh commented May 5, 2021

One further minimization that would help is to get cargo out of the way (by specifying the minimal rustc invocations that are necessary to reproduce the issue).

The following rustc invokations seem to be the minimal required ones:

rustc --edition=2018 untrusted.rs  --crate-type lib -C codegen-units=1 -C linker-plugin-lto
rustc --edition=2018 regression.rs --crate-type bin -C codegen-units=1 -C lto -C opt-level=3 --extern untrusted=libuntrusted.rlib

with untrusted.rs and regression.rs containing the code from lqd's comment

./regression
thread 'main' panicked at 'assertion failed: input.at_end()', regression.rs:17:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

EDIT: the untrusted crate does not need -C opt-level=3

(I apparently also can't read and completely missed lqd's update to their comment)

@comex
Copy link
Contributor

comex commented May 5, 2021

Oops. The issue is actually that I was using @lqd's "command for easy copy-pasting" but it's missing -C codegen-units=1. Now I can reproduce on macOS (and probably Linux, I didn't try).

@lqd
Copy link
Member

lqd commented May 5, 2021

@comex The command for easy copy pasting is the one that removes the LTO and CGU flags to show it works, contrasted to the above commands which trigger the bug

@comex
Copy link
Contributor

comex commented May 5, 2021

I can't read.

@lqd
Copy link
Member

lqd commented May 5, 2021

More likely that I can't write :) I've now added explicitly what the commands do; for the record I've minimized this on x86_64-unknown-linux-gnu (but under WSL1)

@comex
Copy link
Contributor

comex commented May 5, 2021

Minimized further.
lib.rs:

#![no_std]

#[derive(Clone, Copy)]
pub struct Slice<'a> {
    pub bytes: &'a [u8],
}
impl<'a> Slice<'a> {
    pub fn into_value(self) -> Slice<'a> {
        self
    }
}

main.rs:

use untrusted::{Slice};
struct Reader<'a> {
    pub input: Slice<'a>,
    pub i: usize,
}

#[inline(never)] // Required otherwise the test passes.
fn read_single_byte_value<'a>(input: &mut Reader<'a>) -> Slice<'a> {
    input.i = 1;
    input.input
}

#[inline(always)]
fn read(input: &mut Reader) {
    let value = read_single_byte_value(input);

    value.into_value().bytes.get(0).unwrap();

    let val = input.i;
    assert!(val == 1);
}

fn main() {
    let bytes = &[0x00];
    let slice = Slice { bytes };
    slice.into_value();
    let mut input = Reader {
        input: slice,
        i: 0,
    };
    read(&mut input)
}

Running this panics:
rustc --edition=2018 lib.rs --crate-type lib -C codegen-units=1 -C linker-plugin-lto && rustc --edition=2018 main.rs --crate-type bin -C opt-level=3 -C codegen-units=1 -C lto --extern untrusted=liblib.rlib && ./main
But changing opt-level to 0 makes it not panic.

@comex
Copy link
Contributor

comex commented May 5, 2021

So, start with this LLVM IR:

https://gist.github.com/comex/490edb708682370dcc40c059b761811f

This is based on the minimized version in my last comment, after being run through rustc with -C llvm-args='-print-after=inline -print-before=inline -print-module-scope', plus a few manual simplifications of both the source and IR. (I had to cut out as much use of the standard library as possible to speed things up.)

The two important parts are, first:

  %4 = call fastcc { i8*, i64 } @_ZN4main22read_single_byte_value17h6a8946d030728332E(%Reader* nonnull align 8 dereferenceable(24) %input) #7

This calls read_single_byte_value, a function that writes to the input pointer (input.i = 1;).

And second:

  %val.i = load i64, i64* %8, align 8, !alias.scope !9
  %9 = icmp eq i64 %val.i, 1

This corresponds to:

    let val = input.i;
    assert!(val == 1);

Note the !alias.scope !9. This means that the load cannot alias with any accesses tagged !noalias !9. But that's okay; the only call marked with that is into_value, which indeed doesn't perform any aliasing accesses.

But now run: ~/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/bin/opt -inline s.ll -S
(first run rustup component add llvm-tools-preview if you haven't already)

And in the output, suddenly we have:

  %4 = call fastcc { i8*, i64 } @_ZN4main22read_single_byte_value17h6a8946d030728332E(%Reader* nonnull align 8 dereferenceable(24) %input) #4, !noalias !9

!noalias !9 has appeared. Not good! The store in read_single_byte_value does alias with the aforementioned load, which is still marked !alias.scope !9 after the optimization pass.

And when compiling the original source with rustc, LLVM went on to optimize the load into a constant 0 – which would be valid under the assumption that the call couldn't perform any stores that aliased with the load. For some reason, I can't get the same optimization to happen if I manually feed the .ll file into either the rustup-shipped opt or some random clang. Still, I'm pretty sure I identified the faulty pass, unless there's something I'm missing that makes the input IR incorrect already.

@nagisa
Copy link
Member

nagisa commented May 5, 2021

In order to get it to reproduce with opt, you'll need to pass in the lto input to opt with -std-link-opts argument. I'm currently waiting for bugpoint to come back with a more minimal IR to me.

-scoped-noalias-aa -inline -gvn seems like the minimal list of passes to trigger the miscompilation.

@comex
Copy link
Contributor

comex commented May 6, 2021

So, the noalias gets tacked on as part of the inlining of Slice::into_value; specifically by the function PropagateCallSiteMetadata, documented as:

/// When inlining a call site that has !llvm.mem.parallel_loop_access,
/// !llvm.access.group, !alias.scope or !noalias metadata, that metadata should
/// be propagated to all memory-accessing cloned instructions.

It achieves this by iterating over VMap.

What is VMap (I wondered)? When the inliner clones a given Instruction from the inner function to the outer function, it has to know how to translate the Instruction's operands, originally pointers to Values within the inner function, into pointers to corresponding Values within the outer function. For Values which are Instructions, the corresponding Value is normally the freshly-made clone of that Instruction. For Values which are Arguments (i.e. formal parameters), the corresponding Value is whatever was used as the actual parameter in the outer function.

Anyway, PropagateCallSiteMetadata doesn't care about the mapping, per se; it just wants to iterate over all the copied Instructions. Thus it mostly ignores the keys of VMap and just looks at the values.

It does do one thing with the key. As I said, when the key is an Argument, the corresponding Value is not a copied instruction, but some random other Value from the outer function that was used as an actual parameter. So it checks for that:

    // Check that key is an instruction, to skip the Argument mapping, which
    // points to an instruction in the original function, not the inlined one.
    if (!VMI->second || !isa<Instruction>(VMI->first))
      continue;

But there's another case it didn't consider. When the key is an Instruction, the corresponding Value is normally a freshly-made clone Instruction. But not always.

CloneFunction.cpp does some simplifications during the cloning process, such as this one:

      // If we can simplify this instruction to some other value, simply add
      // a mapping to that value rather than inserting a new instruction into
      // the basic block.

Here's the IR again:

  %4 = call fastcc { i8*, i64 } @_ZN4main22read_single_byte_value17h6a8946d030728332E(%Reader* nonnull align 8 dereferenceable(24) %input) #7
  %value.0.i = extractvalue { i8*, i64 } %4, 0
  %value.1.i = extractvalue { i8*, i64 } %4, 1
  %5 = tail call fastcc { i8*, i64 } @_ZN3lib5Slice10into_value17hd068c2a4c36b2989E(i8* noalias nonnull readonly align 1 %value.0.i, i64 %value.1.i) #7, !noalias !9

read_single_byte_value returns a pair of values, and the two values are extracted from the pair and passed as arguments to into_value. Then into_value starts off by bundling those two arguments back into a pair:

define internal fastcc { i8*, i64 } @_ZN3lib5Slice10into_value17hd068c2a4c36b2989E(i8* nonnull align 1 %0, i64 %1) unnamed_addr #6 {
  %3 = insertvalue { i8*, i64 } undef, i8* %0, 0
  %4 = insertvalue { i8*, i64 } %3, i64 %1, 1

When inlining into_value, this redundant split-and-rejoin can be simplified away. Instead of cloning %4 into the outer function, CloneFunction.cpp replaces all uses of it with the original pair, i.e. the return value of read_single_byte_value, which is coincidentally also numbered %4 (but in the outer function rather than the inner one).

In other words, VMap has an entry whose key is

  %4 = insertvalue { i8*, i64 } %3, i64 %1, 1

and whose value is

  %4 = call fastcc { i8*, i64 } @_ZN4main22read_single_byte_value17h6a8946d030728332E(%Reader* nonnull align 8 dereferenceable(24) %input) #7

(For the sake of anyone debugging this: it also has an entry with key %3 and the same value. %3 only has its first value match the original pair, but its second value is just undef, which can be replaced with anything, so it's legal to replace %3 as a whole with the original pair.)

When PropagateCallSiteMetadata sees these entries, it assumes the call to read_single_byte_value is a cloned instruction from the inner function, so it tacks on the metadata that should only apply to cloned instructions.

@jyn514 jyn514 removed the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label May 6, 2021
@valarauca
Copy link

When PropagateCallSiteMetadata sees these entries, it assumes the call to read_single_byte_value is a cloned instruction from the inner function, so it tacks on the metadata that should only apply to cloned instructions.

Shouldn't the no_alias information propagate along the dataflow of the argument? This doesn't feel erroneous.

restrict/no_alias propagating along dataflow is a common pain-point in C/C++ and why the feature is generally avoided.

@nbdd0121
Copy link
Contributor

nbdd0121 commented May 6, 2021

Shouldn't the no_alias information propagate along the dataflow of the argument? This doesn't feel erroneous.

It should, so the pass tries to propagate the noalias metadata to cloned inner function's instructions. But in this case it applies the metadata to the outer function's instructions, which is invalid.

@nikic
Copy link
Contributor

nikic commented May 8, 2021

Upstream bug report: https://bugs.llvm.org/show_bug.cgi?id=50270

@PoignardAzur
Copy link
Contributor

It should, so the pass tries to propagate the noalias metadata to cloned inner function's instructions. But in this case it applies the metadata to the outer function's instructions, which is invalid.

Ooh... so the inliner assumes that, if a function argument is declared noalias, then the SSA value that was passed to that argument in the outer scope was noalias for its own scope?

That is vicious.

@bstrie
Copy link
Contributor

bstrie commented May 9, 2021

nikic's proposed LLVM patch: https://reviews.llvm.org/D102110

@Aaron1011
Copy link
Member

Do we expect the patch to be approved and ready for backport soon, or should we disable mutable-noalias for now?

@Aaron1011 Aaron1011 added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label May 10, 2021
@briansmith
Copy link
Contributor Author

Do we expect the patch to be approved and ready for backport soon, or should we disable mutable-noalias for now?

Just speaking as a user, who happened to find this bug: I'm very excited about the fix, but also I am nervous about the noalias optimizations being in Rust 1.53.0. I didn't find this bug intentionally. It was 100% an accident; i.e. pure luck. I know a lot of people are doing great work on getting this optimization production-ready but it isn't clear (to people like me who aren't involved in the compiler development) what validation is missing in the process and what steps have been taken within LLVM to ensure that future changes to LLVM don't break this (for Rust specifically). So IMO it makes sense to disable it for 1.53.0, at least.

bors added a commit to rust-lang-ci/rust that referenced this issue May 14, 2021
Update LLVM submodule

This merges recent changes from the upstream LLVM 12 branch. One of them is intended to address rust-lang#84958.
@bstrie
Copy link
Contributor

bstrie commented May 15, 2021

nikic's patch for this issue has been merged into LLVM (https://reviews.llvm.org/rGaa9b02ac75350a6c7c949dd24d5c6a931be26ff9), and rustc's LLVM submodule has been updated (#85236). The change should be observable in the current nightly build (2021-05-14, commit 1025db84a). I've tested comex's reproduction with this nightly and it did not panic. @briansmith , can you see if the bug persists for you? (You may need to use --force to get the most recent nightly.)

@bstrie
Copy link
Contributor

bstrie commented Jun 5, 2021

Note that mutable noalias is on track to be disabled for the upcoming 1.53 release: #86036

@pietroalbini
Copy link
Member

Fixed by #86036.

@CryZe
Copy link
Contributor

CryZe commented Jun 14, 2021

That's just for 1.53, what about master or 1.54? (Ah I believe those are fixed by #85236)

@nikic
Copy link
Contributor

nikic commented Jun 14, 2021

Right, the actual fix is #85236, which is on all branches. #86036 just extends the testing period in the hope of finding additional issues before it hits stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

No branches or pull requests