Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation while working on PR #50882 #52694

Closed
glandium opened this issue Jul 25, 2018 · 17 comments
Closed

Miscompilation while working on PR #50882 #52694

glandium opened this issue Jul 25, 2018 · 17 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.

Comments

@glandium
Copy link
Contributor

(Moving this to a separate issue because PR #50882 is already full of noise)
Reduced STR:

  • Take current rust master
  • Apply the following patch:
diff --git a/src/liballoc/alloc.rs b/src/liballoc/alloc.rs
index 84bd275..4df7730 100644
--- a/src/liballoc/alloc.rs
+++ b/src/liballoc/alloc.rs
@@ -172,7 +172,7 @@ pub(crate) unsafe fn box_free<T: ?Sized>(ptr: Unique<T>) {
     // We do not allocate for Box<T> when T is ZST, so deallocation is also not necessary.
     if size != 0 {
         let layout = Layout::from_size_align_unchecked(size, align);
-        dealloc(ptr as *mut u8, layout);
+        Global.dealloc(NonNull::new_unchecked(ptr).cast(), layout);
     }
 }
 
  • Compile rust against the system llvm (5 or 6)
  • Run libstd tests (x.py test src/libstd --stage 1)

What happens then is that sync::once::tests::wait_for_force_to_finish fails with:

thread '<unnamed>' panicked at 'assertion failed: t1.join().is_ok()', libstd/sync/once.rs:582:9

The disassembly for wait_for_force_to_finish only contains one call to std::thread::JoinHandle::join, instead of the expected two when the code is not miscompiled. That call is followed by a test that jumps to a panic when the result of JoinHandle::join is ... Ok(()):

   93371:       e8 aa fd 07 00          callq  113120 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT
$4join17h0fb3b129ec2e38aeE>
   93376:       48 89 c3                mov    %rax,%rbx
   93379:       49 89 d7                mov    %rdx,%r15
   9337c:       48 85 db                test   %rbx,%rbx
   9337f:       74 1b                   je     9339c <_ZN3std4sync4once5tests24wait_for_force_to_fin
ish17h6199051fcaa3ff6aE+0x21c>

That JoinHandle::join returns a Result<(), Box<Any>>, and Ok(()) is represented as (0, whatever). The destination of that jump is the panic code.

At some point, I'm not entirely sure with what state of the tree, it was even better. The error was:

thread '<unnamed>' panicked at 'assertion failed: t2.join().is_ok()', libstd/sync/once.rs:583:9

And there were two calls to std::thread::JoinHandle::join, as expected, but they didn't have the same result handling:

   93c79:       e8 32 c5 07 00          callq  1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
   93c7e:       48 89 c3                mov    %rax,%rbx
   93c81:       49 89 d7                mov    %rdx,%r15
   93c84:       48 85 db                test   %rbx,%rbx
   93c87:       75 5e                   jne    93ce7 <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x267>
(snip)
   93c9f:       e8 0c c5 07 00          callq  1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
   93ca4:       48 89 c3                mov    %rax,%rbx
   93ca7:       49 89 d7                mov    %rdx,%r15
   93caa:       48 85 db                test   %rbx,%rbx
   93cad:       74 1b                   je     93cca <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x24a>

The destination of both jumps is panic code. The first jump, corresponding to t1.join().is_ok() is correct, and the second, corresponding to t2.join().is_ok() is broken, thus the test failure.

Even better: this doesn't happen when compiling with the bundled llvm. It also doesn't happen when extracting the test from libstd and compiling with a faulty compiler. It seems the fact that it's part of libstd, and that most of libstd is compiled along the test, plays a role.

I'm trying to get the corresponding mir and llvm-ir.

@glandium
Copy link
Contributor Author

There are, interestingly, tons of differences between the mir between a build that works, and a build that doesn't. Both with rust master + the patch quoted above, the only difference being one was built with system llvm 5, and the other with the bundled llvm. I'm puzzled.

@glandium
Copy link
Contributor Author

Some more data:

  • I checked out a one-month-old commit, around the time I did the last push to PR Associate an allocator to boxes #50882 that didn't hit the error, applied the same patch, and tried again with system llvm 5. It was similarly broken.
  • But back then, travis builds were using system llvm 3.9, so I tried that too, and that was not broken.

So this could very well be a case of llvm versions being broken (or compiler integration of said versions).

@glandium
Copy link
Contributor Author

glandium commented Jul 25, 2018

Even more data:
- With static system llvm trunk built manually: not broken
- With static system llvm 6.0 built manually: not broken
- With shared system llvm 6.0 built manually: not broken
- With shared system llvm 5.0 built manually: not broken

All the above built with clang 5.0. So... tried again with GCC 5.4:
- With shared system llvm 5.0 built manually: not broken

So... this is only happening with Ubuntu builds of llvm (and presumably Debian)?
@sylvestre do you know what's special about them?

Ok, It's too late and I'm tired, all those last attempts were without the patch applied, so they don't count. I'll try again tomorrow.

@sylvestre
Copy link
Contributor

Not sure. Some ideas:

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jul 25, 2018

This reminds me that it would nice to be able to duplicate/"virtualize" lang items for testing purposes, so the old and new standard library (fragments) are completely separated. I should open an issue door that.

@glandium
Copy link
Contributor Author

So, restarting afresh:

  • With shared system llvm 5.0 built manually with clang 5: broken
  • With shared system llvm 6.0 built manually with clang 5: broken
  • With shared system llvm trunk built manually with clang 5: not broken

So at least this is consistent with the initial results with ubuntu system llvm and bundled llvm. Time for a bisect.

@glandium
Copy link
Contributor Author

Bisection identified llvm-mirror/llvm@a57b9df as having fixed this. Which leads me to more questions:

  • what does that have anything to do?
  • are there more things that might be miscompiled than this one libstd testcase?
  • what's the path forward here? Apparently, system llvm 5 and 6 can miscompile for whatever reason, and that's what some downstreams (Linux distros, BSDs, ...) would tend to use...

@eddyb
Copy link
Member

eddyb commented Jul 26, 2018

The LLVM patch needs to be backported officially to new 5.* and 6.* patch versions if it fixes miscopilations IMO.

@sylvestre
Copy link
Contributor

I don't think a new dot release of 5 will happen, 6 is also unlikely.
However, I don't mind taking the patch to debian & ubuntu packages.

Having an upstream bug would help also.

@glandium
Copy link
Contributor Author

Some more notes:

  • the upstream patch applies cleanly on llvm 6, and fixes the miscompilation. On llvm 5, it has only a few straightforward conflicts, but fails to compile. It looks like it depends on at least llvm-mirror/llvm@f39b8ab, which has tons of conflicts.
  • beta is using llvm 6, but for some reason, it's not affected. Well, it may actually be affected, but it's not visible on this testcase. Who knows.
  • travis builds for pull requests are using system llvm 5. Those are essentially never going to be fixed if they stay on that version. What do we do about that?

@glandium
Copy link
Contributor Author

Needless to say, #50882 is blocked until the latter is addressed in some way.

@sylvestre
Copy link
Contributor

travis provides llvm 6 too: https://github.com/travis-ci/apt-source-whitelist/pull/373/files

@glandium
Copy link
Contributor Author

It's not really a matter of what travis provides, it's a matter of what minimal version of system llvm rustc supports, and as of #51899, it's llvm 5.

@glandium
Copy link
Contributor Author

Interestingly, if the minimal supported version was still 3.9, this would have gone undetected.

@glandium
Copy link
Contributor Author

Guess why beta is not affected? Because rust-lang/llvm#106 applied llvm-mirror/llvm@a57b9df

@sylvestre
Copy link
Contributor

Backported on the llvm packages https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/commit/1b9a00759063d68cb8b646df91807043643c33a7
Will update this patch when it is uploaded

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jul 27, 2018
…patches. r=dmajor

rust-lang/rust#52694 is a miscompilation I
found in rust when it uses system llvm 5 or 6, that was fixed 5 months
ago in the llvm rust bundles. This may or may not affect clang, but
considering it was also reported to upstream llvm independently of rust,
it's better to side with caution.

It doesn't affect 3.9, and bug 1478919 got rid of the last use of clang
5 (except for clang-tidy, but that's not used to compile).

The patches come from the llvm trunk from 5 months ago, so they're
already in our clang 7 snapshots.

Windows static analysis builds are still using an old trunk, but are
stuck on bug 1427808. They're "only" for static analysis, though.

--HG--
extra : rebase_source : f4fce69eb7c69b6245518a1bad37e04236c7075b
jankeromnes pushed a commit to jankeromnes/gecko that referenced this issue Jul 27, 2018
…patches. r=dmajor

rust-lang/rust#52694 is a miscompilation I
found in rust when it uses system llvm 5 or 6, that was fixed 5 months
ago in the llvm rust bundles. This may or may not affect clang, but
considering it was also reported to upstream llvm independently of rust,
it's better to side with caution.

It doesn't affect 3.9, and bug 1478919 got rid of the last use of clang
5 (except for clang-tidy, but that's not used to compile).

The patches come from the llvm trunk from 5 months ago, so they're
already in our clang 7 snapshots.

Windows static analysis builds are still using an old trunk, but are
stuck on bug 1427808. They're "only" for static analysis, though.
@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 9, 2018
bors added a commit that referenced this issue Dec 16, 2018
Bump minimum required LLVM version to 6.0

Based on the discussion in #55842, while the overall position of Rust wrt LLVM continues to be contentious, there does seem to be a consensus that there is no need for continued support of LLVM 5. This PR bumps our version requirement to LLVM 6.0 and makes Travis run against that.

I hope that this is going to unblock #52694. If I understand correctly, while this issue still exists in LLVM 6, Ubuntu has backported the relevant patch.

r? @alexcrichton
bors added a commit that referenced this issue Dec 17, 2018
Bump minimum required LLVM version to 6.0

Based on the discussion in #55842, while the overall position of Rust wrt LLVM continues to be contentious, there does seem to be a consensus that there is no need for continued support of LLVM 5. This PR bumps our version requirement to LLVM 6.0 and makes Travis run against that.

I hope that this is going to unblock #52694. If I understand correctly, while this issue still exists in LLVM 6, Ubuntu has backported the relevant patch.

r? @alexcrichton
@glandium
Copy link
Contributor Author

It appears the miscompilation doesn't happen after #55426.

gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 3, 2019
…patches. r=dmajor

rust-lang/rust#52694 is a miscompilation I
found in rust when it uses system llvm 5 or 6, that was fixed 5 months
ago in the llvm rust bundles. This may or may not affect clang, but
considering it was also reported to upstream llvm independently of rust,
it's better to side with caution.

It doesn't affect 3.9, and bug 1478919 got rid of the last use of clang
5 (except for clang-tidy, but that's not used to compile).

The patches come from the llvm trunk from 5 months ago, so they're
already in our clang 7 snapshots.

Windows static analysis builds are still using an old trunk, but are
stuck on bug 1427808. They're "only" for static analysis, though.

UltraBlame original commit: dfe8ab123e01667be489a20b253e3cd3b20c10bd
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 3, 2019
…patches. r=dmajor

rust-lang/rust#52694 is a miscompilation I
found in rust when it uses system llvm 5 or 6, that was fixed 5 months
ago in the llvm rust bundles. This may or may not affect clang, but
considering it was also reported to upstream llvm independently of rust,
it's better to side with caution.

It doesn't affect 3.9, and bug 1478919 got rid of the last use of clang
5 (except for clang-tidy, but that's not used to compile).

The patches come from the llvm trunk from 5 months ago, so they're
already in our clang 7 snapshots.

Windows static analysis builds are still using an old trunk, but are
stuck on bug 1427808. They're "only" for static analysis, though.

UltraBlame original commit: dfe8ab123e01667be489a20b253e3cd3b20c10bd
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 3, 2019
…patches. r=dmajor

rust-lang/rust#52694 is a miscompilation I
found in rust when it uses system llvm 5 or 6, that was fixed 5 months
ago in the llvm rust bundles. This may or may not affect clang, but
considering it was also reported to upstream llvm independently of rust,
it's better to side with caution.

It doesn't affect 3.9, and bug 1478919 got rid of the last use of clang
5 (except for clang-tidy, but that's not used to compile).

The patches come from the llvm trunk from 5 months ago, so they're
already in our clang 7 snapshots.

Windows static analysis builds are still using an old trunk, but are
stuck on bug 1427808. They're "only" for static analysis, though.

UltraBlame original commit: dfe8ab123e01667be489a20b253e3cd3b20c10bd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Projects
None yet
Development

No branches or pull requests

5 participants