Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump threading MIR opt unsoundly uses bitpattern equality for floats #128243

Closed
Chris00 opened this issue Jul 26, 2024 · 17 comments
Closed

Jump threading MIR opt unsoundly uses bitpattern equality for floats #128243

Chris00 opened this issue Jul 26, 2024 · 17 comments
Assignees
Labels
A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Chris00
Copy link

Chris00 commented Jul 26, 2024

I tried this code (sorry, I cannot reproduce it without inari ATM):

use inari::{Interval as I, const_interval};

fn f(u: I) -> f64 {
    let t = u.inf() == 0.;
    // let t = u.inf() == -0.;
    // println!("{}", t);
    if t { 1. }
    else { println!("{} {}", u.inf(), u.inf() == 0.);
        u.inf() }
}

fn main() {
    println!("{}", f(const_interval!(0., 0.)));
}

I expected to see this happen: 1

Instead, in --release mode, I got:

-0 true
-0

If I uncomment one of the above commented lines, the code executes correctly.

Meta

This happens with both

rustc --version --verbose:

rustc 1.82.0-nightly (7120fdac7 2024-07-25)
binary: rustc
commit-hash: 7120fdac7a6e55a5e4b606256042890b36067052
commit-date: 2024-07-25
host: x86_64-unknown-linux-gnu
release: 1.82.0-nightly
LLVM version: 18.1.7

and

rustc 1.80.0 (051478957 2024-07-21)
binary: rustc
commit-hash: 051478957371ee0084a7c0913941d2a8c4757bb9
commit-date: 2024-07-21
host: x86_64-unknown-linux-gnu
release: 1.80.0
LLVM version: 18.1.7
@Chris00 Chris00 added the C-bug Category: This is a bug. label Jul 26, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 26, 2024
@tgross35
Copy link
Contributor

To minimize this, you should try to inline the logic from inari into your code. E.g. type F64X2 = __m128d;, struct Interval { rep: F64X2 }, const_interval is a transumte, etc.

However, what is the reason you believe this to be a compiler/std bug rather than inari? The return -0 logic comes from inari.

@tgross35 tgross35 added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Jul 26, 2024
@Chris00
Copy link
Author

Chris00 commented Jul 26, 2024

I tried but inlining the code, the program behaves correctly:

use std::{arch::x86_64::*, mem::transmute};

#[derive(Copy, Clone)]
struct I {
    rep: __m128d,
}

impl I {
    fn new(a: f64, b: f64) -> Self {
        Self { rep: unsafe { transmute([-a, b]) } }
    }

    fn inf(self) -> f64 {
        unsafe { transmute::<_, [f64; 2]>(self.rep)[0] }
    }
}


fn f(u: I) -> f64 {
    let t = u.inf() == 0.;
    if t { 1. }
    else { println!("{} {}", u.inf(), u.inf() == 0.);
        u.inf() }
}

fn main() {
    println!("{}", f(I::new(0., 0.)));
}

@tgross35
Copy link
Contributor

If I am understanding the code correctly then transmute::<_, [f64; 2]>(self.rep)[0] is only extract0, the logic from https://github.com/unageek/inari/blob/3660fa09392b90d320f7e6961e52922ed4b0587c/src/numeric.rs#L27-L37 and here https://github.com/unageek/inari/blob/3660fa09392b90d320f7e6961e52922ed4b0587c/src/interval.rs#L65-L67 seems to be missing.

@Chris00
Copy link
Author

Chris00 commented Jul 26, 2024

You are correct, I oversimplified the inlining. The bug can be reproduced with:

use std::{arch::x86_64::*, mem::transmute};

#[derive(Copy, Clone)]
struct I {
    rep: __m128d,
}

impl I {
    fn new(a: f64, b: f64) -> Self {
        Self { rep: unsafe { transmute([-a, b]) } }
    }

    pub(crate) fn inf_raw(self) -> f64 {
        unsafe { - transmute::<_, [f64; 2]>(self.rep)[0] }
    }

    fn inf(self) -> f64 {
        let x = self.inf_raw();
        if x.is_nan() {
            // The empty interval.
            f64::INFINITY
        } else if x == 0.0 {
            -0.0
        } else {
            x
        }
    }
}


fn f(u: I) -> f64 {
    let t = u.inf() == 0.;
    if t { 1. }
    else { println!("{} {}", u.inf(), u.inf() == 0.);
        u.inf() }
}

fn main() {
    println!("{}", f(I::new(0., 0.)));
}

@tgross35
Copy link
Contributor

Reduced some

#![allow(unused)]
use std::{arch::x86_64::*, mem::transmute};
use std::hint::black_box;

fn check(x: f64) -> f64 {
    if x == 0.0 {
        -0.0
    } else {
        1234.0
    }
}

fn main() {
    let f = 0.0;
    let simty: __m128d = unsafe { transmute([-f, f]) };
    let as_f = unsafe { -transmute::<_, [f64; 2]>(simty)[0] };

    if check(as_f) == 0.0 {
        println!("ok");
    } else {
        println!("bug")
    }
}

https://godbolt.org/z/Whh4sh4bn

@tgross35
Copy link
Contributor

tgross35 commented Jul 26, 2024

What, why is this code fragile?? No simd types required.

pub fn main() {
    let f: f64 = 0.0;
    let as_f = -f;
    println!("{:#018x}", as_f.to_bits());

    let tmp = if -0.0 == 0.0 {
        -0.0
    } else {
        0.0
    };

    if tmp == 0.0 {
        println!("ok");
    } else {
        println!("bug")
    }
}

Prints "ok" by default, "bug" with -O https://godbolt.org/z/soz3jTThr.

Spot checking some versions it seems to show up at 1.78, which aligns with the LLVM18 version bump. Still reproduces on nightly.

This might be a known issue and it might be fixed with the LLVM19 update. Needs some more digging but I'm going to guess LLVM for now.

@tgross35 tgross35 added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 26, 2024
@nikic nikic added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness A-mir-opt Area: MIR optimizations and removed A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Jul 26, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 26, 2024
@nikic
Copy link
Contributor

nikic commented Jul 26, 2024

LLVM IR already includes an unconditional bug print. Also reproduces with -Z mir-opt-level=2 (without -O), so this is probably a mir opt bug (https://godbolt.org/z/Eo4j5Y1qW).

@tgross35 tgross35 removed E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 26, 2024
@saethlin
Copy link
Member

This is a bug in JumpThreading. The bad codegen reproduces with -Zmir-opt-level=0 -Zmir-enable-passes=+JumpThreading.
The MIR diff for the bad optimization:

-// MIR for `main` before JumpThreading
+// MIR for `main` after JumpThreading
 
 fn main() -> () {
     let mut _0: ();
@@ -165,7 +165,7 @@ fn main() -> () {
 
     bb7: {
         _29 = const -0f64;
-        goto -> bb9;
+        goto -> bb17;
     }
 
     bb8: {
@@ -179,7 +179,7 @@ fn main() -> () {
         StorageLive(_32);
         _32 = _29;
         _31 = Eq(move _32, const 0f64);
-        switchInt(move _31) -> [0: bb13, otherwise: bb10];
+        goto -> bb10;
     }
 
     bb10: {
@@ -242,4 +242,13 @@ fn main() -> () {
         StorageDead(_1);
         return;
     }
+
+    bb17: {
+        StorageDead(_30);
+        StorageLive(_31);
+        StorageLive(_32);
+        _32 = _29;
+        _31 = Eq(move _32, const 0f64);
+        goto -> bb13;
+    }
 }

@tgross35
Copy link
Contributor

tgross35 commented Jul 26, 2024

$cargo bisect-rustc --start=1.74.0 --end=1.79.0 -- run --release
...
********************************************************************************
Regression in nightly-2024-02-12
********************************************************************************

fetching https://static.rust-lang.org/dist/2024-02-11/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2024-02-11: 40 B / 40 B [======================================================================================================================================================] 100.00 % 694.44 KB/s converted 2024-02-11 to 6cc4843512d613f51ec81aba689180c31b0b28b6
fetching https://static.rust-lang.org/dist/2024-02-12/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2024-02-12: 40 B / 40 B [======================================================================================================================================================] 100.00 % 498.67 KB/s converted 2024-02-12 to 1a648b397dedc98ada3dd3360f6d661ec2436c56
looking for regression commit between 2024-02-11 and 2024-02-12
fetching (via remote github) commits from max(6cc4843512d613f51ec81aba689180c31b0b28b6, 2024-02-09) to 1a648b397dedc98ada3dd3360f6d661ec2436c56
ending github query because we found starting sha: 6cc4843512d613f51ec81aba689180c31b0b28b6
get_commits_between returning commits, len: 9
  commit[0] 2024-02-10: Auto merge of #119614 - RalfJung:const-refs-to-static, r=oli-obk
  commit[1] 2024-02-10: Auto merge of #16527 - Veykril:salsa-no-self-ref, r=Veykril
  commit[2] 2024-02-10: Auto merge of #117206 - cjgillot:jump-threading-default, r=tmiasko
  commit[3] 2024-02-11: Auto merge of #120232 - c272:json-buildstd, r=Mark-Simulacrum
  commit[4] 2024-02-11: Auto merge of #120405 - cjgillot:gvn-pointer, r=oli-obk
  commit[5] 2024-02-11: Auto merge of #120914 - ehuss:downgrade-xcode, r=Mark-Simulacrum
  commit[6] 2024-02-11: Auto merge of #120920 - matthiaskrgr:rollup-jsryr8e, r=matthiaskrgr
  commit[7] 2024-02-11: Auto merge of #120903 - matthiaskrgr:rollup-tmsuzth, r=matthiaskrgr
  commit[8] 2024-02-11: Auto merge of #120803 - onur-ozkan:fix-compilation-cache, r=Mark-Simulacrum
validated commits found, specifying toolchains

checking the start range to verify it passes
installing 0cbef48150e1fab161b5fd147b57ceb3f9272a52
rust-std-nightly-aarch64-apple-darwin: 25.33 MB / 25.33 MB [=====================================================================================================================================] 100.00 % 20.82 MB/s testing...
RESULT: 0cbef48150e1fab161b5fd147b57ceb3f9272a52, ===> Compile error
uninstalling 0cbef48150e1fab161b5fd147b57ceb3f9272a52

ERROR: the commit at the start of the range (0cbef48150e1fab161b5fd147b57ceb3f9272a52) includes the regression

@tgross35
Copy link
Contributor

Well, #117206. Must have been a preexisting bug

@saethlin
Copy link
Member

Ah. You always need to stabilize the MIR-opt pipeline by carefully selecting specifically the passes you need when bisecting. You should bisect with the flags I mentioned above, -Zmir-opt-level=0 -Zmir-enable-passes=+JumpThreading.

@tgross35
Copy link
Contributor

Ah, thanks.

********************************************************************************
Regression in nightly-2023-10-24
********************************************************************************

fetching https://static.rust-lang.org/dist/2023-10-23/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2023-10-23: 40 B / 40 B [======================================================================================================================================================] 100.00 % 585.21 KB/s converted 2023-10-23 to 54b0434cead71e33bb4ddb52acde7767452b276d
fetching https://static.rust-lang.org/dist/2023-10-24/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2023-10-24: 40 B / 40 B [======================================================================================================================================================] 100.00 % 398.93 KB/s converted 2023-10-24 to cd674d61790607dfb6faa9d754bd3adfa13aea7c
looking for regression commit between 2023-10-23 and 2023-10-24
fetching (via remote github) commits from max(54b0434cead71e33bb4ddb52acde7767452b276d, 2023-10-21) to cd674d61790607dfb6faa9d754bd3adfa13aea7c
ending github query because we found starting sha: 54b0434cead71e33bb4ddb52acde7767452b276d
get_commits_between returning commits, len: 14
  commit[0] 2023-10-22: Auto merge of #117000 - weihanglo:update-cargo, r=weihanglo
  commit[1] 2023-10-22: Auto merge of #117062 - Kobzol:update-rustc-perf, r=Mark-Simulacrum
  commit[2] 2023-10-23: Auto merge of #115324 - francorbacho:master, r=davidtwco
  commit[3] 2023-10-23: Auto merge of #117066 - calebcartwright:rustfmt-sync, r=calebcartwright
  commit[4] 2023-10-23: Auto merge of #116606 - ChrisDenton:empty, r=dtolnay
  commit[5] 2023-10-23: Auto merge of #117071 - matthiaskrgr:rollup-1tcxdgj, r=matthiaskrgr
  commit[6] 2023-10-23: Auto merge of #116849 - oli-obk:error_shenanigans, r=cjgillot
  commit[7] 2023-10-23: Auto merge of #116835 - oli-obk:evaluated_static_in_metadata2, r=RalfJung
  commit[8] 2023-10-23: Auto merge of #116837 - oli-obk:smir_run_macro, r=spastorino
  commit[9] 2023-10-23: Auto merge of #117087 - matthiaskrgr:rollup-08kkjkz, r=matthiaskrgr
  commit[10] 2023-10-23: Auto merge of #107009 - cjgillot:jump-threading, r=pnkfelix
  commit[11] 2023-10-23: Auto merge of #116033 - bvanjoi:fix-116032, r=petrochenkov
  commit[12] 2023-10-23: Auto merge of #117103 - matthiaskrgr:rollup-96zuuom, r=matthiaskrgr
  commit[13] 2023-10-24: Auto merge of #116300 - cjgillot:split-move, r=petrochenkov
ERROR: no CI builds available between 54b0434cead71e33bb4ddb52acde7767452b276d and cd674d61790607dfb6faa9d754bd3adfa13aea7c within last 167 days

Seems like it was from the original implementation at #107009.

Cc @cjgillot

@theemathas
Copy link
Contributor

Minimized:

fn main() {
    let tmp = if true { -0.0 } else { 1.0 };
    if tmp == 0.0 {
        println!("ok");
    } else {
        println!("bug")
    }
}

This incorrectly prints "bug" in release mode.

@Noratrieb
Copy link
Member

Noratrieb commented Jul 27, 2024

It looks like the bug here is that jump threading deals in ScalarInts, meaning it only looks at the bit pattern when checking if two values are equal. But floats are not compared by their bitpattern, they have more complicated rules!

fn main() {
    let tmp = if true { f32::NAN } else { 1.0 };
    if tmp == f32::NAN {
        println!("true");
    } else {
        println!("false");
    }
}

same problem here, this prints true with optimizations enabled.
Either jump threading needs to be changed to handle float comparisons correctly (by storing extra state in its Condition) or it should just be disabled for floats alltogether.

@Noratrieb Noratrieb changed the title Compiler optimization problem with f64 comparison Jump threading MIR opt unsoundly uses bitpattern equality for floats Jul 27, 2024
@Noratrieb Noratrieb self-assigned this Jul 27, 2024
@jieyouxu jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 27, 2024
@bors bors closed this as completed in f62ae7e Jul 27, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jul 27, 2024
Rollup merge of rust-lang#128271 - Nilstrieb:jump-into-a-can-of-worms-called-float-equality, r=compiler-errors

Disable jump threading of float equality

Jump threading stores values as `u128` (`ScalarInt`) and does its comparisons for equality as integer comparisons.
This works great for integers. Sadly, not everything is an integer.

Floats famously have wonky equality semantcs, with `NaN!=NaN` and `0.0 == -0.0`. This does not match our beautiful integer bitpattern equality and therefore causes things to go horribly wrong.

While jump threading could be extended to support floats by remembering that they're floats in the value state and handling them properly, it's signficantly easier to just disable it for now.

fixes rust-lang#128243
@Noratrieb
Copy link
Member

reopen to track potential backports

@Noratrieb Noratrieb reopened this Jul 27, 2024
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 29, 2024
@wesleywiser
Copy link
Member

Closing as this was fixed in the 1.80.1 release this morning.

Thanks to @Chris00 for reporting and @Noratrieb for the fix!

ParkMyCar added a commit to MaterializeInc/materialize that referenced this issue Aug 15, 2024
Rust previously had a bug related to comparison of floats, see
rust-lang/rust#128243.

### Motivation

Upgrades Rust to pickup fix

### Checklist

- [x] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [x] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [x] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [x] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants