InstCombine incorrectly transforms store i64 -> store double #44497
Citation needed?
I think the only target LLVM supports that has loads that canonicalize floats is 32-bit x86 (using non-SSE operations). How we want to deal with that case is sort of complicated. On the one hand, we don't want to forbid hoisting loads with float type. On the other hand, it's impossible for a function with a double return type to return an SNaN in the standard x86 calling convention, so we'd need some special rule for that. I'd prefer to say that we have to preserve SNaN patterns across non-arithmetic operations, and make the x86 backend deal with whatever complexity results from that.
I'm fairly sure I've seen InstCombine rewrites that take advantage of NaN bits being "don't care". That's why I implemented these semantics in Alive2.
This seems like a bad bug, and I've seen something like it trigger before in GCC (funnily enough, GCC miscompiled clang). It seems like this transform of i64 -> double shouldn't occur if the target canonicalizes NaNs. I don't think we need a particularly complex solution for this.
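To make the failure mode concrete, here is a hypothetical source-level sketch (not the test case from this report) of an i64 value being round-tripped through a double, as the rewritten load/store pair would do:

```cpp
// Hypothetical illustration (not the original test case): round-tripping an
// i64 bit pattern through a double, as the rewritten load/store pair would.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    std::uint64_t in = 0x7FF0000000000001ULL;  // a signaling-NaN bit pattern
    double d;
    std::uint64_t out;

    std::memcpy(&d, &in, sizeof d);     // what the "load double" does
    std::memcpy(&out, &d, sizeof out);  // what the "store double" does

    // On targets whose FP moves are bit-preserving (e.g. SSE2), out == in.
    // If the copy is lowered to an x87 fld/fstp of a double, the FPU may
    // quiet the NaN, producing 0x7FF8000000000001 instead.
    std::printf("0x%016llx -> 0x%016llx\n",
                (unsigned long long)in, (unsigned long long)out);
    return 0;
}
```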
I guess there are two models you could reason about. The model I was thinking about works something like the following:
I guess there's an alternative model, like you're proposing:
Really, the semantic results of either rule are pretty similar. It's just a question of when the NaN pattern is fixed: when an instruction produces the value, or when a store/bitcast forces it to be fixed. It's probably worth noting that any rule that means a "load" is allowed to raise an FP exception is going to make strict FP support a lot more complicated.
I agree with your summary, Eli. I would just add that in semantics 1), if an operation produces a NaN when the operands are not NaN, it produces an unspecified NaN bit pattern. The main reason I prefer to consider NaN to have an unspecified bit pattern is that it makes it easier to support chips where a load of an integer is not the same as a load of a float. If this is not an issue, then we should document it and change Alive2 to match the semantics, and then remove any optimization that doesn't respect those semantics. Bottom line: is the optimization present in this bug report important, or can it be removed?
This particular optimization probably isn't that valuable on its own; I mean, we want to avoid bitcasts where we can, but there are other ways of addressing this particular situation on most targets. I have three concerns with making NaN values indeterminate in registers:
Ok, give me some time and I'll compile a list of optimizations that are broken if we assume that specific NaN patterns get propagated. |
Ok, I've implemented the semantics we discussed here in Alive2 for testing. This keeps all values in bit-vectors all the time, except when there's a float operation. Operands are converted from a bit-vector into float type, the operation is performed, and the result converted back to a bit-vector pattern. The result is that there is only 1 regression in the LLVM test suite. On the other hand, only 2 tests get fixed. The new test failure is this:
So the -1 bit pattern disappears. I'm not sure there's a way to distinguish a bit pattern that happens to represent a NaN from a float NaN (which we interpret as any bit pattern that represents NaN). Anyway, assuming we reach a conclusion on what to do with this test, the question is then about the backends. Are all backends OK with LLVM assuming that reading/writing data through float registers doesn't change the bits?
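For concreteness, here is a minimal sketch of that evaluation model (assumed code, not Alive2's actual implementation): values are carried as raw bits everywhere, and only a floating-point operation gives them float meaning.

```cpp
// Sketch of the "bit-vectors everywhere" model (assumed code, not Alive2):
// only an FP operation converts bits -> float -> bits; loads, stores and
// bitcasts are plain bit copies and can never alter a NaN payload.
#include <bit>
#include <cstdint>
#include <cstdio>

using Bits = std::uint64_t;  // a value with no inherent float identity

Bits fadd(Bits a, Bits b) {
    double fa = std::bit_cast<double>(a);  // bits -> float
    double fb = std::bit_cast<double>(b);
    return std::bit_cast<Bits>(fa + fb);   // the FP op, then float -> bits
}

Bits load(const Bits *p)    { return *p; }  // bit-exact
void store(Bits *p, Bits v) { *p = v; }     // bit-exact

int main() {
    Bits snan = 0x7FF0000000000001ULL;  // a signaling-NaN pattern
    Bits mem;
    store(&mem, snan);
    std::printf("round trip preserved: %d\n", load(&mem) == snan);  // always 1
    // Only fadd may change the pattern (e.g. the hardware quiets/canonicalizes
    // NaNs when it actually performs arithmetic).
    std::printf("after fadd: 0x%016llx\n", (unsigned long long)fadd(snan, snan));
    return 0;
}
```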
Probably under any model where floating-point "operations" are special, bitcast shouldn't count as a floating-point operation; otherwise, we lose the equivalence between bitcast and store+load. Again, I think the only in-tree target with non-bit-preserving load/store operations is 32-bit x86 (using x87 operations to load float/double values; oddly, long double load/store are bit-preserving).
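As a small illustration of that equivalence (hypothetical code, not from the test suite), a direct bit reinterpretation and one that goes through memory are expected to yield identical bits:

```cpp
// Hypothetical example: "bitcast" vs. "store + load" of the same bits.
// If bitcast were allowed to canonicalize NaNs, these two routes could
// diverge, breaking the equivalence mentioned above.
#include <bit>
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
    std::uint64_t snan_bits = 0x7FF0000000000001ULL;  // signaling-NaN pattern

    // Route 1: direct reinterpretation, no memory traffic ("bitcast").
    double via_bitcast = std::bit_cast<double>(snan_bits);

    // Route 2: the same reinterpretation through memory ("store + load").
    double via_memory;
    std::memcpy(&via_memory, &snan_bits, sizeof via_memory);

    // Compare the raw bits (comparing NaNs with == would always be false).
    assert(std::bit_cast<std::uint64_t>(via_bitcast) ==
           std::bit_cast<std::uint64_t>(via_memory));
    return 0;
}
```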
There's a related test failure in Transforms/InstCombine/minmax-fold.ll:
In the source there's only a bitcast between integers, while in the target there's a bitcast of int -> float -> int. If the bit pattern represents a NaN, then this round trip may change the bits (e.g., canonicalize the NaN).
In Rust, we are also struggling with what exactly LLVM's NaN semantics are: rust-lang/rust#73328. It would be good to get more precise documentation of those semantics -- though it seems that's still a work in progress, if I understand the discussion here correctly?
What was the semantics you implemented first, i.e., how is it different from the adjusted one you described later? Is it what Eli calls the "alternative model"? So, float/double LLVM variables actually have a different range of possible values than a memory location, and conversion happens on load/store/bitcast?
Note that this means that these operations are non-deterministic! So, duplicating them would be an illegal transformation, similar to "freeze". Is that something LLVM is treating properly? (In the "alternative model", likewise stores/bitcasts would be non-deterministic and must not be duplicated.)
I see 2 models:
Each has pros and cons, as usual. The first one is required if people care about such processors. The second one allows some optimizations like the ones shown in this bug report. |
(The store would be the problem with duplication here, not the load.) The second one seems to disallow, e.g., turning "float f = x+x; if ((int)f == (int)f)" into "if ((int)(x+x) == (int)(x+x))", as that would duplicate the non-determinism. This particular transformation likely makes little sense, but there might be other conditions under which recomputing a result could be beneficial.
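A sketch of that concern (hypothetical code; it observes the bits via an invented `bits_of` helper rather than the `(int)` casts quoted above, since converting a NaN to int is itself undefined behavior in C):

```cpp
// Hypothetical illustration: why duplicating a NaN-producing computation is
// suspect if each evaluation may pick its NaN bit pattern independently.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

// Invented helper: observe the raw bits of a float.
static std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

int main() {
    float x = std::numeric_limits<float>::infinity();

    // Original: the NaN is produced once, so its bit pattern (whatever the
    // hardware picked) is fixed and the comparison is always true.
    float f = x - x;  // inf - inf = NaN
    bool original = (bits_of(f) == bits_of(f));

    // Rewritten: the computation is duplicated. Under a model where each
    // NaN-producing operation yields an unspecified payload, the two
    // evaluations need not agree, so this is no longer guaranteed.
    bool rewritten = (bits_of(x - x) == bits_of(x - x));

    std::printf("%d %d\n", original, rewritten);
    return 0;
}
```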
This issue is causing soundness problems for Rust today: rust-lang/rust#114479 (comment). That's an example with no NaNs and no unsafe, which results in a segfault. |
@llvm/issue-subscribers-backend-x86 Author: Nuno Lopes (nunoplopes)
| | |
| --- | --- |
| Bugzilla Link | [45152](https://llvm.org/bz45152) |
| Version | trunk |
| OS | All |
| CC | @DMG862,@efriedma-quic,@ecnelises,@aqjune,@LebedevRI,@jfbastien,@RKSimon,@nikic,@RalfJung,@programmerjake,@regehr,@rotateright,@yuanfang-chen |
Extended Description
The unit test "test/Transforms/InstCombine/bitcast-phi-uselistorder.ll" shows an incorrect transformation from load+store i64 into load/store double. These are not equivalent because NaN values can be canonicalized by the CPU, so the store double can write a different bit pattern than the store i64.
Alive2's counterexample:
Marking this as an X86 backend issue, as I believe the consensus is that load/store operations at the IR level are value-preserving (that is, they do not count as "floating-point operations"), and the fact that this is not true for x87 needs to be mitigated there (together with the whole host of other problems with a similar root cause).
I think the M68k backend will probably have similar issues when they start adding hardware floating-point support, since IIRC float/double loads/stores also convert from/to an internal 80-bit format. Bug for adding FP support: #61744. @llvm/issue-subscribers-backend-m68k
The manual at https://cache.nxp.com/docs/en/reference-manual/M68000PM.pdf says the following on page 3-26:
That is, setting the precision control on the m68k FPU actually makes results comply with the appropriate IEEE 754 format despite the registers' 80-bit capacity, unlike the x87 FPU.
EDIT 2: Quoth the manual regarding the FMOVE instruction:
And later on it says "Refer to 1.6.5 Not-A-Numbers" about its SNAN behavior. So it quiets signaling NaNs? EDIT 3: This is confusing:
I'm assuming it's only talking about quiet NaNs, but who knows...
Assuming qemu is correct (which is not necessarily the case), compiling and running the following program with m68k-linux-gnu-g++ makes me think that m68k actually preserves sNaN bits when using fmoved, which is a nice surprise:

```cpp
#include <stdint.h>
#include <stdio.h>

union U {
  double f;
  uint64_t i;
};

// noinline keeps the round trip from being folded away at compile time.
double g(const uint64_t &v) __attribute__((noinline));

// Reinterprets the integer bits as a double (in g), returns that double, and
// reinterprets it back to bits, so any canonicalization by the FPU along the
// way would show up in the output.
uint64_t f(uint64_t v) {
  return (U){.f = g(v)}.i;
}

double g(const uint64_t &v) {
  return (U){.i = v}.f;
}

int main() {
  uint64_t v = 0xFFFF0000abcdef01;  // a NaN bit pattern with a distinctive payload
  printf("0x%llx 0x%llx\n", (unsigned long long)f(v), (unsigned long long)v);
}
```

it prints:
disassembly of
@nunoplopes I think it's wrong for a load/store to alter the value; Rust relies on that not happening. I also don't see anything in the LangRef docs that would say it's allowed for load/store to alter the value.