Instcombine incorrectly transforms store i64 -> store double #44497

Open · nunoplopes opened this issue Mar 9, 2020 · 22 comments

@nunoplopes (Member) commented Mar 9, 2020

Bugzilla Link 45152
Version trunk
OS All
CC @DMG862,@efriedma-quic,@ecnelises,@aqjune,@LebedevRI,@jfbastien,@RKSimon,@nikic,@RalfJung,@programmerjake,@regehr,@rotateright,@yuanfang-chen

Extended Description

The unit test "test/Transforms/InstCombine/bitcast-phi-uselistorder.ll" shows an incorrect transformation from load+store i64 into load/store double. These are not equivalent because NaN values can be canonicalized by the CPU so the store double can write a different bit-pattern than store i64.

Alive2's counterexample:

@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
  br i1 %c, label %if, label %end

%if:
  %__constexpr_0 = bitcast * @Q to *
  %load = load i64, * %__constexpr_0, align 8
  br label %end

%end:
  %phi = phi i64 [ 0, %entry ], [ %load, %if ]
  store i64 %phi, * %p, align 8
  %cast = bitcast i64 %phi to double
  ret double %cast
}
=>
@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
  br i1 %c, label %if, label %end

%if:
  %load1 = load double, * @Q, align 8
  br label %end

%end:
  %0 = phi double [ 0.000000, %entry ], [ %load1, %if ]
  %1 = bitcast * %p to *
  store double %0, * %1, align 8
  ret double %0
}
Transformation doesn't verify!
ERROR: Mismatch in memory

Example:
i1 %c = #x1 (1)
* %p = pointer(non-local, block_id=2, offset=64)

Source:
* %__constexpr_0 = pointer(non-local, block_id=1, offset=0)
i64 %load = #x7ff0000001000000 (9218868437244182528)
i64 %phi = #x7ff0000001000000 (9218868437244182528)
double %cast = NaN

Target:
double %load1 = NaN
double %0 = NaN
* %1 = pointer(non-local, block_id=2, offset=64)

Mismatch in pointer(non-local, block_id=2, offset=64)
Source value: #x7ff0000001000000
Target value: #x7ff0000000020000
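
For reference, a minimal C++ sketch (not from the report) of the hardware behavior the counterexample relies on. It assumes a 32-bit x86 build that uses x87 loads/stores (e.g. -m32 -mfpmath=387 -mno-sse), where round-tripping an sNaN bit pattern through a value of type double may come back with the quiet bit set, so a "store double" can write different bits than a "store i64":

#include <cstdint>
#include <cstdio>
#include <cstring>

// noinline so the double really travels through the FP calling convention /
// an FP register rather than being constant-folded away.
__attribute__((noinline)) double roundtrip(double d) { return d; }

int main() {
    uint64_t in = 0x7ff0000001000000ULL;  // the sNaN pattern from the counterexample
    double d;
    std::memcpy(&d, &in, sizeof d);       // reinterpret the bits as a double
    double r = roundtrip(d);
    uint64_t out;
    std::memcpy(&out, &r, sizeof out);
    std::printf("in:  %016llx\nout: %016llx\n",
                (unsigned long long)in, (unsigned long long)out);
    // With x87 loads, "out" may be 0x7ff8000001000000 (quiet bit set);
    // with SSE the pattern is preserved.
    return 0;
}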
@LebedevRI (Member) commented:

> NaN values can be canonicalized by the CPU so the store double can write a different bit-pattern than store i64.

Citation needed?

@efriedma-quic (Collaborator) commented:

I think the only target LLVM supports that has loads that canonicalize floats is 32-bit x86 (using non-SSE operations).

How we want to deal with that case is sort of complicated. On the one hand, we don't want to forbid hoisting loads with float type. On the other hand, it's impossible for a function with a double return type to return an SNaN in the standard x86 calling convention, so we'd need some special rule for that.

I'd prefer to say that we have to preserve SNaN patterns across non-arithmetic operations, and make the x86 backend deal with whatever complexity results from that.

@nunoplopes (Member, Author) commented:

I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't-care. That's why I implemented these semantics in Alive2.
So if we decide that a specific bit pattern must be preserved across NaNs, we need to work out the details. We would also need to specify the bit pattern produced by each operation that returns NaN. E.g., what's the value of (bitcast-to-i32 (fdiv 0.0 0.0))? It's target-specific at the very least, and in Alive2 it is a non-deterministic value that covers all possible NaN bit patterns.
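
A hedged C++ sketch of that question (names are illustrative, not from the thread): print whatever bit pattern the host target produces for a freshly generated NaN. The exact value is target-specific and, in Alive2's model, non-deterministic.

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    volatile double zero = 0.0;   // volatile so the division happens at run time
    double nan = zero / zero;     // an operation that produces a NaN
    uint64_t bits;
    std::memcpy(&bits, &nan, sizeof bits);
    // The observed pattern is whatever the target's "default NaN" is; x86 and
    // ARM, for example, produce default quiet NaNs with different sign bits.
    std::printf("0.0/0.0 -> %016llx\n", (unsigned long long)bits);
    return 0;
}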

@jfbastien (Member) commented:

This seems like a bad bug, and I've seen something like it trigger before in GCC (funnily enough, GCC miscompiled clang):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

It seems like this transform of i64 -> double shouldn't occur if the target canonicalizes NaNs. I don't think we need a particularly complex solution for this.

@efriedma-quic (Collaborator) commented:

I guess there are two models you could reason about. The model I was thinking about works something like the following:

  1. Non-arithmetic operations (load/store/select/phi/call without fast-math flags) preserve the exact bit pattern of a value, even if it's a NaN.
  2. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.

I guess there's an alternative model, like you're proposing:

  1. NaN values have an unspecified bit pattern.
  2. Storing/bitcasting a NaN produces a NaN bit pattern, but the chosen NaN pattern is unspecified, and may be different for each operation.

Really, the semantic results of either rule are pretty similar. It's just a question of when the NaN pattern is fixed: when an instruction produces the value, or when a store/bitcast forces it to be fixed.

It's probably worth noting that any rule that means a "load" is allowed to raise an FP exception is going to make strict fp support a lot more complicated.

@nunoplopes (Member, Author) commented:

I agree with your summary, Eli. I would just add that in semantics 1), if an operation produces a NaN when the operands are not NaN, it produces an unspecified NaN bit-pattern.

The main reason I prefer to consider NaN to have an unspecified bit-pattern is that it makes it easier to support chips where loading a value as an integer is not the same as loading it as a float. If this is not an issue, then we should document it, change Alive2 to match the semantics, and then remove any optimization that doesn't respect those semantics.

Bottom line: is the optimization present in this bug report important or can it be removed?

@efriedma-quic (Collaborator) commented:

This particular optimization probably isn't that valuable on its own; I mean, we want to avoid bitcasts where we can, but there are other ways of addressing this particular situation on most targets.

I have three concerns with making NaN values indeterminate in registers:

  1. We have a bunch of optimizations over memory operations that are essentially type-agnostic: they don't care what type is loaded/stored. We'd have to add a bunch of new checks.
  2. Destroying NaN patterns would be very unfriendly to SIMD intrinsics; for example, if we decided that you can't use _mm_shuffle_ps on integer vectors (a sketch of that pattern follows after this list).
  3. Allowing LLVM IR loads to raise FP exceptions probably isn't compatible with strict fp; we would need strictfp load intrinsics, and I don't think anyone wants to deal with that. If we do in fact forbid loads from raising an FP exception, every target needs to support some way to lower FP loads in a way that doesn't raise an exception. And if we have that support anyway, we can just use it unconditionally. (This should be possible on any target, at some cost to performance.)
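
To make point 2 concrete, here is a hedged C++ sketch of the usual pattern of shuffling integer data with the float-typed _mm_shuffle_ps (the intrinsics are real SSE intrinsics; the scenario itself is illustrative). It is only sound if data that passes through float-typed registers keeps its exact bits:

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main() {
    // Lane 3 of 'a' happens to hold a NaN bit pattern.
    __m128i a = _mm_set_epi32(0x7fc00001, 3, 2, 1);
    __m128i b = _mm_set_epi32(8, 7, 6, 5);
    // Reinterpret as float vectors, shuffle with the float intrinsic,
    // then reinterpret back to integers.
    __m128 af = _mm_castsi128_ps(a);
    __m128 bf = _mm_castsi128_ps(b);
    __m128 sf = _mm_shuffle_ps(af, bf, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i s = _mm_castps_si128(sf);
    int32_t out[4];
    _mm_storeu_si128((__m128i *)out, s);
    printf("0x%x 0x%x 0x%x 0x%x\n",
           (unsigned)out[0], (unsigned)out[1], (unsigned)out[2], (unsigned)out[3]);
    return 0;
}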

@nunoplopes (Member, Author) commented:

Ok, give me some time and I'll compile a list of optimizations that are broken if we assume that specific NaN patterns get propagated.

@nunoplopes (Member, Author) commented Mar 30, 2020

Ok, I've implemented the semantics we discussed here in Alive2 for testing. This keeps all values in bit-vectors all the time, except when there's a float operation. Operands are converted from a bit-vector into float type, the operation is performed, and the result converted back to a bit-vector pattern.
The conversion from float to bit-vector is allowed to produce any of the bit patterns that represent NaN.

The result is that there is only 1 regression in the LLVM test suite. On the other hand, only 2 tests get fixed.
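
A hedged, non-SMT C++ toy of the evaluation scheme just described (all names are illustrative): values live as 64-bit patterns, an FP operation converts to double, operates, and converts back, and a NaN result is treated as a free choice among NaN patterns (a callback stands in for the solver's non-deterministic pick):

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <cmath>
#include <functional>

using PickNaN = std::function<uint64_t()>;  // returns some bit pattern that is a NaN

uint64_t fadd_on_bits(uint64_t a, uint64_t b, const PickNaN &pick_nan) {
    double x, y;
    std::memcpy(&x, &a, sizeof x);
    std::memcpy(&y, &b, sizeof y);
    double r = x + y;                       // the actual FP operation
    uint64_t out;
    std::memcpy(&out, &r, sizeof out);
    // Converting the result back to bits: if it is a NaN, any NaN pattern is allowed.
    return std::isnan(r) ? pick_nan() : out;
}

int main() {
    const uint64_t pos_inf = 0x7ff0000000000000ULL;
    const uint64_t neg_inf = 0xfff0000000000000ULL;
    // +inf + -inf is a NaN, so the "solver" gets to pick the payload here.
    uint64_t r = fadd_on_bits(pos_inf, neg_inf,
                              [] { return 0x7ff0000000000001ULL; });
    std::printf("%016llx\n", (unsigned long long)r);
    return 0;
}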

The new test failure is this:

define i64 @All11(i64 %in) {
; CHECK-NEXT:    ret i64 0
  %out = and i64 %in, xor (i64 bitcast (<2 x float> bitcast (i64 -1 to <2 x float>) to i64), i64 -1)
  ret i64 %out
}

The problem for this test is that "bitcast (i64 -1 to <2 x float>)" shows up as llvm::ConstantFP, and thus Alive2's interpretation of the IR above is:
define i64 @All11(i64 %in) {
  %__constexpr_1 = bitcast <2 x float> { -nan, -nan } to i64
  %__constexpr_0 = xor i64 %__constexpr_1, -1
  %out = and i64 %in, %__constexpr_0
  ret i64 %out
}

So the -1 bit pattern disappears, and I'm not sure there's a way to distinguish a bit pattern that merely happens to represent a NaN from a float NaN (which we interpret as any bit pattern that represents NaN).

Anyway, assuming we reach a conclusion on what to do with this test, the question is then about the backends: are all backends OK with LLVM assuming that moving data in and out of float registers doesn't change the bits?
(this is not true on one of our chips, but I think we can live with that)

@efriedma-quic (Collaborator) commented:

Probably under any model where floating-point "operations" are special, bitcast shouldn't count as a floating-point operation; otherwise, we lose the equivalence between bitcast and store+load.

Again, I think the only in-tree target with non-bit-preserving load/store operations is 32-bit x86 (using x87 operations to load float/double values; oddly, long double load/store are bit-preserving).

@nunoplopes (Member, Author) commented Apr 9, 2020

There's a related test failure in Transforms/InstCombine/minmax-fold.ll:

define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x float>
  %t1 = bitcast <2 x i64> %b to <4 x float>
  %t2 = fcmp olt <4 x float> %t1, %t0
  %t3 = bitcast <2 x i64> %a to <4 x i32>
  %t4 = bitcast <2 x i64> %b to <4 x i32>
  %t5 = select <4 x i1> %t2, <4 x i32> %t3, <4 x i32> %t4
  ret <4 x i32> %t5
}
=>
define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x float>
  %t1 = bitcast <2 x i64> %b to <4 x float>
  %t2 = fcmp olt <4 x float> %t1, %t0
  %1 = select <4 x i1> %t2, <4 x float> %t0, <4 x float> %t1
  %t5 = bitcast <4 x float> %1 to <4 x i32>
  ret <4 x i32> %t5
}

In the source, the returned value only passes through integer-to-integer bitcasts, while in the target it takes an int -> float -> int round trip. If the value is a NaN bit pattern, that round trip may change the bits (e.g., canonicalize the NaN).

@RalfJung (Contributor) commented:

In Rust, we are also struggling with what exactly LLVM's NaN semantics are: rust-lang/rust#73328. It would be good to get more precise documentation of those semantics -- though it seems that's still work in progress, if I understand the discussion here correctly?

> I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't-care. That's why I implemented these semantics in Alive2.

What were the semantics you implemented first, i.e., how do they differ from the adjusted ones you described later? Is it what Eli calls the "alternative model"? So, float/double LLVM variables actually have a different range of possible values than a memory location, and conversion happens on load/store/bitcast?

>   2. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.

Note that this means that these operations are non-deterministic! So, duplicating them would be an illegal transformation, similar to "freeze". Is that something LLVM is treating properly?

(In the "alternative model", likewise stores/bitcasts would be non-deterministic and must not be duplicated.)

@nunoplopes (Member, Author) commented:

I see 2 models:

  1. When you move a value in/out of a float register, the CPU may canonicalize the NaN value, so the original bit pattern may not be preserved. These are the semantics implemented ATM in Alive2. When a float is stored to memory or bitcast to int, we allow all don't-care NaN bits to be arbitrary. This allows loads to be duplicated (the bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

  2. CPUs are not allowed to change the NaN bits, hence they preserve the bit-pattern of the input. The non-determinism is moved to the float operations: if they produce NaN, they make all their NaN bits non-deterministic.

Each has pros and cons, as usual. The first one is required if people care about such processors. The second one allows some optimizations like the ones shown in this bug report.

@RalfJung (Contributor) commented:

> When a float is stored to memory or bitcast to int, we allow all don't-care NaN bits to be arbitrary. This allows loads to be duplicated (the bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

(The store would be the problem with duplication here, not the load.)
Oh I see, so basically there is a fixed global parameter of the semantics which determines the NaN bit pattern? Or is it allowed to depend on some other factors?

The second one seems to disallow e.g. turning "float f = x+x; if ((int)f == (int)f)" into "if ((int)(x+x) == (int)(x+x))" as that would duplicate the non-determinism. This particular transformation likely makes little sense, but there might be other conditions under which recomputing a result could be beneficial.
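
A hedged C++ sketch of that transformation (names are illustrative; memcpy stands in for the bit-level "(int)" reads): duplicating the FP operation is only sound if re-evaluating it cannot pick a fresh NaN bit pattern.

#include <cstdint>
#include <cstdio>
#include <cstring>

static uint64_t bits_of(double d) {
    uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    return b;
}

bool before(double x) {
    double f = x + x;                 // the FP operation is evaluated once
    return bits_of(f) == bits_of(f);  // trivially true: same value, same bits
}

bool after(double x) {
    // If "x + x" may return a different NaN bit pattern on each evaluation
    // (the second model), the two sides can disagree when x is a NaN, so
    // rewriting before() into after() would not be a valid transformation.
    return bits_of(x + x) == bits_of(x + x);
}

int main() {
    double some_nan = 0.0 / 0.0;  // any NaN input
    std::printf("%d %d\n", (int)before(some_nan), (int)after(some_nan));
    return 0;
}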

@Muon commented Apr 25, 2024

This issue is causing soundness problems for Rust today: rust-lang/rust#114479 (comment). That's an example with no NaNs and no unsafe, which results in a segfault.

@llvmbot (Member) commented Apr 25, 2024

@llvm/issue-subscribers-backend-x86


@nikic (Contributor) commented Apr 25, 2024

Marking this as an X86 backend issue, as I believe the consensus is that the load/store operations on the IR level are value-preserving (that is, they do not count as "floating-point operations") and the fact that this is not true for x87 needs to be mitigated there (together with the whole host of other problems with similar root cause).

@programmerjake (Contributor) commented:

I think the M68k backend will probably have similar issues when they start adding hardware floating-point support, since iirc float/double load/stores also convert from/to an internal 80-bit format.

bug for adding fp support: #61744

@llvm/issue-subscribers-backend-m68k

@Muon commented Apr 25, 2024

The manual at https://cache.nxp.com/docs/en/reference-manual/M68000PM.pdf says the following on page 3-26:

> Range control is a method used to assure correct emulation of a device that only supports single- or double-precision arithmetic. If the intermediate result’s exponent exceeds the range of the selected precision, the exponent value appropriate for an underflow or overflow is stored as the result in the 16-bit extended-precision format exponent. For example, if the data format and rounding mode is single precision RM and the result of an arithmetic operation overflows the magnitude of the single-precision format, the largest normalized single-precision value is stored as an extended-precision number in the destination floating-point data register (i.e., an unbiased 15-bit exponent of $00FF and a mantissa of $FFFFFF0000000000).

That is, setting the precision control on the m68k FPU actually makes it comply with the appropriate IEEE 754 format despite its capacity, unlike the x87 FPU.

EDIT: though I cannot determine whether it quiets signaling NaNs when they are loaded into the registers. It might, given that it performs a format conversion.

EDIT 2: Quoth the manual regarding the FMOVE instruction:

> Although the primary function of this instruction is data movement, it is also considered an arithmetic instruction since conversions from the source operand format to the destination operand format are performed implicitly during the move operation.

And later on it says "Refer to 1.6.5 Not-A-Numbers" about its SNAN behavior. So it quiets signaling NaNs?

EDIT 3: This is confusing:

> When the user creates a NAN, any nonzero bit pattern can be stored in the mantissa.

I'm assuming it's only talking about quiet NaNs, but who knows...

@llvmbot (Member) commented Apr 25, 2024

@llvm/issue-subscribers-backend-m68k


@programmerjake (Contributor) commented:

Assuming qemu is correct (which is not necessarily the case), compiling and running the following program with m68k-linux-gnu-g++ makes me think that m68k actually preserves sNaN bits when using fmoved, which is a nice surprise:

#include <stdint.h>
#include <stdio.h>

union U {
    double f;
    uint64_t i;
};

double g(const uint64_t &v) __attribute__((noinline));

// Round-trips the bits through a double return value, which travels through
// an FP register on m68k.
uint64_t f(uint64_t v) {
    return (U){.f = g(v)}.i;
}

double g(const uint64_t &v) {
    return (U){.i = v}.f;
}

int main() {
    uint64_t v = 0xFFFF0000abcdef01;  // NaN bit pattern with a distinctive payload
    printf("0x%llx 0x%llx\n", (unsigned long long)f(v), (unsigned long long)v);
}

it prints:

0xffff0000abcdef01 0xffff0000abcdef01

disassembly of g:

800004a8 <_Z1gRKy>:
800004a8:       206f 0004       moveal %sp@(4),%a0
800004ac:       f210 5400       fmoved %a0@,%fp0
800004b0:       4e75            rts

@RalfJung (Contributor) commented:

> The unit test "test/Transforms/InstCombine/bitcast-phi-uselistorder.ll" shows an incorrect transformation from load+store i64 into load/store double. These are not equivalent because NaN values can be canonicalized by the CPU so the store double can write a different bit-pattern than store i64.

@nunoplopes I think it's wrong for a double load/store to do any sort of canonicalization. Load/store should preserve the bitwise value perfectly.

Rust relies on that. I also don't see anything in the LangRef docs that would say it's allowed for load/store to alter the value.
