Basic floating-point operations are underspecified #60942
Comments
I think we just need to document that float = IEEE float. In the past we pretended that targets were allowed to codegen as some other format, but that never realistically worked, and it is an onerous burden with no known users.
There are some comments on #44218 about the x87 precision issue. It's something that's possible to fix, but given the lessening importance of 32-bit x86 (especially since SSE-based x86 is a practical option there), the drive to fix it isn't high.

The other main potential issue I can think of is FTZ/DAZ. I'm not a connoisseur of all the architecture manuals for the architectures we support, but on a quick flick through them, I'm not entirely able to rule out the possibility that we target some architecture where the hardware can't support subnormals properly per IEEE 754. In general, our denormal story is already a bit of a mess (see also #80475 and the discussion ongoing there).

To be brutally honest, our floating-point semantics are "assume IEEE 754 semantics, with the IEEE 754-2008 signaling-bit position, in terms of layout," with the computation rules being "whatever the hardware does, but the compiler pretends it's IEEE 754 in the default environment unless there are flags saying otherwise." Not the semantics I'd want to define, but if the user has some guarantees about the reasonableness of the FP hardware, then the compiler can generally uphold those guarantees.
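To make the FTZ/DAZ concern concrete, here is a minimal x86-specific sketch (assuming SSE math and the `<xmmintrin.h>` MXCSR intrinsics; the constants are the standard FTZ and DAZ control bits) showing how flushing subnormals makes "whatever the hardware does" diverge from the IEEE 754 result the compiler assumes:

```cpp
// x86/SSE-specific sketch: enabling FTZ/DAZ changes the result of a plain
// multiply compared to the IEEE 754 default environment. Illustrative only.
#include <cstdio>
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr

int main() {
  volatile double tiny = 1e-310;   // subnormal for IEEE binary64
  volatile double half = 0.5;      // volatile defeats compile-time folding

  // Default environment: IEEE 754 says tiny * half is a smaller subnormal.
  double ieee = tiny * half;

  // Enable flush-to-zero (bit 15) and denormals-are-zero (bit 6) in MXCSR.
  _mm_setcsr(_mm_getcsr() | 0x8040);
  double flushed = tiny * half;    // DAZ zeroes the input, so this is +0.0

  std::printf("default: %g  ftz/daz: %g\n", ieee, flushed);
  // The two differ -- exactly the divergence the "assume IEEE 754 unless
  // told otherwise" model has to paper over.
  return 0;
}
```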
Another area of underspecification is the interaction of things like fast-math flags and strict floating-point mode; we don't actually document what happens if you combine them. The current LangRef does specify this:
With this definition, we run into problems with constant folding. For example, if the native fdiv instruction for the target hardware doesn't return correctly rounded results, then constant folding fdiv to a correctly rounded result may be a value-changing transformation.

This question came up recently in a discussion among SYCL developers. Does the LLVM IR fdiv instruction require that the result be correctly rounded? My initial reaction was, "of course it does!" But when you think about this in terms of targets that don't have a correctly rounded fdiv instruction, that's not so obvious anymore. If I'm writing code for a device that I know doesn't have correctly rounded native division, do I really want the compiler to insert a bunch of extra code to get a correctly rounded result? Almost certainly not.

So what are the semantics of fdiv? My take on it is that there is no specific requirement for accuracy, but the compiler isn't allowed to do anything that would change the result, which ultimately boils down to something like the description you gave. I think this runs into exactly the kinds of problems that @efriedma-quic was trying to explain to me here (https://discourse.llvm.org/t/propogation-of-fpclass-assumptions-vis-a-vis-fast-math-flags/76554), but I think this is something that we're going to need to learn to live with if we want to support non-CPU targets.

Even without considering non-CPU hardware, we have this problem for things like llvm.cos. What are the semantics of this intrinsic? "Return the same value as a corresponding libm 'cos' function but without trapping or setting errno." But we haven't said which implementation of libm we're talking about, and so if the compiler constant folds a call to this intrinsic, but the compiler was linked against a different libm implementation than the one the program being compiled will use, that may be a value-changing optimization. I've brought this up before (https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138) and there wasn't much support for maintaining strict numeric consistency in cases like this, but I still think that should be our default behavior.
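As a rough sketch of what constant folding an fdiv amounts to inside the compiler (written against LLVM's APFloat C++ API as I understand it; treat the exact headers and method names as illustrative), the folded value is the correctly rounded IEEE quotient regardless of what any target's native divider would produce:

```cpp
// Illustrative sketch: constant folding fdiv via APFloat yields the
// correctly rounded IEEE 754 quotient. Builds against LLVM's Support
// library; API names are per my reading of llvm/ADT/APFloat.h.
#include "llvm/ADT/APFloat.h"
#include <cstdio>

int main() {
  using llvm::APFloat;

  APFloat Num(1.0f), Den(3.0f);
  // Round-to-nearest-ties-to-even is what the folders assume for the
  // default floating-point environment.
  Num.divide(Den, APFloat::rmNearestTiesToEven);

  // If the target's hardware divide is not correctly rounded (as on some
  // accelerators), replacing a runtime fdiv with this folded constant can
  // change the observable result of the program.
  std::printf("folded 1.0f/3.0f = %.9g\n", Num.convertToFloat());
  return 0;
}
```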
Putting on a few different hats here, so bear with me.

From the perspective of a language designer on a new programming language, it is useful to have code that works reliably across the hardware spectrum, even if it comes at some performance cost. Consider that newer languages tend to specify that basic floating-point operations produce the IEEE 754 result.

From the perspective of someone trying to support "exotic" hardware (whether as a compiler writer or a programmer), it's generally the case that they want the high-level language operator to map more directly to the instruction. So if you've got hardware that isn't capable of doing a correctly-rounded operation natively, they would rather the operator lower to whatever the hardware provides.

Note the tension between these two hats from a user perspective: a user who's writing portable code would probably prefer to get the same results on all platforms, at the cost of performance, whereas a user who's targeting only particular hardware would probably prefer to get the fastest code on that hardware, at the cost of different results elsewhere. Probably the only feasible way to square this tension is to provide both a way to get a fully accurate result and a way to get a loosened-accuracy result, as sketched below.
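As a sketch of that "provide both" idea, purely hypothetical (the function names, config macro, and builtin below are made up for illustration, not an existing API):

```cpp
// Hypothetical surface: one entry point that promises the correctly
// rounded IEEE 754 result, and one that only promises a documented bound.
#include <cstdio>

// Contract: always the correctly rounded IEEE 754 quotient, even if the
// target has to emulate it in software.
double precise_div(double a, double b) { return a / b; }

// Contract: may use the target's native, possibly not correctly rounded,
// divide or a reciprocal approximation; accuracy within, say, a few ULP.
double fast_div(double a, double b) {
#if defined(TARGET_HAS_FAST_APPROX_DIV)  // hypothetical config macro
  return native_approx_div(a, b);        // hypothetical target builtin
#else
  return a / b;                          // portable fallback: exact path
#endif
}

int main() {
  std::printf("%.17g %.17g\n", precise_div(1.0, 3.0), fast_div(1.0, 3.0));
  return 0;
}
```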
When the C++ committee talked about making these functions constexpr, I tried to point this problem out to them and get them to care about it, and abjectly failed. The general sentiment was along the lines of "it's already a mess, how is this going to make it more of a mess?" As for what LLVM can do, we have correctly rounded versions of these functions in llvm-libc, so in theory we could leverage that to at least make the optimizations not dependent on the host library.

Another LLVM libm intrinsic problem is that most of those intrinsics are lowered to calls to libm if there's no hardware support for them, which means there's a good chance they set errno. Yet another problem is that we now have some optimizations that kick in only if you call libm functions directly and other optimizations that kick in only if you use the intrinsics, because they are optimized in two different passes with two different sets of rules.

IMHO, we do need to work on a better specification of floating-point semantics in general, but this is also a deeper conversation that's going to require multiple RFCs to the mailing lists, and several changes to LLVM, rather than some back-and-forth in this issue.
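To illustrate the errno point, here is a small C++ check (whether errno is actually set depends on the platform's math_errhandling including MATH_ERRNO, so treat it as illustrative) showing that a real libm call is allowed to do exactly what the intrinsics promise not to:

```cpp
// A libm call may set errno on a domain error; llvm.* math intrinsics are
// documented as never setting errno, so lowering them back to libm calls
// makes that promise depend on how libm was built.
#include <cerrno>
#include <cmath>
#include <cstdio>

int main() {
  errno = 0;
  volatile double x = -1.0;      // volatile defeats constant folding
  double r = std::sqrt(x);       // typically lowers to the libm sqrt call
  std::printf("sqrt(-1) = %f, errno = %d (EDOM = %d)\n", r, errno, EDOM);
  return 0;
}
```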
Firstly, I think it should be kept in mind that anything less than a formal model will almost certainly result in more soundness bugs like #44218 showing up down the line. Without a formal model, it is impossible to define what a valid optimization even is.

I think LLVM should officially adopt the semantics of IEEE 754 as the intended model for the basic operations. On ISAs which do not natively support these operations (e.g. x87), they should instead be emulated, or failing that, deferred to a softfloat library. If the ISA has instructions implementing nonstandard semantics, then those instructions should be exposed as intrinsics or similar.

Regarding intrinsics for trigonometric and transcendental functions, I think the text should be changed to refer to an "implementation-specific approximation" or similar. In any case, dynamic linking makes this largely impossible to optimize around. (Ideally, these should be correctly rounded as well in the year 2024. I am pleasantly surprised to hear that llvm-libc has this!)
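For reference, the classic shape of the x87 problem looks something like the sketch below; whether it actually misbehaves depends on optimization level, register spilling, and `-mfpmath`, so this is a demonstration of the hazard rather than a guaranteed reproducer. Under SSE math or a softfloat library, both printed values overflow to inf, as an IEEE binary64 model requires.

```cpp
// Why x87 codegen does not fit an IEEE `double` model: intermediates may
// be kept in 80-bit registers, so whether a value is spilled to memory
// changes the observable result (e.g. build with -m32 -mfpmath=387).
#include <cstdio>

int main() {
  volatile double big = 1e308;
  double prod = big * 10.0;   // overflows IEEE double -> should be +inf
  // On x87 the product may still live in an 80-bit register, where it is
  // finite, so dividing it back down can "undo" an overflow that an IEEE
  // double implementation has already committed to.
  double back = prod / 10.0;
  std::printf("prod = %g, back = %g\n", prod, back);
  return 0;
}
```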
… to IEEE-754 (#102140) Fixes #60942: IEEE semantics is likely what many frontends want (it definitely is what Rust wants), and it is what LLVM passes already assume when they use APFloat to propagate float operations. This does not reflect what happens on x87, but what happens there is just plain unsound (#89885, #44218); there is no coherent specification that will describe this behavior correctly -- the backend in combination with standard LLVM passes is just fundamentally buggy in a hard-to-fix way. There are also questions around flushing subnormals to zero, but [this discussion](https://discourse.llvm.org/t/questions-about-llvm-canonicalize/79378) seems to indicate a general stance: this is specific non-standard hardware behavior, and LLVM generally needs to be told when basic float ops do not return the standard result. Just naively running LLVM-compiled code on hardware configured to flush subnormals will lead to #89885-like issues. AFAIK this is also what Alive2 implements (@nunoplopes please correct me if I am wrong).
Currently, the LangRef does not specify the results of basic floating-point operations (`fadd`, `fsub`, `fmul`, `fdiv`) in any detail. APFloat uses IEEE 754 semantics, but the LangRef does not guarantee it. What guarantees are there about the behavior of floating-point code? If IEEE 754 is the intended model, then x87 codegen is completely broken, and probably other targets are as well. If not IEEE 754, then what is the intended model?
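To pin down what "IEEE 754 is the intended model" would actually pledge for these operations, here is a small self-contained sanity check (using long double as a stand-in higher-precision reference where it is wider than double, as on x86; it is illustrative rather than a proof) that the quotient is the representable double nearest to the exact result:

```cpp
// What "fdiv follows IEEE 754" means operationally: no other double is
// closer to the infinitely precise quotient than the one returned.
#include <cmath>
#include <cstdio>

int main() {
  volatile double a = 1.0, b = 3.0;   // volatile: force a runtime divide
  double q = a / b;                   // the operation the issue asks about

  long double exact = 1.0L / 3.0L;    // higher-precision reference quotient
  double below = std::nextafter(q, -INFINITY);
  double above = std::nextafter(q, INFINITY);

  // Correct rounding: q is at least as close to the reference as either
  // of its neighboring doubles.
  bool ok = std::fabs((long double)q - exact) <= std::fabs((long double)below - exact)
         && std::fabs((long double)q - exact) <= std::fabs((long double)above - exact);
  std::printf("1.0/3.0 correctly rounded here: %s\n", ok ? "yes" : "no");
  return 0;
}
```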