Basic floating-point operations are underspecified #60942
Comments
I think we just need to document that float = IEEE float. In the past we pretended that targets were allowed to codegen as some other format, but that never realistically worked, and it is an onerous burden with no known users.
There are some comments on #44218 about the x87 precision issue. It's something that's possible to fix, but given the lessening importance of 32-bit x86 (especially since SSE-based x86 is a practical option there), the drive to fix it isn't high.

The other main potential issue I can think of is FTZ/DAZ. I'm not a connoisseur of all the architecture manuals for the architectures we support, but on a quick flick through them, I'm not entirely able to rule out the possibility that we target some architecture where the hardware can't support subnormals properly per IEEE 754. In general, our denormal story is already a bit of a mess (see also #80475 and the discussion ongoing there).

To be brutally honest, our floating-point semantics are "assume IEEE 754 semantics, with the IEEE 754-2008 signaling-bit position, in terms of layout," with the computation rules being "whatever the hardware does, but the compiler pretends it's IEEE 754 in the default environment unless there are flags saying otherwise." Not the semantics I'd want to define, but if the user has some guarantees about the reasonableness of the FP hardware, then the compiler can generally uphold those guarantees.
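To make the FTZ/DAZ concern concrete, here is a minimal x86-specific sketch (assuming SSE math and the `<xmmintrin.h>` MXCSR intrinsics; the constants are the standard FTZ and DAZ control bits) showing how flushing subnormals makes "whatever the hardware does" diverge from the IEEE 754 result the compiler assumes:

```cpp
// x86/SSE-specific sketch: enabling FTZ/DAZ changes the result of a plain
// multiply compared to the IEEE 754 default environment. Illustrative only.
#include <cstdio>
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr

int main() {
  volatile double tiny = 1e-310;   // subnormal for IEEE binary64
  volatile double half = 0.5;      // volatile defeats compile-time folding

  // Default environment: IEEE 754 says tiny * half is a smaller subnormal.
  double ieee = tiny * half;

  // Enable flush-to-zero (bit 15) and denormals-are-zero (bit 6) in MXCSR.
  _mm_setcsr(_mm_getcsr() | 0x8040);
  double flushed = tiny * half;    // DAZ zeroes the input, so this is +0.0

  std::printf("default: %g  ftz/daz: %g\n", ieee, flushed);
  // The two differ -- exactly the divergence the "assume IEEE 754 unless
  // told otherwise" model has to paper over.
  return 0;
}
```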
Another area of underspecification is the interaction of things like fast-math flags and strict floating-point mode; we don't actually document what happens if you combine them. The current LangRef does specify this:
With this definition, we run into problems with constant folding. For example, if the native fdiv instruction for the target hardware doesn't return correctly rounded results, then constant folding fdiv to a correctly rounded result may be a value-changing transformation.

This question came up recently in a discussion among SYCL developers. Does the LLVM IR fdiv instruction require that the result be correctly rounded? My initial reaction was, "of course it does!" But when you think about this in terms of targets that don't have a correctly rounded fdiv instruction, that's not so obvious anymore. If I'm writing code for a device that I know doesn't have correctly rounded native division, do I really want the compiler to insert a bunch of extra code to get a correctly rounded result? Almost certainly not.

So what are the semantics of fdiv? My take on it is that there is no specific requirement for accuracy, but the compiler isn't allowed to do anything that would change the result, which ultimately boils down to something like the description you gave. I think this runs into exactly the kinds of problems that @efriedma-quic was trying to explain to me here (https://discourse.llvm.org/t/propogation-of-fpclass-assumptions-vis-a-vis-fast-math-flags/76554), but I think this is something that we're going to need to learn to live with if we want to support non-CPU targets.

Even without considering non-CPU hardware, we have this problem for things like llvm.cos. What are the semantics of this intrinsic? "Return the same value as a corresponding libm 'cos' function but without trapping or setting errno." But we haven't said which implementation of libm we're talking about, and so if the compiler constant folds a call to this intrinsic, but the compiler was linked against a different libm implementation than the one the program being compiled will use, that may be a value-changing optimization. I've brought this up before (https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138) and there wasn't much support for maintaining strict numeric consistency in cases like this, but I still think that should be our default behavior.
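As a rough sketch of what constant folding an fdiv amounts to inside the compiler (written against LLVM's APFloat C++ API as I understand it; treat the exact headers and method names as illustrative), the folded value is the correctly rounded IEEE quotient regardless of what any target's native divider would produce:

```cpp
// Illustrative sketch: constant folding fdiv via APFloat yields the
// correctly rounded IEEE 754 quotient. Builds against LLVM's Support
// library; API names are per my reading of llvm/ADT/APFloat.h.
#include "llvm/ADT/APFloat.h"
#include <cstdio>

int main() {
  using llvm::APFloat;

  APFloat Num(1.0f), Den(3.0f);
  // Round-to-nearest-ties-to-even is what the folders assume for the
  // default floating-point environment.
  Num.divide(Den, APFloat::rmNearestTiesToEven);

  // If the target's hardware divide is not correctly rounded (as on some
  // accelerators), replacing a runtime fdiv with this folded constant can
  // change the observable result of the program.
  std::printf("folded 1.0f/3.0f = %.9g\n", Num.convertToFloat());
  return 0;
}
```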
Putting on a few different hats here, so bear with me.

From the perspective of a language designer on a new programming language, it is useful to have code that works reliably across the hardware spectrum, even if it comes at some performance cost. Consider that newer languages tend to specify that basic floating-point operations produce the IEEE 754 result.

From the perspective of someone trying to support "exotic" hardware (whether as a compiler writer or a programmer), it's generally the case that they want the high-level language operator to map more directly to the instruction. So if you've got hardware that isn't capable of doing a correctly-rounded operation natively, they would rather the operator lower to whatever the hardware provides.

Note the tension between these two hats from a user perspective: a user who's writing portable code would probably prefer to get the same results on all platforms, at the cost of performance, whereas a user who's targeting only particular hardware would probably prefer to get the fastest code on that hardware, at the cost of different results elsewhere. Probably the only feasible way to square this tension is to provide both a way to get a fully accurate result and a way to get a loosened-accuracy result, as sketched below.
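As a sketch of that "provide both" idea, purely hypothetical (the function names, config macro, and builtin below are made up for illustration, not an existing API):

```cpp
// Hypothetical surface: one entry point that promises the correctly
// rounded IEEE 754 result, and one that only promises a documented bound.
#include <cstdio>

// Contract: always the correctly rounded IEEE 754 quotient, even if the
// target has to emulate it in software.
double precise_div(double a, double b) { return a / b; }

// Contract: may use the target's native, possibly not correctly rounded,
// divide or a reciprocal approximation; accuracy within, say, a few ULP.
double fast_div(double a, double b) {
#if defined(TARGET_HAS_FAST_APPROX_DIV)  // hypothetical config macro
  return native_approx_div(a, b);        // hypothetical target builtin
#else
  return a / b;                          // portable fallback: exact path
#endif
}

int main() {
  std::printf("%.17g %.17g\n", precise_div(1.0, 3.0), fast_div(1.0, 3.0));
  return 0;
}
```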
When the C++ committee talked about making these functions constexpr, I tried to point this problem out to them and get them to care about it, and abjectly failed. The general sentiment was along the lines of "it's already a mess, how is this going to make it more of a mess?" As for what LLVM can do, we have correctly rounded versions of these functions in llvm-libc, so in theory we could leverage that to at least make the optimizations not dependent on the host library.

Another LLVM libm intrinsic problem is that most of those intrinsics are lowered to calls to libm if there's no hardware support for them, which means there's a good chance they set errno. Yet another problem is that we now have some optimizations that kick in only if you call libm functions directly and other optimizations that kick in only if you use the intrinsics, because they are optimized in two different passes with two different sets of rules.

IMHO, we do need to work on a better specification of floating-point semantics in general, but this is also a deeper conversation that's going to require multiple RFCs to the mailing lists, and several changes to LLVM, rather than some back-and-forth in this issue.
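To illustrate the errno point, here is a small C++ check (whether errno is actually set depends on the platform's math_errhandling including MATH_ERRNO, so treat it as illustrative) showing that a real libm call is allowed to do exactly what the intrinsics promise not to:

```cpp
// A libm call may set errno on a domain error; llvm.* math intrinsics are
// documented as never setting errno, so lowering them back to libm calls
// makes that promise depend on how libm was built.
#include <cerrno>
#include <cmath>
#include <cstdio>

int main() {
  errno = 0;
  volatile double x = -1.0;      // volatile defeats constant folding
  double r = std::sqrt(x);       // typically lowers to the libm sqrt call
  std::printf("sqrt(-1) = %f, errno = %d (EDOM = %d)\n", r, errno, EDOM);
  return 0;
}
```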
Firstly, I think it should be kept in mind that anything less than a formal model will almost certainly result in more soundness bugs like #44218 showing up down the line. Without a formal model, it is impossible to define what a valid optimization even is.

I think LLVM should officially adopt the semantics of IEEE 754 as the intended model for the basic operations. On ISAs which do not natively support these operations (e.g. x87), they should instead be emulated, or failing that, deferred to a softfloat library. If the ISA has instructions implementing nonstandard semantics, then those instructions should be exposed as intrinsics or similar.

Regarding intrinsics for trigonometric and transcendental functions, I think the text should be changed to refer to an "implementation-specific approximation" or similar. In any case, dynamic linking makes this largely impossible to optimize around. (Ideally, these should be correctly rounded as well in the year 2024. I am pleasantly surprised to hear that llvm-libc has this!)
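For reference, the classic shape of the x87 problem looks something like the sketch below; whether it actually misbehaves depends on optimization level, register spilling, and `-mfpmath`, so this is a demonstration of the hazard rather than a guaranteed reproducer. Under SSE math or a softfloat library, both printed values overflow to inf, as an IEEE binary64 model requires.

```cpp
// Why x87 codegen does not fit an IEEE `double` model: intermediates may
// be kept in 80-bit registers, so whether a value is spilled to memory
// changes the observable result (e.g. build with -m32 -mfpmath=387).
#include <cstdio>

int main() {
  volatile double big = 1e308;
  double prod = big * 10.0;   // overflows IEEE double -> should be +inf
  // On x87 the product may still live in an 80-bit register, where it is
  // finite, so dividing it back down can "undo" an overflow that an IEEE
  // double implementation has already committed to.
  double back = prod / 10.0;
  std::printf("prod = %g, back = %g\n", prod, back);
  return 0;
}
```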
… to IEEE-754 (#102140) Fixes #60942: IEEE semantics is likely what many frontends want (it definitely is what Rust wants), and it is what LLVM passes already assume when they use APFloat to propagate float operations. This does not reflect what happens on x87, but what happens there is just plain unsound (#89885, #44218); there is no coherent specification that will describe this behavior correctly -- the backend in combination with standard LLVM passes is just fundamentally buggy in a hard-to-fix way. There are also questions around flushing subnormals to zero, but [this discussion](https://discourse.llvm.org/t/questions-about-llvm-canonicalize/79378) seems to indicate a general stance: this is specific non-standard hardware behavior, and LLVM generally needs to be told when basic float ops do not return the standard result. Just naively running LLVM-compiled code on hardware configured to flush subnormals will lead to #89885-like issues. AFAIK this is also what Alive2 implements (@nunoplopes please correct me if I am wrong).
Currently, the LangRef does not specify the results of basic floating-point operations (`fadd`, `fsub`, `fmul`, `fdiv`) in any detail. APFloat uses IEEE 754 semantics, but the LangRef does not guarantee it. What guarantees are there about the behavior of floating-point code? If IEEE 754 is the intended model, then x87 codegen is completely broken, and probably other targets are as well. If not IEEE 754, then what is the intended model?
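To pin down what "IEEE 754 is the intended model" would actually pledge for these operations, here is a small self-contained sanity check (using long double as a stand-in higher-precision reference where it is wider than double, as on x86; it is illustrative rather than a proof) that the quotient is the representable double nearest to the exact result:

```cpp
// What "fdiv follows IEEE 754" means operationally: no other double is
// closer to the infinitely precise quotient than the one returned.
#include <cmath>
#include <cstdio>

int main() {
  volatile double a = 1.0, b = 3.0;   // volatile: force a runtime divide
  double q = a / b;                   // the operation the issue asks about

  long double exact = 1.0L / 3.0L;    // higher-precision reference quotient
  double below = std::nextafter(q, -INFINITY);
  double above = std::nextafter(q, INFINITY);

  // Correct rounding: q is at least as close to the reference as either
  // of its neighboring doubles.
  bool ok = std::fabs((long double)q - exact) <= std::fabs((long double)below - exact)
         && std::fabs((long double)q - exact) <= std::fabs((long double)above - exact);
  std::printf("1.0/3.0 correctly rounded here: %s\n", ok ? "yes" : "no");
  return 0;
}
```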