Our floating point semantics were a mess #237
Worth noting: we can't expect floating point to be handled consistently on all architectures and across all datatypes that use floating point, yet it might be confusing if floating point has wildly different behavior depending on the box it's running on. rust-lang/rfcs#2977 (comment)
This also reminds me of rust-lang/rust#75786 -- We should probably have an explicit accuracy promise documented somewhere, including for library functions. (Hopefully we can at least promise ±1ULP everywhere, though some things really ought to be ±½ULP.)
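For illustration, here is a rough sketch of what measuring error in ULPs could look like, comparing `f32::sin` against the same computation in `f64` as a stand-in reference (not an authoritative harness, and the helper only handles finite values of the same sign):

```rust
/// Distance in ULPs between two finite f32 values of the same sign --
/// adjacent representable floats have adjacent bit patterns, so the
/// difference of the bit patterns counts the steps between them.
fn ulps_apart(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}

fn main() {
    let x = 1.234_f32;
    let approx = x.sin();
    // Higher-precision stand-in reference; a real harness would use a
    // correctly rounded reference implementation instead.
    let reference = (x as f64).sin() as f32;
    println!("ULP distance: {}", ulps_apart(approx, reference));
}
```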
That's usually why specs are under-specified -- they can leave room for implementations to differ, either via non-determinism (that's what wasm does for NaNs; it would be interesting to know what it does for denormals) or via introducing unspecified target-specific "canonicalization" or similar mechanisms.
That's just trig (and other) functions being platform-specific (the division/multiplication in the title is a red herring). So it's related, but the issue here is about how operations provided by the core language behave. We can worry about trig functions once we have that sorted out. ;)
Adding some background data: I recently discovered that Intel recommends setting the denormals-are-zero and flush-to-zero flags on their C++ compiler unless denormal accuracy is application-critical: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/understanding-floating-point-operations/setting-the-ftz-and-daz-flags.html In addition, the fact that a floating-point op can vary in speed by two orders of magnitude is a rich source of data for timing attacks: https://cseweb.ucsd.edu/~dkohlbre/papers/subnormal.pdf Also, explicit fused multiply-add ops are part of the IEEE 754-2008 standard and can usually lead to increased precision by reducing the number of roundings that occur... but that can also reduce the accuracy of later operations that encode assumptions like "this is commutative, right?"
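For example, here is a small sketch of how a fused multiply-add differs from a separately rounded multiply and add, using the standard library's `f64::mul_add` (which performs a single rounding; how fast it is depends on hardware FMA support):

```rust
fn main() {
    let a = 1.0 + f64::EPSILON; // 1 + 2^-52
    let b = 1.0 - f64::EPSILON; // 1 - 2^-52
    let c = -1.0;

    // Separate multiply and add: a * b (exactly 1 - 2^-104) rounds to 1.0
    // first, so the tiny term is lost and the sum is 0.0.
    let separate = a * b + c;

    // Fused multiply-add: a * b + c is computed with a single rounding,
    // so the -2^-104 term survives.
    let fused = a.mul_add(b, c);

    println!("separate = {separate:e}, fused = {fused:e}");
    assert_ne!(separate, fused);
}
```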
Speed is not currently part of the Rust specification, so that aspect, while certainly relevant in general, is unrelated to specifying floating-point behavior in Rust.
AFAIK LLVM does not introduce FMAs unless we explicitly set a flag, so right now I think there is no problem here. Unless you want to suggest we should introduce FMAs; that could complicate our floating-point story even further, but in ways that are orthogonal to the problems listed here. There was at least one attempt at an RFC here (rust-lang/rfcs#2686), but since it is a hypothetical future (we don't introduce FMAs right now) I don't think it needs tracking here.
What does this mean for Rust programs? This sounds a bit like a fast-math style flag? Similar to the optimizer introducing FMAs, to my knowledge the complications around that are orthogonal to figuring out what the Rust semantics for completely standard IEEE floating-point operations are when NaNs or weird platforms (i686) are involved. Fast-math is its own can of worms, and very little is known about how to specify it precisely. In contrast, the issue here is mostly about figuring out what LLVM (and, by extension, hardware) actually does; writing a reasonable spec is easy ("copy wasm"), but making sure LLVM conforms to that spec is hard because, as usual, LLVM IR behavior is scarcely documented. Or is this related to the denormal problem around SSE that you raised earlier?
I think a separate issue is warranted for denormals and fast-math issues; as long as fixes to our NaN and x87 issues don't block potential avenues there, they're almost entirely separate.
…-obk avoid promoting division, modulo and indexing operations that could fail For division, `x / y` will still be promoted if `y` is a non-zero integer literal; however, `1/(1+1)` will not be promoted any more. While at it, also see if we can reject promoting floating-point arithmetic (which are [complicated](rust-lang/unsafe-code-guidelines#237) so maybe we should not promote them). This will need a crater run to see if there's code out there that relies on these things being promoted. If we can land this, promoteds in `fn`/`const fn` cannot fail to evaluate any more, which should let us do some simplifications in codegen/Miri! Cc rust-lang/rfcs#3027 Fixes rust-lang#61821 r? `@oli-obk`
Not sure where to post this comment, but |
@thomcc wrote a lengthy post about this in https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/float.20total_cmp/near/270506799, and the surrounding conversation is also relevant.
I believe I have devised a path forward which may allow us to cut out the i686 issue, by simply having Rust functions use the calling convention that LLVM annotates them with after optimizations and guaranteeing SSE2 is enabled and x87 is disabled outside |
As far as the NaN thing: Both GCC and LLVM incorrectly conflate "the NaN payload is implementation defined" with "the sign is |
Wait, does LLVM really treat any part of a NaN as |
I think they treat producing a NaN value as logically originating from an undefined origin every time it is produced, and thus feel entitled to select an arbitrary value each time. So perhaps I should say they treat it as round-tripping through the |
Ah okay. As long as |
As I stated elsewhere, those are compile-time constants. Rust also happens to abstract over |
LLVM does not necessarily abstract over compilation targets as much as you might think it does. I have several times asked LLVM developers how LLVM would handle various somewhat unusual compilation requests, and been frequently told that the frontend should lower to operations the machine can specifically perform. There are some things that LLVM does abstract over, but fewer than you might imagine once you actually push on the edges of the LangRef, so it is a "target independent" code generator only with a huge series of asterisks attached, even before we bring actual inline assembly into the mix.
I don't think we have a concept of target-specific constants currently, but it sounds like a reasonable enough thing to add.
That's a fundamental difference between Rust on all platforms other than Wasm, and Wasm. Rust doesn't make any attempt to hide the target architecture; source code can explicitly ask what the architecture is. In Wasm, the target architecture is hidden; it's not known at all at source-code compile time, and at runtime it's only observable if you know what NaN bits to look for, or maybe if you look really closely at the behavior of memory shared between threads. On everything except Wasm, Rust could be ok saying "the NaN generated by 0.0/0.0 is a target-specific constant". But Wasm fundamentally doesn't know what the target architecture is going to be until at least program startup time. So on Wasm, making it a compile-time constant would require taking a performance hit on at least one popular architecture. We don't have comprehensive data, but one benchmark suggests it could be in the 5%-15% range on floating-point code.
Oh right, so a compile-time constant is not really an option there. A "value picked non-deterministically once at program startup" would still work though, if wasm were willing to guarantee that. Though the wasm issue mentions migrating live code from x86 to ARM, so maybe wasm has to stay non-det... in which case the best Rust can do is say "compile-time constant on some targets, fully non-det on others".
I know that this involves fixing a lot of things with LLVM, just thought I'd put in my two cents before an RFC is written up (FWIW, my PhD topic is on floating-point arithmetic decision procedures). Happy to answer any questions.
That is a non-goal for now, as far as I am concerned.
I don't think we want this; also see rust-lang/rfcs#3352. I think we want to just say that const-eval produces some unspecified result; the result is deterministic for any given build of the compiler and target configuration but may otherwise vary. |
Does that mean that it's OK if the GCC and LLVM backends produce a different result for |
Isn't that already the case? LLVM will const-eval floats IIRC, and it does not necessarily use the same sign propagation conventions as the target, which means that unless the optimizations exactly line up you can get different results on different compilers for that (not |
That's okay, although it would be preferable if you don't close that off. It's really painful to emulate other rounding modes.
It's not true that we need to relax the restrictions on const evaluation in order to allow floating-point to be used in it. We're already assuming a fixed rounding mode (round to nearest, ties to even), so the results of all floating-point computations that don't depend on the unspecified bits of a NaN are already fully determined. Any inconsistency or apparent nondeterminism in NaN-free code is a bug. Among IEEE 754-conforming hardware implementations, the only differences may be in the production of NaNs. Picking an arbitrary NaN among the possible NaNs allowed by IEEE 754 is incorrect, since the hardware will actually produce specific NaNs; we just don't know which. It is surprising if const evaluation produces results which are unobtainable in a runtime evaluation. This is why we must overapproximate. Note again that the only instances in which the result is not determined are those in which we directly or indirectly depend on the unspecified bits of a NaN.
Yes. That expression does not produce a fully determined result. A NaN is encoded as having any sign bit, all exponent bits set, and at least one significand bit set. The result of that expression will be a quiet NaN, because all computations produce quiet NaNs. Those are the only restrictions on the result. However, IEEE 754 does not require particular encodings for quiet or signaling NaNs, and leaves it to the target. Furthermore, different ISAs just produce different NaNs. In that regard, it cannot be soundly constant-folded without knowing the details of the target (if the ISA even specifies which NaNs it produces in that instance).
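As a concrete illustration of that encoding (the specific bit patterns below are arbitrary examples, not anything Rust promises to produce):

```rust
fn main() {
    // All exponent bits set with a zero significand is an infinity, not a NaN.
    assert!(!f64::from_bits(0x7ff0_0000_0000_0000).is_nan());

    // All exponent bits set plus any nonzero significand is a NaN,
    // regardless of the sign bit.
    assert!(f64::from_bits(0x7ff8_0000_0000_0000).is_nan()); // the usual quiet NaN on x86/ARM
    assert!(f64::from_bits(0xfff8_0000_0000_0000).is_nan()); // same, with the sign bit set
    assert!(f64::from_bits(0x7ff0_0000_0000_0001).is_nan()); // NaN with only a low payload bit set

    // Rust does not document a specific bit pattern for f64::NAN.
    println!("f64::NAN bits: {:#018x}", f64::NAN.to_bits());
}
```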
That's exactly what is being discussed in rust-lang/rfcs#3352. I agree it is surprising but I think it is better than the alternatives. (Your statement boils down to "runtime behavior must be superset of compiletime behavior", which is being discussed in the RFC thread, though not yet in the RFC text.) So let's discuss that question there and not here. Also note that all your statements assume that Rust float semantics are exactly IEEE float semantics, which is not a given. We eventually might adopt something like the wasm semantics which makes more guarantees than IEEE when it comes to NaNs.
This statement is in fact wrong if Rust uses IEEE float semantics. Since under those semantics NaN bits are picked non-deterministically when the NaN is produced, Rust can constant-fold such operations to an arbitrary NaN. IOW, Rust does not guarantee that NaNs behave the same way as they would if you were to write assembly by hand. It guarantees that they behave as in a valid IEEE implementation (and maybe even a particular variant of that, as in the wasm case), but it doesn't have to be the same IEEE implementation as what your hardware does.
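A minimal sketch of what that means in practice (purely illustrative; on most current targets both lines print the same bits, but only the runtime value comes from the hardware):

```rust
fn main() {
    // Const-eval may fold this to some arbitrary quiet NaN of its choosing.
    const CT_NAN: f64 = 0.0 / 0.0;

    // black_box keeps the optimizer from folding this one, so the bits are
    // whatever the target's FP unit actually produces at runtime.
    let zero = std::hint::black_box(0.0_f64);
    let rt_nan = zero / zero;

    println!("const-eval NaN: {:#018x}", CT_NAN.to_bits());
    println!("runtime NaN:    {:#018x}", rt_nan.to_bits());
}
```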
I am indeed expecting that Rust uses IEEE 754 semantics. After all, it's what almost all hardware implements, and that's fast.
An arbitrary quiet NaN, specifically. However, if we're using the hardware FPU, we don't determine which NaNs are quiet, so the result is still target-dependent.
> I am indeed expecting that Rust uses IEEE 754 semantics. After all, it's what almost all hardware implements, and that's fast.
Hardware *refines* IEEE, making particular choices for the nondeterministic operations.
For nondeterministic specs it is important to differentiate "Rust implements the spec" and "Rust exactly matches the spec" (refinement vs equivalence / subset of behaviors vs equal set of behaviors). Those things are not the same. I was asking whether Rust exactly matches the spec and you answered a different question by saying that Rust refines/implements the spec. Everyone agrees on the latter but there is some demand for Rust to have less non-determinism than what IEEE offers.
The proposal was to start by saying that Rust fully exploits the non-determinism offered by IEEE without any regard for the current target. This means optimizations and const-eval are free to make choices that the hardware would never make.
Further down the road we might want to refine this, making guarantees not made by IEEE: either by saying that we exactly match hardware behavior (which as you said makes optimizations tricky, and means we have to make some choices regarding how closely const-eval matches runtime behavior) or by carefully making some guarantees that still overapproximate hardware (like what wasm does). FWIW LLVM considers it a non-goal to match hardware behavior so that option would require a bunch of LLVM work before we could even seriously consider it.
Ah, I see. So, exactly matching the spec is in a certain sense ill-posed. The spec requires that both quiet and signaling NaNs exist and that computational operations only return quiet NaNs. That is, there is more than just nondeterminism at play here. For a (nondeterministic) abstract machine to comply, it must still at minimum choose some NaNs to be quiet and some to be signaling. Notably, different machines can make different choices. However, if necessary, Rust can leave the exact bit patterns unspecified.

Implementation-wise, there are a few pitfalls with actually achieving compliance. Firstly, at some point, rustc has to output specific bits for NaNs, and there are real-world CPUs which disagree about signaling and quiet NaNs. Notoriously, MIPS does the opposite of x86 and ARM. Secondly, LLVM is still probably far away from spec compliance, even disregarding NaNs (llvm/llvm-project#44497, llvm/llvm-project#43070, llvm/llvm-project#24913, llvm/llvm-project#25233, llvm/llvm-project#18362).
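For illustration, here is a small probe of the significand's most significant bit, which x86 and ARM use as the "is quiet" flag (legacy MIPS inverts its meaning, so this is not something portable code should rely on):

```rust
fn main() {
    let bits = f64::NAN.to_bits();
    // Bit 51 is the most significant bit of the 52-bit significand.
    let significand_msb = (bits >> 51) & 1;
    println!("f64::NAN bits: {bits:#018x}, significand MSB: {significand_msb}");
    // On x86 and ARM a set MSB marks a quiet NaN; on legacy MIPS the
    // convention is inverted, so this bit alone is not a portable
    // quiet-vs-signaling test.
}
```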
TBH my inclination is to entirely ignore everything related to signalling NaNs for now. AFAIK that is what LLVM does, so some groundwork will be needed elsewhere before we can even attempt to do any better.
That's somewhat tenable, I think? We don't currently have any way of distinguishing sNaN from qNaN. The ordering given by |
I have written a pre-RFC to document our current floating-point semantics. Please comment on Zulip or hackmd!
I'll have a look through and comment on it either tomorrow or when I get back home in a few days.
So... I think this is not so much of a mess any more?

For the x86-32-specific trouble, the FCP in rust-lang/rust#113053 went through. These are unambiguously platform-specific bugs but are hard to fix. We now have rust-lang/rust#114479 and rust-lang/rust#115567 tracking this.

Most of the rest boils down to "we don't guarantee much about the bits you get out of a NaN-producing operation". That is tracked in rust-lang/rust#73328. My Pre-RFC describes our current de-facto guarantees. I assume that RFC will need approval by t-opsem and t-lang. Let's keep an issue open on our side as well until we have a team decision on this.
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee add notes about non-compliant FP behavior on 32bit x86 targets Based on a ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look! In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used. I opened rust-lang#114479 to concisely describe and track the issue. Cc `@workingjubilee` `@thomcc` `@chorman0773` `@rust-lang/opsem` Fixes rust-lang#73288 Fixes rust-lang#72327
The RFC rust-lang/rfcs#3514 should resolve all questions for good. :)
Superseded by rust-lang/rust#128288, now that the RFC got accepted!
Floating-point semantics are hard, in particular the NaN part, but this should describe them accurately -- except on x86-32, for more than one reason.
We still need to officially decide that those are our intended semantics though. rust-lang/rust#73328 tracks that on the rustc side.
Historic info
There are several ways in which our de-facto FP semantics currently are broken:
NaNs: rust#73328, which is a meta-issue. Specific issues: wrong signs on division producing NaN, rust#55131 (on all platforms).

I'm adding this here because figuring out the semantics of Rust is part of the goal of the UCG (I think?) and we should have an overarching "FP semantics" tracker somewhere.