What is the motivation for NaN canonicalization? #1463
Comments
The use case that motivated this was NaN-boxing at the wasm level. When doing NaN-boxing, it's common to interpret the canonical NaN bitpattern as an actual floating-point NaN, and non-canonical NaN bitpatterns as other types of data. When doing NaN boxing, at the point where one is doing a floating-point arithmetic operation, one has already ensured that the operands are either non-NaN or canonical NaN (because otherwise they're not floats and floating-point arithmetic doesn't make sense). Wasm's NaN rule means that the result in that case will also be either non-NaN or canonical NaN, so the wasm code doesn't need to add extra instructions to manually canonicalize it. A real-world user of NaN boxing in wasm is the SpiderMonkey JS engine compiled to wasm. This form of NaN propagation is automatically provided by instructions that implement IEEE 754's NaN propagation recommendation, as well as by hardware that only produces canonical NaNs, which covers most instructions on popular architectures. x86's min and max instructions indeed do not have this behavior, and sometimes require extra instructions to implement wasm's `fmin` and `fmax`.
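A minimal Rust sketch of the NaN-boxing scheme described above; the constants, tag, and helper names are illustrative assumptions, not taken from SpiderMonkey or the wasm spec:

```rust
// Illustrative 64-bit NaN-boxing: real floats are stored as their own bit
// pattern, the canonical quiet NaN is the only bit pattern that means "an
// actual float NaN", and every other NaN encoding carries boxed non-float
// data (here, a small integer). All names and constants are hypothetical.

const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000; // quiet NaN, zero payload
const BOXED_INT_TAG: u64 = 0x7FFC_0000_0000_0000; // a non-canonical NaN space

#[derive(Debug, PartialEq)]
enum Value {
    Float(f64),
    Int(u32),
}

fn box_value(v: Value) -> u64 {
    match v {
        // Float NaNs must be canonicalized so they can't collide with boxed
        // data. On Wasm, arithmetic results are already either non-NaN or
        // canonical NaN, so this branch needs no extra work after an op.
        Value::Float(f) if f.is_nan() => CANONICAL_NAN,
        Value::Float(f) => f.to_bits(),
        Value::Int(i) => BOXED_INT_TAG | i as u64,
    }
}

fn unbox_value(bits: u64) -> Value {
    let f = f64::from_bits(bits);
    if !f.is_nan() || bits == CANONICAL_NAN {
        Value::Float(f) // ordinary float, including the one "real" NaN
    } else {
        Value::Int((bits & 0xFFFF_FFFF) as u32) // any other NaN is boxed data
    }
}

fn main() {
    assert_eq!(unbox_value(box_value(Value::Int(42))), Value::Int(42));
    assert_eq!(unbox_value(box_value(Value::Float(1.5))), Value::Float(1.5));
    // A NaN produced by arithmetic round-trips as a float, not as boxed data.
    let nan_bits = box_value(Value::Float(0.0 / 0.0));
    assert!(matches!(unbox_value(nan_bits), Value::Float(f) if f.is_nan()));
}
```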
The question is not really about the benefits of IEEE 754, but rather about the choices we make when we interpret the standard. This is not the only choice that has been made; for example, we don't support signaling NaNs.
The spec defines how operations treat NaN values.
Some previous discussions for background: a couple of issues from this org highlight that JavaScript does not have a notion of canonical NaN. We are following an interpretation of 754-2019 with a specific canonical NaN encoding.
Wasm is designed to be a compilation target for other languages. NaN boxing is used in some popular language implementations. Almost all hardware, including almost all x86 instructions, already has the specific behavior needed for NaN boxing. Wasm's design here exposes that behavior, for the benefit of these language implementations.
I am not sure I understand your response; I am having trouble connecting boxing to the semantics of FP operations. Non-FP data within a NaN payload in NaN-boxing is never intended to be passed into FP operations, so the technique would work regardless of what the FP ops do with NaNs. It is relatively easy to see if you follow how (for example) adding two boxed integers would work: it would not be done by the corresponding floating-point operation, as the boxed values are not numbers as far as the operation is concerned. 754-2019 introduces min/max behavior with respect to the sign of zero, and the Wasm and JS specs both follow that. Canonical NaNs are a bit more ambiguous. To illustrate that, there are three options for canonical NaN behavior, from more strict to less strict.
I can see situations where behavior w.r.t. the sign of zero is observable and desired; for example, it can be an indication of underflow and it would be very desirable to preserve the sign. From that point of view it makes sense to pay the price of having to use more instructions. In contrast, I am still trying to understand what we are getting in return for this rule (and conversely what JS is missing).
@penzn, under NaN-boxing, proper float NaNs must be canonical, otherwise they would be confused with pointers, e.g., by a GC. Hence, you want all float ops to produce canonical NaNs (provided they aren't fed non-canonical ones, which indeed should never happen).
Yes, it seems that if NaN values are used as boxes, they need to stay canonical. My question then is: what do interpreter implementers do when they build for native targets? This must have been solved there, as those are not going to preserve canonical NaNs.
Interpreter code that I've seen does seem to rely on native targets preserving NaN payloads for add/sub/mul/div/sqrt, and most native targets do propagate NaNs for those in practice. For min/max, interpreters seem to have explicit NaN tests so they determine the NaN result themselves.
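As a concrete illustration of that kind of explicit test, here is a rough Rust sketch (not taken from any real interpreter) of the extra work a Wasm-style `f64.min` needs on top of x86's `minsd`, which returns its second operand when either input is NaN and treats -0.0 and +0.0 as equal:

```rust
// Hypothetical helper showing Wasm's f64.min semantics with explicit checks:
//   * any NaN input  => return a canonical NaN (Wasm allows either sign)
//   * min(-0.0, 0.0) => -0.0 (x86 minsd would just return the 2nd operand)
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

fn wasm_f64_min(a: f64, b: f64) -> f64 {
    if a.is_nan() || b.is_nan() {
        return f64::from_bits(CANONICAL_NAN);
    }
    if a == b {
        // Only ±0.0 compare equal with different bit patterns; OR-ing the
        // bits picks -0.0 over +0.0, matching Wasm's rule.
        return f64::from_bits(a.to_bits() | b.to_bits());
    }
    if a < b { a } else { b }
}

fn main() {
    assert!(wasm_f64_min(f64::NAN, 1.0).is_nan());
    assert!(wasm_f64_min(-0.0, 0.0).is_sign_negative());
    assert_eq!(wasm_f64_min(2.0, -3.0), -3.0);
}
```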
TL;DR, but this is perhaps related: the Release 2.0 draft of the WebAssembly Specification says in one place: "There is no undefined behavior, i.e., the execution rules cover all possible cases that can occur in a valid program, and the rules are mutually consistent." Then it says elsewhere: "Some operators are non-deterministic, because they can return one of several possible results (such as different NaN values)." I wonder what other "some operators" there are which might "return one of several possible results". Even seemingly trivial undefined behavior is potentially dangerous. On the "so what" end of the spectrum, I might be able to approximate the proportion of processors you have from whichever vendor. But at the showstopper end, distributed computing could be brought to a halt because consensus fails to crystallize (or takes really long due to sporadic disagreement among outputs). Theoretically, this could even serve as a de facto DDoS attack vector by creating superfluous cross-verification traffic. Not that fixing this would be trivial, however. Unless of course it's merely a doc bug.
The only arithmetic operations that can return "one of several possible results" are floating-point operations which can return NaN, and the only permitted variations are in the NaN bits. The vast majority of code in the world never looks at NaN bits, so this rarely matters in practice. But indeed, rarely is not never, so I am also now in the process of proposing a fully-deterministic mode for NaNs, for environments that want to have it. There is some discussion here: WebAssembly/relaxed-simd#108
That's why having canonical NaNs could be left to the code doing the boxing. Also, are there any SIMD users of NaN-boxing? I understand that the canonicalization was added long before SIMD.
That would add overhead though. The motivation for the feature was to avoid this overhead.
Historically, Wasm's design has not prioritized the convenience of interpreter implementors.
I'm not aware of any today. But in a program that mixes SIMD and scalar on the same data, it could end up being important for SIMD operations to preserve the canonical NaN property to prevent the scalar operations from interpreting non-canonical NaNs as boxed values.
To clarify, this is not a contradiction, as undefined behaviour ≠ non-deterministic behaviour. In particular, the former quote is carefully phrased to say that the "rules cover all possible cases", which doesn't exclude the possibility of multiple allowed cases in some places, as per the latter quote. Which ones those are is still fully defined. Undefined behaviour is much worse: it implies that anything could happen, including launching missiles. In fact, the usual interpretation of undefined behaviour as e.g. defined in the C world is even worse than that, since it implies that the entire execution is undefined, i.e., anything could have happened even before execution reached an undefined point.
The standard example is anything related to threading, i.e., accesses to shared memory can yield multiple possible results, dependent on timing or possible reordering in the CPU or engine. Unlike C, Wasm still fully defines all allowed behaviours.
@sunfishcode Thanks, I read your other NaN thread. I think the relaxed SIMD thing would be a good idea, in case the programmer sees no risk in hardware-dependent outputs and just wants maximum speed. "But indeed, rarely is not never." Yeah, that's why I wanted to point out the risks, and I'm happy that you're already aware. I understand that the question before you is: what to do about all the WASM modules already out there which didn't contemplate this? I really wonder if the canonicalization penalty is that high. I think the most common affected instruction would be sqrt, which is slow enough that adding a NaN inspection after the fact would barely move the needle on overall performance. (And most CPUs have status bits that you can inspect in order to do this faster. And branch prediction would hide most of the cost even so.) @rossberg I know what you mean and that's a good explanation, but if a WASM module can behave like a random oracle, however biased, there's still a potential attack surface.
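For what it's worth, the "NaN inspection after the fact" being described could look roughly like this in Rust (a sketch, not a measured claim about its cost):

```rust
// Hypothetical post-op canonicalization: run the native sqrt, then branch
// on is_nan() and substitute the canonical bit pattern if needed.
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

fn canonicalizing_sqrt(x: f64) -> f64 {
    let r = x.sqrt();
    if r.is_nan() { f64::from_bits(CANONICAL_NAN) } else { r }
}

fn main() {
    assert_eq!(canonicalizing_sqrt(9.0), 3.0);
    // Whatever NaN the hardware produced for sqrt(-1.0) is replaced.
    assert_eq!(canonicalizing_sqrt(-1.0).to_bits(), CANONICAL_NAN);
}
```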
I think it adds overhead either way: currently it adds it to (x86) implementations of `fmin`/`fmax`. It is possible to say that it adds the overhead to x86 execution only, while the alternative would've forced NaN-boxing users to add it on all platforms. This doesn't make much sense to me either, given that NaN-boxing is relatively rare, while the hardware this penalizes is rather common. Not to nitpick, but I find your last two statements slightly at odds with each other 😉 If we are not prioritizing NaN-boxing interpreters, then we should not be worried about purely hypothetical cases that would affect them, and vice versa.
Sorry if I missed it, but what was the motivation for keeping the non-determinism around the sign bit of NaNs? This is unfortunate, since it means code that sorts floats via the IEEE 754-2019 `totalOrder` predicate can observe non-deterministic orderings. Concretely, this is kinda problematic for the Rust stdlib, since we've stabilized `total_cmp`, which implements that predicate. It's difficult for us to justify investing time into fixing the non-determinism on other platforms (currently it tends to vary between debug and release builds on several non-wasm targets for various reasons) due to the fact that WebAssembly considers it non-deterministic. To be clear, there's a lot of leeway for how this is specified from our point of view. We'd just like that it be deterministic based on the inputs -- the status quo is pretty bad for us, and ends up causing some bugs in programs that don't particularly care about NaN beyond not expecting it to non-deterministically sort in a different location (well, to be more precise, to non-deterministically have values which sort into different locations). See rust-lang/unsafe-code-guidelines#237 for some more discussion here, and I'm sorry if this was the wrong place to bring this up (I can file another issue if that would be better).
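To make the sorting concern concrete, here is a small Rust sketch using the stabilized `total_cmp` (which implements the IEEE 754-2019 totalOrder predicate); the NaN bit patterns are constructed by hand purely for illustration:

```rust
// totalOrder places NaNs with sign bit 1 before every number and NaNs with
// sign bit 0 after every number, so a nondeterministic sign bit moves a NaN
// from one end of the sorted slice to the other.
fn main() {
    let neg_nan = f64::from_bits(0xFFF8_0000_0000_0000); // sign bit 1
    let pos_nan = f64::from_bits(0x7FF8_0000_0000_0000); // sign bit 0

    let mut v = [0.5, pos_nan, -0.0, neg_nan, f64::INFINITY, -0.5, 0.0];
    v.sort_by(|a, b| a.total_cmp(b));

    assert!(v[0].is_nan() && v[0].is_sign_negative()); // negative NaN sorts first
    assert!(v[6].is_nan() && v[6].is_sign_positive()); // positive NaN sorts last
}
```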
I kind of barely understand "A canonical NaN is a floating-point value ±nan(canonₙ) where canonₙ is a payload whose most significant bit is 1 while all others are 0" tbh. Is that supposed to imply both signs for a canonical NaN should be considered canonical NaNs? I can derive that looking at other parts of the formal syntax as well, but it seems to be in conflict with various remarks on the operational semantics.
Ah, I misunderstood your question then. Wasm hasn't historically prioritized the convenience of interpreter-based wasm engines, but that's different than interpreters compiled to Wasm. The idea at the time was that it's only min and max and not the entire instruction set, and in some cases optimizers can avoid the overhead, such as when one of the operands is constant, which is a fairly common case for min and max. And x86 designers would seem to have several reasons for adding IEEE 754-2019 minimum and maximum instructions, and while it's well known that it takes time to get to shipping new hardware and more time before widespread use, we hope Wasm will be around for a much longer time than even that. Reasonable people can disagree with this reasoning, but that was the reasoning for the choice made at the time.

The main motivation for the non-determinism of the sign bit of NaNs is that x86 and ARM differ. When x86 generates a NaN from non-NaN operands (e.g. 0.0/0.0), it sets the sign bit to 1; when ARM does it, it sets the sign bit to 0. This behavior isn't based on the inputs at all, and neither x86 nor ARM hardware provides any way to make it be based on the inputs. We could add extra instructions to canonicalize NaNs, but at the time the rule was made, it was guessed that that would add too much overhead to be worth doing.

One might ask, why not have a rule saying "it's either 0 or 1, but it doesn't change within a run of a program"? At the time the rule was made, the spec didn't have that kind of "nondeterministic but consistent within a run" concept. Today, with the relaxed-simd proposal, there are discussions about adding this kind of concept; however, even if we made such a rule today, existing wasm engines conforming to older versions of the spec wouldn't be guaranteed to follow any new NaN restrictions. And there are people doing live migration of wasm programs between x86 and ARM today. I have a proposal to add a mode where NaNs are deterministic, however the current design is such that toolchains like Rust wouldn't be permitted to depend on this behavior, so I don't think that helps with the issue here. So I don't see any easy answers here.

Yes, the terminology is confusing; the wasm spec uses the term "canonical NaN" to mean a NaN with either sign, however outside the spec, in other contexts, "canonical NaN" often means a NaN with a zero sign bit.
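A small check reflecting that reading of the spec's definition; the helper name and the idea of testing raw bits are just for illustration:

```rust
// Both sign variants of the "payload MSB set, rest zero" pattern count as
// canonical NaNs in the Wasm spec's sense.
fn is_wasm_canonical_nan(bits: u64) -> bool {
    bits == 0x7FF8_0000_0000_0000 || bits == 0xFFF8_0000_0000_0000
}

fn main() {
    assert!(is_wasm_canonical_nan(0x7FF8_0000_0000_0000)); // sign bit 0 (what ARM tends to generate)
    assert!(is_wasm_canonical_nan(0xFFF8_0000_0000_0000)); // sign bit 1 (what x86 tends to generate)
    assert!(!is_wasm_canonical_nan(0x7FF8_0000_0000_0001)); // extra payload bit: non-canonical
    assert!(!is_wasm_canonical_nan(0x7FF0_0000_0000_0000)); // +infinity, not a NaN at all
}
```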
@thomcc one possible strengthening we could investigate for our NaN semantics would be to make their results deterministic based on inputs across the lifetime of a single cross-module "run" - i.e. the result of a given operation on given inputs would be the same bit pattern every time during that run. For Wasm folks, I'm essentially suggesting that we could strengthen the current NaN sign non-determinism into something that is nondeterministic but consistent within a run.
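A toy Rust model of what "nondeterministic but consistent within a run" could mean (purely hypothetical; the platform check is a stand-in for whatever an engine would actually key the choice on):

```rust
use std::sync::OnceLock;

// The sign of generated NaNs is fixed the first time one is produced and
// then reused for the rest of the program's execution.
static NAN_SIGN_NEGATIVE: OnceLock<bool> = OnceLock::new();

fn generated_nan() -> f64 {
    let negative = *NAN_SIGN_NEGATIVE.get_or_init(|| {
        // Stand-in for a platform-dependent choice (x86-style vs ARM-style).
        cfg!(target_arch = "x86_64")
    });
    if negative {
        f64::from_bits(0xFFF8_0000_0000_0000)
    } else {
        f64::from_bits(0x7FF8_0000_0000_0000)
    }
}

fn main() {
    // Whichever sign is chosen, it is the same for every NaN in this run.
    assert_eq!(generated_nan().to_bits(), generated_nan().to_bits());
}
```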
@workingjubilee you are exactly right, both signs are canonical. @thomcc - as @sunfishcode pointed out, the choices are due to differences in hardware behavior, and it would be expensive to produce only one NaN bit pattern everywhere. Speaking of which, how does Rust handle this? I am no expert on it, but it looks like the docs for the NaN constants do not promise any particular bit pattern.
This makes me think that it is mostly in line with other languages, in the sense that you can't expect (depend on) an exact NaN bit pattern.
I don't think that's what nondeterminism means to us, really. Saying the results of NaN inputs on x86 or Arm are nondeterministic is not correct. It's also not correct for the float constant 0.0 / 0.0.
For the same machine state, the same instruction, and the same inputs, it will be the same output, thus you can always predict the output value. Thus, the output is dependent on the inputs, fully... just in a trivial way, because in the absence of NaNs to propagate, all values map to the same output. This is not true, as far as I understand it, for wasm, because nondeterminism means that it is acceptable if certain bits are filled in from a randomness function. Thus the same inputs to the same instruction and with the same machine state may have different outputs. This matters because Rust allows compile-time function evaluation. Thus, as far as Rust is concerned, the difficulty of implementation is also the difficulty of using it as a compilation target. This is not uniquely true just of this case: it's true of any language that attempts to have real float semantics instead of arbitrarily kowtowing to "whatever the hardware does". Most nonconformant float implementations at least allow shimming their behavior, but true nondeterminism makes even that harder. This, of course, is what you might think of as a nonsense implementation: "Surely no one would ever do that." However, wasm does not actually rule it out, as I understand it.
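As a small illustration of why compile-time evaluation makes this awkward (a hypothetical snippet; whether the two printed bit patterns agree depends on the target and toolchain):

```rust
// A toy comparison of the "same" NaN produced by compile-time evaluation
// (CTFE) and by a run-time computation.
const COMPILE_TIME_NAN: f64 = 0.0 / 0.0;

fn main() {
    // black_box keeps the division from being constant-folded away.
    let zero = std::hint::black_box(0.0_f64);
    let run_time_nan = zero / zero;
    println!(
        "CTFE: {:#018x}  runtime: {:#018x}",
        COMPILE_TIME_NAN.to_bits(),
        run_time_nan.to_bits()
    );
}
```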
@workingjubilee things are deterministic at the level of platform asm, but Wasm, Rust, and LLVM need to specify behaviours that encompass all the possible observable outcomes from different compilation targets (e.g. see the comment above about divergence between x86 and Arm). The "easy" way to solve this problem is to over-approximate things at the source level through some kind of limited non-determinism - with the added concern that Rust needs to care not only about over-approximating platform behaviours, but to some extent the solutions chosen by LLVM and Wasm, since it's targeting those directly. It looks like Rust has not yet defined a policy here, but interesting discussions are happening in the issue @thomcc linked above, with a possible stop-gap solution being something like Wasm's non-determinism.
@conrad-watt Yes, I am quite aware and have been part of many of those discussions. Again, that is not "non-determinism as she is spoke". If you generate 5 floating-point numbers, say, 0.0, -0.0, 0.5, -0.5, and what the hell, Infinity, and then generate 5 NaNs on a given platform, then sort them according to IEEE754's totalOrder, you get one of these two results:

[NaN, NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf]
[-0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN, NaN]

Except on wasm, there are instead also these possible results, because of NaN sign nondeterminism:

[NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN]
[NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN]
[NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN]
[NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN]
Related discussion in other venues:
The rules documented for Rust currently are not the rules we want; they are the rules we have because LLVM and wasm don't let us implement the rules we want.
Someone mentioned on the LLVM thread that a possible implementation of the IEEE spec would be to embed some instruction pointer information in the NaN, I guess to let people figure out where it was generated. Not sure if any hardware actually does that though. It would already be in conflict with wasm's "canonical" NaN spec.
@workingjubilee apologies, I think we may be talking past each other somewhat - the non-determinism I'm referring to in my comment above is expressed at a semantic/spec level. We can certainly investigate whether it would be feasible to strengthen Wasm's specification so that only the "first two" outcomes are allowed - from a spec point of view this would still be a (more constrained) kind of non-determinism, although I agree potentially more helpful.
I take this as meaning the Rust rules currently don't require strict behavior. I assume there is some collective decision-making process involved, which means that changing the rules is not a certainty yet. As an aside, I'm not sure how LLVM and Wasm prevent you from changing the rules though - wouldn't it be possible to generate IR conformant with the desired behavior?
What functional difference can be made out from different NaN bit patterns?
Sorry, the comment I deleted did not belong here.
I am wondering whether there are any uses of Wasm that depend on NaN value canonicalization.

For some time I thought that our NaN semantics are identical to ECMAScript, but it looks like the latter does not distinguish between NaN bit patterns, while in Wasm a non-canonical NaN cannot be returned from an operation consuming a canonical NaN. This is a much stricter property, and it comes with a cost for operations that might otherwise change NaN bits (`fmin` and `fmax` on x86, for example). However this distinction is completely ignored by most programming languages, which on the Web means Wasm code has to perform a somewhat costly conversion the rest of the browser is completely oblivious to. Also, for the canonical/non-canonical distinction to be practically useful, non-canonical NaNs need to be produced and consumed as a form of metadata, however the spec does not mandate their values are preserved, which makes the distinction at least partially moot.

What are the use cases that practically depend on this behavior? It is of course possible to construct a test that would depend on an exact bit pattern, but what language/library does use that in practice (i.e. where the distinction is not 'implementation defined' or 'undefined')?