How better handle negative NaNs? #1

Open
MaxGraey opened this issue Nov 17, 2018 · 30 comments

@MaxGraey

MaxGraey commented Nov 17, 2018

I think we need to better clarify how to handle negative NaNs. Most built-in implementations in LLVM, GCC, Go and Rust are not sign-agnostic for NaNs, for example:

signbit(+NaN) == false // +NaN => 0x7ff80000_00000000
signbit(-NaN) == true  // -NaN => 0xfff80000_00000000

But the spec does not strictly mention this, and it seems we would always need to treat both signed and unsigned NaNs as false?

Related to this discussion

@chicoxyzzy
Member

chicoxyzzy commented Nov 17, 2018

-NaN actually evaluates to NaN in JS

> function ident(n) {return n}
< undefined
> ident(-0)
< -0
> ident(-NaN)
< NaN

so I suppose it should be handled as NaN
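
A quick way to check this at the language level (a minimal sketch, not part of the proposal): outside of typed arrays, none of the usual comparison operations can tell `NaN` and `-NaN` apart.

```javascript
// Sketch: language-level comparisons cannot distinguish NaN from -NaN.
const a = NaN;
const b = -NaN;

console.log(Object.is(a, b));       // true (SameValue treats all NaNs as one value)
console.log([a].includes(b));       // true (SameValueZero)
console.log(new Set([a, b]).size);  // 1
console.log(a === b);               // false, but only because no NaN is === to any NaN
console.log(String(b));             // "NaN" (no sign is printed)
```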

@MaxGraey
Author

const F64 = new Float64Array(1);
const U64 = new Uint32Array(F64.buffer);

F64[0] = NaN;
console.log('0x' + U64[1].toString(16));

F64[0] = -NaN;
console.log('0x' + U64[1].toString(16));

> 0x7ff80000
> 0xfff80000

@chicoxyzzy
Member

There is no negative NaN in spec though

distinct “Not-a-Number” values of the IEEE Standard are represented in ECMAScript as a single special NaN value

https://tc39.github.io/ecma262/#sec-ecmascript-language-types-number-type

@hax
Member

hax commented Nov 5, 2019

I found that all engines actually have different raw bits for NaN and -NaN.

For example, Chakra implemented it in chakra-core/ChakraCore#5905.

And it seems Chrome recently also implemented it (version 79+), though I have no time to find the original PR.

@MaxGraey MaxGraey reopened this Nov 5, 2019
@hax
Member

hax commented Nov 5, 2019

// use TypedArray to expose the sign bit
// note this also uses the `ToNumber` coercion semantics
Math.signbit = (() => {
	const LE = new Uint8Array(new Uint16Array([1]).buffer)[0]
	return function signbit(n) {
		const f64 = new Float64Array([n])
		const i32 = new Uint32Array(f64.buffer)
		return (i32[LE] >>> 31) === 1
	}
})()

console.log(Math.signbit(0))
console.log(Math.signbit(-0))
console.log(Math.signbit(Infinity))
console.log(Math.signbit(-Infinity))
console.log(Math.signbit(NaN))
console.log(Math.signbit(-NaN))
console.log(Math.signbit(-(-NaN)))
const negNaN = Number.POSITIVE_INFINITY / Number.NEGATIVE_INFINITY
console.log(Math.signbit(negNaN))
  • Chrome 80, Firefox 70 and Safari 13 all return interlaced false and true 😊
  • Chrome 78, Node 12 ~ 13 return false false false false for NaN cases
  • Node 10 ~ 11 return false true false true for NaN cases 😊
  • Node 0.12 ~ 9 return false false false true for NaN cases
  • Node 0.8 ~ 0.10 return false true false true for NaN cases 😊
  • Old Chakra return true true true true for NaN cases 🤣

Note that all tests were run on my MacBook Air (macOS High Sierra 10.13.6, Intel Core i5)

@MaxGraey
Author

MaxGraey commented Nov 5, 2019

Interesting. Btw, you could use a simpler approach, because JS should use LE on x86:

const F64 = new Float64Array(1);
const U64 = new Uint32Array(F64.buffer);

const signbit = x => (F64[0] = x, Boolean(U64[1] >>> 31));
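
If the little-endian assumption is a concern, a `DataView` with an explicit byte order avoids it. The following is only a sketch; `signbitRaw` is a hypothetical name, not a proposed API, and its result for NaN inputs still depends on the engine.

```javascript
// Sketch: read the sign bit through a DataView with an explicit byte order,
// so the result does not depend on the platform's endianness.
const view = new DataView(new ArrayBuffer(8));

function signbitRaw(x) {
  view.setFloat64(0, x, false);            // false = big-endian: byte 0 holds the sign bit
  return (view.getUint8(0) & 0x80) !== 0;  // the top bit of byte 0 is the sign
}

console.log(signbitRaw(0));          // false
console.log(signbitRaw(-0));         // true
console.log(signbitRaw(-Infinity));  // true
console.log(signbitRaw(-NaN));       // engine-dependent
```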

@ghost

ghost commented Mar 19, 2021

I came along and was wondering why special casing was made for NaNs too.

It wouldn't act like C's signbit at all then, but according to @chicoxyzzy, JS doesn't have a negative NaN.

If it isn't possible to create/use a NaN with an arbitrary bitset, then wouldn't one be able to use the bit manipulation implementations that most other languages use for signbit, without special casing NaNs, relying on the JS VM to canonicalize the NaN upon writing/reading/serializing it?

@hax
Member

hax commented Apr 19, 2021

JS doesn't have a negative NaN.

As my previous tests show, I think engines actually do have negative NaNs. Currently this could be treated as an abstraction leak of implementation details to some degree, but if signbit is introduced, I suppose it should reflect them as is.

@ljharb
Member

ljharb commented Apr 19, 2021

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays, and one we should not extend anywhere else. Math.signbit should, like every non-Typed-Array part of the language, canonicalize NaNs and not distinguish between any bit patterns of any implementation's NaN values.
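
For illustration, a canonicalizing `signbit` needs no typed arrays at all. This is a minimal sketch under that reading of the comment above; `signbitCanonical` is a hypothetical name, not the proposal's specified algorithm.

```javascript
// Sketch of a canonicalizing signbit: every NaN is reported as "not negative",
// so no NaN bit pattern is ever observable through it.
function signbitCanonical(value) {
  const x = +value;                    // unary plus applies ToNumber coercion
  if (Number.isNaN(x)) return false;   // all NaNs are treated alike
  return x < 0 || Object.is(x, -0);    // negative numbers and negative zero
}

console.log(signbitCanonical(-0));    // true
console.log(signbitCanonical(NaN));   // false
console.log(signbitCanonical(-NaN));  // false
```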

@ghost

ghost commented Apr 19, 2021

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays

If I may ask, why? NaN is just as much of a number as 53.5 is, as 8 is, as 0 is, as -0 is, as infinity is, etc, at least according to IEEE 754 semantics and rules. All of them have a hard bit-pattern, and because TypedArrays expose any of them, I'd argue that they should all be exposed.

Maybe... just maybe, the language spec should be changed to reflect modern implementations, and have different NaNs?

@ljharb
Member

ljharb commented Apr 19, 2021

@crimsoncodes0 because in JS, explicitly and intentionally, there is supposed to only be one observable NaN value.

Typed Arrays expose them because the implementations that led to them didn't canonicalize. That doesn't mean it's a good decision.

Nothing should ever be added to the language that widens this unfortunate exposure.

@MaxGraey
Author

MaxGraey commented Apr 19, 2021

As my previous tests show, I think engines actually do have negative NaNs. Currently this could be treated as an abstraction leak of implementation details to some degree, but if signbit is introduced, I suppose it should reflect them as is.

Yes, according to IEEE 754, a negative NaN is canonical and fully valid (its sign should be preserved and propagated)

@MaxGraey
Author

MaxGraey commented Apr 19, 2021

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays, and one we should not extend anywhere else.

@ljharb In my opinion the big mistake is trying to fix IEEE 754 at the software (language or VM) level. Even WebAssembly, which tries to be the most deterministic ISA/VM, doesn't try to do this

@ljharb
Member

ljharb commented Apr 19, 2021

All of JavaScript does this already, outside of typed arrays. It’s part of the language design.

@ghost

ghost commented Apr 19, 2021

Would it be a web compatibility-breaking change to add to the TypedArray's spec that implementations must canonicalize NaN values from the Float{32,64}Array numerical accessors and DataView.getFloat{32,64}?

Presently, it sounds like the language is quite frankly... broken. Yes, it's a small thing, but it still breaks a fundamental part of the ES language spec, and explicitly putting a step into the algorithms for reading memory into JS floats would fill this hole and clear up this issue, as JS implementations would no longer expose NaN bit patterns.
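
In user code, the stored bits can only be controlled on the write side. As a rough sketch of what a canonicalizing store could look like: `setFloat64Canonical` and `CANONICAL_HI` are hypothetical names, and the pattern 0x7ff80000_00000000 is an assumed choice, since the spec does not define a canonical NaN encoding.

```javascript
// Sketch: a user-land store helper that always writes one fixed NaN pattern.
const CANONICAL_HI = 0x7ff80000;  // assumed canonical quiet NaN: 0x7ff80000_00000000

function setFloat64Canonical(view, byteOffset, value) {
  const x = +value;                                   // ToNumber coercion
  if (Number.isNaN(x)) {
    // Integer writes are exact, so the engine cannot pick a different NaN encoding.
    view.setUint32(byteOffset, CANONICAL_HI, false);  // high word, big-endian
    view.setUint32(byteOffset + 4, 0, false);         // low word
  } else {
    view.setFloat64(byteOffset, x, false);
  }
}

const view = new DataView(new ArrayBuffer(8));
setFloat64Canonical(view, 0, -NaN);
console.log(view.getUint32(0, false).toString(16));  // "7ff80000"
setFloat64Canonical(view, 0, -2);
console.log(view.getUint32(0, false).toString(16));  // "c0000000"
```

The extra branch on every store is the kind of cost that the performance argument in this thread is about.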

@ljharb
Member

ljharb commented Apr 19, 2021

It wouldn't likely break the web, but the committee explicitly decided in 2015 to not mandate NaN canonicalization in Typed Arrays, for performance reasons, and I'm quite confident there's no appetite to revisit that decision.

@MaxGraey
Author

It wouldn't likely break the web, but the committee explicitly decided in 2015 to not mandate NaN canonicalization in Typed Arrays, for performance reasons

And this totally makes sense. How about relaxing NaN canonicalization in other parts of the language? I don't think it would break the web

@ljharb
Member

ljharb commented Apr 19, 2021

@MaxGraey other language parts aren't used in hot paths or perf-sensitive code like Typed Arrays are (that's their reason for existing). I would be strongly opposed to any attempt to further worsen the situation around NaN canonicalization in the language.

@MaxGraey
Author

MaxGraey commented Apr 19, 2021

attempt to further worsen the situation around NaN canonicalization in the language.

Why? In user space the bit signature of a NaN doesn't matter at all. It may still be canonicalized for FFI or something like this if necessary. Relaxing this requirement would simplify and speed up JS engines

@ghost

ghost commented Apr 19, 2021

Off-topic, but does ECMAScript's canonical NaN value have a canonical bitset?

in JS, explicitly and intentionally, there is supposed to only be one observable NaN value.

And is there any documented reasoning behind that decision? If so, could it be linked, so that we may at least understand this situation (a bit) better?

@ljharb
Member

ljharb commented Apr 19, 2021

@crimsoncodes0 no, since the only bits of it are exposed via Typed Arrays.

The spec itself: https://tc39.es/ecma262/#sec-ecmascript-language-types-number-type.

In some implementations, external code might be able to detect a difference between various Not-a-Number values, but such behaviour is implementation-defined; to ECMAScript code, all NaN values are indistinguishable from each other.

@ghost

ghost commented Apr 19, 2021

I can't open the spec's multi-megabyte webpage without causing my entire device to lag, or crashing my (mobile) browser, is there a way to open only a small section of the spec?


Besides that, I have one last question to help me assess this problem: does the ES spec say that the floating point number (5.0) has a bitset? Does it acknowledge that it has one or otherwise say that it does?

If it acknowledges that any numbers have bit-patterns, it should acknowledge that all numbers do, including not-a-number, otherwise the specification makes no sense whatsoever, and ought to be changed.

If it does not acknowledge that any numbers have bit-patterns, then TypedArrays and DataViews are just plain broken features in JavaScript, since they clearly expose these "non-existent" bit-patterns to user scripts.

@ljharb
Member

ljharb commented Apr 19, 2021

Here's the same section on the multipage build: https://tc39.es/ecma262/multipage/ecmascript-data-types-and-values.html#sec-ecmascript-language-types-number-type

There's a note in there about the bit pattern; not sure if that answers your question.

That the language here is incongruous between "typed arrays" and "everything else" is true, but doesn't mean anything can change it. It also doesn't mean the incongruity should be worsened.

@hax
Member

hax commented Apr 20, 2021

I would be strongly opposed to any attempt to further worsen the situation around NaN canonicalization in the language.

But I think the semantics of signbit() should expose the sign bit as is. This is what signbit does in any other language.

It also keeps the simple invariant of signbit(x) === !signbit(-x).

@ljharb
Member

ljharb commented Apr 20, 2021

I don't think that invariant is possible; -(-NaN) is not guaranteed to have the same bit pattern as the original NaN. Engines are already allowed to canonicalize NaN in Typed Arrays - many just don't choose to.

There are no guarantees once you have a NaN. Even storing it in a variable can change the bit pattern.
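
To make the trade-off concrete, here is a small sketch: the invariant is easy to check for ordinary values, and a canonicalizing signbit necessarily breaks it for NaN, because both sides report `false`. `signbitCanonical` and `holds` are hypothetical helpers written for this illustration.

```javascript
// Sketch: check signbit(x) === !signbit(-x) against a canonicalizing signbit.
function signbitCanonical(x) {
  if (Number.isNaN(x)) return false;   // all NaNs are treated alike
  return x < 0 || Object.is(x, -0);    // negative numbers and negative zero
}

const holds = x => signbitCanonical(x) === !signbitCanonical(-x);

console.log([0, -0, 1, -1.5, Infinity, -Infinity].every(holds));  // true
console.log(holds(NaN));  // false: NaN and -NaN both report false
```

A bit-exposing signbit would keep the invariant only on engines where negation reliably flips the stored sign bit of a NaN, which is the point being disputed here.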

@ghost

ghost commented Apr 20, 2021

The first step of the unary negation algorithm canonicalizes the NaN, therefore this is merely a double canonicalization, thus the NaN should be the exact same NaN and consequently have the same bit-pattern, so I don't follow?

If the above is correct, then current engines aren't implementing it at all.

@ljharb
Member

ljharb commented Apr 20, 2021

Feel free to experiment with it in various engines - when writing https://npmjs.com/get-nans, I found a lot of unpredictable and unintuitive behavior.

@hax
Member

hax commented Apr 20, 2021

-(-NaN) is not guaranteed to have the same bit pattern

As shown in my previous test (#1 (comment)), most engines keep the bit pattern.

@dy

dy commented Dec 27, 2024

@hax it's not consistent btw.

// use TypedArray to expose the sign bit
// note this also uses the `ToNumber` coercion semantics
Math.signbit = (() => {
	const LE = new Uint8Array(new Uint16Array([1]).buffer)[0]
	return function signbit(n) {
		const f64 = new Float64Array([n])
		const i32 = new Uint32Array(f64.buffer)
		return (i32[LE] >>> 31) === 1
	}
})()

If you call signbit long enough, it will give different results. Try this in console:

let view = new DataView(new Float32Array(1).buffer)
for (let i = 0; i < 1e5; i++) {
    view.setFloat32(0, -NaN)
    // bytes are read in reverse order; setFloat32 writes big-endian by default
    const bytes = [view.getUint8(3), view.getUint8(2), view.getUint8(1), view.getUint8(0)]
    if (bytes + '' !== [0, 0, 192, 127] + '') console.log('failed', i)
}

After ~3k calls it gives a different result.
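
A variant of the same experiment, as a sketch only, records every distinct bit pattern instead of comparing against one expected value. Which patterns show up, and how many, depends on the engine and on when its optimizing tiers kick in, so no particular output is assumed here.

```javascript
// Sketch: collect every distinct bit pattern the engine stores for -NaN.
const view = new DataView(new ArrayBuffer(4));
const seen = new Set();

for (let i = 0; i < 1e5; i++) {
  view.setFloat32(0, -NaN);  // big-endian by default
  seen.add(view.getUint32(0).toString(16).padStart(8, '0'));
}

// More than one entry means the stored bits changed somewhere during the loop.
console.log([...seen]);
```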

@ljharb
Member

ljharb commented Dec 27, 2024

@dy as I explained on nodejs/node#56373 (comment)

This is expected, and is the nature of the language. Although the majority of the JS language only has one observable NaN, via Typed Arrays one can view bit patterns of many of the millions of NaN values in IEEE 754.
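
As a rough illustration of how many encodings are involved: in binary64, any pattern with an all-ones exponent and a non-zero mantissa is a NaN, which gives 2^53 - 2 encodings. The sketch below builds a few of them from raw bits; `fromBits` is a hypothetical helper.

```javascript
// Sketch: build doubles from raw bits and observe which patterns read back as NaN.
const view = new DataView(new ArrayBuffer(8));

function fromBits(hi, lo) {
  view.setUint32(0, hi, false);  // high 32 bits: sign, exponent, top of mantissa
  view.setUint32(4, lo, false);  // low 32 bits of the mantissa
  return view.getFloat64(0, false);
}

console.log(fromBits(0x7ff00000, 0x00000001));  // NaN (smallest payload)
console.log(fromBits(0xffffffff, 0xffffffff));  // NaN (sign bit set, largest payload)
console.log(fromBits(0x7ff00000, 0x00000000));  // Infinity (a zero mantissa is not a NaN)
console.log(2n ** 53n - 2n);                    // 9007199254740990n
```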

It would be perfectly reasonable in this non-TA API to canonicalize all NaN values, and treat them all the same.
