Editorial: Make numeric values and operations default to mathematical values #2007

caiolima · 2020-05-22T17:36:53Z

As reported on #1964, the fact that we are using Number as default representation of numeric literals and operations creates uncounted bugs throughout the spec. This PR is proposing to rollback the default value to mathematical value. Since the major intention of PR #1135 was to avoid implicit conversions between numeric versions, we are keeping this change but it necessary to properly cast those numeric types when they are manipulated, since mixing different numeric types is not allowed.

The goals we want to achieve here are:

Resolve bugs on current version of spec like Definition of modulo is broken #1960 or Typo in 6.2.4 #1229.
Mathematical values should never be leaked and are internal concept of ECMA262 specification. If this happens, it should be considered as editorial error.
Avoid the maximum of explicit conversions between Number, BigInt and Mathematical values.

Cc. @littledan @waldemarhorwat @ljharb @syg @michaelficarra @bakkot

PS. It's still a WIP PR, but I think we have enough changes to discuss some changes made before I keep moving forward.

spec.html

caiolima

Left some early comments on places I need to change already.

spec.html

caiolima · 2020-05-22T18:12:02Z

spec.html

        1. Let _n_ be 0.
        1. For each element _e_ of _elements_, do
-          1. Perform ! CreateDataPropertyOrThrow(_array_, ! ToString(_n_), _e_).
+          1. Perform ! CreateDataPropertyOrThrow(_array_, ! ToString(𝔽(_n_)), _e_).


Maybe it would be better to use:

1. Let _nNumber_ be the Number value for _n)_. 1. Perform ! CreateDataPropertyOrThrow(_array_, ! ToString(_nNumber_), _e_).

This pattern is already used in other places of this PR, so it would be good to be consistent.

I prefer the more terse form. Is there a benefit to splitting it out?

For this case there is no much benefit, but there are other places where we keep applying 𝔽(_n_) very often. The advantage of splitting them out is that it is easier for someone not involved with ECMA262 understand, while 𝔽(_n_) requires understanding what 𝔽 means. I'm fine with leaving it as it is. What do you think of other places where we are using splitted version?

spec.html

caiolima · 2020-05-22T18:19:10Z

spec.html

            1. Else,
-              1. Let _e_, _n_, and _f_ be integers such that _f_ &ge; 0, 10<sup>_f_</sup> &le; _n_ &lt; 10<sup>_f_ + 1</sup>, the Number value for ℝ(_n_) &times; 10<sub>ℝ</sub><sup>ℝ(_e_) - ℝ(_f_)</sup> is _x_, and _f_ is as small as possible. Note that the decimal representation of _n_ has _f_ + 1<sub>ℝ</sub> digits, _n_ is not divisible by 10, and the least significant digit of _n_ is not necessarily uniquely determined by these criteria.
+              1. Let _e_, _n_ be integers, and _f_ be a Number whose mathematical value is an integer such that ℝ(_f_) &ge; 0, 10<sup>ℝ(_f_)</sup> &le; _n_ &lt; 10<sup>ℝ(_f_) + 1</sup>, the Number value for _n_ &times; 10<sup>_e_ - ℝ(_f_)</sup> is ℝ(_x_), and _f_ is as small as possible. Note that the decimal representation of _n_ has ℝ(_f_) + 1 digits, _n_ is not divisible by 10, and the least significant digit of _n_ is not necessarily uniquely determined by these criteria.


This is a good example of place where we could create a new variable to store MV of x and f. We have multiple calls of ℝ(x) and ℝ(f).

ljharb · 2020-05-22T18:23:42Z

Marking this as a draft; when it's no longer WIP, please mark it as ready for review :-)

spec.html

michaelficarra · 2020-05-22T20:24:23Z

@ljharb I don't know what comment of yours GitHub is saying I deleted, but if I did, it was a mistake.

spec.html

bakkot · 2020-05-26T18:39:46Z

I think we generally write Number values with *, as in *1*, and write mathematical values without it, as in 1. (And BigInts as *1n*.)

If we keep to that convention, I don't know that the subscripts are necessary.

ljharb · 2020-05-26T18:55:41Z

That's true, but explicit notation seems more readable to me than a style-based convention.

waldemarhorwat

Thank you for doing this!

My main observations:

I like the use of ℝ(x) as a coercion operator to mathematical values. I don't think we should use ℝ as a subscript to indicate integers or real numbers; instead, those should have no suffix.
Having generic operations other than the simplest such as abs() across Numbers and real numbers doesn't work because of various Number glitches (±0, ±∞, NaN, rounding, etc.). We should define those only on real numbers and call utility functions if they're needed on Numbers.
There are more places in the algorithms where keeping integral Number counters causes problems if they ever exceed 2⁵³. I commented on one example, but there are more and I haven't done an exhaustive pass to find them all. I consider uses of integral Numbers for spec-internal counters suspicious unless proven otherwise.

spec.html

waldemarhorwat · 2020-05-27T00:24:54Z

spec.html

@@ -4892,9 +4893,9 @@ <h1>ToInt32 ( _argument_ )</h1>
      <emu-alg>
        1. Let _number_ be ? ToNumber(_argument_).
        1. If _number_ is *NaN*, *+0*, *-0*, *+&infin;*, or *-&infin;*, return *+0*.
-        1. Let _int_ be the Number value that is the same sign as _number_ and whose magnitude is floor(abs(_number_)).
+        1. Let _int_ be the mathematical value that is the same sign as _number_ and whose magnitude is floor(abs(_number_)).


It's weird to ask for a mathematical value whose magnitude is a Number. Make where the conversion happens a bit more explicit.

waldemarhorwat · 2020-05-27T00:28:59Z

spec.html

@@ -8940,11 +8941,11 @@ <h1>[[DefineOwnProperty]] ( _P_, _Desc_ )</h1>

      <emu-clause id="sec-arraycreate" aoid="ArrayCreate">
        <h1>ArrayCreate ( _length_ [ , _proto_ ] )</h1>
-        <p>The abstract operation ArrayCreate takes argument _length_ (a non-negative integer) and optional argument _proto_. It is used to specify the creation of new Array exotic objects. It performs the following steps when called:</p>
+        <p>The abstract operation ArrayCreate takes argument _length_ (a non-negative integral Number) and optional argument _proto_. It is used to specify the creation of new Array exotic objects. It performs the following steps when called:</p>


What is a "non-negative integral Number"? With real numbers it's clear: 0 is non-negative. But is the Number -0 negative or not?

We need to define it, as we are doing for integral Number. Would it be a problem if we define that non-negative integral Number is integral Numbers greater than +0, considering that +0 is greater than -0?

@caiolima +0 is not greater than -0. I would imagine the term "non-negative" when referring to integral Numbers simply refers to whether the sign bit is set. So this would exclude -0. If we want to include -0, I would just add "or -0". But we could use mathematical values here instead, right?

waldemarhorwat · 2020-05-27T00:38:44Z

spec.html

@@ -28583,7 +28588,7 @@ <h1>TimeClip ( _time_ )</h1>
        <p>The abstract operation TimeClip takes argument _time_ (a Number). It calculates a number of milliseconds. It performs the following steps when called:</p>
        <emu-alg>
          1. If _time_ is not finite, return *NaN*.
-          1. If abs(_time_) &gt; 8.64 &times; 10<sup>15</sup>, return *NaN*.
+          1. If abs(_time_) &gt; 8.64<sub>𝔽</sub> &times; 10<sub>𝔽</sub><sup>15<sub>𝔽</sub></sup>, return *NaN*.


Exponentiation is not defined on Numbers. And it's weird to invoke it here using a Number exponent.

What does Math.pow/** do on Numbers?

waldemarhorwat · 2020-05-27T00:42:04Z

spec.html

          1. Find a value _t_ such that YearFromTime(_t_) is _ym_ and MonthFromTime(_t_) is _mn_ and DateFromTime(_t_) is 1; but if this is not possible (because some argument is out of range), return *NaN*.
-          1. Return Day(_t_) + _dt_ - 1.
+          1. Return Day(_t_) + _dt_ - 1<sub>𝔽</sub>.


What's the order of operations here? It matters because Number arithmetic is not associative.

waldemarhorwat · 2020-05-27T00:43:35Z

spec.html

@@ -30529,8 +30536,8 @@ <h1>String.prototype.substring ( _start_, _end_ )</h1>
          1. Let _len_ be the length of _S_.
          1. Let _intStart_ be ? ToInteger(_start_).
          1. If _end_ is *undefined*, let _intEnd_ be _len_; else let _intEnd_ be ? ToInteger(_end_).
-          1. Let _finalStart_ be min(max(_intStart_, 0), _len_).
-          1. Let _finalEnd_ be min(max(_intEnd_, 0), _len_).
+          1. Let _finalStart_ be min(max(ℝ(_intStart_), 0), _len_).


I like this notation for "mathematical value".

caiolima · 2020-05-27T12:45:21Z

Thank you very much for the reviews so far. I'm going to keep addressing them in the coming days.

I like the use of ℝ(x) as a coercion operator to mathematical values. I don't think we should use ℝ as a subscript to indicate integers or real numbers; instead, those should have no suffix.

That's the intention, but IIUC from your comment, you would like to don't define subscript ℝ? From the changes I've done so far, I agree that they won't be necessary.

Having generic operations other than the simplest such as abs() across Numbers and real numbers doesn't work because of various Number glitches (±0, ±∞, NaN, rounding, etc.). We should define those only on real numbers and call utility functions if they're needed on Numbers.

I think I see your point here. So far I've been using Number operations like min or max considering semantics of Math.min and Math.max, but it is not quite correct, since we don't have special cases for (±0, ±∞, NaN) on those generic operations AFICT. I'll only consider those operations valid on mathematical values and add a note that if we have Numbers or BigInts as input on those operations, it is an Editorial error.

There are more places in the algorithms where keeping integral Number counters causes problems if they ever exceed 253. I commented on one example, but there are more and I haven't done an exhaustive pass to find them all. I consider uses of integral Numbers for spec-internal counters suspicious unless proven otherwise.

Those are indeed bugs and thank you for bringing this. I changed a bunch of them, but I'm quite sure that there are leftovers like the one you found.

michaelficarra · 2020-05-27T22:51:57Z

Some updates from today's editor call:

Number literals should use the subscript 𝔽 as in *123*𝔽 to distinguish them from mathematical values, written as 123, and BigInt values, written as *123*ℤ.
No subscript 𝔽 for *NaN*, *+Infinity*, or *-Infinity*.
*+0*𝔽 and *-0*𝔽 should always be written with the sign indicated and with the subscript 𝔽. Never use *±0*𝔽.
Never include subscripts on operators.
Do not use X is zero. Use something like X is *+0*𝔽 or *-0*𝔽 for Numbers. Use something like X is 0 for mathematical values.
Never negate the mathematical value literal 0, as in -0 or ±0.

bakkot · 2020-05-27T23:05:05Z

you would like to don't define subscript ℝ? From the changes I've done so far, I agree that they won't be necessary.

Editor group agrees with this. A string of digits without a subscript will always be a mathematical value. (They would also be written normally, not in bold.)

waldemarhorwat · 2020-05-29T21:46:23Z

I like the conclusions from the last two comments.

In some contexts it may be attractive to abbreviate "+Infinity or -Infinity" as "±Infinity" and likewise "+0_𝔽 or -0_𝔽" as "±0_𝔽". I wouldn't mind such ± abbreviations — they can be explicitly defined somewhere in the spec but should be fairly obvious to even casual readers. I'm also OK with always writing them out the long way as well if that's what we chose.

See tc39/ecma262#2007. More work will be needed, but this already fixes the warnings.

The initial value of a 'length' property should be a Number, so use the appropriate notation, after PR tc39#2007.

…9#2248) The initial value of a 'length' property should be a Number, so use the appropriate notation, after PR tc39#2007.

Ref: tc39/ecma262#2007. Fixes #1295.

….substr `"a".substr(0, Infinity)` should return `"a"`, but tc39#2007 incorrectly changed the result to be `""`. If we change `substr`'s algorithm to be more similar to `String.prototype.substring` and `String.prototype.slice`, we can easily fix this issue: 1. `min(intStart, size)` guarantees that `intStart` is now in `[0, size]`. 2. Clamping `intLength` guarantees it's also in `[0, size]`. With these two changes, `intEnd = min(intStart + intLength, size)` is now in range `[intStart, size]`, so we no longer have to check for `intStart ≥ intEnd`, but instead can directly perform the `substring` operation.

….substr (tc39#2844) `"a".substr(0, Infinity)` should return `"a"`, but tc39#2007 incorrectly changed the result to be `""`. If we change `substr`'s algorithm to be more similar to `String.prototype.substring` and `String.prototype.slice`, we can easily fix this issue: 1. `min(intStart, size)` guarantees that `intStart` is now in `[0, size]`. 2. Clamping `intLength` guarantees it's also in `[0, size]`. With these two changes, `intEnd = min(intStart + intLength, size)` is now in range `[intStart, size]`, so we no longer have to check for `intStart ≥ intEnd`, but instead can directly perform the `substring` operation.

caiolima force-pushed the mathematical-values-fix branch from f585605 to a14ccc2 Compare May 22, 2020 17:38

michaelficarra reviewed May 22, 2020

View reviewed changes