-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normative: Make B.1 "Additional Syntax" normative #1651
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make available a rendered form? For my purposes, I'd rather review by reading that.
Thanks.
spec.html
Outdated
LegacyOctalLikeDecimalIntegerLiteral :: | ||
`0` OctalDigit | ||
LegacyOctalLikeDecimalIntegerLiteral OctalDigit | ||
<p>The following syntax from <emu-xref href="#sec-literals-numeric-literals"></emu-xref>, and its associated semantics, used to be normative optional:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it matter what it used to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not my call. Like I said:
I left a placeholder summary in B.1.1 to keep section numbers in sync, and in case a tombstone is wanted.
See this comment and following.
Did the July meeting make a decision on what (if anything) is wanted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not discussed, to my knowledge.
Is it possible to opt into the auto-rendering that happens for tc39/ecma262? |
@jmdyck not at this time; we do plan on making all PRs have auto-render behavior at some point, but for now you'll have to do it manually :-/ |
0768ad3
to
e79b676
Compare
[Edit:] Now here |
39bea71
to
f7db111
Compare
I'm extremely uncomfortable with pulling B.1 into the main body of the spec. This was decided at a time when I wasn't present, and it wasn't clearly on the agenda prior to the meeting, so I had no notice of it. There are significant problems with integrating B.1 into the spec. This breaks the separation between things we wish weren't in the spec from ones we do. Worse, some of the things in B.1 do not parse unambiguously — the order in which the grammar rules are written matters — and it would do terrible violence to the main grammar's readability to make its rules order-dependent. |
@waldemarhorwat, I can tell you from personal experience that B.1 not being in the main body of the spec causes implementors (of tooling, at the very least) to waste significant amounts of time implementing a grammar which is not in practice used by anyone anywhere, which they then have to go back and replace once they discover that they've implemented the wrong thing. (Specifically, a coworker of mine did exactly this earlier this year.) I don't see the value in keeping the non-annex-B grammar. It's true that it is simpler, but it is also irrelevant. |
The spec can still make that distinction. (E.g., @littledan says "In my opinion, it would be fine to include a note that we are somehow sad about these parts of the grammar, ...") I haven't included such notes in my commits so far because I'm hoping for some consensus on wording from the committee or the editors. Maybe I'll go back and put in stubs. [Later: Done.] You're saying there's a benefit to having all that deprecated stuff in one place, separate from the "core" of the language. Maybe there is. But does it outweigh the costs? If you want to convince the committee of that, issue #1595 might be the more appropriate spot. (This PR is just the first among presumably many.)
Yeah, that's true. And it's icky from a theoretical standpoint. But given that that's what implementers have to implement, what's the benefit in hiding the ick in an annex?
So we mostly won't do that. There's only 6 productions (all from B.1.4 RegExps) that actually rely on order-disambiguation, so those will be the only ones that call for it. (I'll be suggesting |
Unlike the main body grammar, the Annex B grammar is not a valid LR(1) grammar; in fact, it's not a valid LR(n) grammar for any n. This means that it cannot be validated. The bigger problem is that integrating the Annex B grammar into the main grammar is infectious and makes it impossible to validate the rest of the main grammar. Claiming that only a few productions introduce order dependence that cause the grammar to be invalid is true but irrelevant — validity is a property of the entire grammar, and the entire grammar becomes invalid. With the loss of validity we'd lose crucial invariants for the entire grammar, such as:
This invariant is so obvious that most people take it for granted. If we proceed with this integration, we'll lose it for the main grammar. Losing this invariant, in addition to impacting readability, will make the grammar unmaintainable. At that point there is no way to contain the damage to the main grammar and prevent it from spreading. When adding new syntax extensions, it's a bad idea for a number of practical reasons to introduce additional order dependencies, but we would have no way of knowing that if we can't validate the grammar. I agree with the comments above that not realizing that some grammar productions are modified by Annex B can be an issue, but a much less dangerous solution to that is to just put notes next to the affected main grammar productions. On the other hand, there will likely be future instances in which the Annex B grammar does get updated but is no longer a superset of the main grammar. That's easy to do unintentionally and results in actual bugs, and making this comparison of the two grammars is a crucial check of figuring out that the grammar someone is proposing for some new extension isn't doing what they think it's doing. |
I added a topic to the next meeting to discuss this. For now I'd appreciate it if the Annex B grammars were not merged into the main grammar. |
I believe this is only true of the modified Pattern grammar, which is a distinct grammar and not a part of the main grammar. Any invariants which are true of the main grammar would remain true, and it should remain as easy to validate them as it currently is. (The Pattern grammar is already significantly complicated by the throw-away-and-reparse behavior required to parse named capturing groups, so it already has fewer invariants than the main grammar.)
I think the reverse problem, where the grammars get out of sync because they are updated incorrectly, is more common and more severe: see #1102, etc. If there is only a single grammar to maintain, all that needs to be checked is that each update to the grammar does not render any previously valid source text invalid - a check which already needs to occur for any grammar change. I don't see the advantage of maintaining two copies for specifically this subset of the grammar. |
I disagree with @waldemarhorwat 's conclusion, i.e., I continue to support moving all the grammar into the main text. However, I agree with @waldemarhorwat 's procedural request. The only content I made clear on the agenda prior to the Berlin meeting was "Annex B Reform". I was aware that Waldemar would likely argue and had prepared to counter-argue. However, I was not prepared for Waldemar's absence. Given that absence, it did not occur to me at the time to explicitly postpone declaration of consensus until checking with Waldemar. I would expect such a courtesy were others to discuss, without warning, an issue known to be central to my concerns. I should have extended this same courtesy to Waldemar, and wish to do so now, retroactively. Let us consider this issue reopened until the next meeting. |
I'd dispute that the main-body grammar is LR(1), because it isn't technically a CFG to begin with. But I'll grant that it has one less non-CFG-ism than the Annex B grammar.
As @bakkot points out, order-disambiguation is confined to B.1.4's extensions to the
ES lost that invariant ages ago. (Probably never had it.) For example:
As @bakkot said, the separation of main-body from Annex B is already the source of maintenance bugs.
The ambiguity of IfStatements doesn't prevent us from detecting a new ambiguity. It seems to me that similarly, existing order dependencies don't prevent us from detecting a new one.
But apparently, that crucial check isn't happening. Ultimately, I think my point is this: What difference does it make if we can validate the main-body If we can't validate the "real" grammar, then keeping Annex B separate isn't going to change that. |
And due to that ASI continues to be an ongoing source of compatibility problems.
You can; the semantics of the cover grammar tell you what to reparse it as.
The grammar could have been worded without the dangling else problem, but for historical reasons we decided not to; it's a straightforward and well-understood transformation and the version of the grammar I've been validating does include that transformation. Note that in more complex situations such as nesting function calls and new, we did write the grammar to avoid the "dangling ()" problem.
We can validate the grammar in the main body of the spec. It's not possible to validate the Annex B grammar because invalid behavior is correct by definition. If we integrate it into the main body of the spec, then we won't be able to validate future changes to the main body of the spec. |
Indeed. I'm not saying ASI is good, I'm saying it refutes the invariant you claimed.
That seems a bit of a stretch, but I'll take it. So you're validating the covered grammars in addition to the covering grammars?
Okay, so you took the spec's grammar, transformed away an ambiguity, and validated the result. Are you sure you can't apply a similar process to the order dependencies of B.1.4?
Are you saying that the Annex B grammar doesn't distinguish valid behavior from invalid? That seems patently false, so I have no idea what you mean.
But if the main-body Also, I'm still curious about these questions:
|
ASI is a separable concern. Just because the invariant fails for insertion of implicit semicolons doesn't mean that it doesn't apply to the grammar.
I'm talking about validating a grammar to ensure that it has no ambiguities.
I already answered that. See my responses above. |
Well, I've re-read all your responses, but I didn't discern an answer to that particular question. Oh well. |
This is pretty much the sequence of commits I intend for the final PR. If reviewers would like a different breakdown, let me know.
I don't advise looking at the total diff; I've gone back and added syntax deprecation notes. For the Pattern grammar, the note is incomplete, because the difference between the two versions of the grammar is more that just a set of productions. I could flesh it out, but I don't want to bother until it's been decided that we're going to have such notes.
|
Great to see this PR. Thanks for your diligent work on it, @jmdyck . I agree with @waldemarhorwat that the ECMAScript specification is for more than just browsers--it's also for tools, like the ones maintained by @erights and @bakkot , which suffer from the obscurity of the Annex B split. It's unfortunate that the ecosystem-compatible ECMAScript grammar is in the state as it is (I'm sure everyone would prefer to implement a simpler grammar), but pushing it off into Annex B only adds confusion. I look forward to discussing this at the upcoming TC39 meeting. Probably close reviews of this PR (and detailed editorial decisions, like whether to mark formerly Annex B things as "sad") should wait until that discussion concludes. |
And I look forward to reading the notes!
Yup, that's what I figured after the comment above from @erights. |
force-pushed to:
|
spec.html
Outdated
<ul> | ||
<li>It is a Syntax Error if a |Module| contains the source code matching this production.</li> | ||
</ul> | ||
<emu-note>In a |Script|, this syntax is allowed, but deprecated.</emu-note> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to clarify in https://tc39.es/ecma262/#sec-syntactic-and-lexical-grammars that "deprecated/legacy" means "optional for implementation".
However, I am in line with @jmdyck about putting deprecated behavior under something like if HostImplementsOptional()
instead, but without "units" for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to clarify in https://tc39.es/ecma262/#sec-syntactic-and-lexical-grammars that "deprecated/legacy" means "optional for implementation".
It doesn't mean that. See here for a pending definition of "legacy".
However, I am in line with @jmdyck about putting deprecated behavior under something like
if HostImplementsOptional()
instead, but without "units" for now.
That suggestion is for when you take stuff from Annex B and incorporate it into the main spec, keeping it normative-optional. This PR (and #1867) take stuff from Annex B and incorporate it into the main spec, but make it normative.
3d0c24c
to
7a79833
Compare
bb4f7aa
to
cdbf14f
Compare
(Part of Annex B reform, see PR tc39#1595.)
... namely Syntax for Patterns and Static Semantics for Patterns And for consistency, rename Pattern Semantics to Runtime Semantics for Patterns ... so that clause-headers more precisely convey what content is where.
... in previous commit.
... namely: - Patterns - Group Specifiers - Character Classes - Escapes (This moves productions around, but doesn't alter them at all.) Also, rearrange Early Errors rules and runtime semantics rules to reflect the same order of productions.
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.) (Part of Annex B reform, see PR tc39#1595.)
... by using 'QuantifiableAssertion' under [+U] too. (This involves adding the [U] parameter to QuantifiableAssertion.)
... down to its proper place in production order. (This commit's diff probably shows a complicated combination of tweaks, but it's really just taking a block of 22 lines and shifting it down the file.)
Note that: ExtendedAtom was only ever 'invoked' under [~U], and when the merged production is invoked with [~U], it exactly reproduces the RHSs of ExtendedAtom. Atom was only ever invoked under [+U], and when the merged production is invoked with [+U], it reproduces the RHSs of former Atom except for the placement of the PatternCharacter RHS, but that's okay, because former Atom wasn't an order-disambiguated production.
... by merging three pairs of RHSs. (Preserves the order of alternatives under [~U], but not under [+U], but that's okay, because the [+U] sides aren't order-disambiguated.)
... by merging the two capturing-group alternatives. (This may be affected by the outcome of issue tc39#1673.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
I'm closing this PR, because: |
This PR was originally created to make all of B.1 "Additional Syntax" normative (incorporating it into the body of the spec, as part of the reform described in PR #1595).
But then I split off PR #1867 to handle:
So now this PR only handles:
Notes: