Commit
Normative: Make B.1.2 "String Literals" normative.
(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead <! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.
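
For illustration only (this is not part of the commit): the four alternatives of LegacyOctalEscapeSequence, together with the MV rules moved into the main body, behave roughly like the longest-match routine sketched below. The name `matchLegacyOctal` and its return shape are hypothetical; the input is assumed to be the source text immediately after the backslash.

```javascript
// Hypothetical helper, not spec text: a sketch of what
// LegacyOctalEscapeSequence matches and what its MV would be.
const isOctalDigit = (c) => c !== undefined && c >= "0" && c <= "7";

function matchLegacyOctal(rest) {
  // Every alternative starts with an OctalDigit.
  if (!isOctalDigit(rest[0])) return null;

  let length = 1; // OctalDigit [lookahead not OctalDigit]
  if (isOctalDigit(rest[1])) {
    length = 2; // ZeroToThree OctalDigit, or FourToSeven OctalDigit
    // Only a ZeroToThree lead digit may take a third octal digit.
    if (rest[0] <= "3" && isOctalDigit(rest[2])) length = 3;
  }

  const digits = rest.slice(0, length);
  // MV: read the digits as a base-8 number (e.g. 64*a + 8*b + c).
  const mv = [...digits].reduce((acc, d) => acc * 8 + Number(d), 0);
  return { digits, mv };
}

console.log(matchLegacyOctal("377")); // digits "377", mv 255
console.log(matchLegacyOctal("477")); // digits "47", mv 39 (third digit not consumed)
console.log(matchLegacyOctal("08"));  // digits "0", mv 0
```

Under this reading, `\0` is just the one-digit case with MV 0, which is why the separate SV rule for `EscapeSequence :: 0` can be dropped.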

We want to continue to disallow such syntax in strict mode,
but the mechanism to do so must change.
Formerly, the spec said that in such contexts,
it is forbidden to extend the syntax in this way.
But since, with this PR, the syntax is no longer an extension,
we instead use early error rules to say that in such contexts,
occurrences of the 'new' parts of the syntax are Syntax Errors.

For change 1, making it a Syntax Error is fairly straightforward.

But for change 2, we can't simply say that
LegacyOctalEscapeSequence is a Syntax Error in strict mode,
because strict mode still has to allow the restricted syntax
(that is, `0` not followed by a DecimalDigit).

Instead, we say that if we're in strict mode code,
an instance of LegacyOctalEscapeSequence is a Syntax Error
*unless* it's an instance of the restricted syntax.
To express the latter condition,
we use the cover grammar machinery.
(It could be done in other ways, but I think this is clearest.)
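
As a rough sketch of the two early error rules (again, not part of the commit), the strict-mode check could be modelled as follows. The function name `isStrictModeEscapeError` is hypothetical, and the input is assumed to be the source text immediately after the backslash of an escape in a string literal.

```javascript
// Hypothetical helper, not spec text: models the two new early errors
// for escapes in string literals contained in strict mode code.
const isDecimalDigit = (c) => c !== undefined && c >= "0" && c <= "9";
const isOctalDigit = (c) => c !== undefined && c >= "0" && c <= "7";

function isStrictModeEscapeError(rest) {
  const first = rest[0];

  // Change 1: NonOctalDecimalEscapeSequence (`8` or `9`) is always an error.
  if (first === "8" || first === "9") return true;

  // Anything not starting with an octal digit is not a legacy octal escape.
  if (!isOctalDigit(first)) return false;

  // Change 2: a LegacyOctalEscapeSequence is an error unless it covers
  // StrictZeroEscapeSequence, i.e. `0` [lookahead is not a DecimalDigit].
  const coversStrictZero = first === "0" && !isDecimalDigit(rest[1]);
  return !coversStrictZero;
}

console.log(isStrictModeEscapeError("0"));  // false: plain \0 stays legal
console.log(isStrictModeEscapeError("08")); // true: \0 followed by a decimal digit
console.log(isStrictModeEscapeError("7"));  // true: legacy octal escape
console.log(isStrictModeEscapeError("9"));  // true: non-octal decimal escape
```

The sketch folds the cover-grammar step into a single lookahead test; the spec text instead asks whether the matched source can be reparsed as a StrictZeroEscapeSequence.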
jmdyck committed Jun 10, 2021
1 parent 060cad7 commit c36d77a
Showing 1 changed file with 106 additions and 89 deletions.
195 changes: 106 additions & 89 deletions spec.html
@@ -14365,12 +14365,11 @@ <h2>Syntax</h2>

EscapeSequence ::
CharacterEscapeSequence
`0` [lookahead &lt;! DecimalDigit]
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
</emu-grammar>
<p>A conforming implementation, when processing strict mode code, must not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.</p>
<emu-grammar type="definition">

CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
@@ -14387,6 +14386,21 @@ <h2>Syntax</h2>
`x`
`u`

LegacyOctalEscapeSequence ::
OctalDigit [lookahead &lt;! OctalDigit]
ZeroToThree OctalDigit [lookahead &lt;! OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`

NonOctalDecimalEscapeSequence :: one of
`8` `9`

HexEscapeSequence ::
`x` HexDigit HexDigit

@@ -14402,7 +14416,37 @@ <h2>Syntax</h2>
<p>&lt;LF&gt; and &lt;CR&gt; cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.</p>
</emu-note>

<emu-clause id="sec-static-semantics-sv" oldids="sec-string-literals-static-semantics-stringvalue" type="sdo" aoid="SV">
<h2>Supplemental Syntax</h2>
<p>When processing an instance of the production <emu-grammar>LegacyOctalEscapeSequence :: OctalDigit</emu-grammar> the following production is used to refine the interpretation of |LegacyOctalEscapeSequence|.</p>
<emu-grammar type="definition">
StrictZeroEscapeSequence ::
`0` [lookahead &lt;! DecimalDigit]
</emu-grammar>

<emu-clause id="sec-string-literals-early-errors">
<h1>Static Semantics: Early Errors</h1>
<emu-grammar>
EscapeSequence :: LegacyOctalEscapeSequence
</emu-grammar>
<ul>
<li>It is a Syntax Error if the source code matching this production is strict mode code and |EscapeSequence| is not covering a |StrictZeroEscapeSequence|.</li>
</ul>
<emu-grammar>
EscapeSequence :: NonOctalDecimalEscapeSequence
</emu-grammar>
<ul>
<li>It is a Syntax Error if the source code matching this production is strict mode code.</li>
</ul>
<emu-note>In non-strict code, this syntax is allowed, but deprecated.</emu-note>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to enforce the above rules for such literals. For example, the following source text contains a Syntax Error:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>
</emu-clause>

<emu-clause id="sec-static-semantics-sv" oldids="sec-string-literals-static-semantics-stringvalue,sec-additional-syntax-string-literals-static-semantics" type="sdo" aoid="SV">
<h1>Static Semantics: SV</h1>
<p>A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of String values contributed by the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value (MV), as described below or in <emu-xref href="#sec-literals-numeric-literals"></emu-xref>.</p>
<ul>
@@ -14442,9 +14486,6 @@ <h1>Static Semantics: SV</h1>
<li>
The SV of <emu-grammar>SingleStringCharacter :: LineContinuation</emu-grammar> is the empty String.
</li>
<li>
The SV of <emu-grammar>EscapeSequence :: `0`</emu-grammar> is the String value consisting of the code unit 0x0000 (NULL).
</li>
<li>
The SV of <emu-grammar>CharacterEscapeSequence :: SingleEscapeCharacter</emu-grammar> is the String value consisting of the code unit whose value is determined by the |SingleEscapeCharacter| according to <emu-xref href="#table-string-single-character-escape-sequences"></emu-xref>.
</li>
@@ -14599,6 +14640,15 @@ <h1>Static Semantics: SV</h1>
<li>
The SV of <emu-grammar>NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator</emu-grammar> is the result of performing UTF16EncodeCodePoint on the code point value of |SourceCharacter|.
</li>
<li>
The SV of <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> is the String value consisting of the code unit whose value is the MV of |LegacyOctalEscapeSequence|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the String value consisting of the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the String value consisting of the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The SV of <emu-grammar>HexEscapeSequence :: `x` HexDigit HexDigit</emu-grammar> is the String value consisting of the code unit whose value is the MV of |HexEscapeSequence|.
</li>
@@ -14617,6 +14667,39 @@ <h1>Static Semantics: SV</h1>
<emu-clause id="sec-string-literals-static-semantics-mv">
<h1>Static Semantics: MV</h1>
<ul>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
<li>
The MV of <emu-grammar>HexEscapeSequence :: `x` HexDigit HexDigit</emu-grammar> is (16 times the MV of the first |HexDigit|) plus the MV of the second |HexDigit|.
</li>
@@ -24526,7 +24609,7 @@ <h1>Forbidden Extensions</h1>
When processing strict mode code, an implementation must not relax the early error rules of <emu-xref href="#sec-numeric-literals-early-errors"></emu-xref>.
</li>
<li>
|TemplateCharacter| must not be extended to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as defined in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
|TemplateCharacter| must not be extended to include |LegacyOctalEscapeSequence| or |NonOctalDecimalEscapeSequence| as defined in <emu-xref href="#sec-literals-string-literals"></emu-xref>.
</li>
<li>
When processing strict mode code, the extensions defined in <emu-xref href="#sec-labelled-function-declarations"></emu-xref>, <emu-xref href="#sec-block-level-function-declarations-web-legacy-compatibility-semantics"></emu-xref>, <emu-xref href="#sec-functiondeclarations-in-ifstatement-statement-clauses"></emu-xref>, and <emu-xref href="#sec-initializers-in-forin-statement-heads"></emu-xref> must not be supported.
@@ -41432,9 +41515,16 @@ <h1>Lexical Grammar</h1>
<emu-prodref name="SingleEscapeCharacter"></emu-prodref>
<emu-prodref name="NonEscapeCharacter"></emu-prodref>
<emu-prodref name="EscapeCharacter"></emu-prodref>
<emu-prodref name="LegacyOctalEscapeSequence"></emu-prodref>
<emu-prodref name="ZeroToThree"></emu-prodref>
<emu-prodref name="FourToSeven"></emu-prodref>
<emu-prodref name="NonOctalDecimalEscapeSequence"></emu-prodref>
<emu-prodref name="HexEscapeSequence"></emu-prodref>
<emu-prodref name="UnicodeEscapeSequence"></emu-prodref>
<emu-prodref name="Hex4Digits"></emu-prodref>
<p>When processing an instance of the production <emu-prodref name="LegacyOctalEscapeSequence"></emu-prodref> the following production is used to refine the interpretation of |LegacyOctalEscapeSequence|.</p>
<emu-prodref name="StrictZeroEscapeSequence"></emu-prodref>
<p>&nbsp;</p>
<emu-prodref name="RegularExpressionLiteral"></emu-prodref>
<emu-prodref name="RegularExpressionBody"></emu-prodref>
<emu-prodref name="RegularExpressionChars"></emu-prodref>
@@ -41781,86 +41871,13 @@ <h1>Additional Syntax</h1>

<emu-annex id="sec-additional-syntax-string-literals">
<h1>String Literals</h1>
<p>The syntax and semantics of <emu-xref href="#sec-literals-string-literals"></emu-xref> is extended as follows except that this extension is not allowed for strict mode code:</p>
<h2>Syntax</h2>
<emu-grammar type="definition">
EscapeSequence ::
CharacterEscapeSequence
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence

LegacyOctalEscapeSequence ::
OctalDigit [lookahead &lt;! OctalDigit]
ZeroToThree OctalDigit [lookahead &lt;! OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`
<p>The following syntax from <emu-xref href="#sec-literals-string-literals"></emu-xref>, and its associated semantics, used to be normative optional:</p>
<emu-grammar>
EscapeSequence :: LegacyOctalEscapeSequence

NonOctalDecimalEscapeSequence :: one of
`8` `9`
EscapeSequence :: NonOctalDecimalEscapeSequence
</emu-grammar>
<p>This definition of |EscapeSequence| is not used in strict mode.</p>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to not use this extended definition of |EscapeSequence| with such literals. For example, attempting to parse the following source text must fail:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>

<emu-annex id="sec-additional-syntax-string-literals-static-semantics">
<h1>Static Semantics</h1>
<ul>
<li>
The SV of <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> is the String value consisting of the code unit whose value is the MV of |LegacyOctalEscapeSequence|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the String value consisting of the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the String value consisting of the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
</ul>
</emu-annex>
<p>and the productions for |LegacyOctalEscapeSequence|, |ZeroToThree|, and |FourToSeven|.</p>
</emu-annex>

<emu-annex id="sec-html-like-comments">
@@ -42067,7 +42084,7 @@ <h1>Static Semantics: CharacterValue</h1>
</emu-alg>
<emu-grammar>CharacterEscape :: LegacyOctalEscapeSequence</emu-grammar>
<emu-alg>
1. Return the MV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>).
1. Return the MV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-string-literals-static-semantics-mv"></emu-xref>).
</emu-alg>
</emu-annex>

@@ -43054,7 +43071,7 @@ <h1>The Strict Mode of ECMAScript</h1>
A conforming implementation, when processing strict mode code, must disallow instances of the productions <emu-grammar>NumericLiteral :: LegacyOctalIntegerLiteral</emu-grammar> and <emu-grammar>DecimalIntegerLiteral :: NonOctalDecimalIntegerLiteral</emu-grammar>.
</li>
<li>
A conforming implementation, when processing strict mode code, may not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
A conforming implementation, when processing strict mode code, must disallow instances of the production <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> that do not cover a |StrictZeroEscapeSequence|, and instances of the production <emu-grammar>EscapeSequence :: NonOctalDecimalEscapeSequence</emu-grammar>.
</li>
<li>
Assignment to an undeclared identifier or otherwise unresolvable reference does not create a property in the global object. When a simple assignment occurs within strict mode code, its |LeftHandSideExpression| must not evaluate to an unresolvable Reference. If it does a *ReferenceError* exception is thrown (<emu-xref href="#sec-putvalue"></emu-xref>). The |LeftHandSideExpression| also may not be a reference to a data property with the attribute value { [[Writable]]: *false* }, to an accessor property with the attribute value { [[Set]]: *undefined* }, nor to a non-existent property of an object whose [[Extensible]] internal slot has the value *false*. In these cases a `TypeError` exception is thrown (<emu-xref href="#sec-assignment-operators"></emu-xref>).