Editorial: Introduce named character reference state
Clarify what should be matched against the named character
reference table.

Fixes whatwg#2504.
inikulin committed May 9, 2017
1 parent 33d00f0 commit 8f7b6c3
Showing 1 changed file with 45 additions and 34 deletions.
79 changes: 45 additions & 34 deletions source
@@ -103795,58 +103795,69 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<dt>Anything else</dt>

<dd>
<dd><span>Reconsume</span> in the <span>named character reference state</span>.</dd>

<p>Consume the maximum number of characters possible, with the consumed characters matching one
of the identifiers in the first column of the <span>named character references</span> table (in
a <span>case-sensitive</span> manner). Append each character to the <var data-x="temporary
buffer">temporary buffer</var> when it's consumed.</p>
</dl>

<p>If no match can be made and the <var data-x="temporary buffer">temporary buffer</var>
consists of a U+0026 AMPERSAND character (&amp;) followed by a sequence of one or more <span
data-x="ASCII alphanumeric">ASCII alphanumerics</span> and a U+003B SEMICOLON character (;),
then this is an <dfn data-x="parse-error-unknown-named-character-reference">unknown-named-character-reference</dfn>
<span>parse error</span>.</p>

<p>If no match can be made, switch to the <span>character reference end state</span>.</p>
<h5><dfn>Named character reference state</dfn></h5>

<p>Consume the maximum number of characters possible, with the consumed characters matching one
of the identifiers in the first column of the <span>named character references</span> table (in
a <span>case-sensitive</span> manner). Append each character to the <var data-x="temporary
buffer">temporary buffer</var> when it's consumed.</p>

<dl class="switch">

<dt>If there is a match</dt>

<dd>
<p>If the character reference was consumed as part of an attribute (<var data-x="return
state">return state</var> is either <span>attribute value (double-quoted) state</span>,
<span>attribute value (single-quoted) state</span> or <span>attribute value (unquoted)
state</span>), and the last character matched is not a U+003B SEMICOLON character (;), and the
<span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an <span>ASCII
alphanumeric</span>, then, for historical reasons, switch to the <span>character reference end
state</span>.
<span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an
<span>ASCII alphanumeric</span>, then, for historical reasons, switch to the <span>character
reference end state</span>.</p>
<!-- "=" added because of https://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->

<p>If the last character matched is not a U+003B SEMICOLON character (;), this is a
<dfn data-x="parse-error-missing-semicolon-after-character-reference">missing-semicolon-after-character-reference</dfn>
<span>parse error</span>.</p>
<p>Otherwise:</p>

<ol>
<li><p>If the last character matched is not a U+003B SEMICOLON character (;), then this is a
<span>parse error</span>.</p></li>

<li><p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string.
Append one or two characters corresponding to the character reference name (as given by the
second column of the <span>named character references</span> table) to the <var
data-x="temporary buffer">temporary buffer</var>.</p></li>
</ol>
</dd>

<p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append one
or two characters corresponding to the character reference name (as given by the second column
of the <span>named character references</span> table) to the <var data-x="temporary
buffer">temporary buffer</var>.</p>
<dt>Otherwise</dt>

<p>Switch to the <span>character reference end state</span>.</p>
<dd>If the <var data-x="temporary buffer">temporary buffer</var> consists of a U+0026 AMPERSAND
character (&amp;) followed by a sequence of one or more <span data-x="ASCII alphanumeric">ASCII
alphanumerics</span> and a U+003B SEMICOLON character (;), then this is a <span>parse
error</span>.</dd>

<div class="example">
</dl>

<p>If the markup contains (not in an attribute) the string <code data-x="">I'm &amp;notit; I
tell you</code>, the character reference is parsed as "not", as in, <code data-x="">I'm &not;it;
I tell you</code> (and this is a parse error). But if the markup was <code data-x="">I'm
&amp;notin; I tell you</code>, the character reference would be parsed as "notin;", resulting
in <code data-x="">I'm &notin; I tell you</code> (and no parse error).</p>
<p>Switch to the <span>character reference end state</span>.</p>

<p>However, if the markup contains the string <code data-x="">I'm &amp;notit; I tell you</code>
in an attribute, no character reference is parsed and string remains intact (and there is no
parse error).</p>
<div class="example">

</div>
<p>If the markup contains (not in an attribute) the string <code data-x="">I'm &amp;notit; I
tell you</code>, the character reference is parsed as "not", as in, <code data-x="">I'm &not;it;
I tell you</code> (and this is a parse error). But if the markup was <code data-x="">I'm
&amp;notin; I tell you</code>, the character reference would be parsed as "notin;", resulting
in <code data-x="">I'm &notin; I tell you</code> (and no parse error).</p>

</dd>
<p>However, if the markup contains the string <code data-x="">I'm &amp;notit; I tell you</code>
in an attribute, no character reference is parsed and string remains intact (and there is no
parse error).</p>

</dl>
</div>


<h5><dfn>Numeric character reference state</dfn></h5>
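
The named character reference state shown in this diff can be sketched in Python as follows. This is only an illustration under stated assumptions, not the specification's algorithm verbatim: `NAMED_REFS` is a tiny hypothetical subset of the named character references table, the function name and its return shape are invented for this example, and the error codes are the ones given in the `<dfn>` elements that appear in the diff. In the no-match branch the sketch inspects the raw input for `&`, alphanumerics, `;` instead of modelling exactly which characters end up in the temporary buffer.

```python
import re
import string

# Hypothetical, tiny subset of the named character references table
# (first column: identifier, second column: replacement characters).
NAMED_REFS = {
    "not": "\u00ac",
    "not;": "\u00ac",
    "notin;": "\u2209",
    "amp": "&",
    "amp;": "&",
}

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)
UNKNOWN_REF = re.compile(r"[0-9A-Za-z]+;")


def named_character_reference(text, pos, in_attribute):
    """Sketch of the state; text[pos] is the first character after '&'.

    Returns (characters to flush, new position, list of parse errors).
    """
    errors = []

    # Consume the longest identifier that matches, case-sensitively.
    match = ""
    for name in NAMED_REFS:
        if text.startswith(name, pos) and len(name) > len(match):
            match = name

    if match:
        end = pos + len(match)
        next_char = text[end:end + 1]  # empty string at end of input
        if (in_attribute and not match.endswith(";")
                and (next_char == "=" or next_char in ASCII_ALPHANUMERIC)):
            # Historical reasons: the text is left as it was written.
            return "&" + match, end, errors
        if not match.endswith(";"):
            errors.append("missing-semicolon-after-character-reference")
        # Replace the buffer with the characters from the second column.
        return NAMED_REFS[match], end, errors

    # No match: '&' + alphanumerics + ';' is reported as an unknown reference.
    if UNKNOWN_REF.match(text, pos):
        errors.append("unknown-named-character-reference")
    return "&", pos, errors
```

Calling `named_character_reference("notit; I tell you", 0, False)` follows the first example in the diff: `not` is matched without a semicolon outside an attribute, so the reference is replaced and a parse error is recorded. With `in_attribute=True` the same input is returned unchanged as `&not` with no error.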