Support \u{...} outside of character classes
slevithan committed May 17, 2024
1 parent 0793d55 commit 7d1c112
Showing 2 changed files with 30 additions and 3 deletions.
4 changes: 2 additions & 2 deletions demo/index.html
@@ -57,8 +57,8 @@ <h2>Themes</h2>
 <div class="info no-margin-collapse">
 <p id="info-theme-default">You don't need to add a stylesheet to your page to use the default theme. Just run <code>RegexColorizer.loadStyles()</code>.</p>
 <p id="info-theme-nobg" class="hidden">The No BG theme avoids background colors as part of highlighting.</p>
-<p id="info-theme-regexpal" class="hidden">In 2007, <a href="https://stevenlevithan.com/regexpal/">RegexPal</a> was the first web-based regex tester with regex syntax highlighting. Regex Colorizer started out by extracting RegexPal's highlighting code into a standalone library. This theme uses all black text, because RegexPal's implementation used highlighted text underneath a <code>textarea</code> with a transparent background. The RegexPal theme doesn't uniquely distinguish: escaped literal tokens, backreferences, and character class boundaries.</p>
-<p id="info-theme-regexbuddy" class="hidden">OG <a href="https://www.regexbuddy.com/">RegexBuddy</a>'s highlighting styles. RegexBuddy inspired RegexPal and numerous regex testers that came after it. Currently, the RegexBuddy theme doesn't uniquely distinguish: escaped literal tokens, backreferences, character class boundaries, and range-hyphens.</p>
+<p id="info-theme-regexpal" class="hidden">In 2007, <a href="https://stevenlevithan.com/regexpal/">RegexPal</a> was the first web-based regex tester with regex syntax highlighting. Regex Colorizer started out by extracting RegexPal's highlighting code into a standalone library. This theme uses all black text, because RegexPal's implementation used highlighted text underneath a <code>textarea</code> with a transparent background. The RegexPal theme doesn't <em>uniquely</em> distinguish the following: escaped literal tokens, backreferences, and character class boundaries.</p>
+<p id="info-theme-regexbuddy" class="hidden">OG <a href="https://www.regexbuddy.com/">RegexBuddy</a>'s highlighting styles. RegexBuddy inspired RegexPal and some other regex testers that came after it. Currently, the RegexBuddy theme doesn't <em>uniquely</em> distinguish the following: escaped literal tokens, backreferences, character class boundaries, and range-hyphens.</p>
 </div>

<h2>More examples</h2>
29 changes: 28 additions & 1 deletion regex-colorizer.js
@@ -94,7 +94,7 @@ const RegexColorizer = (() => {

     /**
      * Returns the character code for the provided regex token. Supports tokens used within character
-     * classes only, since that's all it's currently needed for.
+     * classes only.
      * @param {string} token Regex token.
      * @returns {number} Character code of the provided token, or NaN.
      */
@@ -585,6 +585,33 @@ const RegexColorizer = (() => {
         lastToken = {
             quantifiable: false,
         };
+    // '\u{...}'
+    } else if (m.startsWith('\\u{')) {
+        if (flagsObj.unicode) {
+            const charCode = getTokenCharCode(m);
+            output += charCode <= 0x10FFFF ?
+                to.metasequence(m) :
+                to.error(m, error.INDEX_OVERFLOW);
+            lastToken = {
+                quantifiable: true,
+            };
+        // Non-Unicode mode and '\u{...}' includes only decimal digits, so treat as an escaped
+        // literal 'u' followed by a quantifier
+        } else if (/^\\u{\d+}$/.test(m)) {
+            // If there's a following `?` it will be handled as the next token which technically
+            // isn't correct, but everything is still highlighted correctly apart from the gap in
+            // tokens that might be visible depending on styling
+            output += to.error('\\u', error.INCOMPLETE_TOKEN) + to.error(m.slice(2), error.UNQUANTIFIABLE);
+            lastToken = {
+                quantifiable: false,
+            };
+        // Non-Unicode mode and '\u{...}' includes hex digits A-F/a-f
+        } else {
+            output += to.error('\\u', error.INCOMPLETE_TOKEN) + m.slice(2);
+            lastToken = {
+                quantifiable: true,
+            };
+        }
     // Unquantifiable metasequence
     } else if ('bB'.includes(char1)) {
         output += to.metasequence(m);
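
The three branches added for `\u{...}` follow how JavaScript's own regex engine reads the same text with and without the `u` flag. The sketch below is not part of the commit and does not call Regex Colorizer. It only uses the built-in `RegExp` to show the native behavior behind each branch, assuming an engine with the web-compatibility regex syntax (browsers and Node.js). The variable names are made up for illustration.

```javascript
// Unicode mode: '\u{...}' is a code point escape.
const unicodeMatch = /^\u{1F600}$/u.test('\u{1F600}'); // true

// Unicode mode: a value above 0x10FFFF is rejected as a syntax error.
let overflowIsSyntaxError = false;
try {
  new RegExp('\\u{110000}', 'u');
} catch (err) {
  overflowIsSyntaxError = err instanceof SyntaxError; // true
}

// Non-Unicode mode, decimal digits only: '\u' is an escaped literal 'u'
// and '{2}' is a quantifier applied to it.
const quantified = new RegExp('^\\u{2}$').test('uu'); // true

// Non-Unicode mode with hex letters: '{1F}' is not a valid quantifier,
// so the braces and their contents match as literal text.
const literalBraces = new RegExp('^\\u{1F}$').test('u{1F}'); // true

console.log({ unicodeMatch, overflowIsSyntaxError, quantified, literalBraces });
```

Note that the colorizer marks the non-Unicode cases as errors (`INCOMPLETE_TOKEN`, `UNQUANTIFIABLE`) even though the engine accepts them, presumably because such patterns rarely do what their author intended.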
