fix(compiler): should not break a text token on a non-valid start tag (#42605)

Previously the lexer would stop consuming a text token as soon as it reached
a `<` character. If the following characters did not indicate an HTML syntax
item, such as a tag or comment, it would start a new text token. These
consecutive text tokens were then merged in a post-tokenization step.

Since the previous commit, interpolation no longer leaks across text tokens,
so this approach to handling `<` characters that appear in text is no longer
adequate. This change ensures that the lexer only breaks out of a text token
if the next characters indicate a valid HTML tag, comment, CDATA, etc.

PR Close #42605
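
The rule described above can be sketched in isolation. This is a minimal illustration, not the lexer's actual code: `isTagStart` and `findTextEnd` are hypothetical helpers that work on a plain string and an index instead of the tokenizer's cursor, and they ignore the other conditions the real text-end check handles (for example ICU expansion forms).

```typescript
// Hypothetical, simplified stand-in for the tag-start check in this commit.
// Assumption: plain string + index instead of the real cursor abstraction.
function isTagStart(input: string, index: number): boolean {
  if (input.charAt(index) !== '<') {
    return false;
  }
  // charAt returns '' past the end of the string, which fails the test below.
  const next = input.charAt(index + 1);
  // A letter, `/` or `!` after `<` suggests an element, closing tag,
  // comment, CDATA section, etc. Anything else is treated as plain text.
  return /[a-zA-Z\/!]/.test(next);
}

// Hypothetical helper: returns the index at which a text token that starts
// at `start` would end under the rule above (tag start or end of input).
function findTextEnd(input: string, start: number): number {
  let i = start;
  while (i < input.length && !isTagStart(input, i)) {
    i++;
  }
  return i;
}
```

Under these assumptions, `findTextEnd("{{'<={'}}</code>", 0)` runs past the `<=` and stops only at `</code>`, whereas a check that stops at every `<` would have ended the text token inside the interpolation.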
petebacondarwin authored and dylhunn committed Jun 22, 2021
1 parent c873440 commit 9de65db
Showing 2 changed files with 57 additions and 2 deletions.
21 changes: 20 additions & 1 deletion packages/compiler/src/ml_parser/lexer.ts
@@ -721,7 +721,7 @@ class _Tokenizer {
}

private _isTextEnd(): boolean {
- if (this._cursor.peek() === chars.$LT || this._cursor.peek() === chars.$EOF) {
+ if (this._isTagStart() || this._cursor.peek() === chars.$EOF) {
return true;
}

@@ -740,6 +740,25 @@ class _Tokenizer {
return false;
}

/**
* Returns true if the current cursor is pointing to the start of a tag
* (opening/closing/comments/cdata/etc).
*/
private _isTagStart(): boolean {
if (this._cursor.peek() === chars.$LT) {
// We assume that `<` followed by whitespace is not the start of an HTML element.
const tmp = this._cursor.clone();
tmp.advance();
// If the next character is alphabetic, `!` or `/` then it is a tag start
const code = tmp.peek();
if ((chars.$a <= code && code <= chars.$z) || (chars.$A <= code && code <= chars.$Z) ||
code === chars.$SLASH || code === chars.$BANG) {
return true;
}
}
return false;
}

private _readUntil(char: number): string {
const start = this._cursor.clone();
this._attemptUntilChar(char);
38 changes: 37 additions & 1 deletion packages/compiler/test/ml_parser/lexer_spec.ts
@@ -612,7 +612,7 @@ import {ParseLocation, ParseSourceFile, ParseSourceSpan} from '../../src/parse_u
]);
});

- it('should parse valid start tag in interpolation', () => {
+ it('should break out of interpolation in text token on valid start tag', () => {
expect(tokenizeAndHumanizeParts('{{ a <b && c > d }}')).toEqual([
[lex.TokenType.TEXT, '{{ a '],
[lex.TokenType.TAG_OPEN_START, '', 'b'],
@@ -624,6 +624,42 @@ import {ParseLocation, ParseSourceFile, ParseSourceSpan} from '../../src/parse_u
]);
});

it('should break out of interpolation in text token on valid comment', () => {
expect(tokenizeAndHumanizeParts('{{ a }<!---->}')).toEqual([
[lex.TokenType.TEXT, '{{ a }'],
[lex.TokenType.COMMENT_START],
[lex.TokenType.RAW_TEXT, ''],
[lex.TokenType.COMMENT_END],
[lex.TokenType.TEXT, '}'],
[lex.TokenType.EOF],
]);
});

it('should break out of interpolation in text token on valid CDATA', () => {
expect(tokenizeAndHumanizeParts('{{ a }<![CDATA[]]>}')).toEqual([
[lex.TokenType.TEXT, '{{ a }'],
[lex.TokenType.CDATA_START],
[lex.TokenType.RAW_TEXT, ''],
[lex.TokenType.CDATA_END],
[lex.TokenType.TEXT, '}'],
[lex.TokenType.EOF],
]);
});

it('should ignore invalid start tag in interpolation', () => {
// Note that if the `<=` is considered an "end of text" then the following `{` would
// incorrectly be considered part of an ICU.
expect(tokenizeAndHumanizeParts(`<code>{{'<={'}}</code>`, {tokenizeExpansionForms: true}))
.toEqual([
[lex.TokenType.TAG_OPEN_START, '', 'code'],
[lex.TokenType.TAG_OPEN_END],
[lex.TokenType.TEXT, '{{\'<={\'}}'],
[lex.TokenType.TAG_CLOSE, '', 'code'],
[lex.TokenType.EOF],
]);
});


it('should parse start tags quotes in place of an attribute name as text', () => {
expect(tokenizeAndHumanizeParts('<t ">')).toEqual([
[lex.TokenType.INCOMPLETE_TAG_OPEN, '', 't'],