Decode HTML entities created by block parser matcher reliance on innerHTML #25120

Closed
wants to merge 2 commits into from

Conversation

getdave
Contributor

@getdave getdave commented Sep 7, 2020

This PR fixes an issue with Block parsing as evidenced by #24282 whereby < characters are converted into the HTML entity equivalent &lt; thus causing a block validation error in the HTML block.

In addition it fixes the issue whereby & and > characters in the HTML block are converted into their entity forms.

See also #23509.

Description

Essentially, on trunk right now, if you have an HTML block which contains the literal < character then it will cause a block validation error when the block is parsed.

Try it now...

  1. New Post.
  2. Add HTML block.
  3. Add content 3 < 4.
  4. Save Draft.
  5. Reload the editor.
  6. See validation error.
Block validation: Block validation failed for `core/html` ({name: "core/html", icon: {…}, keywords: Array(1), attributes: {…}, providesContext: {…}, …}).

Content generated by `save` function:

3 &lt; 4

Content retrieved from post body:

3 < 4

Why does this happen?

It's complex...but the TL;DR is

  • validation errors occur because the parsed version of the block's save content is different from that in the post body.
    • saved post content - 3 < 4 (correct)
    • parsed block content - 3 &lt; 4 (does not match!)
  • the parsed content is different because it contains the HTML entity version of < which is &lt;.
  • it contains this entity because the html matcher, which is used to parse blocks identified as containing HTML, utilises .innerHTML to read the content from the match.
  • Unfortunately, Element.innerHTML returns the following characters as HTML entities: &, <, or >.
  • Therefore if the HTML block contains any of those characters they will be parsed as their HTML entity equivalents.
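
To make the mismatch concrete, here is a minimal sketch that mimics how reading Element.innerHTML escapes the data of a text node. The helper name serializeTextNode is hypothetical and is not part of Gutenberg; it only imitates the browser's escaping so that the example runs without a DOM.

```javascript
// Hypothetical helper: imitates the escaping a browser applies to text node
// data when Element.innerHTML is read. Not part of Gutenberg.
function serializeTextNode( text ) {
	return text
		.replace( /&/g, '&amp;' ) // Runs first so the entities added below are not re-escaped.
		.replace( /\u00A0/g, '&nbsp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' );
}

const savedContent = '3 < 4';
const parsedContent = serializeTextNode( savedContent );

console.log( parsedContent ); // 3 &lt; 4
console.log( parsedContent === savedContent ); // false
```

The second log line is the crux: the string read back from the parsed node no longer equals what was saved to the post body.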

So why does the validation error only happen for <?

A good question!

The reason appears to be that

  • the isEquivalentHTML() method used to determine whether blocks are valid utilises getHTMLTokens(), which relies on Tokenizer from the npm package 'simple-html-tokenizer'.
  • it appears that it will "see" the < from 3 < 4 as being an opening tag. See "Fails on < inside a <pre> tag" (tildeio/simple-html-tokenizer#20) for more details.
  • as a result the comparison function isEquivalentHTML will return false.
  • as this only happens for the < character the other two characters (& and >) do not cause validation errors.

What bugs do & and > cause?

& and > are still converted to their entity forms, so there is an inconsistency in the HTML block: the original content & becomes &amp; and > becomes &gt;. This is neither what the user wants nor what they expect.


Solution - how does this PR solve these issues?

We need to ensure that &, < and > are converted into their string form. However, we do not want to convert all HTML entities as this would cause things such as &nbsp; to be unintentionally converted.

Therefore our fix acts in a limited way and only transforms those characters that are specifically returned as HTML entities from Element.innerHTML.

Because the chars are converted at the base "matcher" level of the block parsers all the block validation issues are resolved. Moreover, the content the user entered is preserved (ie: not converted into entity form).
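
A minimal sketch of such a limited conversion is shown below. The function name replaceInnerHTMLEntities matches the helper named in the diff in this thread, but the body and the INNER_HTML_ENTITIES lookup are assumptions made for illustration; the PR's actual implementation is not reproduced here.

```javascript
// Assumed lookup: only the three entities that innerHTML introduces for
// &, < and > are decoded. Everything else (e.g. &nbsp;) is left untouched.
const INNER_HTML_ENTITIES = {
	'&amp;': '&',
	'&lt;': '<',
	'&gt;': '>',
};

// Sketch only: the name comes from the PR diff, the body is an assumption.
function replaceInnerHTMLEntities( html ) {
	// A single pass avoids double decoding: '&amp;lt;' becomes '&lt;', not '<'.
	return html.replace(
		/&(?:amp|lt|gt);/g,
		( entity ) => INNER_HTML_ENTITIES[ entity ]
	);
}

console.log( replaceInnerHTMLEntities( '3 &lt; 4' ) ); // 3 < 4
console.log( replaceInnerHTMLEntities( 'a&nbsp;b &amp; c' ) ); // a&nbsp;b & c
```

Doing the replacement in one regular-expression pass, rather than three chained replace calls, is a deliberate choice: chained calls could decode an already-decoded ampersand a second time.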

How has this been tested?

On Master

  • Create HTML block
  • Insert content 3 < 4.
  • Save the draft.
  • Reload the browser.
  • See validation error in browser console.
Block validation: Block validation failed for `core/html` ({name: "core/html", icon: {…}, keywords: Array(1), attributes: {…}, providesContext: {…}, …}).

Content generated by `save` function:

3 &lt; 4

Content retrieved from post body:

3 < 4

This PR

  • Checkout this PR
  • Create HTML block
  • Insert content 3 < 4.
  • Save the draft.
  • Reload the browser.
  • See no validation errors.
  • See content displayed in editor without any entities

Types of changes

Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
  • I've updated all React Native files affected by any refactorings/renamings in this PR.



Original Issue Content

Essentially, if you have an HTML block which contains 3 < 4 it will cause validation errors because the parsed version of the block's save content is different from that in the post body.

  • parsed save content 3 &lt; 4
  • post body content
<!-- wp:html -->
3 < 4
<!-- /wp:html -->

The serialized block content is actually 3 < 4. It is only when this content is parsed that it is converted into a string containing HTML entities.

Note I believe this occurs because the html() matcher passed to hpqParse relies on using .innerHTML to retrieve the content of the parsed HTML.

For the match (DOM node) which is passed into html, some light debugging reveals:

  • match.innerHTML => 3 &lt; 4
  • match.innerText => 3 < 4

I understand this difference is due to the way innerHTML works - indeed the MDN reference for innerHTML says:

Note: If a <div>, <span>, or <noembed> node has a child text node that includes the characters (&), (<), or (>), innerHTML returns these characters as the HTML entities "&amp;", "&lt;" and "&gt;" respectively. Use Node.textContent to get a raw copy of these text nodes' contents.

I believe this is why the < gets encoded to &lt;.

However, in the matcher implementation match.innerHTML is returned which means the string with encoded entities is returned.

Wrapping this string in @wordpress/html-entities's decodeEntities() ensures that any entities are converted back to their string form.

Note for @ellatrix: I could be way off on this one. I just narrowed down the cause of the specific issue but that doesn't mean it doesn't have lots of knock-on effects. I'm hopeful that as we're just fixing the innerHTML returned it should be OK, but this will need lots of testing with content variations.

Detailed Explanation

The block parser works roughly as follows:

{
  "blockName": "core/html",
  "attrs": {},
  "innerBlocks": [],
  "innerHTML": "3 < 4",
  "innerContent": [
    "3 < 4"
  ]
}

Notice how innerHTML and innerContent are correct as 3 < 4 but the block attributes have yet to be parsed.

  • Within createBlockWithFallback the createBlock function is called

let block = createBlock(
	name,
	getBlockAttributes( blockType, innerHTML, attributes ),
	innerBlocks
);

This in turn calls getBlockAttributes which calls getBlockAttribute to map all the block attributes:

const blockAttributes = mapValues(
	blockType.attributes,
	( attributeSchema, attributeKey ) => {
		return getBlockAttribute(
			attributeKey,
			attributeSchema,
			innerHTML,
			attributes
		);
	}
);

getBlockAttribute is passed an attributeSchema argument that has the following shape/content:

{
  "type": "string",
  "source": "html"
}

...which it uses to determine how it should parse the attributes.

export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
	return hpqParse( innerHTML, matcherFromSource( attributeSchema ) );
}

This takes a second argument: the matcher function to use in hpqParse. This is determined by a call to matcherFromSource, which uses the source type (in our case html) to retrieve the appropriate matcher.

case 'html':
	return html( sourceConfig.selector, sourceConfig.multiline );

In our case this is the html matcher.

  • We can see that the html matcher uses innerHTML to retrieve the content for use in the parsed attribute:

return match.innerHTML;

This is where the issue described above occurs. Namely:

If a <div>, <span>, or <noembed> node has a child text node that includes the characters (&), (<), or (>), innerHTML returns these characters as the HTML entities "&amp;", "&lt;" and "&gt;" respectively.

I believe this is why the < gets encoded to &lt;. But this only happens for blocks which have a source type of html. As far as I'm aware that is just the core/html block.
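
The path from source type to matcher described above can be sketched roughly as follows. This is a simplified stand-in, not the code from @wordpress/blocks: the html and text functions here take no selector or multiline arguments, and fakeNode is a plain object imitating a DOM node so the example runs without a browser.

```javascript
// Simplified stand-in for the `html` matcher: returns a function that reads
// innerHTML from the node it is given (the real matcher also takes a selector).
function html() {
	return ( node ) => node.innerHTML;
}

// Simplified stand-in for a text matcher, included only for contrast.
function text() {
	return ( node ) => node.textContent;
}

// Simplified dispatch on the attribute schema's `source` value.
function matcherFromSource( sourceConfig ) {
	switch ( sourceConfig.source ) {
		case 'html':
			return html();
		case 'text':
			return text();
		default:
			throw new Error( `Unsupported source: ${ sourceConfig.source }` );
	}
}

// Plain object imitating the parsed DOM node for an HTML block containing 3 < 4.
const fakeNode = { innerHTML: '3 &lt; 4', textContent: '3 < 4' };

const matcher = matcherFromSource( { type: 'string', source: 'html' } );
console.log( matcher( fakeNode ) ); // 3 &lt; 4
```

With source set to html, the attribute value is whatever innerHTML reports, which is why the entity-encoded string ends up in block.attributes.content.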

  • Back in the createBlockWithFallback function, createBlock has now returned and we pass the result to the getBlockContentValidationResult function to determine whether the block is considered valid.

let block = createBlock(
	name,
	getBlockAttributes( blockType, innerHTML, attributes ),
	innerBlocks
);
// Block validation assumes an idempotent operation from source block to serialized block
// provided there are no changes in attributes. The validation procedure thus compares the
// provided source value with the serialized output before there are any modifications to
// the block. When both match, the block is marked as valid.
if ( ! isFallbackBlock ) {
	const { isValid, validationIssues } = getBlockContentValidationResult(
		blockType,
		block.attributes,
		innerHTML
	);
	block.isValid = isValid;
	block.validationIssues = validationIssues;
}

At this point the block object returned from createBlock looks like this

{
  "clientId": "8a126e6e-9055-490e-8cf3-5127b1fa9194",
  "name": "core/html",
  "isValid": true,
  "attributes": {
    "content": "3 &lt; 4"
  },
  "innerBlocks": []
}

and the innerHTML is "3 < 4".

  • Therefore when the getBlockContentValidationResult function is called the result is that the block is determined to be invalid due to the mismatch between the saved block content and the value of block.attributes.content:

const isValid = isEquivalentHTML(
	originalBlockContent,
	generatedBlockContent,
	logger
);

In the code above:

  • originalBlockContent is 3 < 4
  • generatedBlockContent is 3 &lt; 4

The reason why this fails validation is likely due to the reliance on simple-html-tokenizer. It appears that it will "see" the < from 3 < 4 as being an opening tag. See tildeio/simple-html-tokenizer#20 for more details.

We can avoid this by correctly escaping entities.


The validation of a block checks whether the following two items are "equivalent HTML"

  1. block.originalContent.
  2. generatedBlockContent - returned by calling getSaveContent( blockType, block.attributes );.

This is achieved via the isEquivalentHTML method and if this does not return true then the block is considered invalid.

Description

  • Fix HTML block validation errors caused by blocks containing certain characters being incorrectly parsed.
  • Also ensure conversion of HTML block to Reusable does not cause errors.

@getdave getdave self-assigned this Sep 7, 2020
@getdave getdave requested a review from ellatrix September 7, 2020 16:32
@getdave getdave added the [Feature] Parsing, [Package] Blocks, and [Type] Bug labels Sep 7, 2020
@getdave getdave marked this pull request as ready for review September 7, 2020 16:34
@getdave getdave requested a review from youknowriad as a code owner September 7, 2020 16:34
@getdave getdave added the Needs Technical Feedback label Sep 7, 2020
@getdave
Contributor Author

getdave commented Sep 7, 2020

Hmmm...looks like we have failing unit tests so this could well be unsuitable.

@github-actions

github-actions bot commented Sep 7, 2020

Size Change: +48 B (0%)

Total Size: 1.26 MB

Filename Size Change
build/blocks/index.min.js 47.2 kB +48 B (0%)

compressed-size-action

@getdave
Contributor Author

getdave commented Sep 9, 2020

These unit tests are failing

 npm run test-unit test/integration/blocks-raw-handling.test.js

This is because the &nbsp; entity is now being converted into an actual space. I believe this would probably be correct functionality but I'd like a confidence check.

@ellatrix
Member

ellatrix commented Sep 9, 2020

Is the problem specific to reusable blocks? Are there problems anywhere else (with steps to reproduce)?

@getdave
Contributor Author

getdave commented Oct 2, 2020

@ellatrix Sorry for the delay here.

Is the problem specific to reusable blocks?

No, on master you can simply:

  1. Add a HTML block with content 3 < 4.
  2. Save post.
  3. Reload browser.
  4. See error in console.

If you add as a reusable block the same thing happens but it's worse because the Reusable Blocks are parsed in the background and then there are errors in the console because you have a Reusable block saved.

Are there problems anywhere else (with steps to reproduce)?

I'm yet to witness any. We do have integration tests failing though so let me look into those and come back to you.

@getdave
Contributor Author

getdave commented Oct 2, 2020

Testing

Test Expectations

The serialized form of the block's content (ie: that which is persisted) should be:

<!-- wp:paragraph -->
<p>3 &lt; 4</p>
<!-- /wp:paragraph -->

The parsed form of the block (ie: that displayed in the editor) should be:

3 < 4

Testing results for other blocks

  • Paragraph block - no errors.
  • Classic block - no errors.
  • HTML block - errors (as described above).

@getdave getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 0cdd361 to bbc2590 Compare October 2, 2020 13:18
@ellatrix ellatrix mentioned this pull request Nov 25, 2020
Base automatically changed from master to trunk March 1, 2021 15:44
@getdave getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from bbc2590 to 12f0370 Compare September 14, 2021 14:25
@getdave
Contributor Author

getdave commented Sep 14, 2021

Update: I've altered the PR to convert only the specific HTML entities that are produced by Element.innerHTML, namely:

  • "&amp;"
  • "&lt;"
  • "&gt;"

There's a simple lookup which converts them to their correct form.

This preserves existing functionality (all unit tests should now pass - we'll see!) and also fixes the test cases shown above.

This works because, now that these entities have been converted back, the parsed block content matches the saved block content and thus the block validation passes.

The best thing is that it also fixes another bug with & being converted into &amp; in HTML block preview mode.

I'll write this up, add test cases and unit tests and then ask for code review.

@getdave
Contributor Author

getdave commented Sep 15, 2021

I've just noticed this PR introduces a regression with the core/code block. Annoyingly it has the exact inverse behaviour of the core/html block in that it escapes the code input before it's saved to the database.

value={ escape( attributes.content ) }

My understanding is that it's best to sanitise data on output (rendering) not on input (saving to DB). I'll dig deeper shortly...

@getdave
Contributor Author

getdave commented Feb 3, 2022

@dmsnell I think you've come across something similar recently?

@getdave getdave requested a review from dmsnell February 3, 2022 09:26
@getdave getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 056b585 to 8261b03 Compare April 26, 2022 15:17
@getdave getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 8261b03 to bc5b16d Compare July 21, 2022 09:33
@getdave
Contributor Author

getdave commented Jul 21, 2022

It looks like the replaceInnerHTMLEntities function does successfully convert entities in the core/code block's content. However, that is actually not what we want in this context, because if we allow the content entered into the Code block to be parsed into HTML then it will not be displayed in the editor.

The reason core/code escapes its content on save

value={ escape( attributes.content ) }

...is precisely to avoid it being rendered as HTML in order that it can display in raw form.

@@ -38,6 +52,6 @@ export function html( selector, multilineTag ) {
return value;
}

- return match.innerHTML;
+ return replaceInnerHTMLEntities( match.innerHTML );
Member

I don't think we can change this behaviour at this point. That's why I added a new type. Otherwise we could just return the raw html value here instead?

Contributor Author

I played with this a bit more today. You're correct that we cannot change the behaviour. It even breaks Core blocks, let alone any third-party blocks.

I'm now more convinced than ever that having a raw source type such as you propose is the best solution.

@getdave
Contributor Author

getdave commented Jul 21, 2022

Closing this based on discussion here.

IMHO the proposal to add a new raw source type is the best option.

@getdave getdave closed this Jul 21, 2022