Decode HTML entities created by block parser matcher reliance on innerHTML #25120

getdave · 2020-09-07T16:27:47Z

This PR fixes an issue with block parsing, as evidenced by #24282, whereby `<` characters are converted into their HTML entity equivalent `&lt;`, causing a block validation error in the HTML block.

In addition, it fixes the issue whereby `&` and `>` characters in the HTML block are converted into their entity forms.

Description

Essentially, on trunk right now, if you have an HTML block which contains the literal `<` character then it will cause a block validation error when the block is parsed.

Try it now...

New Post.
Add HTML block.
Add content 3 < 4.
Save Draft.
Reload the editor.
See validation error.

Block validation: Block validation failed for `core/html` ({name: "core/html", icon: {…}, keywords: Array(1), attributes: {…}, providesContext: {…}, …}).

Content generated by `save` function:

3 &lt; 4

Content retrieved from post body:

3 < 4

Screen.Capture.on.2021-09-14.at.17-55-00.mov

Why does this happen?

It's complex...but the TL;DR is

validation errors occur because the parsed version of the block's save content is different from that in the post body.
- saved post content - `3 < 4` (correct)
- parsed block content - `3 &lt; 4` (does not match!)
the parsed content is different because it contains the HTML entity version of `<`, which is `&lt;`.
it contains this entity because the html matcher, which is used to parse blocks identified as containing HTML, utilises .innerHTML to read the content from the match.
Unfortunately, Element.innerHTML returns the following characters as HTML entities: `&`, `<`, or `>`.
Therefore, if the HTML block contains any of those characters they will be parsed as their HTML entity equivalents.
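
To make the chain above concrete, here is a minimal sketch that runs without a DOM. `serializeTextLikeInnerHTML` is a hypothetical stand-in (not Gutenberg code) that only approximates how innerHTML escapes the three characters named above when it serialises a text node.

```javascript
// Hypothetical helper: approximates the escaping innerHTML applies to the
// three characters discussed above. It is an illustration, not browser code.
function serializeTextLikeInnerHTML( text ) {
	return text
		.replace( /&/g, '&amp;' ) // runs first so the entities added below are not re-escaped
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' );
}

const postBodyContent = '3 < 4';
const parsedContent = serializeTextLikeInnerHTML( postBodyContent );

console.log( parsedContent ); // 3 &lt; 4
console.log( parsedContent === postBodyContent ); // false
```

The second log line is the mismatch that later surfaces as the validation error.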

So why does the validation error only happen for `<`?

A good question!

The reason appears to be that

the isEquivalentHTML() method used to determine whether blocks are valid utilises getHTMLTokens(), which relies on Tokenizer from the npm package `simple-html-tokenizer`.
it appears that it will "see" the `<` from `3 < 4` as being an opening tag. See "Fails on < inside a <pre> tag" (tildeio/simple-html-tokenizer#20) for more details.
as a result the comparison function isEquivalentHTML will return false.
as this only happens for the < character the other two characters (& and >) do not cause validation errors.

What bugs do & and > cause?

`&` and `>` are still converted to their entity forms, so there is an inconsistency in the HTML block: the original content `&` becomes `&amp;` and `>` becomes `&gt;`. This is neither what the user wants nor what they expect.

Screen.Capture.on.2021-09-14.at.17-53-01.mp4

Solution - how does this PR solve these issues?

We need to ensure that `&amp;`, `&lt;` and `&gt;` are converted back into their literal character form. However, we do not want to convert all HTML entities as this would cause things such as `&nbsp;` to be unintentionally converted.

Therefore our fix acts in a limited way and only transforms those characters that are specifically returned as HTML entities from Element.innerHTML.

Because the chars are converted at the base "matcher" level of the block parsers all the block validation issues are resolved. Moreover, the content the user entered is preserved (ie: not converted into entity form).
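
As a rough illustration of this limited conversion, the sketch below uses a simple lookup. The name `replaceInnerHTMLEntitiesSketch` and the lookup table are assumptions made for this example; the actual function in the PR may be written differently.

```javascript
// Assumed lookup: only the three entities that innerHTML introduces.
const INNER_HTML_ENTITIES = {
	'&amp;': '&',
	'&lt;': '<',
	'&gt;': '>',
};

// Hypothetical sketch: one pass over the string, so an already double-encoded
// value such as "&amp;lt;" is decoded a single level to "&lt;".
function replaceInnerHTMLEntitiesSketch( html ) {
	return html.replace(
		/&(?:amp|lt|gt);/g,
		( entity ) => INNER_HTML_ENTITIES[ entity ]
	);
}

console.log( replaceInnerHTMLEntitiesSketch( '3 &lt; 4' ) ); // 3 < 4
console.log( replaceInnerHTMLEntitiesSketch( 'a&nbsp;b' ) ); // a&nbsp;b
```

Any entity outside the lookup, such as `&nbsp;`, is left untouched, which is the behaviour described above.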

How has this been tested?

On Master

Create HTML block
Insert content 3 < 4.
Save the draft.
Reload the browser.
See validation error in browser console.

Block validation: Block validation failed for `core/html` ({name: "core/html", icon: {…}, keywords: Array(1), attributes: {…}, providesContext: {…}, …}).

Content generated by `save` function:

3 &lt; 4

Content retrieved from post body:

3 < 4

This PR

Checkout this PR
Create HTML block
Insert content 3 < 4.
Save the draft.
Reload the browser.
See no validation errors.
See content displayed in editor without any entities

Types of changes

Bug fix (non-breaking change which fixes an issue)

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.
I've included developer documentation if appropriate.
I've updated all React Native files affected by any refactorings/renamings in this PR.

Original Issue Content

Essentially if you have a HTML block which contains 3 < 4 it will cause validation errors because the parsed version of the block's save content is different from that in the post body.

parsed save content `3 &lt; 4`
post body content

<!-- wp:html -->
3 < 4
<!-- /wp:html -->

The serialized block content is actually 3 < 4. It is only when this content is parsed that it is converted into a string containing HTML entities.

Note I believe this occurs because the html() matcher passed to hpqParse relies on using .innerHTML to retrieve the content of the parsed HTML.

For the match (DOM node) which is passed into html, some light debugging reveals:

match.innerHTML => 3 &lt; 4
match.innerText => 3 < 4

I understand this difference is due to the way innerHTML works - indeed the MDN reference for innerHTML says:

Note: If a <div>, <span>, or <noembed> node has a child text node that includes the characters (&), (<), or (>), innerHTML returns these characters as the HTML entities "&amp;", "&lt;" and "&gt;" respectively. Use Node.textContent to get a raw copy of these text nodes' contents.

I believe this is why the `<` gets encoded to `&lt;`.

However, in the matcher implementation match.innerHTML is returned which means the string with encoded entities is returned.

Wrapping this string in @wordpress/html-entities's decodeEntities() ensures that any entities are converted back to their string form.

Note for @ellatrix: I could be way off on this one. I just narrowed down the cause of the specific issue but that doesn't mean it doesn't have lots of knock-on effects. I'm hopeful that as we're just fixing the innerHTML returned it should be ok, but this will need lots of testing with content variations.

Detailed Explanation

The block parser works roughly as follows:

createBlockWithFallback is called with the block node in question. In our use case it looks like this:

{
  "blockName": "core/html",
  "attrs": {},
  "innerBlocks": [],
  "innerHTML": "3 < 4",
  "innerContent": [
    "3 < 4"
  ]
}

Notice how innerHTML and innerContent are correct as 3 < 4 but the block attributes have yet to be parsed.

Within createBlockWithFallback the createBlock function is called

gutenberg/packages/blocks/src/api/parser.js

Lines 530 to 534 in 0d2771b

    
    let block = createBlock(
        name,
        getBlockAttributes( blockType, innerHTML, attributes ),
        innerBlocks
    );

This in turn calls getBlockAttributes which calls getBlockAttribute to map all the block attributes:

gutenberg/packages/blocks/src/api/parser.js

Lines 291 to 301 in 0d2771b

    
    const blockAttributes = mapValues(
        blockType.attributes,
        ( attributeSchema, attributeKey ) => {
            return getBlockAttribute(
                attributeKey,
                attributeSchema,
                innerHTML,
                attributes
            );
        }
    );

getBlockAttribute is passed an attributeSchema argument that has the following shape/content:

{
  "type": "string",
  "source": "html"
}

...which it uses to determine how it should parse the attributes.

Depending on the source type (in our case html), parseWithAttributeSchema is then called with the attributeSchema above.
This in turn hands off to hpqParse from the hpq library.

gutenberg/packages/blocks/src/api/parser.js

Lines 219 to 221 in 0d2771b

    
    export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
        return hpqParse( innerHTML, matcherFromSource( attributeSchema ) );
    }

This takes a 2nd argument of the matcher function to use in hpqParse. This is determined by a call to matcherFromSource which uses the source type (in our case html) to retrieve the appropriate matcher.

gutenberg/packages/blocks/src/api/parser.js

Lines 184 to 185 in 0d2771b

    
    case 'html':
        return html( sourceConfig.selector, sourceConfig.multiline );

In our case this is the html matcher.

We can see that the html matcher uses innerHTML to retrieve the content for use in the parsed attribute:

gutenberg/packages/blocks/src/api/matchers.js

Lines 40 to 41 in 0d2771b


	return match.innerHTML;

This is where the issue described above occurs. Namely:

If a <div>, <span>, or <noembed> node has a child text node that includes the characters (&), (<), or (>), innerHTML returns these characters as the HTML entities "&amp;", "&lt;" and "&gt;" respectively.

I believe this is why the `<` gets encoded to `&lt;`. But this only happens for blocks which have a source type of html. As far as I'm aware that is just the core/html block.

Back in the createBlockWithFallback function, createBlock has now returned and we pass the result to the getBlockContentValidationResult function to determine whether the block is considered valid.

gutenberg/packages/blocks/src/api/parser.js

Lines 530 to 548 in 0d2771b

    
    let block = createBlock(
        name,
        getBlockAttributes( blockType, innerHTML, attributes ),
        innerBlocks
    );
    // Block validation assumes an idempotent operation from source block to serialized block
    // provided there are no changes in attributes. The validation procedure thus compares the
    // provided source value with the serialized output before there are any modifications to
    // the block. When both match, the block is marked as valid.
    if ( ! isFallbackBlock ) {
        const { isValid, validationIssues } = getBlockContentValidationResult(
            blockType,
            block.attributes,
            innerHTML
        );
        block.isValid = isValid;
        block.validationIssues = validationIssues;
    }

At this point the block object returned from createBlock looks like this

{
  "clientId": "8a126e6e-9055-490e-8cf3-5127b1fa9194",
  "name": "core/html",
  "isValid": true,
  "attributes": {
    "content": "3 &lt; 4"
  },
  "innerBlocks": []
}

and the innerHTML is "3 < 4".

Therefore when the getBlockContentValidationResult function is called the result is that the block is determined to be invalid due to the mismatch between the saved block content and the value of block.attributes.content:

gutenberg/packages/blocks/src/api/validation/index.js

Lines 690 to 694 in cecabd0

    
    const isValid = isEquivalentHTML(
        originalBlockContent,
        generatedBlockContent,
        logger
    );

In the code above:

originalBlockContent is `3 < 4`
generatedBlockContent is `3 &lt; 4`

The reason why this fails validation is likely due to the reliance on simple-html-tokenizer. It appears that it will "see" the < from 3 < 4 as being an opening tag. See tildeio/simple-html-tokenizer#20 for more details.

We can avoid this by correctly escaping entities.

The validation of a block checks whether the following two items are "equivalent HTML"

block.originalContent.
generatedBlockContent - returned by calling getSaveContent( blockType, block.attributes );.

This is achieved via the isEquivalentHTML method and if this does not return true then the block is considered invalid.
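
The round trip described in this section can be sketched in a few lines. Every function here is a simplified stand-in written for illustration: `parseHtmlAttribute` imitates the html matcher's entity-encoded result, `save` imitates a save function that outputs the attribute unchanged, and `isEquivalentSketch` uses plain string equality, whereas the real isEquivalentHTML compares tokens.

```javascript
// Stand-in for the html matcher on trunk: the attribute comes back with
// &, < and > entity-encoded, as innerHTML would return them.
function parseHtmlAttribute( innerHTML ) {
	return innerHTML
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' );
}

// Stand-in for the block's save function: emits the attribute as-is.
function save( attributes ) {
	return attributes.content;
}

// Stand-in for isEquivalentHTML: strict equality only (an assumption made
// to keep the sketch short).
function isEquivalentSketch( original, generated ) {
	return original === generated;
}

const originalBlockContent = '3 < 4';
const attributes = { content: parseHtmlAttribute( originalBlockContent ) };
const generatedBlockContent = save( attributes );
const isValid = isEquivalentSketch( originalBlockContent, generatedBlockContent );

console.log( generatedBlockContent ); // 3 &lt; 4
console.log( isValid ); // false
```

If the parse step returned `3 < 4` instead, both sides of the comparison would match in this sketch.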

Description

Fix HTML block validation errors caused by blocks containing certain characters being incorrectly parsed.
Also ensure conversion of HTML block to Reusable does not cause errors.

getdave · 2020-09-07T16:36:25Z

Hmmm...looks like we have failing unit tests so this could well be unsuitable.

github-actions · 2020-09-07T16:36:57Z

Size Change: +48 B (0%)

Total Size: 1.26 MB

Filename	Size	Change
`build/blocks/index.min.js`	47.2 kB	+48 B (0%)

ℹ️ View Unchanged

Filename	Size
`build/a11y/index.min.js`	982 B
`build/annotations/index.min.js`	2.76 kB
`build/api-fetch/index.min.js`	2.26 kB
`build/autop/index.min.js`	2.14 kB
`build/blob/index.min.js`	475 B
`build/block-directory/index.min.js`	6.58 kB
`build/block-directory/style-rtl.css`	990 B
`build/block-directory/style.css`	991 B
`build/block-editor/default-editor-styles-rtl.css`	378 B
`build/block-editor/default-editor-styles.css`	378 B
`build/block-editor/index.min.js`	153 kB
`build/block-editor/style-rtl.css`	14.6 kB
`build/block-editor/style.css`	14.6 kB
`build/block-library/blocks/archives/editor-rtl.css`	61 B
`build/block-library/blocks/archives/editor.css`	60 B
`build/block-library/blocks/archives/style-rtl.css`	65 B
`build/block-library/blocks/archives/style.css`	65 B
`build/block-library/blocks/audio/editor-rtl.css`	150 B
`build/block-library/blocks/audio/editor.css`	150 B
`build/block-library/blocks/audio/style-rtl.css`	103 B
`build/block-library/blocks/audio/style.css`	103 B
`build/block-library/blocks/audio/theme-rtl.css`	110 B
`build/block-library/blocks/audio/theme.css`	110 B
`build/block-library/blocks/avatar/editor-rtl.css`	116 B
`build/block-library/blocks/avatar/editor.css`	116 B
`build/block-library/blocks/avatar/style-rtl.css`	59 B
`build/block-library/blocks/avatar/style.css`	59 B
`build/block-library/blocks/block/editor-rtl.css`	161 B
`build/block-library/blocks/block/editor.css`	161 B
`build/block-library/blocks/button/editor-rtl.css`	441 B
`build/block-library/blocks/button/editor.css`	441 B
`build/block-library/blocks/button/style-rtl.css`	542 B
`build/block-library/blocks/button/style.css`	542 B
`build/block-library/blocks/buttons/editor-rtl.css`	292 B
`build/block-library/blocks/buttons/editor.css`	292 B
`build/block-library/blocks/buttons/style-rtl.css`	275 B
`build/block-library/blocks/buttons/style.css`	275 B
`build/block-library/blocks/calendar/style-rtl.css`	207 B
`build/block-library/blocks/calendar/style.css`	207 B
`build/block-library/blocks/categories/editor-rtl.css`	84 B
`build/block-library/blocks/categories/editor.css`	83 B
`build/block-library/blocks/categories/style-rtl.css`	79 B
`build/block-library/blocks/categories/style.css`	79 B
`build/block-library/blocks/code/style-rtl.css`	103 B
`build/block-library/blocks/code/style.css`	103 B
`build/block-library/blocks/code/theme-rtl.css`	124 B
`build/block-library/blocks/code/theme.css`	124 B
`build/block-library/blocks/columns/editor-rtl.css`	108 B
`build/block-library/blocks/columns/editor.css`	108 B
`build/block-library/blocks/columns/style-rtl.css`	406 B
`build/block-library/blocks/columns/style.css`	406 B
`build/block-library/blocks/comment-author-avatar/editor-rtl.css`	125 B
`build/block-library/blocks/comment-author-avatar/editor.css`	125 B
`build/block-library/blocks/comment-content/style-rtl.css`	92 B
`build/block-library/blocks/comment-content/style.css`	92 B
`build/block-library/blocks/comment-template/style-rtl.css`	187 B
`build/block-library/blocks/comment-template/style.css`	185 B
`build/block-library/blocks/comments-pagination-numbers/editor-rtl.css`	123 B
`build/block-library/blocks/comments-pagination-numbers/editor.css`	121 B
`build/block-library/blocks/comments-pagination/editor-rtl.css`	222 B
`build/block-library/blocks/comments-pagination/editor.css`	209 B
`build/block-library/blocks/comments-pagination/style-rtl.css`	235 B
`build/block-library/blocks/comments-pagination/style.css`	231 B
`build/block-library/blocks/comments-title/editor-rtl.css`	75 B
`build/block-library/blocks/comments-title/editor.css`	75 B
`build/block-library/blocks/comments/editor-rtl.css`	95 B
`build/block-library/blocks/comments/editor.css`	95 B
`build/block-library/blocks/cover/editor-rtl.css`	615 B
`build/block-library/blocks/cover/editor.css`	616 B
`build/block-library/blocks/cover/style-rtl.css`	1.55 kB
`build/block-library/blocks/cover/style.css`	1.55 kB
`build/block-library/blocks/embed/editor-rtl.css`	293 B
`build/block-library/blocks/embed/editor.css`	293 B
`build/block-library/blocks/embed/style-rtl.css`	410 B
`build/block-library/blocks/embed/style.css`	410 B
`build/block-library/blocks/embed/theme-rtl.css`	110 B
`build/block-library/blocks/embed/theme.css`	110 B
`build/block-library/blocks/file/editor-rtl.css`	300 B
`build/block-library/blocks/file/editor.css`	300 B
`build/block-library/blocks/file/style-rtl.css`	253 B
`build/block-library/blocks/file/style.css`	254 B
`build/block-library/blocks/file/view.min.js`	346 B
`build/block-library/blocks/freeform/editor-rtl.css`	2.44 kB
`build/block-library/blocks/freeform/editor.css`	2.44 kB
`build/block-library/blocks/gallery/editor-rtl.css`	948 B
`build/block-library/blocks/gallery/editor.css`	950 B
`build/block-library/blocks/gallery/style-rtl.css`	1.53 kB
`build/block-library/blocks/gallery/style.css`	1.53 kB
`build/block-library/blocks/gallery/theme-rtl.css`	108 B
`build/block-library/blocks/gallery/theme.css`	108 B
`build/block-library/blocks/group/editor-rtl.css`	333 B
`build/block-library/blocks/group/editor.css`	333 B
`build/block-library/blocks/group/style-rtl.css`	57 B
`build/block-library/blocks/group/style.css`	57 B
`build/block-library/blocks/group/theme-rtl.css`	78 B
`build/block-library/blocks/group/theme.css`	78 B
`build/block-library/blocks/heading/style-rtl.css`	76 B
`build/block-library/blocks/heading/style.css`	76 B
`build/block-library/blocks/html/editor-rtl.css`	327 B
`build/block-library/blocks/html/editor.css`	329 B
`build/block-library/blocks/image/editor-rtl.css`	736 B
`build/block-library/blocks/image/editor.css`	737 B
`build/block-library/blocks/image/style-rtl.css`	627 B
`build/block-library/blocks/image/style.css`	630 B
`build/block-library/blocks/image/theme-rtl.css`	110 B
`build/block-library/blocks/image/theme.css`	110 B
`build/block-library/blocks/latest-comments/style-rtl.css`	284 B
`build/block-library/blocks/latest-comments/style.css`	284 B
`build/block-library/blocks/latest-posts/editor-rtl.css`	199 B
`build/block-library/blocks/latest-posts/editor.css`	198 B
`build/block-library/blocks/latest-posts/style-rtl.css`	463 B
`build/block-library/blocks/latest-posts/style.css`	462 B
`build/block-library/blocks/list/style-rtl.css`	88 B
`build/block-library/blocks/list/style.css`	88 B
`build/block-library/blocks/media-text/editor-rtl.css`	266 B
`build/block-library/blocks/media-text/editor.css`	263 B
`build/block-library/blocks/media-text/style-rtl.css`	493 B
`build/block-library/blocks/media-text/style.css`	490 B
`build/block-library/blocks/more/editor-rtl.css`	431 B
`build/block-library/blocks/more/editor.css`	431 B
`build/block-library/blocks/navigation-link/editor-rtl.css`	705 B
`build/block-library/blocks/navigation-link/editor.css`	703 B
`build/block-library/blocks/navigation-link/style-rtl.css`	115 B
`build/block-library/blocks/navigation-link/style.css`	115 B
`build/block-library/blocks/navigation-submenu/editor-rtl.css`	296 B
`build/block-library/blocks/navigation-submenu/editor.css`	295 B
`build/block-library/blocks/navigation-submenu/view.min.js`	423 B
`build/block-library/blocks/navigation/editor-rtl.css`	2.03 kB
`build/block-library/blocks/navigation/editor.css`	2.04 kB
`build/block-library/blocks/navigation/style-rtl.css`	1.96 kB
`build/block-library/blocks/navigation/style.css`	1.95 kB
`build/block-library/blocks/navigation/view-modal.min.js`	2.78 kB
`build/block-library/blocks/navigation/view.min.js`	443 B
`build/block-library/blocks/nextpage/editor-rtl.css`	395 B
`build/block-library/blocks/nextpage/editor.css`	395 B
`build/block-library/blocks/page-list/editor-rtl.css`	363 B
`build/block-library/blocks/page-list/editor.css`	363 B
`build/block-library/blocks/page-list/style-rtl.css`	175 B
`build/block-library/blocks/page-list/style.css`	175 B
`build/block-library/blocks/paragraph/editor-rtl.css`	157 B
`build/block-library/blocks/paragraph/editor.css`	157 B
`build/block-library/blocks/paragraph/style-rtl.css`	260 B
`build/block-library/blocks/paragraph/style.css`	260 B
`build/block-library/blocks/post-author/style-rtl.css`	175 B
`build/block-library/blocks/post-author/style.css`	176 B
`build/block-library/blocks/post-comments-form/editor-rtl.css`	96 B
`build/block-library/blocks/post-comments-form/editor.css`	96 B
`build/block-library/blocks/post-comments-form/style-rtl.css`	493 B
`build/block-library/blocks/post-comments-form/style.css`	493 B
`build/block-library/blocks/post-comments/editor-rtl.css`	77 B
`build/block-library/blocks/post-comments/editor.css`	77 B
`build/block-library/blocks/post-comments/style-rtl.css`	632 B
`build/block-library/blocks/post-comments/style.css`	630 B
`build/block-library/blocks/post-excerpt/editor-rtl.css`	73 B
`build/block-library/blocks/post-excerpt/editor.css`	73 B
`build/block-library/blocks/post-excerpt/style-rtl.css`	69 B
`build/block-library/blocks/post-excerpt/style.css`	69 B
`build/block-library/blocks/post-featured-image/editor-rtl.css`	605 B
`build/block-library/blocks/post-featured-image/editor.css`	605 B
`build/block-library/blocks/post-featured-image/style-rtl.css`	153 B
`build/block-library/blocks/post-featured-image/style.css`	153 B
`build/block-library/blocks/post-template/editor-rtl.css`	99 B
`build/block-library/blocks/post-template/editor.css`	98 B
`build/block-library/blocks/post-template/style-rtl.css`	282 B
`build/block-library/blocks/post-template/style.css`	282 B
`build/block-library/blocks/post-terms/style-rtl.css`	73 B
`build/block-library/blocks/post-terms/style.css`	73 B
`build/block-library/blocks/post-title/style-rtl.css`	80 B
`build/block-library/blocks/post-title/style.css`	80 B
`build/block-library/blocks/preformatted/style-rtl.css`	103 B
`build/block-library/blocks/preformatted/style.css`	103 B
`build/block-library/blocks/pullquote/editor-rtl.css`	198 B
`build/block-library/blocks/pullquote/editor.css`	198 B
`build/block-library/blocks/pullquote/style-rtl.css`	370 B
`build/block-library/blocks/pullquote/style.css`	370 B
`build/block-library/blocks/pullquote/theme-rtl.css`	167 B
`build/block-library/blocks/pullquote/theme.css`	167 B
`build/block-library/blocks/query-pagination-numbers/editor-rtl.css`	122 B
`build/block-library/blocks/query-pagination-numbers/editor.css`	121 B
`build/block-library/blocks/query-pagination/editor-rtl.css`	221 B
`build/block-library/blocks/query-pagination/editor.css`	211 B
`build/block-library/blocks/query-pagination/style-rtl.css`	234 B
`build/block-library/blocks/query-pagination/style.css`	231 B
`build/block-library/blocks/query/editor-rtl.css`	365 B
`build/block-library/blocks/query/editor.css`	364 B
`build/block-library/blocks/quote/style-rtl.css`	213 B
`build/block-library/blocks/quote/style.css`	213 B
`build/block-library/blocks/quote/theme-rtl.css`	223 B
`build/block-library/blocks/quote/theme.css`	226 B
`build/block-library/blocks/read-more/style-rtl.css`	132 B
`build/block-library/blocks/read-more/style.css`	132 B
`build/block-library/blocks/rss/editor-rtl.css`	202 B
`build/block-library/blocks/rss/editor.css`	204 B
`build/block-library/blocks/rss/style-rtl.css`	289 B
`build/block-library/blocks/rss/style.css`	288 B
`build/block-library/blocks/search/editor-rtl.css`	165 B
`build/block-library/blocks/search/editor.css`	165 B
`build/block-library/blocks/search/style-rtl.css`	385 B
`build/block-library/blocks/search/style.css`	386 B
`build/block-library/blocks/search/theme-rtl.css`	114 B
`build/block-library/blocks/search/theme.css`	114 B
`build/block-library/blocks/separator/editor-rtl.css`	146 B
`build/block-library/blocks/separator/editor.css`	146 B
`build/block-library/blocks/separator/style-rtl.css`	233 B
`build/block-library/blocks/separator/style.css`	233 B
`build/block-library/blocks/separator/theme-rtl.css`	194 B
`build/block-library/blocks/separator/theme.css`	194 B
`build/block-library/blocks/shortcode/editor-rtl.css`	464 B
`build/block-library/blocks/shortcode/editor.css`	464 B
`build/block-library/blocks/site-logo/editor-rtl.css`	708 B
`build/block-library/blocks/site-logo/editor.css`	708 B
`build/block-library/blocks/site-logo/style-rtl.css`	192 B
`build/block-library/blocks/site-logo/style.css`	192 B
`build/block-library/blocks/site-tagline/editor-rtl.css`	86 B
`build/block-library/blocks/site-tagline/editor.css`	86 B
`build/block-library/blocks/site-title/editor-rtl.css`	84 B
`build/block-library/blocks/site-title/editor.css`	84 B
`build/block-library/blocks/social-link/editor-rtl.css`	177 B
`build/block-library/blocks/social-link/editor.css`	177 B
`build/block-library/blocks/social-links/editor-rtl.css`	674 B
`build/block-library/blocks/social-links/editor.css`	673 B
`build/block-library/blocks/social-links/style-rtl.css`	1.39 kB
`build/block-library/blocks/social-links/style.css`	1.38 kB
`build/block-library/blocks/spacer/editor-rtl.css`	322 B
`build/block-library/blocks/spacer/editor.css`	322 B
`build/block-library/blocks/spacer/style-rtl.css`	48 B
`build/block-library/blocks/spacer/style.css`	48 B
`build/block-library/blocks/table/editor-rtl.css`	494 B
`build/block-library/blocks/table/editor.css`	494 B
`build/block-library/blocks/table/style-rtl.css`	611 B
`build/block-library/blocks/table/style.css`	609 B
`build/block-library/blocks/table/theme-rtl.css`	175 B
`build/block-library/blocks/table/theme.css`	175 B
`build/block-library/blocks/tag-cloud/style-rtl.css`	226 B
`build/block-library/blocks/tag-cloud/style.css`	227 B
`build/block-library/blocks/template-part/editor-rtl.css`	235 B
`build/block-library/blocks/template-part/editor.css`	235 B
`build/block-library/blocks/template-part/theme-rtl.css`	101 B
`build/block-library/blocks/template-part/theme.css`	101 B
`build/block-library/blocks/text-columns/editor-rtl.css`	95 B
`build/block-library/blocks/text-columns/editor.css`	95 B
`build/block-library/blocks/text-columns/style-rtl.css`	166 B
`build/block-library/blocks/text-columns/style.css`	166 B
`build/block-library/blocks/verse/style-rtl.css`	87 B
`build/block-library/blocks/verse/style.css`	87 B
`build/block-library/blocks/video/editor-rtl.css`	561 B
`build/block-library/blocks/video/editor.css`	563 B
`build/block-library/blocks/video/style-rtl.css`	159 B
`build/block-library/blocks/video/style.css`	159 B
`build/block-library/blocks/video/theme-rtl.css`	110 B
`build/block-library/blocks/video/theme.css`	110 B
`build/block-library/common-rtl.css`	1.01 kB
`build/block-library/common.css`	1 kB
`build/block-library/editor-rtl.css`	10.3 kB
`build/block-library/editor.css`	10.3 kB
`build/block-library/elements-rtl.css`	54 B
`build/block-library/elements.css`	54 B
`build/block-library/index.min.js`	184 kB
`build/block-library/reset-rtl.css`	478 B
`build/block-library/reset.css`	478 B
`build/block-library/style-rtl.css`	11.7 kB
`build/block-library/style.css`	11.7 kB
`build/block-library/theme-rtl.css`	695 B
`build/block-library/theme.css`	700 B
`build/block-serialization-default-parser/index.min.js`	1.11 kB
`build/block-serialization-spec-parser/index.min.js`	2.83 kB
`build/components/index.min.js`	230 kB
`build/components/style-rtl.css`	14 kB
`build/components/style.css`	14 kB
`build/compose/index.min.js`	11.7 kB
`build/core-data/index.min.js`	14.7 kB
`build/customize-widgets/index.min.js`	11.2 kB
`build/customize-widgets/style-rtl.css`	1.4 kB
`build/customize-widgets/style.css`	1.4 kB
`build/data-controls/index.min.js`	653 B
`build/data/index.min.js`	7.99 kB
`build/date/index.min.js`	32 kB
`build/deprecated/index.min.js`	507 B
`build/dom-ready/index.min.js`	324 B
`build/dom/index.min.js`	4.69 kB
`build/edit-navigation/index.min.js`	16 kB
`build/edit-navigation/style-rtl.css`	4.02 kB
`build/edit-navigation/style.css`	4.03 kB
`build/edit-post/classic-rtl.css`	546 B
`build/edit-post/classic.css`	547 B
`build/edit-post/index.min.js`	30.5 kB
`build/edit-post/style-rtl.css`	6.97 kB
`build/edit-post/style.css`	6.97 kB
`build/edit-site/index.min.js`	54.2 kB
`build/edit-site/style-rtl.css`	8.24 kB
`build/edit-site/style.css`	8.23 kB
`build/edit-widgets/index.min.js`	16.5 kB
`build/edit-widgets/style-rtl.css`	4.35 kB
`build/edit-widgets/style.css`	4.35 kB
`build/editor/index.min.js`	41.3 kB
`build/editor/style-rtl.css`	3.66 kB
`build/editor/style.css`	3.65 kB
`build/element/index.min.js`	4.27 kB
`build/escape-html/index.min.js`	537 B
`build/format-library/index.min.js`	6.75 kB
`build/format-library/style-rtl.css`	571 B
`build/format-library/style.css`	571 B
`build/hooks/index.min.js`	1.64 kB
`build/html-entities/index.min.js`	448 B
`build/i18n/index.min.js`	3.77 kB
`build/is-shallow-equal/index.min.js`	527 B
`build/keyboard-shortcuts/index.min.js`	1.78 kB
`build/keycodes/index.min.js`	1.38 kB
`build/list-reusable-blocks/index.min.js`	1.74 kB
`build/list-reusable-blocks/style-rtl.css`	835 B
`build/list-reusable-blocks/style.css`	835 B
`build/media-utils/index.min.js`	2.93 kB
`build/notices/index.min.js`	953 B
`build/nux/index.min.js`	2.05 kB
`build/nux/style-rtl.css`	732 B
`build/nux/style.css`	728 B
`build/plugins/index.min.js`	1.94 kB
`build/preferences-persistence/index.min.js`	2.22 kB
`build/preferences/index.min.js`	1.3 kB
`build/primitives/index.min.js`	933 B
`build/priority-queue/index.min.js`	612 B
`build/react-i18n/index.min.js`	696 B
`build/react-refresh-entry/index.min.js`	8.44 kB
`build/react-refresh-runtime/index.min.js`	7.31 kB
`build/redux-routine/index.min.js`	2.68 kB
`build/reusable-blocks/index.min.js`	2.22 kB
`build/reusable-blocks/style-rtl.css`	256 B
`build/reusable-blocks/style.css`	256 B
`build/rich-text/index.min.js`	11.1 kB
`build/server-side-render/index.min.js`	1.61 kB
`build/shortcode/index.min.js`	1.53 kB
`build/token-list/index.min.js`	644 B
`build/url/index.min.js`	3.61 kB
`build/vendors/react-dom.min.js`	38.5 kB
`build/vendors/react.min.js`	4.34 kB
`build/viewport/index.min.js`	1.08 kB
`build/warning/index.min.js`	268 B
`build/widgets/index.min.js`	7.19 kB
`build/widgets/style-rtl.css`	1.16 kB
`build/widgets/style.css`	1.16 kB
`build/wordcount/index.min.js`	1.06 kB

_{compressed-size-action}

getdave · 2020-09-09T09:57:52Z

These unit tests are failing

 npm run test-unit test/integration/blocks-raw-handling.test.js

This is because the `&nbsp;` entity is now being converted into an actual space. I believe this would probably be correct functionality but I'd like a confidence check.
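
A small sketch of why the fixtures change: a decoder that handles every entity it knows will also rewrite `&nbsp;`, while one restricted to `&amp;`, `&lt;` and `&gt;` leaves it alone. Both functions are hypothetical stand-ins with tiny lookup tables; the real decodeEntities() covers far more entities and is implemented differently.

```javascript
// Hypothetical broad decoder: knows four entities, including &nbsp;.
const BROAD = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&nbsp;': '\u00A0' };
function decodeBroadly( html ) {
	return html.replace( /&(?:amp|lt|gt|nbsp);/g, ( entity ) => BROAD[ entity ] );
}

// Hypothetical narrow decoder: limited to three entities.
const NARROW = { '&amp;': '&', '&lt;': '<', '&gt;': '>' };
function decodeNarrowly( html ) {
	return html.replace( /&(?:amp|lt|gt);/g, ( entity ) => NARROW[ entity ] );
}

const fixture = 'one&nbsp;two';

console.log( decodeBroadly( fixture ) === fixture ); // false, the entity was replaced
console.log( decodeNarrowly( fixture ) === fixture ); // true, the entity is preserved
```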

ellatrix · 2020-09-09T11:16:54Z

Is the problem specific to reusable blocks? Are there problems anywhere else (with steps to reproduce)?

getdave · 2020-10-02T12:05:20Z

@ellatrix Sorry for the delay here.

Is the problem specific to reusable blocks?

No, on master you can simply:

Add a HTML block with content 3 < 4.
Save post.
Reload browser.
See error in console.

If you add as a reusable block the same thing happens but it's worse because the Reusable Blocks are parsed in the background and then there are errors in the console because you have a Reusable block saved.

Are there problems anywhere else (with steps to reproduce)?

I'm yet to witness any. We do have integration tests failing though so let me look into those and come back to you.

getdave · 2020-10-02T12:10:54Z

Testing

Test Expectations

The serialized form of the block's content (ie: that which is persisted) should be:

<!-- wp:paragraph -->
<p>3 &lt; 4</p>
<!-- /wp:paragraph -->

The parsed form of the block (ie: that displayed in the editor) should be:

3 < 4

Testing results for other blocks

Paragraph block - no errors.
Classic block - no errors.
HTML block - errors (as described above).

getdave · 2021-09-14T16:03:30Z

Update: I've altered the PR to only convert the specific HTML entities that are returned by Element.innerHTML, namely:

"&amp;",
"&lt;"
"&gt;"

There's a simple lookup which converts them to their correct form.

This preserves existing functionality (all unit tests should now pass - we'll see!) and also fixes the test cases shown above.

This works because, now that all of those entities have been converted, the parsed block content matches the saved block content, and so block validation passes.

The best thing is that it also fixes another bug with & being converted into &amp; in HTML block preview mode.

I'll write this up, add test cases and unit tests and then ask for code review.

getdave · 2021-09-15T08:19:46Z

I've just noticed this PR introduces a regression with the core/code block. Annoyingly it has the exact inverse behaviour of the core/html block in that it escapes the code input before it's saved to the database.

gutenberg/packages/block-library/src/code/save.js

Line 16 in d575061

value={ escape( attributes.content ) }

My understanding is that it's best to sanitise data on output (rendering) not on input (saving to DB). I'll dig deeper shortly...

getdave · 2022-02-03T09:26:08Z

@dmsnell I think you've come across something similar recently?

getdave · 2022-07-21T09:57:07Z

It looks like the replaceInnerHTMLEntities function does successfully convert entities in the core/code block's content. However, that is not what we want in this context: if we allow the content entered into the Code block to be parsed as HTML, it will not be displayed in the editor.

The reason core/code escapes its content on save

gutenberg/packages/block-library/src/code/save.js

Line 16 in d575061

value={ escape( attributes.content ) }

...is precisely to avoid it being rendered as HTML in order that it can display in raw form.
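The conflict can be sketched as follows. Both helpers are hypothetical, simplified stand-ins: `escapeOnSave` only approximates what the Code block's `escape()` does (the real utility may escape more than these three characters), and `decodeOnParse` stands in for the entity replacement this PR adds to the html matcher.

```javascript
// Assumed, simplified mappings for illustration only.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
const UNESCAPES = { '&amp;': '&', '&lt;': '<', '&gt;': '>' };

// Stand-in for the Code block escaping its content on save.
function escapeOnSave( content ) {
	return content.replace( /[&<>]/g, ( char ) => ESCAPES[ char ] );
}

// Stand-in for the matcher decoding entities on parse.
function decodeOnParse( html ) {
	return html.replace( /&(?:amp|lt|gt);/g, ( entity ) => UNESCAPES[ entity ] );
}

const typed = '<strong>bold</strong>'; // what the user typed into the Code block
const saved = escapeOnSave( typed ); // safe to store: shown as raw text
const parsed = decodeOnParse( saved ); // live markup again after parsing

console.log( saved ); // &lt;strong&gt;bold&lt;/strong&gt;
console.log( parsed === typed ); // true
```

Because the decode step exactly undoes the escape, the Code block would receive live markup rather than escaped text, which is the regression described above.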

ellatrix · 2022-07-21T12:34:00Z

packages/blocks/src/api/matchers.js

@@ -38,6 +52,6 @@ export function html( selector, multilineTag ) {
 			return value;
 		}

-		return match.innerHTML;
+		return replaceInnerHTMLEntities( match.innerHTML );


I don't think we can change this behaviour at this point. That's why I added a new type. Otherwise we could just return the raw html value here instead?

I played with this a bit more today. You're correct: we cannot change the behaviour. It even breaks Core blocks, let alone any third-party blocks.

I'm now more convinced than ever that having a raw source type such as you propose is the best solution.

getdave · 2022-07-21T13:18:50Z

Closing this based on discussion here.

IMHO the proposal to add a new raw source type is the best option.

getdave self-assigned this Sep 7, 2020

getdave requested a review from ellatrix September 7, 2020 16:32

getdave added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Package] Blocks /packages/blocks [Type] Bug An existing feature does not function as intended labels Sep 7, 2020

getdave marked this pull request as ready for review September 7, 2020 16:34

getdave requested a review from youknowriad as a code owner September 7, 2020 16:34

getdave added the Needs Technical Feedback Needs testing from a developer perspective. label Sep 7, 2020

getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 0cdd361 to bbc2590 Compare October 2, 2020 13:18

getdave mentioned this pull request Nov 5, 2020

Pasted HTML code inside the core code block is formatted/parsed #26689

Closed

ellatrix mentioned this pull request Nov 25, 2020

HTML block: fix parsing #27268

Merged

6 tasks

Base automatically changed from master to trunk March 1, 2021 15:44

getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from bbc2590 to 12f0370 Compare September 14, 2021 14:25

getdave requested a review from dmsnell February 3, 2022 09:26

getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 056b585 to 8261b03 Compare April 26, 2022 15:17

getdave added 2 commits July 21, 2022 10:33

Decode any entities within the innerHTML

01e5f83

Convert subset of HTML entities

bc5b16d

getdave force-pushed the fix/decode-html-entities-encoded-by-innerHTML branch from 8261b03 to bc5b16d Compare July 21, 2022 09:33

ellatrix reviewed Jul 21, 2022

getdave closed this Jul 21, 2022

	let block = createBlock(
		name,
		getBlockAttributes( blockType, innerHTML, attributes ),
		innerBlocks
	);

	const blockAttributes = mapValues(
		blockType.attributes,
		( attributeSchema, attributeKey ) => {
			return getBlockAttribute(
				attributeKey,
				attributeSchema,
				innerHTML,
				attributes
			);
		}
	);

	export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
		return hpqParse( innerHTML, matcherFromSource( attributeSchema ) );
	}

	case 'html':
		return html( sourceConfig.selector, sourceConfig.multiline );

	const isValid = isEquivalentHTML(
		originalBlockContent,
		generatedBlockContent,
		logger
	);
