Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Add new list of HTML fragments to parse output #11334

Merged
merged 8 commits into from
Nov 7, 2018

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Oct 31, 2018

Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source post_content after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

Abstract

Add new innerContent property to each block in parser output indicating where in the innerHTML each innerBlock was found.

Status

  • will update fixtures after design review indicates this is the desired approach
  • all parsers passing new tests for fragment behavior

Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in post_content in place as normal blocks which exist in between another block's comment delimiters.

<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original post_content after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the post_content and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a list of HTML fragments with null values interspersed between those fragments where the blocks were found.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\n and my other\n", null, "\n" ],
}

Doing this allows us to replace those null values with their associated block (sequentially) from innerBlocks.

Questions

  • Why not use a string token instead of an array?

  • Why add the null instead of leaving basic array splits like [ 'before', 'after' ]?

    • By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings '' and say that blocks exist only between array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.
  • Why add a new property?

    • Code already depends on innerHTML and innerBlocks; I don't want to break any existing behaviors and adding is less risky than changing.

@dmsnell dmsnell added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Oct 31, 2018
@dmsnell dmsnell requested review from mcsf and aduth October 31, 2018 21:14
@dmsnell dmsnell force-pushed the parser/add-inner-content branch from 6c62b3f to 7a39478 Compare October 31, 2018 21:18
@aduth
Copy link
Member

aduth commented Nov 2, 2018

To clarify, a point of having the null is so that we could potentially make sense of:

Input

<!-- wp:block -->before<!-- wp:void /-->between<!-- wp:void /-->after<!-- /wp:block -->

Output

innerContent: [ 'before', null, 'between', null, 'after' ]

Or even:

Input

<!-- wp:block -->before<!-- wp:void /--><!-- wp:void /-->between<!-- wp:void /-->after<!-- /wp:block -->

Output

innerContent: [ 'before', null, null, 'between', null, 'after' ]

?

Not speaking to any practical implications of the parser implementation, but the editor implementation of nested blocks was very much based on the assumption that inner blocks occur at most once in a single sequence order, so it is not aware of (and could not easily support) interspersed inner blocks.

This can be seen in the documentation of the InnerBlocks component:

Note: A block can render at most a single InnerBlocks and InnerBlocks.Content element in edit and save respectively. To create distinct arrangements of nested blocks, create a separate block type which renders its own InnerBlocks and assign as the sole allowedBlocks type.

https://github.com/WordPress/gutenberg/blob/master/packages/editor/src/components/inner-blocks/README.md

As far as the parser is concerned, I could grant that it not care to be tailored to this editor implementation detail, even going so far as to applaud that it could support some future editor enhancement toward this goal. All the same, it's a misalignment worth pointing out, and I haven't really seen compelling evidence that the InnerBlocks limitations provide an insufficient base upon which to build nested blocks (where more advanced cases can create nested arrangements, e.g. Columns -> Column).

In all then, and even in acknowledging some redundancy between innerContent and innerHTML (where the latter is effectively innerContent.filter( Boolean ).join( '' )?), this seems an agreeable direction.

The editor today would have to assume that the first and last strings are the content surrounding InnerBlocks, and thus disregard any non-block members between (or perhaps parse as of type "fallback"?).

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are test failures (which, at a glance, appear mostly to do with mere addition of the property where relevant) but conceptually and code-wise this would have my approval.

@dmsnell
Copy link
Member Author

dmsnell commented Nov 2, 2018

Thanks @aduth - to summarize a conversation we had about having one location vs having each inner block location marked: I'm not going to enforce any behaviors through the parser. Even though this change means that the parser is capable of handling the case where inner blocks are interspersed with HTML it doesn't mean they have to. Having separate locations is the "default" behavior of the code whereas forcing a single location for the list of inner blocks would require more special-casing and complexity in the parser. I'd like to leave this simpler version in especially since it gives us the options later to support broader innerBlock behaviors.

@dmsnell dmsnell force-pushed the parser/add-inner-content branch 2 times, most recently from 327f01c to 857b02f Compare November 3, 2018 14:38
@dmsnell dmsnell changed the title WIP: Parser: Add new list of HTML fragments to parse output Parser: Add new list of HTML fragments to parse output Nov 3, 2018
Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some merge conflicts to resolve.

}
],
"innerHTML": "\n\t<div class=\"wp-block-column\">\n\t\t\n\t\t\n\t</div>\n\t"
"innerHTML": "\n\t<div class=\"wp-block-column\">\n\t\t\n\t\t\n\t</div>\n\t",
"innerContent": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

"innerContent": [
"\n\t<div class=\"wp-block-column\">\n\t\t",
null,
"\n\t\t",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think if the serializer behavior of newline-delimited blocks could have an impact in the ability to re-serialize, but I'm assuming the editor will disregard this as far as parsing/rehydrating the top-level columns block (which does and will continue to operate on innerHTML?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. we may have to take some additional care to make sure that the serializer matches behavior with the parser, but since it's been working so far I'm guessing it will continue to work for a while.

@dmsnell dmsnell force-pushed the parser/add-inner-content branch from 9a584f6 to 84ba8d5 Compare November 5, 2018 16:28
@dmsnell
Copy link
Member Author

dmsnell commented Nov 5, 2018

rebased

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Nice work here.

$this->blockName = $name;
$this->attrs = $attrs;
$this->innerBlocks = $innerBlocks;
$this->innerHTML = $innerHTML;
$this->innerContent = $innerContent;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New assignment introduces misalignment on the =.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will follow-up in a new PR, thanks for spotting this!

@dmsnell
Copy link
Member Author

dmsnell commented Nov 7, 2018

A little slower all around but tolerably so given that we're expanding the functionality.

comparator-ywayevznec now sh_ 3

@dmsnell dmsnell merged commit 1014389 into master Nov 7, 2018
@dmsnell dmsnell deleted the parser/add-inner-content branch November 7, 2018 01:58
@enejb
Copy link
Contributor

enejb commented Nov 8, 2018

❤️

@mtias mtias added this to the 4.3 milestone Nov 12, 2018
@@ -5,12 +5,14 @@
"align": "right"
},
"innerBlocks": [],
"innerHTML": "\n<figure class=\"wp-block-audio alignright\">\n <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n"
"innerHTML": "\n<figure class=\"wp-block-audio alignright\">\n <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n",
"innerContent": [ "\n<figure class=\"wp-block-audio alignright\">\n <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that these were manually edited?

It's meant to be generated via npm run fixtures:regenerate, which on master is producing local changes because of stylistic differences of the generated result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's right @aduth - I was unaware of fixtures:regenerate when I wrote this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants