Parser: Propose new hand-coded parser #8083
dd4409a to 4191994
478b27a to 24977fc
Noice! Let's get this in sooner rather than later, so we can make inroads on the things depending on having a faster parser. 🙂
I've left some comments, here are a few random notes that have occurred to me, as well:
- It feels a little weird to be putting the PHP parser on NPM, but we don't really use Packagist at all, sooo... 🤷‍♂️ Let's stick with NPM for now, we can potentially explore doing Packagist/composer things later.
- `phpcs.xml.dist` needs to be updated to scan the new PHP code. I mentioned a couple of coding standards issues in the comments, but PHPCS should pick up the rest.
- Combined with switching the parser in `gutenberg_parse_blocks()`, `phpunit/class-parsing-test.php` should be updated to use `gutenberg_parse_blocks()`, rather than `Gutenberg_PEG_Parser`.
- With this performance improvement, it seems like we could change `do_blocks()` to parse the content, instead of using the dynamic blocks regex.
@@ -0,0 +1,107 @@
# Block Serialization Default Parser

This library contains the default block serialization parser implementations for
You'll need to remove the manual line breaks from the README: we use the Jetpack Markdown parser, which adds a `<br/>` for single line breaks.
this makes me want to cry since it's something I love about markdown and consistent among every other markdown parser I've used.
> The implication of the “one or more consecutive lines of text” rule is that Markdown supports “hard-wrapped” text paragraphs. This differs significantly from most other text-to-HTML formatters (including Movable Type’s “Convert Line Breaks” option) which translate every line break character in a paragraph into a `<br />` tag.
> When you do want to insert a `<br />` break tag using Markdown, you end a line with two or more spaces, then type return.
> Yes, this takes a tad more effort to create a `<br />`, but a simplistic “every line break is a `<br />`” rule wouldn’t work for Markdown. Markdown’s email-style blockquoting and multi-paragraph list items work best — and look better — when you format them with hard breaks.
https://daringfireball.net/projects/markdown/syntax#p
nonetheless, I have destroyed my markdown to make it happy in ee72314cc 😢
@@ -0,0 +1,260 @@
<?php

function bsdp_parse($document ) {
Instead of adding a new `_parse()` function, can `gutenberg_parse_blocks()` be updated to use the new parser? We can add a filter in there for easier switching between classes: eg, existing filters in Core that filter a Class name: `wp_rest_server_class`, `customize_dynamic_setting_class`.
`block_parser_class` works for me.
see related comment response below.
I'm having some trouble understanding what you wrote @pento. I hope we create a filter to select the parsing function, but won't that depend somewhat on having unique names for each possible parse function?
also, are `wp_rest_server_class` and `customize_dynamic_setting_class` in any way related here? are you suggesting we create a class interface for the block parser class?
in `lib/block.php` I had originally envisioned something like this…
$parser = apply_filters( 'block_parser_class', 'bsdp_parse' );
call_user_func( $parser, $post_content );
I guess you are recommending this instead?
$parser_class = apply_filters( 'block_parser_class', 'bsdp' );
$parser = new $parser_class();
$parser->parse( $post_content );
experimented in 064efa58d but I haven't tested it yet
for what it's worth I'd be more comfortable getting this parser in first before making the parser system pluggable just because of the scope of the changes
static $parser;

if ( ! isset( $parser ) ) {
	$parser = new BSDP_Parser();
I'm not wild about the `BSDP_` prefix. I get why it's there, but perhaps it could be a little more descriptive?
Agreed. `Block_Parser()`?
mainly this is there to prevent namespace collisions. my hope is that a few PRs after this we'll have a filter choose the parser and obviously if we create two or more `Block_Parser()` classes we'll run into conflicts.
any thoughts on that? even with an encapsulating class we run into some issues here because I don't think we can create a class within a class. the only other way around it, I think, is actual namespacing, which isn't supported on older PHP versions…
Realistically, is there going to be a completely new parser appear between now and 5.0? It seems like this parser is going to be the one that will go into Core.
If that's the case, we should just use a generic name. `WP_Block_Parser` will fit into the WordPress naming scheme.
switch ( $token_type ) {
	case 'no-more-tokens':
		# if not in a block then flush output
Need to use `//` for single inline comments.
double-slashed it in ee72314cc
	return false;
}

# Otherwise we have a problem
Block inline comments should be in the form:
/*
* blah
*
* - foo
* - bar
*/
exploded comments in ee72314cc
# Block Serialization Default Parser

This library contains the default block serialization parser implementations for
WordPress documents. It provides native PHP and Javascript parsers that implement
s/Javascript/JavaScript/ 🙂
substituted in ee72314cc
e246b11 to 7cf7971
@@ -0,0 +1,25 @@
{
	"name": "@wordpress/block-serialization-default-parser",
	"version": "1.0.0",
I would put `1.0.0-rc.0` or something like that to allow Lerna to do its job - it always bumps version so it would try to do `1.0.1` release otherwise ...
campaigned for release in 8c7e42c
@@ -88,6 +88,7 @@ const gutenbergPackages = [
	'autop',
	'blob',
	'blocks',
	'block-serialization-default-parser',
	'block-serialization-spec-parser',
Can we stop bundling the other one if we don't use it in Gutenberg anymore?
a good question. I don't want to kill the PEG parser since that maintains the spec in a way no hand-written implementation can.
in my comparator PRs I'm trying to move towards a system that will automatically run the implementations against the specification in something like a CI job so that we can have our formal specification without worrying about the implementation diverging (for example, if someone makes a change to the implementation without changing the spec first)
that is, I think we want to keep the `spec-parser` wherever we need it - mainly I think we want to strip it from the default load of Gutenberg, but as for whether we build it, what do you think?
The package with transpiled code is going to be there anyway. It's really up to you and how you want to use it. If you are fine with referencing it as a regular npm package then you don't need it. If you want to consume it as part of e2e test or something which requires all Gutenberg build files then you can leave it as is. I just wanted to raise the awareness.
thanks - this is mainly just out of my expertise at this point. if you are willing to make a decision on it or can tell me what we should do then that would help me out.
it seems like several people want these parser tests to be written with `jest` and somehow in the normal suite - I don't know what that means here for this decision
lib/client-assets.php
Outdated
@@ -369,6 +376,7 @@ function gutenberg_register_scripts_and_styles() {
	array(
		'wp-autop',
		'wp-blob',
		'wp-block-serialization-default-parser',
		'wp-block-serialization-spec-parser',
I think we no longer need to list `wp-block-serialization-spec-parser` as a dependency. In addition, we should stop registering it, too.
agreed on this one but I wasn't entirely sure how we wanted this to work…
do we want Gutenberg to automatically replace the spec parser with the "default" one at boot through a filter or do we want the "default" to be the default?
I want the auto-generated parser to be available still, especially for things like diagnostics and exploration.
As commented above, it all depends on the way you want to use it. I don't have any strong opinions about it. We should just ensure we don't ship unused code to the end users.
Do we have a decision here?
I left the spec parser registered but un-enqueued it in 66455b4
lib/blocks.php
Outdated
 *
 * @param string $parser_class Name of block parser class
 */
$parser_class = apply_filters( 'block_parser_class', 'BDSP_Parser' );
We should document it in the extensibility docs. Probably, the main document would be the best fit: https://github.com/WordPress/gutenberg/blob/master/docs/extensibility.md.
documented in 8c7e42c
This still reads `BDSP`. :)
another great catch - fixed in 66455b4
@@ -378,6 +378,6 @@ const createParse = ( parseImplementation ) =>
  *
  * @return {Array} Block list.
  */
-export const parseWithGrammar = createParse( grammarParse );
+export const parseWithGrammar = createParse( defaultParse );
Should we offer a filter for JS implementation, too?
yes but I wasn't sure if this PR was the right one for it. that is, filtering out the PHP side seemed somewhat straightforward while filtering the JS side seemed more complicated since we have to take into account things like loading the parser bundles and making sure they are available before the editor loads
do you think we need to do it all here in this PR?
It's totally fine as its own PR, I just wanted to ensure we tackle both PHP and JS side of things.
6f4be14 to 07ffe45
docs/extensibility/parser.md
Outdated
	return 'EmptyParser';
}

add_filter( 'block_parser_class', select_empty_parser, 10, 1 );
I think we provide the name of the function as a string in other examples to ensure it works with PHP 5.2. We might also want to prefix the function name with the plugin name:
add_filter( 'block_parser_class', 'my_plugin_select_empty_parser', 10, 1 );
good catch! I never meant to leave out the string - just neglected it - updated in 96ecfb8
8c7e42c looks great, I left one comment which is a tiny thing that affects only PHP 5.2...
}

function bdsp_select_parser( $prev_parse_class ) {
	return 'BSDP_Parser';
There's a typo at BSDP. Anyway, given that the `apply_filters` call in `gutenberg_parse_blocks` defaults to `'BDSP_Parser'`, we should remove this bit.
good catch! I removed the function in 9c85a60
I'm getting a tokenization bug while testing with a personal post. Digging…
const namespace = namespaceMatch || 'core/';
const name = namespace + nameMatch;
const hasAttrs = !! attrsMatch;
const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;
I know there's a performance hit with `try`, but we should play it safe with `JSON.parse`, or generally speaking make sure we can inform the user of bad input and recover (e.g. isolate bad blocks) as best as possible. Thoughts, @dmsnell?
There's no longer the famed V8 deoptimization with `try` / `catch`
https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#2-unsupported-syntax
v8/v8@9aac80f
added the `try` in 9c85a60 but left it out of the PHP since in PHP it already returns `null` on a failed parse
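A minimal sketch of the guarded parse discussed above. The helper name `parseAttrs` is hypothetical and does not come from the PR; it only illustrates returning `null` on malformed attribute JSON so the JavaScript side behaves like the PHP side is described to behave.

```javascript
// Hypothetical helper (not from the PR): parse a block's attribute JSON
// without letting a syntax error escape to the caller.
function parseAttrs( json ) {
	try {
		return JSON.parse( json );
	} catch ( e ) {
		// Malformed attributes yield null instead of an exception.
		return null;
	}
}

console.log( parseAttrs( '{"ref":313}' ) );
console.log( parseAttrs( '{"ref":' ) );
```

Returning `null` keeps one bad block from aborting the whole parse, at the cost of hiding the error from the caller; whether that trade-off is right is what the thread is debating.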
@dmsnell: I've pushed a failing test for the parser. The gist of it is that I think the tokenizer is too greedy when looking for the end of an attributes group ( <!-- wp:block {"ref":313} /-->
<!-- wp:block {"ref":482} /--> This makes the parser throw a syntax error in the
We should guarantee handling of any bad JSON here, but that's not the real issue. The issue is in the tokenizer, as the following fragment was returned as a match for
Note that, in contrast, the following input is correctly parsed:
I used the following debugger patch:
diff --git a/packages/block-serialization-default-parser/src/index.js b/packages/block-serialization-default-parser/src/index.js
index 9c1983f22..007edd2b5 100644
--- a/packages/block-serialization-default-parser/src/index.js
+++ b/packages/block-serialization-default-parser/src/index.js
@@ -172,7 +172,7 @@ function nextToken() {
const namespace = namespaceMatch || 'core/';
const name = namespace + nameMatch;
const hasAttrs = !! attrsMatch;
- const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;
+ const attrs = hasAttrs ? safeParse( attrsMatch ) : null;
// This state isn't allowed
// This is an error
@@ -192,6 +192,17 @@ function nextToken() {
return [ 'block-opener', name, attrs, startedAt, length ];
}
+function safeParse( json ) {
+ let r;
+ try {
+ r = JSON.parse( json );
+ } catch ( e ) {
+ console.error( `Input of length ${ json.length }`, json );
+ throw e;
+ }
+ return r;
+}
+
function addFreeform( rawLength ) {
const length = rawLength ? rawLength : document.length - offset;
a2dae1e to c154286
c154286 to 138614d
excellent find @mcsf! you are right - I let in a greedy match when I had no reason to! that's been taken out by the un-greedy modifier added in 9c85a60
also I rebased the branch
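To illustrate the greedy match described above, here is a small sketch using two simplified patterns. Both expressions are assumptions written for this example (the real tokenizer's pattern is more involved); they differ only in the quantifier inside the attribute group.

```javascript
// Simplified, hypothetical delimiter patterns; only the quantifier differs.
const greedy = /<!--\s+wp:([a-z-]+)\s+(\{[\s\S]*\})\s+\/?-->/;
const lazy = /<!--\s+wp:([a-z-]+)\s+(\{[\s\S]*?\})\s+\/?-->/;

// The two adjacent void blocks from the failing test.
const input = '<!-- wp:block {"ref":313} /-->\n<!-- wp:block {"ref":482} /-->';

// Greedy: the attribute group runs on to the last closing brace in the input,
// swallowing the end of the first delimiter and the start of the second.
const greedyAttrs = greedy.exec( input )[ 2 ];

// Lazy: the attribute group stops at the first brace that can end the delimiter.
const lazyAttrs = lazy.exec( input )[ 2 ];

console.log( greedyAttrs );
console.log( lazyAttrs );
```

The greedy capture is not valid JSON, which is why `JSON.parse` threw in the failing test, while the lazy capture is just the first block's attributes.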
Concerning the requiring of the PHP implementation, #9791 needs investigating.
Potential regression noted at #9968
* Parser (Fix): Output freeform content before void blocks
Resolves #9968
It was noted that a classic block preceding a void block would disappear in the editor while if that same classic block preceded the long-form non-void representation of an empty block then things would load as expected.
This behavior was determined to originate in the new default parser in #8083 and the bug was that with void blocks we weren't sending any preceding HTML soup/freeform content into the output list. In this patch I've duplicated some code from the block-closing function of the parser to spit out this content when a void block is at the top-level of the document.
This bug did not appear when void blocks are nested because it's the parent block that eats HTML soup. In the case of the top-level void however we were immediately pushing that void block to the output list and neglecting the freeform HTML.
I've added a few tests to verify and demonstrate this behavior. Actually, since I wasn't sure what was wrong I wrote the tests first to try and understand the behaviors and bugs. There are a few tests that are thus not entirely essential but worthwhile to have in here.
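The fix described above can be sketched with a deliberately tiny parser that only understands top-level void blocks. Everything here is an assumption made for illustration: the function names, the simplified pattern, and the output shape are not taken from the patch. The point is only the ordering: flush any preceding freeform HTML before pushing the void block.

```javascript
// Hypothetical output shape for freeform (classic) content.
function freeform( html ) {
	return { blockName: null, attrs: {}, innerBlocks: [], innerHTML: html };
}

// Hypothetical, simplified parser: recognizes only top-level void blocks.
function parseTopLevel( document ) {
	const voidBlock = /<!--\s+wp:([a-z][a-z0-9\/-]*)\s+(?:(\{[\s\S]*?\})\s+)?\/-->/g;
	const output = [];
	let offset = 0;
	let match;

	while ( ( match = voidBlock.exec( document ) ) !== null ) {
		// The fix: emit HTML that precedes the void block before the block itself.
		if ( match.index > offset ) {
			output.push( freeform( document.slice( offset, match.index ) ) );
		}

		const name = match[ 1 ].includes( '/' ) ? match[ 1 ] : 'core/' + match[ 1 ];
		output.push( {
			blockName: name,
			attrs: match[ 2 ] ? JSON.parse( match[ 2 ] ) : {},
			innerBlocks: [],
			innerHTML: '',
		} );
		offset = match.index + match[ 0 ].length;
	}

	// Any trailing HTML after the last block is freeform too.
	if ( offset < document.length ) {
		output.push( freeform( document.slice( offset ) ) );
	}

	return output;
}

console.log( parseTopLevel( '<p>classic</p><!-- wp:spacer /-->' ) );
```

Skipping the `match.index > offset` branch reproduces the reported symptom in this sketch: the classic content before the void block never reaches the output list.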
I hadn't realized this before — as my primary testing interface was the WP API (gist), through which everything is serialized into the same shape — but I now fear that we're not providing a consistent interface with the parser in its current state.
See my inline comments. Consumers of `gutenberg_parse_blocks` may make mistakes because of these discrepancies, and I fear they may already have: #10041.
cc @dmsnell
if ( isset( $stack_top->leading_html_start ) ) {
	$this->output[] = array(
		'attrs' => array(),
good call - I know there are some lingering inconsistencies too around `null` vs. `{}` in the spec grammar. a good follow-up PR that's been on my TODO list
 * @since 3.8.0
 * @var WP_Block_Parser_Block[]
 */
public $output;
I'm concerned about this promise that `$output` is an array of `WP_Block_Parser_Block`, since freeform fragments are added as [associative] arrays and not class instances.
we can definitely consider wiping the output clean of its classes - I didn't at first because it seemed benign to retain them, but if we sacrifice a little performance we can `json_decode( json_encode( $output ) )` and clear it up
@jorgefilipecosta mentioned implementing an `ArrayObject` interface in our classes so that one can traverse our parser output natively, rather than doing the JSON dance. What do you think?
That's a good question. It means more divide between the PHP and JS versions of the parser. What's the JSON dance? Wouldn't having `ArrayObject` be somewhat superfluous?
// this already works with arrays and objects!
$blocks = parse( $document );
$blocks = array_map( $my_transformer, $blocks );
we probably want to fix the bug as a separate thing from adding interfaces. I'm skeptical of the value of the latter if the former is resolved.
By JSON dance I meant `json_decode( json_encode( $output ) )`, sorry for not being clear.
> we probably want to fix the bug as a separate thing from adding interfaces
So this is the actual issue: #10047. It's not the traversal (looking at your `array_map` example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.
Classes offer some advantages: we can publish abstract classes that contain the fields plugins can safely access, and other parsers can extend these general classes. Simple arrays don't offer these guarantees.
But now we have a problem: some plugins depend on using simple arrays, and even though this bug was already caught I'm not sure we can change the API to use classes.
So I think our options are to revert and use arrays, or to advance and change our API to use classes. In the second case, to stay backward compatible with existing implementations that access blocks using the array syntax, I think our only solution is `ArrayObject`. It allows us to temporarily return something that behaves like a class for new implementations and like an array for old implementations. In this case, we would add deprecation messages saying we now return objects.
> It's not the traversal (looking at your array_map example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.
to me this is just evidence that the work to make all attribute reporting consistent is necessary. some attributes are `null`, some are objects
> By JSON dance I meant json_decode( json_encode( $output ) ), sorry for not being clear.
that would be in the parser and wouldn't have to be manually performed. in fact, the classes are only even there for performance, so we can test the change of storing everything in plain old objects vs. converting at the end. if it's a degradation then we can simply remove the classes if we want to preserve the simpler interface.
There are numerous needs to process posts and block content from its structured form without demanding that plugin authors implement their own parsing systems. Since the new default parser was implemented in #8083 the server-side parse is now fast enough to consider doing full parses of our documents and with that brings the idea that we can filter block content from the parser itself.
In this patch I'm exploring an API to allow extending the parser's behavior by post-processing blocks as they enter the parser's output array. This new filter gives the ability to transform all of the block's properties as they finish parsing. In the case of inner blocks the filter runs as the inner blocks have finished their own nesting. In the case of top-level blocks the filter runs after all inner content has finished parsing.
One use case is in #8760 where we want to replace the HTML parts of blocks while preserving other structure. Another use case could be removing specific inner blocks or content based on the current user requesting a post. This filter exposes a kind of visitor pattern for the nested parse.
> **THIS IS AN INCOMPLETE PATCH DO NOT MERGE**
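The ordering described in that proposal (inner blocks are filtered before the block that contains them) can be sketched as a post-order walk. This is only an illustration under assumptions: `filterBlocks` is a hypothetical name, and it walks an already-parsed tree rather than hooking into the parser as the incomplete patch intends.

```javascript
// Hypothetical sketch: apply `filter` to every block, innermost blocks first,
// so a parent is only filtered once all of its inner blocks have been.
function filterBlocks( blocks, filter ) {
	return blocks.map( ( block ) => {
		const innerBlocks = filterBlocks( block.innerBlocks, filter );
		return filter( { ...block, innerBlocks } );
	} );
}

// Usage: record the order in which blocks are visited.
const visited = [];
const tree = [
	{
		blockName: 'core/columns',
		attrs: {},
		innerHTML: '',
		innerBlocks: [
			{ blockName: 'core/paragraph', attrs: {}, innerHTML: '<p>hi</p>', innerBlocks: [] },
		],
	},
];

const filtered = filterBlocks( tree, ( block ) => {
	visited.push( block.blockName );
	return block;
} );

console.log( visited );
```

Because the filter receives a copy with its inner blocks already processed, it can rewrite `innerHTML` or drop inner blocks without mutating the original tree.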
* Parser: Normalize data types and fix default implementation Resolves #10041 Resolves #10047 A few inconsistencies have remained in the grammar specification concerning freeform blocks and blocks without attributes in the block delimiters. Freeform blocks were returned without block names, and blocks without attributes returned `null` instead of an empty set of attributes. Further, the default parser implementation (from #8083) was returning an array of block objects instead of an array of generic arrays. This resulted in mismatches in PHP between accessing properties with `$block[ 'attrs' ]` syntax and with `$block->attrs` syntax. In this patch I've updated the specification to remove all of the type ambiguity and have updated the default parser to match it. After this patch every block should be accessible as a normal array in PHP and have all properties: `blockName`, `attrs`, `innerBlocks`, and `innerHTML`. If no attributes are specified then `attrs` will be an empty set (in JavaScript `{}` and in PHP `array()`).
Previously we have been using a simplified parse to grab dynamic blocks and replace them with their rendered content. Since #8083 we've had a fast default parser which removes the need for a simplified parse here. In this patch we're replacing the existing simplified parser in `do_blocks` with the new default parser. This will open up new opportunities for working with nested blocks on the server.
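As a rough illustration of what rendering from a full parse looks like, the JavaScript sketch below walks parsed top-level blocks and swaps in rendered output for any block that has a registered renderer. It is not the `do_blocks` implementation: `renderBlocks`, the `renderers` map, and the `example/latest-posts` block are hypothetical, and nested blocks are deliberately ignored to keep the sketch short.

```javascript
// Hypothetical sketch of rendering from a full parse. `renderers` maps
// a block name to a function that returns HTML for that block.
function renderBlocks(blocks, renderers) {
  return blocks
    .map((block) => {
      const render = renderers.get(block.blockName);
      // Dynamic blocks are replaced by their rendered output;
      // everything else passes through as its saved HTML.
      return typeof render === 'function'
        ? render(block.attrs, block.innerHTML)
        : block.innerHTML;
    })
    .join('');
}

// Usage with a made-up dynamic block.
const renderers = new Map([
  ['example/latest-posts', (attrs) => `<ul data-count="${attrs.count}"></ul>`],
]);
const html = renderBlocks(
  [
    { blockName: 'core/paragraph', attrs: {}, innerBlocks: [], innerHTML: '<p>Intro</p>' },
    { blockName: 'example/latest-posts', attrs: { count: 3 }, innerBlocks: [], innerHTML: '' },
  ],
  renderers
);
```

Working from the parsed tree rather than a regex match means the renderer receives decoded attributes directly, and the same walk could later recurse into `innerBlocks`.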
Since the introduction of the default parser in #8083 we have had a subtle bug in the parsing which failed when empty attributes were specified in a block's comment delimiter - `{}`. The absence of attributes was fine but _empty_ attributes were a failure. This is due to using `+?` in the RegExp tokenizer instead of using `*?` (which allows for no inner content in the JSON string). This patch updates the quantifier to restore functionality and fix the bug. This didn't appear in practice because we don't intentionally set `{}` as the attributes - the serializer drops it altogether, and our tests didn't catch it for similar reasons.
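The effect of the quantifier can be reproduced with a cut-down pattern. The two expressions below are simplified stand-ins for the tokenizer's RegExp (an assumption: they only handle un-namespaced opening delimiters) and differ solely in `+?` versus `*?` for the characters between the braces.

```javascript
// Simplified stand-ins for the tokenizer pattern; only the quantifier
// applied to the content between `{` and `}` differs.
const withPlus = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->)[\s\S])+?\}\s+)?-->/;
const withStar = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->)[\s\S])*?\}\s+)?-->/;

const noAttrs = '<!-- wp:paragraph -->';
const emptyAttrs = '<!-- wp:paragraph {} -->';
const someAttrs = '<!-- wp:paragraph {"align":"center"} -->';

// `+?` demands at least one character inside the braces, so the
// attribute group cannot match `{}` and the whole delimiter is missed.
console.log(withPlus.test(noAttrs), withPlus.test(emptyAttrs), withPlus.test(someAttrs));

// `*?` allows zero characters inside the braces, so all three match.
console.log(withStar.test(noAttrs), withStar.test(emptyAttrs), withStar.test(someAttrs));
```

The attribute group is optional, which is why a delimiter with no attributes at all matched under both versions and only the literal `{}` case slipped through.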
For some time we've needed a more performant PHP parser for the first stage of parsing the `post_content` document.

See #1681 (early exploration)
See #8044 (parser performance issue)
See #1775 (parser performance, fixed in php-pegjs)
I'm proposing this implementation of the spec parser as an alternative
to the auto-generated parser from the PEG definition.
Updates
`/packages` directory - I still need some help understanding where it all belongs and how to make the package work

This provides a setup fixture for #6831 wherein we are testing alternate parser implementations - https://comparator-yizlfvqafz.now.sh
Distinctives
Todo
`innerHTML` this needs to go away
Benchmark
For posterity's sake I ran the merged parser through the parser comparator and compared it against the auto-generated spec parser. Here are the results from my laptop.
The tests were done on my late 2013 rMBP quad core 2.6 GHz laptop. According to the Intel Power Gadget the CPU was running at 3.6 GHz the entire time. Each document was parsed with each parser at least 47 times. The runs were ordered at random, and each run was randomly assigned to parse the document between one and five times in a row before returning the results. Runtime and memory use were measured inside a runner script running in Docker as described in the parser comparator.