Blocks: Add functions to return PCRE pattern (regex) for finding blocks.
This patch introduces two functions that return a PCRE pattern for
quickly finding blocks within an HTML document, without parsing the
entire document and without building a full block tree.
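
As a rough illustration of finding a block without building a tree, the sketch below locates the first image block in a document and decodes the attributes in its delimiter. It is written to run outside WordPress, so `example_image_delimiter_regex()` is a hypothetical stand-in holding the pattern that `get_named_block_delimiter_regex( 'core/image' )` is meant to return. `example_find_first_image_block()` and the sample markup are also assumptions made for illustration, not part of this patch.

```php
<?php
// Hypothetical stand-in for get_named_block_delimiter_regex( 'core/image' ),
// copied from the pattern in this patch so the sketch runs without WordPress.
function example_image_delimiter_regex(): string {
	return <<<'REGEXP'
~
<!--
\s+
(?P<closer>/)?
wp:(?P<namespace>core/)?(?P<name>image)
\s+
(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+/?-->).)*+)?}\s+)?
(?P<void>/)?
-->
~sx
REGEXP;
}

/**
 * Returns the byte offset and decoded attributes of the first image block
 * opener (or void image block), or null when the document contains none.
 */
function example_find_first_image_block( string $html ): ?array {
	$offset = 0;
	while ( 1 === preg_match( example_image_delimiter_regex(), $html, $match, PREG_OFFSET_CAPTURE, $offset ) ) {
		list( $delimiter, $at ) = $match[0];

		// With PREG_OFFSET_CAPTURE each group is array( text, offset ).
		if ( '/' !== $match['closer'][0] ) {
			// Trailing unmatched groups are absent from $match, hence the ?? fallback.
			$attrs = json_decode( $match['attrs'][0] ?? '', true );

			return array(
				'offset' => $at,
				'attrs'  => is_array( $attrs ) ? $attrs : array(),
			);
		}

		// Skip stray closers and keep scanning after this delimiter.
		$offset = $at + strlen( $delimiter );
	}

	return null;
}

$html  = '<p>Intro</p><!-- wp:image {"id":7,"sizeSlug":"large"} --><figure><img src="a.png" /></figure><!-- /wp:image -->';
$found = example_find_first_image_block( $html );
// $found['offset'] is where the delimiter starts; $found['attrs'] holds the decoded JSON.
```

The scan stops at the first opener, so the rest of the document is never examined. Anything stored in the block's inner HTML rather than in the delimiter JSON would still need to be read from the markup that follows the returned offset.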

These functions make processing more efficient for work that only needs
to examine document structure or learn a few facts about a document,
including but not limited to:

 - Finding the URL of the first image block in a document.
 - Inserting hooked blocks.
 - Analyzing block counts.
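
The block-count use case in the list above can be sketched with a single `preg_match_all()` pass. This is a minimal example that runs outside WordPress: `example_block_delimiter_regex()` is a hypothetical stand-in containing the pattern this patch adds as `get_block_delimiter_regex()`, and `example_count_blocks()` and the sample markup are assumptions for illustration only.

```php
<?php
// Hypothetical stand-in for get_block_delimiter_regex(), copied from the
// pattern in this patch so the sketch runs without WordPress.
function example_block_delimiter_regex(): string {
	return <<<'REGEXP'
~
<!--
\s+
(?P<closer>/)?
wp:(?P<namespace>[a-z][a-z0-9_-]*/)?(?P<name>[a-z][a-z0-9_-]*)
\s+
(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+/?-->).)*+)?}\s+)?
(?P<void>/)?
-->
~sx
REGEXP;
}

/**
 * Counts blocks by fully-qualified name, counting each opener or void
 * delimiter once and ignoring closers.
 */
function example_count_blocks( string $html ): array {
	$counts = array();
	preg_match_all( example_block_delimiter_regex(), $html, $matches, PREG_SET_ORDER );

	foreach ( $matches as $match ) {
		if ( '/' === $match['closer'] ) {
			continue;
		}

		// An unmatched namespace group is an empty string because 'name' always matches after it.
		$namespace  = '' === $match['namespace'] ? 'core/' : $match['namespace'];
		$block_name = $namespace . $match['name'];

		$counts[ $block_name ] = ( $counts[ $block_name ] ?? 0 ) + 1;
	}

	return $counts;
}

$html = <<<'HTML'
<!-- wp:paragraph --><p>One</p><!-- /wp:paragraph -->
<!-- wp:core/paragraph --><p>Two</p><!-- /wp:core/paragraph -->
<!-- wp:image {"id":5} /-->
<!-- wp:math-blocks/formula {"tex":"x2"} --><span>x2</span><!-- /wp:math-blocks/formula -->
HTML;

$counts = example_count_blocks( $html );
```

Both spellings of the paragraph block are tallied under `core/paragraph`, since a missing namespace is treated as `core/`. The count is flat: nested blocks are counted like any other because no tree is built.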
dmsnell committed Jun 8, 2024
1 parent d3c1b41 commit 729e5b3
Showing 2 changed files with 118 additions and 7 deletions.
117 changes: 117 additions & 0 deletions src/wp-includes/blocks.php
@@ -1288,6 +1288,123 @@ function make_after_block_visitor( $hooked_blocks, $context, $callback = 'insert
};
}

/**
* Returns a regular expression which can be used to find
* block comment delimiters in a given HTML document.
*
* Returned matches contain named capture groups:
* - 'closer' is '/' if the delimiter is a block closer.
* - 'namespace' is non-empty if a block namespace was provided,
* otherwise the block name is assumed to be in the "core/" namespace.
* - 'name' is the block name, always non-empty.
* - 'attrs' contains the content which may be JSON, if non-empty.
* - 'void' is '/' if the delimiter indicates a void block.
*
* Example:
*
* if ( 1 === preg_match( get_block_delimiter_regex(), $block_content, $delimiter_match ) ) {
*     // Unmatched groups are '' when a later group matched, and absent when trailing.
*     $is_closer  = '/' === $delimiter_match['closer'];
*     $is_void    = '/' === ( $delimiter_match['void'] ?? '' );
*     $namespace  = '' !== $delimiter_match['namespace'] ? $delimiter_match['namespace'] : 'core/';
*     $block_name = $namespace . $delimiter_match['name'];
*     $attrs      = array();
*     if ( ! $is_closer && isset( $delimiter_match['attrs'] ) ) {
*         $json = json_decode( $delimiter_match['attrs'], true );
*         if ( JSON_ERROR_NONE === json_last_error() ) {
*             $attrs = $json;
*         }
*     }
* }
*
* @since {WP_VERSION}
*
* @return string PCRE pattern which can be used to find and parse block delimiter HTML comments.
*/
function get_block_delimiter_regex(): string {
return <<<'REGEXP'
~
<!--
\s+
(?P<closer>/)? # This pattern also detects closing block delimiters.
wp:(?P<namespace>[a-z][a-z0-9_-]*/)?(?P<name>[a-z][a-z0-9_-]*) # e.g. "core/paragraph", "paragraph", or "math-blocks/formula".
\s+
(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+/?-->).)*+)?}\s+)? # It's required to parse the JSON separately, if it exists.
(?P<void>/)? # Void blocks have no content and no closer.
-->
~sx
REGEXP;
}

/**
* Returns a regular expression which can be used to find block comment
* delimiters for a given block type in a given HTML document.
*
* Returned matches contain named capture groups:
* - 'closer' is '/' if the delimiter is a block closer.
* - 'namespace' is non-empty if a block namespace was provided,
* otherwise the block name is assumed to be in the "core/" namespace.
* - 'name' is the block name, always non-empty.
* - 'attrs' contains the content which may be JSON, if non-empty.
* - 'void' is '/' if the delimiter indicates a void block.
*
* Example:
*
* if ( 1 === preg_match( get_named_block_delimiter_regex( 'core/image' ), $block_content, $delimiter_match ) ) {
*     // Unmatched groups are '' when a later group matched, and absent when trailing.
*     $is_closer  = '/' === $delimiter_match['closer'];
*     $is_void    = '/' === ( $delimiter_match['void'] ?? '' );
*     $namespace  = '' !== $delimiter_match['namespace'] ? $delimiter_match['namespace'] : 'core/';
*     $block_name = $namespace . $delimiter_match['name'];
*     $attrs      = array();
*     if ( ! $is_closer && isset( $delimiter_match['attrs'] ) ) {
*         $json = json_decode( $delimiter_match['attrs'], true );
*         if ( JSON_ERROR_NONE === json_last_error() ) {
*             $attrs = $json;
*         }
*     }
* }
*
* @since {WP_VERSION}
*
* @param string $block_name Namespace and name of block, e.g. "math-blocks/formula".
* Defaults to "core" namespace if none is provided.
* @return string PCRE pattern which can be used to find and parse block delimiter HTML comments.
*/
function get_named_block_delimiter_regex( string $block_name ): string {
// Without a slash the whole string is the block name; otherwise the name follows the slash.
$slash_at  = strpos( $block_name, '/' );
$namespace = false === $slash_at ? 'core' : substr( $block_name, 0, $slash_at );
$name      = false === $slash_at ? $block_name : substr( $block_name, $slash_at + 1 );
$is_core   = 'core' === $namespace;

$namespace = preg_quote( $namespace, '~' );
$name = preg_quote( $name, '~' );

if ( $is_core ) {
return <<<REGEXP
~
<!--
\s+
(?P<closer>/)? # This pattern also detects closing block delimiters.
wp:(?P<namespace>core/)?(?P<name>{$name}) # e.g. "core/paragraph", "paragraph".
\s+
(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+/?-->).)*+)?}\s+)? # It's required to parse the JSON separately, if it exists.
(?P<void>/)? # Void blocks have no content and no closer.
-->
~sx
REGEXP;
}

return <<<REGEXP
~
<!--
\s+
(?P<closer>/)? # This pattern also detects closing block delimiters.
wp:(?P<namespace>{$namespace}/)(?P<name>{$name}) # e.g. "math-blocks/formula".
\s+
(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+/?-->).)*+)?}\s+)? # It's required to parse the JSON separately, if it exists.
(?P<void>/)? # Void blocks have no content and no closer.
-->
~sx
REGEXP;
}

/**
* Given an array of attributes, returns a string in the serialized attributes
* format prepared for post content.
8 changes: 1 addition & 7 deletions src/wp-includes/class-wp-block-parser.php
@@ -244,13 +244,7 @@ public function next_token() {
* a closer has no attributes). we can trap them both and process the
* match back in PHP to see which one it was.
*/
$has_match = preg_match(
'/<!--\s+(?P<closer>\/)?wp:(?P<namespace>[a-z][a-z0-9_-]*\/)?(?P<name>[a-z][a-z0-9_-]*)\s+(?P<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+\/?-->).)*+)?}\s+)?(?P<void>\/)?-->/s',
$this->document,
$matches,
PREG_OFFSET_CAPTURE,
$this->offset
);
$has_match = preg_match( get_block_delimiter_regex(), $this->document, $matches, PREG_OFFSET_CAPTURE, $this->offset );

// if we get here we probably have catastrophic backtracking or out-of-memory in the PCRE.
if ( false === $has_match ) {