HTML API: Return elements pushed and popped rather than tags read. #6348
@sirreal it looks like we're failing tests after seeking, which I believe might be because when jumping back to the start of the document, we wipe out the context node.
```diff
@@ -424,28 +428,19 @@ public function next_tag( $query = null ) {
 continue;
 }

-if ( ! parent::is_tag_closer() ) {
+if ( ! parent::is_tag_closer() || $visit_closers ) {
```
Should this also use `$this`?

```diff
-if ( ! parent::is_tag_closer() || $visit_closers ) {
+if ( ! $this->is_tag_closer() || $visit_closers ) {
```
Yes. Can you concoct a test case to fail for it? If not, don't worry - we can just fix it.
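One way to see what the suggestion changes is a toy pair of classes. This is a sketch under an assumption taken from the PR summary: the HTML Processor subclasses the Tag Processor and overrides methods such as `is_tag_closer()` to report on virtual nodes. `Toy_Tag_Scanner`, `Toy_Html_Scanner`, and the `$on_virtual_closer` flag are hypothetical stand-ins, not HTML API code. Calling through `parent::` skips the override, so a virtual closer is mistaken for an opener.

```php
<?php
// Hypothetical base class: it only sees the raw token in the text.
class Toy_Tag_Scanner {
	public function is_tag_closer(): bool {
		return false; // The raw text at this position is an opening tag.
	}
}

// Hypothetical subclass: it can be paused on a virtual (implied) closer.
class Toy_Html_Scanner extends Toy_Tag_Scanner {
	public $on_virtual_closer = true;

	public function is_tag_closer(): bool {
		return $this->on_virtual_closer || parent::is_tag_closer();
	}

	public function visits_with_parent( bool $visit_closers ): bool {
		// Asks the base class directly, bypassing the override above.
		return ! parent::is_tag_closer() || $visit_closers;
	}

	public function visits_with_this( bool $visit_closers ): bool {
		// Dispatches to the override, which knows about virtual closers.
		return ! $this->is_tag_closer() || $visit_closers;
	}
}

$scanner = new Toy_Html_Scanner();
var_dump( $scanner->visits_with_parent( false ) ); // bool(true): closer treated as an opener
var_dump( $scanner->visits_with_this( false ) );   // bool(false): closer correctly skipped
```

A real test would need an input where the HTML Processor pauses on a virtual closer while the raw text underneath is not a closer.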
There's a regression here with this HTML:

```html
<?wp-bit hey?>
```

This is a `PI_NODE_LOOKALIKE`. Before this change, `get_tag()` here returned `wp-bit`, the PI target. After this change, `get_tag()` returns `#comment`.
```php
if ( isset( $this->current_element ) ) {
	return $this->current_element->token->node_name;
}
```
This is what breaks the `tag_name` for PI lookalikes. The `node_name` is `#comment`, but it's a special comment type that gets more handling in the Tag Processor:
wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php, lines 2640 to 2651 in fbb5020:

```php
$tag_name = substr( $this->html, $this->tag_name_starts_at, $this->tag_name_length );

if ( self::STATE_MATCHED_TAG === $this->parser_state ) {
	return strtoupper( $tag_name );
}

if (
	self::STATE_COMMENT === $this->parser_state &&
	self::COMMENT_AS_PI_NODE_LOOKALIKE === $this->get_comment_type()
) {
	return $tag_name;
}
```
A potential fix is to only return here if we have a "non-special" node:

```diff
-if ( isset( $this->current_element ) ) {
-	return $this->current_element->token->node_name;
-}
+if ( isset( $this->current_element ) && '#' !== $this->current_element->token->node_name[0] ) {
+	return $this->current_element->token->node_name;
+}
```
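To make the proposed guard concrete, here is a small model of the decision it encodes. `toy_get_tag()` and its parameters are hypothetical: `$node_name` plays the role of `$this->current_element->token->node_name` (null when there is no current element), and `$tag_processor_tag` stands for whatever the Tag Processor's own logic would return.

```php
<?php
/**
 * Hypothetical model of the proposed guard, not WP_HTML_Processor::get_tag().
 */
function toy_get_tag( ?string $node_name, ?string $tag_processor_tag ): ?string {
	// Regular element names come straight from the stack's token. In this
	// model that also covers virtual elements with no text of their own.
	if ( null !== $node_name && '#' !== $node_name[0] ) {
		return $node_name;
	}

	// Special names such as "#comment" defer to the Tag Processor, which
	// reports a PI node lookalike's target (e.g. "wp-bit") as its tag name.
	return $tag_processor_tag;
}

var_dump( toy_get_tag( '#comment', 'wp-bit' ) ); // string(6) "wp-bit"
var_dump( toy_get_tag( 'P', null ) );            // string(1) "P"
```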
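EDIT: This is pre-existing behavior. [redacted] For example, with `<div></p>`:

```php
// Assumes WordPress is loaded so that WP_HTML_Processor is available.
$p = WP_HTML_Processor::create_fragment( '<div></p>' );

// Advance two tokens into the fragment.
$p->next_token();
$p->next_token();

var_dump( $p->get_tag(), $p->get_current_depth(), $p->get_breadcrumbs() );
```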
HTML is a kind of shorthand for a DOM structure. This means that there are many cases in HTML where an element's opening tag or closing tag is missing (or both). This is because many of the parsing rules imply creating elements in the DOM which may not exist in the text of the HTML.

The HTML Processor, being the higher-level counterpart to the Tag Processor, is already aware of these nodes, but since its inception has not paused on them when scanning through a document. Instead, these are visible when pausing on a child of such an element, but otherwise not seen.

In this patch the HTML Processor starts exposing those implicitly-created nodes, including opening tags and closing tags, that aren't found in the text content of the HTML input document.

Previously, the sequence of matched tokens when scanning with `WP_HTML_Processor::next_token()` would depend on how the HTML document was written, but with this patch, all semantically equal HTML documents will parse and scan in the same exact manner, presenting an idealized or "perfect" view of the document the same way as would occur when traversing a DOM in a browser.

Developed in #6348
Discussed in https://core.trac.wordpress.org/ticket/61348

Props audrasjb, dmsnell, gziolo, jonsurrell.
Fixes #61348.

git-svn-id: https://develop.svn.wordpress.org/trunk@58304 602fd350-edb4-49c9-b593-d223f7449a82
Mirror commit (same message, developed in WordPress/wordpress-develop#6348): Built from https://develop.svn.wordpress.org/trunk@58304 git-svn-id: http://core.svn.wordpress.org/trunk@57761 1a063a9b-81f0-0310-95a4-ce76da25c4cd
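The idea that semantically equal documents should scan identically can be illustrated with a toy stack of open elements. This is an assumption-laden model, not the HTML Processor: `toy_push_pop_events()` is hypothetical, handles only `DIV` and `P`, ignores text and the implied `HTML`/`HEAD`/`BODY` elements, and implements just three simplified tree-building rules (an opener closes an open `P`, a stray `</p>` creates an empty `P`, and `</div>` closes everything up to its `DIV`).

```php
<?php
/**
 * Toy model (hypothetical): turns DIV and P tags into "+NAME" (pushed) and
 * "-NAME" (popped) events. With only DIV and P in play, "a P in button
 * scope" reduces to "a P anywhere on the stack".
 */
function toy_push_pop_events( string $html ): array {
	$stack  = array();
	$events = array();

	// Pops up to and including a P. Only called when a P is on the stack.
	$close_p = function () use ( &$stack, &$events ): void {
		do {
			$popped   = array_pop( $stack );
			$events[] = "-{$popped}";
		} while ( 'P' !== $popped );
	};

	preg_match_all( '~<(/?)(div|p)\b[^>]*>~i', $html, $tags, PREG_SET_ORDER );

	foreach ( $tags as $tag ) {
		$name = strtoupper( $tag[2] );

		if ( '' === $tag[1] ) {
			// An opener for DIV or P implicitly closes any open P.
			if ( in_array( 'P', $stack, true ) ) {
				$close_p();
			}
			$stack[]  = $name;
			$events[] = "+{$name}";
		} elseif ( 'P' === $name ) {
			// A stray </p> creates an empty P element, then closes it.
			if ( ! in_array( 'P', $stack, true ) ) {
				$stack[]  = 'P';
				$events[] = '+P';
			}
			$close_p();
		} elseif ( in_array( 'DIV', $stack, true ) ) {
			// </div> pops through the DIV, closing any P still inside it.
			do {
				$popped   = array_pop( $stack );
				$events[] = "-{$popped}";
			} while ( 'DIV' !== $popped );
		}
		// A </div> with no open DIV is ignored.
	}

	// Reaching the end of the input closes whatever is still open.
	while ( ! empty( $stack ) ) {
		$events[] = '-' . array_pop( $stack );
	}

	return $events;
}

echo implode( ' ', toy_push_pop_events( '<p>One<p>Two' ) ), PHP_EOL;         // +P -P +P -P
echo implode( ' ', toy_push_pop_events( '<p>One</p><p>Two</p>' ) ), PHP_EOL; // +P -P +P -P
echo implode( ' ', toy_push_pop_events( '<div></p>' ) ), PHP_EOL;            // +DIV +P -P -DIV
```

Both spellings of the two paragraphs produce the same events, and `<div></p>` yields a `P` whose opening tag never appears in the text, which is the kind of implicitly-created node the commit describes.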
```php
 *
 * @param Closure $handler The handler function.
 */
public function set_pop_handler( Closure $handler ) {
```
Is there a reason this is specifically a `Closure` and isn't just more generally a `callable`?
Closures cannot be serialized or deserialized, meaning that there's no possible way to prep a database record with user input that sets something unexpected here.
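The reasoning in this reply can be checked in plain PHP. The sketch assumes nothing about the HTML API; `toy_accepts_callable()` and `toy_accepts_closure()` are hypothetical setters. A string callable such as `'strtoupper'` survives a `serialize()`/`unserialize()` round trip and still satisfies `callable`, while `serialize()` throws on a `Closure`, so a `Closure` parameter can only be filled by code, not by stored data.

```php
<?php
// Hypothetical setters contrasting the two type declarations.
function toy_accepts_callable( callable $handler ): string {
	return $handler( 'p' );
}

function toy_accepts_closure( Closure $handler ): string {
	return $handler( 'p' );
}

// A string callable round-trips through serialization and is still callable.
$restored        = unserialize( serialize( 'strtoupper' ) );
$callable_result = toy_accepts_callable( $restored );

// A Closure cannot be stored that way: serialize() refuses it outright.
$serialize_error = null;
try {
	serialize( function ( string $tag ): string {
		return $tag;
	} );
} catch ( Exception $e ) {
	$serialize_error = $e->getMessage();
}

// The restored string is rejected where a Closure is required.
$closure_rejected = false;
try {
	toy_accepts_closure( $restored );
} catch ( TypeError $e ) {
	$closure_rejected = true;
}

echo $callable_result, PHP_EOL; // P
echo $serialize_error, PHP_EOL; // Serialization of 'Closure' is not allowed
var_dump( $closure_rejected );  // bool(true)
```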
Trac ticket: Core-61348

Summary

Creates virtual nodes when pushing to and popping from the stack of open elements. It's these nodes that are returned by `next_tag()`, while subclassed methods intercept tag information, all within the HTML Processor.

Splitting time!

- `get_current_depth()` to return the depth of the currently-matched element in the stack of open elements. This needs to account for `marker` and other not-yet-implemented items in the stack.
- `expects_closer()` or similar function to indicate if the currently-matched element needs or expects a closing element.

Questions

- `get_depth()` from the HTML API itself instead of exporting that HTML nuance onto the caller.

Related work

- `force_balance_tags()` #5562

Examples

cc: @sirreal
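For the `expects_closer()` item, here is a minimal sketch of the kind of check it describes, assuming the answer depends only on the node name. `toy_expects_closer()` is hypothetical and deliberately incomplete: it treats the HTML void elements and `#`-prefixed nodes (such as `#comment`) as never having a closer, and every other element as expecting one, whether that closer is written in the text or implied.

```php
<?php
/**
 * Hypothetical expects_closer()-style check keyed on the node name.
 * Not the HTML API's implementation.
 */
function toy_expects_closer( string $node_name ): bool {
	// The void elements listed in the HTML Living Standard never take a closer.
	static $void_elements = array(
		'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG',
		'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR',
	);

	// "#"-prefixed nodes such as "#comment" are leaves without a closer.
	if ( '' === $node_name || '#' === $node_name[0] ) {
		return false;
	}

	return ! in_array( strtoupper( $node_name ), $void_elements, true );
}

var_dump( toy_expects_closer( 'DIV' ) ); // bool(true)
var_dump( toy_expects_closer( 'img' ) ); // bool(false)
```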