WIP: HTML API: Add `set_inner_html()` to HTML Processor #7326

dmsnell · 2024-09-11T01:09:57Z

This method depends on other methods, but this change introduces an idea for how to accomplish setting the inner HTML, leaving supporting details unresolved.

Trac ticket:

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

This method depends on other methods, but this change introduces an idea for how to accomplish setting the inner HTML, leaving supporting details unresolved.

github-actions · 2024-09-11T01:22:05Z

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

gziolo · 2024-09-11T04:52:51Z

src/wp-includes/html-api/class-wp-html-processor.php

+	 *
+	 * @since 6.8.0
+	 *
+	 * @param string $new_inner_html New HTML to inject as inner HTML for the currently-matched tag.


Any plans to support in the future the HTML fragment as the param in addition to string? I can see how that might slightly improve developer experience when needing to construct more complex HTML and avoid serializing to HTML before passing to this method to convert back to the HTML fragment instance when insering into document.

It wasn't in my plans, mostly because I'm ~~weary~~wary of passing around $processor objects and think it's important to encourage keeping them local. as stateful objects, surprising things can happen when passing them to functions and getting them back. there's long been a discussion concerning locking them down or making them read only or limiting the work that can be done, but that doesn't exist yet.

On the other hand, when working earlier with templating I had a desire to wrap HTML content in some class indicating that it was safe. Something like that would allow deferred parsing and might be a worthwhile investigation here. On the other other hand I've also been a bit doubtful about the likelihood of people using that. Maybe it would eventually be good to have both as you suggest, where a string would be re-parsed, but a wrapped fragment could in theory be deferred.

In a wrapped model, to achieve performance wins, I think we have to be careful to defer until the last moment when rendering the outermost processor, which may not be easy and may require stacking complicated updates. For example, this code:

$processor = WP_HTML_Processor::create_fragment( '<li>' ); $processor->next_tag( 'li' ); $processor->set_inner_html( WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) ); $outer_processor = WP_HTML_Processor::create_fragment( '<ul>' ); $outer_processor->next_tag( 'ul' ); $outer_processor->set_inner_html( $processor );

This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

For the time being I see the eager string parsing as the more approachable option with computational weight as the tradeoff. This would leave room in the future for an optimization. Other potential optimizations might also arise, such as creating subtrees of the full tree for edits, but these do get complicated quickly.

It looks like the spawn_fragment_parser() method hasn't been implemented yet. I assume it's going to be very similar to the create_fragment() method, but the $context would be inferred from the root document. So, the improvement for the developer could be minimal for the time being in spawn_fragment_parser() would essentially change the context if necessary on the passed instance of the created fragment. It gets serialized in the next line, so it would get evaluated anyway.

$fragment = $this->spawn_fragment_parser( $new_inner_html ); $new_markup = $fragment->serialize();

In the scenario when using only string as param, the same example would look like this:

$processor = WP_HTML_Processor::create_fragment( '<li>' ); $processor->next_tag( 'li' ); $processor->set_inner_html( (string) WP_HTML_Processor::create_fragment( 'Apples and Oranges' ) ); $outer_processor = WP_HTML_Processor::create_fragment( '<ul>' ); $outer_processor->next_tag( 'ul' ); $outer_processor->set_inner_html( (string) $processor );

So there are two steps:

the dev needs to explicitly serialize to string

the processor need to recreate the fragment

Aside. It made me realize one important aspect for the example provided with the outlined implementation. By eagerly processing HTML when inserting inner HTML, it will happen in a bottom up fashion. In effect, the processor will have to serialize the 'Apples and Oranges' fragment multiple times as the fragments build on top of each other. At every stage, it will know more about the entire document so in certain scenarios it might even impact the produced HTML. Just to illustrate it better:

first fragment My text -> entire paragraph processed

second fragment <blockquote></blockquote> -> has to process <blockquote>My text</blockquote>

third fragment <ul><li></li></ul> -> has to process <ul><li><blockquote>My text</blockquote></li></ul>

So this isn't something I initially thought about, but after thinking a bit more about it, it's another part where things are getting interesting.

This is a challenge because the original processor isn't going to know that it shouldn't apply its update and advance. This also demonstrates some of the questions arising from non-local access of a processor: when passing it into another function it might scan to the end and "finish." What if calling code expects to continue using it after passing it in?

I didn't think about that earlier. Yes, that is indeed interesting. When passing as a string, you have to call get_updated_html() so that means the fragment will end up at the end of the document. So that part becomes predictable.

I've sketched an implementation for the spawn_fragment_parser method in dmsnell#22. The algorithm is well described in the spec.

See #7144 some prior work on different fragment contexts.

sirreal · 2024-09-12T14:07:23Z

src/wp-includes/html-api/class-wp-html-processor.php

+	/**
+	 * Normalize an HTML string by serializing it.
+	 *
+	 * This removes any partial syntax at the end of the string.
+	 *
+	 * @since 6.7.0
+	 *
+	 * @param string $html Input HTML to normalize.
+	 *
+	 * @return string|null Normalized output, or `null` if unable to normalize.
+	 */
+	public static function normalize( string $html ): ?string {
+		return static::create_fragment( $html )->serialize();
+	}


Implemented in #7331.

WIP: HTML API: Add set_inner_html() to HTML Processor

d456844

This method depends on other methods, but this change introduces an idea for how to accomplish setting the inner HTML, leaving supporting details unresolved.

gziolo reviewed Sep 11, 2024

View reviewed changes

dmsnell mentioned this pull request Sep 11, 2024

HTML API: Plans for 6.8 WordPress/gutenberg#63037

Open

12 tasks

dmsnell added 4 commits September 11, 2024 06:08

Add normalization and serialization methods.

7e97aed

Docs and missing attribute equals sign

945c817

Fix wrong IMAGE > IMG assignment in foreign content.

93b48e7

Merge branch 'trunk' into html-api/set-inner-html

0bab23f

sirreal reviewed Sep 12, 2024

View reviewed changes

This was referenced Sep 12, 2024

HTML API: Implement spawn_fragment_parser method #7341

Closed

HTML API: Add method to create fragment at node #7348

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: HTML API: Add `set_inner_html()` to HTML Processor #7326

WIP: HTML API: Add `set_inner_html()` to HTML Processor #7326

dmsnell commented Sep 11, 2024

github-actions bot commented Sep 11, 2024

gziolo Sep 11, 2024

dmsnell Sep 11, 2024 •

edited

Loading

gziolo Sep 12, 2024 •

edited

Loading

sirreal Sep 12, 2024 •

edited

Loading

sirreal Sep 12, 2024

sirreal Sep 12, 2024

WIP: HTML API: Add set_inner_html() to HTML Processor #7326

Are you sure you want to change the base?

WIP: HTML API: Add set_inner_html() to HTML Processor #7326

Conversation

dmsnell commented Sep 11, 2024

github-actions bot commented Sep 11, 2024

Test using WordPress Playground

Some things to be aware of

gziolo Sep 11, 2024

Choose a reason for hiding this comment

dmsnell Sep 11, 2024 • edited Loading

Choose a reason for hiding this comment

gziolo Sep 12, 2024 • edited Loading

Choose a reason for hiding this comment

sirreal Sep 12, 2024 • edited Loading

Choose a reason for hiding this comment

sirreal Sep 12, 2024

Choose a reason for hiding this comment

sirreal Sep 12, 2024

Choose a reason for hiding this comment

WIP: HTML API: Add `set_inner_html()` to HTML Processor #7326

WIP: HTML API: Add `set_inner_html()` to HTML Processor #7326

dmsnell Sep 11, 2024 •

edited

Loading

gziolo Sep 12, 2024 •

edited

Loading

sirreal Sep 12, 2024 •

edited

Loading