Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the loading of images using stored URL metrics #884

Merged
Show file tree
Hide file tree
Changes from 43 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
4758013
Fix prefix for output buffer template filter
westonruter Nov 15, 2023
5fb6fda
Add small and medium breakpoints from Gutenberg to go along with mobile
westonruter Nov 16, 2023
bd1153d
Add ilo_get_lcp_elements_by_minimum_viewport_widths()
westonruter Nov 16, 2023
a990982
Use array_filter() to simplify breakpoint merge
westonruter Nov 16, 2023
b18a0c7
Remove fetchpriority from images when different breakpoints have diff…
westonruter Nov 17, 2023
25ea308
Update ilo_get_lcp_elements_by_minimum_viewport_widths() to account f…
westonruter Nov 17, 2023
c88f358
WIP: Breadcrumb calculation on server
westonruter Nov 18, 2023
c20f967
Set appropriate fetchpriority and otherwise add preload links
westonruter Nov 18, 2023
08635fa
Use preg_replace() with limit 1 for injection or preload links at end…
westonruter Nov 28, 2023
ec55c9a
Improve variable naming
westonruter Nov 28, 2023
6e0f809
Add todos for doing breadcrumbs exclusively on server
westonruter Nov 28, 2023
e429983
Account for HEAD closing tag possibly being upper-case
westonruter Nov 28, 2023
8974ac8
Disable on Customizer preview and non-GET responses
westonruter Nov 28, 2023
acde9b4
Introduce ILO_HTML_Tag_Processor
westonruter Nov 28, 2023
024719d
Allow callable not just Closure
westonruter Nov 28, 2023
38cb821
Prevent error when no ilo_url_metrics post
westonruter Nov 28, 2023
234e2fe
Clarify logic in ilo_optimize_template_output_buffer
westonruter Nov 28, 2023
1585a78
Set attribute when server-side heuristics were correct
westonruter Nov 28, 2023
292d1c4
Prevent setting fetchpriority on IMG when not all breakpoints have da…
westonruter Nov 28, 2023
fca7f8d
Remove needless if statement
westonruter Nov 28, 2023
3cda887
Disable background image detection until implemented on server
westonruter Nov 28, 2023
f00addc
Use tag instead of tagName in breadcrumbs
westonruter Nov 28, 2023
ce1db2f
Prevent adding media query to preload link when just min-width:0
westonruter Nov 28, 2023
d74420d
Add preload links always and consolidate code paths
westonruter Nov 29, 2023
f0ee6b2
Add since and private access tags
westonruter Nov 29, 2023
1e495db
Update return tag phpdoc for ilo_construct_preload_links()
westonruter Nov 29, 2023
8065801
Add missing since and private access tags to ilo_get_lcp_elements_by_…
westonruter Nov 29, 2023
131a9a0
Move GH comment into code comment
westonruter Nov 29, 2023
ee83ba0
Remove needless remove_fetchpriority_attribute method
westonruter Nov 29, 2023
3d99128
Prevent removing fetchpriority when all breakpoints do not have URL m…
westonruter Nov 29, 2023
d42601c
Use wp_trigger_error() in ILO_HTML_Tag_Processor and improve phpdoc
westonruter Nov 29, 2023
4db5161
Add var phpdoc tag to class constant
westonruter Nov 29, 2023
2c3cf3a
Use a generator instead of a callback
westonruter Nov 29, 2023
3d7b5fa
Fix comment typo
westonruter Nov 30, 2023
957d2c4
Improve naming of void tags and add missing tags which close P
westonruter Nov 30, 2023
e8a36b4
Remove todo related to user being logged-in
westonruter Dec 1, 2023
0cca4dd
Add note about how ILO_HTML_Tag_Processor is needed until WP_HTML_Pro…
westonruter Dec 1, 2023
83d0362
Move class file include to load.php
westonruter Dec 1, 2023
605fb02
Add TODO to remove loading attribute
westonruter Dec 1, 2023
74145a4
Optimize looking up LCP element by breadcrumb
westonruter Dec 1, 2023
2758c0d
Make ilo_construct_preload_links() easier to read
westonruter Dec 1, 2023
9d82615
Never include loading=lazy on the LCP image common across all breakpo…
westonruter Dec 2, 2023
9bfcdc9
Account for the same element being LCP on different breakpoints
westonruter Dec 4, 2023
185ebf6
Clarify LCP element language in comment
westonruter Dec 5, 2023
1d259b5
Add comment explaining why for loop is used
westonruter Dec 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,305 @@
<?php
/**
* Image Loading Optimization: ILO_HTML_Tag_Processor class
*
* @package performance-lab
* @since n.e.x.t
*/

/**
* Processor leveraging WP_HTML_Tag_Processor which gathers breadcrumbs which can be queried while iterating the open_tags() generator.
*
* Eventually this class should be made largely obsolete once `WP_HTML_Processor` is fully implemented to support all HTML tags.
*
* @since n.e.x.t
* @access private
*/
final class ILO_HTML_Tag_Processor {

/**
* HTML void tags (i.e. those which are self-closing).
*
* @link https://html.spec.whatwg.org/multipage/syntax.html#void-elements
* @see WP_HTML_Processor::is_void()
* @todo Reuse `WP_HTML_Processor::is_void()` once WordPress 6.4 is the minimum-supported version.
*
* @var string[]
*/
const VOID_TAGS = array(
'AREA',
'BASE',
'BASEFONT', // Obsolete.
'BGSOUND', // Obsolete.
'BR',
'COL',
'EMBED',
'FRAME', // Deprecated.
'HR',
'IMG',
'INPUT',
'KEYGEN', // Obsolete.
'LINK',
'META',
'PARAM', // Deprecated.
'SOURCE',
'TRACK',
'WBR',
);

/**
* The set of HTML tags whose presence will implicitly close a <p> element.
* For example '<p>foo<h1>bar</h1>' should parse the same as '<p>foo</p><h1>bar</h1>'.
*
* @link https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element
*
* @var string[]
*/
const P_CLOSING_TAGS = array(
'ADDRESS',
'ARTICLE',
'ASIDE',
Comment on lines +28 to +60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these lists here? What specifically in regards to void and p closing tags is needed here, and why? Is there a bug in the WP_HTML_Tag_Processor, or crucial limitation?

It would be great if you could clarify the need for all this in relation to this module. Right now I'm wondering whether this module should need all those functionalities - like why are those special tags relevant for image optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are needed to be able to iterate over tags in the document and be able to correctly keep track of the open tag stack. The WP_HTML_Tag_Processor is intentionally very limited in what processing it supports. Namely, it does not have any sense of how tags are related to each other (e.g. that html is a parent of body). It is a streaming parser that iterates over open tags and close tags. In order to keep track of when one tag is a child of another tag, we have to know the semantics of these tags, whether they are self-closing (void) or how they relate to tags that have optional closers (e.g. here p).

The superset of functionality from WP_HTML_Tag_Processor is in WP_HTML_Processor which does have capabilities regarding document structure. However, it can't be used yet because it doesn't support all HTML tags yet. Eventually this class shouldn't be needed as WP_HTML_Processor. I've added a comment to reflect this in 0cca4dd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, these lists of tags are all related to image optimization in that we need to know them in order to accurately construct breadcrumbs for any given tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. I wonder how this version of the tag processor is for performance.

For the plugin we don't need to worry about this too much, but for a potential core proposal we would need to spend more time on this - particularly as I'm not sure we would want two separate versions of the tag processor in core. And we will only be able to use this one throughout if it doesn't have notable negative performance implications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have about the same performance as WP_HTML_Tag_Processor as it is just iterating over tags and is doing much less than WP_HTML_Processor, which is less predictable for performance.

'BLOCKQUOTE',
'DETAILS',
'DIV',
'DL',
'FIELDSET',
'FIGCAPTION',
'FIGURE',
'FOOTER',
'FORM',
'H1',
'H2',
'H3',
'H4',
'H5',
'H6',
'HEADER',
'HGROUP',
'HR',
'MAIN',
'MENU',
'NAV',
'OL',
'P',
'PRE',
'SEARCH',
'SECTION',
'TABLE',
'UL',
westonruter marked this conversation as resolved.
Show resolved Hide resolved
);

/**
* Open stack tags.
*
* @var string[]
*/
private $open_stack_tags = array();

/**
* Open stag indices.
*
* @var int[]
*/
private $open_stack_indices = array();

/**
* Processor.
*
* @var WP_HTML_Tag_Processor
*/
private $processor;

/**
* Constructor.
*
* @param string $html HTML to process.
*/
public function __construct( string $html ) {
$this->processor = new WP_HTML_Tag_Processor( $html );
}

/**
* Gets all open tags in the document.
*
* A generator is used so that when iterating at a specific tag, additional information about the tag at that point
* can be queried from the class. Similarly, mutations may be performed when iterating at an open tag.
*
* @return Generator<string> Tag name of current open tag.
*/
public function open_tags(): Generator {
$p = $this->processor;

/*
* The keys for the following two arrays correspond to each other. Given the following document:
*
* <html>
* <head>
* </head>
* <body>
* <p>Hello!</p>
* <img src="lcp.png">
* </body>
* </html>
*
* Upon processing the IMG element, the two arrays should be equal to the following:
*
* $open_stack_tags = array( 'HTML', 'BODY', 'IMG' );
* $open_stack_indices = array( 0, 1, 1 );
*/
$this->open_stack_tags = array();
$this->open_stack_indices = array();
while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
$tag_name = $p->get_tag();
if ( ! $p->is_tag_closer() ) {

// Close an open P tag when a P-closing tag is encountered.
// TODO: There are quite a few more cases of optional closing tags: https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
// Nevertheless, given WordPress's legacy of XHTML compatibility, the lack of closing tags may not be common enough to warrant worrying about any of them.
if ( in_array( $tag_name, self::P_CLOSING_TAGS, true ) ) {
$i = array_search( 'P', $this->open_stack_tags, true );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably fine enough, but be aware that there are certain markers/boundaries in the stack that effectively isolate regions of HTML. for example, if we're currently inside a TEMPLATE element and encounter one of these P-closing elements, they don't close the P from the outside of the template.

<p>
	This template is isolated.
	<template><section><p>Hidden</p></section></template>
	The outer P remains.
</p>

if ( false !== $i ) {
array_splice( $this->open_stack_tags, $i );
array_splice( $this->open_stack_indices, count( $this->open_stack_tags ) );
}
}

$level = count( $this->open_stack_tags );
$this->open_stack_tags[] = $tag_name;

if ( ! isset( $this->open_stack_indices[ $level ] ) ) {
$this->open_stack_indices[ $level ] = 0;
} else {
++$this->open_stack_indices[ $level ];
}

// Only increment the tag index at this level only if it isn't the admin bar, since the presence of the
// admin bar can throw off the indices.
if ( 'DIV' === $tag_name && $p->get_attribute( 'id' ) === 'wpadminbar' ) {
--$this->open_stack_indices[ $level ];
}

// Now that the breadcrumbs are constructed, yield the tag name so that they can be queried if desired.
// Other mutations may be performed to the open tag's attributes by the callee at this point as well.
yield $tag_name;

// Immediately pop off self-closing tags.
if ( in_array( $tag_name, self::VOID_TAGS, true ) ) {
array_pop( $this->open_stack_tags );
}
} else {
// If the closing tag is for self-closing tag, we ignore it since it was already handled above.
if ( in_array( $tag_name, self::VOID_TAGS, true ) ) {
continue;
}

// Since SVG and MathML can have a lot more self-closing/empty tags, potentially pop off the stack until getting to the open tag.
$did_splice = false;
if ( 'SVG' === $tag_name || 'MATH' === $tag_name ) {
$i = array_search( $tag_name, $this->open_stack_tags, true );
if ( false !== $i ) {
array_splice( $this->open_stack_tags, $i );
$did_splice = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if encountering an element which isn't an HTML Void element, then regardless of the tag, if the self-closing flag is present, then it closes the tag. all self-closing flags on HTML elements are invalid and ignored, but all on HTML foreign elements are authoritative and obeyed.

<p>There are no empty <div/> tags. This is inside the DIV</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>
if (
	WP_HTML_Processor::is_void( $tag_name ) || 
	(
		! WP_HTML_Processor::is_html_element( $tag_name ) &&
		$p->has_self_closing_flag
	)
) {
	// this tag immediately closes as soon as we jump to the next tag.
}

this requires having a list of all HTML elements, which the HTML Processor currently doesn't do because it doesn't support any, but we'll have to add it. this can lead to common parsing failures because the invalid self-closing flag has become more popular post-React where it's valid and normative in JSX.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. I wasn't aware of the WP_HTML_Processor::has_self_closing_flag() method.

Nevertheless, maybe having a list of all HTML elements isn't needed here because: (1) if math or svg are ancestors, we can assume that all tags with self-closing flags will close the tag, and (2) custom elements always have hyphens in them, so if present we can also honor the self-closing tag.

Nevertheless, I just checked your example and it doesn't seem the second example with a custom element is actually true. I adapted your example to use a span instead of div (since a div closes an open p):

<p>There are no empty <span/> tags. This is inside the SPAN</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>

And Chrome renders this as:

image

So perhaps such self-closing foreign elements are limited to MathML and SVG contexts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

<svg>Is a custom element, and so <g /> is self-closing while others aren't.</svg>

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by golly, you're right. I have been misinterpreting "foreign element" for a long time now! thankfully I haven't gotten to the place in the HTML Processor where that matters.

thank you very much for pointing this out.

}
}

if ( ! $did_splice ) {
$popped_tag_name = array_pop( $this->open_stack_tags );
if ( $popped_tag_name !== $tag_name && function_exists( 'wp_trigger_error' ) ) {
wp_trigger_error(
__METHOD__,
esc_html(
sprintf(
/* translators: 1: Popped tag name, 2: Closing tag name */
__( 'Expected popped tag stack element %1$s to match the currently visited closing tag %2$s.', 'performance-lab' ),
$popped_tag_name,
$tag_name
)
)
);
}
}
array_splice( $this->open_stack_indices, count( $this->open_stack_tags ) + 1 );
}
}
}

/**
* Gets breadcrumbs for the current open tag.
*
* Breadcrumbs are constructed to match the format from detect.js.
*
* TODO: Consider rather each breadcrumb being a (tag,index) tuple.
*
* @return array<array{tag: string, index: int}> Breadcrumbs.
*/
public function get_breadcrumbs(): array {
$breadcrumbs = array();
foreach ( $this->open_stack_tags as $i => $breadcrumb_tag_name ) {
$breadcrumbs[] = array(
'tag' => $breadcrumb_tag_name,
'index' => $this->open_stack_indices[ $i ],
);
}
return $breadcrumbs;
}

/**
* Returns the value of a requested attribute from a matched tag opener if that attribute exists.
*
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated.
*
* @see WP_HTML_Tag_Processor::get_attribute()
*
* @param string $name Name of attribute whose value is requested.
* @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
*/
public function get_attribute( string $name ) {
return $this->processor->get_attribute( $name );
}

/**
* Updates or creates a new attribute on the currently matched tag with the passed value.
*
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated.
*
* @see WP_HTML_Tag_Processor::set_attribute()
*
* @param string $name The attribute name to target.
* @param string|bool $value The new attribute value.
* @return bool Whether an attribute value was set.
*/
public function set_attribute( string $name, $value ): bool {
return $this->processor->set_attribute( $name, $value );
}

/**
* Removes an attribute from the currently-matched tag.
*
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated.
*
* @see WP_HTML_Tag_Processor::remove_attribute()
*
* @param string $name The attribute name to remove.
* @return bool Whether an attribute was removed.
*/
public function remove_attribute( string $name ): bool {
return $this->processor->remove_attribute( $name );
}

/**
* Returns the string representation of the HTML Tag Processor.
*
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated.
*
* @see WP_HTML_Tag_Processor::get_updated_html()
*
* @return string The processed HTML.
*/
public function get_updated_html(): string {
return $this->processor->get_updated_html();
}
}
Loading