Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add storage for URL metrics to optimize image loading #878

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Nov 2, 2023

Summary

See #871.

Relevant technical choices

A new ilo_url_metrics post type is introduced which stores the client-submitted URL metrics as JSON in post_content, similar to the customize_changeset and oembed_cache post types. The post_name is a MD5 hash of the query vars from the $wp->query_vars global after normalization. (Using the WP query vars is safer than using the URL due to random additional query vars added by 3rd parties, such as analytics.)

Collected URL metrics are segmented into viewport breakpoints. By default it attempts to collect a sample size of three URL metrics for each breakpoint. The default breakpoint is 480, which means there will be a total of 6 URL metrics collected: 3 for mobile for viewport widths from 0 to 480, and 3 for desktop for viewport widths larger. The breakpoints can be modified with the ilo_breakpoint_max_widths filter, and the sample size can be filtered with ilo_url_metrics_breakpoint_sample_size. Each URL metric is stored with the timestamp at which it was captured. When a URL metric is older than a given freshness TTL (filtered by ilo_url_metric_freshness_ttl), a new URL metric will be attempted to be measured for that viewport breakpoint. New URL metrics for a given breakpoint push out the oldest URL metrics for that breakpoint.

When a client sends a URL metric, that client is locked from sending another URL metric for one minute (filtered by ilo_metrics_storage_lock_ttl). Client-side logic prevents another request from being made via sessionStorage, while a server-side check with the REST API is also done using the client IP address. Similarly, if no URL metrics are needed for the current viewport (given the breakpoints), no measurement will be performed and no data will be sent; likewise, the server will also reject any submission for viewport URL metrics it doesn't need.

REST API requests are allowed by unauthenticated visitors, but the parameters of the request (namely the WP query vars that the URL metrics are for) are authenticated with an HMAC via a WP nonce.

Other points:

  • The image_loading_optimization_ prefix has been shorted to ilo_.
  • Files have been renamed for better organization. Namely, logic in hooks.php has been moved to detection.php.
  • Unit test coverage is not included in this PR. I intend to include this after the POC for optimization is developed.

To test facilitate testing, turn on WP_DEBUG and playing around with filters to disabling locking, zeroing out the freshness TTL, and changing the breakpoint sample size, for example:

add_filter( 'ilo_url_metric_storage_lock_ttl', function () { return 0; } );
add_filter( 'ilo_url_metric_freshness_ttl', function () { return 0; } );
add_filter( 'ilo_url_metrics_breakpoint_sample_size', function () { return 5; } );

Checklist

  • PR has either [Focus] or Infrastructure label.
  • PR has a [Type] label.
  • PR has a milestone or the no milestone label.

array(
'methods' => 'POST',
'callback' => 'image_loading_optimization_handle_rest_request',
'permission_callback' => '__return_true', // Needs to be available to unauthenticated visitors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's going to be public, this probably needs some other kind of protection to prevent being flooded with entries.

Years ago I built something similar where multiple requests were blocked by storing the IP address in a transient. Maybe it's useful here:

https://github.com/wearerequired/rest-likes/blob/8c522be82ca468622d287c7a8539ea1dc72e45e9/classes/Controller.php#L251-L253
https://github.com/wearerequired/rest-likes/blob/8c522be82ca468622d287c7a8539ea1dc72e45e9/classes/Controller.php#L282-L353

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Yes, I was also intending to do something similar following the same pattern as WP comments.

'methods' => 'POST',
'callback' => 'image_loading_optimization_handle_rest_request',
'permission_callback' => '__return_true', // Needs to be available to unauthenticated visitors.
'args' => array(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the URL also passed as an argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! 😊 However, this could also obtained via the Referer HTTP request header. But probably better to be explicit, even though perhaps Referer would be a bit safer to rely on. If nothing else, a custom validation callback could be used for the url arg to make sure it matches the Referer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added url in b4e29d6

Base automatically changed from add/image-loading-detection to feature/image-loading-optimization November 7, 2023 02:17
Copy link
Member

@adamsilverstein adamsilverstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter This looks great. A few additional thoughts, but I think this is almost ready for merge.

modules/images/image-loading-optimization/storage/data.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/storage/data.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/storage/data.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/storage/data.php Outdated Show resolved Hide resolved
Comment on lines 16 to 27
* @return int TTL in seconds.
*/
function ilo_get_page_metric_storage_lock_ttl(): int {

/**
* Filters how long a given IP is locked from submitting another metric-storage REST API request.
*
* Filtering the TTL to zero will disable any metric storage locking. This is useful during development.
*
* @param int $ttl TTL.
*/
return (int) apply_filters( 'ilo_metrics_storage_lock_ttl', MINUTE_IN_SECONDS );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should document that this can also be 0 and that, in that case, transients shouldn't be used at all?

Two related thoughts:

  • What is the rationale for allowing 0?
  • If we keep it, we need to triple-check that this value is never simply passed through to set_transient(). As far as I can tell, that's currently (rightfully) not the case, but it's a bit error-prone, and in case of an error it would lead to transients without expiration, which ties into my other scalability concern with lots of transients. With expiring transients, a large number can be fine, but not if they remain forever.

I think to harden against the potential problems from my second point, it would be good to make this a private function, so that nobody can call it externally (potentially without knowing these implications).

I think the 4 functions in this file could be easily refactored into e.g. a ILO_Page_Metric_Storage_Lock class, where the methods to get the TTL and transient key could be private. The public interface could then simply be the lock() and is_locked() methods.

Alternatively, we disallow returning 0 via the filter, avoiding the risk in a simple way. We could also reconsider later, combining with potential refactoring to keep use of this function "contained".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is useful to allow zero, beyond just for development purposes. For example, consider this example:

add_filter( 'ilo_metrics_storage_lock_ttl', static function ( $ttl ) {
    return is_user_logged_in() ? 0 : $ttl;
} );

This allows authenticated users in WP to store page metrics without any locking, which could even make sense to do by default. A user who is already authenticated in WP is not going to be the same level of risk to necessitate locking. I've added this as an example in the filter doc in da0b420.

Also, I'm now ensuring that the TTL is never less than zero.

As for when the TTL is zero exactly, code is in place to make sure that no transient is set:

function ilo_set_page_metric_storage_lock() /*: void (in PHP 7.1) */ {
$ttl = ilo_get_page_metric_storage_lock_ttl();
$key = ilo_get_page_metric_storage_lock_transient_key();
if ( 0 === $ttl ) {
delete_transient( $key );
} else {
set_transient( $key, microtime( true ), $ttl );
}
}

So there is no possible way for a transient to be set with a zero expiration. Not really a need to make it private, though I intend to mark all of the functions as private in any case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, making the function private ensures nobody uses it incorrectly. Obviously we can't truly make it private as long as it's a function, but if someone still uses it, we have no obligation to keep it working for them.

The rationale for this specific function to be private is that it's error-prone to have to remember that you have to always add an extra condition for whether TTL is 0 since otherwise you'd create transients without expiration. That's why I think this function specifically really should be private, or extremely well documented on what should happen in the case it returns 0. See #878 (comment).

Comment on lines 13 to 29
const ILO_PAGE_METRICS_POST_TYPE = 'ilo_page_metrics';

/**
* Registers post type for page metrics storage.
*
* This the configuration for this post type is similar to the oembed_cache in core.
*
* @since n.e.x.t
* @access private
*/
function ilo_register_page_metrics_post_type() /*: void (in PHP 7.1) */ {
register_post_type(
ILO_PAGE_METRICS_POST_TYPE,
array(
'labels' => array(
'name' => __( 'Page Metrics', 'performance-lab' ),
'singular_name' => __( 'Page Metrics', 'performance-lab' ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still needs to be updated per #878 (comment)?

Copy link
Member Author

@westonruter westonruter Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9876aa4


const ILO_REST_API_NAMESPACE = 'image-loading-optimization/v1';

const ILO_PAGE_METRICS_ROUTE = '/page-metrics';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, if we changed the post type name, this name should be changed accordingly.

Additionally though, following up on #878 (comment), I don't think just this name is semantically correct. POST /page-metrics would be intended to create a new resource of this type, which isn't really the case as this endpoint doesn't create a new post of the metrics post type, but modifies an existing one with specific data for a request. That is effectively a custom method, so it shouldn't use just the resource based identifier.

Core's endpoints indeed do not include : characters, though it's probably not an issue introducing one. But if we want to tie more nicely into core's naming, we should simply use another forward slash to separate the custom method from the resource, i.e. /page-metrics/store (or /url-metrics/store). There are already a few endpoints in core that use a custom method in this way.

Maybe the store is also not super clear, how about store-request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POST /page-metrics would be intended to create a new resource of this type, which isn't really the case as this endpoint doesn't create a new post of the metrics post type, but modifies an existing one with specific data for a request.

Actually, it can create a new resource of that type. If there were no page metrics gathered for the URL, then a new ilo_page_metrics post would be created. However, if the post does already exist, then essentially it creates a new sub-resource (by adding the page metric to the collection of page metrics for a given URL).

I don't feel strongly about the endpoint name, however. If you want to add a : then I'm fine with it. But IMO the POST method's broad semantics don't require it. In particular, the HTTP spec includes this use case for POST:

Appending data to a resource's existing representation(s).

Copy link
Member

@felixarntz felixarntz Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it can create a new resource, but it doesn't always, and that's not the purpose of the endpoint.

Regarding "Appending data to a resource's existing representation(s).", that applies to a specific resource. In other words, POST /metrics/<id> can append data to that resource. However POST /metrics does not touch any existing resource, it's typically intended to create a new resource. This is applied in most core endpoints in that way and also the best practice per https://google.aip.dev/133.

I don't feel strongly about the :, however I do think a custom method is the appropriate solution for what this endpoint does. It's not a distinct CRUD operation for a resource (as it can either create or update), and the parameters aren't fields on the resource.

If we consider the requests within a post of the post type a sub resource, we could go with that approach, but that would require knowing the post's identifier, which is probably too much of a hassle. So with the way things are now I think POST /{page|url}-metrics{:|/}store-request works best.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this in 698d89f. I think store is better than store-request, as it is immediately following url-metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking store-request because the data that is processed by that endpoint is from a single specific request. And it's storing that within a URL metrics resource, rather than the URL metrics resource itself.

With only store, it sounds like you're storing a resource of the URL metrics post type, which if that was the case should not be a custom method. So I think store-request makes that more clear.

But not a blocker if you feel strongly about it.

modules/images/image-loading-optimization/storage/lock.php Outdated Show resolved Hide resolved
* @param string $slug Page metrics slug.
* @return WP_Post|null Post object if exists.
*/
function ilo_get_page_metrics_post( string $slug ) /*: ?WP_Post (in PHP 7.1) */ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this is a bit overkill for now. There are a ton of places like this in core and in plugins, and I don't think it's worth adding this to all of them. So I would say let's stick with what we can add at this point.

@westonruter westonruter changed the title Add storage for page metrics to optimize image loading Add storage for URL metrics to optimize image loading Nov 15, 2023
Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@westonruter westonruter merged commit 03f3252 into feature/image-loading-optimization Nov 15, 2023
8 checks passed
@westonruter westonruter deleted the add/image-loading-optimization-storage branch November 15, 2023 21:44
@westonruter
Copy link
Member Author

Thanks all for the great reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no milestone PRs that do not have a defined milestone for release [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Feature A new feature within an existing module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants