This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

Prefetch and double-key caching #82

Closed
yoavweiss opened this issue Aug 29, 2018 · 34 comments

@yoavweiss
Contributor

Moving a private discussion with @kinu and @igrigorik to a public forum

#78 raised questions regarding which origin a navigation prefetch should be tied to in terms of service workers.

Similar questions also arise when thinking about prefetch and double key caching.
Let's say host A is prefetching a linked document from host B.

If we were to consider A as the origin used as the secondary key for the document, then when the user navigates to B the resource won't be used and another copy will be downloaded instead, resulting in a slower experience and sadness.

So, it probably makes sense to consider B the double-key origin for the prefetched document, when double-keying is applied.
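To make the keying question concrete, here is a minimal sketch of a double-keyed cache, assuming entries are keyed by the pair (top-level origin, URL). The class, the function names, and the example origins are all hypothetical; real UA caches are considerably more involved.

```typescript
// Hypothetical model of a double-keyed HTTP cache: entries are keyed by
// (top-level origin, resource URL). All names here are illustrative.
function cacheKey(topLevelOrigin: string, url: string): string {
  return topLevelOrigin + " " + url;
}

class DoubleKeyedCache {
  private entries = new Map<string, string>();

  store(topLevelOrigin: string, url: string, body: string): void {
    this.entries.set(cacheKey(topLevelOrigin, url), body);
  }

  lookup(topLevelOrigin: string, url: string): string | undefined {
    return this.entries.get(cacheKey(topLevelOrigin, url));
  }
}

const originA = "https://a.example"; // page that issues the prefetch
const originB = "https://b.example"; // origin of the prefetched document
const prefetchedDoc = "https://b.example/article";

// Option 1: key the prefetched document under the prefetching origin (A).
const keyedByA = new DoubleKeyedCache();
keyedByA.store(originA, prefetchedDoc, "<html>...</html>");
// After navigating, B is the top-level origin, so this lookup misses.
const missOnNavigation = keyedByA.lookup(originB, prefetchedDoc);

// Option 2: key the prefetched document under the destination origin (B).
const keyedByB = new DoubleKeyedCache();
keyedByB.store(originB, prefetchedDoc, "<html>...</html>");
// The navigation to B now finds the prefetched entry.
const hitOnNavigation = keyedByB.lookup(originB, prefetchedDoc);
```

Under these assumptions, only the second option lets the navigation reuse the prefetched bytes.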

The plot thickens when talking about prefetching subresources. If they are same origin as the document that will use them, then we can consider caching them similarly to documents, using their origin as the secondary key. But if they are cross-origin, we'd need to explicitly state which document/origin they are prefetched for. Not sure that's worth the complexity though.

Thoughts?

/cc @wanderview @cdumez @youennf

@youennf

youennf commented Sep 29, 2018

Prefetch makes most sense for navigation loads so it might be best to focus on this specific scenario.
Something like the following might work with double key caching:

  1. Prefetched resources are loaded with: credentials=omit, referrerPolicy=no-referrer, redirect=manual
  2. Prefetch loads bypass service workers.
  3. Prefetch loads are optional: low power mode/network cache already having an entry
  4. Prefetched resources are stored in a non-partitioned memory-based cache, cache entries are cleared after some limited time.
  5. Prefetched resources can only match top level document navigation.
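The five points above could be sketched roughly as follows. This is only an illustration of the proposal, not an implementation from any browser: `PrefetchCache`, `shouldPrefetch`, and the explicit `now` parameter are hypothetical names chosen for the example, and the settings object merely mirrors the Fetch option names quoted in item 1.

```typescript
// Item 1: request settings named in the proposal (illustrative object only).
const prefetchRequestSettings = {
  credentials: "omit",
  referrerPolicy: "no-referrer",
  redirect: "manual",
} as const;

interface PrefetchEntry {
  body: string;
  expiresAt: number; // milliseconds, on the same clock as the `now` arguments
}

// Items 4 and 5: a non-partitioned, memory-only cache whose entries expire
// and can only be matched by top-level document navigations.
class PrefetchCache {
  private entries = new Map<string, PrefetchEntry>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  store(url: string, body: string, now: number): void {
    this.entries.set(url, { body: body, expiresAt: now + this.ttlMs });
  }

  match(url: string, isTopLevelNavigation: boolean, now: number): string | undefined {
    if (!isTopLevelNavigation) return undefined; // subresources never match
    const entry = this.entries.get(url);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(url); // expired entries are cleared
      return undefined;
    }
    return entry.body;
  }
}

// Item 3: prefetch is optional, so the UA may skip it entirely.
function shouldPrefetch(lowPowerMode: boolean, alreadyInNetworkCache: boolean): boolean {
  return !lowPowerMode && !alreadyInNetworkCache;
}
```

Item 2 (bypassing service workers) is not modelled here, since it concerns request routing rather than the cache itself.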

@kinu

kinu commented Oct 9, 2018

Thanks @youennf, I think this is a pretty good/clear proposal to start with. Hoping that we can discuss more at TPAC but giving some quick thoughts here too:

  1. Prefetched resources are loaded with: credentials=omit, referrerPolicy=no-referrer, redirect=manual

To clarify, do we even want to avoid going with credentials=same-origin?

  2. Prefetch loads bypass service workers.

Have been thinking about this a while, but I think this makes a lot of sense at least to start with. (One interesting option @wanderview mentioned off-thread is to skip service workers for prefetch but use the prefetch as NavigationPreload for the service worker when the real navigation occurs. I actually like this idea but given that NavigationPreload is not yet widely supported we can put off considering this further)

  3. Prefetch loads are optional: low power mode/network cache already having an entry

Agreed, and I believe this is currently spec'ed.

  4. Prefetched resources are stored in a non-partitioned memory-based cache, cache entries are cleared after some limited time.
  5. Prefetched resources can only match top level document navigation.

Sounds sensible to me.

One related question is whether the spec can help make prefetches for top-level navigations distinguishable from others (so that UAs can make better decisions). One way is to use as=document as a signal (though it can't tell whether it's for top-level frames or subframes, and it's proposed to be deprecated).

@youennf

youennf commented Oct 9, 2018

  1. Prefetched resources are loaded with: credentials=omit, referrerPolicy=no-referrer, redirect=manual

To clarify, do we even want to avoid going with credentials=same-origin?

Agreed we should tackle this.
I restricted it this way for simplicity and since this is the biggest issue right now.
Same-origin prefetches do not require all these protections, we could decide to special case them for instance.

Also, in the case of prefetch, it is not clear how it is interacting with the fetch spec, its browsing context, if it is attached to a browsing context, whether it should be cancelled or kept alive when the context goes away...

@yoavweiss
Contributor Author

  2. Prefetch loads bypass service workers.

Have been thinking about this a while, but I think this makes a lot of sense at least to start with. (One interesting option @wanderview mentioned off-thread is to skip service workers for prefetch but use the prefetch as NavigationPreload for the service worker when the real navigation occurs. I actually like this idea but given that NavigationPreload is not yet widely supported we can put off considering this further)

I'm concerned that this will trigger cases of double download in scenarios where the SW is e.g. modifying the request for a navigation request.

At the same time, this seems necessary for privacy protection - otherwise the destination SW can leak the fact that the prefetch happened.

Also, in the case of prefetch, it is not clear how it is interacting with the fetch spec, its browsing context, if it is attached to a browsing context, whether it should be cancelled or kept alive when the context goes away...

Agree we need to better specify how prefetch relates to Fetch, how the prefetched resources are cached, etc.

@igrigorik
Member

👍 to the above.

As a brief aside, I'd actually propose we pull out prefetch from RH into a standalone spec doc, or spec it directly in Fetch.. WDYT?

@yoavweiss
Contributor Author

Specifying a processing model that tied directly into HTML's <link> processing model (similar to what we ended up doing with preload) seems the best approach to me. I think Fetch already has all the primitives we'd need for this. I'll sketch something up.

@yoavweiss
Contributor Author

I think Fetch already has all the primitives we'd need for this

That's actually not true. We need to introduce the concept of a "speculative fetch" and the concept of a "prefetch cache" that would not be partitioned.

@domfarolino
Member

  1. Prefetched resources are loaded with: credentials=omit, referrerPolicy=no-referrer, redirect=manual

Most of this makes sense to me, however I'm wondering if someone could clarify the following:

  • What is the significance of redirect=manual? I think this would mean upon redirect, a redirect response would be stored in the underlying cache instead of the final resource. Is this the intention? Are there privacy reasons to not follow the redirect?
  • The credentials mode and referrerpolicy seem fixed, does this mean if the developer supplies crossorigin or referrerpolicy attribute values, they should be ignored?
  • The request mode has not been talked about here. Today, this is influenced by the crossorigin attribute (i.e., no-cors => cors mode). Should the request mode be unaffected by the presence of a crossorigin attribute too, and default to 'no-cors'?

/cc @yutakahirano

@youennf

youennf commented Aug 2, 2019

  • What is the significance of redirect=manual? I think this would mean upon redirect, a redirect response would be stored in the underlying cache instead of the final resource. Is this the intention? Are there privacy reasons to not follow the redirect?

Yes, that is the intention. The principle is to emulate a navigation load whose redirect mode is manual.
Prefetching is speculative so keeping it small seems good.
Not following redirections forbids the request to go to various domains and simplifies the implementation. For instance, if we were to store all redirections, it is not clear what we should do if the actual navigation goes directly to the second redirection.

  • The credentials mode and referrerpolicy seem fixed, does this mean if the developer supplies crossorigin or referrerpolicy attribute values, they should be ignored?

credentials should be same-origin.
crossorigin does not make a lot of sense here since we are trying to emulate a navigation load.
I would disregard it. no-referrer limits tracking risks.

  • The request mode has not been talked about here. Today, this is influenced by the crossorigin attribute (i.e., no-cors => cors mode). Should the request mode be unaffected by the presence of a crossorigin attribute too, and default to 'no-cors'?

I would tend to disregard the crossorigin attribute.
This is a navigate-like load so things like CORP checks do not make sense.

@domfarolino
Member

For instance, if we were to store all redirections, it is not clear what we should do if the actual navigation goes directly to the second redirection.

Is the current proposal entirely clear though? I think you're saying it is clear that a redirect response in the prefetch cache should be matched if it is the first one, but maybe not otherwise. Is there a reason that matching the first one is more appealing/obvious than later ones in the chain?

I would disregard it

Sounds good to me.

I would tend to disregard the crossorigin attribute.
This is a navigate-like load so things like CORP checks do not make sense.

Also sounds good to me.

I think another question is: What should we do when we ignore those attributes? Cancel the request? Or optionally throw a console warning indicating some attributes have been disregarded, and continue as usual? @yutakahirano prefers cancelling the request, but it seems worth discussing as I think it will need to be reflected in the spec.

@yutakahirano

What is the significance of redirect=manual? I think this would mean upon redirect, a redirect response would be stored in the underlying cache instead of the final resource. Is this the intention? Are there privacy reasons to not follow the redirect?

Yes, that is the intention. The principle is to emulate a navigation load whose redirect mode is manual.
Prefetching is speculative so keeping it small seems good.
Not following redirections forbids the request to go to various domains and simplifies the implementation. For instance, if we were to store all redirections, it is not clear what we should do if the actual navigation goes directly to the second redirection.

In that case can we use "error" redirect mode? Redirect starts in https://fetch.spec.whatwg.org/#http-fetch, after storing the response into the cache in https://fetch.spec.whatwg.org/#http-network-or-cache-fetch, so I think you will get what you want with "error" redirect mode. I prefer using "error" because it's simpler and easier to understand.

@annevk
Member

annevk commented Aug 7, 2019

Apologies for weighing in late, but I'm not sure I fully understand all the rationale here. Do we have any data on how prefetch is used (subresource vs navigation; same-origin vs cross-origin) today? I suppose google.com still uses it for navigations? (I see you all are focused on navigations, but https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ advocates using it for subresources afaict, so it'd be good to have some data.)

If the user navigated to the prefetched resource before, it's highly likely they'll get a better experience if cookies are included. Does the proposed setup make sense for a majority of resources or do we end up with a lot of cache mismatches (nobody sets Vary: Cookie afaik)? (If I'm not missing anything here I really wonder why it's still worth supporting this feature for Safari rather than ignoring the feature altogether or recommending it be exclusively used for same-top-level-origin subresources.)

(Bypassing the service worker seems problematic as the service worker is no longer in control of some of the document's network traffic, making it less reliable. This is already true to some extent, but I'm not a big fan of continuing to carve out small exceptions.)

@kinu

kinu commented Aug 7, 2019

@annevk we're working on gathering more data. google.com uses it both for subresources and navigations but we're communicating that x-origin subresource prefetch won't be able to work with double-keyed caching (at least until we come up with a workable, privacy-preserving solution).

I can imagine the cookieless navigation part being debatable, though the site that triggers the prefetch could also restrict it to pages that are unlikely to need cookies.

@annevk
Member

annevk commented Aug 12, 2019

That would make it really hard to use the feature correctly though.

@domfarolino
Member

Just to be clear on the data collection bit: right now Chrome is only measuring how many prefetches redirect, to estimate how impactful changing the redirect mode would be. We're not sure how to accurately measure the impact of credentials (especially since, as we've mentioned, Vary: Cookie is quite underused). I guess we could also put a use counter on the referrer policy attribute specifically for prefetches too.

@annevk
Member

annevk commented Aug 30, 2019

I see, the main problem is credentials (or cross-origin navigations) though... Our current plan in Firefox is to use the top-level origin as additional key for this cache, at which point it'll be mostly useless for a number of scenarios. One of the things we're considering is dropping support.

I'm curious to know though if @youennf has found that Safari's approach has measurable benefits.

@youennf

youennf commented Aug 30, 2019

I see you all are focused on navigations

My understanding is that prefetch is for navigations, preload for subresources.
In general, hoping that one website will efficiently preload subresources for another website seems like a fragile design to me.

I'm curious to know though if @youennf has found that Safari's approach has measurable benefits.

Safari implementation is experimental and incomplete at the moment.
I understand Firefox position to limit prefetch to same origin navigations only (in which case it is very similar to preloads).

In general, cross-site tracking protection will probably continue increasing the cost to do cross-site navigation. It would be nice to have some safe ways to mitigate these costs.

In terms of scenarios, search engines come to mind. Web packaging has a similar constraint so the same scenarios might apply.

That would make it really hard to use the feature correctly though.

As long as a resource is cacheable by intermediaries, it should be safe to prefetch it. Or am I too optimistic?

I agree it makes the feature harder to use, although I think the whole feature is quite hard to use with or without this restriction.
For instance, the website has to determine the probability for the user to actually navigate to the prefetched destination, how much the prefetch will slow down other important resource loading...

@annevk
Member

annevk commented Aug 30, 2019

To be clear, I'm not sure what Safari's approach is. I assumed it to be #82 (comment), but maybe it's something not stated in this thread?

aarongable pushed a commit to chromium/chromium that referenced this issue Sep 4, 2019
This CL renames the PrefetchRedirectError flag to PrefetchPrivacyChanges
so the flag can be generalized to encapsulate more privacy-preserving
changes proposed in [1]. Also implements the usage of kNoReferrer referrer
policy when the privacy changes flag is enabled. A LinkLoader unit test is
added to test that the referrer policy is set and persists correctly. It
is likely too early to invest in WPTs for this change, since standards
discussion must take place before we can determine this is the correct way
forward.

[1]: w3c/resource-hints#82

R=kinuko@chromium.org, yhirano@chromium.org

Bug: 988956
Change-Id: Id01771a1c077b0e018b311983e2d198733fec23b
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1781303
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Reviewed-by: Yutaka Hirano <yhirano@chromium.org>
Commit-Queue: Dominic Farolino <dom@chromium.org>
Cr-Commit-Position: refs/heads/master@{#693033}
@kinu

kinu commented Sep 10, 2019

Reg: credentials and (cross-origin) navigations, one option we thought of is to add an opt-in http header for the target site to explicitly express that "making uncredentialed prefetches and navigations to this site is okay", say, allowed-uncredentialed-navigation. Then UA can cancel the prefetch if it doesn't see the header in the response for cross-origin prefetch requests. How would something like that sound?
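A rough sketch of that opt-in check, assuming the UA only looks for the presence of the header on cross-origin prefetch responses. The header name is the one floated in this comment (a proposal, not a shipped header), and `mayKeepPrefetch` and `hasHeader` are hypothetical helpers; treating any header value as consent is also an assumption of the example.

```typescript
// Header name as floated in the comment above; not a shipped header.
const OPT_IN_HEADER = "allowed-uncredentialed-navigation";

// Case-insensitive header presence check over a plain name -> value map.
function hasHeader(headers: Record<string, string>, name: string): boolean {
  const wanted = name.toLowerCase();
  return Object.keys(headers).some((key) => key.toLowerCase() === wanted);
}

// Returns true when the UA may keep the prefetched response, false when it
// should cancel the prefetch.
function mayKeepPrefetch(
  prefetchingOrigin: string,
  targetOrigin: string,
  responseHeaders: Record<string, string>
): boolean {
  // The check is only described for cross-origin prefetches.
  if (prefetchingOrigin === targetOrigin) return true;
  // Assumption: presence of the header is enough, whatever its value.
  return hasHeader(responseHeaders, OPT_IN_HEADER);
}
```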

Thinking about this space a bit further I suspect we'll need similar restrictions (like being uncredentialed) for any cross-origin speculative loading, e.g. prerender (by the way I'm trying to put up a possible threat model for cross-origin speculative loading here: https://github.com/kinu/speculative-loading#threat-model).

As youenn mentioned these features are always a bit hard to use anyway, but they could still be useful to accelerate navigations if used appropriately. I think it'd be worth exploring the most plausible design that could work with reasonable trade-offs.

@annevk
Member

annevk commented Sep 13, 2019

That could work, but at that point I wonder whether we should use a new opt-in keyword as well (and drop the current feature) as everything currently annotated as prefetch won't have that and would result in a redundant fetch and cache miss.

@yoavweiss
Contributor Author

Discussed at the WebPerfWG F2F: For compat and confusion avoidance reasons, it would make sense to define a new keyword. @achristensen07 suggested "prenavigate" which seems like a good option.

/cc @ericlaw

@kinu

kinu commented Sep 30, 2019

Some of us also discussed this in a breakout discussion during TPAC on Friday (@annevk, @youennf, @yoavweiss, @domfarolino, @yutakahirano, @jyasskin, @bslassey, @kinu and some others were there), and here's a quick summary:

For cross-origin prenavigate, one of the concerns is that always requiring an opt-in header will likely limit adoption. As an alternative, the following two-path approach (instead of an opt-in only solution) was discussed:

  • Case A: If there are no credentials stored for the site:
    • Just send prenavigate request as a regular request (no cookies will be sent)
    • If its response has set-cookie headers, set them in an ephemeral, isolated cookie store
    • If next navigation happens on the same URL, commit the cookies change made by the prenavigate. Otherwise discard the cookie store
  • Case B: Otherwise (some credentials are stored)
    • Send prenavigate request as uncredentialed (no cookies will be sent)
    • If its response header doesn’t have ‘Allow-Uncredentialed-Navigation’ header just abort the prenavigate.

In either case the prenavigate request itself will be always sent without credentials, and nothing should be observable if the response is not used (i.e. no credentials changes are committed, no onerror/onload should be propagated).
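The two cases can be sketched as a small decision function. Everything here is hypothetical naming for illustration (`handlePrenavigateResponse`, `cookiesToCommit`, the response shape); the sketch also assumes that no cookies are staged in Case B, which the summary above does not specify.

```typescript
type PrenavigateOutcome =
  | { kind: "abort" }
  | { kind: "keep"; pendingCookies: string[] };

interface PrenavigateResponse {
  setCookie: string[]; // Set-Cookie header values, if any
  allowsUncredentialed: boolean; // response carried the opt-in header
}

// The request itself is sent without cookies in both cases; this function
// only decides what to do with the response.
function handlePrenavigateResponse(
  hasStoredCredentials: boolean,
  response: PrenavigateResponse
): PrenavigateOutcome {
  if (!hasStoredCredentials) {
    // Case A: stage any cookies in an ephemeral, isolated store.
    return { kind: "keep", pendingCookies: response.setCookie.slice() };
  }
  // Case B: credentials exist, so the opt-in header is required.
  if (!response.allowsUncredentialed) return { kind: "abort" };
  // Assumption: nothing is staged in Case B.
  return { kind: "keep", pendingCookies: [] };
}

// Staged cookies are committed only if the next navigation targets the same
// URL; otherwise the ephemeral store is discarded.
function cookiesToCommit(
  outcome: PrenavigateOutcome,
  prenavigatedUrl: string,
  navigatedUrl: string
): string[] {
  if (outcome.kind !== "keep") return [];
  return navigatedUrl === prenavigatedUrl ? outcome.pendingCookies : [];
}
```

Keeping the staged cookies separate until the navigation commits is what makes an unused prenavigate unobservable in this model.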

One of the concerns was that Fetch spec integration could be a bit tricky, and one option that was discussed was to introduce a new credentials mode like prenavigate.

Next step:

  • Write down the proposal (done in this comment)
  • Each will examine the proposal

@annevk
Member

annevk commented Sep 30, 2019

Could you also write down the proposed model for prefetch? I believe it was stated that the preference was to keep that around as well.

@kinu

kinu commented Oct 1, 2019

Let me try. For prefetch my current understanding is as follows (if anyone had a different view please chime in):

  • It can be more strictly for prefetching (sub)resources for same-origin navigations, i.e. the prefetched resources may not be available in next navigations if the HTTP cache is split
  • Prefetch loads are optional (same as before)
  • Restrictions that were discussed on this thread probably do not need to be applied? (I.e. it feels it can be a regular subresource loading, but I don't think this was explicitly discussed)

Reg: whether we want to keep it around, or can it be just prenavigate and preload? -- we probably want to keep it around, and AFAIR the following were stated as the differences between prefetch and preload:

  • There are types that are hard to support as preload, e.g. workers (and to specifically talk about chrome impl it doesn't support as=document and all media types)
  • prefetch loading is optional while preload is not
  • preload is supposed to be for the current navigation, and will give the bytes back to the page (while prefetch only populates things in the HTTP cache)

(While, I started to feel that the difference between prefetch and preload might look more subtle now)

@annevk
Member

annevk commented Oct 1, 2019

I guess the other question is how this integrates with Fetch as the above models don't make everything clear. Does prefetch bypass service workers? Can the creation of a shared worker use the prefetch cache of another document? (Also, in a world with service workers prefetch being optional can lead to surprises between browsers that do and those that do not, especially if developers only code against a browser that does. That does not seem desirable.)

@addyosmani

My understanding is that we are proposing keeping around prefetch (for same-origin optional prefetches) and introducing prenavigate as a form of prefetch which is always sent without credentials. Is that correct?

For what it's worth, I and other JavaScript library authors who rely on prefetch (for Quicklink, instant.page and Flying Pages) would like to avoid renaming it for the same-origin use-case if possible. It also appears there's reasonable usage of prefetch in the wild.

I guess the other question is how this integrates with Fetch as the above models don't make everything clear. Does prefetch bypass service workers?

+1 to more clearly defining this part of the prefetch model.

@kinu

kinu commented Oct 15, 2019

My understanding is that we are proposing keeping around prefetch (for same-origin optional prefetches)

That's my understanding as well, which means most existing prefetch (for same-origin) can stay as is.

I guess the other question is how this integrates with Fetch as the above models don't make everything clear. Does prefetch bypass service workers? Can the creation of a shared worker use the prefetch cache of another document? (Also, in a world with service workers prefetch being optional can lead to surprises between browsers that do and those that do not, especially if developers only code against a browser that does. That does not seem desirable.)

Assuming that prefetch is for same-origin only, a strawperson could look like the following:

  • doesn't bypass service workers
  • shared worker (which is same-origin) can use the prefetch cache

@josephrocca

josephrocca commented Feb 13, 2020

Sorry to jump in here as a non-expert, but can I ask: Would prenavigate be suitable for a site like jsbin or codepen which embeds an iframe which is served from a subdomain (like embed.jsbin.com or embed.codepen.io)? Serving from a different origin is needed because these sites allow arbitrary JS to be run within the embed. The main page might be https://jsbin.com/abc123, and it would include this:

<link rel="prenavigate" href="https://embed.jsbin.com/abc123" as="document">

So it can start loading the document that will be put in the iframe embed right at the moment the main page (with the code editor interface) begins loading, rather than having to wait for the main page to render and thus loading the main page and the embed in a serial manner.

Is this a use case covered by prenavigate?

@achristensen07

No, prenavigate would load https://embed.jsbin.com/abc123 as if it were navigated to in the main frame, and its resources would be cached in the partition of embed.jsbin.com, which would be unavailable for use by a page with the main frame on a different domain. Preload would be used for resources intended to be used from pages in the current partition.

@josephrocca

@achristensen07 Ahh, okay, thank you. So I guess for that use case I'd need to wait for preload as=document to be supported in browsers.

@yoavweiss
Contributor Author

To be defined as part of #86

/cc @noamr

@noamr
Contributor

noamr commented Feb 25, 2022

See #86 (comment) for action plan

@noamr
Contributor

noamr commented Jan 2, 2023

As per #86 (comment), <link rel=prefetch> will work as it is today, meaning that it would not change anything about cache partitioning - those prefetches only work when the prefetching document and the consuming document are in the same partition.

@noamr
Contributor

noamr commented Mar 27, 2023

See previous comment.

@noamr closed this as completed Mar 27, 2023