
Add prefetch processing model, including double-key caching privacy protections #4115

Closed

Conversation


@yoavweiss yoavweiss commented Oct 23, 2018

This closes w3c/resource-hints#82 in order to:

  • Define a processing model for prefetch.
  • Define a way for prefetch to work nicely for double-keyed caching browsers, without enabling ways for different origins to communicate and persist information in the browser.
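As a rough illustration of what "double-keyed" means in this discussion, the sketch below models an HTTP cache keyed on both the top-level origin and the request URL. The class and method names are hypothetical and not taken from any spec or browser; real implementations may key on site rather than origin, or add further keys.

```python
from typing import Dict, Optional, Tuple


class DoubleKeyedCache:
    """Toy HTTP cache keyed on (top-level origin, request URL). Hypothetical."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def store(self, top_level_origin: str, url: str, body: bytes) -> None:
        self._entries[(top_level_origin, url)] = body

    def lookup(self, top_level_origin: str, url: str) -> Optional[bytes]:
        return self._entries.get((top_level_origin, url))


cache = DoubleKeyedCache()

# a.example prefetches a page on b.example: the entry lands in a.example's partition.
cache.store("https://a.example", "https://b.example/page", b"<html>...</html>")

# After the navigation, b.example is the top-level origin, so the lookup misses.
hit_after_navigation = cache.lookup("https://b.example", "https://b.example/page")

# Only a lookup from the original partition finds the prefetched entry.
hit_in_same_partition = cache.lookup("https://a.example", "https://b.example/page")
```

Under this simplified model a cross-origin navigation prefetch is never reused, which appears to be the problem the second bullet is trying to solve without reopening a cross-origin channel.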

@annevk @domenic - I handwavily talk about cache keys here. Let me know if that works, and if you want me to add a note/issue about better defining that (and double keying) in the future.

I think that the second part of w3c/resource-hints#82 would be to define what a "speculative fetch" is (as a Fetch primitive), and say that browsers can choose to never fetch them, and should keep them in a non-partitioned, time-limited cache.


💥 Error: Wattsi server error 💥

PR Preview failed to build. (Last tried on Jan 15, 2021, 7:58 AM UTC).


@yoavweiss
Contributor Author

@youennf @kinu - could you take a look and let me know what you think?

source Outdated
<var>as</var>.</p></li>
<li>If the browser is using both the <var>request</var>'s <span data-x="concept-request-url">URL</span> and the <span data-x="top-level-browsing-context">top-level browsing context</span>'s <span data-x="document">document</span>'s <span data-x="dom-document-origin">origin</span> as cache keys for <var>request</var>, then:
<ol>
<li><p>If <var>request</var>'s <span data-x="concept-request-credentials-mode">credentials mode</span> is "include", then return.</p></li>

I think it should be same-origin, depending on how we define the browsing context of this fetch request.

Contributor Author

Can you elaborate on that?

source Outdated
<ol>
<li><p>Set <var>request</var>'s <span data-x="concept-request-initiator">initiator</span>
to "prefetch".</p></li>
<li><p>Set <var>request</var>'s <span data-x="concept-request-keepalive-flag">keep-alive</span> flag

I was also wondering whether we should use keep alive or not.

My understanding so far is that keep alive has one context (the initial one) and when it goes away, it has no context.
This implies putting some restrictions on the number of keep alive requests. It also means we do not care about the response when context goes away.

For prefetch, the initial context is the same, but once it gets destroyed through navigation, we might either actually use the prefetch for the navigation task (hence using the response) or cancel it (navigating to some other URL).
This might be a different model with different restrictions.


If we add text like 'the UA may abort the fetch if navigation happens to a different URL', could that work? Reusing the response could happen at the cache level from an implementation point of view, but that part is not really specified, so it might be a bit tricky.

Contributor Author

I added a speculative flag at whatwg/fetch#881

Can you take a look?

source Outdated
attribute.</p></li>
<li><p>Set <var>request</var>'s <span data-x="concept-request-destination">destination</span>
to the result of <span data-x="concept-potential-destination-translate">translating</span>
<var>as</var>.</p></li>

Are we sure 'as' will always be a valid destination? If not valid, are we ending up with the destination equal to the empty string?

Member

@domfarolino domfarolino Nov 26, 2018

I think since 'as' is an enumerated attribute, it can be in a conforming state or a non-conforming state, so we'll probably want a guard around it to make sure we only attempt translations on conforming states; see https://html.spec.whatwg.org/multipage/links.html#link-type-modulepreload:attr-link-as-2
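A minimal sketch of such a guard, assuming the caller skips the prefetch when no destination can be derived. The function name and the destination set are hypothetical, and the set is only an illustrative subset of Fetch's destinations.

```python
from typing import Optional

# Illustrative subset of Fetch request destinations; not the full list.
KNOWN_DESTINATIONS = {
    "audio", "document", "font", "image", "script",
    "style", "track", "video", "worker",
}


def translate_as(as_value: str) -> Optional[str]:
    """Return a request destination for a conforming 'as' value, else None."""
    # Enumerated attribute keywords match ASCII case-insensitively;
    # str.lower() is used here as an approximation.
    keyword = as_value.lower()
    if keyword == "fetch":
        # In Fetch, translating "fetch" yields the empty-string destination.
        return ""
    if keyword in KNOWN_DESTINATIONS:
        return keyword
    # Non-conforming value: signal the caller not to attempt a translation.
    return None
```

With this shape, an invalid value is distinguishable from a valid empty-string destination, which is the ambiguity raised in the question above.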

source Outdated
<li><p>Set <var>request</var>'s <span data-x="concept-request-destination">destination</span>
to the result of <span data-x="concept-potential-destination-translate">translating</span>
<var>as</var>.</p></li>
<li>If the browser is using both the <var>request</var>'s <span data-x="concept-request-url">URL</span> and the <span data-x="top-level-browsing-context">top-level browsing context</span>'s <span data-x="document">document</span>'s <span data-x="dom-document-origin">origin</span> as cache keys for <var>request</var>, then:

I believe this is the first introduction of that concept in web specs.
Should it be in fetch spec or somewhere else?
I understand 'cache keys' for a request, but Fetch refers to the 'HTTP cache', so maybe HTTP should be made more explicit there?

Contributor Author

I'm fine with whatever works to define this. I agree that it would be better if "cache key" here referred to something in HTTP or Fetch.

source Outdated
<li><p>If <var>request</var>'s <span data-x="concept-request-destination">destination</span> is not "document", then return.</p></li>
<li><p>Set <var>request</var>'s <span data-x="concept-request-redirect-mode">redirect mode</span> to "manual".</p></li>
<li><p>Set <var>request</var>'s <span data-x="concept-request-service-workers-mode">service workers mode</span> to "none".</p></li>

I wonder if we should just apply this regardless of the cache keys (though that can be discussed separately).

Contributor Author

@yoavweiss yoavweiss Mar 21, 2019

From the discussion on w3c/resource-hints#78 and at TPAC, it certainly seems like we'd need to either move the as=="document" check outside of the double-key case or skip SW for all prefetches. (or both!)

source Outdated
data-x="concept-document-origin">origin</span> as cache keys for <var>request</var>, then:</p>
<ol>
<li><p>If <var>request</var>'s <span
data-x="concept-request-credentials-mode">credentials mode</span> is "include", then return.

Just to clarify, this means that all no-cors prefetches will just return and be ignored (if my understanding is correct).

Contributor Author

Yes, that was my original intention, but it's true that aborting it is probably better.

Member

So I think this PR effectively kills all prefetches:

  • Both same- and cross-origin in the wild today without a crossorigin attribute, and...
  • Both same- and cross-origin in the wild today with crossorigin=use-credentials

Also, it seems to enforce usage of CORS, because the only way to fetch these resources with non-"include" credentials mode is with the "cors" request mode. I'm wondering if we could get away with only ignoring all prefetches with crossorigin=use-credentials, but just changing the default prefetch credentials mode to "same-origin". The default request mode would still be "no-cors".

When a developer prefetches a cross-origin resource, the uncredentialed response will be in the cache. When the user navigates to the resource, if no cookies accompany the request, it will match. If cookies were sent and the Vary: Cookie header is properly set on the prefetched response, the request will not match. The case where this breaks is when the prefetched response does vary with cookies, but is missing the Vary header.

/cc @yutakahirano
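The Vary-based matching described above can be sketched roughly as follows. This is a simplified model, not how any particular HTTP cache is implemented: the function name is hypothetical, header names are assumed to be stored lowercased, and `Vary: *` is not handled.

```python
from typing import Dict, List


def vary_matches(vary: List[str],
                 stored_request_headers: Dict[str, str],
                 new_request_headers: Dict[str, str]) -> bool:
    """True if every header named in Vary has the same value in both requests."""
    for name in vary:
        key = name.lower()
        if stored_request_headers.get(key) != new_request_headers.get(key):
            return False
    return True


prefetch_headers: Dict[str, str] = {}            # uncredentialed prefetch: no Cookie
navigation_headers = {"cookie": "session=abc"}   # credentialed navigation

# Response declares Vary: Cookie, so the credentialed navigation does not match.
with_vary = vary_matches(["Cookie"], prefetch_headers, navigation_headers)

# Response omits Vary: the uncredentialed entry is reused for a credentialed request.
without_vary = vary_matches([], prefetch_headers, navigation_headers)

# Navigation without cookies matches the uncredentialed prefetch.
uncredentialed = vary_matches(["Cookie"], prefetch_headers, {})
```

The `without_vary` case corresponds to the breakage mentioned above: a response that really depends on cookies but does not say so gets served from the uncredentialed prefetch.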

@yoavweiss
Contributor Author

Looks like the spec moved from underneath this PR. I'll rebase it

@yoavweiss yoavweiss force-pushed the prefetch_processing_double_key branch from e4193e2 to 2f70c31 Compare March 21, 2019 12:36
Member

@annevk annevk left a comment

Sorry for the delay here. I think we do want to be a bit more specific (and perhaps also more vague at the same time, since the type of keying you're talking about can differ on a per URL basis).

source
<p>If the browser is using both the <var>request</var>'s <span
data-x="concept-request-url">URL</span> and the
<span>top-level browsing context</span>'s <span>active document</span>'s <span
data-x="concept-document-origin">origin</span> as cache keys for <var>request</var>, then:</p>
Member

This is a little too vague. Which cache are we talking about?

Also, from your comment it seems this is talking about requestStorageAccess() type of isolation. I suspect we want to land some infrastructure for that first and agree on how it should work generally.

If that were in place, it's not clear to me why we'd modify redirect mode and such. It seems that might affect any cache in place in weird ways.

@annevk
Member

annevk commented Jun 28, 2019

User agents must implement the processing model of the prefetch keyword described in Resource Hints. [RESOURCEHINTS]

It seems this would need to be removed, right?

Is the idea with prefetch still that subresources are also fetched (i.e., it's "prenavigate")? Wouldn't we have to create some kind of fake browsing context in that case? There's also a number of XSLeaks implications with this feature. Some of that seems to be already under consideration from the discussion I read, but it might be good to spell it out more clearly in a note or some such.

@othermaciej

I think prefetching is not meant to be prenavigate. However, I am concerned with its potential as a cross-site tracking tool. If prefetch loads are done with credentials, they create a cross-site tracking vector pretty directly. If all you get is "load" and "error" events, then you get one bit of information per prefetched resource, so N prefetches could be used to create an N-bit unique user ID. However, the old Resource Hints spec suggests that prefetch can be used in CORS mode with credentials. If that allows the prefetching page to read back the prefetched resource, it creates a direct tracking vector with only one prefetch.

Just by busting cache partitioning, they can also be used to provide a hidden way to transfer state from one page to the next that's not as visible to the UA (unlike data in the URL or the Referer header) by loading a custom per-user resource that the next page can read back.

Note: these comments are based on Resource Hints draft, I have not read the new PR yet.
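The one-bit-per-prefetch argument above can be made concrete with a small sketch. It only models the arithmetic (N load/error outcomes encode an N-bit identifier); the function names are hypothetical and no network behavior is simulated.

```python
from typing import List


def encode_id(user_id: int, n_bits: int) -> List[bool]:
    """Per prefetched resource, decide whether it loads (True) or errors (False)."""
    return [bool((user_id >> i) & 1) for i in range(n_bits)]


def decode_id(outcomes: List[bool]) -> int:
    """Rebuild the identifier from the observed load/error events."""
    return sum(1 << i for i, loaded in enumerate(outcomes) if loaded)


# Eight prefetches are enough to carry any 8-bit identifier.
outcomes = encode_id(0b00101101, 8)
recovered = decode_id(outcomes)
```

If the prefetching page could instead read the response body, a single prefetch would carry the whole identifier, which is the stronger vector described for CORS mode with credentials.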

@othermaciej

I don't have labeling abilities in this repo but this should probably get some sort of privacy/tracking/fingerprinting related label.

@sideshowbarker sideshowbarker added the security/privacy There are security or privacy implications label Jul 4, 2019
@sideshowbarker
Contributor

I don't have labeling abilities in this repo but this should probably get some sort of privacy/tracking/fingerprinting related label.

security/privacy ("There are security or privacy implications") is an existing combined security/privacy label. I’ve gone ahead and labeled this with that.

Do you think we should un-combine that into security and privacy separate labels?

Do you think fingerprinting merits having its own separate label? What about tracking?

@othermaciej

@sideshowbarker Thanks. I did not mean to suggest creating more specific labels. I just didn't know offhand what labels existed. If I did have opinions on labels, would the https://github.com/whatwg/meta/ repo be the right place?

@sideshowbarker
Contributor

If I did have opinions on labels, would the https://github.com/whatwg/meta/ repo be the right place?

Yup

@annevk
Member

annevk commented Aug 20, 2019

I've noted my concerns with the proposed model at w3c/resource-hints#82 (comment).

Base automatically changed from master to main January 15, 2021 07:57
@domenic
Member

domenic commented Dec 1, 2022

This seems to be superseded by @noamr's work in #7693 and #8111, so let me close it.

@domenic domenic closed this Dec 1, 2022