Extend link HTTP header to support subresource signed exchange loading #347

Open

horo-t opened this issue Dec 7, 2018 · 44 comments

@horo-t
Collaborator

horo-t commented Dec 7, 2018

I want to introduce two new fields in the application/signed-exchange format:

  • An Alternative-Signed-Exchange-Subresources map in the unsigned field.
  • An Allowed-Alternative-Signed-Exchange-Subresources list in the signed field.

Problem

Currently, content publishers can sign their HTML content with their own private keys. User Agents (UAs) can trust the signed content as if it were served from the publisher’s origin even when it is actually served from a distributor’s origin, so the signed content can be served from any distributor’s origin. But if the publisher wants subresources such as scripts and images to be served from the distributor’s origin as well, the publisher has to rewrite the subresource URLs in the HTML to point to each distributor’s URLs and sign a separate copy for each distributor. The two proposed fields solve this problem.

Alternative-Signed-Exchange-Subresources map:
A map from the original subresource requests to SXG URLs. This field is not signed, so a distributor can rewrite it to point to its own URLs.

Allowed-Alternative-Signed-Exchange-Subresources list:
The list of subresource URLs that may be served as SXGs instead of being fetched from their original URLs. This field is signed by the publisher, so a distributor can’t change it.

Example

Publisher: https://publisher.example/article_1.html

  <script src="framework.js"></script>
  <img src="article_1.jpg">

SXG in Publisher: https://publisher.example/article_1.html.sxg

[
  // URL
  'https://publisher.example/article_1.html',
  // Signature
  'sig1: sig=*...; integrity="digest/mi-sha256";cert-url="https://publisher.example/cert"',
  // [New field] Alternative-Signed-Exchange-Subresources
  // The key of the mapping may need Accept headers info in order to enable content
  // negotiation (e.g. for WebP).
  [
    [{':url': 'https://publisher.example/framework.js', 'accept': '*/*'},
     'https://publisher.example/framework.js.sxg'],
    [{':url': 'https://publisher.example/article_1.jpg', 'accept': '*/*'},
     'https://publisher.example/article_1.jpg.sxg']
  ],
  // Signed headers
  [
    { ':method': 'GET', 'accept': '*/*' },
    {
      ':status': '200',
      // [New field]
      ':allowed-alternative-signed-exchange-subresources':
          '"https://publisher.example/framework.js",'
          '"https://publisher.example/article_1.jpg"',
      'content-encoding': 'mi-sha256-03',
      'content-type': 'text/html; charset=utf-8',
      'digest': 'mi-sha256-03=....'
    },
  ],
  // Payload body
  '<html><body>...'
]

SXG in Distributor: https://distributor.example/article_1.html.sxg

[
  // URL
  'https://publisher.example/article_1.html',
  // Signature
  'sig1: sig=*...; integrity="digest/mi-sha256";cert-url="https://distributor.example/publisher.example/cert"',
  // [New field] Alternative-Signed-Exchange-Subresources
  [
    [{':url': 'https://publisher.example/framework.js', 'accept': '*/*'},
     'https://distributor.example/publisher.example/framework.js.sxg'],
    [{':url': 'https://publisher.example/article_1.jpg', 'accept': '*/*'},
     'https://distributor.example/publisher.example/article_1.jpg.sxg']
  ],
  // Signed headers (Same as the SXG in Publisher)
  [
    { ':method': 'GET', 'accept': '*/*' },
    {
      ':status': '200',
      // [New field]
      ':allowed-alternative-signed-exchange-subresources':
          '"https://publisher.example/framework.js",'
          '"https://publisher.example/article_1.jpg"',
      'content-encoding': 'mi-sha256-03',
      'content-type': 'text/html; charset=utf-8',
      'digest': 'mi-sha256-03=....'
    },
  ],
  // Payload body (Same as the SXG in Publisher)
  '<html><body>...'
]

How UAs should work

  • When the user opens the SXG on the distributor, the UA must check the signature using the certificate at https://distributor.example/publisher.example/cert. (This is the existing behavior.)
  • The UA processes the script tag and the img tag and decides to fetch "framework.js" and the image "article_1.jpg".
  • Instead of fetching the original URLs on publisher.example, the UA should fetch the SXG files on distributor.example after checking the Alternative-Signed-Exchange-Subresources field and the Allowed-Alternative-Signed-Exchange-Subresources field (see the sketch after this list).
  • If the original URL is not in the Allowed-Alternative-Signed-Exchange-Subresources field, the UA must fetch the original URL. This is intended to avoid the subresource monitoring attack described below.
  • If the original URL’s origin is not the same as the signed origin of the main SXG (publisher.example), the UA must fetch the original URL. This restriction is intended to avoid providing a way of tracking.
  • UAs should handle the preload link header in the signed response header in the same way. (e.g. link: <https://example.com/framework.js>;rel="preload";as="script")
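
Below is a minimal sketch, in JavaScript, of the substitution check that the bullets above describe. The data structures and function name (altMap, allowedSet, chooseSubresourceUrl) are hypothetical illustrations of the two proposed fields, not a specified algorithm:

// altMap: the unsigned Alternative-Signed-Exchange-Subresources map
//   (original subresource URL -> alternative SXG URL, distributor-controlled).
// allowedSet: the signed Allowed-Alternative-Signed-Exchange-Subresources list
//   (original subresource URLs the publisher permits to be substituted).
// mainSxgOrigin: the signed (logical) origin of the main SXG, e.g. "https://publisher.example".
function chooseSubresourceUrl(originalUrl, altMap, allowedSet, mainSxgOrigin) {
  // Only substitute URLs the publisher explicitly allowed (signed field).
  if (!allowedSet.has(originalUrl)) return originalUrl;
  // Only substitute same-origin subresources, to close the tracking channel described below.
  if (new URL(originalUrl).origin !== mainSxgOrigin) return originalUrl;
  // Otherwise use the distributor-provided alternative SXG, if any.
  return altMap.get(originalUrl) || originalUrl;
}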

Subresource monitoring attack

We need the signed Allowed-Alternative-Signed-Exchange-Subresources list to prevent a subresource monitoring attack like this:

  • A publisher generates an SXG of an HTML page which shows the user's icon using JS: icon.src = USER_ID + '.png';
  • An attacker sets the mapping info like this: { 'example.com/a.png': 'attacker.com/a.png.sxg', 'example.com/b.png': 'attacker.com/b.png.sxg', ....}
  • If the UA fetches the PNG's SXG when the image tag is added, the attacker learns the USER_ID. Even if the UA only uses the prefetched SXGs, an attacking distributor can intentionally delay returning the SXGs one by one and, by monitoring the onload event, see when the load actually finishes, and therefore still learn the USER_ID.

Tracking using subresource SXG

We need to prohibit SXG loading for cross-origin subresources to prevent user tracking like this:

  • A publisher puts one subresource in the Allowed-Alternative-Signed-Exchange-Subresources field (https://tracking.example/id.js).
  • The distributor server can let the publisher’s site learn the user’s ID (ABCD1234) by changing the Alternative-Signed-Exchange-Subresources field:
  • tracking.example/id.js points to tracking.example/ABCD1234.sxg (whose body is `const id='ABCD1234';`)

Tracking is still possible even if we prohibit cross-origin subresources, using the following scheme, but it is more difficult (see the sketch after this list):

  • A publisher puts 30 subresources in the Allowed-Alternative-Signed-Exchange-Subresources field (https://publisher.example/00, 01, ... 29)
  • The publisher prepares 60 files: 00_0.sxg (body is 0), 00_1.sxg (body is 1), 01_0.sxg (body is 0), 01_1.sxg (body is 1)...
  • The distributor server can let the publisher’s site learn the user’s ID, one binary digit per subresource, by changing the Alternative-Signed-Exchange-Subresources field.
    • publisher.example/00 points to 00_0.sxg or 00_1.sxg
    • publisher.example/01 points to 01_0.sxg or 01_1.sxg
    • ....
  • This scheme can distinguish 2^30 users.
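
A minimal sketch of how a distributor could encode a user ID this way. The function name and URL patterns are hypothetical, chosen only to illustrate the one-bit-per-subresource channel:

// Encode a 30-bit user ID into the (unsigned) alternative-subresource map by
// picking the *_0.sxg or *_1.sxg variant for each allowed subresource URL.
function buildAltMapForUser(userId) {
  const altMap = new Map();
  for (let bit = 0; bit < 30; bit++) {
    const index = String(bit).padStart(2, '0');  // "00" .. "29"
    const value = (userId >> bit) & 1;           // 0 or 1
    altMap.set(`https://publisher.example/${index}`,
               `https://distributor.example/${index}_${value}.sxg`);
  }
  return altMap;
}
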
@sleevi

sleevi commented Dec 7, 2018

The security implications of this are non-obvious, and may benefit from being fleshed out more. In particular, I always get uncomfortable when seeing new unsigned fields - they make it dangerous for parsers and give attackers a lot of opportunities, some non-obvious.

My first thought was about substitution attacks - what prevents an attacker from modifying the URL for framework.js.sxg to point to other-different-payload.js.sxg. I'm assuming here that the answer is the SXG is strongly-bound to the request URL, and thus changing it to other-different-payload.js.sxg will cause it to fail when it attempts to match the request to the SXG - is that correct?

The next question is what the implications are of a version substitution of the SXG subresource. In this case, framework.js.sxg (v1) is served instead of the intended framework.js.sxg (v2). The best I can tell is the intent is to address that through the SXG signature expiration (that is, stop signing v1). This would be a 'new' problem, in as much as SXG-subresource-fetches and SXG-caching-inner-resources are not well-defined (or implemented) enough to be a thing that folks would rely on, even though they would also introduce these problems.

The final question is what the implications are of creating a bidirectional communication path. As you note, by virtue of the Allowed-Alternative-Signed-Exchange-Subresources, this potentially creates a communication channel between the distributor and the publisher when loading the SXG. The privacy implications are profound (as noted), but so are the security implications of what happens if sites begin to rely on this communication channel, given that it is, by nature, unauthenticated (i.e. publisher.example can't tell whether it was distributor.example setting those bits or whether it was evil-distributor.example setting those bits).

@horo-t
Collaborator Author

horo-t commented Dec 11, 2018

My first thought was about substitution attacks - what prevents an attacker from modifying the URL for framework.js.sxg to point to other-different-payload.js.sxg. I'm assuming here that the answer is the SXG is strongly-bound to the request URL, and thus changing it to other-different-payload.js.sxg will cause it to fail when it attempts to match the request to the SXG - is that correct?

Yes. UAs must check the subresource SXG's inner URL.

The next question is what the implications are of a version substitution of the SXG subresource. In this case, framework.js.sxg (v1) is served instead of the intended framework.js.sxg (v2). The best I can tell is the intent is to address that through the SXG signature expiration (that is, stop signing v1). This would be a 'new' problem, in as much as SXG-subresource-fetches and SXG-caching-inner-resources are not well-defined (or implemented) enough to be a thing that folks would rely on, even though they would also introduce these problems.

Publishers must be careful when adding subresources to the Allowed-Alternative-Signed-Exchange-Subresources field.
If framework.js.sxg (v1) has a security bug and the SXG's signature is still valid, the publisher must change the URL of framework.js.

The final question is what the implications are of creating a bidirectional communication path. As you note, by virtue of the Allowed-Alternative-Signed-Exchange-Subresources, this potentially creates a communication channel between the distributor and the publisher when loading the SXG. The privacy implications are profound (as noted), but so are the security implications of what happens if sites begin to rely on this communication channel, given that it is, by nature, unauthenticated (i.e. publisher.example can't tell whether it was distributor.example setting those bits or whether it was evil-distributor.example setting those bits).

Introducing a new CSP directive sxg-src could be a solution for that.
For example, if the signed response header has Content-Security-Policy: sxg-src https://distributor.example, the sxg on evil-distributor.example should be blocked.

@sleevi

sleevi commented Dec 11, 2018

Introducing a new CSP directive sxg-src could be a solution for that.
For example, if the signed response header has Content-Security-Policy: sxg-src https://distributor.example, the sxg on evil-distributor.example should be blocked.

I don't think this addresses the substance of the concern. I was not attempting to say "We should grant control to page authors", I was trying to frame it as "We need to carefully reason about the security implications of allowing this". The goal of framing it like that is to understand if we're proposing a mechanism that is default-insecure, whether that's desirable in and of itself, and what solutions might exist.

Even more concretely: I'm not convinced we should be displaying the publisher origin if we allow for arbitrary content injection by the distributor, which having such a channel would imply. I think we need to carefully reason about that. This statement is based on seeing the harm come from code-signing systems that allow for (limited) content injection/manipulation - such as Authenticode or macOS Bundles. It's certainly true that such unauthenticated injection allows for things developers perceive as interesting and useful use cases - for example, injecting whether or not a user has opted-in to metrics collection services on the download page for an executable (e.g. Chrome) - but it's also true that such methods have caused a substantial number of security vulnerabilities (e.g. https://docs.microsoft.com/en-us/security-updates/securityadvisories/2014/2915720 )

@horo-t
Collaborator Author

horo-t commented Dec 12, 2018

How about having the signature of each subresource in Allowed-Alternative-Signed-Exchange-Subresources to prevent distributors from injecting arbitrary content?

[
  // URL
  'https://publisher.example/article_1.html',
  // Signature
  'sig1: sig=*...; integrity="digest/mi-sha256";cert-url="https://distributor.example/publisher.example/cert"',
  // [New field] Alternative-Signed-Exchange-Subresources
  [
    [{':url': 'https://publisher.example/framework.js', 'accept': '*/*'},
     'https://distributor.example/publisher.example/framework.js.sxg'],
    [{':url': 'https://publisher.example/article_1.jpg', 'accept': '*/*'},
     'https://distributor.example/publisher.example/article_1.jpg.sxg']
  ],
  // Signed headers (Same as the SXG in Publisher)
  [
    { ':method': 'GET', 'accept': '*/*' },
    {
      ':status': '200',
      // [New field]
      ':allowed-alternative-signed-exchange-subresources':
          '"https://publisher.example/framework.js" '
            '*MEUCIQDX...=* '  // The first signature of framework.js.sxg
            '*MEQCIGjZ...=*,'  // The second signature of framework.js.sxg
          '"https://publisher.example/article_1.jpg" '
            '*lGZVaJJM...=* '  // The first signature of article_1.jpg.sxg
            '*MEYCIQCN...=*',  // The second signature of article_1.jpg.sxg
      'content-encoding': 'mi-sha256-03',
      'content-type': 'text/html; charset=utf-8',
      'digest': 'mi-sha256-03=....'
    },
  ],
  // Payload body (Same as the SXG in Publisher)
  '<html><body>...'
]

@sleevi

sleevi commented Dec 12, 2018

As I mentioned previously, it's probably more useful to analyze the problem before we try to step forward to solve the problem. The latest proposed approach has, for example, the same deficiency w/r/t user tracking - if you treat framework.js as 'bit 0', article_1.jpg as 'bit 1', etc, then the existence of the two signatures lets you smuggle a bit at a time from the distributor by allowing the distributor to select which signature to use, which allows altering the content (e.g. framework.js having 0 bytes vs 1 byte).

This is why it's helpful to first make sure we've analyzed the problem, clearly stated it, and made sure it's not, in fact, a pre-existing problem, so then we can look at solution spaces or make informed tradeoff decisions.

@jyasskin
Member

jyasskin commented Dec 12, 2018

IIUC, the core goal here is that if publisher.example has signed all of article1.html, framework.v2.js, article1.400x300.jpg, and article1.1600x1200.jpg, we'd like searchengine.example to be able to prefetch the appropriate subset of those for their user to be able to view the article without any new fetches to publisher.example. (See Privacy-Preserving Prefetch.)

Bundles solve this, but they require searchengine.example to subset the bundle to omit whichever of article1.400x300.jpg or article1.1600x1200.jpg is the wrong size for the user. This ability to subset gives, I think, the same communication abilities @sleevi's worried about here, but Bundles do give us a straightforward way to prevent version skew. The Bundles implementation is also farther off than the point at which @horo-t et al. think they could implement this extension to the SXG format.

We've talked at times about adding a way to specify external dependencies for bundles. The Allowed-Alternative-Signed-Exchange-Subresources list is similar to what we'd need for that.

The Alternative-Signed-Exchange-Subresources map, fundamentally, tells the browser "for this dependency that the SXG or bundle said you need, you can fetch it from this URL." I don't particularly like the idea of requiring the distributor to modify the SXG file itself in order to communicate that. Would a response header work? e.g.

Content-Location: https://distributor.example/article1.html.sxg
Link: <https://distributor.example/framework.v2.js.sxg>; anchor="https://publisher.example/framework.v2.js"; rel=alternate_tbd

@sleevi

sleevi commented Dec 12, 2018

@jyasskin Your mention of bundles made me realize that there may be another implication of this - cache probing. That is, if the distributor can modify the URL used to fetch subresources, can it infer or learn what sub-resources the user may already have (cached or loaded) by seeing which requests are not made?

That is, if I fetch article1.html.sxg from distributor.example, and it refers to publisher.example/resources/a.jpg, distributor.example does not learn whether or not the resource was cached or loaded - because the user contacts publisher.example to fetch that resource. In the Bundles case, if the user issues a range request (for the bundle), then distributor.example can learn which resources are needed. The same would apply with this sort of modification - whether or not distributor.example/sxgs/publisher.example/resources/a.jpg.sxg was fetched reveals whether or not the user needed publisher.example/resources/a.jpg. This is similar to the privacy implications mentioned by @horo-t with regards to user IDs, but isn't mitigated by the Allowed-Alternative... solution.

@jyasskin
Member

In what ways is the recommended value for Allowed-Alternative-Signed-Exchange-Subresources different from the value we'd recommend for a Link: <>; rel=preload header in the signed response? Could we just have the browser use the signed preloads for this purpose?

@jyasskin
Member

jyasskin commented Dec 12, 2018

@sleevi Cute. I agree cache probing is a risk, but I think the UA can solve it the same way we solve other cache-based tracking attempts: we fetch the resource redundantly if the server we're fetching it from shouldn't know whether it's already cached. Edit: And searchengine.example needs to take that cost into account when deciding whether to offer a SXG for any particular resource.

@sleevi

sleevi commented Dec 12, 2018

@jyasskin Sure, I didn't explicitly come out and say 'double-keyed caching', but I think that's the assumption. But that's something unique, in this case, because it's not double-keying based on the resource's logical origin (publisher.example) but instead based on the physical origin (distributor.example). Introducing that sort of split - where some of the security properties use physical and some logical - would benefit from that sort of analysis about the implications.

@sleevi

sleevi commented Dec 12, 2018

In what ways is the recommended value for Allowed-Alternative-Signed-Exchange-Subresources different from the value we'd recommend for a Link: <>; rel=preload header in the signed response? Could we just have the browser use the signed preloads for this purpose?

Wouldn't it be both signed and unsigned preloads?

That is, https://distributor.example/publisher.example/article_1.html.sxg would Link: <https://distributor.example/framework.v2.js.sxg>; rel=preload when serving the SXG (i.e. the unsigned part), but then within the SXG, you'd do

  // Signed headers (Same as the SXG in Publisher)
  [
    { ':method': 'GET', 'accept': '*/*' },
    {
      ':status': '200',
      ':link': '<https://publisher.example/framework.v2.js>; rel=preload',
      ...
    }
   ...
  ]

Or did I misunderstand the question?

@jyasskin
Member

My Link: <>; rel=preload question was more for @horo-t than @sleevi. 😄 We have to look at the signed one for the same reasons Allowed-... has to be signed in the original post.

I think it's more complex than "double-keying" or even "physical" vs "logical". We need to find a way to describe which entities (origins or organizations) may know which other entities have asked the profile to download a URL. I think each entry in the cache winds up annotated with a list of entities that are allowed to know it's cached, and if you request it but aren't in that list, it gets refetched from the network. ... But I haven't thought that all the way through.

@horo-t
Collaborator Author

horo-t commented Dec 15, 2018

My Link: <>; rel=preload question was more for @horo-t than @sleevi. 😄 We have to look at the signed one for the same reasons Allowed-... has to be signed in the original post.

I think introducing a new Link header instead of the Alternative-Signed-Exchange-Subresources field is an alternative solution. But if we have the Alternative-Signed-Exchange-Subresources field in the SXG, we can easily host the SXG files on plain HTTP servers; the distributor doesn't need to implement logic to set the HTTP header for each SXG file. So I want to introduce the new field in the SXG format.

I think it's more complex than "double-keying" or even "physical" vs "logical". We need to find a way to describe which entities (origins or organizations) know that which other entities have asked the profile to download a URL. I think each entry in the cache winds up annotated with a list of entities that are allowed to know it's cached, and if you request it but aren't in that list, it gets refetched from the network. ... But I haven't thought that all the way through.

To avoid letting the distributor learn whether the publisher's content is in the user's HTTPCache, the UA must fetch https://distributor.example/publisher.example/framework.js.sxg even if https://publisher.example/framework.js is already in the HTTPCache. But if https://distributor.example/publisher.example/framework.js.sxg itself is in the HTTPCache, the UA doesn't need to fetch it again.
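
A minimal sketch of that decision, in JavaScript. The cache-lookup helper (isInHttpCache) is hypothetical and stands in for whatever (possibly partitioned) HTTP cache lookup the UA uses:

// Decide what, if anything, to fetch for a subresource that has an alternative
// SXG on the distributor. Returns the URL to request, or null if the cached
// alternative SXG can be reused.
function subresourceFetchPlan(altSxgUrl, isInHttpCache) {
  // Deliberately never consult the cache entry for the publisher's original URL
  // here: skipping the distributor fetch because the publisher's resource is
  // cached would reveal that cache state to the distributor.
  if (isInHttpCache(altSxgUrl)) {
    return null;       // the distributor's SXG is already cached; reuse it
  }
  return altSxgUrl;    // otherwise fetch the alternative SXG from the distributor
}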

@horo-t
Collaborator Author

horo-t commented Dec 15, 2018

This is why it's helpful to first make sure we've analyzed the problem, clearly stated it, and made sure it's not, in fact, a pre-existing problem, so then we can look at solution spaces or make informed tradeoff decisions.

I think the problem with this idea is that the user can't notice the channel that the distributor can use to send arbitrary information to the publisher. Is my understanding correct?

@sleevi

sleevi commented Dec 15, 2018 via email

@horo-t
Collaborator Author

horo-t commented Jan 7, 2019

Thank you for the detailed framing of the problem.

I think if the allowed-alternative-signed-exchange-subresources field must include the subresource SXGs' signatures (#347 (comment)), we can solve the security issue.

One possible solution for the privacy issue is this: privacy-conscious browsers can delay subresource SXG loading until all of the subresource SXGs are successfully verified. If one of the SXGs has an error, the browser must fetch the original publisher's URLs. So the distributors can't smuggle bits via the SXGs.

@sleevi

sleevi commented Jan 7, 2019

@horo-t I may be misunderstanding the proposal a bit, so I thought I'd try to write it out and check if it's what you're proposing:

  • Only declaratively-specified subresources would have this mapping applied, and only for first-order SXGs. That is, those SXGs loaded by JS (e.g. mutating a .src attribute) or those referenced within SXGs (for example, loading a CSS file that then loads dependent resources) won't go through this transformation. This is, AIUI, more restrictive than the generic preload scanner.
  • (Naive algorithm) After the page has fully loaded, and it's determined all URLs that this transformation would apply to, it then attempts to fetch all SXGs. After it has fully downloaded and verified the SXG (the entire resource), it may then either use all of those resources in lieu of the original URLs, or may otherwise restart and begin fetching those other URLs (throwing out all of the SXGs it downloaded)

Is that roughly the proposal? I see lots of edge cases, so I wasn't sure if I was missing something fundamental.

@horo-t
Collaborator Author

horo-t commented Jan 9, 2019

Ah, I forgot to mention the link headers.

My proposal for the privacy issue is (a sketch in code follows the list):

  • Privacy-conscious browsers can use subresource SXGs only when the subresources are listed in the link (rel=preload) header in the signed response headers.
  • While loading the main resource SXG, the browser checks the link header.
  • If there are corresponding SXGs in the Alternative-Signed-Exchange-Subresources map, the browser fetches those SXGs.
  • After all of the SXGs are verified, the browser can load the subresources from them. If there is any error, the browser must fetch the original URLs for all of the subresources.
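
A minimal sketch of that all-or-nothing behavior, in JavaScript. The helpers (fetchAndVerifySxg, and the shapes of signedPreloadUrls and altSxgMap) are hypothetical stand-ins for the steps above:

// Prefetch every preload listed in the signed response headers via its
// alternative SXG; fall back to the original URLs for *all* of them if
// anything fails, so the distributor can't learn which ones would have been used.
async function prefetchSubresourceSxgs(signedPreloadUrls, altSxgMap) {
  const substitutions = signedPreloadUrls
    .filter(url => altSxgMap.has(url))
    .map(url => ({ originalUrl: url, sxgUrl: altSxgMap.get(url) }));
  try {
    const responses = await Promise.all(
      substitutions.map(s => fetchAndVerifySxg(s.sxgUrl, s.originalUrl)));
    return new Map(substitutions.map((s, i) => [s.originalUrl, responses[i]]));
  } catch (e) {
    return new Map();  // any failure: use the original publisher URLs for everything
  }
}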

@horo-t
Collaborator Author

horo-t commented Jan 18, 2019

We (@jyasskin, @sleevi, @kinu, @horo-t) discussed this issue yesterday. Here is a summary.

  • Goal:

    • Only while prefetching a main SXG, before the content document starts to be processed, allow subresource SXGs to be preloaded using existing prefetch+preload mechanisms (e.g. link headers).
    • Allow these subresource preloads to also be served by SXGs from the SXG distributor/physical origin, allowing more efficient loading and without requiring a connection to the inner resource’s logical origin.
  • To limit the complexity:

    • We might restrict the usage of SXG subresources to prefetches only. There is no plan to support SXG subresources which were NOT prefetched.
    • We might require the main SXG and the subresource SXGs to be served from the same host.

Possible attacks:

  1. The Source (who has a link to the SXG) or Distributor sends a tracking ID to the Publisher
    1.1. In query parameters or fragment
    1.2. In the set of prefetched resources
    1.3. In the content of prefetched resources
    1.4. In the user history (referrer)
  2. If the subresource request from the SXG is observable by the Distributor:
    2.1. The Publisher can send arbitrary information to the Distributor
    2.2. Accidental information leak may occur.
  3. Version skew attack. An evil Distributor can serve an old version of the JS which contains a bug.
  • Attack 1.1 is already possible without SXG.
  • If SXG subresources must be declared as prefetchable and all must be prefetched for any of the prefetches to apply:
    • The Distributor can send only 1 bit (succeeded or failed) to the Publisher using the set of prefetched resources. (attack 1.2)
    • This requirement also prevents the attack 2.1 and 2.2.
  • If the main SXG must have the subresource SXGs' signatures in the signed field:
    • Distributors can’t send a tracking ID in the content of prefetched resources. (attack 1.3)
    • This also prevents version skew attack (attack 3).
    • This requirement may make the packaging tool complex.
    • We might need to think about WebFonts case.
  • Publishers can know the source page URL which has a link to the SXG using document.referrer. (attack 1.4)
    • The source page can send a tracking ID using the page URL.
    • This is the status quo.
    • document.referrer in SXG is not supported yet in Chromium (https://crbug.com/920905).

@yoavweiss
Collaborator

If attack 1 is already possible with or without SXG (using either 1.1 or 1.4), why is it important to block 1.2?

Requiring subresource signatures makes sense from a security perspective (to prevent content injection). Blocking 1.3 is a nice side-effect of that.

It's also not immediately clear to me how limiting this to prefetches reduces the complexity or increases privacy. Can you elaborate on that?

@jyasskin
Member

jyasskin commented Jan 31, 2019

Limiting it to prefetches prevents attack #2. Unless you're thinking of a third kind of fetch besides prefetches and post-load fetches?

@RByers, do you have a feeling for which attacks we can exclude from the threat model because they're possible today?

@yoavweiss
Collaborator

Isn't attack #2 readily available to any page with network access? e.g. can't they send a 1x1 pixel image request to distributor.com/tracking with request parameters to leak whatever information they so choose?

@sleevi

sleevi commented Feb 3, 2019

@yoavweiss No.

When prefetching, the author has to declaratively commit to what to disclose, rather than being able to leak traffic from the current origin. Note the caveats on #347 (comment) as well.

I think an important gap in that comparison is that this is not loading distributor.com/tracking, but allowing any intermediary to insert and/or observe traffic in the session. While it's true that it's "with the consent" of the origin (by virtue of saying how they can collaborate), it's functionally indistinguishable from mixed content. That is, despite the HTTPS page 'wanting' to load the HTTP page, it's not in the user's security or privacy interests to do so. Similarly, unlike an explicitly keyed load of distributor.com/example (which you can do prior to signing the SXG, if you explicitly indicate to load that SXG), this would allow attackers full mutability of where that content is loaded from. That is, they're not just causing it to load from distributor.com/tracking but {insert distributor here}, which as the analysis above explains, turns into a full primitive for insecure side-channels and injection unless limited to prefetch.

Hopefully, that explains why it's not at all comparable.

@horo-t horo-t changed the title Two new fields in SXG format to support subresource loading Extend link HTTP header to support subresource signed exchange loading Feb 5, 2019
@horo-t
Collaborator Author

horo-t commented Feb 5, 2019

Instead of adding two new dedicated fields (Allowed-Alternative-Signed-Exchange-Subresources, Alternative-Signed-Exchange-Subresources) to the application/signed-exchange format, extending the link header sounds reasonable.

For example:

In unsigned HTTP response from distributor.example:

content-type: application/signed-exchange
link: <https://distributor.example/publisher.example/script.js.sxg>;rel="alternate";type="application/signed-exchange";anchor="https://publisher.example/script.js";

In signed response header of SXG:

link: <https://publisher.example/script.js>;rel="allowed-alt-sxg";sig="MEUCIA..."
link: <https://publisher.example/script.js>;rel="preload";as="script"

(Sorry for contradicting my previous comment)
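
A minimal sketch, in JavaScript, of how a UA could combine these three link relations to decide which subresources to substitute. The parsed-header shapes and the function name are hypothetical:

// outerAlternates: rel="alternate" SXG links from the unsigned response,
//   each { href, anchor } (anchor = original subresource URL).
// signedAllowed:   rel="allowed-alt-sxg" URLs from the signed response headers.
// signedPreloads:  rel="preload" URLs from the signed response headers.
function buildSubstitutionMap(outerAlternates, signedAllowed, signedPreloads) {
  const allowed = new Set(signedAllowed);
  const map = new Map();
  for (const url of signedPreloads) {
    if (!allowed.has(url)) continue;                  // publisher didn't allow substitution
    const alt = outerAlternates.find(l => l.anchor === url);
    if (alt) map.set(url, alt.href);                  // preload via the distributor's SXG
  }
  return map;
}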

@jyasskin
Member

jyasskin commented Feb 7, 2019

I looked through http://microformats.org/wiki/existing-rel-values and https://www.iana.org/assignments/link-relations/link-relations.xhtml but didn't see anything that seems to serve the purpose of rel="allowed-alt-sxg", so we're free to invent our own name.

We should think about whether to include the format version number in the type="application/signed-exchange" bit. I suspect we should require that version number since the distributor has already received an Accept header saying which version(s) the client supports.

Should we make allowed-alt-sxg a separate Link or an extra parameter to the preload Link? We have lots of precedent for adding parameters to preload, but this one's a bit weird because we don't want it to affect preloads retrieved directly from the publisher.

@yoavweiss
Collaborator

OK, that makes that clearer. Thanks!

@horo-t
Collaborator Author

horo-t commented Feb 25, 2019

We should think about whether to include the format version number in the type="application/signed-exchange" bit. I suspect we should require that version number since the distributor has already received an Accept header saying which version(s) the client supports.

Having the format version number in the type="application/signed-exchange" bit sounds good to me.

Should we make allowed-alt-sxg a separate Link or an extra parameter to the preload Link? We have lots of precedent for adding parameters to preload, but this one's a bit weird because we don't want it to affect preloads retrieved directly from the publisher.

I think we should have a separate allowed-alt-sxg Link. If we put the signature param in the preload Link, it will be complicated to selectively preload images using imagesrcset and imagesizes.

Example:
In unsigned HTTP response from distributor.example:

content-type: application/signed-exchange
link: <https://distributor.example/publisher.example/wide.jpg.sxg>;rel="alternate";type="application/signed-exchange;v=XX";anchor="https://publisher.example/wide.jpg";
link: <https://distributor.example/publisher.example/narrow.jpg.sxg>;rel="alternate";type="application/signed-exchange;v=XX";anchor="https://publisher.example/narrow.jpg";

In signed response header of SXG:

link: <https://publisher.example/wide.jpg>;rel="allowed-alt-sxg";sig="MEUCIB..."
link: <https://publisher.example/narrow.jpg>;rel="allowed-alt-sxg";sig="MEUCIC..."
link: <https://publisher.example/wide.jpg>;rel=preload; as=image;imagesrcset="https://publisher.example/wide.jpg 640w, https://publisher.example/narrow.jpg 320w";imagesizes="(min-width: 400px) 50vw, 100vw"

@horo-t
Collaborator Author

horo-t commented Feb 25, 2019

Filed a crbug: https://crbug.com/935267

@horo-t
Collaborator Author

horo-t commented Feb 27, 2019

I wrote that 'allowed-alternative-signed-exchange-subresources' ('allowed-alt-sxg' in the current idea) should have the signatures of subresources.
But the signature is valid only for 7 days.

Instead of the signature, I want to use the SHA-256 hash of the headerBytes byte sequence, which includes the digest header, the content-type header, and other arbitrary headers.
Note that we can't use the digest header alone for the integrity check. If we did, subresource SXGs could be used for user tracking by adding arbitrary headers or changing the content-type to cause an image load failure.

The signed response header of main SXG will be like this:

link: <https://publisher.example/wide.jpg>;rel="allowed-alt-sxg";header-integrity="sha256-h0KP..."
link: <https://publisher.example/narrow.jpg>;rel="allowed-alt-sxg";header-integrity="sha256-AmOC..."
link: <https://publisher.example/wide.jpg>;rel="preload";as=image;imagesrcset="https://publisher.example/wide.jpg 640w, https://publisher.example/narrow.jpg 320w";imagesizes="(min-width: 400px) 50vw, 100vw"
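
A minimal sketch of how such a header-integrity value could be computed, assuming (per the description above) it is the base64-encoded SHA-256 hash of the subresource SXG's headerBytes. This is a Node.js illustration, not part of the proposal:

const crypto = require('crypto');

// headerBytes: Buffer containing the subresource SXG's response-header byte
// sequence (including the digest, content-type, and any other headers).
function headerIntegrity(headerBytes) {
  const hash = crypto.createHash('sha256').update(headerBytes).digest('base64');
  return `sha256-${hash}`;  // e.g. "sha256-h0KP..."
}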

@mfalken

mfalken commented Mar 14, 2019

Instead of fetching the original URL in publisher.example, UA should fetch the SXG files in distributor.example after checking the

How does this interact with service workers? Would the request go to publisher.example's service worker first?

@horo-t
Collaborator Author

horo-t commented Mar 14, 2019

How about introducing a new method "getPreloadedResponses()" on FetchEvent?
Service workers could then get the prefetched subresources which were preloaded while prefetching the main resource in the previous page.

interface FetchEvent : ExtendableEvent {
  ...
  Promise<FrozenArray<Response>> getPreloadedResponses();
};
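
A minimal sketch of how a publisher's service worker might use such a method. getPreloadedResponses() is the hypothetical API proposed above, not something that exists today:

// In the publisher's service worker.
self.addEventListener('fetch', event => {
  event.respondWith((async () => {
    // Hypothetical: responses that were preloaded (via subresource SXGs)
    // while the main resource was being prefetched on the previous page.
    const preloaded = await event.getPreloadedResponses();
    const match = preloaded.find(r => r.url === event.request.url);
    return match || fetch(event.request);
  })());
});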

@kinu
Collaborator

kinu commented Mar 14, 2019

It looks like we should probably clarify where the URL replacement happens in the Fetch process model. Is the assumption that the replacement layer sits between the page and the SW?

@mfalken

mfalken commented Mar 14, 2019

Yes, this is my point of confusion... it would be useful to see the sequence of which service workers get consulted when, and when the replacement happens, for the main resource and the subresources.

@jyasskin
Member

@mattto To preserve privacy during the prefetch, the publisher's SW MUST NOT get an event saying which subresources are getting prefetched, even transitively.

We do need to specify that ... but doing so will be difficult until there's a specification of how <link rel="prefetch"> interacts with Service Workers at all.

My guess is that whatever bit of the browser is scanning a prefetched resource for preloads to prefetch recursively (whew) needs to maintain the mapping of available alternate SXGs, and replace the URLs before it invokes Fetch. @horo-t / @kinu, does that make sense?

@horo-t
Collaborator Author

horo-t commented Mar 18, 2019

My current idea of Service Worker and subresource SXG prefetching integration is like this:

  1. The user opens "https://aggregator.example/index.html".
  2. When the UA processes <link rel="prefetch" href="https://distributor.example/publisher/article.sxg">:
    • Invoke the FetchEvent of aggregator's SW with "article.sxg" request.
    • If the SW didn't call respondWith(), perform a HTTP-network-or-cache fetch. The SXG is stored to the HTTPCache.
  3. The response has the following headers:
    • In unsigned outer HTTP response:
      • content-type: application/signed-exchange
      • link: <https://distributor.example/publisher/script.js.sxg>;rel="alternate";type="application/signed-exchange[;v=...]";anchor="https://publisher.example/script.js";
    • In signed inner response header:
      • link: <https://publisher.example/script.js>;rel="allowed-alt-sxg";header-integrity="sha256-MEUCIA..."
      • link: <https://publisher.example/script.js>;rel="preload";as="script"
  4. The UA processes the headers, and starts prefetching "https://distributor.example/publisher/script.js.sxg".
    • Invoke the FetchEvent of aggregator's SW with "script.js.sxg" request.
    • If the SW didn't call respondWith(), perform a HTTP-network-or-cache fetch. The SXG is stored to the HTTPCache.
  5. The user clicks the link of https://distributor.example/publisher/article.sxg
    • Invoke the FetchEvent of distributor's SW with "article.sxg" request.
    • If the SW didn't call respondWith(), perform a HTTP-network-or-cache fetch. The SXG is served from the HTTPCache.
  6. The UA processes the SXG response as if it is a 303 redirect to "https://publisher.example/article.html" and sets the request’s "stashed exchange" to the parsedExchange.
  7. The UA processes the link header.
    • Invoke the FetchEvent of publisher's SW with the "script.js" request. The SW can return the response which has been retrieved using getPreloadedResponses() at step 6.
    • If the SW didn't call respondWith(), check for the prefetched "script.js.sxg" in the HTTPCache and return its inner response if it exists. Otherwise perform an HTTP-network-or-cache fetch.

@jyasskin
Member

I like that overall sketch.

"The SXG is stored to the HTTPCache." is ambiguous here, since we're designing for a multi-key'ed HTTP cache. We'll wind up using terminology from w3c/resource-hints#82, but I think the goal is to put:

  1. https://distributor.example/publisher/article.sxg in the new prefetch cache and
  2. The inner resource from https://distributor.example/publisher/script.js.sxg in a cache that's promoted to the https://publisher.example origin's partition of the HTTP cache only if the navigation is to https://publisher.example/article.html.

This promotion to the HTTP cache reminds me of things @sleevi has been nervous of, and I don't understand his concerns well enough to know if they're assuaged by this happening only on navigation to the controlling top-level document.


Separately, I haven't thought through whether we need the FetchEvent.getPreloadedResponses() method, and I suspect you should propose it separately from this SXG proposal. It probably makes sense, or doesn't, for all recursive prefetches, so should go to https://github.com/w3c/resource-hints?

@kinu
Collaborator

kinu commented Mar 19, 2019

@jyasskin @sleevi Reg: the inner resource and the HTTP cache - if we're feeling ready to talk about this, I'd prefer we discuss the generic case first, possibly in a separate issue, before talking about this specific case. Could we?

@horo-t Reg: FetchEvent.getPreloadResponses(): why don't we just let FetchEvent.preloadResponse expose the preload-to-prefetched resource for the particular fetch (e.g. for "script.js")? It looks like the UA needs to track the relationship until step 7 anyway; I wasn't sure why returning an array in the navigation request is better. Either way I agree with @jyasskin that proposing this separately might be good; I think a similar idea has been discussed somewhere else before (e.g. exposing a prefetched response as FetchEvent.preloadResponse).

But I also wondered: if we do start to store the innerResponse in the HTTP cache when navigation happens (something like 2. in #issuecomment-474138609), then getPreloadResponses() might not really be needed?

@mfalken

mfalken commented Mar 19, 2019

Thanks for sketching that out, that's very clear. I'll note that this adds more cases where respondWith(fetch(event.request)) differs from not calling respondWith(). Historically we've tried to keep those equivalent, but maybe we've already lost that guarantee, and it aligns with the main resource SXG. Was there a discussion about the SW interaction described in https://wicg.github.io/webpackage/loading.html#overview?

@horo-t
Collaborator Author

horo-t commented Mar 19, 2019

@kinu
Using FetchEvent.preloadResponse for the preload-to-prefetched resources sounds good to me.
FetchEvent.getPreloadResponses() may be useful when we want to store the prefetched subresources in CacheStorage. But I don't think this is super important.

I commented about using FetchEvent.preloadResponse in SW for prefetched resources at w3c/resource-hints#78 (comment). Let's discuss it there.

@horo-t
Collaborator Author

horo-t commented Mar 19, 2019

@mattto
The SW integration with signed exchange was added to the spec at #281 (comment). If we want calling respondWith(fetch(event.request)) and not calling respondWith() to keep the same behavior, we need to change the spec. I think we should discuss it in a separate issue.

@mfalken

mfalken commented Mar 19, 2019

Thanks, filed #409

@sleevi

sleevi commented Mar 20, 2019

  1. The UA processes the headers, and starts prefetching "https://distributor.example/publisher/script.js.sxg".

    • Invoke the FetchEvent of aggregator's SW with "script.js.sxg" request.
    • If the SW didn't call respondWith(), perform a HTTP-network-or-cache fetch. The SXG is stored to the HTTPCache.

I find this part uncomfortable and hard to reason about. In a 'normal' TLS loading case, my understanding is that aggregator.example would have no knowledge of distributor.example preloading here, so this feels like a new information disclosure vector.

If I understand correctly, but wanting to confirm, we're reasoning that this isn't particularly new information, because aggregator.example can see the headers of the inner SXG, and thus know about the link: ...;rel="preload" content, and thus know what the user will load anyways. Does that sound roughly correct?

I think one area that would need more specificity here is what happens if aggregator.example does trigger a respondWith() call for the fetch to distributor.example/publisher/script.js.sxg.

  1. What if they glue it to a fetch event of 'otherdistributor.example/publisher/script.js.sxg'?
  2. What if they glue it to a fetch event of publisher.example/script.js?
  3. What if they glue it to a synthetic response (e.g. a blob) which has the same header-integrity value as expressed in the rel="allowed-alt-sxg" (which AIUI refers to the hash of the inner content, not the outer content)?

Separate from these concerns, as @jyasskin highlighted, we need to figure out what it means to store in the HTTPCache / serve from the HTTPCache, and how those requests are inserted and matched. If I understood @kinu's comment correctly, it sounds like we're good to defer that?

@horo-t
Collaborator Author

horo-t commented Mar 23, 2019

Humm...
Now I think we should skip service workers for prefetching requests (steps 2 and 4 of #347 (comment)), at least for the MVP (minimum viable product).

Introducing the new prefetch cache sounds good to me. If we can put the prefetched resources (https://distributor.example/publisher/article.sxg and https://distributor.example/publisher/script.js.sxg) and the certificate URL of each SXG into the new prefetch cache, and use the cached resources when navigating from https://aggregator.example/index.html, this mechanism works even when double-keyed caching is enabled. (I'm trying to find a good way to implement this in Chromium.)

I still don't know whether it is ok or not to store the inner resources (https://publisher.example/article.html and https://publisher.example/script.js) into the prefetch cache.
It is good for performance because we can skip the verification process.
@sleevi Do you have any concern about it?

@horo-t
Collaborator Author

horo-t commented Jul 11, 2019

I have written two explainer documents.

  • Signed Exchange subresource substitution
    • This introduces rel="allowed-alt-sxg" link header.
    • By using this header, content publishers can declare that the UA can load the specific subresources from cached signed exchanges which were prefetched in the referrer page.
  • Signed Exchange alternate link
    • This extends the usage of the existing rel="alternate" link header.
    • By using this header, UAs can recursively prefetch appropriate subresource signed exchanges while prefetching the main resource signed exchange.

@WICG WICG deleted a comment from almshibin Sep 27, 2019
horo-t added a commit to horo-t/webpackage that referenced this issue Dec 4, 2019
I uploaded explainer documents of subresource signed exchanges to my
repository (https://github.com/horo-t/subresource-signed-exchange).
But they should be in this webpackage repository.
So this patch copies them from "horo-t/subresource-signed-exchange"
repository.

Spec issue: WICG#347
TAG review: w3ctag/design-reviews#352
horo-t added a commit that referenced this issue Jan 6, 2020
I uploaded explainer documents of subresource signed exchanges to my
repository (https://github.com/horo-t/subresource-signed-exchange).
But they should be in this webpackage repository.
So this patch copies them from "horo-t/subresource-signed-exchange"
repository.

Spec issue: #347
TAG review: w3ctag/design-reviews#352