Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When to fallback redirect? #397

Open
irori opened this issue Feb 13, 2019 · 14 comments
Open

When to fallback redirect? #397

irori opened this issue Feb 13, 2019 · 14 comments

Comments

@irori
Copy link
Collaborator

irori commented Feb 13, 2019

In the current loading spec, redirect to the fallback URL happens only when the signed exchange version is not supported.

However, Chromium implementation performs fallback redirect in the following cases (corresponding to the error codes here):

  • SXG was served from non-secure origin.
  • Unsupported version of SXG (could extract fallback URL).
  • SXG parse error (could extract fallback URL).
  • Network error occurred while loading SXG header.
  • Failed to fetch certificate chain.
  • Failed to parse certificate chain.
  • Signature verification failed.
  • Cert verification failed.
  • CT verification failed.
  • OCSP check failed.
  • Certificate Requirements aren't met.
  • SXG was served without X-Content-Type-Options: nosniff header.

Chromium doesn't fallbeck on the following errors:

  • SXG parse error (couldn't extract fallback URL).
  • Merkle integrity error.

We should clarify that which error should cause a fallback redirect, and spec them.

@kinu
Copy link
Collaborator

kinu commented Feb 13, 2019

/cc @twifkak for consumer side feedback

@twifkak
Copy link
Collaborator

twifkak commented Feb 20, 2019

Thanks for opening the issue. Sorry for the delay; I've been sick and this slipped my radar.

As a consumer & distributor, my product preference is to make as many things as possible a fallback. That said, I understand there may be reasons not to prefer that, and I would like to know more about them:

  • Debugging -- as a developer, I won't even notice that my SXG is invalid unless I have DevTools Network tab open and I scroll all the way to the top. That said, it seems like the UA should prioritize the typical user's experience over developer experience.
  • What others?

From your list above, I think there are some that I would feel safe removing fallback for today:

  • SXG was served from non-secure origin.
  • Network error occurred while loading SXG header. (I assume this is a network error on the outer response?)
  • SXG was served without X-Content-Type-Options: nosniff header.

Essentially these are all the "outer exchange" things that are well-trodden and I feel confident we'll do correctly; also, they require that we generate the correct response rather than that we validate (but don't modify) a given response.

I'm also okay with not falling back for Merkle integrity (this one was simple enough to check) or parse error (fallback seems dangerous).

For a distributor with the privacy-preserving-prefetch use case, the downsides of having a network error page are:

  • The user sees an error page instead of simply reduced page speed & extra network bytes.
  • The error page is unbranded. (In the early days of AMP, there used to be more branded error pages, but my understanding is that these have significantly reduced in frequency due to improvements in search infra.).

The potential reasons that our cache may serve an invalid SXG include:

  • A known limitation in our implementation (i.e. we haven't implemented validation X yet).
  • An unknown difference (i.e. a bug in either our cache's or the UA's implementation of SXG validation).
  • An intrinsically unsolvable difference (e.g. it's impossible to know the UA's trusted roots, or UA clock jitter [skew changing significantly from one request to another]).

At this point, I'm mostly worried about the first two. But I'm mostly guessing. After #374 (and after we implement server-side monitoring of those reports), I'll be able to respond more usefully.

@twifkak
Copy link
Collaborator

twifkak commented Feb 21, 2019

(I'm presuming that the fallback for SXG version mismatch happens before the network error for MICE decode failure. Otherwise, future Chromium versions will need to continue supporting old MICE versions.)

@jyasskin
Copy link
Member

Thank you for collating all the error cases so we can just go through them. My suggestion is:

Situation Behavior Notes
SXG was served from non-secure origin. 🛑Error
Unsupported version of SXG (could extract fallback URL). ↪️Fallback
SXG parse error (could extract fallback URL). 🛑Error
Network error while streaming SXG header or body. 🛑Error
Failed to fetch certificate chain. 🛑Error
Failed to parse certificate chain. 🛑Error
Signature parsing failed (This includes fetching and parsing the certificate) 🛑Error
Signature isn't valid ↪️Fallback This could error instead on the last step, which actually runs the validation algorithm?
Certificate doesn't have the CanSignHttpExchanges OID 🛑Error Maybe we shouldn't make this exception from the next step.
Certificate chain does not have a trusted leaf for the origin for another reason ↪️Fallback This includes cert path building, OCSP checking, and CT checking.
Certificate chain does not establish cross-origin trust for another reason 🛑Error The non-certificate reasons can be checked on the distributor.
SXG was served without X-Content-Type-Options: nosniff header. 🛑Error
SXG parse error (couldn't extract fallback URL). 🛑Error
Merkle integrity error. 🛑Error

I basically want anything the distributor could catch to be an Error, while anything that clients might vary on (mostly time & crypto validity) should fall back. @sleevi, how do you feel about this list?

@sleevi
Copy link

sleevi commented May 15, 2019

I basically want anything the distributor could catch to be an Error, while anything that clients might vary on (mostly time & crypto validity) should fall back.

At first glance, that sounds like a great principle and I think your analysis is reasonable and spot-on. Signature isn’t valid seems a reasonable error?

@jyasskin
Copy link
Member

@sleevi https://wicg.github.io/webpackage/loading.html#validating-signature includes the check against the client's clock (which I think should definitely fall back) and the check that it's a secp256r1 key (where I want to have clients fall back when they remove broken crypto, but maybe we should do that with a version increment instead).

We should also make sure that the reporting API distinguishes between the cases that error vs fall back. Right now there are distinctions in the table above that aren't captured in the reporting API results.

@sleevi
Copy link

sleevi commented May 15, 2019

Yeah, the approach for changing crypto primitives was definitely seen as a version-bumpable thing. Perhaps it's worth splitting out "validate a signature" and "validate the timestamp" as two separate things, the former being an error, the latter being a fallback?

@twifkak
Copy link
Collaborator

twifkak commented May 21, 2019

Initial NEL analysis from AMP Cache is available to Googlers in http://b/126787670. Overall, error rate is relatively low, but I'd say not low enough to serve network errors in all cases.

The two most prevalent errors are cert_fetch_error & mi_error. My understanding is that these can be due to a bad connection, which is beyond the control of the distributor (e.g. may be due to ISP issues or spotty cell coverage). I'm thinking that these cases should continue to fall back. Otherwise, spotty connnections would disproportionately affect SXG, because browsers are willing to render unsigned HTML that was truncated or had failed subresources. Are there downsides to falling back in these cases?

@sleevi
Copy link

sleevi commented May 21, 2019 via email

@irori
Copy link
Collaborator Author

irori commented May 21, 2019

Reg. mi_error:

From the implementation's POV, browser sends SXG's inner response header to the renderer as soon as the SXG header is validated. Merkle integrity error is detected after that, so it's too late to chagne the response to 303 redirect at that pont.

@jyasskin
Copy link
Member

cert_fetch_error: My instinct is that a failure to fetch the certificate from the same origin that's already transferring the signed exchange is similarly likely to a failure to fetch the signed exchange itself or the content from the publisher. Since we don't auto-reload on a network error, we shouldn't auto-redirect on one either.

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

  2. I hope the browser can render the parts of the content that have already been verified, even if the connection is broken which truncates and invalidates the last chunk.

@irori
Copy link
Collaborator Author

irori commented May 24, 2019

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

IIUC, currently both a mi_error and a transmission error are reported. I think it's possible to stop sending mi_error for network error cases, but I'll defer to @horo-t.

  1. I hope the browser can render the parts of the content that have already been verified, even if the connection is broken which truncates and invalidates the last chunk.

Chrome works in this way, although it's not guaranteed that response is loaded to the last validated chunk (internal buffer is discarded upon network error).

@horo-t
Copy link
Collaborator

horo-t commented May 24, 2019

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

IIUC, currently both a mi_error and a transmission error are reported. I think it's possible to stop sending mi_error for network error cases, but I'll defer to @horo-t.

I agree.
We should not send mi_error when caused by a transmission error.
We need to update both the loading spec and Chromium implementation.

@twifkak
Copy link
Collaborator

twifkak commented May 24, 2019

Thanks for the clarifications; looks like I misunderstood what mi_error meant for the user. Re: cert_fetch_error; I'll defer to you all, unless I learn that these errors are disproportionately large and can't be mitigated at implementation side, and I can come up with a good reason they're different from AIA fetch. (The only difference I can think of is that a secure fallback seems possible in this case.)

jyasskin added a commit that referenced this issue Jul 8, 2019
Add:

* An invariant fallback URL, like signed exchanges have.
* A version number, so we can easily know to fall back to a redirect.
* Some infrastructure to identify what kind of error broke the parse, which can feed into both Network Error Logging and #397’s discussion of when to fall back.
* The index maps URLs to a Variants value + a list of the responses for each possible Variant-Key, instead of using a set of request headers.
* A new signatures section allows authorities to vouch for particular subsets of the bundle. This document doesn't describe how a user agent would decide to trust cross-origin resources based on which authority vouches for them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants