Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating stored headers #165

Closed
3 tasks
mnot opened this issue Oct 24, 2018 · 56 comments · Fixed by #337
Closed
3 tasks

Updating stored headers #165

mnot opened this issue Oct 24, 2018 · 56 comments · Fixed by #337

Comments

@mnot
Copy link
Member

mnot commented Oct 24, 2018

7234 requires stored headers to be updated upon a 304 or HEAD response.

However, there are a number of issues:

  • It doesn't specify that H1 connection-related headers should be omitted
  • Some browsers omit other headers (e.g., x-content-*, content-* and x-webkit-* - see Chrome and Webkit
  • Apache has a whitelist of headers; see bug
  • Intermediaries are reluctant to do this for performance reasons; e.g., squid

It seems like a lot of the bugs referenced above originated from the confusing language around omitting entity headers in RFC2616; that was removed in RFC7234 (see current language).

I think we need to:

  • Specify the list of H1 connection-oriented headers to be omitted
  • Talk about other headers that need to be omitted (coordinating especially with browsers, who seem to have weird behaviour here)
  • Contemplate any advice/warnings/requirements we can give to an intermediary cache about how they should handle this if they don't want the performance hit

Mozilla's list of headers they don't update might be a good starting point for the first two (see also related bug).

There are also browser tests and an outstanding PR.

Note that this is a security issue, because some browsers filter out headers starting with Content-, which includes Content-Security-Policy -- i.e., a browser that has an old copy of the response won't see the new CSP header on a 304 response.

(this might also apply to 206 responses, since there's header combination there too)

@mnot mnot added the caching label Oct 24, 2018
@mnot mnot changed the title H1 headers shouldn't update stored responses if they're connection-related Updating stored headers Nov 6, 2018
@mnot
Copy link
Member Author

mnot commented Nov 6, 2018

cc @mikewest @erickinnear

@mikewest
Copy link
Member

mikewest commented Nov 6, 2018

@andypaicu and @MattMenke have more informed opinions than I do at this point. :)

@annevk
Copy link
Contributor

annevk commented Nov 6, 2018

I think you want @MattMenke2.

@MattMenke2
Copy link

Thanks, Anne! MattMenke is also me, just my personal account (Which I prefer not to use for work. Lawyers scare me). So do please do use @MattMenke2.

@MattMenke2
Copy link

Chrome's lists can be found at https://cs.chromium.org/chromium/src/net/http/http_response_headers.cc?gsn=PERSIST_RAW&g=0&l=47. Chrome has two lists: Those it shouldn't updated when using cached response after receiving a 304 (kNonUpdateHeaders + kNonUpdatedHeaderPrefixes), and those it doesn't put in the cache at all (kHopByHopResponseHeaders+ kChallengeResponseHeaders+ kCookieResponseHeaders + kSecurityStateHeaders + "Content-Range" + headers explicitly indicated by a no-cache header)

Chrome's no-update list was recently modified to match FireFox's, with a few extra rules. In particular, Chrome's list includes "www-authenticate", "x-frame-options", "x-xss-protection", "x-content-", and "x-webkit-".

"www-authenticate" definitely makes sense in the list (Well, I assume no browser caches proxy challenge responses, anyways, but it makes sense if the proxy challenge header does).

It seems to me like "x-frame-options" shouldn't be in that list, as it could reasonably be generated based on the requesting origin, rather than give a list of all origins (Both to keep down response sizes, and to avoid exposing other URLs)

"x-xss-protection" doesn't have that problem, but seems strange to single it out as not updated, unless there's a good reason to treat it differently.

I'm not sufficiently familiar with the "x-content-" and "x-webkit-" prefixes to comment usefully on them.

@MattMenke2
Copy link

Also, it seems weird to have "Proxy-Authorization" on the list, as it's a response header. Though it we want to keep it for legacy reasons, we should probably include "Authorization", too.

@mnot
Copy link
Member Author

mnot commented Nov 8, 2018

Re: Proxy-Authorization and Authorization: my inclination would be to not specify request headers in the list, as they don't have any function in responses (specified or practical), and so it would be strange to specify them, and especially strange to specify some but not others.

If implementations felt like they wanted to still omit them because they don't remember why they were omitted and were wary of changing, that's fine, but I'd point out that the intermediary caches that do update stored headers don't do this.

@mnot
Copy link
Member Author

mnot commented Nov 8, 2018

I'll update my PR in WPT as well as the CDN tests to include more headers, so we can capture exactly what's current practice here.

@MattMenke2
Copy link

Just to clarify, it looks to me FireFox does currently omit Proxy-Authorization, which is where Chrome apparently got its behavior from. See line 901 of https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpResponseHead.cpp#879. Or did you mean that FireFox will definitely remove that line?

@mnot
Copy link
Member Author

mnot commented Nov 8, 2018

@MattMenke2 I edited :)

It'd be great to understand why they do so. @mcmanus any idea?

@mcmanus
Copy link

mcmanus commented Nov 8, 2018

don't know.. its older than the 2007 version control cutover - so you really have to badly want the answer to go looking.

Im going to guess that they were enumerating hop by hop headers and felt Proxy* should always fall into that bucket.

@jyasskin
Copy link

Is there general agreement in this issue to add two lists to draft-ietf-httpbis-cache:

  1. a list of headers to avoid caching at all, and
  2. a list to avoid updating on 304 or HEAD responses?

I do see that the contents of those lists don't yet necessarily have consensus.

@mnot
Copy link
Member Author

mnot commented Nov 26, 2018

I think it's one unified list. Where do you think it'd be different?

@jyasskin
Copy link

#165 (comment) indicates that the set of headers that Chrome avoids caching is a different set from the one it avoids updating. (e.g. Chrome caches ETag but doesn't update it.) Thus, a need to represent two sets in the specification.

@MattMenke2
Copy link

Updating Content-Encoding or Content-Location also seems pretty weird, though both should be cached.

@mnot
Copy link
Member Author

mnot commented Nov 27, 2018

I can see a case for Proxy-Authenticate, since that's often going to be dependent upon the connection properties. I don't see it for an already cacheable WWW-Authenticate.

Are there any other headers that you think should be considered (whether or not we have separate lists)?

The common case for these ones that appear weird is a server that is trying to correct a previous mistake; e.g., it sent the wrong header, and wants to update the cached entry.

@annevk
Copy link
Contributor

annevk commented Nov 27, 2018

It would be helpful if someone could put down a tentative proposal to compare with implementations and headers with interesting properties. (E.g., it's not clear to me from the above discussion if not being able to update Content-Type and Content-Length, while being able to cache them is seen as good by all, or not.)

@mnot
Copy link
Member Author

mnot commented Nov 27, 2018

I think we can do it in groups -

  • Transfer-Coding, Connection, Keep-Alive, Proxy-Authenticate*, and Upgrade are all hop-by-hop headers; i.e., they should be listed in Connection (except Connection itself). They should never be stored or updated.

  • Other headers listed in Connection (for h1) should also never be stored or updated.

  • TE, Proxy-Authorization and maybe a couple of other headers are request hop-by-hop headers; they shouldn't be seen in responses. However, if they are sent in a response, they don't do anything, so I think they're harmless to leave.

  • ETag is used to select a response when If-None-Match is in use, so by definition it can't change on the response (there won't be anything to update if it does; see the rules). If an ETagcomes back in response to a request that only used If-Modified-Since, I'd say the server was updating the ETag -- but it's worth exploring / talking about if people want to cover that case. See also Upgrading Weak ETags #132 for a possible complication.

  • For Content-* my understanding is that this was exempted by developers who misread the language in 2616 about not including entity headers in 304s, and somehow took away that if they are present, they shouldn't be updated. That's not what it says, wasn't the intent, and as discussed, can cause horrible problems (such as discussed WRT Content-Security-Policy).

  • Specifically, Content-Type can be updated when people do things like get the encoding wrong using Apache's .htaccess file.

  • Content-Length is used to delimit messages in HTTP/1, and so causes people to get scared when it's messed with, for understandable reasons. There's also used to be some pretty pathological errors from a few implementations that would send Content-Length: 0 on a 304 response (because it doesn't have a body, right?). If an implementation wants to refuse to update Content-Length due to security concerns, I'd understand that -- but if it's coming in H2 or greater, it might just mean that the server is trying to correct an erroneous C-L sent before that. shrug

Any other headers of interest? I'm reluctant to start opting headers out of update just because "it seems weird", since that's an open-ended set, and unlike with browsers, we can't expect intermediary caches to be updated every time somebody comes up with a new header.

  • I should have checked before commenting above; Proxy-Authenticate is defined as hop-by-hop, so no case need be made.

@annevk
Copy link
Contributor

annevk commented Nov 27, 2018

Note that having to respect Content-Type updates has implications, such as always having to store the body rather than an optimized structure. And changing encodings isn't without security issues either. (A browser will always have to deal with that anyway I suppose, as one can use fetch() to get the bytes from the cache.)

@MattMenke2
Copy link

Content-Encoding seems problematic - want to get passed a malware scanner of some type? Service a resource with the wrong Content-Encoding, possibly get it in the browser cache (Which is generally at a lower layer than decompression), though the resource fails to load. Then re-request the resource with a 304 that changes the Content-Encoding. Admittedly, service workers and the like are perfectly capable of using their own methods to bypass filters.

@MattMenke2
Copy link

Also I don't think "Some browsers may want to ignore updates to this header" is really a great option, going forward - I think we want to be consistent here, unless there's a pretty compelling reason not to.

@mnot
Copy link
Member Author

mnot commented Nov 29, 2018

I understand, but that's not a HTTP cache -- you're inserting non-standard functions on top of a HTTP cache for implementation convenience. Restricting the behaviour of all HTTP caches (including intermediaries and server-side caches) because some browser caches might be doing extra things like this doesn't make sense.

Even for those browsers, there are other strategies that they might take -- e.g., invalidating a cache entry that's an optimised structure if the content-type changes. Likewise, a browser that stores decoded responses needs to be aware of the semantics of content-encoding anyway, and take appropriate steps to behave correctly.

@MattMenke2
Copy link

I'm not following...What "extra things" are you talking about?

@mnot
Copy link
Member Author

mnot commented Nov 29, 2018

  • Storing optimised representations; HTTP caches store bags-of-bits
  • Decompressing Content-Encoding automagically; C-E is a property of the content

I.e., storing and updating these headers works just fine until you start doing non-specified things with the HTTP cache. If you do such things, it's your responsibility to make sure that it doesn't interfere with the proper operation of the cache.

Re-defining caching to suit current browser behaviour might work if browser caches were the only HTTP caches in existence, but they're not.

@MattMenke2
Copy link

Those two points seem to contradict each other - only one can be correct. If Content-Encoding is a property of the content, and not merely the transfer of the content, then caches should store that content, as opposed to a mutated version of it, no?

C-E being a property of content is a fiction - that may be what the spec says, but that's not the reality. You could not make a functional browser that behaved otherwise. I thought one of the purposes of the fetch spec was to reconcile RFCs with the way things actually work, or can reasonably be modified to work.

@mnot
Copy link
Member Author

mnot commented Nov 29, 2018

HTTP caches do store that content -- at least in intermediaries. If browser caches store the decompressed content, they need to account for the impact of doing so, as it's not specified behaviour.

In the case you described, if a browser stores something with an unrecognised content-coding thereby evading a filter, and then a 304 updates the content-coding, there are a number of valid strategies that could be taken to assure correct behaviour by the filter:

  • trigger re-processing of the content by the filter upon a 304
  • invalidate the stored content upon a changed content-encoding
  • refuse to store a response that doesn't have a valid content-coding

Why the changing the behaviour of all caches (including all that are currently deployed) to accommodate this non-standard behaviour the most reasonable path, especially when there are a number of ways they can correctly behave within the constrains of the current spec?

I could see an argument that HTTP needs to accommodate caches that store responses without content-encodings, or provide advice for those that do. Would that be helpful?

@jyasskin
Copy link

jyasskin commented Nov 29, 2018

I'm working on a patch to add these lists, with notes that the contents aren't final. I'll include the "cached but not updated" list, probably saying that caches MAY ignore updates to headers in that list. As Mark convinces the browsers to update various headers on that list, or the browsers convince the CDNs not to, we can tighten the specification, but MAY seems to capture the status quo.

@annevk
Copy link
Contributor

annevk commented Aug 8, 2019

That sounds like a pretty drastic change from the status quo that I'm not sure is really worth the amount of effort involved. Especially as it leaves much of the design challenges up to the implementers and will eventually result in implementers having to reverse engineer their respective setups to get consistent behavior across sites.

@mnot
Copy link
Member Author

mnot commented Sep 30, 2019

The alternative is to replace the third paragraph with something like:

In these limited situations, a cache MAY omit the headers listed below from updates. Servers SHOULD NOT send updated values for these headers in a 304 response if their new values are critical to interpret the cached response; rather, they SHOULD generate a full 2xx response.

However, I'm really only comfortable with this approach if the list is limited and relatively static. There's a lot of cargo culting apparent in the current implementation behaviours, and I see no reason to accommodate them.

Specifically:

  • Content-Type, Content-MD5, Content-Encoding, Content-Length, Content-Range and maybe Content-Location make sense to list, for reasons discussed.
  • Blanket prefixes like Content- and X-Content- don't make sense, and specifying them is going to create perverse incentives. Browsers should change their behaviours here.
  • I don't see any broad exemptions amongst browsers for other headers, so I don't think they should be on the list; they should instead align their behaviours, unless there's some technical reason why they can't be updated (along the lines of the scenarios explained above).
  • I'd really like to understand why browsers exempt Set-Cookie and Set-Cookie2 from updates; no proxy cache does it.

@twifkak
Copy link

twifkak commented Nov 18, 2019

Sorry for the late contribution to this issue. I ran into a related ambiguity recently and thought it was relevant:

  • Should the cache update the stored request time, as used in corrected age calculation?
  • Should the cache update the stored Date header?

IIUC, the combination would mean that if the original response had a Cache-Control: max-age=, then any subsequent 304s (which must have Date) would extend the lifetime of the stored response, potentially indefinitely if 304s arrive every max_age-1.

Apologies if I misread something; I should really run some tests.

(edit to add: Either way, the decision on request time should probably be specified explicitly.)

@mnot
Copy link
Member Author

mnot commented Nov 18, 2019

Discussed in Sinapore; I will create a conservative PR as a basis of further discussion.

@martinthomson
Copy link
Contributor

@agrover pointed to the gecko code that lists:
Connection,
Proxy-Connection,
Keep-Alive,
WWW-Authenticate,
Proxy-Authenticate,
Trailer,
Transfer-Encoding,
Upgrade,
Set-Cookie.

I couldn't see anything that strips Content-Type, or even Content-<anything>, which might be an issue, but Andy might be able to provide more info.

@agrover
Copy link

agrover commented Nov 19, 2019

OK so the previous list was headers Gecko doesn't cache. This is a list of headers Gecko doesn't update on a 304 (or 206 partial content).
Connection
Proxy_Connection
Keep_Alive
Proxy_Authenticate
Proxy_Authorization
TE
Trailer
Transfer_Encoding
Upgrade
Content_Location
Content_MD5
ETag
Content_Encoding
Content_Range
Content_Type
Content_Length

So there are those Content- headers you were looking for. Sorry for the confusion.

@mnot
Copy link
Member Author

mnot commented Mar 19, 2020

See draft PR #337.

This covers all of the connection headers (both listed in Connection and also well-known ones that might not appear there), and the enumerated Content- headers (NOT the prefixes).

It includes Proxy-Authenticate and Proxy-Authorization with the theory that they are specific to a connection; we have an outstanding issue #331 to discuss that.

It also includes ETag; I want to do a bit of side research on ETag and updates, and may add some more text there or open a separate issue, but I think it's worth including at this point.

It does not include Content-Location. I'd like to understand the rationale for it not being updatable, considering that intermediary caches (Squid, ATS, and Varnish) are widely updating it with no perceivable impact on interop or security.

It does not yet include Cookie, because a cacheable response with a Cookie should really be updatable, if the cache bothered to store it. Looking at my tests, it appears that the only cache that stores and updates cookies is httpd. I suppose we can carve out an exception for Cookie -- it's exceptional in so many other ways...

Thoughts?

@MattMenke2
Copy link

The problem with Cookie headers isn't replacing the cached ones, but with using the cached Cookie headers in the first place - we don't want to reuse Cookie headers from old responses, setting the old Cookies once again.

@mnot
Copy link
Member Author

mnot commented Mar 23, 2020

If that's the goal, it seems like it'd be better to do it in the cookie spec -- i.e., specify that cookies should only be used when the response is what we used to call "first hand" -- obtained directly from the origin server, without an Age, rather than from cache. @mikewest @johnwilander ?

@mikewest
Copy link
Member

If that's the goal, it seems like it'd be better to do it in the cookie spec

This is currently taken care of by Fetch, which hooks into the cookie spec at 11.4 of https://fetch.spec.whatwg.org/#http-network-fetch (which happens after consulting the cache and missing). The integration could be much cleaner, but it does what you're asking for. What would you like added to the cookie spec to make that more transparent?

@MattMenke2
Copy link

So sending cookies hooks in above the cache layer, and setting them hooks in above it, in the spec? That seems unfortunate.

@mikewest
Copy link
Member

mikewest commented Mar 23, 2020

So sending cookies hooks in above the cache layer, and setting them hooks in above it, in the spec? That seems unfortunate.

Setting the Cookie header happens at step 5.17.1 of https://fetch.spec.whatwg.org/#http-network-or-cache-fetch when the request is being generated, before hitting the cache. That seems reasonable, as we need to know the request's characteristics before consulting the cache to handle Vary, et al.

@MattMenke2
Copy link

Yes, I'd argue the setting the cookies in the cookie store should happen at the same layer, however.

@mikewest
Copy link
Member

We could move the Set-Cookie processing to the same "HTTP-network-or-cache fetch" algorithm, in the cache miss branch (maybe after step 7.2?). I think that would basically be a no-op, but we'd need to audit direct usage of "HTTP-network fetch". It's not clear to me that that's worth doing... Perhaps @annevk has an opinion?

Either way, it seems like we're handling the problem discussed above in Fetch, not RFC6265bis. Are there still changes we should make to the latter regarding this discussion (short of rewriting the whole thing in terms of Fetch, which I don't think anyone's signing up for)?

@annevk
Copy link
Contributor

annevk commented Mar 24, 2020

I think we could move it and it probably makes sense if that's when they're processed. (That algorithm also takes a credentials flag which I'd hope folks set correctly if they invoke it directly, but I'm not aware of anyone invoking it directly.) We should maybe also update https://fetch.spec.whatwg.org/#http-header-layer-division to account for this somewhat subtle difference.

@mnot
Copy link
Member Author

mnot commented Apr 27, 2020

OK, that seems to take care of Cookies. I'm going to mark the PR as ready for review.

@mnot
Copy link
Member Author

mnot commented May 21, 2020

The editors have signed off on #337. Any further comments before we merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

12 participants