Updating stored headers #165

mnot · 2018-10-24T06:55:17Z

7234 requires stored headers to be updated upon a 304 or HEAD response.

However, there are a number of issues:

It doesn't specify that H1 connection-related headers should be omitted
Some browsers omit other headers (e.g., x-content-*, content-* and x-webkit-* - see Chrome and Webkit
Apache has a whitelist of headers; see bug
Intermediaries are reluctant to do this for performance reasons; e.g., squid

It seems like a lot of the bugs referenced above originated from the confusing language around omitting entity headers in RFC2616; that was removed in RFC7234 (see current language).

I think we need to:

Specify the list of H1 connection-oriented headers to be omitted
Talk about other headers that need to be omitted (coordinating especially with browsers, who seem to have weird behaviour here)
Contemplate any advice/warnings/requirements we can give to an intermediary cache about how they should handle this if they don't want the performance hit

Mozilla's list of headers they don't update might be a good starting point for the first two (see also related bug).

There are also browser tests and an outstanding PR.

Note that this is a security issue, because some browsers filter out headers starting with Content-, which includes Content-Security-Policy -- i.e., a browser that has an old copy of the response won't see the new CSP header on a 304 response.

(this might also apply to 206 responses, since there's header combination there too)

The text was updated successfully, but these errors were encountered:

mnot · 2018-11-06T09:34:30Z

cc @mikewest @erickinnear

mikewest · 2018-11-06T09:56:11Z

@andypaicu and @MattMenke have more informed opinions than I do at this point. :)

annevk · 2018-11-06T14:05:34Z

I think you want @MattMenke2.

MattMenke2 · 2018-11-06T14:24:07Z

Thanks, Anne! MattMenke is also me, just my personal account (Which I prefer not to use for work. Lawyers scare me). So do please do use @MattMenke2.

MattMenke2 · 2018-11-06T15:12:31Z

Chrome's lists can be found at https://cs.chromium.org/chromium/src/net/http/http_response_headers.cc?gsn=PERSIST_RAW&g=0&l=47. Chrome has two lists: Those it shouldn't updated when using cached response after receiving a 304 (kNonUpdateHeaders + kNonUpdatedHeaderPrefixes), and those it doesn't put in the cache at all (kHopByHopResponseHeaders+ kChallengeResponseHeaders+ kCookieResponseHeaders + kSecurityStateHeaders + "Content-Range" + headers explicitly indicated by a no-cache header)

Chrome's no-update list was recently modified to match FireFox's, with a few extra rules. In particular, Chrome's list includes "www-authenticate", "x-frame-options", "x-xss-protection", "x-content-", and "x-webkit-".

"www-authenticate" definitely makes sense in the list (Well, I assume no browser caches proxy challenge responses, anyways, but it makes sense if the proxy challenge header does).

It seems to me like "x-frame-options" shouldn't be in that list, as it could reasonably be generated based on the requesting origin, rather than give a list of all origins (Both to keep down response sizes, and to avoid exposing other URLs)

"x-xss-protection" doesn't have that problem, but seems strange to single it out as not updated, unless there's a good reason to treat it differently.

I'm not sufficiently familiar with the "x-content-" and "x-webkit-" prefixes to comment usefully on them.

MattMenke2 · 2018-11-06T15:26:31Z

Also, it seems weird to have "Proxy-Authorization" on the list, as it's a response header. Though it we want to keep it for legacy reasons, we should probably include "Authorization", too.

mnot · 2018-11-08T01:16:30Z

Re: Proxy-Authorization and Authorization: my inclination would be to not specify request headers in the list, as they don't have any function in responses (specified or practical), and so it would be strange to specify them, and especially strange to specify some but not others.

If implementations felt like they wanted to still omit them because they don't remember why they were omitted and were wary of changing, that's fine, but I'd point out that the intermediary caches that do update stored headers don't do this.

mnot · 2018-11-08T01:17:40Z

I'll update my PR in WPT as well as the CDN tests to include more headers, so we can capture exactly what's current practice here.

MattMenke2 · 2018-11-08T01:48:51Z

Just to clarify, it looks to me FireFox does currently omit Proxy-Authorization, which is where Chrome apparently got its behavior from. See line 901 of https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpResponseHead.cpp#879. Or did you mean that FireFox will definitely remove that line?

mnot · 2018-11-08T04:17:51Z

@MattMenke2 I edited :)

It'd be great to understand why they do so. @mcmanus any idea?

mcmanus · 2018-11-08T04:56:46Z

don't know.. its older than the 2007 version control cutover - so you really have to badly want the answer to go looking.

Im going to guess that they were enumerating hop by hop headers and felt Proxy* should always fall into that bucket.

jyasskin · 2018-11-20T19:17:54Z

Is there general agreement in this issue to add two lists to draft-ietf-httpbis-cache:

a list of headers to avoid caching at all, and
a list to avoid updating on 304 or HEAD responses?

I do see that the contents of those lists don't yet necessarily have consensus.

mnot · 2018-11-26T04:48:14Z

I think it's one unified list. Where do you think it'd be different?

jyasskin · 2018-11-26T17:25:54Z

#165 (comment) indicates that the set of headers that Chrome avoids caching is a different set from the one it avoids updating. (e.g. Chrome caches ETag but doesn't update it.) Thus, a need to represent two sets in the specification.

MattMenke2 · 2018-11-26T17:37:15Z

Updating Content-Encoding or Content-Location also seems pretty weird, though both should be cached.

mnot · 2018-11-27T06:58:57Z

I can see a case for Proxy-Authenticate, since that's often going to be dependent upon the connection properties. I don't see it for an already cacheable WWW-Authenticate.

Are there any other headers that you think should be considered (whether or not we have separate lists)?

The common case for these ones that appear weird is a server that is trying to correct a previous mistake; e.g., it sent the wrong header, and wants to update the cached entry.

annevk · 2018-11-27T09:19:47Z

It would be helpful if someone could put down a tentative proposal to compare with implementations and headers with interesting properties. (E.g., it's not clear to me from the above discussion if not being able to update Content-Type and Content-Length, while being able to cache them is seen as good by all, or not.)

mnot · 2018-11-27T09:37:56Z

I think we can do it in groups -

Transfer-Coding, Connection, Keep-Alive, Proxy-Authenticate*, and Upgrade are all hop-by-hop headers; i.e., they should be listed in Connection (except Connection itself). They should never be stored or updated.
Other headers listed in Connection (for h1) should also never be stored or updated.
TE, Proxy-Authorization and maybe a couple of other headers are request hop-by-hop headers; they shouldn't be seen in responses. However, if they are sent in a response, they don't do anything, so I think they're harmless to leave.
ETag is used to select a response when If-None-Match is in use, so by definition it can't change on the response (there won't be anything to update if it does; see the rules). If an ETagcomes back in response to a request that only used If-Modified-Since, I'd say the server was updating the ETag -- but it's worth exploring / talking about if people want to cover that case. See also Upgrading Weak ETags #132 for a possible complication.
For Content-* my understanding is that this was exempted by developers who misread the language in 2616 about not including entity headers in 304s, and somehow took away that if they are present, they shouldn't be updated. That's not what it says, wasn't the intent, and as discussed, can cause horrible problems (such as discussed WRT Content-Security-Policy).
Specifically, Content-Type can be updated when people do things like get the encoding wrong using Apache's .htaccess file.
Content-Length is used to delimit messages in HTTP/1, and so causes people to get scared when it's messed with, for understandable reasons. There's also used to be some pretty pathological errors from a few implementations that would send Content-Length: 0 on a 304 response (because it doesn't have a body, right?). If an implementation wants to refuse to update Content-Length due to security concerns, I'd understand that -- but if it's coming in H2 or greater, it might just mean that the server is trying to correct an erroneous C-L sent before that. shrug

Any other headers of interest? I'm reluctant to start opting headers out of update just because "it seems weird", since that's an open-ended set, and unlike with browsers, we can't expect intermediary caches to be updated every time somebody comes up with a new header.

I should have checked before commenting above; Proxy-Authenticate is defined as hop-by-hop, so no case need be made.

annevk · 2018-11-27T10:17:35Z

Note that having to respect Content-Type updates has implications, such as always having to store the body rather than an optimized structure. And changing encodings isn't without security issues either. (A browser will always have to deal with that anyway I suppose, as one can use fetch() to get the bytes from the cache.)

MattMenke2 · 2018-11-27T16:25:27Z

Content-Encoding seems problematic - want to get passed a malware scanner of some type? Service a resource with the wrong Content-Encoding, possibly get it in the browser cache (Which is generally at a lower layer than decompression), though the resource fails to load. Then re-request the resource with a 304 that changes the Content-Encoding. Admittedly, service workers and the like are perfectly capable of using their own methods to bypass filters.

MattMenke2 · 2018-11-27T16:26:51Z

Also I don't think "Some browsers may want to ignore updates to this header" is really a great option, going forward - I think we want to be consistent here, unless there's a pretty compelling reason not to.

mnot · 2018-11-29T00:15:05Z

I understand, but that's not a HTTP cache -- you're inserting non-standard functions on top of a HTTP cache for implementation convenience. Restricting the behaviour of all HTTP caches (including intermediaries and server-side caches) because some browser caches might be doing extra things like this doesn't make sense.

Even for those browsers, there are other strategies that they might take -- e.g., invalidating a cache entry that's an optimised structure if the content-type changes. Likewise, a browser that stores decoded responses needs to be aware of the semantics of content-encoding anyway, and take appropriate steps to behave correctly.

MattMenke2 · 2018-11-29T00:17:02Z

I'm not following...What "extra things" are you talking about?

mnot · 2018-11-29T00:22:13Z

Storing optimised representations; HTTP caches store bags-of-bits
Decompressing Content-Encoding automagically; C-E is a property of the content

I.e., storing and updating these headers works just fine until you start doing non-specified things with the HTTP cache. If you do such things, it's your responsibility to make sure that it doesn't interfere with the proper operation of the cache.

Re-defining caching to suit current browser behaviour might work if browser caches were the only HTTP caches in existence, but they're not.

MattMenke2 · 2018-11-29T00:28:31Z

Those two points seem to contradict each other - only one can be correct. If Content-Encoding is a property of the content, and not merely the transfer of the content, then caches should store that content, as opposed to a mutated version of it, no?

C-E being a property of content is a fiction - that may be what the spec says, but that's not the reality. You could not make a functional browser that behaved otherwise. I thought one of the purposes of the fetch spec was to reconcile RFCs with the way things actually work, or can reasonably be modified to work.

mnot · 2018-11-29T01:04:04Z

HTTP caches do store that content -- at least in intermediaries. If browser caches store the decompressed content, they need to account for the impact of doing so, as it's not specified behaviour.

In the case you described, if a browser stores something with an unrecognised content-coding thereby evading a filter, and then a 304 updates the content-coding, there are a number of valid strategies that could be taken to assure correct behaviour by the filter:

trigger re-processing of the content by the filter upon a 304
invalidate the stored content upon a changed content-encoding
refuse to store a response that doesn't have a valid content-coding

Why the changing the behaviour of all caches (including all that are currently deployed) to accommodate this non-standard behaviour the most reasonable path, especially when there are a number of ways they can correctly behave within the constrains of the current spec?

I could see an argument that HTTP needs to accommodate caches that store responses without content-encodings, or provide advice for those that do. Would that be helpful?

jyasskin · 2018-11-29T04:27:50Z

I'm working on a patch to add these lists, with notes that the contents aren't final. I'll include the "cached but not updated" list, probably saying that caches MAY ignore updates to headers in that list. As Mark convinces the browsers to update various headers on that list, or the browsers convince the CDNs not to, we can tighten the specification, but MAY seems to capture the status quo.

annevk · 2019-08-08T15:41:08Z

That sounds like a pretty drastic change from the status quo that I'm not sure is really worth the amount of effort involved. Especially as it leaves much of the design challenges up to the implementers and will eventually result in implementers having to reverse engineer their respective setups to get consistent behavior across sites.

mnot · 2019-09-30T07:57:34Z

The alternative is to replace the third paragraph with something like:

In these limited situations, a cache MAY omit the headers listed below from updates. Servers SHOULD NOT send updated values for these headers in a 304 response if their new values are critical to interpret the cached response; rather, they SHOULD generate a full 2xx response.

However, I'm really only comfortable with this approach if the list is limited and relatively static. There's a lot of cargo culting apparent in the current implementation behaviours, and I see no reason to accommodate them.

Specifically:

Content-Type, Content-MD5, Content-Encoding, Content-Length, Content-Range and maybe Content-Location make sense to list, for reasons discussed.
Blanket prefixes like Content- and X-Content- don't make sense, and specifying them is going to create perverse incentives. Browsers should change their behaviours here.
I don't see any broad exemptions amongst browsers for other headers, so I don't think they should be on the list; they should instead align their behaviours, unless there's some technical reason why they can't be updated (along the lines of the scenarios explained above).
I'd really like to understand why browsers exempt Set-Cookie and Set-Cookie2 from updates; no proxy cache does it.

twifkak · 2019-11-18T07:40:56Z

Sorry for the late contribution to this issue. I ran into a related ambiguity recently and thought it was relevant:

Should the cache update the stored request time, as used in corrected age calculation?
Should the cache update the stored Date header?

IIUC, the combination would mean that if the original response had a Cache-Control: max-age=, then any subsequent 304s (which must have Date) would extend the lifetime of the stored response, potentially indefinitely if 304s arrive every max_age-1.

Apologies if I misread something; I should really run some tests.

(edit to add: Either way, the decision on request time should probably be specified explicitly.)

mnot · 2019-11-18T09:10:27Z

Discussed in Sinapore; I will create a conservative PR as a basis of further discussion.

martinthomson · 2019-11-19T06:03:57Z

@agrover pointed to the gecko code that lists:
Connection,
Proxy-Connection,
Keep-Alive,
WWW-Authenticate,
Proxy-Authenticate,
Trailer,
Transfer-Encoding,
Upgrade,
Set-Cookie.

I couldn't see anything that strips Content-Type, or even Content-<anything>, which might be an issue, but Andy might be able to provide more info.

agrover · 2019-11-19T09:07:55Z

OK so the previous list was headers Gecko doesn't cache. This is a list of headers Gecko doesn't update on a 304 (or 206 partial content).
Connection
Proxy_Connection
Keep_Alive
Proxy_Authenticate
Proxy_Authorization
TE
Trailer
Transfer_Encoding
Upgrade
Content_Location
Content_MD5
ETag
Content_Encoding
Content_Range
Content_Type
Content_Length

So there are those Content- headers you were looking for. Sorry for the confusion.

mnot · 2020-03-19T06:10:30Z

See draft PR #337.

This covers all of the connection headers (both listed in Connection and also well-known ones that might not appear there), and the enumerated Content- headers (NOT the prefixes).

It includes Proxy-Authenticate and Proxy-Authorization with the theory that they are specific to a connection; we have an outstanding issue #331 to discuss that.

It also includes ETag; I want to do a bit of side research on ETag and updates, and may add some more text there or open a separate issue, but I think it's worth including at this point.

It does not include Content-Location. I'd like to understand the rationale for it not being updatable, considering that intermediary caches (Squid, ATS, and Varnish) are widely updating it with no perceivable impact on interop or security.

It does not yet include Cookie, because a cacheable response with a Cookie should really be updatable, if the cache bothered to store it. Looking at my tests, it appears that the only cache that stores and updates cookies is httpd. I suppose we can carve out an exception for Cookie -- it's exceptional in so many other ways...

Thoughts?

MattMenke2 · 2020-03-19T12:40:52Z

The problem with Cookie headers isn't replacing the cached ones, but with using the cached Cookie headers in the first place - we don't want to reuse Cookie headers from old responses, setting the old Cookies once again.

mnot · 2020-03-23T04:29:11Z

If that's the goal, it seems like it'd be better to do it in the cookie spec -- i.e., specify that cookies should only be used when the response is what we used to call "first hand" -- obtained directly from the origin server, without an Age, rather than from cache. @mikewest @johnwilander ?

mikewest · 2020-03-23T06:14:50Z

If that's the goal, it seems like it'd be better to do it in the cookie spec

This is currently taken care of by Fetch, which hooks into the cookie spec at 11.4 of https://fetch.spec.whatwg.org/#http-network-fetch (which happens after consulting the cache and missing). The integration could be much cleaner, but it does what you're asking for. What would you like added to the cookie spec to make that more transparent?

MattMenke2 · 2020-03-23T12:30:05Z

So sending cookies hooks in above the cache layer, and setting them hooks in above it, in the spec? That seems unfortunate.

mikewest · 2020-03-23T14:31:14Z

So sending cookies hooks in above the cache layer, and setting them hooks in above it, in the spec? That seems unfortunate.

Setting the Cookie header happens at step 5.17.1 of https://fetch.spec.whatwg.org/#http-network-or-cache-fetch when the request is being generated, before hitting the cache. That seems reasonable, as we need to know the request's characteristics before consulting the cache to handle Vary, et al.

MattMenke2 · 2020-03-23T14:37:48Z

Yes, I'd argue the setting the cookies in the cookie store should happen at the same layer, however.

mikewest · 2020-03-23T14:56:51Z

We could move the Set-Cookie processing to the same "HTTP-network-or-cache fetch" algorithm, in the cache miss branch (maybe after step 7.2?). I think that would basically be a no-op, but we'd need to audit direct usage of "HTTP-network fetch". It's not clear to me that that's worth doing... Perhaps @annevk has an opinion?

Either way, it seems like we're handling the problem discussed above in Fetch, not RFC6265bis. Are there still changes we should make to the latter regarding this discussion (short of rewriting the whole thing in terms of Fetch, which I don't think anyone's signing up for)?

annevk · 2020-03-24T06:53:14Z

I think we could move it and it probably makes sense if that's when they're processed. (That algorithm also takes a credentials flag which I'd hope folks set correctly if they invoke it directly, but I'm not aware of anyone invoking it directly.) We should maybe also update https://fetch.spec.whatwg.org/#http-header-layer-division to account for this somewhat subtle difference.

mnot · 2020-04-27T03:44:54Z

OK, that seems to take care of Cookies. I'm going to mark the PR as ready for review.

mnot · 2020-05-21T23:06:48Z

The editors have signed off on #337. Any further comments before we merge?

mnot added the caching label Oct 24, 2018

mnot changed the title ~~H1 headers shouldn't update stored responses if they're connection-related~~ Updating stored headers Nov 6, 2018

mnot mentioned this issue Nov 11, 2018

Transfer-Encoding in inner-exchange's response header. WICG/webpackage#336

Closed

mnot mentioned this issue Oct 1, 2019

3xx redirects - request formation #38

Closed

mnot removed the discuss label Feb 3, 2020

mnot self-assigned this Feb 3, 2020

mnot mentioned this issue Mar 19, 2020

Talk about exceptions to storing and updating headers #337

Merged

mnot added the has-proposal label Mar 19, 2020

nigoroll mentioned this issue Mar 22, 2020

beresp 304 header merging makes impossible to distinguish between headers set by varnish and backend varnishcache/varnish-cache#3102

Open

royfielding closed this as completed in #337 May 25, 2020

annevk mentioned this issue Jun 24, 2020

Align spec more closely with cache freshening whatwg/fetch#1041

Open

3 tasks

michielbdejong mentioned this issue Jul 10, 2020

Get CORS working on 304s with Apache pdsinterop/solid-nextcloud#1

Closed

mnot mentioned this issue Oct 22, 2020

Updating stored headers II #488

Closed

felipeochoa mentioned this issue Jul 14, 2021

Keep CSP on 304 pillarjs/send#204

Closed

allanlei mentioned this issue Feb 7, 2022

workbox-strategies / getCacheKeyForURL should account for CORS GoogleChrome/workbox#3024

Closed

PaperStrike mentioned this issue Feb 2, 2023

[Docs Bug] X-Content-Type-Options for 304 responses webhintio/hint#5417

Open

Updating stored headers #165

Updating stored headers #165

Comments

mnot commented Oct 24, 2018 • edited Loading

mnot commented Nov 6, 2018

mikewest commented Nov 6, 2018

annevk commented Nov 6, 2018

MattMenke2 commented Nov 6, 2018

MattMenke2 commented Nov 6, 2018

MattMenke2 commented Nov 6, 2018

mnot commented Nov 8, 2018 • edited Loading

mnot commented Nov 8, 2018

MattMenke2 commented Nov 8, 2018

mnot commented Nov 8, 2018

mcmanus commented Nov 8, 2018

jyasskin commented Nov 20, 2018

mnot commented Nov 26, 2018

jyasskin commented Nov 26, 2018

MattMenke2 commented Nov 26, 2018

mnot commented Nov 27, 2018

annevk commented Nov 27, 2018

mnot commented Nov 27, 2018

annevk commented Nov 27, 2018

MattMenke2 commented Nov 27, 2018

MattMenke2 commented Nov 27, 2018

mnot commented Nov 29, 2018

MattMenke2 commented Nov 29, 2018

mnot commented Nov 29, 2018

MattMenke2 commented Nov 29, 2018

mnot commented Nov 29, 2018

jyasskin commented Nov 29, 2018 • edited Loading

annevk commented Aug 8, 2019

mnot commented Sep 30, 2019

twifkak commented Nov 18, 2019 • edited Loading

mnot commented Nov 18, 2019

martinthomson commented Nov 19, 2019

agrover commented Nov 19, 2019

mnot commented Mar 19, 2020

MattMenke2 commented Mar 19, 2020

mnot commented Mar 23, 2020

mikewest commented Mar 23, 2020

MattMenke2 commented Mar 23, 2020

mikewest commented Mar 23, 2020 • edited Loading

MattMenke2 commented Mar 23, 2020

mikewest commented Mar 23, 2020

annevk commented Mar 24, 2020 • edited Loading

mnot commented Apr 27, 2020

mnot commented May 21, 2020

mnot commented Oct 24, 2018 •

edited

Loading

mnot commented Nov 8, 2018 •

edited

Loading

jyasskin commented Nov 29, 2018 •

edited

Loading

twifkak commented Nov 18, 2019 •

edited

Loading

mikewest commented Mar 23, 2020 •

edited

Loading

annevk commented Mar 24, 2020 •

edited

Loading