Handle changing Content-Encoding for inconsistent backends? #3169

Closed
dridi opened this issue Dec 19, 2019 · 32 comments

@dridi
Member

dridi commented Dec 19, 2019

Test case by @daghf:

varnishtest "Ensure we don't override Content-Encoding on a cond fetch"

server s1 {
    rxreq
    txresp -hdr {ETag: "foo"} -body {****}

    rxreq
    expect req.http.If-None-Match == {"foo"}
    txresp -status 304 -hdr {ETag: W/"foo"} -hdr "Content-Encoding: gzip"

    rxreq
    expect req.url == "/1"
    txresp -hdr {Etag: "bar"} -gzipbody {asdf}

    rxreq
    expect req.url == "/1"
    expect req.http.If-None-Match == {"bar"}
    txresp -status 304

} -start

varnish v1 -vcl+backend {
    sub vcl_backend_response {
        set beresp.ttl = 1s;
        set beresp.grace = 0s;
        set beresp.keep = 1d;
    }
} -start

client c1 {
    txreq -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {"foo"}
    expect resp.http.Content-Length == "4"
    expect resp.http.Content-Encoding == <undef>

    delay 1.5

    txreq -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {W/"foo"}
    expect resp.http.Content-Length == "4"
    expect resp.http.Content-Encoding == <undef>

    # Also check that we're still OK in the general case
    txreq -url "/1" -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {"bar"}
    expect resp.http.Content-Encoding == "gzip"

    delay 1.5

    txreq -url "/1" -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {"bar"}
    expect resp.http.Content-Encoding == "gzip"
} -run
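
To reproduce, the test above can be saved as, say, r3169.vtc (any file name works) and run with varnishtest -v r3169.vtc.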
@nigoroll
Member

nigoroll commented Dec 20, 2019

I am not sure the test case is valid.

A well-behaved backend would

  • not send Content-Encoding: gzip with a 304 matching the ETag if the original 200 with that ETag was not compressed, and
  • have to send Vary: Accept-Encoding if it produced different results based on Accept-Encoding.

VCL can already work around such cases, and I do not think we should special-case the built-in 304 code for such misbehaving backends.
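
One hedged illustration of such a VCL workaround (not spelled out in this thread; the Content-Type filter below is an assumption) is to normalize compression at the Varnish layer, so that whichever backend answers, the cached object ends up gzip-encoded and a later 304 carrying Content-Encoding: gzip no longer contradicts the stored body:

sub vcl_backend_response {
    # Sketch only: gzip compressible responses in Varnish itself so the
    # stored object is gzip-encoded no matter which backend produced it.
    # If the backend already sent gzip, do_gzip is effectively a no-op.
    if (beresp.http.Content-Type ~ "^(text/|application/(json|javascript))") {
        set beresp.do_gzip = true;
    }
}

Since http_gzip_support is on by default, Varnish can still gunzip such objects on delivery for clients that do not send Accept-Encoding: gzip.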

@dridi
Member Author

dridi commented Dec 20, 2019

I believe @daghf worked around the problem for the support case that brought this up, and left me a note saying "warrants more research". I should probably have mentioned that in the first place; I opened this issue so Someone™ could do that research.

@nigoroll
Member

All well and good, but IMHO the result of that research would be that nothing is wrong in varnish-cache.
Please convince me if your result differs.

@carlosabalde

I think the following test shows the problem from a different point of view. In this case it is Varnish that generates the 'wrong' 304 response.

varnishtest "..."

server s_1 {
    rxreq
    txresp -hdr {ETag: "foo"} -body {****}

    rxreq
    expect req.http.If-None-Match == {"foo"}
    txresp -hdr {ETag: W/"foo"} -gzipbody {****}
} -start

varnish v_1 -vcl {
    backend default {
        .host = "${s_1_addr}";
        .port = "${s_1_port}";
    }

    sub vcl_recv {
        return (pass);
    }
} -start

varnish v_2 -vcl {
    backend default {
        .host = "${v_1_addr}";
        .port = "${v_1_port}";
    }

    sub vcl_backend_response {
        # During second client request, at this point
        # 'beresp.http.Content-Encoding' is set to 'gzip' due to 304 response
        # coming from 'v_1', but cached object is not gzipped. Chaos!

        set beresp.ttl = 1s;
        set beresp.grace = 0s;
        set beresp.keep = 1d;
    }
} -start

client c1 -connect ${v_2_sock} {
    txreq -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {"foo"}
    expect resp.http.Content-Length == "4"
    expect resp.http.Content-Encoding == <undef>

    delay 1.5

    # Broken response: body is uncompressed but tagged as gzipped!
    txreq -hdr "Accept-Encoding: gzip"
    rxresp
    expect resp.status == 200
    expect resp.http.ETag == {W/"foo"}
    expect resp.http.Content-Length == "4"
    expect resp.http.Content-Encoding == "gzip"
} -run

@nigoroll
Member

@carlosabalde no. v_1 is still forwarding a wrong response from s_1, one which it can and should do nothing about. If you chose to configure v_1 to serve a cached response instead of the pass, you should see correct behavior.

@carlosabalde

carlosabalde commented Dec 20, 2019

@nigoroll, just a few comments:

  • Not sure we can say s_1 is generating a wrong response. You can think of it as a round-robin director balancing two servers, one with compression disabled and the other with it enabled. In fact, that's the real scenario that motivated the original support case.

  • I don't think this is correct: 'If you chose to configure v_1 for a cached response instead of the pass, you should see correct behavior'. Setting ttl=0.5s, grace=0s, keep=0s in v_1 won't change anything here. The test is just a toy scenario.

  • Even assuming s_1 is generating a wrong response, I don't think there is anything you can do in VCL (short of forcing uncompressed responses from the backend side) to work around the situation in a multi-layer Varnish setup. In order to do that you'll need something similar to the ideas described in https://varnish-cache.org/lists/pipermail/varnish-dev/2019-October/009469.html.

In any case, I'll leave this one to @daghf. I'm just trying to share some details about the original support case.

@nigoroll
Member

@carlosabalde

  • Backends in a round-robin director need to behave consistently. I know they do not always do so, which is one of the reasons the shard director exists in the first place. If you used shard and always sent the same URL to the same backend, you should not see this issue.
  • By "for a cached response" I mean one where the cache is actually hit.
  • In VCL, you can add the missing Vary (a sketch follows below).
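
A minimal sketch of that last point, assuming the non-compressing backend is declared in VCL under the placeholder name b_nogzip (and keeping in mind the later remark in this thread that adding Vary alone turned out not to be sufficient):

sub vcl_backend_response {
    # Sketch only: add the Vary the non-compressing backend forgot to send,
    # so encoding negotiation is reflected in cache lookups.
    # "b_nogzip" is a placeholder for that backend's VCL name.
    if (beresp.backend == b_nogzip &&
        beresp.http.Vary !~ "(?i)accept-encoding") {
        if (beresp.http.Vary) {
            set beresp.http.Vary = beresp.http.Vary + ", Accept-Encoding";
        } else {
            set beresp.http.Vary = "Accept-Encoding";
        }
    }
}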

@rezan
Member

rezan commented Dec 20, 2019

The scenario described by Carlos is pretty legit. Think about a fallback director where we fall back to a different origin and get different gzip behavior. The fact that I have seen this twice tells me this is a real issue.

@carlosabalde

Again, trying to provide some insight about the original support case & focusing on the workaround part:

  • In fact the shard director is already being used, but the shard criterion is user-based and cannot be changed (i.e. local sessions, etc.). Therefore, it is not an option here.

  • I don't see how injecting a Vary: Accept-Encoding header in VCL would change anything. You can assume a fixed Accept-Encoding: gzip and this would still be an issue.

@nigoroll
Member

nigoroll commented Dec 20, 2019

@rezan Still, backends need to behave consistently.

@carlosabalde You would add the Vary for the backend that does not send a gzip response.

And there are probably many more fixups possible, like adding a server id to the ETag.
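
A rough, entirely hypothetical sketch of the server-id idea (the backend names b1 and b2 and the "+b1"/"+b2" suffix scheme are invented here): tag each backend's ETag on the way in, and only strip the tag when revalidating against the backend that issued it, so a foreign backend can never answer 304 for an object it did not produce.

sub vcl_backend_response {
    # Sketch only: append a per-backend tag inside the quoted ETag value,
    # e.g. "foo" from backend b1 becomes "foo+b1".
    if (beresp.http.ETag) {
        if (beresp.backend == b1) {
            set beresp.http.ETag = regsub(beresp.http.ETag, {""$"}, {"+b1""});
        } elsif (beresp.backend == b2) {
            set beresp.http.ETag = regsub(beresp.http.ETag, {""$"}, {"+b2""});
        }
    }
}

sub vcl_backend_fetch {
    # Sketch only: strip the tag belonging to the backend we are about to
    # revalidate against; a tag from the other backend stays in place, so
    # its ETag can never match and it has to send a full 200.
    if (bereq.http.If-None-Match) {
        if (bereq.backend == b1) {
            set bereq.http.If-None-Match =
                regsub(bereq.http.If-None-Match, {"\+b1""}, {"""});
        } elsif (bereq.backend == b2) {
            set bereq.http.If-None-Match =
                regsub(bereq.http.If-None-Match, {"\+b2""}, {"""});
        }
    }
}

Whether that complexity is preferable to simply keeping the backends consistent is exactly the trade-off being argued here.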

@nigoroll
Member

@carlosabalde also, I do not understand how, if you shard by session id, you would get a 304 from another server for a personalized cache object.

@carlosabalde

@nigoroll the problem is that both fixups are not VCL-based. But I get your point.

About the sharding thing: for example, user U1 (session S1, mapped to server B1) fetches a URL (think of non-personalized content like a CSS file) from B1 (obviously not all URLs are personalized, but you cannot know that in advance just by looking at the URL; you can also assume personalized URLs won't be cached). Some time later the object's TTL expires. User U2 (session S2, mapped to server B2) fetches the same URL, this time from B2. That's it.

@rezan
Member

rezan commented Dec 20, 2019

If you fall back to an alternate origin, I'm not sure why we would expect consistent behavior. And again, I've seen this twice, so I'm not sure why @nigoroll is digging in so hard here. Let's fix the problem so others don't fall into the same trap and blame Varnish for corrupted responses.

@dridi
Member Author

dridi commented Dec 20, 2019

@nigoroll by insisting on backends behaving consistently you close the door to practices like A/B testing or rolling upgrades in a cluster. And for that matter, even with a single backend, enabling compression may be done at run time while there are objects in the cache.

I mean, it's expected to have poor HTTP support from HTTP servers with such a complicated protocol. Even varnishd will trip on its own feetures when it comes to that:

varnishtest "fantom client accept-encoding"

server s1 -repeat 2 {
	rxreq
	txresp -gzipbody hello
} -start

varnish v1 -vcl+backend "" -start

client c1 {
	txreq
	rxresp
	expect resp.http.content-encoding == <undef>
	expect resp.http.vary != <undef>
} -run

varnish v1 -cliok "param.set http_gzip_support off"

client c2 {
	txreq
	rxresp
	expect resp.http.content-encoding == gzip
	expect resp.http.vary != <undef>
} -run

# not getting a hit despite two identical client requests, dafuq?
varnish v1 -expect cache_hit == 0
varnish v1 -expect cache_miss == 2

I understand that this is something we can work around in VCL, but I find it a bit harsh to close the issue before the "warranted research" was conducted. I opened this issue to make sure we keep track of this and give time to @daghf or someone else to do it.

Can we agree to reopen?

@nigoroll
Member

First of all, I want to apologize for having provided at least incomplete advice on the VCL workaround. Adding Vary: Accept-Encoding alone is not sufficient.

I closed this issue because I fundamentally oppose changing the straightforward header merge in core for this special case.

@nigoroll nigoroll reopened this Dec 21, 2019
@nigoroll
Member

And by the way, @dridi, I do not see how changing http_gzip_support would be related, or what would be wrong about it. The "fantom" test just demonstrates how gzip support works to begin with.

@nigoroll nigoroll changed the title Ensure we don't override Content-Encoding on a cond fetch Handle changing Content-Encoding for inconsistent backends? Dec 21, 2019
@nigoroll
Member

nigoroll commented Dec 21, 2019

Let me summarize everything that is going wrong here:

  • The backends are configured inconsistently
  • The backends provide identical Etags for different bodies
  • The backends should not send Content-Encoding with the 304. From RFC 7232:

Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate (any other headers but Cache-Control, Content-Location, Date, ETag, Expires, and Vary) unless said metadata exists for the purpose of guiding cache updates (e.g., Last-Modified might be useful if the response does not have an ETag field).

@nigoroll
Member

bugwash: sub vcl_backend_refresh from #3102 would also provide a convenient way to fix this in VCL.
@dridi and I seem to agree that we want less policy in core, not more.

@rezan
Member

rezan commented Dec 23, 2019

Isn't this still a bug? We are corrupting response bodies. Adding a new VCL state and telling users to fix this themselves isn't really a fix, is it?

@nigoroll
Member

nigoroll commented Dec 23, 2019

@rezan As explained before, I do not think this is a Varnish bug. But making VCL workarounds/fixes easier would be a good thing, and I would even consider adding unset beresp.http.Content-Encoding to sub vcl_backend_refresh of the builtin.vcl.
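
For reference, that could look roughly like the following (a sketch only: vcl_backend_refresh is the subroutine proposed in #3102 and does not exist in any released Varnish at this point):

# Hypothetical builtin.vcl addition, assuming the vcl_backend_refresh
# subroutine from #3102 runs on the 304 response before its headers are
# merged into the stale object.
sub vcl_backend_refresh {
    # A 304 should not be updating representation metadata such as
    # Content-Encoding (see the RFC 7232 quote earlier in this thread).
    unset beresp.http.Content-Encoding;
}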

@rezan
Member

rezan commented Dec 23, 2019

Ok, then I think we just need consensus that this isn't a bug.

Just to confirm, you are saying that Varnish now requires consistent backends? If you have inconsistent or different backends, there is a risk Varnish will corrupt responses?

@rezan rezan closed this as completed Dec 23, 2019
@rezan rezan reopened this Dec 23, 2019
@rezan
Member

rezan commented Dec 23, 2019

Sorry, my mistake, I hit the wrong button.

@nigoroll
Member

@rezan please try to avoid putting words into my mouth. I do not think I said any of the things you are asking whether I said.

now requires consistent backends?

In my mind, the Varnish approach has always been not to handle evil or bad backends, but rather work under the assumption that varnish administrators know their backends. In #3169 (comment) I am giving reasons why this case is about misbehaving backends.

If you have inconsistent or different backends, there is a risk Varnish will corrupt responses?

Sure, this has always been the case. Take, for instance, backend A (momentarily) redirecting URL-A to URL-B and backend B redirecting URL-B to URL-A. Now if you happen to use the shard director, sending URL-A to backend A and URL-B to backend B, you get a 301 loop. That is just as real as the case discussed here, and still it is not Varnish's business to fix it. VCL provides all the tools to avoid the example I gave, just as it provides everything necessary to avoid the issue documented here.

I also agree that we lack options to handle 304s and just wrote how I think this issue could be properly handled from VCL once we get sub vcl_backend_refresh.

@rezan
Member

rezan commented Dec 23, 2019

A 301 redirect loop is not a response corruption.

@dridi
Member Author

dridi commented Dec 23, 2019

Can we stop the discussion, let Someone™ brave enough go through the relevant RFCs (warranted research, anyone?) and come back with concrete conclusions before we engage in further disagreement?

@nigoroll
Member

nigoroll commented Dec 23, 2019

@rezan I think I gave a comprehensive answer, illustrated with an example. I also gave a list of reasons why I think this is not a varnish bug and rather a backend issue. I do not think your response picking on my example is helpful, in particular in light of the constructive suggestions I have made going forward.

@dridi please do. I have already shared the result from my own research.

@bsdphk
Contributor

bsdphk commented Jan 3, 2020

So...

@nigoroll is right that backends using the same ETag for different objects are just plain wrong, no matter what the difference between the objects is.

However, that is exactly the kind of real-world f**kup VCL is supposed to be able to paint over.

It follows that this is a situation which should be dealt with in VCL, not in C-code.

If it is not currently possible to handle this situation in VCL, we need to look at that.

Varnish's gzip support is also a "deal with" facility for backends not doing gzip properly, or at all.

It can be argued that such backends barely exist any more and that Varnish's gzip support should be retired, but that is probably still premature.

@rezan
Member

rezan commented Jan 5, 2020

In both cases where I have seen this, Varnish is the bad backend that sends a 304 response with gzip and a matching weakened ETag. This was demonstrated by Carlos in this comment:

#3169 (comment)

So the problem description here is:

When using Varnish in a multi-tier setup and the origin serves a gzip variation (weak ETag), Varnish can corrupt responses.

@nigoroll
Member

nigoroll commented Jan 6, 2020

Re @bsdphk: vcl_backend_refresh as suggested in #3102 is so far the best candidate to solve this and other issues in VCL. gzip is a different story which has more aspects to it, and we already have ways to control it from VCL, so I think we should keep it out of the game.
Re @rezan: incomplete handling in VCL is still not a Varnish bug. We have already discussed #3169 (comment) and I do not see any new points in your comment.

@nigoroll
Member

Actually, I have changed my mind 180° on this one. An HTTP core draft now contains:

Caches MUST-NOT update the following header fields:
Content-Encoding, Content-Length, Content-MD5, Content-Range, ETag.

So while we should still have vcl_backend_refresh, I do now think that we should actually enforce the above in core code.

@nigoroll
Member

Consensus between @bsdphk, @dridi, and myself: @dridi will merge #3281 to master, then close this ticket and continue in #3102.

@nigoroll
Member

Specific fix in d921b1f, to be generalized in #3102.
