Handle changing Content-Encoding for inconsistent backends? #3169
I am not sure if the test case is valid. A behaving backend would
VCL can already work around such cases, and I do not think we should special-case the built-in 304 code for such misbehaving backends.
I believe @daghf worked around the problem for the support case that brought this up, and left me a note saying "warrants more research". I should probably have mentioned that in the first place; I opened this issue so Someone:tm: could do that research.
All well and good, but IMHO the result of the research would be that nothing is wrong in varnish-cache.
I think the following test shows the problem from a different point of view. In this case, Varnish is the one generating the 'wrong' 304 response.
@carlosabalde no. v_1 is still forwarding a wrong response from s_1, one which it can and should do nothing about. If you choose to configure v_1 for a cached response instead of the pass, you should see correct behavior.
@nigoroll, just a few comments:
In any case, I'll leave this one to @daghf. Just trying to share some details about the original support case.
The scenario described by Carlos is pretty legit. Think about a fallback director where we fall back to a different origin and get different gzip behavior. The fact that I have seen this twice tells me this is a real issue.
Again, trying to provide some insight about the original support case & focusing on the workaround part:
@rezan Still, backends need to behave consistently. @carlosabalde You would add the And there are probably many more fixups possible, like adding a server id to the
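One possible shape of the "server id in the ETag" fixup mentioned above, sketched as hypothetical VCL (the exact header surgery here is my own assumption, not something proposed verbatim in this thread):

```vcl
vcl 4.1;

sub vcl_backend_response {
    # Hypothetical fixup: tag each ETag with the name of the backend
    # that produced it, so a later 304 from a *different* backend can
    # never validate a stored variant with a different Content-Encoding.
    if (beresp.http.ETag) {
        # Insert "-<backend name>" just before the closing quote,
        # e.g. W/"abc" from backend b1 becomes W/"abc-b1".
        set beresp.http.ETag = regsub(beresp.http.ETag,
            {""$"}, "-" + beresp.backend.name + {"""});
    }
}
```

The effect is that a revalidation request carries the tagged value in If-None-Match; a backend that did not generate that ETag will not match it and will answer with a full 200 instead of a bogus 304.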
@carlosabalde also I do not understand how, if you got a shard director by session id, you would get a 304 from another server for a personalized cache object.
@nigoroll the problem is that both fixups are not VCL-based. But I get your point. About the sharding thing: for example, user U1 (session S1, mapping to server B1) fetches a URL (think of non-personalized content like a CSS file) from B1 (i.e. obviously not all URLs are personalized, but you cannot know that in advance just by looking at the URL; you can also assume personalized URLs won't be cached). Some time later the object's TTL expires. User U2 (session S2, mapping to server B2) fetches the same URL, this time from B2. That's it.
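The session-sharding setup described here can be sketched with the stock shard director keyed on the session cookie; a minimal, illustrative configuration (backend addresses and the choice of Cookie as the shard key are my assumptions):

```vcl
vcl 4.1;

import directors;

backend b1 { .host = "192.0.2.10"; .port = "80"; }
backend b2 { .host = "192.0.2.11"; .port = "80"; }

sub vcl_init {
    new shard = directors.shard();
    shard.add_backend(b1);
    shard.add_backend(b2);
    shard.reconfigure();
}

sub vcl_backend_fetch {
    # Route by session cookie: the same session always maps to the
    # same origin, but different sessions may hit different origins
    # for the same shared, cacheable URL. If b1 and b2 differ in
    # gzip behavior, a revalidation after TTL expiry can land on
    # the "other" backend for the same cached object.
    set bereq.backend = shard.backend(by = KEY,
        key = shard.key(bereq.http.Cookie));
}
```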
If you fall back to an alternate origin, I'm not sure why we expect consistent behavior. And again, I've seen this twice, so I'm not sure why @nigoroll is digging in so hard here. Let's fix the problem so others don't fall into the same trap and blame Varnish for corrupted responses.
@nigoroll by insisting on backends behaving consistently you close the door to practices like A/B testing or rolling upgrades in a cluster. And for that matter, even with a single backend, enabling compression may be done at run time while there are objects in the cache. I mean, it's expected to have poor HTTP support from HTTP servers with such a complicated protocol. Even
I understand that this is something we can work around in VCL, but I find it a bit harsh to close the issue before the "warranted research" was conducted. I opened this issue to make sure we keep track of this and give time to @daghf or someone else to do it. Can we agree to reopen?
First of all, I want to apologize for having provided at least incomplete advice on the VCL workaround. Adding I closed this issue because I fundamentally oppose changing the straightforward header merge in core for this special case.
and btw @dridi I do not see how changing |
Let me summarize everything that is going wrong here:
Isn't this still a bug? We are corrupting response bodies. Adding a new VCL state and telling users to fix this themselves isn't really a fix?
@rezan As explained before, I do not think this is a Varnish bug. But making VCL workarounds / fixes easier would be a good thing, and I would even consider adding |
Ok, then I think we just need consensus that this isn't a bug. Just to confirm, you are saying that Varnish now requires consistent backends? If you have inconsistent or different backends, there is a risk Varnish will corrupt responses?
Sorry, my mistake, I hit the wrong button.
@rezan please try to avoid putting words into my mouth. I do not think I said any of the things you are asking whether I said.
In my mind, the Varnish approach has always been not to handle evil or bad backends, but rather work under the assumption that varnish administrators know their backends. In #3169 (comment) I am giving reasons why this case is about misbehaving backends.
Sure, this has always been the case. Take, for instance, backend A (momentarily) redirecting URL-A to URL-B and backend B redirecting URL-B to URL-A. Now if you happen to use the shard director sending URL-A to backend A and URL-B to backend B, you get a 301 loop. This is equally real as the case discussed here, and still it is not Varnish's business to fix it. VCL provides all the tools to avoid the example I gave, as well as everything necessary to avoid the issue documented here. I also agree that we lack options to handle 304s, and I have just written up how I think this issue could be properly handled from VCL once we get
A 301 redirect loop is not a response corruption. |
Can we stop the discussion, let Someone:tm: brave-enough go through the relevant RFCs (warranted research anyone?) and come back with concrete conclusions before we engage in further disagreement? |
@rezan I think I gave a comprehensive answer, illustrated with an example. I also gave a list of reasons why I think this is not a varnish bug and rather a backend issue. I do not think your response picking on my example is helpful, in particular in light of the constructive suggestions I have made going forward. @dridi please do. I have already shared the result from my own research. |
So... @nigoroll is right, that backends using the same ETag for different objects are just plain wrong, no matter what the difference between the objects. However, that is exactly the kind of real-world f**kups VCL is supposed to be able to paint over. It follows that this is a situation which should be dealt with in VCL, not in C-code. If it is not currently possible to handle this situation in VCL, we need to look at that. Varnish's gzip support is also a "deal with" facility for backends not doing gzip properly, or at all. It can be argued that such backends barely exist any more and that Varnish's gzip support should be retired, but that is probably still premature.
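One way to "paint over" this from VCL today is simply to stop Varnish from revalidating against backends known to disagree about gzip, so a mismatched 304 can never be merged onto a stored body. A blunt sketch, assuming the inconsistency only shows up on encoded responses (that condition is my assumption, not part of the thread):

```vcl
vcl 4.1;

sub vcl_backend_response {
    # Workaround sketch: drop the validators on encoded responses so
    # Varnish never sends If-None-Match / If-Modified-Since for them,
    # and therefore never receives a 304 that could be merged onto a
    # body with a different Content-Encoding. Costs a full re-fetch
    # on every expiry, but cannot corrupt the object.
    if (beresp.http.Content-Encoding) {
        unset beresp.http.ETag;
        unset beresp.http.Last-Modified;
    }
}
```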
In both cases where I have seen this, Varnish is the bad backend which sends a 304 response with gzip and a matching weakened ETag. This was demonstrated by Carlos in this comment: So the problem description here is: When using Varnish in a multi-tier setup and the origin serves a gzip variation (weak ETag), Varnish can corrupt responses.
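The failure sequence described in this problem statement can be sketched as the following exchange (all URLs and header values hypothetical):

```
# 1. Initial miss: origin A serves gzip with a weak ETag.
GET /style.css HTTP/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Content-Encoding: gzip
ETag: W/"abc"

# 2. After TTL expiry the cache revalidates, this time reaching an
#    origin (or tier) with different gzip behavior that still
#    matches the weak ETag.
GET /style.css HTTP/1.1
If-None-Match: W/"abc"

HTTP/1.1 304 Not Modified

# 3. The cache keeps the stored gzipped body but merges the 304's
#    headers over the stored ones. If the merged headers no longer
#    describe the stored body's encoding, clients receive a
#    corrupted response.
```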
re @bsdphk |
Actually I changed my mind 180° on this one. An http core draft now contains:
So while we should still have |
Test case by @daghf: