0.209.0
Golang 1.15 breaks Spring apps that incorrectly set Transfer-Encoding: chunked.
Summary
Golang 1.15 introduced stricter validation of the Transfer-Encoding header, which affects reverse proxies such as gorouter.
Spring (and other app frameworks) are in charge of setting the Transfer-Encoding header. If the app dev also sets this header themselves, the response can end up with the header set twice (two identical keys, each with a value) or with a single header carrying two values. Before golang 1.15, both forms were considered valid. Golang 1.15 is stricter and now treats such a response as invalid.
The app should not be setting the Transfer-Encoding header.
We believe users might be setting this header themselves if their app is naïvely copying all headers from the request to the response.
Symptoms
Requests to the app start returning 502s. The following error appears in gorouter.stdout.log:
"error":"net/http: HTTP/1.x transport connection broken: too many transfer encodings: [\"chunked\" \"chunked\"]
Is it only Spring apps?
So far we have only heard of this problem manifesting with Spring apps. However, it could happen with any framework or app that sets the Transfer-Encoding header more than once.
Can this be fixed in gorouter?
Aside from rolling back to an older version of golang, no. We confirmed that the error occurs inside gorouter's call to golang's reverse proxy RoundTrip, which returns an error and no response. Because gorouter never receives the response, it can't fix the headers for the user before the error occurs.
So how do I fix it?
Users will need to fix each app that is incorrectly adding this header to its responses. In the short term, operators can roll back to older versions of routing-release that use golang 1.14. However, golang 1.14 will only be supported until Feb 1, 2021, when golang 1.16 is released. See this doc for more details on how to fix your Spring apps.
Can I use extra_headers_to_log to detect this problem before updating?
Sadly, no. Golang's HTTP client deletes the Transfer-Encoding header from the response before handing it to gorouter, so the header can never appear in access logs.
Can I use tcpdump to detect this problem before updating?
Not really, but kind of. If you run tcpdump on the router VM or the diego cell VM, you will capture the right traffic, but it will be encrypted, headers included. To get unencrypted traffic, you can get into the app container as root and run tcpdump there to capture the traffic between the app process and the sidecar envoy. However, this has to be done on a per-app basis.