Handle raw UTF-8 bytes in redirect headers #317
Merged
Try to parse the `Location` header as UTF-8 bytes as a fallback if the header value is not valid US-ASCII. This is technically against the URI spec, which requires all literal characters in the URI to be US-ASCII (see RFC 3986, Section 4.1). It is also more or less against the HTTP spec, which historically allowed ISO-8859-1 text in header values but has since been restricted to US-ASCII plus opaque bytes. UTF-8 has never been encouraged or allowed as such. See RFC 7230, Section 3.2.4 for more info.
However, some bad or misconfigured web servers will do this anyway, and most web browsers recover from this by allowing and interpreting UTF-8 characters as themselves even though they should have been percent-encoded. The third-party URI parsers that we use have no such leniency, so we percent-encode such bytes (if legal UTF-8) ahead of time before handing them off to the URI parser.
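As a rough illustration of the fallback described above, here is a minimal sketch using only the standard library. The function name `percent_encode_utf8_fallback` is hypothetical and is not necessarily how this PR implements it. The sketch assumes that ASCII bytes should pass through untouched and that only non-ASCII bytes need escaping.

```rust
/// Hypothetical helper (not the actual code in this PR): if `raw` is valid
/// UTF-8, return it with every non-ASCII byte percent-encoded so that a
/// strict URI parser can accept it. Returns `None` for invalid UTF-8.
fn percent_encode_utf8_fallback(raw: &[u8]) -> Option<String> {
    // Only recover when the bytes are well-formed UTF-8; anything else is
    // left for the caller to reject.
    let text = std::str::from_utf8(raw).ok()?;

    let mut out = String::with_capacity(raw.len());
    for &byte in text.as_bytes() {
        if byte.is_ascii() {
            // Assumption: ASCII bytes are passed through unchanged and the
            // URI parser validates them afterwards.
            out.push(byte as char);
        } else {
            // Each byte of a multi-byte UTF-8 sequence becomes %XX.
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    Some(out)
}
```

Under these assumptions, a header value of `/café?q=1` sent as raw UTF-8 would be handed to the URI parser as `/caf%C3%A9?q=1`, while a value containing a lone `0xE9` byte (ISO-8859-1 style) would not be recovered.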
This is in the spirit of being generous with what we accept (within reason) while being strict in what we produce. Since real websites exhibit this out-of-spec behavior, it is worth handling.
Note that the underlying `tiny_http` library that our HTTP test mocking is based on does not allow UTF-8 header values right now, so we can't really test this efficiently. We already have a couple of tests doing some raw TCP munging for one reason or another, so in the future we need to make sure to rewrite `testserver` to allow such headers and then enable the test. For now I've manually verified that this works.
Fixes #315.